GitHub repos

jiah-cloud/Align3R
[arXiv'24, CVPR 2025 Highlight] Align3R: Aligned Monocular Depth Estimation for Dynamic Videos
Language: Python
#3d_reconstruction #depth_estimation #point_cloud_reconstruction #pose_estimation #video_depth
Stars: 140 Issues: 3 Forks: 3
https://github.com/jiah-cloud/Align3R

GeekyWizKid/video_processing_service
Video Processing Service is an automated video processing service that supports extracting audio from videos, generating subtitles, and embedding subtitles into the video.
Language: Python
#llm #python #video_processing
Stars: 157 Issues: 0 Forks: 28
https://github.com/GeekyWizKid/video_processing_service

baaivision/NOVA
[ICLR 2025] NOVA: Autoregressive Video Generation without Vector Quantization
Language: Python
#autoregressive_models #diffusion_models #image_generation #video_generation
Stars: 145 Issues: 1 Forks: 2
https://github.com/baaivision/NOVA

ictnlp/LLaVA-Mini
LLaVA-Mini is a unified large multimodal model (LMM) that can support the understanding of images, high-resolution images, and videos in an efficient manner.
Language: Python
#efficient #gpt4o #gpt4v #large_language_models #large_multimodal_models #llama #llava #multimodal #multimodal_large_language_models #video #vision #vision_language_model #visual_instruction_tuning
Stars: 173 Issues: 7 Forks: 11
https://github.com/ictnlp/LLaVA-Mini

DepthAnything/Video-Depth-Anything
[CVPR 2025 Highlight] Video Depth Anything: Consistent Depth Estimation for Super-Long Videos
Language: Python
#depth_estimation #monocular_depth_estimation #transformer #video_depth
Stars: 234 Issues: 2 Forks: 8
https://github.com/DepthAnything/Video-Depth-Anything

umlx5h/LLPlayer
The media player for language learning, with dual subtitles, AI-generated subtitles, realtime-OCR, translation, word lookup, and more!
Language: C#
#asr #csharp #flyleaf #language_learning #media_player #ocr #player #tesseract #video #video_player #whisper #wpf #yt_dlp
Stars: 253 Issues: 5 Forks: 4
https://github.com/umlx5h/LLPlayer

FoundationVision/FlashVideo
FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation
Language: Python
#efficient_generative_model #text_to_video #video_generation
Stars: 195 Issues: 5 Forks: 3
https://github.com/FoundationVision/FlashVideo

SkyworkAI/SkyReels-V1
SkyReels V1: the first and most advanced open-source human-centric video foundation model
Language: Python
#i2v #t2v #video_diffusion_transformers
Stars: 348 Issues: 5 Forks: 20
https://github.com/SkyworkAI/SkyReels-V1

liuff19/Video-T1
[ICCV 2025] Official Implementation of Video-T1: Test-Time Scaling for Video Generation
Language: Python
#aigc #chain_of_thought #test_time_scaling #video #video_generation
Stars: 187 Issues: 2 Forks: 12
https://github.com/liuff19/Video-T1

TencentARC/GeometryCrafter
[ICCV 2025] GeometryCrafter: Consistent Geometry Estimation for Open-world Videos with Diffusion Priors
Language: Python
#depth_estimation #video_to_4d
Stars: 173 Issues: 0 Forks: 3
https://github.com/TencentARC/GeometryCrafter

hanyang-21/VideoScene
[CVPR 2025 Highlight] VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step
Language: Python
#3d_reconstruction #video #video_generation
Stars: 154 Issues: 4 Forks: 3
https://github.com/hanyang-21/VideoScene

ali-vilab/UniAnimate-DiT
UniAnimate-DiT: Human Image Animation with Large-Scale Video Diffusion Transformer
Language: Python
#human_image_animation #video_diffusion_transformers #video_generation
Stars: 225 Issues: 5 Forks: 17
https://github.com/ali-vilab/UniAnimate-DiT

SandAI-org/MAGI-1
MAGI-1: Autoregressive Video Generation at Scale
Language: Python
#autoregressive #diffusion_models #video_generation
Stars: 911 Issues: 7 Forks: 32
https://github.com/SandAI-org/MAGI-1

Tencent/HunyuanCustom (now Tencent-Hunyuan/HunyuanCustom)
HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation
Language: Python
#audio_driven #diffusion_models #image_to_video #image_to_video_generation #video_editing #video_generation
Stars: 360 Issues: 4 Forks: 14
https://github.com/Tencent/HunyuanCustom

Olow304/memvid
Video-based AI memory library. Store millions of text chunks in MP4 files with lightning-fast semantic search. No database needed.
Language: Python
#ai #context #embedded #faiss #knowledge_base #knowledge_graph #llm #machine_learning #memory #nlp #offline_first #opencv #python #rag #retrieval_augmented_generation #semantic_search #vector_database #video_processing
Stars: 252 Issues: 2 Forks: 25
https://github.com/Olow304/memvid

THUDM/GLM-4.1V-Thinking (now zai-org/GLM-V, which also covers GLM-4.5V)
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning.
Language: Python
#image2text #reasoning #video_understanding #vlm
Stars: 449 Issues: 9 Forks: 8
https://github.com/THUDM/GLM-4.1V-Thinking

liuff19/LangScene-X
[ICCV 2025] LangScene-X: Reconstruct Generalizable 3D Language-Embedded Scenes with TriMap Video Diffusion
Language: Python
#3d_reconstruction #diffusion #unified_model #video_generation
Stars: 197 Issues: 1 Forks: 12
https://github.com/liuff19/LangScene-X
