LLaVA-VL/LLaVA-Plus-Codebase
LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills
Language: Python
#agent #large_language_models #large_multimodal_models #multimodal_large_language_models #tool_use
Stars: 213 Issues: 7 Forks: 13
https://github.com/LLaVA-VL/LLaVA-Plus-Codebase
YangLing0818/RPG-DiffusionMaster
[ICML 2024] Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (RPG)
Language: Python
#image_editing #large_language_models #multimodal_large_language_models #text_to_image_diffusion
Stars: 272 Issues: 5 Forks: 14
https://github.com/YangLing0818/RPG-DiffusionMaster
X-PLUG/MobileAgent
Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception
Language: Python
#agent #gpt4v #mllm #mobile_agents #multimodal #multimodal_large_language_models
Stars: 246 Issues: 3 Forks: 21
https://github.com/X-PLUG/MobileAgent
BradyFU/Video-MME
✨✨[CVPR 2025] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
Language: Python
#large_language_models #large_vision_language_models #mme #multimodal_large_language_models #video #video_mme
Stars: 182 Issues: 1 Forks: 6
https://github.com/BradyFU/Video-MME
ictnlp/LLaMA-Omni
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
Language: Python
#large_language_models #multimodal_large_language_models #speech_interaction #speech_language_model #speech_to_speech #speech_to_text
Stars: 274 Issues: 1 Forks: 16
https://github.com/ictnlp/LLaMA-Omni
ictnlp/LLaVA-Mini
LLaVA-Mini is a unified large multimodal model (LMM) that supports efficient understanding of images, high-resolution images, and videos.
Language: Python
#efficient #gpt4o #gpt4v #large_language_models #large_multimodal_models #llama #llava #multimodal #multimodal_large_language_models #video #vision #vision_language_model #visual_instruction_tuning
Stars: 173 Issues: 7 Forks: 11
https://github.com/ictnlp/LLaVA-Mini
ByteDance-Seed/Seed1.5-VL
Seed1.5-VL is a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.
Language: Jupyter Notebook
#cookbook #large_language_model #multimodal_large_language_models #vision_language_model
Stars: 404 Issues: 0 Forks: 3
https://github.com/ByteDance-Seed/Seed1.5-VL
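Since the Seed1.5-VL repository is tagged as a cookbook of notebooks, a minimal sketch of how a vision-language model like this is typically queried is shown below, assuming it is served behind an OpenAI-compatible chat-completions endpoint. The base URL, model name, environment variables, and image URL are placeholders for illustration, not taken from the repository.

```python
# Hypothetical sketch: sending an image + text prompt to a vision-language
# model (e.g. Seed1.5-VL) via an OpenAI-compatible chat-completions API.
# The base_url, model id, env vars, and image URL below are placeholders.
import os
from openai import OpenAI

client = OpenAI(
    base_url=os.environ.get("VLM_BASE_URL", "https://example.com/v1"),  # placeholder endpoint
    api_key=os.environ["VLM_API_KEY"],  # placeholder credential variable
)

response = client.chat.completions.create(
    model="seed-1.5-vl",  # placeholder model identifier
    messages=[
        {
            "role": "user",
            "content": [
                # Image part: a URL the serving endpoint can fetch
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/sample.jpg"}},
                # Text part: the actual question about the image
                {"type": "text",
                 "text": "Describe this image and list the objects you can see."},
            ],
        }
    ],
    temperature=0.2,
)

print(response.choices[0].message.content)
```

The same request shape works for any of the multimodal models listed above that expose an OpenAI-compatible server; only the endpoint and model identifier would change.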