ictnlp/LLaVA-Mini
LLaVA-Mini is a unified large multimodal model (LMM) that efficiently supports the understanding of images, high-resolution images, and videos.
Language: Python
#efficient #gpt4o #gpt4v #large_language_models #large_multimodal_models #llama #llava #multimodal #multimodal_large_language_models #video #vision #vision_language_model #visual_instruction_tuning
Stars: 173 Issues: 7 Forks: 11
https://github.com/ictnlp/LLaVA-Mini
  
ByteDance-Seed/Seed1.5-VL
Seed1.5-VL is a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning; it achieves state-of-the-art performance on 38 out of 60 public benchmarks.
Language: Jupyter Notebook
#cookbook #large_language_model #multimodal_large_language_models #vision_language_model
Stars: 404 Issues: 0 Forks: 3
https://github.com/ByteDance-Seed/Seed1.5-VL
  
Tencent-Hunyuan/Hunyuan3D-Omni
Hunyuan3D-Omni: A Unified Framework for Controllable Generation of 3D Assets
Language: Python
#3d #3d_aigc #3d_generation #hunyuan3d #image_to_3d #multimodal #shape
Stars: 181 Issues: 0 Forks: 10
https://github.com/Tencent-Hunyuan/Hunyuan3D-Omni
  