CircleRadon/Osprey
The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"
Language: Python
#mllm #pixel_understanding #sam #visual_instruction_tuning
Stars: 200 Issues: 1 Forks: 6
https://github.com/CircleRadon/Osprey
  
  [CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning" - CircleRadon/Osprey
  X-PLUG/MobileAgent
Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception
Language: Python
#agent #gpt4v #mllm #mobile_agents #multimodal #multimodal_large_language_models
Stars: 246 Issues: 3 Forks: 21
https://github.com/X-PLUG/MobileAgent
  
  Mobile-Agent: The Powerful GUI Agent Family - X-PLUG/MobileAgent
  magic-quill/MagicQuill
Official Implementations for Paper - MagicQuill: An Intelligent Interactive Image Editing System
Language: Python
#aigc #image_editing #mllm
Stars: 531 Issues: 7 Forks: 32
https://github.com/magic-quill/MagicQuill
  
  [CVPR'25] Official Implementations for Paper - MagicQuill: An Intelligent Interactive Image Editing System - ant-research/MagicQuill
  SkyworkAI/Skywork-R1V
Pioneering Multimodal Reasoning with CoT
Language: Python
#deepseek_r1 #llm #mllm
Stars: 387 Issues: 5 Forks: 19
https://github.com/SkyworkAI/Skywork-R1V
  
  Skywork-R1V is an advanced multimodal AI model series developed by Skywork AI (Kunlun Inc.), specializing in vision-language reasoning. - SkyworkAI/Skywork-R1V
  manycore-research/SpatialLM
SpatialLM: Large Language Model for Spatial Understanding
Language: Python
#mllm #point_clouds #scene_understanding #spatial_intelligence
Stars: 643 Issues: 2 Forks: 33
https://github.com/manycore-research/SpatialLM
  
  [NeurIPS 2025] SpatialLM: Training Large Language Models for Structured Indoor Modeling - manycore-research/SpatialLM
  