NVlabs/prismer
The implementation of "Prismer: A Vision-Language Model with An Ensemble of Experts".
Language: Python
#image_captioning #language_model #multi_modal_learning #multi_task_learning #vision_and_language #vision_language_model #vqa
Stars: 479 Issues: 6 Forks: 21
https://github.com/NVlabs/prismer
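Prismer's core idea (the paper was later retitled "Prismer: A Vision-Language Model with Multi-Task Experts") is to fuse the outputs of frozen, pre-trained task experts (depth, segmentation, OCR, etc.) with the RGB input before the language model, training only lightweight components. Below is a minimal sketch of that fusion step, assuming simple sum-fusion of projected expert maps; the module names and shapes are illustrative, not the repo's actual API.

```python
# Illustrative sketch of an ensemble-of-experts fusion step (assumed
# structure, not Prismer's actual code): frozen experts emit per-pixel
# label maps, which are projected and summed with RGB patch embeddings.
import torch
import torch.nn as nn

class ExpertFusion(nn.Module):
    """Project per-expert label maps to a shared dim and fuse with RGB patches."""
    def __init__(self, num_experts: int, expert_dim: int = 64, embed_dim: int = 768):
        super().__init__()
        # One lightweight convolutional projector per frozen expert output.
        self.projectors = nn.ModuleList(
            nn.Conv2d(expert_dim, embed_dim, kernel_size=16, stride=16)
            for _ in range(num_experts)
        )
        self.rgb_embed = nn.Conv2d(3, embed_dim, kernel_size=16, stride=16)

    def forward(self, rgb, expert_maps):
        # rgb: (B, 3, H, W); expert_maps: list of (B, expert_dim, H, W)
        tokens = self.rgb_embed(rgb)
        for proj, m in zip(self.projectors, expert_maps):
            tokens = tokens + proj(m)          # sum-fuse expert evidence
        B, C, h, w = tokens.shape
        return tokens.flatten(2).transpose(1, 2)  # (B, h*w, C) patch tokens

# Usage: the experts (depth, segmentation, OCR, ...) stay frozen; only the
# projectors (and, in the full model, small adapters) would be trained.
fusion = ExpertFusion(num_experts=2)
rgb = torch.randn(1, 3, 224, 224)
maps = [torch.randn(1, 64, 224, 224) for _ in range(2)]
patch_tokens = fusion(rgb, maps)   # would feed into the language model
print(patch_tokens.shape)          # torch.Size([1, 196, 768])
```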
roboflow/multimodal-maestro
Effective prompting for Large Multimodal Models like GPT-4 Vision or LLaVA. 🔥
Language: Python
#cross_modal #gpt_4 #gpt_4_vision #instance_segmentation #llava #lmm #multimodality #object_detection #prompt_engineering #segment_anything #vision_language_model #visual_prompting
Stars: 367 Issues: 1 Forks: 23
https://github.com/roboflow/multimodal-maestro
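This repo (since renamed to roboflow/maestro and refocused on fine-tuning models like PaliGemma 2, Florence-2, and Qwen2.5-VL) centers on visual prompting techniques such as set-of-mark: segment the image, overlay numbered marks, and let the LMM refer to regions by index. Below is a rough sketch of the technique itself, assuming region boxes from a segmenter such as SAM; the helper names are hypothetical, not the library's API.

```python
# Illustrative set-of-mark visual prompting (technique sketch only;
# draw_marks/to_data_url are hypothetical helpers, not maestro's API).
import base64, io
import numpy as np
from PIL import Image, ImageDraw

def draw_marks(image: Image.Image, boxes: list[tuple[int, int, int, int]]) -> Image.Image:
    """Overlay numbered boxes so the LMM can refer to regions by index."""
    marked = image.copy()
    draw = ImageDraw.Draw(marked)
    for i, (x0, y0, x1, y1) in enumerate(boxes):
        draw.rectangle((x0, y0, x1, y1), outline="red", width=3)
        draw.text((x0 + 4, y0 + 4), str(i), fill="red")
    return marked

def to_data_url(image: Image.Image) -> str:
    """Encode the marked image for an image_url chat message."""
    buf = io.BytesIO()
    image.save(buf, format="PNG")
    return "data:image/png;base64," + base64.b64encode(buf.getvalue()).decode()

# Boxes would normally come from a segmenter like SAM; hard-coded for the sketch.
image = Image.fromarray(np.zeros((256, 256, 3), dtype=np.uint8))
marked = draw_marks(image, [(20, 20, 120, 120), (140, 60, 230, 200)])
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Which marked region contains the dog?"},
        {"type": "image_url", "image_url": {"url": to_data_url(marked)}},
    ],
}  # send via an OpenAI-style chat completions API to a GPT-4V-class model
```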
ictnlp/LLaVA-Mini
LLaVA-Mini is a unified large multimodal model (LMM) that supports efficient understanding of images, high-resolution images, and videos.
Language: Python
#efficient #gpt4o #gpt4v #large_language_models #large_multimodal_models #llama #llava #multimodal #multimodal_large_language_models #video #vision #vision_language_model #visual_instruction_tuning
Stars: 173 Issues: 7 Forks: 11
https://github.com/ictnlp/LLaVA-Mini
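LLaVA-Mini's efficiency comes from compressing the hundreds of patch tokens a CLIP-style vision encoder emits down to very few (as little as one) before they reach the LLM. Below is a sketch of query-based token compression in that spirit; the dimensions and module structure are assumptions for illustration, not the project's actual code.

```python
# Sketch of query-based vision-token compression in the spirit of LLaVA-Mini,
# which reduces hundreds of patch tokens to as few as one before the LLM.
# Dimensions and module structure are assumptions, not the repo's code.
import torch
import torch.nn as nn

class TokenCompressor(nn.Module):
    def __init__(self, dim: int = 1024, num_queries: int = 1, num_heads: int = 8):
        super().__init__()
        # Learned queries attend over all patch tokens and summarize them.
        self.queries = nn.Parameter(torch.randn(num_queries, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, patch_tokens: torch.Tensor) -> torch.Tensor:
        # patch_tokens: (B, N, dim), e.g. N=576 from a CLIP ViT-L/14 at 336px
        q = self.queries.unsqueeze(0).expand(patch_tokens.size(0), -1, -1)
        compressed, _ = self.attn(q, patch_tokens, patch_tokens)
        return compressed  # (B, num_queries, dim), then projected into the LLM

compressor = TokenCompressor(num_queries=1)
vision_tokens = torch.randn(2, 576, 1024)
print(compressor(vision_tokens).shape)  # torch.Size([2, 1, 1024])
```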