Code Stars

YangLing0818/RPG-DiffusionMaster
Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (RPG)
Language: Python
Total stars: 315
Stars trend:

23 Jan 2024
11pm ▊ +6
24 Jan 2024
12am ▋ +5
 1am █▎ +10
 2am █ +8
 3am ██▋ +21
 4am █▏ +9
 5am ██▎ +18
 6am ██▋ +21
 7am █▊ +14
 8am █▉ +15
 9am █▋ +13
10am █▋ +13

#python
#imageediting, #largelanguagemodels, #multimodallargelanguagemodels, #texttoimagediffusion

223 views, 11:21

BradyFU/Video-MME
✨✨Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
Language: Python
Total stars: 76
Stars trend:

2 Jun 2024
 5pm ▏ +2
 6pm  +0
 7pm  +1
 8pm  +0
 9pm  +0
10pm  +0
11pm  +0
3 Jun 2024
12am  +1
 1am  +1
 2am █▎ +16
 3am ██▋ +33
 4am ██████████████████▊ +231

#python
#largelanguagemodels, #largevisionlanguagemodels, #mme, #multimodallargelanguagemodels, #video, #videomme

119 views, 05:17

cambrian-mllm/cambrian

Language: Python
Total stars: 85
Stars trend:

25 Jun 2024
 2am █ +8
 3am ██▉ +23
 4am █▌ +12
 5am █▏ +9
 6am ▍ +3
 7am ▉ +7
 8am ▉ +7
 9am ▉ +7
10am ▍ +3

#python
#chatbot, #clip, #computervision, #dino, #instructiontuning, #largelanguagemodels, #llms, #mllm, #multimodallargelanguagemodels, #representationlearning

108 views, 11:18

ictnlp/LLaMA-Omni
Low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct.
Language: Python
Total stars: 109
Stars trend:

11 Sep 2024
 4am ▍ +3
 5am  +0
 6am █ +8
 7am █▎ +10
 8am █▎ +10
 9am ▌ +4
10am ▊ +6
11am ▍ +3
12pm █▏ +9
 1pm ▋ +5
 2pm █▏ +9
 3pm █▏ +9

#python
#largelanguagemodels, #multimodallargelanguagemodels, #speechinteraction, #speechlanguagemodel, #speechtospeech, #speechtotext

155 views, 16:19

VITA-MLLM/VITA
✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
Language: Python
Total stars: 1191
Stars trend:

6 Jan 2025
 2am ▉ +7
 3am ▉ +7
 4am ▋ +5
 5am ▉ +7
 6am █▌ +12
 7am █▉ +15
 8am █▋ +13
 9am █▉ +15

#python
#largemultimodalmodels, #multimodallargelanguagemodels

123 views, 10:18

joanrod/star-vector
StarVector is a foundation model for SVG generation that transforms vectorization into a code generation task. Using a vision-language modeling architecture, StarVector processes both visual and textual inputs to produce high-quality SVG code with remarkable precision.
Language: Python
Total stars: 2854
Stars trend:

30 Mar 2025
 2pm ▎ +2
 3pm  +0
 4pm ▏ +1
 5pm ▎ +2
 6pm ▎ +2
 7pm ▏ +1
 8pm ▎ +2
 9pm ▌ +4
10pm █▍ +11
11pm ██▋ +21
31 Mar 2025
12am ██▌ +20
 1am ██▋ +21

#python
#llm, #multimodallargelanguagemodels, #svg, #vlm

85 views, 02:18

ByteDance-Seed/Seed1.5-VL
Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.
Language: Jupyter Notebook
Total stars: 150
Stars trend:

12 May 2025
 7am ██▋ +21
 8am ██▍ +19
 9am ██▌ +20
10am █▎ +10
11am █▉ +15
12pm ███ +24

#jupyternotebook
#cookbook, #largelanguagemodel, #multimodallargelanguagemodels, #visionlanguagemodel

113 views, 13:18
