EleutherAI/DALLE-mtf
OpenAI's DALL-E for large-scale training in mesh-tensorflow.
Language: Python
#artificial_intelligence #autoregressive #multimodal #text_to_image #transformers #variational_autoencoder
Stars: 106 Issues: 2 Forks: 11
https://github.com/EleutherAI/DALLE-mtf
  
  lucidrains/CoCa-pytorch
Implementation of CoCa, Contrastive Captioners are Image-Text Foundation Models, in PyTorch
Language: Python
#artificial_intelligence #attention_mechanism #contrastive_learning #deep_learning #image_to_text #multimodal #transformers
Stars: 90 Issues: 0 Forks: 2
https://github.com/lucidrains/CoCa-pytorch
  
  jina-ai/discoart
Create Disco Diffusion artworks in one line
Language: Python
#creative_ai #cross_modal #dalle #diffusion #disco_diffusion #generative_art #multimodal #prompts
Stars: 213 Issues: 2 Forks: 11
https://github.com/jina-ai/discoart
  
  clovaai/donut
Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022
Language: Python
#computer_vision #document_ai #eccv_2022 #multimodal_pre_trained_model #nlp #ocr
Stars: 98 Issues: 2 Forks: 5
https://github.com/clovaai/donut
  
  ilaria-manco/multimodal-ml-music
List of academic resources on Multimodal ML for Music
Language: TeX
#academic_publications #awesome_list #multimodal_data #multimodal_deep_learning #multimodal_learning #music_ai #music_information_retrieval #music_research #resources
Stars: 123 Issues: 1 Forks: 7
https://github.com/ilaria-manco/multimodal-ml-music
  
  SkalskiP/courses
This repository is a curated collection of links to various courses and resources about Artificial Intelligence (AI)
Language: Python
#computer_vision #deep_learning #deep_neural_networks #machine_learning #mlops #multimodal #natural_language_processing #nlp #transformers #tutorial
Stars: 323 Issues: 0 Forks: 29
https://github.com/SkalskiP/courses
  
  haotian-liu/LLaVA
Large Language-and-Vision Assistant built towards multimodal GPT-4 level capabilities.
Language: Python
#chatbot #chatgpt #gpt_4 #llama #llava #multimodal
Stars: 716 Issues: 14 Forks: 34
https://github.com/haotian-liu/LLaVA
  
  [NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond. - haotian-liu/LLaVA
  open-mmlab/Multimodal-GPT
Multimodal-GPT
Language: Python
#flamingo #gpt #gpt_4 #llama #multimodal #transformer #vision_and_language
Stars: 244 Issues: 1 Forks: 12
https://github.com/open-mmlab/Multimodal-GPT
  
  X-PLUG/mPLUG-Owl
mPLUG-Owl🦉: Modularization Empowers Large Language Models with Multimodality
Language: Python
#alpaca #chatbot #chatgpt #computer_vision #damo #gpt #gpt4 #gpt4_api #huggingface #instruction_tuning #large_language_models #llama #mplug #mplug_owl #multimodal #pretraining #pytorch #transformer #visual_reasoning #visual_recognition
Stars: 209 Issues: 1 Forks: 9
https://github.com/X-PLUG/mPLUG-Owl
  
  OpenGVLab/InternChat
InternChat allows you to interact with ChatGPT by clicking, dragging and drawing using a pointing device.
Language: Python
#chatgpt #click #foundation_model #gpt #gpt_4 #gradio #husky #image_captioning #internimage #langchain #llama #llm #multimodal #ocr #sam #segment_anything #vicuna #video #video_generation #vqa
Stars: 231 Issues: 1 Forks: 10
https://github.com/OpenGVLab/InternChat
  
  InternGPT (iGPT) is an open source demo platform where you can easily showcase your AI models. Now it supports DragGAN, ChatGPT, ImageBind, multimodal chat like GPT-4, SAM, interactive image editin...
  kyegomez/tree-of-thoughts
Plug in and Play Implementation of Tree of Thoughts: Deliberate Problem Solving with Large Language Models
Language: Python
#artificial_intelligence #chatgpt #deep_learning #gpt4 #multimodal #prompt #prompt_engineering #prompt_learning #prompt_tuning
Stars: 366 Issues: 7 Forks: 31
https://github.com/kyegomez/tree-of-thoughts
  
  OFA-Sys/ONE-PEACE
A general representation model across vision, audio, and language modalities.
Language: Python
#audio_language #foundation_models #multimodal #representation_learning #vision_language
Stars: 185 Issues: 2 Forks: 5
https://github.com/OFA-Sys/ONE-PEACE
  
  A general representation model across vision, audio, language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities - OFA-Sys/ONE-PEACE
  google/break-a-scene
Official implementation for "Break-A-Scene: Extracting Multiple Concepts from a Single Image" [SIGGRAPH Asia 2023]
Language: Python
#deep_learning #diffusion_models #generative_ai #multimodal #text_to_image
Stars: 164 Issues: 1 Forks: 4
https://github.com/google/break-a-scene
  
  lxe/llavavision
A simple "Be My Eyes" web app with a llama.cpp/llava backend
Language: JavaScript
#ai #artificial_intelligence #computer_vision #llama #llamacpp #llm #local_llm #machine_learning #multimodal #webapp
Stars: 284 Issues: 0 Forks: 7
https://github.com/lxe/llavavision
  
  LLaVA-VL/LLaVA-Plus-Codebase
LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills
Language: Python
#agent #large_language_models #large_multimodal_models #multimodal_large_language_models #tool_use
Stars: 213 Issues: 7 Forks: 13
https://github.com/LLaVA-VL/LLaVA-Plus-Codebase
  
  YangLing0818/RPG-DiffusionMaster
Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (RPG)
Language: Python
#image_editing #large_language_models #multimodal_large_language_models #text_to_image_diffusion
Stars: 272 Issues: 5 Forks: 14
https://github.com/YangLing0818/RPG-DiffusionMaster
  
  [ICML 2024] Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (RPG) - YangLing0818/RPG-DiffusionMaster
  X-PLUG/MobileAgent
Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception
Language: Python
#agent #gpt4v #mllm #mobile_agents #multimodal #multimodal_large_language_models
Stars: 246 Issues: 3 Forks: 21
https://github.com/X-PLUG/MobileAgent
  
  