davidbau/rewriting
Rewriting a Deep Generative Model, ECCV 2020 (oral). Interactive tool to directly edit the rules of a GAN to synthesize scenes with objects added, removed, or altered. Change StyleGANv2 to make extravagant eyebrows, or horses wearing hats.
Language: Python
#deep_learning #gans #graphics #hci #machine_learning #research #vision
Stars: 107 Issues: 0 Forks: 10
https://github.com/davidbau/rewriting
  
lucidrains/bottleneck-transformer-pytorch
Implementation of Bottleneck Transformer in PyTorch
Language: Python
#artificial_intelligence #attention_mechanism #deep_learning #image_classification #transformers #vision
Stars: 122 Issues: 1 Forks: 7
https://github.com/lucidrains/bottleneck-transformer-pytorch
  
zihangJiang/TokenLabeling
PyTorch implementation of "Training a 85.4% Top-1 Accuracy Vision Transformer with 56M Parameters on ImageNet"
Language: Python
#imagenet #transformer #vision
Stars: 110 Issues: 1 Forks: 6
https://github.com/zihangJiang/TokenLabeling
Updated GitHub description: PyTorch implementation of "All Tokens Matter: Token Labeling for Training Better Vision Transformers"

lucidrains/mlp-mixer-pytorch
An All-MLP solution for Vision, from Google AI
Language: Python
#deep_learning #vision
Stars: 159 Issues: 1 Forks: 8
https://github.com/lucidrains/mlp-mixer-pytorch
  
rishikksh20/MLP-Mixer-pytorch
Unofficial implementation of MLP-Mixer: An all-MLP Architecture for Vision
Language: Python
#computer_vision #transformer #vision #image_classification #mlp_vision
Stars: 101 Issues: 0 Forks: 9
https://github.com/rishikksh20/MLP-Mixer-pytorch
  
hustvl/YOLOS
You Only Look at One Sequence (https://arxiv.org/abs/2106.00666)
Language: Python
#computer_vision #transformer #object_detection #vision_transformer
Stars: 128 Issues: 0 Forks: 4
https://github.com/hustvl/YOLOS
Updated GitHub description: [NeurIPS 2021] You Only Look at One Sequence

czczup/ViT-Adapter
Vision Transformer Adapter for Dense Predictions
#adapter #object_detection #semantic_segmentation #vision_transformer
Stars: 89 Issues: 1 Forks: 3
https://github.com/czczup/ViT-Adapter
Updated GitHub description: [ICLR 2023 Spotlight] Vision Transformer Adapter for Dense Predictions

OFA-Sys/Chinese-CLIP
A Chinese version of CLIP for Chinese cross-modal retrieval and representation generation.
Language: Python
#chinese #computer_vision #multi_modal_learning #nlp #pytorch #vision_and_language_pre_training
Stars: 80 Issues: 0 Forks: 7
https://github.com/OFA-Sys/Chinese-CLIP
  
NVlabs/prismer
The implementation of "Prismer: A Vision-Language Model with An Ensemble of Experts".
Language: Python
#image_captioning #language_model #multi_modal_learning #multi_task_learning #vision_and_language #vision_language_model #vqa
Stars: 479 Issues: 6 Forks: 21
https://github.com/NVlabs/prismer
Updated GitHub description: The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".

open-mmlab/Multimodal-GPT
Multimodal-GPT
Language: Python
#flamingo #gpt #gpt_4 #llama #multimodal #transformer #vision_and_language
Stars: 244 Issues: 1 Forks: 12
https://github.com/open-mmlab/Multimodal-GPT
  
OFA-Sys/ONE-PEACE
A general representation model across vision, audio, and language modalities.
Language: Python
#audio_language #foundation_models #multimodal #representation_learning #vision_language
Stars: 185 Issues: 2 Forks: 5
https://github.com/OFA-Sys/ONE-PEACE
Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities

roboflow/multimodal-maestro
Effective prompting for Large Multimodal Models like GPT-4 Vision or LLaVA. 🔥
Language: Python
#cross_modal #gpt_4 #gpt_4_vision #instance_segmentation #llava #lmm #multimodality #object_detection #prompt_engineering #segment_anything #vision_language_model #visual_prompting
Stars: 367 Issues: 1 Forks: 23
https://github.com/roboflow/multimodal-maestro
Updated GitHub description (repository now at roboflow/maestro): Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL.

aishwaryanr/awesome-generative-ai-guide
A one-stop repository for generative AI research updates, interview resources, notebooks, and much more!
#awesome #awesome_list #generative_ai #interview_questions #large_language_models #llms #notebook_jupyter #vision_and_language
Stars: 332 Issues: 0 Forks: 57
https://github.com/aishwaryanr/awesome-generative-ai-guide
  
mbzuai-oryx/LLaVA-pp
🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3)
Language: Python
#conversation #llama_3_llava #llama_3_vision #llama3 #llama3_llava #llama3_vision #llava #llava_llama3 #llava_phi3 #llm #lmms #phi_3_llava #phi_3_vision #phi3 #phi3_llava #phi3_vision #vision_language
Stars: 297 Issues: 2 Forks: 13
https://github.com/mbzuai-oryx/LLaVA-pp
  
ictnlp/LLaVA-Mini
LLaVA-Mini is a unified large multimodal model (LMM) that can support the understanding of images, high-resolution images, and videos in an efficient manner.
Language: Python
#efficient #gpt4o #gpt4v #large_language_models #large_multimodal_models #llama #llava #multimodal #multimodal_large_language_models #video #vision #vision_language_model #visual_instruction_tuning
Stars: 173 Issues: 7 Forks: 11
https://github.com/ictnlp/LLaVA-Mini
  
bytedance/UI-TARS-desktop
A GUI agent application based on UI-TARS (a vision-language model) that lets you control your computer using natural language.
Language: TypeScript
#agent #browser_use #computer_use #electron #gui_agents #vision #vite #vlm
Stars: 505 Issues: 8 Forks: 35
https://github.com/bytedance/UI-TARS-desktop
Updated GitHub description: The Open-Source Multimodal AI Agent Stack: Connecting Cutting-Edge AI Models and Agent Infra