bytedance/UI-TARS-desktop
A GUI Agent application based on UI-TARS (Vision-Language Model) that allows you to control your computer using natural language.
Language: TypeScript
#agent #browser_use #computer_use #electron #gui_agents #vision #vite #vlm
Stars: 505 Issues: 8 Forks: 35
https://github.com/bytedance/UI-TARS-desktop
THUDM/GLM-4.1V-Thinking
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning.
Language: Python
#image2text #reasoning #video_understanding #vlm
Stars: 449 Issues: 9 Forks: 8
https://github.com/THUDM/GLM-4.1V-Thinking