GitHub Trends

#cplusplus #artificial_intelligence #computer_vision #document #document_analysis #document_intelligence #document_recognition #document_understanding #documentai #end_to_end_ocr #multimodal #multimodal_deep_learning #ocr #scene_text_detection #scene_text_detection_recognition #scene_text_recognition #text_detection #text_recognition #vision_language #vision_language_model #vision_language_transformer

https://github.com/AlibabaResearch/AdvancedLiterateMachinery

GitHub

GitHub - AlibabaResearch/AdvancedLiterateMachinery: A collection of original, innovative ideas and algorithms towards Advanced…

A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group. ...

👍1

1.6K views11:56

GitHub Trends

#python #foundation_models #vision_language_model #vision_language_pretraining

DeepSeek-VL is a powerful, open-source Vision-Language (VL) Model that helps you understand and interact with both images and text. It can process various types of data like logical diagrams, web pages, scientific literature, and natural images. You can use it for different applications, such as describing images, recognizing formulas, and more. The model is available in different sizes and variants, making it flexible for various needs. You can download and use the models freely, even for commercial purposes, under the specified licenses. This tool makes it easier to integrate vision and language understanding into your projects.

https://github.com/deepseek-ai/DeepSeek-VL

GitHub

GitHub - deepseek-ai/DeepSeek-VL: DeepSeek-VL: Towards Real-World Vision-Language Understanding

DeepSeek-VL: Towards Real-World Vision-Language Understanding - deepseek-ai/DeepSeek-VL

👍1

396 views13:30

GitHub Trends

#python #agent #context_engineering #electron #embedding_models #memory #proactive_ai #python #python3 #rag #react #vector_database #vision_language_model

MineContext is a special AI tool that helps you work more efficiently. It collects information from your computer screen and other sources, then uses this data to give you useful insights, summaries, and reminders. This helps you stay organized and focused on important tasks. MineContext is also very private because it stores all your data on your local device, not in the cloud. It's like having a personal assistant that helps you manage your digital life better.

https://github.com/volcengine/MineContext

GitHub

GitHub - volcengine/MineContext: MineContext is your proactive context-aware AI partner（Context-Engineering+ChatGPT Pulse）

MineContext is your proactive context-aware AI partner（Context-Engineering+ChatGPT Pulse） - volcengine/MineContext

482 views13:30

GitHub Trends

#python #apple_silicon #florence2 #idefics #llava #llm #local_ai #mlx #molmo #paligemma #pixtral #vision_framework #vision_language_model #vision_transformer

MLX-VLM lets you run, chat with, and fine-tune Vision Language Models (VLMs) plus audio/video models on your Mac using MLX—install easily with `pip install -U mlx-vlm`. Use CLI for quick text/image/audio generation (e.g., `mlx_vlm.generate --model ... --image photo.jpg`), Gradio UI for chats, Python scripts, or a FastAPI server with OpenAI-compatible endpoints supporting multi-images/videos. Features like TurboQuant cut KV cache memory by 76%, and LoRA/QLoRA fine-tuning works on consumer hardware. You benefit by experimenting with powerful multimodal AI locally—fast, memory-efficient, no cloud costs, perfect for Mac users tweaking models affordably.

https://github.com/Blaizzy/mlx-vlm

GitHub

GitHub - Blaizzy/mlx-vlm: MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using…

MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX. - Blaizzy/mlx-vlm

737 views11:30

About

Blog

Apps

Platform