#python #glm #image2text #ocr
GLM-OCR is a compact 0.9B-parameter model for accurate OCR on complex documents (tables, code, formulas, seals, and receipts), scoring 94.62 on OmniDocBench V1.5. Install it via `pip install glmocr`, then either call the cloud API (no GPU needed) or self-host with vLLM/SGLang for fast, low-cost inference, and get JSON/Markdown output via the CLI or Python. The payoff is quick, robust document parsing that saves time, cuts compute costs, and integrates easily into your apps.
https://github.com/zai-org/GLM-OCR
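A minimal getting-started sketch based on the summary above; only the `pip install glmocr` step is taken from the source, and the parse invocation is a hypothetical placeholder — check the repo README for the actual command and flags.

```shell
# Install the package (command from the summary above)
pip install glmocr

# Hypothetical CLI invocation (not verified against the repo) --
# the summary says the tool emits JSON/Markdown via CLI or Python.
glmocr receipt.png
```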
#python #apple_silicon #florence2 #idefics #llava #llm #local_ai #mlx #molmo #paligemma #pixtral #vision_framework #vision_language_model #vision_transformer
MLX-VLM lets you run, chat with, and fine-tune Vision Language Models (VLMs), plus audio/video models, on your Mac using MLX — install with `pip install -U mlx-vlm`. Use the CLI for quick text/image/audio generation (e.g., `mlx_vlm.generate --model ... --image photo.jpg`), a Gradio UI for chats, Python scripts, or a FastAPI server with OpenAI-compatible endpoints that supports multiple images and videos per request. Features like TurboQuant cut KV-cache memory by 76%, and LoRA/QLoRA fine-tuning works on consumer hardware. The result is fast, memory-efficient multimodal AI you can experiment with locally, with no cloud costs — well suited to Mac users tweaking models affordably.
https://github.com/Blaizzy/mlx-vlm
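The install and generate commands above can be sketched end to end. The install line and the `mlx_vlm.generate --image` pattern come from the summary; the specific model name is an assumption — substitute any MLX-converted VLM from the Hugging Face `mlx-community` organization.

```shell
# Install or upgrade (command from the summary above)
pip install -U mlx-vlm

# Caption a local image from the CLI; the model name below is an
# example/assumption, not the only option.
python -m mlx_vlm.generate \
  --model mlx-community/Qwen2-VL-2B-Instruct-4bit \
  --prompt "Describe this image." \
  --image photo.jpg
```

This downloads the quantized model on first run, so expect the initial invocation to take longer than subsequent ones.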