#jupyter_notebook #chinese_llm #chinese_nlp #finetune #generative_ai #instruct_gpt #instruction_set #llama #llm #lora #open_models #open_source #open_source_models #qlora
AirLLM is a tool that runs very large language models on machines with limited GPU memory by loading and executing the model one layer at a time, rather than relying on traditional compression methods such as quantization, distillation, or pruning. It can run a 70-billion-parameter model on a single 4GB GPU, or even a 405-billion-parameter model on 8GB, without degrading model quality. This makes powerful models usable on affordable hardware, and optional compression features can additionally speed up inference by up to 3x while maintaining accuracy.
https://github.com/lyogavin/airllm
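The core idea is that a transformer's layers run sequentially, so only one layer's weights need to be resident in GPU memory at any moment: load a layer, apply it, free it, repeat. The sketch below is a toy NumPy illustration of that loading pattern under assumed shapes (a simple ReLU MLP, a `load_layer` callback standing in for reading per-layer weight files from disk); it is not AirLLM's actual implementation.

```python
import numpy as np

def layer_by_layer_forward(x, num_layers, load_layer):
    """Forward pass that holds only one layer's weights in memory at a
    time: fetch a layer, apply it, then discard it before fetching the
    next (a toy version of AirLLM's layer-by-layer loading idea)."""
    for i in range(num_layers):
        w = load_layer(i)          # stand-in for reading one layer's weights from disk
        x = np.maximum(x @ w, 0)   # apply the layer (ReLU MLP here)
        del w                      # free this layer before loading the next
    return x

# Toy stand-in for per-layer weight files kept out of "GPU memory".
rng = np.random.default_rng(0)
weights = [rng.standard_normal((8, 8)) for _ in range(4)]
out = layer_by_layer_forward(np.ones((1, 8)), len(weights), lambda i: weights[i])
print(out.shape)
```

Peak memory here scales with the largest single layer rather than the whole model, which is why a 70B-parameter model can fit in 4GB of GPU memory at the cost of extra disk I/O per layer.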