hpcaitech/SwiftInfer
Efficient AI Inference & Serving
Language: Python
#artificial_intelligence #deep_learning #gpt #inference #llama #llama2 #llm_inference #llm_serving
Stars: 299 Issues: 3 Forks: 14
https://github.com/hpcaitech/SwiftInfer
  
  Efficient AI Inference & Serving
Language: Python
#artificial_intelligence #deep_learning #gpt #inference #llama #llama2 #llm_inference #llm_serving
Stars: 299 Issues: 3 Forks: 14
https://github.com/hpcaitech/SwiftInfer
GitHub
  
  GitHub - hpcaitech/SwiftInfer: Efficient AI Inference & Serving
  Efficient AI Inference & Serving. Contribute to hpcaitech/SwiftInfer development by creating an account on GitHub.
🔥1
  zhihu/ZhiLight
A highly optimized inference acceleration engine for Llama and its variants.
Language: C++
#cpm #cuda #gpt #inference_engine #llama #llm #llm_serving #minicpm #pytorch #qwen
Stars: 192 Issues: 1 Forks: 16
https://github.com/zhihu/ZhiLight
  
  A highly optimized inference acceleration engine for Llama and its variants.
Language: C++
#cpm #cuda #gpt #inference_engine #llama #llm #llm_serving #minicpm #pytorch #qwen
Stars: 192 Issues: 1 Forks: 16
https://github.com/zhihu/ZhiLight
GitHub
  
  GitHub - zhihu/ZhiLight: A highly optimized LLM inference acceleration engine for Llama and its variants.
  A highly optimized LLM inference acceleration engine for Llama and its variants. - zhihu/ZhiLight
👍1
  MoonshotAI/MoBA
MoBA: Mixture of Block Attention for Long-Context LLMs
Language: Python
#flash_attention #llm #llm_serving #llm_training #moe #pytorch #transformer
Stars: 521 Issues: 2 Forks: 16
https://github.com/MoonshotAI/MoBA
  
  MoBA: Mixture of Block Attention for Long-Context LLMs
Language: Python
#flash_attention #llm #llm_serving #llm_training #moe #pytorch #transformer
Stars: 521 Issues: 2 Forks: 16
https://github.com/MoonshotAI/MoBA
GitHub
  
  GitHub - MoonshotAI/MoBA: MoBA: Mixture of Block Attention for Long-Context LLMs
  MoBA: Mixture of Block Attention for Long-Context LLMs - MoonshotAI/MoBA
  