thu-ml/SageAttention
Quantized Attention that achieves speedups of 2.1x and 2.7x compared to FlashAttention2 and xformers, respectively, without lossing end-to-end metrics across various models.
Language: Python
#attention #inference_acceleration #llm #quantization
Stars: 145 Issues: 6 Forks: 3
https://github.com/thu-ml/SageAttention
Quantized Attention that achieves speedups of 2.1x and 2.7x compared to FlashAttention2 and xformers, respectively, without lossing end-to-end metrics across various models.
Language: Python
#attention #inference_acceleration #llm #quantization
Stars: 145 Issues: 6 Forks: 3
https://github.com/thu-ml/SageAttention
GitHub
GitHub - thu-ml/SageAttention: Quantized Attention achieves speedup of 2-5x and 3-11x compared to FlashAttention and xformers,…
Quantized Attention achieves speedup of 2-5x and 3-11x compared to FlashAttention and xformers, without lossing end-to-end metrics across language, image, and video models. - thu-ml/SageAttention
👍3
mit-han-lab/nunchaku
SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
Language: Cuda
#diffusion_models #flux #genai #lora #mlsys #quantization
Stars: 249 Issues: 10 Forks: 13
https://github.com/mit-han-lab/nunchaku
SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
Language: Cuda
#diffusion_models #flux #genai #lora #mlsys #quantization
Stars: 249 Issues: 10 Forks: 13
https://github.com/mit-han-lab/nunchaku
GitHub
GitHub - mit-han-lab/nunchaku: [ICLR2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
[ICLR2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models - mit-han-lab/nunchaku