lucidrains/native-sparse-attention-pytorch
Implementation of the sparse attention pattern proposed by the Deepseek team in their "Native Sparse Attention" paper
Language: Python
#artificial_intelligence #attention #deep_learning #sparse_attention
Stars: 341 Issues: 3 Forks: 9
https://github.com/lucidrains/native-sparse-attention-pytorch
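For context, the sketch below shows the high-level structure the paper (and this repo) is built around: each query mixes a coarse compressed-block attention branch with a local sliding-window branch through learned per-head gates. This is a toy, hand-written PyTorch illustration, not the repo's API; the class and parameter names are invented, the fine-grained block-selection branch is omitted, block compression is replaced by mean pooling, and none of the repo's Triton kernels or GQA handling appear here.

```python
# Toy sketch of the NSA structure: gated mix of a compressed-block branch and a
# sliding-window branch. Illustrative only -- not the repo's API.
import torch
import torch.nn.functional as F
from torch import nn

class ToyNSABlock(nn.Module):
    def __init__(self, dim, heads=8, block_size=16, window=32):
        super().__init__()
        assert dim % heads == 0
        self.heads, self.dim_head = heads, dim // heads
        self.block_size, self.window = block_size, window
        self.to_qkv = nn.Linear(dim, dim * 3, bias=False)
        self.to_gates = nn.Linear(dim, heads * 2)        # per-head gate for each branch
        self.to_out = nn.Linear(dim, dim, bias=False)

    def _split_heads(self, t):
        b, n, _ = t.shape
        return t.view(b, n, self.heads, self.dim_head).transpose(1, 2)   # (b, h, n, d)

    def forward(self, x):
        b, n, _ = x.shape
        q, k, v = map(self._split_heads, self.to_qkv(x).chunk(3, dim=-1))

        # branch 1: compressed blocks. Mean-pool keys/values per block as a stand-in
        # for the paper's learned compression; each query only sees block summaries
        # that lie entirely in its past, so causality is preserved.
        pad = (-n) % self.block_size
        k_blk = F.pad(k, (0, 0, 0, pad)).view(b, self.heads, -1, self.block_size, self.dim_head).mean(dim=3)
        v_blk = F.pad(v, (0, 0, 0, pad)).view(b, self.heads, -1, self.block_size, self.dim_head).mean(dim=3)
        blk = torch.arange(k_blk.shape[2], device=x.device)
        pos = torch.arange(n, device=x.device)
        blk_visible = (blk[None, :] + 1) * self.block_size <= pos[:, None] + 1   # (n, num_blocks)
        scores = q @ k_blk.transpose(-2, -1) / self.dim_head ** 0.5
        scores = scores.masked_fill(~blk_visible, float("-inf"))
        attn = torch.nan_to_num(scores.softmax(dim=-1))   # early queries see no full block -> zero output
        cmp_out = attn @ v_blk

        # branch 2: causal sliding-window attention over raw tokens
        win_visible = (pos[None, :] <= pos[:, None]) & (pos[:, None] - pos[None, :] < self.window)
        win_out = F.scaled_dot_product_attention(q, k, v, attn_mask=win_visible[None, None])

        # gated combination of the two branches, one sigmoid gate per head per branch
        gates = self.to_gates(x).view(b, n, self.heads, 2).transpose(1, 2).sigmoid()   # (b, h, n, 2)
        out = gates[..., 0:1] * cmp_out + gates[..., 1:2] * win_out
        return self.to_out(out.transpose(1, 2).reshape(b, n, -1))

# usage: shapes only, untrained weights
block = ToyNSABlock(dim=256, heads=8, block_size=16, window=32)
out = block(torch.randn(2, 100, 256))   # -> (2, 100, 256)
```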
therealoliver/Deepdive-llama3-from-scratch
Build Llama 3 inference step by step: grasp the core concepts, follow the derivation of each step, and implement the code yourself.
Language: Jupyter Notebook
#attention #attention_mechanism #gpt #inference #kv_cache #language_model #llama #llm_configuration #llms #mask #multi_head_attention #positional_encoding #residuals #rms #rms_norm #rope #rotary_position_encoding #swiglu #tokenizer #transformer
Stars: 388 Issues: 0 Forks: 28
https://github.com/therealoliver/Deepdive-llama3-from-scratch
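Two of the components the notebook derives, RMSNorm and rotary position encoding, are small enough to sketch directly. The snippet below is an independent plain-PyTorch illustration of those two ideas, not code from the notebook; the names, the epsilon value, and the half-split RoPE pairing are assumptions and may differ from the notebook's own derivation.

```python
# Illustrative sketches of RMSNorm and RoPE; not taken from the notebook.
import torch
from torch import nn

class RMSNorm(nn.Module):
    """Llama-style RMSNorm: scale each token by the reciprocal of its root-mean-square,
    then apply a learned per-channel gain (no mean subtraction, no bias)."""
    def __init__(self, dim, eps=1e-5):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        rms_inv = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms_inv * self.weight

def apply_rope(x, base=10000.0):
    """Rotary positional encoding, half-split pairing: channel i is rotated together
    with channel i + d/2 by an angle that grows with the token position."""
    b, n, h, d = x.shape                       # (batch, seq, heads, head_dim)
    half = d // 2
    inv_freq = base ** (-torch.arange(half, dtype=torch.float32, device=x.device) / half)
    angles = torch.arange(n, dtype=torch.float32, device=x.device)[:, None] * inv_freq[None, :]
    cos = angles.cos()[None, :, None, :]       # broadcast over batch and heads
    sin = angles.sin()[None, :, None, :]
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1)

# usage: shapes only
h = RMSNorm(128)(torch.randn(2, 10, 128))      # -> (2, 10, 128)
q = apply_rope(torch.randn(2, 10, 4, 64))      # -> (2, 10, 4, 64)
```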