
Commit 17dd566

[README] Update model list
1 parent be882f3 commit 17dd566


README.md

Lines changed: 3 additions & 2 deletions
@@ -32,6 +32,7 @@ This repo aims at providing a collection of efficient Triton-based implementatio
 
 - **$\texttt{[2025-07]}$:** 🐳 Add MLA implementation to `fla` ([paper](https://arxiv.org/abs/2405.04434)).
 - **$\texttt{[2025-07]}$:** 🛣️ Added PaTH Attention to fla ([paper](https://arxiv.org/abs/2505.16381)).
+- **$\texttt{[2025-06]}$:** 🎉 Added MesaNet to fla ([paper](https://arxiv.org/abs/2506.05233)).
 - **$\texttt{[2025-06]}$:** 🐍 Add Comba implementation to `fla` ([paper](https://arxiv.org/abs/2506.02475)).
 - **$\texttt{[2025-05]}$:** 🎉 Add Rodimus* implementation to `fla` ([paper](https://arxiv.org/abs/2410.06577)).
 - **$\texttt{[2025-04]}$:** 🎉 Add DeltaProduct implementation to `fla` ([paper](https://arxiv.org/abs/2502.10297)).
@@ -74,12 +75,12 @@ Roughly sorted according to the timeline supported in `fla`. The recommended tra
 | 2025 | ICLR | Gated DeltaNet | [Gated Delta Networks: Improving Mamba2 with Delta Rule](https://arxiv.org/abs/2412.06464) | [official](https://github.com/NVlabs/GatedDeltaNet) | [fla](https://github.com/fla-org/flash-linear-attention/tree/main/fla/ops/gated_delta_rule) |
 | 2025 | | RWKV7 | [RWKV-7 "Goose" with Expressive Dynamic State Evolution](https://arxiv.org/abs/2503.14456) | [official](https://github.com/BlinkDL/RWKV-LM/tree/main/RWKV-v7) | [fla](https://github.com/fla-org/flash-linear-attention/tree/main/fla/ops/rwkv7) |
 | 2025 | | NSA | [Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention](https://arxiv.org/abs/2502.11089) | | [fla](https://github.com/fla-org/flash-linear-attention/tree/main/fla/ops/nsa) |
-| 2025 | | FoX | [Forgetting Transformer: Softmax Attention with a Forget Gate](https://arxiv.org/abs/2503.02130) | [official](https://github.com/zhixuan-lin/forgetting-transformer) | [fla](https://github.com/fla-org/flash-linear-attention/tree/main/fla/ops/forgetting_attn) |
+| 2025 | ICLR | FoX | [Forgetting Transformer: Softmax Attention with a Forget Gate](https://arxiv.org/abs/2503.02130) | [official](https://github.com/zhixuan-lin/forgetting-transformer) | [fla](https://github.com/fla-org/flash-linear-attention/tree/main/fla/ops/forgetting_attn) |
 | 2025 | | DeltaProduct | [DeltaProduct: Improving State-Tracking in Linear RNNs via Householder Products](https://arxiv.org/abs/2502.10297) | | [fla](https://github.com/fla-org/flash-linear-attention/tree/main/fla/layers/gated_deltaproduct.py) |
 | 2025 | ICLR | Rodimus* | [Rodimus*: Breaking the Accuracy-Efficiency Trade-Off with Efficient Attentions](https://arxiv.org/abs/2410.06577) | [official](https://github.com/codefuse-ai/rodimus) | [fla](https://github.com/fla-org/flash-linear-attention/blob/main/fla/layers/rodimus.py) |
 | 2025 | | MesaNet | [MesaNet: Sequence Modeling by Locally Optimal Test-Time Training](https://arxiv.org/abs/2506.05233) | | [fla](https://github.com/fla-org/flash-linear-attention/blob/main/fla/layers/mesa_net.py) |
 | 2025 | | Comba | [Comba: Improving Bilinear RNNs with Closed-loop Control](https://arxiv.org/abs/2506.02475) | [official](https://github.com/AwesomeSeq/Comba-triton) | [fla](https://github.com/fla-org/flash-linear-attention/blob/main/fla/layers/comba.py) |
-| 2025 | | PaTH | [PaTH Attention: Position Encoding via Accumulating Householder Transformations](https://arxiv.org/abs/2505.16381) | | [fla](https://github.com/fla-org/flash-linear-attention/blob/main/fla/layers/path_attn.py) |
+| 2025 | | PaTH | [PaTH Attention: Position Encoding via Accumulating Householder Transformations](https://arxiv.org/abs/2505.16381) | | [fla](https://github.com/fla-org/flash-linear-attention/blob/main/fla/layers/path_attn.py) |
 
 ## Installation
 
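The `fla` links in the table above point to where each model is implemented in the library. As a rough illustration of how one of the newly listed layers might be used, here is a minimal sketch in Python. It assumes `fla` is installed (`pip install flash-linear-attention`), a CUDA GPU with Triton is available, and that `fla/layers/mesa_net.py` exports a `MesaNet` class following the library's usual `hidden_size`/`num_heads` constructor pattern; the class name, constructor arguments, and return convention are assumptions, not details confirmed by this commit, so check the linked file for the exact API.

```python
# Illustrative sketch only (not part of the diff). Assumes `fla` is installed,
# a CUDA GPU with Triton is available, and that fla/layers/mesa_net.py exports
# a `MesaNet` layer taking `hidden_size`/`num_heads` keyword arguments; the
# class name and constructor are assumptions, so verify against the linked file.
import torch
from fla.layers.mesa_net import MesaNet  # assumed class name

layer = MesaNet(hidden_size=512, num_heads=4).cuda().to(torch.bfloat16)
x = torch.randn(2, 1024, 512, device="cuda", dtype=torch.bfloat16)

out = layer(x)
# fla layers commonly return a tuple (output, ..., cache); keep only the output.
if isinstance(out, tuple):
    out = out[0]
print(out.shape)  # expected to match the input shape: (2, 1024, 512)
```

The same pattern should carry over to the other layer-level entries in the table (e.g. `fla/layers/comba.py`, `fla/layers/path_attn.py`), with the exact class names and arguments taken from each file.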