🐈⬛
Focusing
"Simple systems work and
complex don’t."
-
Microsoft Research
- Beijing
- https://baotonglu.github.io/
Pinned Loading
-
microsoft/RetrievalAttention
microsoft/RetrievalAttention Public[VLDB 26, NeurIPS 25] Scalable long-context LLM decoding that leverages sparsity—by treating the KV cache as a vector storage system.
Something went wrong, please refresh the page to try again.
If the problem persists, check the GitHub status page or contact support.
If the problem persists, check the GitHub status page or contact support.



