@l-bat
Contributor

@l-bat l-bat commented Nov 13, 2025

Description

CVS-173857

Copilot AI review requested due to automatic review settings November 13, 2025 16:19
@github-actions github-actions bot added the category: GH Pages Docs Github Pages documentation label Nov 13, 2025
Contributor

Copilot AI left a comment

Pull Request Overview

This PR adds comprehensive documentation for the XAttention sparse attention algorithm, expanding the previously placeholder "TBA" section with detailed implementation descriptions and visual illustrations.

Key changes:

  • Detailed explanation of XAttention's two-stage importance estimation procedure
  • Addition of visual diagram illustrating the algorithm's operation
  • Configuration parameter references for customizing XAttention behavior

@l-bat
Contributor Author

l-bat commented Nov 14, 2025

@peterchen-intel @ceciliapeng2011 @WeldonWangwang could you please help with reviewing the XAttention documentation?


Prompt processing proceeds as usual until at least two KV cache blocks have been completely filled (`t = 0, 1`). Once the block-level importance scores have been computed (`t = 2-4`), only the highest-scoring KV blocks, whose cumulative attention mass exceeds the `xattention_threshold`, are retained for attention computation, which introduces sparsity into the prefill.
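The block selection rule can be illustrated with a minimal Python sketch. It assumes the block-level importance scores are already available as a list of non-negative numbers, and that blocks are taken in descending score order until their normalized cumulative mass reaches the threshold. The function name `select_kv_blocks` is hypothetical and is not part of OpenVINO or OpenVINO GenAI.

```python
def select_kv_blocks(block_scores, threshold):
    """Hypothetical helper: return the indices of the highest-scoring KV blocks
    whose normalized cumulative score first reaches `threshold`."""
    total = sum(block_scores)
    if total <= 0:
        # No usable importance signal: keep every block (dense attention).
        return list(range(len(block_scores)))
    # Visit blocks from most to least important.
    order = sorted(range(len(block_scores)), key=lambda i: block_scores[i], reverse=True)
    selected, mass = [], 0.0
    for i in order:
        selected.append(i)
        mass += block_scores[i] / total  # cumulative attention mass so far
        if mass >= threshold:
            break
    # Return the retained blocks in cache order.
    return sorted(selected)


print(select_kv_blocks([5, 1, 3, 1], 0.75))  # [0, 2]
```

A higher threshold retains more blocks (less sparsity, closer to dense attention); a threshold of 1.0 keeps all blocks.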

Upon reaching the tail of the prompt, the KV cache corresponding to the entire prompt becomes visible again, reverting to dense attention mode (`t = 5`). This transition ensures that the model attends to the complete prompt context before entering the generation stage. Similar to the tri-shape algorithm, the final dense portion of the prefill can be configured using the `SparseAttentionConfig.num_last_dense_tokens_in_prefill` field. Due to the block-wise cache organization and scheduler chunking, the actual number of prompt tokens processed with dense attention may slightly exceed the specified value, potentially extending across a full block or subsequence chunk depending on the hardware configuration.
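The interplay between the dense head, the sparse middle and the dense tail of the prefill can be sketched per scheduler chunk. This is an illustration under stated assumptions, not the OpenVINO GenAI scheduler logic: `dense_tail_start` and `chunk_attention_mode` are hypothetical helpers, the dense tail is assumed to start at a KV cache block boundary, and any chunk that overlaps the tail is assumed to run fully dense.

```python
def dense_tail_start(prompt_len, block_size, num_last_dense_tokens):
    """Hypothetical: first prompt position of the dense tail, aligned down to a
    KV cache block boundary so the tail covers at least num_last_dense_tokens."""
    raw_start = max(0, prompt_len - num_last_dense_tokens)
    return (raw_start // block_size) * block_size


def chunk_attention_mode(chunk_start, chunk_end, prompt_len, block_size,
                         num_last_dense_tokens):
    """Hypothetical per-chunk decision for the prefill chunk [chunk_start, chunk_end)."""
    filled_blocks = chunk_start // block_size  # blocks completely filled before this chunk
    if filled_blocks < 2:
        return "dense"  # not enough filled blocks to estimate block importance yet
    if chunk_end > dense_tail_start(prompt_len, block_size, num_last_dense_tokens):
        return "dense"  # chunk overlaps the dense tail, so the whole chunk runs dense
    return "sparse"


prompt_len, block_size, n_dense = 1000, 256, 100
for start in range(0, prompt_len, block_size):
    end = min(start + block_size, prompt_len)
    print(start, end, chunk_attention_mode(start, end, prompt_len, block_size, n_dense))
```

With a 1000-token prompt, 256-token blocks and chunks, and `num_last_dense_tokens_in_prefill=100`, the tail starts at position 768, so 232 tokens are processed densely instead of 100.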
Collaborator

Shall we add a link here to documentation on how to switch on XAttention in OpenVINO?

Contributor Author

I haven’t found any related OpenVINO documentation. As far as I understand, XAttention is enabled through OV GenAI and does not need to be switched on separately in the Runtime. Example of the scheduler config:

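from openvino_genai import SchedulerConfig, SparseAttentionConfig, SparseAttentionMode

# Sparse attention settings for the prefill stage
sparse_config = SparseAttentionConfig()
sparse_config.mode = SparseAttentionMode.XATTENTION
sparse_config.xattention_threshold = 0.9  # cumulative attention-mass threshold for block selection

# Continuous-batching scheduler config with sparse attention enabled
cb_config = SchedulerConfig()
cb_config.use_sparse_attention = True
cb_config.sparse_attention_config = sparse_config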

Collaborator

That's great. If there is OV GenAI documentation on it, let's refer to it. Otherwise, just mention that this mode value exists and that it should be specified via SchedulerConfig.

Copilot AI review requested due to automatic review settings November 14, 2025 09:48
Contributor

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 1 out of 3 changed files in this pull request and generated no new comments.

@MaximProshin
Collaborator

@l-bat, please also update the list of optimization methods in the README.

Copilot AI review requested due to automatic review settings November 14, 2025 14:27
Contributor

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 2 out of 4 changed files in this pull request and generated no new comments.

Copilot AI review requested due to automatic review settings November 14, 2025 14:32
Contributor

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 2 out of 4 changed files in this pull request and generated no new comments.

3 participants