Skip to content

Conversation

@AsyaPronina
Copy link
Contributor

@AsyaPronina AsyaPronina commented Oct 15, 2025

Details:

  • Added Phi3SlidingMask pass that replaces formula :
1. Range(0, atten_mask_len) <= (Range(past_kv_len, atten_mask_len).T - sliding_window)
2. Range(0, atten_mask_len) > (Range(past_kv_len, atten_mask_len).T - sliding_window)
3. Result = 1 | 2

to

 1. Range(0, atten_mask_len) <= (position_ids.T - sliding_window)
 2. Range(0, atten_mask_len) > (Range(past_kv_len, atten_mask_len).T - sliding_window)
 3. Intermediate_result_1 = 1 | 2
 4. Range(0, atten_mask_len) <= (Range(past_kv_len, atten_mask_len).T - sliding_window)
 5. Range(0, atten_mask_len) >= past_kv_len
 6.  Intermediate_result_2 = 4 & 5
 7. Result = Intermediate_result_1 | Intermediate_result_2
 8. To clean from attention to padding tokens: Result |= !(attention_mask[past_kv_len, atten_mask_len]).T

to create correct final sliding window mask for left-padded present tokens and right-padded past ones

Tickets:

@AsyaPronina AsyaPronina requested review from a team as code owners October 15, 2025 17:40
@github-actions github-actions bot added category: NPU OpenVINO NPU plugin category: NPUW NPUW plugin labels Oct 15, 2025
@AsyaPronina AsyaPronina force-pushed the npuw_fix_phi3_sliding_mask branch from 2a1bacc to 62a8876 Compare October 15, 2025 17:48
@dmatveev dmatveev added this to the 2025.4 milestone Oct 15, 2025
@dmatveev dmatveev self-assigned this Oct 15, 2025
Copy link
Contributor

@dmatveev dmatveev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please try to minimize the diff

public:
OPENVINO_MATCHER_PASS_RTTI("npuw::LLMCompiledModel::Phi3SlidingMask");

explicit Phi3SlidingMask() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems in this particular case explicit is not required.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, thanks!

Comment on lines 480 to 405
// Searching for Phi3 sliding mask pattern to extend it to work with right-padded
// past tokens and left-padded present tokens.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Searching for Phi3 sliding mask pattern to extend it to work with right-padded
// past tokens and left-padded present tokens.
// Search for the Phi3 sliding mask pattern to extend it to work with right-padded
// past tokens and left-padded present tokens.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, thanks!

Comment on lines +495 to +420
// past tokens = 3
// present tokens = 5 (starting with current_pos_id = 3)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

out of curiosity, how can we get 5 present tokens with 3 past tokens?

Copy link
Contributor Author

@AsyaPronina AsyaPronina Oct 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can be done in speculative decoding case, for example
However, 3 and 5 case was described as general simple case for dynamic shapes

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you introduced a new pass, why there are changes to the existing code? Have you also moved stuff around?
Since the changes are quite complex nowadays, I'd avoid the unnecessary diffs to focus only on the relevant ones.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, will do!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, thanks!

@dmatveev dmatveev changed the title [NPUW] Patch Phi-3 Sliding window mask to work with different paddings of past and present tokens [NPUW] Phi-3 2K accuracy fix Oct 17, 2025
@AsyaPronina AsyaPronina force-pushed the npuw_fix_phi3_sliding_mask branch 3 times, most recently from f7e27e9 to 4fed7fb Compare October 20, 2025 09:01

LOG_DEBUG("Try patch Phi-3 sliding window mask, if it exists.");
patch_phi3_sliding_mask(kvcache_model);
ov::save_model(kvcache_model, "swa_patched.xml");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't be here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, thanks!

@AsyaPronina AsyaPronina force-pushed the npuw_fix_phi3_sliding_mask branch 4 times, most recently from 6fadef6 to 8b1a295 Compare October 21, 2025 14:17
@AsyaPronina AsyaPronina force-pushed the npuw_fix_phi3_sliding_mask branch from c4bb342 to b142dc0 Compare October 21, 2025 15:26
@dmatveev dmatveev added this pull request to the merge queue Oct 22, 2025
Merged via the queue into openvinotoolkit:master with commit 31ba12a Oct 22, 2025
182 checks passed
@dmatveev dmatveev deleted the npuw_fix_phi3_sliding_mask branch October 22, 2025 12:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

category: NPU OpenVINO NPU plugin category: NPUW NPUW plugin Code Freeze

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants