[GPU] XAttention as a preview feature #32064
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
ceciliapeng2011 wants to merge 101 commits into openvinotoolkit:master from ceciliapeng2011:cecilia/pa_cm_xattention
+5,814 −572
101 commits (changes shown from 93)
e030c80 (riverlijunjie) Init PA CM Impl(1st/2nd token and kvcache update)
435a7ac (riverlijunjie) enabled simple pa unit tests pass
8947906 (riverlijunjie) Fix 2nd_token issue
83dba29 (riverlijunjie) Fixed pipeline output corruption issue
2743aab (riverlijunjie) Fix 2nd non-16 alignment accuracy issue
65b9cc7 (riverlijunjie) Set best partition size for 2nd
c4a1659 (ceciliapeng2011) update KV_BLOCK_SIZE to 256
62a222f (ceciliapeng2011) initiate xattention integration
ac882ab (ceciliapeng2011) qwen2.5-1.5b 4k trunk works with xatten.
0621e4b (ceciliapeng2011) 4k aligned works.
98a4ecd (ceciliapeng2011) fix block_mask not fully initialized issue.
5af3330 (ceciliapeng2011) fix of find_block
4f9ed28 (ceciliapeng2011) xatten: fix accuacy problem caused by debug
d35f4fb (luo-cheng2021) use int32 to store float INV_S to align python version accuracy
4e25a4a (ceciliapeng2011) OV_GPU_XATTN_BLOCK_SIZE and OV_GPU_XATTN_THRESH
c3c87b7 (usstq) fix building error on windows.
76685f0 (ceciliapeng2011) process tail in find_block
c5bdcf9 (riverlijunjie) Fix f16 accuracy issue and optimize 2nd token to improve 5%
95a2da1 (ceciliapeng2011) fix waring_as_error on CI Windows.
36bee72 (ceciliapeng2011) dump block mask with DUMP_XATTN_BLOCK_MASK for debug
4fa97be (riverlijunjie) Support kv cache u8 precision
55ba7c3 (ceciliapeng2011) refactor: split into pa_common and sdpa_common, which include attenti…
a06adef (ceciliapeng2011) integrate xattn_post_proc kernel and FP16 kernel works. TODOto verify…
4b391be (riverlijunjie) update partition size
f2f2126 (ceciliapeng2011) enable int8 kvcache for xatten, but accuracy fails.
89c8577 (ceciliapeng2011) fix xattn kvcache u8 accuracy issue.
024b71a (riverlijunjie) Fix 2nd accuracy issue
033304f (ceciliapeng2011) Fix 2nd accuracy issue
a6e72d0 (ceciliapeng2011) fix xattn tailing issue: Q_blocks < K_blocks, as K_blocks is aligned …
f7ddc68 (rnwang04) decide pa block size based whether use xattntion
29cdabb (rnwang04) fix bloxk size logic
5048081 (rnwang04) fix partition size
0c8c029 (rnwang04) fix condition of xattn stages
6fbf07b (WeldonWangwang) Add xAttention reference operation and test
13b1122 (riverlijunjie) Optimize single_token_finalization kernel with fixed unroll
24d6b80 (peterchen-intel) Fix win build
326fc4d (peterchen-intel) Fix win build
508fab3 (peterchen-intel) Fix win build
73669d3 (ceciliapeng2011) Enable CM PA only in case of XAttention been enabled.
45bedf3 (ceciliapeng2011) pass xattention threshold from genai
b7a9a8b (ceciliapeng2011) xattention_block_size unconfigurable
703dca6 (ceciliapeng2011) Merge branch 'cecilia/pa_cm_xattention_bridge' into cecilia/pa_cm_xat…
f9f58be (ceciliapeng2011) invalidate sparse atten process if threshold is larger than 1.0.
f7fa94f (ceciliapeng2011) Merge branch 'master' into cecilia/pa_cm_xattention
3afbdb5 (ceciliapeng2011) cpplint error fixes
2c37d0d (peterchen-intel) Define ENABLE_PA_CM_PATH for build
cae516a (zhaixuejun1993) Fix worning as error issues on windows with VS2022
010b6e7 (ceciliapeng2011) Merge pull request #56 from zhaixuejun1993/xuejun/fix-warning-as-error
808a789 (riverlijunjie) [WA] clean unused kvcache buffer
22f0459 (zhaixuejun1993) Fix format issues
780f55a (ceciliapeng2011) disable XAttention for legacy platforms (XAttention kernels are imple…
d21c4f6 (riverlijunjie) reset left V cache block rather than 16 rows
6b9b4c2 (riverlijunjie) Remove debug code
eb9765e (ceciliapeng2011) revert code change to ocl_v2
1418daa (ceciliapeng2011) cleanup debug code
21c3193 (riverlijunjie) Limit head_num/kv_head_num not excceed 8
8a7a380 (ceciliapeng2011) streamline block_size head_size in both cases of fp16 and u8/i8 kvcache
472f774 (zhaixuejun1993) Remove CM PA tests
1fdcd3c (ceciliapeng2011) refactor: use paged_attention::block_size_xattn instead of hardcode n…
a62fd1b (WeldonWangwang) worksgit status git status
52aad92 (ceciliapeng2011) Merge pull request #57 from ceciliapeng2011/river/pa_nan_debug
3da8a34 (luweizhou2016) Fix the KV cache padding with Nan issue for 1st token.
2dd7a81 (riverlijunjie) Fix nan issue for 2nd token
bbf17ed (WeldonWangwang) Clean code
147063f (WeldonWangwang) Clean code
c02fb34 (WeldonWangwang) Clean code
314bd71 (WeldonWangwang) Add CMXAttentionBlockSelector
fdbba78 (WeldonWangwang) Clean code
2ade1e1 (WeldonWangwang) Clean code
f402a14 (ceciliapeng2011) refactor: check single suquence condition
342ae59 (riverlijunjie) Avoid 2nd token perf drop due to cleanup unused K cache
8e8b74c (ceciliapeng2011) fix: if kvcache config is dynamic, which may occurs with a typo error…
4a82167 (WeldonWangwang) Clean code
6f7dd8d (WeldonWangwang) Clean code
326ee44 (WeldonWangwang) Add more test cases
cfa1f3a (WeldonWangwang) Clean code
f795152 (WeldonWangwang) Merge pull request #55 from WeldonWangwang/wangwang/add_xattention_tests
35267d3 (WeldonWangwang) Fix build errors and code style (#59)
2dfbb19 (WeldonWangwang) Fix test cases and skip testing on unsupported platforms (#60)
cca1528 (ceciliapeng2011) bypas xattn when thresh>=1.0 and q_len<STRIDE.
618e575 (ceciliapeng2011) throw exception if xattn is not supported by either GPU archieture or…
b2afd6e (WeldonWangwang) Merge branch 'master' into cecilia/pa_cm_xattention
522a503 (ceciliapeng2011) add OV_GPU_DUMP_SRC_TENSORS_AFTER_EXEC
3e527be (ceciliapeng2011) code cleanup, unused code
1e243fc (ceciliapeng2011) throw exception for unsupported cases.
b45062c (ceciliapeng2011) fix dump... intermediates tensor may empty.
50628c5 (ceciliapeng2011) fix
1073002 (WeldonWangwang) Ww/pa cm xattention 1019 (#61)
5eff824 (WeldonWangwang) Ww/pa cm xattention 1020 (#62)
d164bba (WeldonWangwang) Merge branch 'master' into cecilia/pa_cm_xattention
853b562 (ceciliapeng2011) PagedAttentionInternBuffIdx
0870cbb (ceciliapeng2011) refactor xattention kernel impls by reusing RT parameters, instead of…
c2bde5b (ceciliapeng2011) fix clang-format style issues
554ebf4 (WeldonWangwang) merge xattention tests into paged_attention tests (#63)
e794f5b (WeldonWangwang) Fix build error (#64)
5ff7d32 (WeldonWangwang) Ww/cm xattention (#65)
26c4f2f (WeldonWangwang) Remove debug messages (#66)
1ec3dfd (ceciliapeng2011) fix the place to check kvcache precision
a6e4bbb (ceciliapeng2011) useless code cleanup.
bdf2e89 (ceciliapeng2011) fix lint error
8ba831a (ceciliapeng2011) fix throw check
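Several commits above revolve around XAttention's sparse block selection (find_block, block_mask, the OV_GPU_XATTN_THRESH threshold, and the "bypas xattn when thresh>=1.0" dense fallback). As a rough illustration of the idea only, not the actual CM kernel code in this PR, a minimal sketch of threshold-based block selection might look like this; the function name, list-based shapes, and per-row scope are all assumptions:

```python
import math

def select_blocks(scores, thresh):
    """Hypothetical sketch: keep the highest-scoring key blocks for one
    query-block row until their cumulative softmax mass reaches `thresh`.
    thresh >= 1.0 keeps every block, i.e. falls back to dense attention."""
    n = len(scores)
    if thresh >= 1.0:
        return [True] * n  # dense fallback, as in the bypass commit
    # Softmax over the per-block importance scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Visit blocks from most to least probable, accumulating mass.
    order = sorted(range(n), key=lambda i: -probs[i])
    mask = [False] * n
    cum = 0.0
    for i in order:
        mask[i] = True      # keep this block in the sparse attention
        cum += probs[i]
        if cum >= thresh:   # enough attention mass is covered; stop
            break
    return mask

# One query-block row scoring 4 key blocks: most mass sits in the
# first blocks, so a 0.9 threshold prunes only the weakest one.
print(select_blocks([2.0, 0.5, 0.1, -1.0], 0.9))  # [True, True, True, False]
print(select_blocks([2.0, 0.5, 0.1, -1.0], 1.0))  # [True, True, True, True]
```

A real implementation would compute one such mask per head over pooled query/key blocks and hand it to the paged-attention kernels, which is presumably what the block_mask buffer in the commits above carries.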
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing