[GPU] XAttention as a preview feature #32064
          
Closed: ceciliapeng2011 wants to merge 105 commits into openvinotoolkit:master from ceciliapeng2011:cecilia/pa_cm_xattention
  
      
      
   
      
    
  
                    Changes from 100 commits
      Commits
    
    
      
      e030c80
              
                Init PA CM Impl(1st/2nd token and kvcache update)
              
              
                riverlijunjie 435a7ac
              
                enabled simple pa unit tests pass
              
              
                riverlijunjie 8947906
              
                Fix 2nd_token issue
              
              
                riverlijunjie 83dba29
              
                Fixed pipeline output corruption issue
              
              
                riverlijunjie 2743aab
              
                Fix 2nd non-16 alignment accuracy issue
              
              
                riverlijunjie 65b9cc7
              
                Set best partition size for 2nd
              
              
                riverlijunjie c4a1659
              
                update KV_BLOCK_SIZE to 256
              
              
                ceciliapeng2011 62a222f
              
                initiate xattention integration
              
              
                ceciliapeng2011 ac882ab
              
                qwen2.5-1.5b 4k trunk works with xatten.
              
              
                ceciliapeng2011 0621e4b
              
                4k aligned works.
              
              
                ceciliapeng2011 98a4ecd
              
                fix block_mask not fully initialized issue.
              
              
                ceciliapeng2011 5af3330
              
                fix of find_block
              
              
                ceciliapeng2011 4f9ed28
              
                xatten: fix accuacy problem caused by debug
              
              
                ceciliapeng2011 d35f4fb
              
                use int32 to store float INV_S to align python version accuracy
              
              
                luo-cheng2021 4e25a4a
              
                OV_GPU_XATTN_BLOCK_SIZE and OV_GPU_XATTN_THRESH
              
              
                ceciliapeng2011 c3c87b7
              
                fix building error on windows.
              
              
                usstq 76685f0
              
                process tail in find_block
              
              
                ceciliapeng2011 c5bdcf9
              
                Fix f16 accuracy issue and optimize 2nd token to improve 5%
              
              
                riverlijunjie 95a2da1
              
                fix waring_as_error on CI Windows.
              
              
                ceciliapeng2011 36bee72
              
                dump block mask with DUMP_XATTN_BLOCK_MASK for debug
              
              
                ceciliapeng2011 4fa97be
              
                Support kv cache u8 precision
              
              
                riverlijunjie 55ba7c3
              
                refactor: split into pa_common and sdpa_common, which include attenti…
              
              
                ceciliapeng2011 a06adef
              
                integrate xattn_post_proc kernel and FP16 kernel works. TODOto verify…
              
              
                ceciliapeng2011 4b391be
              
                update partition size
              
              
                riverlijunjie f2f2126
              
                enable int8 kvcache for xatten, but accuracy fails.
              
              
                ceciliapeng2011 89c8577
              
                fix xattn kvcache u8 accuracy issue.
              
              
                ceciliapeng2011 024b71a
              
                Fix 2nd accuracy issue
              
              
                riverlijunjie 033304f
              
                Fix 2nd accuracy issue
              
              
                ceciliapeng2011 a6e72d0
              
                fix xattn tailing issue: Q_blocks < K_blocks, as K_blocks is aligned …
              
              
                ceciliapeng2011 f7ddc68
              
                decide pa block size based whether use xattntion
              
              
                rnwang04 29cdabb
              
                fix bloxk size logic
              
              
                rnwang04 5048081
              
                fix partition size
              
              
                rnwang04 0c8c029
              
                fix condition of xattn stages
              
              
                rnwang04 6fbf07b
              
                Add xAttention reference operation and test
              
              
                WeldonWangwang 13b1122
              
                Optimize single_token_finalization kernel with fixed unroll
              
              
                riverlijunjie 24d6b80
              
                Fix win build
              
              
                peterchen-intel 326fc4d
              
                Fix win build
              
              
                peterchen-intel 508fab3
              
                Fix win build
              
              
                peterchen-intel 73669d3
              
                Enable CM PA only in case of XAttention been enabled.
              
              
                ceciliapeng2011 45bedf3
              
                pass xattention threshold from genai
              
              
                ceciliapeng2011 b7a9a8b
              
                xattention_block_size unconfigurable
              
              
                ceciliapeng2011 703dca6
              
                Merge branch 'cecilia/pa_cm_xattention_bridge' into cecilia/pa_cm_xat…
              
              
                ceciliapeng2011 f9f58be
              
                invalidate sparse atten process if threshold is larger than 1.0.
              
              
                ceciliapeng2011 f7fa94f
              
                Merge branch 'master' into cecilia/pa_cm_xattention
              
              
                ceciliapeng2011 3afbdb5
              
                cpplint error fixes
              
              
                ceciliapeng2011 2c37d0d
              
                Define ENABLE_PA_CM_PATH for build
              
              
                peterchen-intel cae516a
              
                Fix worning as error issues on windows with VS2022
              
              
                zhaixuejun1993 010b6e7
              
                Merge pull request #56 from zhaixuejun1993/xuejun/fix-warning-as-error
              
              
                ceciliapeng2011 808a789
              
                [WA] clean unused kvcache buffer
              
              
                riverlijunjie 22f0459
              
                Fix format issues
              
              
                zhaixuejun1993 780f55a
              
                disable XAttention for legacy platforms (XAttention kernels are imple…
              
              
                ceciliapeng2011 d21c4f6
              
                reset left V cache block rather than 16 rows
              
              
                riverlijunjie 6b9b4c2
              
                Remove debug code
              
              
                riverlijunjie eb9765e
              
                revert code change to ocl_v2
              
              
                ceciliapeng2011 1418daa
              
                cleanup debug code
              
              
                ceciliapeng2011 21c3193
              
                Limit head_num/kv_head_num not excceed 8
              
              
                riverlijunjie 8a7a380
              
                streamline block_size head_size in both cases of fp16 and u8/i8 kvcache
              
              
                ceciliapeng2011 472f774
              
                Remove CM PA tests
              
              
                zhaixuejun1993 1fdcd3c
              
                refactor: use paged_attention::block_size_xattn instead of hardcode n…
              
              
                ceciliapeng2011 a62fd1b
              
                worksgit status git status
              
              
                WeldonWangwang 52aad92
              
                Merge pull request #57 from ceciliapeng2011/river/pa_nan_debug
              
              
                ceciliapeng2011 3da8a34
              
                Fix the KV cache padding with Nan issue for 1st token.
              
              
                luweizhou2016 2dd7a81
              
                Fix nan issue for 2nd token
              
              
                riverlijunjie bbf17ed
              
                Clean code
              
              
                WeldonWangwang 147063f
              
                Clean code
              
              
                WeldonWangwang c02fb34
              
                Clean code
              
              
                WeldonWangwang 314bd71
              
                Add CMXAttentionBlockSelector
              
              
                WeldonWangwang fdbba78
              
                Clean code
              
              
                WeldonWangwang 2ade1e1
              
                Clean code
              
              
                WeldonWangwang f402a14
              
                refactor: check single suquence condition
              
              
                ceciliapeng2011 342ae59
              
                Avoid 2nd token perf drop due to cleanup unused K cache
              
              
                riverlijunjie 8e8b74c
              
                fix: if kvcache config is dynamic, which may occurs with a typo error…
              
              
                ceciliapeng2011 4a82167
              
                Clean code
              
              
                WeldonWangwang 6f7dd8d
              
                Clean code
              
              
                WeldonWangwang 326ee44
              
                Add more test cases
              
              
                WeldonWangwang cfa1f3a
              
                Clean code
              
              
                WeldonWangwang f795152
              
                Merge pull request #55 from WeldonWangwang/wangwang/add_xattention_tests
              
              
                WeldonWangwang 35267d3
              
                Fix build errors and code style (#59)
              
              
                WeldonWangwang 2dfbb19
              
                Fix test cases and skip testing on unsupported platforms (#60)
              
              
                WeldonWangwang cca1528
              
                bypas xattn when thresh>=1.0 and q_len<STRIDE.
              
              
                ceciliapeng2011 618e575
              
                throw exception if xattn is not supported by either GPU archieture or…
              
              
                ceciliapeng2011 b2afd6e
              
                Merge branch 'master' into cecilia/pa_cm_xattention
              
              
                WeldonWangwang 522a503
              
                add OV_GPU_DUMP_SRC_TENSORS_AFTER_EXEC
              
              
                ceciliapeng2011 3e527be
              
                code cleanup, unused code
              
              
                ceciliapeng2011 1e243fc
              
                throw exception for unsupported cases.
              
              
                ceciliapeng2011 b45062c
              
                fix dump... intermediates tensor may empty.
              
              
                ceciliapeng2011 50628c5
              
                fix
              
              
                ceciliapeng2011 1073002
              
                Ww/pa cm xattention 1019 (#61)
              
              
                WeldonWangwang 5eff824
              
                Ww/pa cm xattention 1020 (#62)
              
              
                WeldonWangwang d164bba
              
                Merge branch 'master' into cecilia/pa_cm_xattention
              
              
                WeldonWangwang 853b562
              
                PagedAttentionInternBuffIdx
              
              
                ceciliapeng2011 0870cbb
              
                refactor xattention kernel impls by reusing RT parameters, instead of…
              
              
                ceciliapeng2011 c2bde5b
              
                fix clang-format style issues
              
              
                ceciliapeng2011 554ebf4
              
                merge xattention tests into paged_attention tests (#63)
              
              
                WeldonWangwang e794f5b
              
                Fix build error (#64)
              
              
                WeldonWangwang 5ff7d32
              
                Ww/cm xattention (#65)
              
              
                WeldonWangwang 26c4f2f
              
                Remove debug messages (#66)
              
              
                WeldonWangwang 1ec3dfd
              
                fix the place to check kvcache precision
              
              
                ceciliapeng2011 a6e4bbb
              
                useless code cleanup.
              
              
                ceciliapeng2011 bdf2e89
              
                fix lint error
              
              
                ceciliapeng2011 8ba831a
              
                fix throw check
              
              
                ceciliapeng2011 5201cdf
              
                add allow_bypass_xattn
              
              
                 7825960
              
                fix rt_params q_block_pad_merged not assigned issue
              
              
                ceciliapeng2011 d870554
              
                Merge branch 'master' into cecilia/pa_cm_xattention
              
              
                ceciliapeng2011 14e57f9
              
                Merge branch 'master' into cecilia/pa_cm_xattention
              
              
                peterchen-intel File filter