[GPU] XAttention as a preview feature #32064
base: master
623f524 to
50a8290
Compare
1. kvcache update's k/v offset issue 2. 2nd token lse data overflow issue
* Tests support num_kv_heads * Update test cases * Fix code style * Fix code style
* Fix code style * Clean code
… recomputing them.
src/plugins/intel_gpu/tests/unit/test_cases/xattention_gpu_test.cpp
      if (past_len != 0) {
    -     int blocks_num = ceil_div(past_len, block_size);
    +     int blocks_num = ceil_div(past_len + 1, block_size);
          int start_block_idx = block_indices[block_indices_begins[i]];
@WeldonWangwang May I know why we need +1 here?
The +1 takes into account the block where the current token is located.
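To illustrate the reply above, here is a minimal sketch of the block count with and without the +1. It assumes that the current token occupies position `past_len` in the KV cache, which is a reading of the comment rather than something the PR states outright. `blocks_for_decode` is a hypothetical helper name, and `ceil_div` is written out here as ordinary integer ceiling division.

```cpp
#include <cassert>

// Integer ceiling division for non-negative a and positive b.
constexpr int ceil_div(int a, int b) {
    return (a + b - 1) / b;
}

// Hypothetical helper: number of KV-cache blocks needed to cover
// past_len cached tokens plus the current token (assumed to sit at
// index past_len).
inline int blocks_for_decode(int past_len, int block_size) {
    assert(past_len >= 0 && block_size > 0);
    return ceil_div(past_len + 1, block_size);
}
```

With `block_size = 16` and `past_len = 16`, `ceil_div(past_len, block_size)` gives 1, yet the current token falls into the second block, so `blocks_for_decode(16, 16)` gives 2. When `past_len` is not a multiple of `block_size`, both expressions agree.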
* throw exception if k_head_size != v_head_size and has_xattn * Add more test cases
ie_tests_win_gpu_vs2022_release: this seems to be a real crash. Please check.
    bool use_xattention = false;
    const auto& parameters = func->get_parameters();
    for (const auto& param : parameters) {
        if (param->get_friendly_name() == "xattention_block_size") {
Can't the user turn this off by config? Currently XAttention is not fully supported (e.g., it only supports num_seqs = 1 and by_token compression, it fails when CM is unavailable, etc.).
XAttention is disabled by default. Its activation is controlled via the GenAI scheduler configuration using the --cb_config parameter.
When enabled through the scheduler config, GenAI creates model Parameter nodes with names starting with "xattention_". Otherwise, these nodes are created as empty Constant nodes. For more details, refer to the OpenVINO implementation.
To learn how to enable or disable XAttention, please see the command line reference.
Here, GPU plugin follows the same logic to determine whether use_xattention is enabled by the user.
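The detection rule described here can be sketched without any OpenVINO dependency. The function below is a hypothetical stand-in: it takes the friendly names of a model's Parameter nodes as plain strings and reports whether `xattention_block_size` is among them, mirroring the comparison in the snippet under review. How the names are collected from the model is left out.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: returns true when one of the model's Parameter
// nodes carries the friendly name "xattention_block_size". Per the
// explanation above, GenAI creates such Parameter nodes only when
// XAttention is enabled in the scheduler config.
inline bool has_xattention_param(const std::vector<std::string>& param_names) {
    return std::any_of(param_names.begin(), param_names.end(),
                       [](const std::string& name) {
                           return name == "xattention_block_size";
                       });
}
```

For example, `has_xattention_param({"input_ids", "xattention_block_size"})` is true, while a model whose XAttention inputs were created as Constant nodes contributes no such Parameter name and yields false.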
The GPU plugin performs additional checks to ensure XAttention is supported. It throws an exception if any of the following conditions are met:
- num_seqs > 1
- channel-level kvcache compression is used
- CM is unavailable
- the GPU is not in the Xe2~Xe3 range
- Other unsupported configurations
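The checks listed above could look roughly like the following. This is only a sketch: `XAttnConfig`, `GpuArch`, `KvCompression`, and `validate_xattention` are hypothetical names invented for illustration, not the GPU plugin's actual types, and the "other unsupported configurations" item is not modeled.

```cpp
#include <stdexcept>

// Hypothetical enums and config struct, for illustration only.
enum class GpuArch { Xe1, Xe2, Xe3, Other };
enum class KvCompression { None, ByToken, ByChannel };

struct XAttnConfig {
    int num_seqs;                  // number of sequences in the batch
    KvCompression kv_compression;  // KV-cache compression mode
    bool cm_available;             // whether CM kernels can be used
    GpuArch arch;                  // detected GPU architecture
};

// Throws std::runtime_error when XAttention was requested on a
// configuration that the list above marks as unsupported.
inline void validate_xattention(const XAttnConfig& cfg) {
    if (cfg.num_seqs > 1)
        throw std::runtime_error("XAttention: num_seqs > 1 is not supported");
    if (cfg.kv_compression == KvCompression::ByChannel)
        throw std::runtime_error("XAttention: channel-level KV-cache compression is not supported");
    if (!cfg.cm_available)
        throw std::runtime_error("XAttention: CM is unavailable");
    if (cfg.arch != GpuArch::Xe2 && cfg.arch != GpuArch::Xe3)
        throw std::runtime_error("XAttention: only Xe2~Xe3 GPUs are supported");
}
```

Failing early with an exception, rather than silently falling back, matches the behavior described in the reply: the user explicitly opted in through the scheduler config, so an unsupported setup is reported instead of ignored.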
        kv_stop = (wg_id + 1) * wg_seq_len + past_q_lens;
        if (kv_stop > kv_seq_len) kv_stop = kv_seq_len;
    }
    // printf("###########wg:%d.%d q: %d, +%d kv: %d, +%d, kvstop:%d\n", wg_id, wg_local_id, q_start_sg, q_len_sg, kv_start, kv_seq_len, kv_stop);
(random point)
Please clean up the CM code so that it does not contain unused, commented-out lines.
all done
...common/transformations/src/transformations/common_optimizations/convert_pagedattn_inputs.cpp
OK for the common optimization part.
Details:
This PR should work along with openvinotoolkit/openvino.genai#2764.
Tickets: