carryyu (Collaborator) commented Sep 1, 2025

Set kv_cache_quant_type to block_wise_fp8 in the config to enable block-wise FP8 quantization of the KV cache.

For example:
"quantization_config": {
    "quantization": "mix_quant",
    "kv_cache_quant_type": "block_wise_fp8",
    "dense_quant_type": "block_wise_fp8",
    "moe_quant_type": "block_wise_fp8"
}
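
For anyone who wants to apply the setting above from a script, here is a minimal sketch. It assumes the fragment lives under a top-level "quantization_config" key of a JSON config file; the file location and the helper name enable_block_wise_fp8 are hypothetical and not part of this PR. Only the four key/value pairs are taken from the description.

```python
import json
from pathlib import Path

# Values copied from the PR description above.
KV_CACHE_FP8_QUANT_CONFIG = {
    "quantization": "mix_quant",
    "kv_cache_quant_type": "block_wise_fp8",
    "dense_quant_type": "block_wise_fp8",
    "moe_quant_type": "block_wise_fp8",
}


def enable_block_wise_fp8(config_path):
    """Hypothetical helper: set "quantization_config" in a JSON config file.

    Assumes the file holds a single JSON object. Any existing
    "quantization_config" entry is replaced; other keys are left untouched.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    config["quantization_config"] = dict(KV_CACHE_FP8_QUANT_CONFIG)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Calling enable_block_wise_fp8("config.json") rewrites the file in place and returns the updated dictionary, so it is worth keeping a backup of the original config.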

paddle-bot bot commented Sep 1, 2025

Thanks for your contribution!

qingqing01 (Collaborator) left a comment

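  1. Please add unit tests.
  2. Dynamic quantization could also be made configurable through the --quantization command-line argument, which would make it more convenient to use.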

carryyu (Collaborator, Author) commented Sep 4, 2025

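> 1. Please add unit tests.
> 2. Dynamic quantization could also be made configurable through the --quantization command-line argument, which would make it more convenient to use.

  1. Unit tests have been added.
  2. Supporting KV cache quantization alone through that argument appears to require changes along a fairly long code path, so I suggest leaving it out of this PR for now.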

@yuanlehome yuanlehome merged commit af49b81 into PaddlePaddle:develop Sep 8, 2025
44 of 50 checks passed
EmmonsCurse added a commit that referenced this pull request Sep 8, 2025