@wuhuikx wuhuikx commented Nov 8, 2025

This PR has been merged into vllm main and rocm/vllm dev/perf: vllm-project#27224

Then we can support block_size > 1.

@zejunchen-zejun zejunchen-zejun left a comment

LGTM

wuhuikx commented Nov 8, 2025

vLLM dev/perf branch Version: 0.11.1rc2.dev0+ge9fce7brocm701
AITER dev/perf branch Version: 0.1.5.post5.dev5+gd0a40f55c

1. MI308 DS blockscale. launch_deepseekr1.sh with VLLM_ROCM_USE_AITER_FUSION_SHARED_EXPERTS=0

   | Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
   |---|---|---|---|---|---|---|
   | gsm8k | 3 | flexible-extract | 5 | exact_match | 0.9204 | ± 0.0075 |
   | | | strict-match | 5 | exact_match | 0.9189 | ± 0.0075 |

2. MI308 DS blockscale. launch_deepseekr1.sh with VLLM_ROCM_USE_AITER_FUSION_SHARED_EXPERTS=1

   | Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
   |---|---|---|---|---|---|---|
   | gsm8k | 3 | flexible-extract | 5 | exact_match | 0.9212 | ± 0.0074 |
   | | | strict-match | 5 | exact_match | 0.9212 | ± 0.0074 |

3. MI308 DS PTPC. launch_deepseekr1_ptpc_fp8.sh with VLLM_ROCM_USE_AITER_FUSION_SHARED_EXPERTS=0

   | Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
   |---|---|---|---|---|---|---|
   | gsm8k | 3 | flexible-extract | 5 | exact_match | 0.0091 | ± 0.0026 |
   | | | strict-match | 5 | exact_match | 0.0000 | ± 0.0000 |

4. MI308 DS PTPC. launch_deepseekr1_ptpc_fp8.sh with VLLM_ROCM_USE_AITER_FUSION_SHARED_EXPERTS=1

   | Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
   |---|---|---|---|---|---|---|
   | gsm8k | 3 | flexible-extract | 5 | exact_match | 0.9333 | ± 0.0069 |
   | | | strict-match | 5 | exact_match | 0.9356 | ± 0.0068 |

wuhuikx commented Nov 8, 2025

@tjtanaavllm @kliuae-amd It's strange that blockscale shows an accuracy regression. It should be 0.95, right?
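
As a rough check of how far the blockscale flexible-extract scores sit from 0.95, the gap can be expressed in units of the reported standard error. This is only a sketch: it assumes 0.95 is an exact reference with no uncertainty of its own, and the helper name `z_score` is made up for illustration.

```python
def z_score(observed: float, stderr: float, expected: float) -> float:
    """Signed distance of `observed` from `expected`, in standard errors."""
    return (observed - expected) / stderr


# Reference value from the question above; assumed exact (an assumption).
EXPECTED = 0.95

# gsm8k flexible-extract exact_match (value, stderr) from the blockscale runs.
blockscale_runs = {
    "VLLM_ROCM_USE_AITER_FUSION_SHARED_EXPERTS=0": (0.9204, 0.0075),
    "VLLM_ROCM_USE_AITER_FUSION_SHARED_EXPERTS=1": (0.9212, 0.0074),
}

for name, (value, stderr) in blockscale_runs.items():
    z = z_score(value, stderr, EXPECTED)
    print(f"{name}: {z:+.1f} standard errors from {EXPECTED}")
```

Both runs come out close to four standard errors below 0.95, so if that reference is right, the gap is unlikely to be sampling noise alone.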
