[ROCm] Enable Triton ScaledMM fallback + kernel selection fix #26668
base: main
Conversation
This pull request has merge conflicts that must be resolved before it can be merged.
Code Review
This pull request addresses an issue where triton_scaled_mm was not being used on ROCm by fixing the kernel selection logic. It correctly adds TritonScaledMMLinearKernel as a fallback for both ROCm and CUDA, and introduces an is_supported check to ensure kernels are compatible with the current platform. The changes are accompanied by a new integration test to verify the fix.
My review focuses on improving the robustness of the kernel selection. I've suggested making the get_min_capability check in the Triton kernel platform-aware to prevent it from being selected on unsupported ROCm hardware. Additionally, I've pointed out a confusing try-except block in the new test file that should be simplified for clarity and to avoid masking potential errors.
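The reviewer's suggestion about a platform-aware `get_min_capability` could look roughly like the sketch below; the platform helper and the capability thresholds are assumptions for illustration, not vllm's actual values.

```python
# Illustrative sketch only: a platform-aware get_min_capability(), as the
# review suggests, so the Triton kernel is not selected on unsupported
# ROCm hardware. The platform helper and thresholds below are assumptions.


class FakePlatform:
    """Stand-in for vllm's current_platform (hypothetical)."""

    def __init__(self, rocm: bool) -> None:
        self._rocm = rocm

    def is_rocm(self) -> bool:
        return self._rocm


class TritonScaledMMLinearKernel:
    @classmethod
    def get_min_capability(cls, platform: FakePlatform) -> int:
        # On ROCm, gate on a stricter (assumed) capability floor; on CUDA,
        # keep the usual compute-capability threshold (value also assumed).
        return 90 if platform.is_rocm() else 75


cuda_min = TritonScaledMMLinearKernel.get_min_capability(FakePlatform(rocm=False))
rocm_min = TritonScaledMMLinearKernel.get_min_capability(FakePlatform(rocm=True))
```

Branching on the platform inside the capability check keeps the selection logic itself platform-agnostic: callers only compare one number.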
Is this ready for review again?
@ProExpertProg yes!
Purpose
Fixes #14397: triton_scaled_mm was never used on ROCm due to missing dispatch and checks. This PR:
Enables the Triton fallback on ROCm when AITriton is unavailable
Adds the Triton fallback after CUTLASS on CUDA
Implements is_supported() checks for kernel selection
Adds a lightweight integration test validating ROCm dispatch logic
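The selection flow described above might be sketched like this; the kernel class names mirror the PR, but the dispatch function, platform strings, and availability handling are illustrative, not vllm's exact API.

```python
# Sketch of the selection logic under assumed names: each kernel reports
# is_supported() for the current platform, and the first supported entry
# in a per-platform preference list wins. Triton sits last as the shared
# fallback on both CUDA and ROCm.


class Kernel:
    def __init__(self, name: str, platforms: set[str]) -> None:
        self.name = name
        self.platforms = platforms  # platforms this kernel can run on

    def is_supported(self, platform: str) -> bool:
        return platform in self.platforms


CUTLASS = Kernel("CutlassScaledMMLinearKernel", {"cuda"})
AITER = Kernel("AiterScaledMMLinearKernel", {"rocm"})  # AITriton; name assumed
TRITON = Kernel("TritonScaledMMLinearKernel", {"cuda", "rocm"})

PREFERENCE = {
    "cuda": [CUTLASS, TRITON],  # Triton fallback after CUTLASS
    "rocm": [AITER, TRITON],    # Triton fallback when AITriton is unavailable
}


def choose_scaled_mm_kernel(platform: str, available=None) -> Kernel:
    # 'available' lets callers exclude kernels whose backing library is
    # missing (e.g. AITriton not installed on a ROCm box).
    for kernel in PREFERENCE.get(platform, []):
        if available is not None and kernel not in available:
            continue
        if kernel.is_supported(platform):
            return kernel
    raise ValueError(f"no scaled_mm kernel for platform {platform!r}")
```

With this shape, adding a new backend is one list entry per platform, and the is_supported() gate keeps an entry from being picked on hardware it cannot serve.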
Test Plan
1. Mocked test (no GPU)
Result
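A no-GPU dispatch test along these lines could patch the platform check and assert on the chosen kernel; the `select_kernel()` helper and platform object here are hypothetical stand-ins for vllm's internals.

```python
# Hypothetical sketch of a mocked (no-GPU) dispatch test: patch the ROCm
# platform check and assert the Triton kernel is chosen. Names are
# stand-ins, not vllm's real internals.
from unittest import mock


class FakePlatform:
    def is_rocm(self) -> bool:
        return False


current_platform = FakePlatform()


def select_kernel() -> str:
    # On ROCm, fall back to the Triton scaled_mm kernel; on CUDA, prefer
    # CUTLASS (ordering per the PR description).
    if current_platform.is_rocm():
        return "TritonScaledMMLinearKernel"
    return "CutlassScaledMMLinearKernel"


with mock.patch.object(FakePlatform, "is_rocm", return_value=True):
    rocm_choice = select_kernel()

cuda_choice = select_kernel()
```

Patching at the class level means the test exercises the real dispatch path without needing ROCm hardware present.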
2. MI300X (ROCm 7.0, vLLM built from this PR)
(a) Triton kernel functional test
(b) OpenAI-compatible API test
Then:
Response
Confirms successful end-to-end inference on ROCm.
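The OpenAI-compatible check could be driven by a small client like the one below; the base URL, model name, and prompt are placeholders for whatever the server was launched with, and only the request is built here since sending it requires a running server.

```python
# Hypothetical client for the OpenAI-compatible API check. The base URL,
# model name, and prompt are placeholders; adjust them to your deployment.
# Only the request object is constructed here; urlopen(req) would return
# the JSON completion once the server is up.
import json
from urllib.request import Request


def build_completion_request(base_url: str, model: str, prompt: str) -> Request:
    payload = {"model": model, "prompt": prompt, "max_tokens": 32}
    return Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_completion_request(
    "http://localhost:8000", "example/quantized-model", "Hello from MI300X"
)
```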