Conversation

@shivampr (Contributor) commented Oct 13, 2025

Purpose

Fixes #14397. triton_scaled_mm was never used on ROCm due to missing dispatch and checks.
This PR:

  • Enables Triton fallback for ROCm when AITriton is unavailable

  • Adds Triton fallback after CUTLASS on CUDA

  • Implements is_supported() checks for kernel selection

  • Adds a lightweight integration test validating ROCm dispatch logic
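The fallback order described above can be sketched as a preference-ordered walk over kernels, each exposing an is_supported() check. This is a minimal illustrative sketch, not vLLM's actual code; the class and kernel names here only loosely mirror the real ScaledMMLinearKernel hierarchy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Kernel:
    """Toy stand-in for a scaled-mm linear kernel implementation."""
    name: str
    platforms: tuple  # platforms this kernel can run on

    def is_supported(self, platform: str) -> tuple[bool, Optional[str]]:
        if platform in self.platforms:
            return True, None
        return False, f"{self.name} does not support {platform}"


def choose_kernel(platform: str, preferences: list) -> Kernel:
    """Return the first kernel in preference order whose is_supported() passes."""
    for kernel in preferences:
        ok, _reason = kernel.is_supported(platform)
        if ok:
            return kernel
    raise RuntimeError(f"no supported scaled-mm kernel for {platform}")


# ROCm preference order: AITER-style kernel first, Triton as the fallback.
rocm_prefs = [
    Kernel("AiterScaledMM", ("rocm-aiter",)),
    Kernel("TritonScaledMM", ("rocm", "rocm-aiter", "cuda")),
]
print(choose_kernel("rocm", rocm_prefs).name)
```

On plain ROCm (no AITER), the first kernel's check fails and selection falls through to the Triton kernel, which is the behavior this PR enables.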


Test Plan

1. Mocked test (no GPU)

python3 mini_tests/select_triton_rocm.py

Result

Selected kernel: TritonScaledMMLinearKernel
OK: TritonScaledMMLinearKernel chosen on ROCm fallback.
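The contents of mini_tests/select_triton_rocm.py are not shown in this PR description; the following is a hedged sketch of how such a no-GPU test can work: patch the platform probe so the selector believes it is on ROCm, then assert on the chosen kernel. All names here are illustrative, not vLLM's real API.

```python
import sys
from unittest import mock


def current_platform() -> str:
    """Stand-in for a real platform probe (which would inspect the torch build)."""
    return "cuda"


def select_kernel() -> str:
    # ROCm dispatch falls back to the Triton kernel; CUDA prefers CUTLASS.
    if current_platform() == "rocm":
        return "TritonScaledMMLinearKernel"
    return "CutlassScaledMMLinearKernel"


# Patch the probe so the selector believes it is running on ROCm.
with mock.patch.object(sys.modules[__name__], "current_platform",
                       return_value="rocm"):
    selected = select_kernel()

print("Selected kernel:", selected)
```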

2. MI300X (ROCm 7.0, vLLM built from this PR)

(a) Triton kernel functional test

max_abs_err≈2.5e-01, max_rel_err≈3.9e-03
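The PR does not include the functional test's source, but max_abs_err and max_rel_err are conventionally computed by comparing the quantized kernel's output against a higher-precision reference matmul, roughly as in this sketch (random data stands in for the real tensors):

```python
import numpy as np

rng = np.random.default_rng(0)
ref = rng.standard_normal((64, 64)).astype(np.float32)   # reference (fp32) output
# Kernel output stand-in: reference plus small quantization-like noise.
out = ref + rng.normal(scale=1e-3, size=ref.shape).astype(np.float32)

abs_err = np.abs(out - ref)
max_abs_err = abs_err.max()
# Guard the denominator so near-zero reference entries don't blow up the ratio.
max_rel_err = (abs_err / np.maximum(np.abs(ref), 1e-6)).max()
print(f"max_abs_err≈{max_abs_err:.1e}, max_rel_err≈{max_rel_err:.1e}")
```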

(b) OpenAI-compatible API test

python3 -m vllm.entrypoints.openai.api_server \
  --model Qwen/Qwen2.5-0.5B-Instruct --dtype bfloat16 --host 0.0.0.0 --port 8000

Then:

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"Qwen/Qwen2.5-0.5B-Instruct","messages":[{"role":"user","content":"Say hi from MI300X."}]}'

Response

"Hello! How can I assist you today?"

Confirms successful end-to-end inference on ROCm.
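The curl request above can also be issued from Python with only the standard library; this sketch builds the same chat-completions payload (the commented-out send assumes a server is already listening on localhost:8000):

```python
import json
import urllib.request


def build_request(model: str, content: str) -> urllib.request.Request:
    """Build a chat-completions request matching the curl command above."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": content}],
    }
    return urllib.request.Request(
        "http://localhost:8000/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )


req = build_request("Qwen/Qwen2.5-0.5B-Instruct", "Say hi from MI300X.")
# with urllib.request.urlopen(req) as resp:   # uncomment with a live server
#     print(json.load(resp))
```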

@mergify mergify bot added the rocm Related to AMD ROCm label Oct 13, 2025
@mergify bot commented Oct 13, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @shivampr.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Oct 13, 2025
@gemini-code-assist bot left a comment

Code Review

This pull request addresses an issue where triton_scaled_mm was not being used on ROCm by fixing the kernel selection logic. It correctly adds TritonScaledMMLinearKernel as a fallback for both ROCm and CUDA, and introduces an is_supported check to ensure kernels are compatible with the current platform. The changes are accompanied by a new integration test to verify the fix.

My review focuses on improving the robustness of the kernel selection. I've suggested making the get_min_capability check in the Triton kernel platform-aware to prevent it from being selected on unsupported ROCm hardware. Additionally, I've pointed out a confusing try-except block in the new test file that should be simplified for clarity and to avoid masking potential errors.

@chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.


@shivampr shivampr force-pushed the rocm-triton-fallback branch 3 times, most recently from 99018da to 4d3a612 Compare October 13, 2025 05:09
@mergify mergify bot removed the needs-rebase label Oct 13, 2025
@shivampr shivampr force-pushed the rocm-triton-fallback branch 4 times, most recently from d0d088d to 9036316 Compare October 13, 2025 05:50
@shivampr shivampr force-pushed the rocm-triton-fallback branch from 9036316 to 2a6c86c Compare October 24, 2025 05:11
@shivampr shivampr force-pushed the rocm-triton-fallback branch from be28ac6 to d2591bf Compare November 4, 2025 15:07
@shivampr shivampr requested a review from WoosukKwon as a code owner November 4, 2025 15:07
@ProExpertProg
Copy link
Collaborator

Is this ready for review again?

@shivampr
Copy link
Contributor Author

shivampr commented Nov 7, 2025

@ProExpertProg yes!
Sorry, I'll ping you directly from now on when it's ready for review.


Labels

rocm Related to AMD ROCm


Development

Successfully merging this pull request may close these issues.

[Bug]: triton_scaled_mm never used on ROCm
