[DO NOT LAND] Prototype Helion kernel in vLLM #29051

gmagogsfm · 2025-11-20T00:35:06Z

Prototype Helion kernels in vLLM

gmagogsfm · 2025-11-21T23:29:22Z

After some optimization, the Helion silu_mul_fp8 kernel outperforms the original CUDA kernel. The biggest factor is enabling CUDA graphs, which eliminates launch overhead.

Benchmarking script is included in this branch.

============================================================
Summary Statistics
============================================================
Total configurations tested: 242

Speedup:
  Average: 2.64x
  Median:  2.45x
  Min:     1.01x
  Max:     6.25x

Latency (ms):
  Baseline - Avg: 0.0129, Min: 0.0016, Max: 0.2469
  Helion   - Avg: 0.0063, Min: 0.0015, Max: 0.1570
============================================================
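For readers comparing the numbers above: the average speedup (2.64x) is larger than the ratio of the average latencies (0.0129 / 0.0063, about 2.05x). That is what one would expect if speedup is computed per configuration and then aggregated. A minimal sketch of that aggregation follows, assuming this is how the benchmarking script works; the function name `summarize` and the sample latencies are made up for illustration and are not taken from the branch.

```python
from statistics import mean, median


def summarize(baseline_ms, helion_ms):
    """Summarize per-configuration latencies (both lists in the same order)."""
    if len(baseline_ms) != len(helion_ms) or not baseline_ms:
        raise ValueError("need two non-empty lists of equal length")
    # Speedup is computed per configuration first, then aggregated.
    speedups = [b / h for b, h in zip(baseline_ms, helion_ms)]
    return {
        "configs": len(speedups),
        "speedup_avg": mean(speedups),
        "speedup_median": median(speedups),
        "speedup_min": min(speedups),
        "speedup_max": max(speedups),
        "baseline_avg_ms": mean(baseline_ms),
        "helion_avg_ms": mean(helion_ms),
    }


# Made-up latencies, for illustration only (not data from this PR).
stats = summarize([0.002, 0.010, 0.200], [0.001, 0.004, 0.160])

# The ratio of average latencies is dominated by the slowest configurations,
# so it can differ a lot from the average of per-configuration speedups.
ratio_of_avgs = stats["baseline_avg_ms"] / stats["helion_avg_ms"]
```

With many small, launch-bound configurations, the per-configuration average weights each shape equally, while the latency averages are dominated by the few large shapes.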

gmagogsfm · 2025-11-22T05:35:46Z

RMS Norm Quant FP8

============================================================
Summary Statistics
============================================================
Total configurations tested: 199

Speedup:
  Average: 1.62x
  Median:  1.59x
  Min:     0.96x
  Max:     2.58x

Latency (ms):
  Baseline - Avg: 0.0040, Min: 0.0022, Max: 0.0190
  Helion   - Avg: 0.0024, Min: 0.0014, Max: 0.0125
============================================================

gmagogsfm · 2025-11-26T03:20:01Z

Added allreduce_add_rmsnorm Helion kernel

Tested on 2xH100 without any comms optimization, compared against the FlashInfer comm path with fusion.

(Results are flaky on my machine, with the average speedup ranging from 0.99x to over 1.6x, possibly because it is a shared dev box.)

============================================================
Summary Statistics
============================================================
Total configurations tested: 78

Speedup:
  Average: 1.36x
  Median:  1.37x
  Min:     0.91x
  Max:     2.23x

Latency (ms):
  Baseline - Avg: 0.1202, Min: 0.0985, Max: 0.2095
  Helion   - Avg: 0.0946, Min: 0.0510, Max: 0.2102
============================================================
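For context on what the fused op computes, here is a plain-Python reference of the usual all-reduce + residual-add + RMSNorm sequence. It is only a sketch of the math, under the assumption that the Helion kernel follows the same semantics as vLLM's existing all-reduce / fused-add-RMSNorm fusion. The function name, argument layout, and `eps` default are hypothetical, and the ranks are simulated as a list of lists rather than real devices.

```python
import math


def allreduce_add_rmsnorm_reference(per_rank_inputs, residual, weight, eps=1e-6):
    """Reference semantics for one token (one row of hidden values).

    per_rank_inputs: one list of floats per simulated rank.
    residual, weight: lists of floats with the same length as each rank's input.
    Returns (normalized_output, updated_residual).
    """
    # 1. All-reduce (sum): element-wise sum across ranks.
    reduced = [sum(vals) for vals in zip(*per_rank_inputs)]
    # 2. Residual add: the sum becomes the new residual stream.
    new_residual = [x + r for x, r in zip(reduced, residual)]
    # 3. RMSNorm over the hidden dimension, scaled by the learned weight.
    mean_sq = sum(v * v for v in new_residual) / len(new_residual)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    out = [v * inv_rms * w for v, w in zip(new_residual, weight)]
    return out, new_residual


# Two simulated ranks, hidden size 2, made-up values.
out, new_residual = allreduce_add_rmsnorm_reference(
    [[1.0, 2.0], [3.0, 4.0]], residual=[0.0, 0.0], weight=[1.0, 1.0], eps=0.0
)
```

A fused kernel does the same three steps without writing the intermediate tensors back to global memory between them.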

- This prototype implements a naive silu_mul_fp8 kernel and integrates it into vLLM's custom fusion pass as a custom op
- Numerical accuracy is verified
- On average it is about 4x slower than vLLM's custom silu_mul_fp8 CUDA kernel

Signed-off-by: Yanan Cao <[email protected]>
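As a reminder of what silu_mul_fp8 computes, here is a plain-Python reference of the math. It is a sketch under stated assumptions, not the Helion kernel: it assumes the input row holds the gate half followed by the up half, a single per-tensor `scale`, and the float8_e4m3fn finite range of ±448. It only clamps to that range and does not round to the FP8 grid. The function names are hypothetical.

```python
import math

FP8_E4M3_MAX = 448.0  # largest finite float8_e4m3fn value (assumed FP8 format)


def silu(x: float) -> float:
    """SiLU(x) = x * sigmoid(x), written to avoid overflow for large |x|."""
    if x >= 0.0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)


def silu_mul_fp8_reference(row, scale):
    """row holds 2*d values: the first d are the gate, the last d are the up part."""
    if len(row) % 2 != 0:
        raise ValueError("row length must be even")
    d = len(row) // 2
    out = []
    for gate, up in zip(row[:d], row[d:]):
        y = silu(gate) * up / scale
        # Clamp to the representable FP8 range; rounding to FP8 is omitted here.
        out.append(max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, y)))
    return out


small = silu_mul_fp8_reference([0.0, 1.0, 3.0, 5.0], scale=1.0)
clamped = silu_mul_fp8_reference([1000.0, 1000.0], scale=1.0)
```

A reference like this is handy for the numerical-accuracy check mentioned above, with a tolerance that accounts for FP8 rounding.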

Signed-off-by: Yanan Cao <[email protected]>

This commit introduces comprehensive improvements to Helion kernel configuration:

## Major Changes

### ConfigManager Consolidation
- **Created centralized ConfigManager**: Extracted duplicated config logic from HelionCustomOp and autotune script into dedicated class
- **Standardized naming**: Config files now use exact kernel names (helion_silu_mul_fp8_helion_4096.json) instead of normalized names
- **Smart directory detection**: Auto-finds vLLM repo root for config storage
- **Renamed existing configs**: Migrated 8 config files to new naming standard

### Architecture Improvements
- **Separated concerns**: HelionCustomOp only handles autotuning, script handles saving
- **Pure function design**: get_best_config() now takes available configs dict instead of doing I/O
- **Method to property**: Converted _get_helion_kernel() to helion_kernel property
- **Removed dead code**: Eliminated unused find_best_config() method violating SRP
- **Fixed critical bug**: Config filtering logic now properly handles partial configs

### Type Safety Enhancements
- **Specific type annotations**: Replaced generic Union[str, type] with KernelIdentifier = Union[str, "type[HelionCustomOp]"]
- **TYPE_CHECKING imports**: Added proper forward references to avoid circular imports
- **Type alias**: Introduced KernelIdentifier for better code readability

### API Changes
- **autotune() signature**: Now requires autotune_inputs parameter instead of calling get_autotune_inputs() internally
- **get_best_config()**: Takes available_configs dict parameter for pure function behavior
- **Logging namespace**: Fixed script logging to use proper vLLM namespace

## Files Modified
- NEW: vllm/compilation/helion/config_manager.py - Centralized config management
- NEW: scripts/autotune_helion_kernels.py - Orchestration script with proper separation
- MODIFIED: vllm/compilation/helion/custom_op.py - Refactored base class
- MODIFIED: All 3 kernel implementations - Updated to use new architecture
- RENAMED: 8 config files from normalized to exact kernel names

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
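To make the type alias and the "pure function" point in this commit message concrete, here is a small sketch. Only `KernelIdentifier`, `get_best_config`, `HelionCustomOp`, and the example file name come from the commit message. Everything else is an assumption for illustration: the `config_filename` helper, the `<kernel name>_<size>.json` pattern, the integer size key, the nearest-size fallback rule, and the config contents are all hypothetical and may not match the branch.

```python
from __future__ import annotations

from typing import TYPE_CHECKING, Union

if TYPE_CHECKING:
    # Only evaluated by type checkers, so there is no circular import at runtime.
    from vllm.compilation.helion.custom_op import HelionCustomOp

# Forward reference as a string, as quoted in the commit message.
KernelIdentifier = Union[str, "type[HelionCustomOp]"]


def config_filename(kernel: KernelIdentifier, size: int) -> str:
    """Hypothetical helper: build '<kernel name>_<size>.json'."""
    name = kernel if isinstance(kernel, str) else kernel.__name__
    return f"{name}_{size}.json"


def get_best_config(available_configs: dict[int, dict], size: int) -> dict | None:
    """Pure function: picks from an already-loaded dict and does no I/O.

    The selection rule (exact match, else nearest size, ties to the smaller
    key) is an assumption, not necessarily what the branch implements.
    """
    if not available_configs:
        return None
    if size in available_configs:
        return available_configs[size]
    nearest = min(available_configs, key=lambda k: (abs(k - size), k))
    return available_configs[nearest]


# Placeholder configs; the values are made up.
configs = {2048: {"num_warps": 4}, 4096: {"num_warps": 8}}
exact = get_best_config(configs, 4096)
fallback = get_best_config(configs, 3000)
filename = config_filename("helion_silu_mul_fp8_helion", 4096)
```

Because the function takes the dict as an argument, it can be unit-tested without touching the filesystem, which is presumably the motivation for the change.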

- Convert f-string logging to proper format strings (G004)
- Fix line length violations (E501)
- Use dict iteration instead of .keys() (SIM118)
- Break long help text into multiple lines

All pre-commit checks now pass successfully.

Signed-off-by: Yanan Cao <[email protected]>
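For anyone unfamiliar with the two lint rules named above, a short before/after illustration follows. The logger name and the dict contents are made up and are not taken from the branch.

```python
import logging

logger = logging.getLogger("vllm.example")  # hypothetical logger name

configs = {"kernel_a": {"num_warps": 4}, "kernel_b": {"num_warps": 8}}

# G004: let the logger format lazily instead of building an f-string up front.
# Before: logger.info(f"Loaded {len(configs)} configs")
logger.info("Loaded %d configs", len(configs))

# SIM118: iterate the dict directly instead of calling .keys().
# Before: names = [name for name in configs.keys()]
names = [name for name in configs]
```

With %-style arguments the message is only formatted if the record is actually emitted at the current log level.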

Signed-off-by: Yanan Cao <[email protected]>

gmagogsfm force-pushed the helion branch 3 times, most recently from b6a7fa2 to ff1634b Compare November 21, 2025 23:24

mergify bot added the performance Performance-related issues label Nov 21, 2025

gmagogsfm force-pushed the helion branch from ff1634b to be15f51 Compare November 22, 2025 00:02

mgoin self-requested a review November 25, 2025 14:52

gmagogsfm force-pushed the helion branch from 298dc89 to ce5ffbc Compare November 26, 2025 03:18

gmagogsfm and others added 10 commits December 1, 2025 23:24

add helion kernel benchmarking infra

fb37ce1

Signed-off-by: Yanan Cao <[email protected]>

Add RMS Norm Quant fp8 Helion Kernel

602b918

Signed-off-by: Yanan Cao <[email protected]>

Add allreduce_add_rmsnorm

78aeaf1

Signed-off-by: Yanan Cao <[email protected]>

distributed benchmark infra

fed7789

Signed-off-by: Yanan Cao <[email protected]>

remove unintended changes in test_fusion_all_reduce.py

80d8b6c

Signed-off-by: Yanan Cao <[email protected]>

[WIP] Helion custom op enablement/testing infra

8eff281

Signed-off-by: Yanan Cao <[email protected]>

Add register_kernel decorator

578ac03

Signed-off-by: Yanan Cao <[email protected]>

gmagogsfm force-pushed the helion branch from eab686a to 578ac03 Compare December 3, 2025 19:06
