[DO NOT LAND] Prototype Helion kernel in vLLM #29051
After some optimization, the Helion silu_mul_fp8 kernel outperforms the original CUDA kernel. The biggest factor is enabling CUDA graphs to eliminate launch overhead. The benchmarking script is included in this branch.

RMS Norm Quant 8

Added a 2xH100 test of the allreduce_add_rmsnorm Helion kernel, without any comms optimization, compared against FlashInfer comm with fusion. (Results are flaky on my machine, with average speedup ranging from 0.99x to over 1.6x, possibly because this machine is a shared dev box.)
- This prototype implements a naive silu_mul_fp8 kernel and integrates it into vLLM's custom fusion pass as a custom op
- Numerical accuracy is verified
- On average it is about 4x slower than vLLM's custom silu_mul_fp8 CUDA kernel

Signed-off-by: Yanan Cao <[email protected]>
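For readers unfamiliar with the op, the math behind a silu_mul_fp8 kernel can be sketched in plain Python. This is a reference for the arithmetic only, not the Helion kernel from this PR. It assumes the input row holds the gate half followed by the up half, that quantization divides by a static scale, and that "fp8" means the e4m3fn format with a maximum finite value of 448. The function names are hypothetical, and the final rounding to an actual fp8 bit pattern is left out.

```python
import math

# Assumption: fp8 here means float8 e4m3fn, whose largest finite value is 448.
FP8_E4M3_MAX = 448.0


def silu(x: float) -> float:
    """SiLU(x) = x * sigmoid(x), written to avoid overflow for large |x|."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)


def silu_mul_scaled(row: list[float], scale: float) -> list[float]:
    """Compute silu(gate) * up, divide by `scale`, clamp to the fp8 range.

    Assumption: `row` is laid out as [gate..., up...] with equal halves.
    The rounding step to a real fp8 value is intentionally omitted.
    """
    d = len(row) // 2
    gate, up = row[:d], row[d:]
    out = []
    for g, u in zip(gate, up):
        y = silu(g) * u / scale
        out.append(max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, y)))
    return out


result = silu_mul_scaled([0.0, 1.0, 2.0, 3.0], scale=1.0)
```

A reference like this is the kind of baseline a numerical accuracy check could compare a fused kernel against, up to fp8 rounding error.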
This commit introduces comprehensive improvements to Helion kernel configuration:

## Major Changes

### ConfigManager Consolidation
- **Created centralized ConfigManager**: Extracted duplicated config logic from HelionCustomOp and autotune script into dedicated class
- **Standardized naming**: Config files now use exact kernel names (helion_silu_mul_fp8_helion_4096.json) instead of normalized names
- **Smart directory detection**: Auto-finds vLLM repo root for config storage
- **Renamed existing configs**: Migrated 8 config files to new naming standard

### Architecture Improvements
- **Separated concerns**: HelionCustomOp only handles autotuning, script handles saving
- **Pure function design**: get_best_config() now takes available configs dict instead of doing I/O
- **Method to property**: Converted _get_helion_kernel() to helion_kernel property
- **Removed dead code**: Eliminated unused find_best_config() method violating SRP
- **Fixed critical bug**: Config filtering logic now properly handles partial configs

### Type Safety Enhancements
- **Specific type annotations**: Replaced generic Union[str, type] with KernelIdentifier = Union[str, "type[HelionCustomOp]"]
- **TYPE_CHECKING imports**: Added proper forward references to avoid circular imports
- **Type alias**: Introduced KernelIdentifier for better code readability

### API Changes
- **autotune() signature**: Now requires autotune_inputs parameter instead of calling get_autotune_inputs() internally
- **get_best_config()**: Takes available_configs dict parameter for pure function behavior
- **Logging namespace**: Fixed script logging to use proper vLLM namespace

## Files Modified
- NEW: vllm/compilation/helion/config_manager.py - Centralized config management
- NEW: scripts/autotune_helion_kernels.py - Orchestration script with proper separation
- MODIFIED: vllm/compilation/helion/custom_op.py - Refactored base class
- MODIFIED: All 3 kernel implementations - Updated to use new architecture
- RENAMED: 8 config files from normalized to exact kernel names

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
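The "pure function" and "TYPE_CHECKING" points in the commit message above can be illustrated with a small sketch. The `KernelIdentifier` alias and the module path are taken from the commit message. Everything else is an assumption made for illustration: `select_config` is a hypothetical stand-in for `get_best_config()`, the selection rule (smallest tuned size that covers the request, else the largest) is invented, and the `block_size` key is a placeholder rather than a real Helion config field.

```python
from typing import TYPE_CHECKING, Any, Optional, Union

if TYPE_CHECKING:
    # Only evaluated by type checkers, so no circular import at runtime.
    from vllm.compilation.helion.custom_op import HelionCustomOp

# Alias quoted in the commit message; the string is a forward reference.
KernelIdentifier = Union[str, "type[HelionCustomOp]"]

Config = dict[str, Any]


def select_config(
    available_configs: dict[int, Config], size: int
) -> Optional[Config]:
    """Pick a config from an in-memory dict without touching the filesystem.

    Hypothetical rule: prefer the smallest tuned size >= `size`,
    otherwise fall back to the largest tuned size.
    """
    if not available_configs:
        return None
    candidates = [key for key in available_configs if key >= size]
    key = min(candidates) if candidates else max(available_configs)
    return available_configs[key]


# The caller does the I/O (for example, loading JSON files) and passes the
# result in; here an in-memory dict with placeholder values stands in for it.
configs = {1024: {"block_size": 128}, 4096: {"block_size": 256}}
chosen = select_config(configs, 2048)
```

Because the function takes the dict as a parameter, it can be unit tested without config files on disk, which is the usual motivation for moving I/O out to the caller.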
- Convert f-string logging to proper format strings (G004)
- Fix line length violations (E501)
- Use dict iteration instead of .keys() (SIM118)
- Break long help text into multiple lines

All pre-commit checks now pass successfully.

Signed-off-by: Yanan Cao <[email protected]>
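The two lint fixes listed above (G004 and SIM118) follow a common pattern, sketched here with the standard `logging` module. The logger name, the kernel name, and the dict contents are hypothetical. The small handler exists only so the example can show what the logger receives.

```python
import logging


class ListHandler(logging.Handler):
    """Collects log records in memory so the example can inspect them."""

    def __init__(self) -> None:
        super().__init__()
        self.records: list[logging.LogRecord] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


logger = logging.getLogger("example.helion")  # hypothetical logger name
logger.setLevel(logging.INFO)
handler = ListHandler()
logger.addHandler(handler)

kernel_name = "silu_mul_fp8"
configs = {"4096": {}, "8192": {}}  # placeholder config dict

# G004: instead of logger.info(f"Autotuning kernel {kernel_name}"),
# pass the argument separately so formatting is deferred to the logger.
logger.info("Autotuning kernel %s", kernel_name)

# SIM118: instead of `for key in configs.keys()`, iterate the dict directly.
seen_keys = []
for key in configs:
    seen_keys.append(key)

record = handler.records[0]
```

With the deferred form, the record keeps the template and the arguments separate, and the string is only built when a handler actually formats the message.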
Prototype Helion kernels in vLLM