Enhance benchmark_moe.py: vLLM Version Compatibility Fixes
This PR introduces comprehensive compatibility fixes to support multiple vLLM
versions and prevent runtime import/parameter errors:
1. ImportError: cannot import name '_get_config_dtype_str'
- Added multi-level import fallback with proper function signature
- Implemented correct fallback logic matching original function behavior
2. TypeError: FusedMoEQuantConfig.make() parameter incompatibility
- Created make_quant_config_compatible() with multiple parameter combinations
- Handles quant_dtype/dtype variations across vLLM versions
3. TypeError: fused_experts() parameter incompatibility
- Implemented fused_experts_compatible() with signature inspection
- Only passes supported parameters (quant_config, allow_deep_gemm, etc.)
4. Fixed PR_DESCRIPTION.md markdown formatting
- Proper H1 heading and 4-space list indentation
- Complies with markdownlint requirements
5. Fixed line length violations (E501)
- Split long import statements and function calls
- All lines now comply with 88 character limit
Features:
- No changes to benchmark algorithm logic
- Production-ready English output messages
- Supports vLLM 0.6.0+ through 0.10.0+ releases
- Comprehensive error handling and graceful fallbacks
Signed-off-by: Alfred <[email protected]>
# Enhance benchmark_moe.py: vLLM Version Compatibility Fixes

## Description

This PR introduces compatibility fixes to `benchmarks/kernels/benchmark_moe.py` to support multiple vLLM versions and prevent runtime import/parameter errors. The following issues are addressed:

1. ImportError: cannot import name '_get_config_dtype_str'

- Added a multi-level import fallback that searches possible module locations and class methods for `_get_config_dtype_str` and provides a fallback implementation when unavailable.

2. TypeError: FusedMoEQuantConfig.make() parameter incompatibility

- Created `make_quant_config_compatible()`, which tries multiple parameter combinations to handle `quant_dtype`/`dtype` variations across vLLM versions.

3. TypeError: fused_experts() parameter incompatibility

- Implemented `fused_experts_compatible()` which inspects `fused_experts` signature and only passes supported parameters (`quant_config`, `allow_deep_gemm`, etc.).
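
The signature-inspection approach can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in `benchmark_moe.py`: `call_with_supported_kwargs`, `fused_experts_old`, and `fused_experts_new` are hypothetical names, and the two stand-in functions only mimic an entry point whose keyword parameters differ between releases.

```python
import inspect
from typing import Any, Callable

# Parameter kinds that can legally be passed by keyword.
_KEYWORD_KINDS = (
    inspect.Parameter.POSITIONAL_OR_KEYWORD,
    inspect.Parameter.KEYWORD_ONLY,
)


def call_with_supported_kwargs(
    func: Callable[..., Any], *args: Any, **kwargs: Any
) -> Any:
    """Call func, dropping keyword arguments its signature does not accept."""
    params = inspect.signature(func).parameters
    # A function that declares **kwargs accepts every keyword unchanged.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return func(*args, **kwargs)
    supported = {
        name: value
        for name, value in kwargs.items()
        if name in params and params[name].kind in _KEYWORD_KINDS
    }
    return func(*args, **supported)


# Hypothetical stand-ins for two releases of the same entry point.
def fused_experts_old(x, w1, w2):
    return ("old", x, w1, w2)


def fused_experts_new(x, w1, w2, quant_config=None, allow_deep_gemm=False):
    return ("new", x, w1, w2, quant_config, allow_deep_gemm)


# The same call site works against both signatures.
extra = {"quant_config": "cfg", "allow_deep_gemm": True}
result_old = call_with_supported_kwargs(fused_experts_old, 1, 2, 3, **extra)
result_new = call_with_supported_kwargs(fused_experts_new, 1, 2, 3, **extra)
```

Filtering by signature keeps the call site identical across versions, at the cost of silently dropping options an older release does not understand, so a benchmark using this pattern may want to log which keywords were discarded.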
## Notes

- No change to the benchmark algorithm logic.
- All output messages are in English and suitable for production logs.
- These fixes aim to support vLLM 0.6.0+ through 0.10.0+ releases.

Please review and let me know if you'd like additional cleanups or unit tests included.