v0.9.2
What's Changed
🚀 Features
- [Feature] metrics support by @CUHKSZzxy in #3534
- Relax FP8 TP requirement by @lzhangzz in #3697
- FA3 by @zhaochaoxing in #3623
- support qwen2/2.5-vl in turbomind by @irexyc in #3744
- feat: add pytorch_engine_qwen2_5vl_sm120 by @kolmogorov-quyet in #3750
- Internvl pt by @RunningLeon in #3765
- Improve internvl for turbomind engine by @lvhan028 in #3769
💥 Improvements
- Refactor linear by @grimoire in #3653
- remove python3.8 support and add python3.13 support by @lvhan028 in #3638
- refactor vl inputs split by @grimoire in #3699
- [Fix]: Replace mutable default with default_factory for scheduler_stats by @ConvolutedDog in #3730
- Fix the logic of calculating max_new_tokens and determining finish_reason by @lvhan028 in #3727
- Override HF config.json via CLI by @CUHKSZzxy in #3722
- feat(build): Integrate and build turbomind backend directly in setup.py by @windreamer in #3726
- Generate the benchmark output filename with given arguments by @lvhan028 in #3740
- Make loading llm without vlm as an option by @grimoire in #3745
🐞 Bug fixes
- add ray to ascend requirements by @sigma-plus in #3713
- fix accessing undefined attribute seq_aux of deepseek-r1-0528 by @lvhan028 in #3728
- [Fix]: Avoid quantize qk norm for qwen3 dense models by @taishan1994 in #3733
- fix py313 env creation failed when building lmdeploy-builder image by @lvhan028 in #3739
- [Fix]: kernel meta retrieval for SM7X does not work by @xiaoajie738 in #3746
- limit max_session_len by @grimoire in #3751
- fix internvl norm by @grimoire in #3756
- support qwen3 moe yarn and vlm hf_overrides by @grimoire in #3757
- [PD Disaggregation] fix double unshelf by @JimyMa in #3762
- fix(build): fix version parse regex to support post-release versions by @windreamer in #3764
- adapt transformers>=v4.52.0 to loading qwen2.5-vl with turbomind by @irexyc in #3771
- fix chat template with tool call by @RunningLeon in #3773
- fix vl nothink mode by @RunningLeon in #3776
📚 Documentation
- update reward model docs by @CUHKSZzxy in #3721
🌐 Other
- update twomicrobatch by @SHshenhao in #3651
- [CI]: Upgrade to py310 for ut by @RunningLeon in #3718
- [ci] update dailytest environment and scripts by @zhulinJulia24 in #3716
- Preliminary Blackwell (sm_120a, RTX 50 series) support by @lzhangzz in #3701
- [ci] add fp8 evaluation workflow by @zhulinJulia24 in #3729
- Add VRAM bandwidth utilization stat to attention test by @lzhangzz in #3731
- doc: fix dead links to MindX DL to recover CI. by @windreamer in #3741
- fix free cache in MPEngine branch by @JimyMa in #3670
- fix: make RelWithDebInfo default cmake build type by @windreamer in #3774
- bump version to v0.9.2 by @lvhan028 in #3770
New Contributors
- @sigma-plus made their first contribution in #3713
- @ConvolutedDog made their first contribution in #3730
- @windreamer made their first contribution in #3726
- @taishan1994 made their first contribution in #3733
- @xiaoajie738 made their first contribution in #3746
- @kolmogorov-quyet made their first contribution in #3750
Full Changelog: v0.9.1...v0.9.2