Release v0.9.2 · InternLM/lmdeploy

What's Changed

Refactor linear by @grimoire in #3653
remove python3.8 support and add python3.13 support by @lvhan028 in #3638
refactor vl inputs split by @grimoire in #3699
[Fix]: Replace mutable default with default_factory for scheduler_stats by @ConvolutedDog in #3730
Fix the logic of calculating max_new_tokens and determining finish_reason by @lvhan028 in #3727
Override HF config.json via CLI by @CUHKSZzxy in #3722
feat(build): Integrate and build turbomind backend directly in setup.py by @windreamer in #3726
Generate the benchmark output filename with given arguments by @lvhan028 in #3740
Make loading llm without vlm an option by @grimoire in #3745
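
As background for the scheduler_stats fix in #3730, here is a minimal sketch of the general dataclass pattern the title refers to. The class and field names (`SchedulerStats`, `EngineOutput`, `running`, `waiting`, `token_ids`) are hypothetical and are not taken from the lmdeploy source; only `dataclasses.field(default_factory=...)` is standard-library API. A mutable object used directly as a default would be created once and shared by every instance, while `default_factory` builds a fresh one per instance.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SchedulerStats:
    """Hypothetical stats container, for illustration only."""
    running: int = 0
    waiting: int = 0


@dataclass
class EngineOutput:
    """Hypothetical output record that carries per-instance mutable state."""
    # default_factory is called once per instance, so nothing is shared.
    token_ids: List[int] = field(default_factory=list)
    scheduler_stats: SchedulerStats = field(default_factory=SchedulerStats)


a = EngineOutput()
b = EngineOutput()

# Mutating one instance must not leak into the other.
a.token_ids.append(1)
a.scheduler_stats.running += 1
```

With this layout, `b.token_ids` stays empty and `b.scheduler_stats.running` stays at 0 after `a` is modified.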

add ray to ascend requirements by @sigma-plus in #3713
fix accessing undefined attribute seq_aux of deepseek-r1-0528 by @lvhan028 in #3728
[Fix]: Avoid quantizing qk norm for qwen3 dense models by @taishan1994 in #3733
fix py313 env creation failure when building lmdeploy-builder image by @lvhan028 in #3739
[Fix]: kernel meta retrieval for SM7X does not work by @xiaoajie738 in #3746
limit max_session_len by @grimoire in #3751
fix internvl norm by @grimoire in #3756
support qwen3 moe yarn and vlm hf_overrides by @grimoire in #3757
[PD Disaggregation] fix double unshelf by @JimyMa in #3762
fix(build): fix version parse regex to support post-release versions by @windreamer in #3764
adapt to transformers>=v4.52.0 when loading qwen2.5-vl with turbomind by @irexyc in #3771
fix chat template with tool call by @RunningLeon in #3773
fix vl nothink mode by @RunningLeon in #3776
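
To illustrate what a post-release-aware version parser (the subject of #3764) can look like, the sketch below uses a hypothetical regex and helper, `VERSION_RE` and `parse_version`. It is not the pattern used in lmdeploy's build scripts. It assumes versions of the form `X.Y.Z`, optionally followed by a pre-release tag such as `rc1` and an optional `.postN` suffix.

```python
import re

# Hypothetical pattern: X.Y.Z, optional a/b/rc pre-release, optional .postN.
VERSION_RE = re.compile(
    r"^(\d+)\.(\d+)\.(\d+)"    # major.minor.patch
    r"(?:(a|b|rc)(\d+))?"      # optional pre-release, e.g. rc1
    r"(?:\.post(\d+))?$"       # optional post-release, e.g. .post1
)


def parse_version(text):
    """Return (release_tuple, pre_release_or_None, post_number_or_None)."""
    match = VERSION_RE.match(text)
    if match is None:
        raise ValueError(f"unsupported version string: {text!r}")
    major, minor, patch, pre_tag, pre_num, post = match.groups()
    release = (int(major), int(minor), int(patch))
    pre = (pre_tag, int(pre_num)) if pre_tag is not None else None
    post_num = int(post) if post is not None else None
    return release, pre, post_num


plain = parse_version("0.9.2")
post = parse_version("0.9.2.post1")
```

A pattern that stops at `X.Y.Z` and anchors the end of the string would reject `0.9.2.post1`; making the `.postN` group optional lets both forms parse.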

Full Changelog: v0.9.1...v0.9.2