v0.7.3
What's Changed
🚀 Features
- Add Qwen3 and Qwen3MoE by @lzhangzz in #3305
- [Feature] support qwen3 and qwen3-moe for pytorch engine by @CUHKSZzxy in #3315
- [ascend]support deepseekv2 by @yao-fengchen in #3206
- support ascend w8a8 graph_mode by @yao-fengchen in #3267
- support Llama4 by @grimoire in #3408
💥 Improvements
- Add spaces_between_special_tokens to /v1/interactive and make compatible with empty text by @AllentDan in #3283
- add env var to control timeout by @CUHKSZzxy in #3291
- optimize mla, remove load `v` by @grimoire in #3334
- refactor dlinfer rope by @yao-fengchen in #3326
- enable qwenvl2.5 graph mode on ascend by @jinminxi104 in #3367
- Optimize ascend moe by @yao-fengchen in #3364
- find port by @grimoire in #3429
🐞 Bug fixes
- fix activation grid oversize by @grimoire in #3282
- Set ensure_ascii=False for tool calling by @AllentDan in #3295
- add `v` check by @grimoire in #3307
- Fix Qwen3MoE config parsing by @lzhangzz in #3336
- Fix finish reasons by @AllentDan in #3338
- remove think_end_token_id in streaming content by @AllentDan in #3327
- Fix the finish_reason by @AllentDan in #3350
- support List[dict] prompt input without do_preprocess by @irexyc in #3385
- fix tensor dispatch in dynamo by @wanfengcxz in #3417
📚 Documentations
- update ascend doc by @yao-fengchen in #3420
🌐 Other
- bump version to v0.7.2.post1 by @lvhan028 in #3298
- Optimize internvit by @caikun-pjlab in #3316
- bump version to v0.7.3 by @lvhan028 in #3416
New Contributors
- @wanfengcxz made their first contribution in #3417
- @caikun-pjlab made their first contribution in #3316
Full Changelog: v0.7.2...v0.7.3