Releases · InternLM/lmdeploy

18 Dec 12:10

lvhan028

v0.1.0

477f2db

LMDeploy Release V0.1.0

What's Changed

🚀 Features

Add extra_requires to reduce dependencies by @RunningLeon in #580
TurboMind 2 by @lzhangzz in #590
Support loading hf model directly by @irexyc in #685
convert model with hf repo_id by @irexyc in #774
Support turbomind bf16 by @grimoire in #803
support image_embs input by @irexyc in #799
Add api.py by @AllentDan in #805

💥 Improvements

Fix Tokenizer encode by @AllentDan in #645
Optimize for throughput by @lzhangzz in #701
Replace mmengine with mmengine-lite by @zhouzaida in #715
Set the default value of max_context_token_num to 1 by @lvhan028 in #761
add triton server test and workflow yml by @RunningLeon in #760
improvement(build): enable ninja and gold linker by @tpoisonooo in #767
Report first-token-latency and token-latency percentiles by @lvhan028 in #736
Unify prefill & decode passes by @lzhangzz in #775
add cuda12.1 build check ci by @irexyc in #782
auto upload cuda12.1 python pkg to release when creating a new tag by @irexyc in #784
Report the inference benchmark of models with different size by @lvhan028 in #794
Simplify block manager by @lzhangzz in #812
Disable attention mask when it is not needed by @lzhangzz in #813
FIFO pipe strategy for api_server by @AllentDan in #795
simplify the header of the benchmark table by @lvhan028 in #820
add encode for opencompass by @AllentDan in #828
fix: awq should save bin files by @hscspring in #793
Support building docker image manually in CI by @RunningLeon in #825

🐞 Bug fixes

Fix init of batch state by @lzhangzz in #682
fix turbomind stream canceling by @grimoire in #686
[Fix] Fix load_checkpoint_in_model bug by @HIT-cwh in #690
Fix wrong eos_id and bos_id obtained through grpc api by @lvhan028 in #644
Fix cache/output length calculation by @lzhangzz in #738
[Fix] Skip empty batch by @lzhangzz in #747
[Fix] build docker image failed since packaging is missing by @lvhan028 in #753
[Fix] Rollback the data type of input_ids to TYPE_UINT32 in preprocessor's proto by @lvhan028 in #758
fix turbomind build on sm<80 by @grimoire in #754
Fix early-exit condition in attention kernel by @lzhangzz in #788
Fix missing arguments when benchmarking static inference performance by @lvhan028 in #787
fix extra colon in InternLMChat7B template by @C1rN09 in #796
Fix local kv head num by @lvhan028 in #806
Fix out-of-bound access by @lzhangzz in #809
Set smem size for repetition penalty kernel by @lzhangzz in #818
Fix cache verification by @lzhangzz in #821
fix finish_reason by @AllentDan in #816
fix turbomind awq by @grimoire in #847
Fix stop requests by await before turbomind queue.get() by @AllentDan in #850
[Fix] Fix meta tensor error by @pppppM in #848
Fix cuda reinitialization in a multiprocessing setting by @grimoire in #862
launch gradio server directly with hf model by @AllentDan in #856
fix typo by @grimoire in #769
Add chat template for Yi by @AllentDan in #779
fix api_server stop_session and end_session by @AllentDan in #835
Return the iterator after erasing it from a map by @irexyc in #864

📚 Documentations

[Docs] Update Supported Matrix by @pppppM in #679
[Docs] Update KV8 Docs by @pppppM in #681
[Doc] Update restful api doc by @AllentDan in #662
Check-in user guide about turbomind config by @lvhan028 in #680
Update benchmark user guide by @lvhan028 in #763
[Docs] Fix typo in restful_api user guide by @maxchiron in #858
[Docs] Fix typo in restful_api user guide by @maxchiron in #859

🌐 Other

bump version to v0.1.0a0 by @lvhan028 in #709
bump version to 0.1.0a1 by @lvhan028 in #776
bump version to v0.1.0a2 by @lvhan028 in #807
bump version to v0.1.0 by @lvhan028 in #834

New Contributors

@zhouzaida made their first contribution in #715
@C1rN09 made their first contribution in #796
@maxchiron made their first contribution in #858

Full Changelog: v0.0.14...v0.1.0

Contributors

grimoire, lvhan028, and 11 other contributors

06 Dec 06:50

lvhan028

v0.1.0a2

fddad30

LMDeploy Release V0.1.0a2

What's Changed

💥 Improvements

Unify prefill & decode passes by @lzhangzz in #775
add cuda12.1 build check ci by @irexyc in #782
auto upload cuda12.1 python pkg to release when creating a new tag by @irexyc in #784
Report the inference benchmark of models with different size by @lvhan028 in #794
Add chat template for Yi by @AllentDan in #779

🐞 Bug fixes

Fix early-exit condition in attention kernel by @lzhangzz in #788
Fix missing arguments when benchmarking static inference performance by @lvhan028 in #787
fix extra colon in InternLMChat7B template by @C1rN09 in #796
Fix local kv head num by @lvhan028 in #806

📚 Documentations

Update benchmark user guide by @lvhan028 in #763

🌐 Other

bump version to v0.1.0a2 by @lvhan028 in #807

New Contributors

@C1rN09 made their first contribution in #796

Full Changelog: v0.1.0a1...v0.1.0a2

Contributors

lvhan028, irexyc, and 3 other contributors

29 Nov 13:51

lvhan028

v0.1.0a1

9c46b27

LMDeploy Release V0.1.0a1

What's Changed

💥 Improvements

Set the default value of max_context_token_num to 1 by @lvhan028 in #761
add triton server test and workflow yml by @RunningLeon in #760
improvement(build): enable ninja and gold linker by @tpoisonooo in #767
Report first-token-latency and token-latency percentiles by @lvhan028 in #736
convert model with hf repo_id by @irexyc in #774

🐞 Bug fixes

[Fix] build docker image failed since packaging is missing by @lvhan028 in #753
[Fix] Rollback the data type of input_ids to TYPE_UINT32 in preprocessor's proto by @lvhan028 in #758
fix turbomind build on sm<80 by @grimoire in #754
fix typo by @grimoire in #769

🌐 Other

bump version to 0.1.0a1 by @lvhan028 in #776

Full Changelog: v0.1.0a0...v0.1.0a1

Contributors

grimoire, lvhan028, and 3 other contributors

23 Nov 13:05

lvhan028

v0.1.0a0

a7c5007

LMDeploy Release V0.1.0a0

What's Changed

🚀 Features

Add extra_requires to reduce dependencies by @RunningLeon in #580
TurboMind 2 by @lzhangzz in #590
Support loading hf model directly by @irexyc in #685

💥 Improvements

Fix Tokenizer encode by @AllentDan in #645
Optimize for throughput by @lzhangzz in #701
Replace mmengine with mmengine-lite by @zhouzaida in #715

🐞 Bug fixes

Fix init of batch state by @lzhangzz in #682
fix turbomind stream canceling by @grimoire in #686
[Fix] Fix load_checkpoint_in_model bug by @HIT-cwh in #690
Fix wrong eos_id and bos_id obtained through grpc api by @lvhan028 in #644
Fix cache/output length calculation by @lzhangzz in #738
[Fix] Skip empty batch by @lzhangzz in #747

📚 Documentations

[Docs] Update Supported Matrix by @pppppM in #679
[Docs] Update KV8 Docs by @pppppM in #681
[Doc] Update restful api doc by @AllentDan in #662
Check-in user guide about turbomind config by @lvhan028 in #680

🌐 Other

bump version to v0.1.0a0 by @lvhan028 in #709

New Contributors

@zhouzaida made their first contribution in #715

Full Changelog: v0.0.14...v0.1.0a0

Contributors

grimoire, lvhan028, and 7 other contributors

09 Nov 12:13

lvhan028

v0.0.14

7b20cfd

LMDeploy Release V0.0.14

What's Changed

💥 Improvements

Improve api_server and webui usage by @AllentDan in #544
fix: gradio gr.Button.update deprecated after 4.0.0 by @hscspring in #637
add cli to list the supported model names by @RunningLeon in #639
Refactor model conversion by @irexyc in #296
[Enhance] internlm message to prompt by @Harold-lkk in #499
update turbomind session_len with model.session_len by @AllentDan in #634
Manage session id using random int for gradio local mode by @aisensiy in #553
Add UltraCM and WizardLM chat templates by @AllentDan in #599
Add check env sub command by @RunningLeon in #654

🐞 Bug fixes

[Fix] Qwen's quantization results are abnormal & Baichuan cannot be quantized by @pppppM in #605
FIX: fix stop_session func bug by @yunzhongyan0 in #578
fix benchmark serving computation mistake by @AllentDan in #630
fix Tokenizer load error when the path of the being-converted model is not writable by @irexyc in #669
fix tokenizer_info when converting the model by @irexyc in #661

🌐 Other

bump version to v0.0.14 by @lvhan028 in #663

New Contributors

@hscspring made their first contribution in #637
@yunzhongyan0 made their first contribution in #578

Full Changelog: v0.0.13...v0.0.14

Contributors

aisensiy, lvhan028, and 7 other contributors

30 Oct 06:35

lvhan028

v0.0.13

56942c4

LMDeploy Release V0.0.13

What's Changed

🚀 Features

Add more user-friendly CLI by @RunningLeon in #541

💥 Improvements

support inference on a batch of prompts by @AllentDan in #467

📚 Documentations

Add "build from docker" section by @lvhan028 in #602

🌐 Other

bump version to v0.0.13 by @lvhan028 in #620

Full Changelog: v0.0.12...v0.0.13

Contributors

lvhan028, RunningLeon, and AllentDan

24 Oct 04:23

lvhan028

v0.0.12

96f1b8e

LMDeploy Release V0.0.12

What's Changed

🚀 Features

add solar chat template by @AllentDan in #576 and #587

💥 Improvements

change model_format to qwen when model_name starts with qwen by @lvhan028 in #575
robust incremental decode for leading space by @AllentDan in #581

🐞 Bug fixes

avoid splitting chinese characters during decoding by @AllentDan in #566
Revert "[Docs] Simplify build.md" by @pppppM in #586
Fix crash and remove sys_instruct from chat.py and client.py by @irexyc in #591

🌐 Other

bump version to v0.0.12 by @lvhan028 in #604

Full Changelog: v0.0.11...v0.0.12

Contributors

lvhan028, irexyc, and 2 other contributors

17 Oct 06:19

lvhan028

v0.0.11

bb3cce9

LMDeploy Release V0.0.11

What's Changed

🚀 Features

Support CORS for openai api server by @aisensiy in #481

💥 Improvements

make IPv6 compatible, safe run for coroutine interrupting by @AllentDan in #487
support deploy qwen-14b-chat by @irexyc in #482
add tp hint for deployment by @irexyc in #555
Move tokenizer.py to the folder of lmdeploy by @grimoire in #543

🐞 Bug fixes

Change shared_instance type from weakptr to shared_ptr by @lvhan028 in #507
[Fix] Set the default value of step being 0 by @lvhan028 in #532
[bug] fix mismatched shape for decoder output tensor by @akhoroshev in #517
Fix typing of openai protocol. by @mokeyish in #554

📚 Documentations

Fix typo in docs/en/pytorch.md by @shahrukhx01 in #539
[Doc] update huggingface internlm-chat-7b model url by @AllentDan in #546
[doc] Update benchmark command in w4a16.md by @del-zhenwu in #500

🌐 Other

free runner disk by @irexyc in #552
bump version to v0.0.11 by @lvhan028 in #567

New Contributors

@shahrukhx01 made their first contribution in #539
@mokeyish made their first contribution in #554

Full Changelog: v0.0.10...v0.0.11

Contributors

aisensiy, grimoire, and 7 other contributors

26 Sep 12:52

lvhan028

v0.0.10

b58a9df

LMDeploy Release V0.0.10

What's Changed

💥 Improvements

[feature] Graceful termination of background threads in LlamaV2 by @akhoroshev in #458
expose stop words and filter eoa by @AllentDan in #352

🐞 Bug fixes

Fix side effect brought by supporting codellama: sequence_start is always true when calling model.get_prompt by @lvhan028 in #466
Fix missing meta instruction of internlm-chat model by @lvhan028 in #470
[bug] Fix race condition by @akhoroshev in #460
Fix compatibility issues with Pydantic 2 by @aisensiy in #465
fix benchmark serving cannot use Qwen tokenizer by @AllentDan in #443
Fix memory leak by @lvhan028 in #488

📚 Documentations

Fix typo in README.md by @eltociear in #462

🌐 Other

bump version to v0.0.10 by @lvhan028 in #474

New Contributors

@eltociear made their first contribution in #462
@akhoroshev made their first contribution in #458
@aisensiy made their first contribution in #465

Full Changelog: v0.0.9...v0.0.10

Contributors

aisensiy, lvhan028, and 3 other contributors

20 Sep 08:10

lvhan028

v0.0.9

0be9e7a

LMDeploy Release V0.0.9

Highlight

Support InternLM 20B, including FP16, W4A16, and W4KV8

What's Changed

🚀 Features

Support InternLM 20B by @lvhan028 in #440

💥 Improvements

Reduce gil switching by @irexyc in #407
Profile token generation with more settings by @AllentDan in #364

🐞 Bug fixes

Fix disk space limit for building docker image by @RunningLeon in #404
more general pypi ci by @irexyc in #412
Fix build.md by @pangsg in #411
Fix memory leak by @irexyc in #415
Fix token count bug by @AllentDan in #416
[Fix] Support actual seqlen in flash-attention2 by @grimoire in #418
[Fix] output[-1] when output is empty by @wangruohui in #405

🌐 Other

rename readthedocs config file by @RunningLeon in #429
bump version to v0.0.9 by @lvhan028 in #428

New Contributors

@pangsg made their first contribution in #411

Full Changelog: v0.0.8...v0.0.9

Contributors

grimoire, lvhan028, and 5 other contributors
