Conversation

@grimoire commented on Sep 2, 2025

Since we are refactoring the chat template, we will use the internlm2 template for now. Example usage:

```python
from lmdeploy import pipeline, PytorchEngineConfig, GenerationConfig, ChatTemplateConfig
# Optional helpers for timing and visualizing pipeline output (unused in this minimal example).
from lmdeploy.pytorch.tools.utils import Timer, visualize_pipe_out

if __name__ == '__main__':
    model_path = 'JetLM/SDAR-1.7B-Chat'
    chat_template_config = ChatTemplateConfig('internlm2')

    log_level = 'WARNING'

    # Unmasking strategy used by the dllm decoder; 'sequential' is the alternative.
    dllm_unmasking_strategy = 'low_confidence_dynamic'
    # dllm_unmasking_strategy = 'sequential'

    prompts = [
        'hakuna matata!',
        'The quick brown fox jumps over the lazy dog.',
    ]

    backend_config = PytorchEngineConfig(
        tp=1,
        dllm_block_length=4,
        dllm_unmasking_strategy=dllm_unmasking_strategy,
    )

    gen_config = GenerationConfig(
        max_new_tokens=512,
    )

    with pipeline(model_path,
                  backend_config=backend_config,
                  chat_template_config=chat_template_config,
                  log_level=log_level) as pipe:
        outputs = pipe(prompts, gen_config=gen_config)
        print(outputs)
```

@grimoire changed the title from [POC] Support SDAR to Support SDAR on Sep 8, 2025
@grimoire marked this pull request as ready for review on September 8, 2025 07:14
@lvhan028 self-requested a review on September 9, 2025 07:21
@lvhan028 added the enhancement label on Sep 12, 2025

```diff
@@ -179,7 +183,7 @@ def _reorder_migrating():
         return running_migration

     @logging_timer('SchedulePrefilling', logger)
-    def _schedule_prefill(self):
+    def _schedule_prefill(self, prealloc_size: int = 0):
```
Collaborator

What's the purpose of prealloc_size? I see it's always passed as 0 from the schedule method. Is this intended for future use or should it be removed?

Collaborator Author

Over-allocating the KV cache might be useful for future models.
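
For illustration only, a minimal sketch of what pre-allocation at prefill time could mean for KV-cache block accounting; `num_prefill_blocks` is a hypothetical helper written for this example, not part of the lmdeploy scheduler:

```python
import math


def num_prefill_blocks(prompt_len: int, prealloc_size: int, block_size: int) -> int:
    """Hypothetical helper: KV-cache blocks to reserve at prefill time,
    covering the prompt plus `prealloc_size` pre-allocated future tokens."""
    return math.ceil((prompt_len + prealloc_size) / block_size)


# With 64-token blocks, a 62-token prompt fits in one block, but pre-allocating
# 4 extra token slots (e.g. one dllm block) reserves a second block up front.
assert num_prefill_blocks(62, 0, 64) == 1
assert num_prefill_blocks(62, 4, 64) == 2
```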

@lvhan028

Does SDAR conflict with `--quant-policy fp8`? When I tried it, the model repeated meaningless words.
FYI, supporting `--quant-policy` for this model is not required in this PR.

@lvhan028

@zhulinJulia24 may add JetLM/SDAR-8B-Chat and JetLM/SDAR-30B-A3B-Chat to the CI.

```python
    1,
    num_tokens,
), dtype=torch.long, device=device)
seq_length = torch.ones((batch_size, ), dtype=torch.long, device=device)
```

Collaborator

`seq_length` should be all `max_q_seqlen`, not all 1.
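
A minimal sketch of the suggested fix, assuming `batch_size`, `max_q_seqlen`, and `device` are already in scope at that point in the code (placeholder values are used here so the snippet runs on its own):

```python
import torch

batch_size, max_q_seqlen, device = 2, 8, 'cpu'  # placeholder values for illustration

# Every sequence reports the full query length instead of 1.
seq_length = torch.full((batch_size, ), max_q_seqlen, dtype=torch.long, device=device)
print(seq_length)  # tensor([8, 8])
```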
