Conversation

yuekaizhang (Contributor)
This PR adds support for the streaming DiT Token2Wav module from https://github.com/stepfun-ai/Step-Audio2.

The following results were obtained by decoding 26 sentences (172 seconds of audio in total) on a single L20 GPU with the yuekai/seed_tts_cosy2 dataset.

Offline TTS (Cosyvoice2 0.5B LLM + StepAudio2 DiT Token2Wav)

| Backend | Batch Size | llm_time_seconds | total_time_seconds | RTF |
| --- | --- | --- | --- | --- |
| TRTLLM | 16 | 2.01 | 5.03 | 0.0292 |
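For reference, the RTF column is total synthesis time divided by total audio duration: 5.03 s / 172 s ≈ 0.0292.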

Streaming TTS (Cosyvoice2 0.5B LLM + StepAudio2 DiT Token2Wav): First-Chunk Latency

| Concurrent Tasks | Average (ms) | 50th Percentile (ms) | 90th Percentile (ms) | 95th Percentile (ms) | 99th Percentile (ms) |
| --- | --- | --- | --- | --- | --- |
| 1 | 197.50 | 196.13 | 214.65 | 215.96 | 229.21 |
| 2 | 281.15 | 278.20 | 345.18 | 361.79 | 395.97 |
| 4 | 510.65 | 530.50 | 630.13 | 642.44 | 666.65 |
| 6 | 921.54 | 918.86 | 1079.97 | 1265.22 | 1524.41 |
| 8 | 1019.95 | 1085.26 | 1371.05 | 1402.24 | 1410.66 |
| 10 | 1214.98 | 1293.54 | 1575.36 | 1654.51 | 2161.76 |
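For context, first-chunk latency is the time from issuing a request to receiving the first audio chunk. The statistics above can be computed from per-request measurements along these lines (a minimal sketch, not the benchmark client from this PR; the chunk iterator is assumed to come from the streaming server):

```python
import time

import numpy as np


def first_chunk_latency_ms(chunk_iter):
    """Time from issuing the request until the first audio chunk arrives."""
    start = time.perf_counter()
    next(chunk_iter)  # blocks until the server yields the first chunk
    return (time.perf_counter() - start) * 1000.0


def summarize(latencies_ms):
    """Same statistics as the table above."""
    a = np.asarray(latencies_ms, dtype=float)
    return {
        "avg": a.mean(),
        "p50": np.percentile(a, 50),
        "p90": np.percentile(a, 90),
        "p95": np.percentile(a, 95),
        "p99": np.percentile(a, 99),
    }
```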

boji123 (Contributor) commented on Oct 13, 2025

A clever implementation: it reduces the first-chunk latency by setting inference priorities.
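(For readers unfamiliar with the trick: one common way to express such a priority on a single GPU is CUDA stream priorities, as in the PyTorch sketch below. This illustrates the general idea only and is not necessarily the exact mechanism used in this PR.)

```python
import torch

# CUDA streams with lower priority numbers are scheduled first (-1 beats 0),
# so latency-critical first-chunk work can jump ahead of bulk decoding.
high_prio = torch.cuda.Stream(priority=-1)  # first-chunk / streaming path
low_prio = torch.cuda.Stream(priority=0)    # remaining chunks / offline batches

x = torch.randn(16, 1024, device="cuda")

with torch.cuda.stream(high_prio):
    first_chunk = x @ x.T  # stand-in for the first Token2Wav chunk

with torch.cuda.stream(low_prio):
    rest = x @ x.T  # stand-in for lower-priority work

torch.cuda.synchronize()
```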

yuekaizhang (Contributor, Author)

Disaggregated deployment, using one L20 GPU for the LLM.

First-chunk latency:

| token2wav_num_gpu | concurrent_tasks_per_gpu | avg (ms) | p50 (ms) | p90 (ms) | p99 (ms) |
| --- | --- | --- | --- | --- | --- |
| 3 | 3.00 | 308.09 | 275.48 | 385.22 | 521.45 |
| 2 | 4.00 | 403.48 | 394.80 | 481.24 | 507.75 |
| 3 | 6.00 | 538.23 | 508.33 | 687.62 | 736.96 |
| 2 | 8.00 | 748.31 | 753.94 | 873.59 | 1007.14 |
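In this setup the LLM and the Token2Wav workers run on separate GPUs, so requests have to be fanned out across the Token2Wav pool. A minimal round-robin sketch of that dispatch, with hypothetical worker objects (`to_wav` is an assumed method, not the PR's actual API):

```python
import itertools


class Token2WavPool:
    """Round-robin dispatch across per-GPU Token2Wav workers (illustrative only)."""

    def __init__(self, workers):
        # One worker per GPU, i.e. token2wav_num_gpu entries.
        self._cycle = itertools.cycle(workers)

    def synthesize(self, speech_tokens):
        # Hypothetical worker API: each worker owns one GPU and converts
        # LLM speech tokens into an audio chunk.
        return next(self._cycle).to_wav(speech_tokens)
```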
