[Feat] shared expert dp for deepseek_mtp #3811
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request adds support for shared expert data parallelism for DeepSeek models, a valuable enhancement. The changes span configuration, attention mechanisms, model layers, and custom operators to accommodate sequence parallelism. The strategy to decouple this feature from the torchair framework is logical and consistently implemented. My review has uncovered a critical bug in vllm_ascend/ascend_config.py where a crucial dependency check is flawed, which could result in runtime issues. I have included a specific comment with a recommended solution for this issue. The rest of the changes seem to be well-aligned with the PR's objectives.
vllm_ascend/ascend_config.py
Outdated
```python
if not enable_sp:
    self.enable_shared_expert_dp = False
    logger.info(
        f"enable_shared_expert_dp is {self.enable_shared_expert_dp}"
        "Enable enable_shared_expert_dp must enable_sp"
    )
```
The condition if not enable_sp: on line 73 is incorrect. enable_sp is a function object, which always evaluates to True in a boolean context. Therefore, not enable_sp is always False, and the code block that disables enable_shared_expert_dp is never executed. This means the check to ensure enable_sp is active when using shared expert DP is bypassed, which can lead to runtime errors or incorrect behavior. The function should be called: if not enable_sp():.
Additionally, the log message on lines 75-78 is malformed due to a missing space between concatenated strings and should use logger.warning to alert the user that their configuration is being overridden.
```diff
-if not enable_sp:
-    self.enable_shared_expert_dp = False
-    logger.info(
-        f"enable_shared_expert_dp is {self.enable_shared_expert_dp}"
-        "Enable enable_shared_expert_dp must enable_sp"
-    )
+if not enable_sp():
+    self.enable_shared_expert_dp = False
+    logger.warning(
+        "Disabling 'enable_shared_expert_dp' because 'enable_sp' is not active. "
+        "'enable_sp' is required for shared expert data parallelism."
+    )
```
```python
if (self.enable_mlapo and
        (attn_metadata is None or not forward_context.with_prefill)):
    hidden_states = torch.ops.vllm.maybe_all_gather_and_maybe_unpad(
        hidden_states.contiguous(), need_gather_q_kv)
```
Why do we have to all-gather hidden_states here?
The all_gather is performed before attention because the MLAPO fusion operator does not include the all_gather operation, so it needs to be done in advance. It has little impact on MLA.
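A minimal sketch of this gather-before-attention pattern, using plain `torch.distributed` collectives instead of the repo's `torch.ops.vllm.maybe_all_gather_and_maybe_unpad` custom op; the helper name and group handling below are illustrative assumptions, not the PR's code:

```python
import torch
import torch.distributed as dist


def gather_hidden_states_for_attention(hidden_states: torch.Tensor,
                                       group: dist.ProcessGroup,
                                       need_gather: bool) -> torch.Tensor:
    """All-gather the local [num_local_tokens, hidden] shard along dim 0
    before a fused attention op (e.g. MLAPO) that does not gather internally."""
    if not need_gather:
        return hidden_states
    world_size = dist.get_world_size(group)
    gathered = [torch.empty_like(hidden_states) for _ in range(world_size)]
    dist.all_gather(gathered, hidden_states.contiguous(), group=group)
    return torch.cat(gathered, dim=0)
```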
vllm_ascend/ascend_config.py
Outdated
```python
from vllm_ascend.utils import enable_sp
if not enable_sp(vllm_config):
    self.enable_shared_expert_dp = False
    logger.info(
```
Please test the functionality and performance of the P (prefill) node in single-operator (eager) mode.
Currently, the scenario where only shared expert DP is enabled on the P node, without enabling SP, is not considered.
vllm_ascend/utils.py
Outdated
```diff
 # TODO remove it after vllm has this func
 def shared_expert_dp_enabled() -> bool:
-    return get_ascend_config().enable_shared_expert_dp or enable_sp()
+    return get_ascend_config().enable_shared_expert_dp and enable_sp()
```
Deleting shared_expert_dp_enabled might be better, as there is currently no scenario where enable_sp is True and enable_shared_expert_dp is False.
Currently, due to limitations in the communication process, enabling SP requires enabling the shared expert DP. An assert restriction will be added to the configuration parameters.
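A minimal sketch of such a check, written as a hypothetical standalone helper rather than the PR's exact configuration code:

```python
def validate_shared_expert_dp(enable_sp: bool,
                              enable_shared_expert_dp: bool) -> None:
    """Enforce that shared expert DP is enabled whenever SP is enabled."""
    assert not enable_sp or enable_shared_expert_dp, (
        "enable_sp=True requires enable_shared_expert_dp=True: the current "
        "communication flow assumes shared expert DP whenever SP is enabled.")


validate_shared_expert_dp(enable_sp=True, enable_shared_expert_dp=True)    # OK
validate_shared_expert_dp(enable_sp=False, enable_shared_expert_dp=False)  # OK
```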
```python
inputs_embeds[positions == 0] = 0
inputs_embeds = self.enorm(inputs_embeds)
previous_hidden_states = self.hnorm(previous_hidden_states)
```
You can refer to the SP splitting used in the main model.
The modifications to deepseek_mtp have been made using a non-patch method when enabling shared expert DP.
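A minimal sketch of the resulting data flow in the MTP proposer, assuming token-dimension sharding across the TP group; the function and argument names below are illustrative assumptions, not the PR's exact code:

```python
import torch
import torch.distributed as dist


def run_mtp_with_sp(hidden_states: torch.Tensor,
                    mtp_forward,
                    group: dist.ProcessGroup) -> torch.Tensor:
    """Reduce-scatter the MTP input to the sequence-parallel shard size,
    run the draft model locally, then all-gather the output back to the
    full sequence length. Assumes num_tokens is divisible by world_size."""
    world_size = dist.get_world_size(group)
    local = torch.empty(hidden_states.size(0) // world_size,
                        hidden_states.size(1),
                        dtype=hidden_states.dtype,
                        device=hidden_states.device)
    # Match the dimensions of the sequence-parallel input_embedding.
    dist.reduce_scatter_tensor(local, hidden_states.contiguous(), group=group)
    local_out = mtp_forward(local)
    # Restore the full sequence length for the sampling step.
    full_out = torch.empty(local_out.size(0) * world_size, local_out.size(1),
                           dtype=local_out.dtype, device=local_out.device)
    dist.all_gather_into_tensor(full_out, local_out.contiguous(), group=group)
    return full_out
```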
This pull request has conflicts, please resolve those before we can evaluate the pull request.
whx-sjtu
left a comment
LGTM
LGTM. And there are two questions:
1. `enable_shared_expert_dp` and `flash_comm` should significantly reduce the TTFT. Could you please update your experimental results with 1P1D or PD-mix deployments with these features?
2. Please evaluate whether we can arrange these features properly (`enable_shared_expert_dp`, `flash_comm`, and `multistream_overlap_shared_expert`)?
```python
if not _ENABLE_SP and enable_shared_expert_dp:
    _ENABLE_SP = True
    logger.info(
```
logger.warning
Signed-off-by: chenmenglong <[email protected]>
Signed-off-by: zengran <[email protected]>
vllm_ascend/ascend_config.py
Outdated
```python
self.enable_shared_expert_dp = False
logger.info(
    f"enable_shared_expert_dp is {self.enable_shared_expert_dp}"
    "Enable enable_shared_expert_dp must enable sp")
```
Don't use f-strings in logger calls; please use %-style formatting. f-strings are evaluated eagerly and can cause performance problems.
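A minimal standalone sketch of the difference (not the PR's code):

```python
import logging

logger = logging.getLogger(__name__)
enable_shared_expert_dp = False

# Eager: the f-string is formatted even if the INFO level is disabled.
logger.info(f"enable_shared_expert_dp is {enable_shared_expert_dp}")

# Lazy: formatting is deferred to the logging framework and skipped
# entirely when the level is disabled.
logger.info("enable_shared_expert_dp is %s", enable_shared_expert_dp)
```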
```python
if x.size(0) != residual.size(0):
    sp_enabled = forward_context.sp_enabled
    assert sp_enabled is True, ("Currently, this situation only occurs "
                                "when sp is enabled")
```
`assert sp_enabled, ("Currently, this situation only occurs " "when sp is enabled")` would be better.
What this PR does / why we need it?
Support shared expert DP for deepseek_mtp feature.
`shared_expert_dp` requires `SP==True`, with corresponding parameter restrictions.
Previously, due to the coupling between `shared_expert_dp` and torchair, and the removal of `deepseek_mtp` in vllm_ascend, shared expert DP for deepseek_mtp was temporarily removed.
Currently, by performing a `reduce_scatter` on the input of deepseek_mtp in `mtp_proposer.py`, we ensure that it matches the dimensions of `input_embedding`, and then perform an `all_gather` on the output of MTP.
How was this patch tested?
baseline:

enable shared_expert_dp and multistream_overlap_shared_expert:

TPOT: 48ms -> 45.4ms
Average TPS per rank: 117.6 -> 126.1