[Model] Add MoE support for NemotronH #25863
Conversation
…edReLu activation - adapt the FusedMoE object to support is_act_and_mul=False Signed-off-by: Tomer Asida <[email protected]>
…s an attribute in FusedMoE Signed-off-by: Tomer Asida <[email protected]>
Code Review
This pull request adds support for a non-gated Squared ReLU MoE module in the NemotronH architecture, which is a valuable enhancement. The changes are mostly well-implemented across the fused MoE layers and model definition. However, I've identified a critical bug in the forward pass of the new NemotronHMoE module related to incorrect floating-point computation and a potential UnboundLocalError. I've provided a detailed comment with a suggested fix for this issue. Addressing this is crucial for the correctness of the model's output.
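For readers who want to see the failure mode being flagged, here is a generic illustration (hypothetical names and structure, not the actual NemotronHMoE code) of how a variable that is only assigned on one branch of a forward pass produces an UnboundLocalError:

import torch

def forward(hidden_states: torch.Tensor, use_shared_expert: bool) -> torch.Tensor:
    # Hypothetical sketch only; not taken from this PR.
    if use_shared_expert:
        shared_out = hidden_states * 2.0  # bound only on this branch
    routed_out = hidden_states + 1.0
    # If use_shared_expert is False, `shared_out` was never bound, so this
    # line raises UnboundLocalError instead of returning routed_out alone.
    return routed_out + shared_out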
To the reviewer(s)
NemotronHForCausalLM now optionally has an MoE block. I was wondering if it should implement the MixtureOfExperts interface or not. Do you have any guidance?
We might need to do something similar to this PR #25311 (comment), where is_mixture_of_experts depends on an attribute of the model. I don't know all the cases where this is used, though.
…xperts Signed-off-by: Tomer Asida <[email protected]>
Force-pushed from bf2285e to 7fff9a8
This pull request has merge conflicts that must be resolved before it can be merged.
if not self.moe_config.is_act_and_mul:
    # Avoid circular import
    from vllm.model_executor.layers.quantization.modelopt import (
        ModelOptFp8MoEMethod,
    )

    if not isinstance(
        quant_method, (UnquantizedFusedMoEMethod, ModelOptFp8MoEMethod)
    ):
        raise NotImplementedError(
            "is_act_and_mul=False is supported only for unquantized "
            "and ModelOpt FP8 moe for now"
        )
    if not current_platform.is_cuda():
        raise NotImplementedError(
            "is_act_and_mul=False is supported only for CUDA for now"
        )
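For context, a minimal sketch of how a model definition might opt into the new behaviour; the keyword names below (in particular activation and is_act_and_mul) are assumptions inferred from the commit messages rather than a verified signature:

from vllm.model_executor.layers.fused_moe import FusedMoE

# Sketch only: the argument names for the new behaviour are assumptions based
# on this PR's commit messages, not a confirmed API.
experts = FusedMoE(
    num_experts=8,
    top_k=2,
    hidden_size=4096,
    intermediate_size=11008,
    renormalize=False,
    activation="relu2",    # non-gated Squared ReLU (assumed identifier)
    is_act_and_mul=False,  # the flag this PR introduces
)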
What are the blockers for supporting is_act_and_mul=False more generally?
Creating the relevant kernels :) We plan to follow up with that
if (
    envs.VLLM_USE_FLASHINFER_MOE_FP8
    and has_flashinfer_moe()
    and self.moe.is_act_and_mul
):
For NemotronH, self.flashinfer_moe_backend will end up being None. What implementation ends up getting used in this case?
Triton kernels. This is currently the only code path available with is_act_and_mul=False.
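To spell the dispatch out, here is a simplified, hypothetical sketch of the selection logic (not the actual vLLM code path), mirroring the condition quoted above:

def select_moe_backend(use_flashinfer_fp8: bool, has_flashinfer: bool,
                       is_act_and_mul: bool) -> str:
    # Hypothetical helper for illustration: with is_act_and_mul=False the
    # FlashInfer FP8 backend is never chosen, so execution falls back to the
    # Triton fused-MoE kernels, currently the only non-gated path.
    if use_flashinfer_fp8 and has_flashinfer and is_act_and_mul:
        return "flashinfer_fp8"
    return "triton"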
I suspect this is going to be very complicated to add to all the quant and kernel backends
        num_redundant_experts=self.n_redundant_experts,
    )

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
For DP+TP cases, we should use the sequence parallel trick like in #24982 to avoid duplicate work in the expert layers
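For readers unfamiliar with the suggestion, a rough sketch of the sequence-parallel idea (illustrative only; not the #24982 implementation, and the padding scheme is an assumption):

import torch
import torch.distributed as dist
import torch.nn.functional as F

def sequence_parallel_moe(hidden_states: torch.Tensor, experts, tp_group) -> torch.Tensor:
    # Each TP rank runs the experts only on its slice of the tokens, then the
    # slices are gathered back, so DP+TP ranks do not repeat the same expert work.
    tp_size = dist.get_world_size(tp_group)
    tp_rank = dist.get_rank(tp_group)
    num_tokens = hidden_states.shape[0]
    pad = (-num_tokens) % tp_size  # pad so tokens split evenly across ranks
    if pad:
        hidden_states = F.pad(hidden_states, (0, 0, 0, pad))
    local_out = experts(hidden_states.chunk(tp_size, dim=0)[tp_rank])
    gathered = [torch.empty_like(local_out) for _ in range(tp_size)]
    dist.all_gather(gathered, local_out, group=tp_group)
    return torch.cat(gathered, dim=0)[:num_tokens]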
Purpose
Add support for an MoE module in the NemotronH architecture.
This MoE module is relatively unique (to the best of my knowledge, comparable only to nomic-ai/nomic-embed-text-v2-moe), as it uses a non-gated Squared ReLU activation function.
In this PR:
- Add a NemotronHMoE module to the NemotronH modeling file
- Add support for non-gated MoE (is_act_and_mul=False) through the FusedMoE class (in addition to by calling the fused_moe function directly)
- Add support for non-gated MoE in the ModelOptFp8MoEMethod quant_method, currently only in the triton path
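For readers unfamiliar with the distinction, here is a minimal PyTorch sketch (illustrative, not taken from this PR's code) of a non-gated Squared ReLU expert next to the common gated SwiGLU-style expert:

import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedExpert(nn.Module):
    # Common gated expert MLP: down(silu(gate(x)) * up(x)) -- "act and mul".
    def __init__(self, hidden: int, inter: int):
        super().__init__()
        self.gate = nn.Linear(hidden, inter, bias=False)
        self.up = nn.Linear(hidden, inter, bias=False)
        self.down = nn.Linear(inter, hidden, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.gate(x)) * self.up(x))

class SquaredReLUExpert(nn.Module):
    # Non-gated expert MLP: down(relu(up(x)) ** 2) -- no gate projection, no multiply.
    def __init__(self, hidden: int, inter: int):
        super().__init__()
        self.up = nn.Linear(hidden, inter, bias=False)
        self.down = nn.Linear(inter, hidden, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.relu(self.up(x)) ** 2)

Because there is no gate projection to multiply with, the fused kernels cannot assume the usual activation-and-multiply weight layout, which is what the is_act_and_mul=False flag communicates to FusedMoE.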