Conversation

@danielvegamyhre (Contributor):
  • Add a bench script to benchmark MoE layer computation in isolation, without any distributed/comms aspects.
  • Configurable options:
    • num_experts
    • dim
    • hidden_dim
    • seq_len
    • local_batch_size
  • This is useful for benchmarking the computation portion of the MoE specifically, so it can be iterated on quickly, without having to inspect a trace to exclude all-to-all comms and other distributed overhead.
  • Profiling is included, though, so the developer can quickly break down the specific quantization kernels, GEMMs, etc. A sketch of the core timing loop follows below.
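
For orientation, here is a minimal sketch of what such an isolated benchmark harness can look like. Everything below is illustrative, not the script's actual implementation: the ToyMoE module (top-1 routing, a Python loop over experts) and the shapes are stand-ins, and the real script would additionally apply the chosen quantization recipe to a copy of the layer; it exists only to make the timing harness runnable end to end.

```python
# Sketch of an isolated MoE benchmark harness. ToyMoE is a deliberately
# simplified stand-in (top-1 routing, Python loop over experts); real MoE
# layers use top-k routing and grouped GEMMs instead.
import time

import torch
import torch.nn.functional as F


class ToyMoE(torch.nn.Module):
    def __init__(self, num_experts: int, dim: int, hidden_dim: int):
        super().__init__()
        self.router = torch.nn.Linear(dim, num_experts, bias=False)
        self.w1 = torch.nn.Parameter(torch.randn(num_experts, dim, hidden_dim) * 0.02)
        self.w2 = torch.nn.Parameter(torch.randn(num_experts, hidden_dim, dim) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        expert_idx = self.router(x).argmax(dim=-1)  # top-1 routing for brevity
        out = torch.zeros_like(x)
        for e in range(self.w1.shape[0]):
            mask = expert_idx == e
            h = F.silu(x[mask] @ self.w1[e])  # (tokens_e, hidden_dim)
            out[mask] = h @ self.w2[e]        # (tokens_e, dim)
        return out


def bench_fwd_bwd_ms(layer: torch.nn.Module, x: torch.Tensor, iters: int = 10) -> float:
    """Mean forward+backward latency in ms, with warmup and CUDA syncs."""
    for _ in range(3):  # warmup so lazy init / autotuning is excluded
        layer(x).sum().backward()
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):  # grads accumulate across iters; fine for timing
        layer(x).sum().backward()
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters * 1e3


# Shapes echoing the Llama4 run below: total_M = local_batch_size * seq_len.
local_batch_size, seq_len, dim, hidden_dim = 16, 8192, 5120, 8192
layer = ToyMoE(num_experts=8, dim=dim, hidden_dim=hidden_dim).cuda().to(torch.bfloat16)
x = torch.randn(local_batch_size * seq_len, dim, device="cuda",
                dtype=torch.bfloat16, requires_grad=True)

bf16_ms = bench_fwd_bwd_ms(layer, x)
print(f"bf16 time: {bf16_ms:.3f} ms")
# The real script would also deep-copy the layer, convert it with the chosen
# recipe (e.g. mxfp8), time it the same way, and print bf16_ms / quant_ms.
```

The warmup iterations and torch.cuda.synchronize() calls matter here: without them, lazy CUDA initialization and kernel autotuning would be folded into the first measurement, and async kernel launches would make the wall-clock timings meaningless.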

Llama4 17bx16e shapes

```
CUDA_VISIBLE_DEVICES=6 python benchmarks/prototype/moe_training/bench_moe_layer.py --recipe mxfp8 --local_batch_size=16 --dim=5120 --hidden_dim=8192 --local_num_experts=8
total_M: 131072, N: 8192, K: 5120
bf16 time: 275.270 ms
mxfp8 time: 192.420 ms
speedup: 1.431x
```

DeepSeekV3 671b shapes

```
CUDA_VISIBLE_DEVICES=6 python benchmarks/prototype/moe_training/bench_moe_layer.py --recipe mxfp8 --local_batch_size=16 --dim=7168 --hidden_dim=2048 --local_num_experts=8
total_M: 131072, N: 2048, K: 7168
bf16 time: 92.032 ms
mxfp8 time: 80.182 ms
speedup: 1.148x
```
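
For reference, the reported speedup is simply the ratio of the two mean latencies: 275.270 / 192.420 ≈ 1.431x for the Llama4 shapes and 92.032 / 80.182 ≈ 1.148x for the DeepSeekV3 shapes. The total_M of 131072 in both runs is consistent with local_batch_size × seq_len = 16 × 8192, which suggests a default seq_len of 8192 (the script's defaults aren't shown here, so treat that as an inference).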

@danielvegamyhre added the "topic: not user facing" and "moe" labels on Oct 6, 2025.
@pytorch-bot commented on Oct 6, 2025:

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3126

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

⏳ No Failures, 1 Pending

As of commit 11ade5c with merge base cd21d0e:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla bot added the "CLA Signed" label on Oct 6, 2025. (This label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed.)
@danielvegamyhre force-pushed the single-moe branch 2 times, most recently from 02d9bde to 787aaee, on October 7, 2025.
@danielvegamyhre changed the title from "[moe training] bench script for single device moe layer" to "[BE] [moe training] bench script for single device moe layer" on October 11, 2025.
Review thread on the script's imports:

```python
import copy
import logging

import pytest
```

Contributor (reviewer): is this a test?

@danielvegamyhre (author): woops, thanks. copy-pasta-ed a test as a starting point for this
@danielvegamyhre merged commit fb1450d into main on Oct 13, 2025; all 18 checks passed.