
Copilot AI commented Sep 30, 2025

This PR adds comprehensive batch invariance tests for multi-GPU operations, specifically for fused Mixture of Experts (MoE) layers with and without attention mechanisms.

Overview

Batch invariance is crucial for correctness in distributed computing: an operation should produce identical results whether it processes a full batch or the same batch split into pieces. These tests verify this property for fused MoE operations in multi-GPU setups.

Changes

Created new test infrastructure under tests/v1/generation/batch_invariance/ (test_multi_gpu_ops.py).

Test Coverage

  1. _test_fused_moe() - Base test for fused MoE batch invariance

    • Processes tokens through fused MoE layers
    • Splits batch in half and processes separately
    • Verifies concatenated split results match full batch results
  2. _test_fused_moe_with_attention() - Extended test with multi-head attention

    • Implements complete transformer-like pipeline:
      • Q, K, V projections
      • Multi-head self-attention computation
      • Attention output projection with residual connection
      • Fused MoE processing with expert routing
      • Final residual connection
    • Tests batch invariance across the combined attention + MoE operations
    • Ensures split/concat consistency for the full pipeline (a runnable sketch of this check follows the list)
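
To make the split/concat check concrete, here is a minimal, self-contained sketch. It uses a plain-PyTorch stand-in for the attention + fused-MoE pipeline (the real tests call vLLM's fused MoE kernels), treats the batch dimension as independent sequences so attention stays per-sequence, and all helper names are illustrative rather than the PR's actual code:

```python
# Illustrative sketch only: `run_pipeline` is a plain-PyTorch stand-in for the
# test's attention + fused-MoE forward pass. The batch dimension holds
# independent sequences, so splitting the batch does not change any sequence's
# attention and the split/concat comparison is meaningful.
import torch
import torch.nn.functional as F


def run_pipeline(x: torch.Tensor, w_qkv: torch.Tensor, w_o: torch.Tensor,
                 num_heads: int = 8) -> torch.Tensor:
    b, seq, d = x.shape
    # Q, K, V projections.
    q, k, v = (x @ w_qkv).chunk(3, dim=-1)
    split_heads = lambda t: t.view(b, seq, num_heads, d // num_heads).transpose(1, 2)
    # Multi-head self-attention, computed independently per sequence.
    attn = F.scaled_dot_product_attention(split_heads(q), split_heads(k), split_heads(v))
    attn = attn.transpose(1, 2).reshape(b, seq, d)
    # Attention output projection with residual connection; the fused MoE step
    # (expert routing + expert MLPs) and a final residual would follow here.
    return x + attn @ w_o


def check_batch_invariance(x, w_qkv, w_o):
    full = run_pipeline(x, w_qkv, w_o)
    half = x.shape[0] // 2
    split = torch.cat([run_pipeline(x[:half], w_qkv, w_o),
                       run_pipeline(x[half:], w_qkv, w_o)], dim=0)
    # Concatenated split-batch results should match the full-batch results.
    torch.testing.assert_close(split, full, rtol=1e-3, atol=1e-3)


torch.manual_seed(42)
d = 512
check_batch_invariance(torch.randn(8, 64, d), torch.randn(d, 3 * d), torch.randn(d, d))
```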

Both tests include public wrappers (test_fused_moe_multi_gpu and test_fused_moe_with_attention_multi_gpu) that use the @multi_gpu_test(num_gpus=2) decorator for proper distributed testing.
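
Roughly, the wrappers take the following shape; the import path is assumed from vLLM's existing test utilities, and the private helper bodies are only stubbed here:

```python
# Rough shape of the public wrappers. The decorator comes from the repo's
# test utilities (import path assumed); the helper bodies are stubbed and
# correspond to the private tests described above.
from tests.utils import multi_gpu_test


def _test_fused_moe():
    ...  # fused MoE split/concat batch-invariance check


def _test_fused_moe_with_attention():
    ...  # attention + fused MoE split/concat batch-invariance check


@multi_gpu_test(num_gpus=2)
def test_fused_moe_multi_gpu():
    _test_fused_moe()


@multi_gpu_test(num_gpus=2)
def test_fused_moe_with_attention_multi_gpu():
    _test_fused_moe_with_attention()
```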

Implementation Details

  • Uses current_platform.seed_everything(42) for reproducibility
  • Configures VllmConfig with reasonable defaults (max_num_seqs=128, max_model_len=8192)
  • Employs torch.testing.assert_close with rtol=1e-3, atol=1e-3 for numerical comparison (tolerance semantics illustrated after this list)
  • Follows repository conventions: SPDX headers, import style, naming patterns
  • Comprehensive docstrings explaining test purpose and parameters
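
As a quick illustration of the tolerance check (not the PR's code): torch.testing.assert_close accepts a mismatch when |actual - expected| <= atol + rtol * |expected|, which absorbs the small floating-point differences between full-batch and split-batch kernel launches:

```python
import torch

expected = torch.tensor([1.0000, 100.0000])
actual = torch.tensor([1.0005, 100.0500])  # each within atol + rtol * |expected|

# Same tolerances the tests use; raises AssertionError if any element exceeds them.
torch.testing.assert_close(actual, expected, rtol=1e-3, atol=1e-3)
```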

Testing

Tests are parameterized over different batch sizes and configurations (a sketch of the parameterization follows the list):

  • Batch sizes: 32, 64 tokens
  • Hidden dimension: 512
  • Number of experts: 8
  • Top-k routing: 2 experts per token
  • Attention heads: 8
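
A hedged sketch of the parameterization pattern is below; the constants mirror the list above, but the parameter names and fixture wiring are not necessarily the PR's exact identifiers:

```python
# Hedged sketch of the parameterization; exact parameter names and fixture
# wiring in the PR may differ.
import pytest

HIDDEN_SIZE = 512  # hidden dimension
NUM_EXPERTS = 8    # experts per MoE layer
TOP_K = 2          # experts routed per token
NUM_HEADS = 8      # attention heads


@pytest.mark.parametrize("num_tokens", [32, 64])
def test_batch_invariance_configs(num_tokens: int):
    # The real tests build fused-MoE (and attention) layers from the constants
    # above and run the split/concat comparison sketched earlier.
    assert num_tokens % 2 == 0  # the batch must split evenly in half
```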

Note: These tests require 2 CUDA GPUs to run.

```bash
pytest tests/v1/generation/batch_invariance/test_multi_gpu_ops.py
```
Original prompt

add another test to tests/v1/generation/batch_invariance/test_multi_gpu_ops.py that includes attention. copy the _test_fused_moe one



@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small, essential subset of tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

Copilot AI changed the title from "[WIP] add another test to tests/v1/generation/batch_invariance/test_multi_gpu_ops.py that includes attention. copy the _test_fused_moe one" to "Add batch invariance tests for multi-GPU fused MoE operations with attention" on Sep 30, 2025
Copilot AI requested a review from bwasti September 30, 2025 20:23
Copilot finished work on behalf of bwasti September 30, 2025 20:23