
Copilot AI commented Sep 30, 2025

This PR adds comprehensive batch invariance tests for multi-GPU operations, specifically for fused Mixture of Experts (MoE) layers with and without attention mechanisms.

Overview

Batch invariance is crucial for correctness in distributed computing: an operation should produce identical results whether it processes a full batch or the same batch split into pieces. These tests verify this property for fused MoE operations in multi-GPU setups.

Changes

Created new test infrastructure under tests/v1/generation/batch_invariance/ (test_multi_gpu_ops.py).

Test Coverage

  1. _test_fused_moe() - Base test for fused MoE batch invariance

    • Processes tokens through fused MoE layers
    • Splits batch in half and processes separately
    • Verifies concatenated split results match full batch results
  2. _test_fused_moe_with_attention() - Extended test with multi-head attention

    • Implements complete transformer-like pipeline:
      • Q, K, V projections
      • Multi-head self-attention computation
      • Attention output projection with residual connection
      • Fused MoE processing with expert routing
      • Final residual connection
    • Tests batch invariance across the combined attention + MoE operations
    • Ensures split/concat consistency for the full pipeline (a runnable sketch of this check follows the list)
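
To make the split/concat check concrete, here is a minimal, self-contained sketch. It uses a plain-PyTorch stand-in for the attention + fused-MoE pipeline (the real tests call vLLM's fused MoE kernels), treats the batch dimension as independent sequences so attention stays per-sequence, and all helper names are illustrative rather than the PR's actual code:

```python
# Illustrative sketch only: `run_pipeline` is a plain-PyTorch stand-in for the
# test's attention + fused-MoE forward pass. The batch dimension holds
# independent sequences, so splitting the batch does not change any sequence's
# attention and the split/concat comparison is meaningful.
import torch
import torch.nn.functional as F


def run_pipeline(x: torch.Tensor, w_qkv: torch.Tensor, w_o: torch.Tensor,
                 num_heads: int = 8) -> torch.Tensor:
    b, seq, d = x.shape
    # Q, K, V projections.
    q, k, v = (x @ w_qkv).chunk(3, dim=-1)
    split_heads = lambda t: t.view(b, seq, num_heads, d // num_heads).transpose(1, 2)
    # Multi-head self-attention, computed independently per sequence.
    attn = F.scaled_dot_product_attention(split_heads(q), split_heads(k), split_heads(v))
    attn = attn.transpose(1, 2).reshape(b, seq, d)
    # Attention output projection with residual connection; the fused MoE step
    # (expert routing + expert MLPs) and a final residual would follow here.
    return x + attn @ w_o


def check_batch_invariance(x, w_qkv, w_o):
    full = run_pipeline(x, w_qkv, w_o)
    half = x.shape[0] // 2
    split = torch.cat([run_pipeline(x[:half], w_qkv, w_o),
                       run_pipeline(x[half:], w_qkv, w_o)], dim=0)
    # Concatenated split-batch results should match the full-batch results.
    torch.testing.assert_close(split, full, rtol=1e-3, atol=1e-3)


torch.manual_seed(42)
d = 512
check_batch_invariance(torch.randn(8, 64, d), torch.randn(d, 3 * d), torch.randn(d, d))
```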

Both tests include public wrappers (test_fused_moe_multi_gpu and test_fused_moe_with_attention_multi_gpu) that use the @multi_gpu_test(num_gpus=2) decorator for proper distributed testing.
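
Roughly, the wrappers take the following shape; the import path is assumed from vLLM's existing test utilities, and the private helper bodies are only stubbed here:

```python
# Rough shape of the public wrappers. The decorator comes from the repo's
# test utilities (import path assumed); the helper bodies are stubbed and
# correspond to the private tests described above.
from tests.utils import multi_gpu_test


def _test_fused_moe():
    ...  # fused MoE split/concat batch-invariance check


def _test_fused_moe_with_attention():
    ...  # attention + fused MoE split/concat batch-invariance check


@multi_gpu_test(num_gpus=2)
def test_fused_moe_multi_gpu():
    _test_fused_moe()


@multi_gpu_test(num_gpus=2)
def test_fused_moe_with_attention_multi_gpu():
    _test_fused_moe_with_attention()
```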

Implementation Details

  • Uses current_platform.seed_everything(42) for reproducibility
  • Configures VllmConfig with reasonable defaults (max_num_seqs=128, max_model_len=8192)
  • Employs torch.testing.assert_close with rtol=1e-3, atol=1e-3 for numerical comparison (tolerance semantics illustrated after this list)
  • Follows repository conventions: SPDX headers, import style, naming patterns
  • Comprehensive docstrings explaining test purpose and parameters
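
As a quick illustration of the tolerance check (not the PR's code): torch.testing.assert_close accepts a mismatch when |actual - expected| <= atol + rtol * |expected|, which absorbs the small floating-point differences between full-batch and split-batch kernel launches:

```python
import torch

expected = torch.tensor([1.0000, 100.0000])
actual = torch.tensor([1.0005, 100.0500])  # each within atol + rtol * |expected|

# Same tolerances the tests use; raises AssertionError if any element exceeds them.
torch.testing.assert_close(actual, expected, rtol=1e-3, atol=1e-3)
```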

Testing

Tests are parameterized over different batch sizes and configurations (a sketch of the parameterization follows the list):

  • Batch sizes: 32, 64 tokens
  • Hidden dimension: 512
  • Number of experts: 8
  • Top-k routing: 2 experts per token
  • Attention heads: 8
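
A hedged sketch of the parameterization pattern is below; the constants mirror the list above, but the parameter names and fixture wiring are not necessarily the PR's exact identifiers:

```python
# Hedged sketch of the parameterization; exact parameter names and fixture
# wiring in the PR may differ.
import pytest

HIDDEN_SIZE = 512  # hidden dimension
NUM_EXPERTS = 8    # experts per MoE layer
TOP_K = 2          # experts routed per token
NUM_HEADS = 8      # attention heads


@pytest.mark.parametrize("num_tokens", [32, 64])
def test_batch_invariance_configs(num_tokens: int):
    # The real tests build fused-MoE (and attention) layers from the constants
    # above and run the split/concat comparison sketched earlier.
    assert num_tokens % 2 == 0  # the batch must split evenly in half
```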

Note: These tests require 2 CUDA GPUs to run.

```bash
pytest tests/v1/generation/batch_invariance/test_multi_gpu_ops.py
```
Original prompt

add another test to tests/v1/generation/batch_invariance/test_multi_gpu_ops.py that includes attention. copy the _test_fused_moe one



@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small, essential subset of tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

Copilot AI changed the title from "[WIP] add another test to tests/v1/generation/batch_invariance/test_multi_gpu_ops.py that includes attention. copy the _test_fused_moe one" to "Add batch invariance tests for multi-GPU fused MoE operations with attention" on Sep 30, 2025
Copilot AI requested a review from bwasti September 30, 2025 20:23
Copilot finished work on behalf of bwasti September 30, 2025 20:23