Conversation

@zasdfgbnm
Collaborator

No description provided.

@zasdfgbnm
Collaborator Author

!test

@github-actions

github-actions bot commented Dec 18, 2025

Description

  • Added static map to track extent-to-multiplier relationships

  • Implemented consistency check for multiplier calculations

  • Throws error if same extent has different multipliers

  • Prevents inconsistent sharding behavior across device mesh

Changes walkthrough

Relevant files
Bug fix: csrc/multidevice/execution_utils.cpp (+19/-0)
Add multiplier consistency validation

  • Added static unordered_map to track extent->multiplier relationships
  • Added consistency validation after multiplier calculation
  • Throws NVF_ERROR if multiplier inconsistency detected for same extent
  • Ensures deterministic sharding behavior across multi-device execution

    PR Reviewer Guide

    Here are some key observations to aid the review process:

    🧪 No relevant tests
    ⚡ Recommended focus areas for review
    Thread Safety

    The static unordered_map extent_to_multiplier_map is not thread-safe. If unshardedSizes() can be called concurrently from multiple threads, this could lead to race conditions during map access and modification. Consider adding mutex protection or using thread-safe alternatives.

    static std::unordered_map<Val*, int64_t> extent_to_multiplier_map;
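
    One possible mitigation, sketched under the assumption that the map stays a function-local static; the mutex name is illustrative and not part of the PR:

    // Hypothetical sketch: serialize cache access with a mutex.
    // Requires <mutex>; Val and NVF_ERROR come from the nvFuser codebase.
    static std::unordered_map<Val*, int64_t> extent_to_multiplier_map;
    static std::mutex extent_map_mutex;

    {
      // Hold the lock for the lookup-or-insert and the consistency check.
      std::lock_guard<std::mutex> lock(extent_map_mutex);
      auto [it, inserted] =
          extent_to_multiplier_map.try_emplace(extent, multiplier);
      NVF_ERROR(
          inserted || it->second == multiplier,
          "Inconsistent multiplier for extent ",
          extent->toString());
    }
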
    Memory Management

    The map stores Val* pointers without any cleanup mechanism. If Val objects are destroyed while the static map still references them, this could lead to dangling pointer dereferences. Consider using smart pointers or ensuring proper lifetime management.

    static std::unordered_map<Val*, int64_t> extent_to_multiplier_map;
    
    auto multiplier = [&]() -> int64_t {
      if (parallel_type == ParallelType::Stream) {
        // TODO(#5525): hack for MultiDeviceExecutor.  MultiDeviceExecutor looks
        // for ParallelType::Stream only in logical domains and assumes a
        // stream-parallelized dimension is always fully allocated.  So we set
        // the multiplier to 1 when `sharded_id` is a logical IterDomain. This
        // will have to change when FusionExecutorCache requires a logical
        // dimension to be stream-parallelized, both loop and allocation. Refer
        // to
        // https://github.com/NVIDIA/Fuser/blob/f8e84e52296cdecd318dd2ce904139616d7bd434/tests/cpp/test_overlap.cpp#L155
        // for an example. An alternative to consider is to create a new
        // ParallelType for stream parallelization and use it in
        // FusionExecutorCache.
        if (std::find(
                tv->getLogicalDomain().begin(),
                tv->getLogicalDomain().end(),
                sharded_id) != tv->getLogicalDomain().end()) {
          return 1;
        }
    
        NVF_ERROR(
            sharded_id->extent()->isConstInt(),
            "DIDs/Stream extent is expected to be constant: ",
            sharded_id);
        return sharded_id->extent()->evaluate().as<int64_t>();
      }
    
      if (isParallelTypeDeviceDim(parallel_type)) {
        return tv->getDeviceMesh().size(parallel_type);
      }
    
      NVF_THROW("Unexpected parallel type: ", parallel_type);
    }();
    
    // Check consistency: for the same extent, we should always get the same multiplier
    Val* extent = sharded_id->extent();
    auto it = extent_to_multiplier_map.find(extent);
    if (it != extent_to_multiplier_map.end()) {
      NVF_ERROR(
          it->second == multiplier,
          "Inconsistent multiplier for extent ",
          extent->toString(),
          ": expected ",
          it->second,
          " but got ",
          multiplier);
    } else {
      extent_to_multiplier_map[extent] = multiplier;
    }
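
    One hedged alternative for the dangling-pointer risk, not what the PR implements: key the cache by the extent's evaluated integer value instead of the Val*, so the map never stores a pointer that can outlive its Val. This sketch assumes the extent is a constant integer, which the Stream path above already asserts but which would need checking for device dimensions:

    // Illustrative sketch only: a value-keyed cache cannot dangle.
    static std::unordered_map<int64_t, int64_t> extent_value_to_multiplier;

    NVF_ERROR(extent->isConstInt(), "Expected a constant extent for caching");
    const int64_t extent_value = extent->evaluate().as<int64_t>();
    auto [it, inserted] =
        extent_value_to_multiplier.try_emplace(extent_value, multiplier);
    NVF_ERROR(
        inserted || it->second == multiplier,
        "Inconsistent multiplier for extent ",
        extent_value);
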
    Static Lifetime Concerns

    The static map will persist for the entire program lifetime, which may not be appropriate if Val objects have different lifetimes. This could lead to memory bloat or undefined behavior if Val objects are destroyed.

    static std::unordered_map<Val*, int64_t> extent_to_multiplier_map;
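
    A sketch of one way to bound the lifetime, assuming the cache can be owned by the caller; the struct and function names here are illustrative, not from the PR:

    // Hypothetical refactor: thread a caller-owned cache through instead of
    // relying on a process-wide static.
    struct MultiplierCache {
      std::unordered_map<Val*, int64_t> extent_to_multiplier;
    };

    void checkMultiplierConsistency(
        MultiplierCache& cache, Val* extent, int64_t multiplier) {
      auto [it, inserted] =
          cache.extent_to_multiplier.try_emplace(extent, multiplier);
      NVF_ERROR(
          inserted || it->second == multiplier,
          "Inconsistent multiplier for extent ",
          extent->toString());
    }

    // The cache would live alongside the fusion (for example, as a member of
    // the executor), so it is destroyed together with the Vals it refers to.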

    Test failures

    • (High, 106) CUDA driver too old for runtime on dlcluster_h100 (nvFuser matmul/MMA test suites)

  Failing tests (H100):
  .tests.python.multidevice.test_communication
  .tests.python.multidevice.test_deepseek_v3
  .tests.python.multidevice.test_dtensor
  .tests.python.multidevice.test_matmul
  .tests.python.multidevice.test_multidevice
  .tests.python.multidevice.test_overlap
  .tests.python.multidevice.test_transformer
  .tests.python.opinfo.test_direct_ops
  .tests.python.test_alphafold3
  ArgsortParameterizedWithBlockAndBatch.SharedMemoryRequirement/1024_1_1_1
      ... with 96 more test failures omitted. Check internal logs.
    • (High, 7) CUDA driver too old on dlcluster_h100 – initialization fails in multiple test suites

  Failing tests (H100):
      .tests.python.multidevice.test_transformer_engine
      .tests.python.opinfo.test_legacy_ops
      .tests.python.test_normalization
      .tests.python.test_python_frontend
      .tests.python.test_schedule_ops
      tests.python.multidevice.test_expert_parallel.test_dispatch_and_combine
      tests.python.test_moe.test_llama4_moe_thunderfx
    • (Medium, 6) nvFuser multi-device assert failure (inconsistent extent multiplier) in test_overlap_allgather_matmul_shard_outermost

  Failing tests (A100 dist., GB200 dist., H100 dist.):
      tests.python.multidevice.test_overlap.test_overlap_allgather_matmul_shard_outermost[backend_type=CommunicatorBackend.cuda]
      tests.python.multidevice.test_overlap.test_overlap_allgather_matmul_shard_outermost[backend_type=CommunicatorBackend.nccl]

    @zasdfgbnm
    Collaborator Author

    !test
