
Conversation

naoyam (Collaborator) commented Dec 20, 2025

No description provided.

naoyam (Collaborator, Author) commented Dec 20, 2025

!test

github-actions bot commented Dec 20, 2025

Review updated until commit 0823fe7

Description

  • Move circular buffer index variable allocation earlier in the logic flow

  • Add support for Stream parallel type in loop index variable allocation

  • Enable IdModel (TensorIndexer) by default in matmul scheduler tests

  • Enable IdModel by default in stream tests and base test fixture

Changes walkthrough

Relevant files

Bug fix

  • id_model.cpp: Fix circular buffer allocation order and add stream support
    (csrc/id_model/id_model.cpp, +23/-17)
      • Move circular buffer index variable allocation before thread index handling
      • Add ParallelType::Stream support to thread index condition
      • Add error checking for missing circular buffer index variables

Tests

  • test_matmul_scheduler.cpp: Enable IdModel in matmul scheduler test fixtures
    (tests/cpp/test_matmul_scheduler.cpp, +17/-0)
      • Add SetUp() methods to enable IdModel with the "all" option (see the sketch after this walkthrough)
      • Apply to MatmulSchedulerTest, MatmulSchedulerPluginTest, AllocationDomainTest
      • Apply to HopperPlusMatmulSchedulerTest
  • test_stream.cpp: Enable IdModel in stream tests
    (tests/cpp/test_stream.cpp, +1/-0)
      • Enable IdModel with the "all" option in the StreamTest constructor
  • utils.cpp: Enable IdModel in base test fixture
    (tests/cpp/utils.cpp, +1/-0)
      • Enable IdModel with the "all" option in NVFuserTest::SetUp()
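
For reference, a minimal sketch of what the test-fixture change could look like, assuming nvFuser's EnableOptionsGuard / EnableOption::IdModel API; the fixture member name and exact guard placement are assumptions, not the PR's literal diff:

    // Hypothetical sketch: enable IdModel (TensorIndexer) in a test fixture.
    // EnableOptionsGuard and EnableOption::IdModel are assumed to follow the
    // existing nvFuser options API; everything else is illustrative.
    class MatmulSchedulerTest : public NVFuserTest {
     protected:
      void SetUp() override {
        NVFuserTest::SetUp();
        // "all" asks lowering to use IdModel-based indexing for all code paths.
        EnableOptionsGuard::getCurOptions().set(EnableOption::IdModel, {"all"});
      }

      // The guard restores the previous option state on test teardown.
      EnableOptionsGuard enable_options_guard_;
    };

The same one-line option change presumably appears in the StreamTest constructor and in NVFuserTest::SetUp() in tests/cpp/utils.cpp.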

PR Reviewer Guide

Here are some key observations to aid the review process:

🧪 PR contains tests

⚡ Recommended focus areas for review

Code Movement Impact

The circular buffer handling code has been moved earlier in the allocateLoopIndexVariables() function. The logic itself appears preserved, but the moved block allocates an index variable for each stage of a circular-buffered loop, so any change in when it executes relative to the rest of the function could introduce subtle bugs.

    if (GpuLower::current()->circularBufferInfo().isCircularBufferedIterDomain(
            loop_group->front()->as<IterDomain>())) {
      // Allocate index variable for each stage of the circular
      // buffered loop.
      auto indices = std::make_unique<CircularBufferIndices>();
      for (auto i :
           arange(static_cast<int>(CircularBufferLoopStage::EndOfStages))) {
        indices->emplace(
            static_cast<CircularBufferLoopStage>(i),
            IrBuilder::create<Val>(DataType::Index));
      }
      circular_buffered_loop_index_variable_map_[loop_group] =
          std::move(indices);
      continue;
    }

New Error Checking

New error checking throws NVF_ERROR when circular_buffered_loop_index_variable_map_ does not contain the expected loop_group. This is good for catching bugs, but it could surface test failures if the assumption about when this function is called does not hold in every scenario.

    NVF_ERROR(
        circular_buffered_loop_index_variable_map_.contains(loop_group),
        "Failed to find circular buffer index var for: ",
        nvfuser::toString(loop_group),
        ", ",
        loop_group->front()->toString());
Parallel Type Handling

The condition for using NamedScalar::getParallelIndex has been expanded to include ParallelType::Stream in addition to the thread parallel types. This changes how loop indices are chosen for stream-parallelized domains and should be validated for correctness and performance in stream-based operations.

    } else if (isParallelTypeThread(ptype) || ptype == ParallelType::Stream) {
      loop_index = NamedScalar::getParallelIndex(ptype);
    }

Test failures

• (High, 189) CUDA driver version insufficient for runtime on dlcluster_h100 (nvFuser test suites)

  Failing tests (H100):
    • ArgsortParameterizedWithBlockAndBatch.SharedMemoryRequirement/128_1_0_0
    • ArgsortParameterizedWithBlockAndBatch.SharedMemoryRequirement/4096_2_1_1
    • ArgsortParameterizedWithBlockAndBatch.SharedMemoryRequirement/512_1_0_0
    • ArgsortParameterizedWithBlockAndBatch.SharedMemoryRequirement/512_1_1_1
    • ArgsortTest.ZeroDimensionalInput
    • BlackwellMatmulTest.EpilogueSiluPersistentBroadcastInputs
    • BlockSizeAndItemsPerThread/ArgSortComprehensiveTest.ComprehensiveValidation/BlockSize128_ItemsPerThread4
    • ClusterReductionTest.SimpleFusionAllReduce/cluster_10_dtype_float
    • ClusterReductionTest.SimpleFusionAllReduce/cluster_6_dtype_double
    • ClusterReductionTest.SimpleFusionAllReduce/cluster_9_dtype___bfloat
  ... with 179 more test failures omitted. Check internal logs.

• (High, 44) NCCL NVLS multicast memory bind failures across multidevice/nvfuser test suites on dlcluster_viking_ci

  Failing tests (H100, distributed):
    • tests.python.multidevice.test_communication.test_allgather
    • tests.python.multidevice.test_communication.test_allgather_expanded_broadcast
    • tests.python.multidevice.test_communication.test_allreduce
    • tests.python.multidevice.test_communication.test_reduce_scatter
    • tests.python.multidevice.test_communication.test_reduce_scatter_noncontiguous
    • tests.python.multidevice.test_dtensor.test_column_parallel_linear
    • tests.python.multidevice.test_dtensor.test_plus_one
    • tests.python.multidevice.test_dtensor.test_row_parallel_linear
    • tests.python.multidevice.test_expert_parallel.test_dispatch_and_combine
    • tests.python.multidevice.test_matmul.test_column_parallel_grouped_mm
  ... with 34 more test failures omitted. Check internal logs.

• (Medium, 1) NCCL invalid usage error in multidevice overlap tests (tests/python/multidevice/test_overlap.py)

  Failing tests (H100, distributed):
    • tests.python.multidevice.test_overlap.test_overlap_allgather_matmul_shard_outermost[backend_type=CommunicatorBackend.cuda]

naoyam (Collaborator, Author) commented Dec 20, 2025

!test
