
Conversation

naoyam (Collaborator) commented Dec 20, 2025

Also fixes a loop index setting for stream parallel loops

naoyam (Collaborator, Author) commented Dec 20, 2025

!test --diff

github-actions bot
Description

  • Fix loop index setting for stream parallel loops by including Stream type in thread index condition

  • Enable TensorIndexer for stream tests by adding IdModel option with "all" value

  • Stream parallel types now properly use NamedScalar::getParallelIndex instead of zeroVal

  • Improves stream parallel loop handling in the fusion framework

Changes walkthrough

Relevant files
Bug fix
id_model.cpp
Fix loop index for stream parallel types                                 

csrc/id_model/id_model.cpp

  • Extend loop index condition to include ParallelType::Stream
  • Stream parallel types now use NamedScalar::getParallelIndex
  • Fixes incorrect zeroVal usage for stream loops
  • +1/-1

Enhancement
test_stream.cpp
Enable TensorIndexer for stream tests

tests/cpp/test_stream.cpp

  • Enable IdModel option with "all" value in StreamTest constructor
  • Activates TensorIndexer functionality for stream tests
  • +1/-0

PR Reviewer Guide

Here are some key observations to aid the review process:

🧪 PR contains tests
⚡ Recommended focus areas for review

Stream parallel loop index handling

The change correctly extends the condition to include ParallelType::Stream alongside thread parallel types for using NamedScalar::getParallelIndex(). This fixes the loop index setting for stream parallel loops as mentioned in the PR description. The logic flow appears correct: device dimensions and zero-index cases use zero, while thread and stream parallel types use the parallel index.

```cpp
} else if (isParallelTypeThread(ptype) || ptype == ParallelType::Stream) {
```

greptile-apps bot (Contributor) commented Dec 20, 2025

Greptile Summary

This PR fixes loop index allocation for stream-parallel loops and enables TensorIndexer in stream tests.

Key Changes:

• Fixed allocateLoopIndexVariables() in IdModel to handle ParallelType::Stream alongside thread parallel types, ensuring stream-parallel loops get proper index variables via NamedScalar::getParallelIndex()
• Enabled IdModel with "all" options in StreamTest fixture, which activates the TensorIndexer and exercises the fixed code path

The fix is necessary because when TensorIndexer is enabled via EnableOption::IdModel {"all"}, it uses IdModel::allocateLoopIndexVariables() to pre-allocate loop index variables. Without this fix, stream-parallel loops would not get assigned the correct streamIdx variable, causing indexing failures.

Confidence Score: 5/5

• This PR is safe to merge with no risk
• The change is minimal, targeted, and follows the existing pattern for handling parallel types. The fix adds ParallelType::Stream to a condition alongside thread parallel types, which is consistent with how NamedScalar::getParallelIndex() supports Stream (returns "streamIdx"). The test change simply enables an existing feature flag to exercise the fixed code path. Both changes are well-understood and align with the existing infrastructure.
• No files require special attention

Important Files Changed

Filename | Overview
csrc/id_model/id_model.cpp | Added ParallelType::Stream handling to loop index allocation logic, ensuring stream-parallel loops get proper index variables
tests/cpp/test_stream.cpp | Enabled TensorIndexer via EnableOption::IdModel with "all" argument to test stream parallelization with the IdModel-based indexer

Sequence Diagram

```mermaid
sequenceDiagram
    participant Test as StreamTest
    participant Exec as KernelExecutor
    participant Lower as GpuLower
    participant IdModel as IdModel
    participant NamedScalar as NamedScalar

    Note over Test: EnableOption::IdModel {"all"} enabled
    Test->>Exec: compile(&fusion)
    Exec->>Lower: lower2device()
    Lower->>IdModel: allocateLoopIndexVariables()

    Note over IdModel: For each loop group
    IdModel->>IdModel: getParallelType(loop_group)

    alt ptype == Stream
        IdModel->>NamedScalar: getParallelIndex(ParallelType::Stream)
        NamedScalar-->>IdModel: streamIdx variable
        IdModel->>IdModel: loop_index_variable_map_[loop_group] = streamIdx
    else ptype is Thread type
        IdModel->>NamedScalar: getParallelIndex(ptype)
        NamedScalar-->>IdModel: thread index variable
    end

    IdModel-->>Lower: Loop indices allocated
    Lower->>Lower: Build TensorIndexer with IdModel
    Lower-->>Exec: Lowered kernel ready
    Exec->>Test: Kernel compiled
    Test->>Exec: run({in_tensor, kStreamIndex}, {out_tensor})
    Exec-->>Test: Stream-parallel execution complete
```
    
