
Conversation

naoyam (Collaborator) commented Dec 20, 2025

Also fixes a loop index setting for stream parallel loops

naoyam (Collaborator, Author) commented Dec 20, 2025

!test --diff

github-actions bot
Description

  • Fix loop index setting for stream parallel loops by including Stream type in thread index condition

  • Enable TensorIndexer for stream tests by adding IdModel option with "all" value

  • Stream parallel types now properly use NamedScalar::getParallelIndex instead of zeroVal

  • Improves stream parallel loop handling in the fusion framework

Changes walkthrough

Relevant files
Bug fix
id_model.cpp
Fix loop index for stream parallel types                                 

csrc/id_model/id_model.cpp

  • Extend loop index condition to include ParallelType::Stream
  • Stream parallel types now use NamedScalar::getParallelIndex
  • Fixes incorrect zeroVal usage for stream loops
  • +1/-1

Enhancement
test_stream.cpp
Enable TensorIndexer for stream tests

tests/cpp/test_stream.cpp

  • Enable IdModel option with "all" value in StreamTest constructor
  • Activates TensorIndexer functionality for stream tests
  • +1/-0

PR Reviewer Guide

Here are some key observations to aid the review process:

🧪 PR contains tests
⚡ Recommended focus areas for review

Stream parallel loop index handling

The change correctly extends the condition to include ParallelType::Stream alongside thread parallel types for using NamedScalar::getParallelIndex(). This fixes the loop index setting for stream parallel loops as mentioned in the PR description. The logic flow appears correct: device dimensions and zero-index cases use zero, while thread and stream parallel types use the parallel index.

```cpp
} else if (isParallelTypeThread(ptype) || ptype == ParallelType::Stream) {
```

greptile-apps bot (Contributor) commented Dec 20, 2025

Greptile Summary

This PR fixes loop index allocation for stream-parallel loops and enables TensorIndexer in stream tests.

Key Changes:

• Fixed allocateLoopIndexVariables() in IdModel to handle ParallelType::Stream alongside thread parallel types, ensuring stream-parallel loops get proper index variables via NamedScalar::getParallelIndex()
• Enabled IdModel with "all" options in StreamTest fixture, which activates the TensorIndexer and exercises the fixed code path

The fix is necessary because when TensorIndexer is enabled via EnableOption::IdModel {"all"}, it uses IdModel::allocateLoopIndexVariables() to pre-allocate loop index variables. Without this fix, stream-parallel loops would not get assigned the correct streamIdx variable, causing indexing failures.

Confidence Score: 5/5

• This PR is safe to merge with no risk
• The change is minimal, targeted, and follows the existing pattern for handling parallel types. The fix adds ParallelType::Stream to a condition alongside thread parallel types, which is consistent with how NamedScalar::getParallelIndex() supports Stream (returns "streamIdx"). The test change simply enables an existing feature flag to exercise the fixed code path. Both changes are well-understood and align with the existing infrastructure.
• No files require special attention

Important Files Changed

Filename | Overview
csrc/id_model/id_model.cpp | Added ParallelType::Stream handling to loop index allocation logic, ensuring stream-parallel loops get proper index variables
tests/cpp/test_stream.cpp | Enabled TensorIndexer via EnableOption::IdModel with "all" argument to test stream parallelization with the IdModel-based indexer

Sequence Diagram

```mermaid
sequenceDiagram
    participant Test as StreamTest
    participant Exec as KernelExecutor
    participant Lower as GpuLower
    participant IdModel as IdModel
    participant NamedScalar as NamedScalar

    Note over Test: EnableOption::IdModel {"all"} enabled
    Test->>Exec: compile(&fusion)
    Exec->>Lower: lower2device()
    Lower->>IdModel: allocateLoopIndexVariables()

    Note over IdModel: For each loop group
    IdModel->>IdModel: getParallelType(loop_group)

    alt ptype == Stream
        IdModel->>NamedScalar: getParallelIndex(ParallelType::Stream)
        NamedScalar-->>IdModel: streamIdx variable
        IdModel->>IdModel: loop_index_variable_map_[loop_group] = streamIdx
    else ptype is Thread type
        IdModel->>NamedScalar: getParallelIndex(ptype)
        NamedScalar-->>IdModel: thread index variable
    end

    IdModel-->>Lower: Loop indices allocated
    Lower->>Lower: Build TensorIndexer with IdModel
    Lower-->>Exec: Lowered kernel ready
    Exec->>Test: Kernel compiled
    Test->>Exec: run({in_tensor, kStreamIndex}, {out_tensor})
    Exec-->>Test: Stream-parallel execution complete
```
    
