Conversation

@zasdfgbnm (Collaborator)

No description provided.

@github-actions

Description

  • Added extensive debug logging throughout nvFuser execution pipeline

  • Enhanced LinearOp::evaluate with detailed tensor metadata output

  • Added FusionExecutorCache and FusionKernelRuntime debug traces

  • Created reproduction tests for meta tensor linear operation hangs

Changes walkthrough

Relevant files

Debugging (7 files)

  • nodes.cpp: Add extensive debug logging to LinearOp::evaluate (+184/-0)
  • fusion_executor_cache.cpp: Add debug output to FusionExecutorCache methods (+65/-0)
  • fusion_kernel_runtime.cpp: Add step-by-step debug logging to FusionKernelRuntime (+203/-0)
  • fusion_cache.cpp: Add debug traces to FusionCache serialization/deserialization (+69/-1)
  • fusion_definition.cpp: Add debug logging to FusionDefinition constructor (+14/-2)
  • fusion_state.cpp: Add debug output to FusionState operations (+8/-1)
  • utils.py: Comment out test assertions for debugging (+12/-12)

Tests (4 files)

  • test_tutorial.cpp: Add tests for linear operations and meta tensor hangs (+336/-3)
  • test_meta_linear_hang.py: Create standalone test for at::linear meta tensor hang (+105/-0)
  • test_python_frontend.py: Add repro test loop and modify test assertions (+62/-1)
  • test_repro_standalone.py: Create standalone reproduction script for linear operations (+42/-0)

Miscellaneous (1 file)

  • README.md: Replace README content with test method list (+44/-75)

PR Reviewer Guide

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review
Performance Impact

The extensive std::cout debug output throughout the constructor and getMaybeHeuristicsFor will severely impact performance, and every message is followed by an explicit std::cout.flush(). This debug code should be removed, or gated behind a debug flag, before merging; a minimal sketch of one way to gate it follows the excerpt below.

std::cout << "[DEBUG] FusionKernelRuntime::ctor - ENTRY (fusion_id=" << fusion_id << ", concrete_id=" << concrete_id << ", runtime_id=" << runtime_id << ")" << std::endl;
std::cout.flush();

FUSER_PERF_SCOPE("FusionKernelRuntime::FusionKernelRuntime");

std::cout << "[DEBUG] FusionKernelRuntime::ctor - STEP A: Checking hasDynamicTransform" << std::endl;
std::cout.flush();
NVF_ERROR(
    !fusion->hasDynamicTransform(),
    "Fusion must be concretized before constructing FusionKernelRuntime");
std::cout << "[DEBUG] FusionKernelRuntime::ctor - STEP A: Dynamic transform check passed" << std::endl;
std::cout.flush();

std::cout << "[DEBUG] FusionKernelRuntime::ctor - STEP B: Running PreSegmenter optimization pass" << std::endl;
std::cout.flush();
preseg_passes::OptimizationPass<preseg_passes::PreSegmenter>::runPass(
    fusion.get());
std::cout << "[DEBUG] FusionKernelRuntime::ctor - STEP B: PreSegmenter pass completed" << std::endl;
std::cout.flush();

std::cout << "[DEBUG] FusionKernelRuntime::ctor - STEP C: Checking debug dump enabled" << std::endl;
std::cout.flush();
if (isDebugDumpEnabled(DebugDumpOption::FusionIrPreseg)) {
  std::cout << "[DEBUG] FusionKernelRuntime::ctor - STEP C1: Debug dump is enabled, getting communicator" << std::endl;
  std::cout.flush();
  const auto& communicator = Communicator::getInstance();
  // Only the first local rank will print. Pre-segmenter fusion IR is device
  // agnostic, so letting all ranks print isn't any more useful.
  if (!communicator.is_available() || communicator.local_rank() == 0) {
    std::cout << "[DEBUG] FusionKernelRuntime::ctor - STEP C2: Printing fusion IR" << std::endl;
    std::cout.flush();
    debug() << "Fusion IR after pre-segmenter optimization passes:"
            << std::endl;
    fusion->print();
    std::cout << "[DEBUG] FusionKernelRuntime::ctor - STEP C3: Fusion IR printed" << std::endl;
    std::cout.flush();
  }
}
std::cout << "[DEBUG] FusionKernelRuntime::ctor - STEP C: Debug dump check completed" << std::endl;
std::cout.flush();

// SchedulerRuntimeInfo modifies the fusion, so it is required for both
// compile paths.
std::cout << "[DEBUG] FusionKernelRuntime::ctor - STEP D: Getting all TVs from fusion" << std::endl;
std::cout.flush();
std::vector<TensorView*> all_tvs = fusion->allTvs();
std::cout << "[DEBUG] FusionKernelRuntime::ctor - STEP D: Got " << all_tvs.size() << " TVs" << std::endl;
std::cout.flush();

std::cout << "[DEBUG] FusionKernelRuntime::ctor - STEP E: Creating SchedulerRuntimeInfo" << std::endl;
std::cout.flush();
SchedulerRuntimeInfo runtime_info(
    fusion.get(), args, nullptr, all_tvs, forced_index_type);
std::cout << "[DEBUG] FusionKernelRuntime::ctor - STEP E: SchedulerRuntimeInfo created" << std::endl;
std::cout.flush();

std::cout << "[DEBUG] FusionKernelRuntime::ctor - STEP F: Checking serde_buffer (nullptr=" << (serde_buffer == nullptr) << ")" << std::endl;
std::cout.flush();
if (serde_buffer == nullptr || !serde_buffer->segmented_fusion()->valid()) {
  // Default compilation path applies segmentation before scheduling and
  // compiling the fusion.
  std::cout << "[DEBUG] FusionKernelRuntime::ctor - STEP F1: Default path - calling SegmentCandidateFinder::segment" << std::endl;
  std::cout.flush();
  segmented_fusion_ =
      SegmentCandidateFinder::segment(std::move(fusion), args, runtime_info);
  std::cout << "[DEBUG] FusionKernelRuntime::ctor - STEP F1: Segmentation completed" << std::endl;
  std::cout.flush();
} else {
  // Serialization path that generates segmented fusion from flatbuffers.
  std::cout << "[DEBUG] FusionKernelRuntime::ctor - STEP F2: Serde path - getting segmented_groups" << std::endl;
  std::cout.flush();
  // Convert Welford to two-pass if option is enabled and the original
  // heuristic is persistent
  const flatbuffers::Vector<flatbuffers::Offset<serde::SegmentedGroup>>*
      segmented_groups = serde_buffer->segmented_fusion()->groups();
  std::cout << "[DEBUG] FusionKernelRuntime::ctor - STEP F2a: Got " << segmented_groups->size() << " segmented groups" << std::endl;
  std::cout.flush();

  std::cout << "[DEBUG] FusionKernelRuntime::ctor - STEP F2b: Checking for persistent heuristics" << std::endl;
  std::cout.flush();
  bool has_persistent_heuristic = std::any_of(
      segmented_groups->begin(),
      segmented_groups->end(),
      [](const serde::SegmentedGroup* sg) {
        auto heuristic = static_cast<SchedulerType>(sg->heuristic());
        return heuristic == SchedulerType::InnerPersistent ||
            heuristic == SchedulerType::OuterPersistent ||
            heuristic == SchedulerType::InnerOuterPersistent;
      });
  std::cout << "[DEBUG] FusionKernelRuntime::ctor - STEP F2c: has_persistent_heuristic=" << has_persistent_heuristic << std::endl;
  std::cout.flush();

  std::cout << "[DEBUG] FusionKernelRuntime::ctor - STEP F2d: Checking for Welford ops" << std::endl;
  std::cout.flush();
  bool has_welford_ops = ir_utils::hasOpsOfType<WelfordOp>(fusion.get());
  std::cout << "[DEBUG] FusionKernelRuntime::ctor - STEP F2e: has_welford_ops=" << has_welford_ops << std::endl;
  std::cout.flush();

  if (has_welford_ops && has_persistent_heuristic) {
    std::cout << "[DEBUG] FusionKernelRuntime::ctor - STEP F2f: Translating Welford in fusion" << std::endl;
    std::cout.flush();
    SegmentCandidateFinder::translateWelfordInFusion(fusion.get(), args);
    std::cout << "[DEBUG] FusionKernelRuntime::ctor - STEP F2g: Welford translation completed" << std::endl;
    std::cout.flush();
  }

  std::cout << "[DEBUG] FusionKernelRuntime::ctor - STEP F2h: Creating SegmentedFusion" << std::endl;
  std::cout.flush();
  segmented_fusion_ = std::make_unique<SegmentedFusion>(std::move(fusion));
  std::cout << "[DEBUG] FusionKernelRuntime::ctor - STEP F2i: Deserializing segmented_fusion" << std::endl;
  std::cout.flush();
  segmented_fusion_->deserialize(serde_buffer->segmented_fusion());
  std::cout << "[DEBUG] FusionKernelRuntime::ctor - STEP F2j: Deserialization completed" << std::endl;
  std::cout.flush();
}

// Pre-compute the executor order so that the run time path
//  would go directly to kernel launch.
std::cout << "[DEBUG] FusionKernelRuntime::ctor - STEP G: Preparing runtime order" << std::endl;
std::cout.flush();
runtime_workspace_ = prepareRuntimeOrder(*segmented_fusion_);
std::cout << "[DEBUG] FusionKernelRuntime::ctor - STEP G: Runtime order prepared" << std::endl;
std::cout.flush();

std::cout << "[DEBUG] FusionKernelRuntime::ctor - STEP H: Resizing executors (num_groups=" << segmented_fusion_->groups().size() << ")" << std::endl;
std::cout.flush();
executors_.resize(segmented_fusion_->groups().size());
std::cout << "[DEBUG] FusionKernelRuntime::ctor - STEP H: Executors resized" << std::endl;
std::cout.flush();

std::cout << "[DEBUG] FusionKernelRuntime::ctor - STEP I: Checking debug dump for segments" << std::endl;
std::cout.flush();
if (isDebugDumpEnabled(DebugDumpOption::FusionSegments)) {
  std::cout << "[DEBUG] FusionKernelRuntime::ctor - STEP I1: Printing segmented fusion" << std::endl;
  std::cout.flush();
  segmented_fusion_->print();
  std::cout << "[DEBUG] FusionKernelRuntime::ctor - STEP I2: Segmented fusion printed" << std::endl;
  std::cout.flush();
}

// Even if we go through the segmented path we may still end up
//  with a segmented fusion with one group. This case still
//  counts as un-segmented.
std::cout << "[DEBUG] FusionKernelRuntime::ctor - STEP J: Setting is_segmented flag" << std::endl;
std::cout.flush();
is_segmented_ = segmented_fusion_->groups().size() > 1;
std::cout << "[DEBUG] FusionKernelRuntime::ctor - STEP J: is_segmented=" << is_segmented_ << std::endl;
std::cout.flush();

// Create Initial Heuristics for Segmented Fusion
std::cout << "[DEBUG] FusionKernelRuntime::ctor - STEP K: Getting heuristics" << std::endl;
std::cout.flush();
auto maybe_heuristics = getMaybeHeuristicsFor(args, forced_index_type);
std::cout << "[DEBUG] FusionKernelRuntime::ctor - STEP K1: Got maybe_heuristics, checking if has_value" << std::endl;
std::cout.flush();
NVF_CHECK(maybe_heuristics.has_value());
std::cout << "[DEBUG] FusionKernelRuntime::ctor - STEP K2: Heuristics have value, moving" << std::endl;
std::cout.flush();
heuristics_ = std::move(maybe_heuristics.value());
std::cout << "[DEBUG] FusionKernelRuntime::ctor - STEP K3: Heuristics moved successfully" << std::endl;
std::cout << "[DEBUG] FusionKernelRuntime::ctor - EXIT (constructor completed successfully)" << std::endl;
std::cout.flush();
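
A minimal sketch of one way to gate this tracing, assuming a hypothetical NVFUSER_CTOR_DEBUG environment variable (wiring it into the existing isDebugDumpEnabled/DebugDumpOption machinery would be more idiomatic if a suitable option exists):

// Sketch only: gate constructor tracing behind an environment variable.
// NVFUSER_CTOR_DEBUG is a hypothetical name used purely for illustration.
#include <cstdlib>
#include <iostream>
#include <string>

namespace {

bool ctorDebugEnabled() {
  // Read the flag once; std::getenv returns nullptr when the variable is unset.
  static const bool enabled = std::getenv("NVFUSER_CTOR_DEBUG") != nullptr;
  return enabled;
}

void ctorDebug(const std::string& msg) {
  if (ctorDebugEnabled()) {
    // std::endl already flushes, so no separate std::cout.flush() is needed.
    std::cout << "[DEBUG] FusionKernelRuntime::ctor - " << msg << std::endl;
  }
}

} // namespace

// Each "std::cout ...; std::cout.flush();" pair in the constructor would then
// collapse to a single call, e.g.:
//   ctorDebug("STEP A: Checking hasDynamicTransform");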
Debug Code in Production

LinearOp::evaluate contains extremely verbose debug output (100+ added lines of std::cout statements and explicit flushes) that will significantly impact performance. This debug code should be removed or made conditional before merging to main; a sketch of a compact, gated metadata helper follows the excerpt below.

std::cout << "[DEBUG] LinearOp::evaluate - ENTRY" << std::endl;
std::cout.flush();

std::cout << "[DEBUG] LinearOp::evaluate - STEP 1: Getting input tensor (inputs.size=" << inputs.size() << ")" << std::endl;
std::cout.flush();
const auto in = inputs.at(0).as<at::Tensor>();
std::cout << "[DEBUG] LinearOp::evaluate - STEP 1a: Input tensor obtained, shape=[";
for (int64_t i = 0; i < in.dim(); i++) {
  if (i > 0) std::cout << ", ";
  std::cout << in.size(i);
}
std::cout << "], device=" << in.device() << std::endl;
std::cout.flush();

std::cout << "[DEBUG] LinearOp::evaluate - STEP 2: Getting weight tensor" << std::endl;
std::cout.flush();
auto weight = inputs.at(1).as<at::Tensor>();
std::cout << "[DEBUG] LinearOp::evaluate - STEP 2a: Weight tensor obtained, shape=[";
for (int64_t i = 0; i < weight.dim(); i++) {
  if (i > 0) std::cout << ", ";
  std::cout << weight.size(i);
}
std::cout << "], device=" << weight.device() << std::endl;
std::cout.flush();

auto squeeze_device_dims = [](at::Tensor& t,
                              int64_t num_device_dims) -> void {
  // Record the initial shape for the error message.
  std::vector<int64_t> shape = t.sizes().vec();
  for ([[maybe_unused]] auto _ : arange(num_device_dims)) {
    NVF_CHECK(
        t.size(0) == 1,
        "When the weight is >2D, expect its preceding dimensions and "
        "the bias's preceding dimensions to "
        "be DID-parallel and therefore size-1: ",
        shape);
    t = t.squeeze(0);
  }
};

// The squeezes and unsqueezes are currently required to support a sharded
// linear layer. Remove them after #2563.
std::cout << "[DEBUG] LinearOp::evaluate - STEP 3: Calculating num_device_dims (weight.dim=" << weight.dim() << ")" << std::endl;
std::cout.flush();
auto num_device_dims = weight.dim() - 2;
std::cout << "[DEBUG] LinearOp::evaluate - STEP 3a: num_device_dims=" << num_device_dims << std::endl;
std::cout.flush();

std::cout << "[DEBUG] LinearOp::evaluate - STEP 4: Squeezing device dims from weight" << std::endl;
std::cout.flush();
squeeze_device_dims(weight, num_device_dims);
std::cout << "[DEBUG] LinearOp::evaluate - STEP 4a: Weight squeezed, new shape=[";
for (int64_t i = 0; i < weight.dim(); i++) {
  if (i > 0) std::cout << ", ";
  std::cout << weight.size(i);
}
std::cout << "]" << std::endl;
std::cout.flush();

std::cout << "[DEBUG] LinearOp::evaluate - STEP 5: Checking hasBias (hasBias=" << hasBias() << ")" << std::endl;
std::cout.flush();
at::Tensor out_tensor;
if (hasBias()) {
  std::cout << "[DEBUG] LinearOp::evaluate - STEP 5a: Getting bias tensor" << std::endl;
  std::cout.flush();
  auto bias = inputs.at(2).as<at::Tensor>();
  std::cout << "[DEBUG] LinearOp::evaluate - STEP 5b: Bias tensor obtained, shape=[";
  for (int64_t i = 0; i < bias.dim(); i++) {
    if (i > 0) std::cout << ", ";
    std::cout << bias.size(i);
  }
  std::cout << "], device=" << bias.device() << std::endl;
  std::cout.flush();

  std::cout << "[DEBUG] LinearOp::evaluate - STEP 5c: Squeezing device dims from bias" << std::endl;
  std::cout.flush();
  squeeze_device_dims(bias, num_device_dims);
  std::cout << "[DEBUG] LinearOp::evaluate - STEP 5d: Bias squeezed, new shape=[";
  for (int64_t i = 0; i < bias.dim(); i++) {
    if (i > 0) std::cout << ", ";
    std::cout << bias.size(i);
  }
  std::cout << "]" << std::endl;
  std::cout.flush();

  std::cout << "[DEBUG] LinearOp::evaluate - STEP 5e: Calling at::linear with bias" << std::endl;
  std::cout << "[DEBUG] LinearOp::evaluate - INPUT METADATA:" << std::endl;
  std::cout << "  in.sizes: [";
  for (int64_t i = 0; i < in.dim(); i++) {
    if (i > 0) std::cout << ", ";
    std::cout << in.size(i);
  }
  std::cout << "]" << std::endl;
  std::cout << "  in.strides: [";
  for (int64_t i = 0; i < in.dim(); i++) {
    if (i > 0) std::cout << ", ";
    std::cout << in.stride(i);
  }
  std::cout << "]" << std::endl;
  std::cout << "  in.dtype: " << in.dtype() << std::endl;
  std::cout << "  in.device: " << in.device() << std::endl;
  std::cout << "  in.is_contiguous: " << in.is_contiguous() << std::endl;
  std::cout << "  in.numel: " << in.numel() << std::endl;
  std::cout << "[DEBUG] LinearOp::evaluate - WEIGHT METADATA:" << std::endl;
  std::cout << "  weight.sizes: [";
  for (int64_t i = 0; i < weight.dim(); i++) {
    if (i > 0) std::cout << ", ";
    std::cout << weight.size(i);
  }
  std::cout << "]" << std::endl;
  std::cout << "  weight.strides: [";
  for (int64_t i = 0; i < weight.dim(); i++) {
    if (i > 0) std::cout << ", ";
    std::cout << weight.stride(i);
  }
  std::cout << "]" << std::endl;
  std::cout << "  weight.dtype: " << weight.dtype() << std::endl;
  std::cout << "  weight.device: " << weight.device() << std::endl;
  std::cout << "  weight.is_contiguous: " << weight.is_contiguous() << std::endl;
  std::cout << "  weight.numel: " << weight.numel() << std::endl;
  std::cout << "[DEBUG] LinearOp::evaluate - BIAS METADATA:" << std::endl;
  std::cout << "  bias.sizes: [";
  for (int64_t i = 0; i < bias.dim(); i++) {
    if (i > 0) std::cout << ", ";
    std::cout << bias.size(i);
  }
  std::cout << "]" << std::endl;
  std::cout << "  bias.strides: [";
  for (int64_t i = 0; i < bias.dim(); i++) {
    if (i > 0) std::cout << ", ";
    std::cout << bias.stride(i);
  }
  std::cout << "]" << std::endl;
  std::cout << "  bias.dtype: " << bias.dtype() << std::endl;
  std::cout << "  bias.device: " << bias.device() << std::endl;
  std::cout << "  bias.is_contiguous: " << bias.is_contiguous() << std::endl;
  std::cout << "  bias.numel: " << bias.numel() << std::endl;
  std::cout.flush();
  out_tensor = at::linear(in, weight, bias);
  std::cout << "[DEBUG] LinearOp::evaluate - STEP 5f: at::linear completed" << std::endl;
  std::cout.flush();
} else {
  std::cout << "[DEBUG] LinearOp::evaluate - STEP 5g: Calling at::linear without bias" << std::endl;
  std::cout << "[DEBUG] LinearOp::evaluate - INPUT METADATA:" << std::endl;
  std::cout << "  in.sizes: [";
  for (int64_t i = 0; i < in.dim(); i++) {
    if (i > 0) std::cout << ", ";
    std::cout << in.size(i);
  }
  std::cout << "]" << std::endl;
  std::cout << "  in.strides: [";
  for (int64_t i = 0; i < in.dim(); i++) {
    if (i > 0) std::cout << ", ";
    std::cout << in.stride(i);
  }
  std::cout << "]" << std::endl;
  std::cout << "  in.dtype: " << in.dtype() << std::endl;
  std::cout << "  in.device: " << in.device() << std::endl;
  std::cout << "  in.is_contiguous: " << in.is_contiguous() << std::endl;
  std::cout << "  in.numel: " << in.numel() << std::endl;
  std::cout << "[DEBUG] LinearOp::evaluate - WEIGHT METADATA:" << std::endl;
  std::cout << "  weight.sizes: [";
  for (int64_t i = 0; i < weight.dim(); i++) {
    if (i > 0) std::cout << ", ";
    std::cout << weight.size(i);
  }
  std::cout << "]" << std::endl;
  std::cout << "  weight.strides: [";
  for (int64_t i = 0; i < weight.dim(); i++) {
    if (i > 0) std::cout << ", ";
    std::cout << weight.stride(i);
  }
  std::cout << "]" << std::endl;
  std::cout << "  weight.dtype: " << weight.dtype() << std::endl;
  std::cout << "  weight.device: " << weight.device() << std::endl;
  std::cout << "  weight.is_contiguous: " << weight.is_contiguous() << std::endl;
  std::cout << "  weight.numel: " << weight.numel() << std::endl;
  std::cout.flush();
  out_tensor = at::linear(in, weight);
  std::cout << "[DEBUG] LinearOp::evaluate - STEP 5h: at::linear completed" << std::endl;
  std::cout.flush();
}

std::cout << "[DEBUG] LinearOp::evaluate - STEP 6: at::linear result shape=[";
for (int64_t i = 0; i < out_tensor.dim(); i++) {
  if (i > 0) std::cout << ", ";
  std::cout << out_tensor.size(i);
}
std::cout << "], device=" << out_tensor.device() << std::endl;
std::cout.flush();

std::cout << "[DEBUG] LinearOp::evaluate - STEP 7: Unsqueezing output (num_device_dims=" << num_device_dims << ")" << std::endl;
std::cout.flush();
for ([[maybe_unused]] auto _ : arange(num_device_dims)) {
  out_tensor = out_tensor.unsqueeze(0);
}
std::cout << "[DEBUG] LinearOp::evaluate - STEP 7a: Unsqueezed output shape=[";
for (int64_t i = 0; i < out_tensor.dim(); i++) {
  if (i > 0) std::cout << ", ";
  std::cout << out_tensor.size(i);
}
std::cout << "]" << std::endl;
std::cout.flush();

// Handle rFactor DIDs similar to MatmulOp::evaluate.
std::cout << "[DEBUG] LinearOp::evaluate - STEP 8: Checking rFactor device dimension index" << std::endl;
std::cout.flush();
if (const auto rfactor_did_idx = getRFactorDeviceDimensionIndex(out());
    rfactor_did_idx != -1) {
  std::cout << "[DEBUG] LinearOp::evaluate - STEP 8a: rFactor DID index=" << rfactor_did_idx << ", unsqueezing" << std::endl;
  std::cout.flush();
  out_tensor = out_tensor.unsqueeze(rfactor_did_idx);
  std::cout << "[DEBUG] LinearOp::evaluate - STEP 8b: Final output shape=[";
  for (int64_t i = 0; i < out_tensor.dim(); i++) {
    if (i > 0) std::cout << ", ";
    std::cout << out_tensor.size(i);
  }
  std::cout << "]" << std::endl;
  std::cout.flush();
} else {
  std::cout << "[DEBUG] LinearOp::evaluate - STEP 8c: No rFactor DID, skipping unsqueeze" << std::endl;
  std::cout.flush();
}

std::cout << "[DEBUG] LinearOp::evaluate - STEP 9: Returning result" << std::endl;
std::cout.flush();
return {out_tensor};
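
The repeated size/stride printing loops could likewise collapse into one gated helper that relies on the stream operators ATen already provides for sizes, strides, dtype, and device. A minimal sketch, again using a hypothetical NVFUSER_LINEAR_DEBUG environment variable as the gate:

// Sketch only: compact, gated tensor-metadata dump for LinearOp::evaluate.
// NVFUSER_LINEAR_DEBUG is a hypothetical name used purely for illustration.
#include <ATen/ATen.h>
#include <cstdlib>
#include <iostream>
#include <string>

namespace {

void dumpTensorMeta(const std::string& name, const at::Tensor& t) {
  static const bool enabled = std::getenv("NVFUSER_LINEAR_DEBUG") != nullptr;
  if (!enabled) {
    return;
  }
  // IntArrayRef, TypeMeta, and Device all have operator<< overloads, so the
  // manual element-by-element printing loops are unnecessary.
  std::cout << "[DEBUG] LinearOp::evaluate - " << name << ": sizes=" << t.sizes()
            << ", strides=" << t.strides() << ", dtype=" << t.dtype()
            << ", device=" << t.device()
            << ", contiguous=" << t.is_contiguous() << std::endl;
}

} // namespace

// Usage inside LinearOp::evaluate:
//   dumpTensorMeta("in", in);
//   dumpTensorMeta("weight", weight);
//   if (hasBias()) { dumpTensorMeta("bias", bias); }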
Disabled Test Assertions

The ReproLinearAddFusion test has its validation assertions commented out (lines 1611-1612 and 1650-1657). These should be uncommented, or the test should be explicitly skipped or marked as an expected failure while the reproduced issue remains unresolved; a sketch of the skip approach follows the excerpt below.

  // at::Tensor ref = at::linear(t0, t2) + t1;
  // testValidate(
  //     executor_cache.fusion(), outputs, {t0, t1, t2}, {ref}, __LINE__, __FILE__);

  // Serialize the FusionExecutorCache to test serde path
  // This reproduces the serialization behavior when enable_automatic_serialization() is used
  flatbuffers::FlatBufferBuilder builder(1024);
  auto serialized = executor_cache.serialize(builder);
  builder.Finish(serialized);

  // Get the serialized buffer
  uint8_t* buf = builder.GetBufferPointer();

  // Create a new fusion and executor cache for deserialization
  auto fusion2 = std::make_unique<Fusion>();
  FusionGuard fg2(fusion2.get());

  auto tv0_2 = makeSymbolicTensor(3, DataType::BFloat16);
  auto tv1_2 = makeSymbolicTensor(3, DataType::BFloat16);
  auto tv2_2 = makeSymbolicTensor(2, DataType::BFloat16);

  fusion2->addInput(tv0_2);
  fusion2->addInput(tv1_2);
  fusion2->addInput(tv2_2);

  auto tv3_2 = linear(tv0_2, tv2_2);
  auto tv4_2 = add(tv3_2, tv1_2);
  fusion2->addOutput(tv4_2);

  FusionExecutorCache executor_cache2(std::move(fusion2), /*fusion_id=*/1);

  // Deserialize into the new executor cache
  // Cast the buffer to the FusionExecutorCache flatbuffer type
  auto buffer = flatbuffers::GetRoot<serde::FusionExecutorCache>(buf);
  executor_cache2.deserialize(buffer, /*fusion_id=*/1);

  // Run with the deserialized cache
  auto outputs2 = executor_cache2.runFusionWithInputs({t0, t1, t2});
  (void)outputs2;

  // // Validate deserialized run
  // testValidate(
  //     executor_cache2.fusion(),
  //     outputs2,
  //     {t0, t1, t2},
  //     {ref},
  //     __LINE__,
  //     __FILE__);
}
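
If the hang is not yet fixed, the validation can stay in the source while the test is skipped explicitly, so the disabled check remains visible in test reports instead of being commented out. A minimal GTest sketch, assuming NVFuserTest is the fixture already used in test_tutorial.cpp (the placeholder alias below only keeps the sketch self-contained):

#include <gtest/gtest.h>

// Placeholder for the sketch; the real fixture is defined in the test suite.
using NVFuserTest = ::testing::Test;

TEST_F(NVFuserTest, ReproLinearAddFusion) {
  // Skip up front instead of commenting out testValidate, so the intent and
  // the pending issue stay visible whenever the suite runs.
  GTEST_SKIP() << "Known hang: at::linear on meta tensors in LinearOp::evaluate; "
                  "re-enable the testValidate calls once the issue is resolved.";

  // ... fusion construction, execution, serialization round-trip, and the
  // original testValidate assertions from the excerpt above ...
}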
