
Conversation

@galagam galagam commented Dec 28, 2025

What does this PR do?

Type of change: Bug fix

Overview: Fix AutoCast ReferenceRunner to handle large models.
Models larger than 2 GB cannot be serialized to a string, which is what Polygraphy does under the hood. Use a temporary file instead to save the modified ONNX model with all tensors marked as outputs.
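
A minimal sketch of the workaround, assuming only onnx and onnxruntime (the helper name and error handling are illustrative, not this PR's actual code):

import os
import tempfile

import onnx
import onnxruntime as ort

def build_session(model: onnx.ModelProto, providers) -> ort.InferenceSession:
    # Hypothetical helper: protobuf refuses to serialize messages over 2 GB,
    # so large models must go through a file with weights stored externally.
    try:
        return ort.InferenceSession(model.SerializeToString(), providers=providers)
    except ValueError:
        # Sketch only: the temp dir should be kept alive / cleaned up by the caller.
        tmp_dir = tempfile.mkdtemp()
        path = os.path.join(tmp_dir, "model.onnx")
        onnx.save(model, path, save_as_external_data=True)
        return ort.InferenceSession(path, providers=providers)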

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes
  • Did you write any new necessary tests?: No
  • Did you add or update any necessary documentation?: No
  • Did you update Changelog?: No

Additional Information

Summary by CodeRabbit

  • Improvements
    • Enhanced model processing to better support large ONNX models during validation and runtime execution
    • Added diagnostic logging of model sizes at key processing stages for improved debugging and performance monitoring


copy-pr-bot bot commented Dec 28, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

Comment on lines 131 to 134
has_external_data = any(
    init.HasField("data_location") and init.data_location == onnx.TensorProto.EXTERNAL
    for init in self.model.graph.initializer
)
Contributor

Can we reuse this function to check for external data?

def has_external_data(onnx_model_path: str):

Contributor Author

@ajrasane I'm not sure I want to introduce dependencies from modelopt.torch here just for this.
Since modelopt/torch/_deploy/utils/torch_onnx.py is already importing quite a few utils from modelopt.onnx.utils, how about I move this function to modelopt.onnx.utils and import it in modelopt.torch?
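
For reference, a sketch of what the relocated helper could look like in modelopt.onnx.utils, assuming it keeps the path-based signature quoted above (body illustrative):

import onnx

def has_external_data(onnx_model_path: str) -> bool:
    """Return True if any initializer stores its tensor data in an external file."""
    # load_external_data=False reads only the proto metadata, so this stays
    # cheap even for multi-GB models.
    model = onnx.load_model(onnx_model_path, load_external_data=False)
    return any(
        init.HasField("data_location") and init.data_location == onnx.TensorProto.EXTERNAL
        for init in model.graph.initializer
    )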

Contributor Author

Done

Contributor Author

@ajrasane please revisit - since I edited torch utils, I now need a review from modelopt-torch-deploy-codeowners 🙏

@galagam galagam force-pushed the dev-gagam-narrowing-error branch from 1a0711c to 7a2d91a on January 19, 2026 19:22
coderabbitai bot commented Jan 19, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

The changes enhance ONNX model handling with external data support by introducing detection and branching logic in ReferenceRunner, adding model size logging to PrecisionConverter, and updating utils to support external data workflows with modified validation and return signatures.

Changes

  • Model Size Logging (modelopt/onnx/autocast/precisionconverter.py):
    Added a print_byte_size(label: str) helper method to serialize and log model byte sizes. Updated _sanity_check to log sizes before and after ONNX validation. Changed model validation in convert to an in-place check instead of an assignment.
  • External Data Detection & Handling (modelopt/onnx/autocast/referencerunner.py):
    Introduced a _get_ort_runner(model) method that detects external data requirements (via data_location checks and size thresholds) and branches into either temporary-file-based loading with external data or the in-memory path. Updated the run method to delegate session construction to the new helper. Added imports: tempfile, onnx_utils.
  • Model Validation & Utility Updates (modelopt/onnx/utils.py):
    Changed the check_model signature from returning onnx.ModelProto to returning None; it now performs in-place validation without returning a modified model. Updated external data handling to force the external-data flag in the save workflow. Upgraded the logging level from debug to warning for size/external-data reporting. Added a copy import.
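
A hedged sketch of the described check_model change (in-place validation, None return; the actual body in modelopt/onnx/utils.py may differ):

import os
import tempfile

import onnx

def check_model(model: onnx.ModelProto) -> None:
    # Validates in place and returns None, so callers keep their own
    # reference instead of rebinding to a returned ModelProto.
    if model.ByteSize() < onnx.checker.MAXIMUM_PROTOBUF:
        onnx.checker.check_model(model)
    else:
        # Models over the protobuf limit must be checked from disk.
        with tempfile.TemporaryDirectory() as tmp_dir:
            path = os.path.join(tmp_dir, "model.onnx")
            onnx.save(model, path, save_as_external_data=True)
            onnx.checker.check_model(path)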

Sequence Diagram(s)

sequenceDiagram
    participant Runner as ReferenceRunner
    participant GetRunner as _get_ort_runner()
    participant Model as ONNX Model
    participant FS as File System
    participant ORT as ONNXRuntime

    Runner->>GetRunner: run(model)
    GetRunner->>Model: Check initializers for external data
    alt External Data Detected or Size > 2GB
        GetRunner->>FS: Create temp .onnx file
        GetRunner->>Model: Save with external data enabled
        GetRunner->>FS: Write temp ONNX file
        GetRunner->>ORT: Create InferenceSession from file
        ORT-->>GetRunner: Session ready
    else In-Memory Path
        GetRunner->>ORT: Create Session via BytesFromOnnx
        ORT-->>GetRunner: Session ready
    end
    GetRunner-->>Runner: Return OnnxrtRunner(session)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 2 passed | ❌ 1 failed

❌ Failed checks (1 warning)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 72.73%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)
  • Title check (✅ Passed): The title '[5725362] AutoCast Fixes for models with external data' directly and clearly describes the main change: fixing AutoCast to handle models with external data. This aligns with all three commits and the file changes.
  • Description check (✅ Passed): Check skipped - CodeRabbit's high-level summary is enabled.


@galagam galagam marked this pull request as ready for review January 19, 2026 19:30
@galagam galagam requested review from a team as code owners January 19, 2026 19:31
@galagam galagam requested a review from gcunhase January 19, 2026 19:31
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@modelopt/onnx/autocast/precisionconverter.py`:
- Around line 85-88: Remove the temporary debug method print_byte_size: it uses
print() with a "GAGAM" prefix and lacks a docstring; either delete this method
from the class or replace it with a properly documented utility that uses the
module logger (e.g., logging.getLogger(__name__)) and a D102-compliant
docstring, and if keeping functionality, rename to something descriptive (e.g.,
log_model_byte_size) and call self.model.SerializeToString() to compute size
before logging the result instead of printing.
- Around line 1298-1303: Remove the debug instrumentation by deleting the calls
to self.print_byte_size(...) inside the _sanity_check method and remove the
print_byte_size method definition entirely; ensure _sanity_check only performs
onnx_utils.check_model(self.model) (and any existing sanity_ok handling) without
emitting byte-size prints, and run tests to confirm no remaining references to
print_byte_size remain in the class.

In `@modelopt/onnx/autocast/referencerunner.py`:
- Around line 160-165: The temp ONNX file created with
tempfile.NamedTemporaryFile(delete=False) and passed to
onnx_utils.save_onnx/InferenceSession is never removed; store the path (e.g.,
self._temp_model_path) and ensure cleanup once the session is no longer needed
by either creating the temp inside a TemporaryDirectory context (mirroring
utils.check_model) that you keep open for the session lifetime or add a
dedicated cleanup method (e.g., close()/cleanup_temp_model) that deletes the
file and call it when the runner is disposed; optionally register that cleanup
with atexit as a fallback to avoid accumulating files (see the sketch after this list).

In `@modelopt/onnx/utils.py`:
- Line 661: The line force-setting save_as_external_data = True is a temporary
debug override that bypasses the prior size-based logic; remove this hardcoded
assignment (the "save_as_external_data = True # GAGAM: for debug" statement) so
the function/method that computes save_as_external_data earlier in the scope can
determine the value normally, and ensure no other debug-only overrides remain in
the same function or surrounding block.
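
For the referencerunner.py cleanup item above, a minimal sketch of the TemporaryDirectory variant that ties the temp file's lifetime to the session holder (names hypothetical, mirroring utils.check_model):

import os
import tempfile

import onnx
import onnxruntime as ort

class SessionWithTempModel:
    """Keeps the temp directory alive as long as the session is in use."""

    def __init__(self, model: onnx.ModelProto, providers):
        self._tmp_dir = tempfile.TemporaryDirectory()  # removed on cleanup()
        path = os.path.join(self._tmp_dir.name, "model.onnx")
        onnx.save(model, path, save_as_external_data=True)
        self.session = ort.InferenceSession(path, providers=providers)

    def cleanup(self):
        self._tmp_dir.cleanup()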
🧹 Nitpick comments (3)
modelopt/onnx/utils.py (1)

647-649: Consider reverting the log level from warning to debug.

This message logs model size and external data usage for every save operation. Using warning level will generate noise in production logs for what is essentially informational/diagnostic output. The original debug level was more appropriate unless there's a specific reason to always surface this information.

Suggested change
-        logger.warning(
+        logger.debug(
             f"Model size: {model_size} bytes, using external data: {save_as_external_data}"
         )
modelopt/onnx/autocast/referencerunner.py (2)

131-152: Consider extracting external data detection to a shared utility.

This logic duplicates functionality that exists in modelopt/torch/_deploy/utils/torch_onnx.py (has_external_data and check_model_uses_external_data). Per a past review comment, consider moving the detection logic to modelopt/onnx/utils.py so it can be reused across the codebase.

The current implementation is correct but consolidating would improve maintainability.


166-166: Lambda closure captures session correctly, but consider clarity.

The lambda lambda: session works because session is captured by reference. However, this pattern can be confusing. The OnnxrtRunner expects a callable that returns a session — passing the session directly via a lambda is acceptable but slightly unusual.
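
For comparison, binding the session through a parameter rather than a free variable makes the capture explicit (illustrative only, reusing the OnnxrtRunner and session names from the diff):

def make_session_provider(session):
    # OnnxrtRunner expects a zero-argument callable that returns the session;
    # binding via a parameter avoids late-binding surprises if this code
    # ever moves into a loop.
    return lambda: session

runners = [OnnxrtRunner(make_session_provider(session))]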

Signed-off-by: Gal Hubara Agam <[email protected]>
codecov bot commented Jan 19, 2026

Codecov Report

❌ Patch coverage is 58.97436% with 16 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.15%. Comparing base (391f6cb) to head (00ea80c).
⚠️ Report is 2 commits behind head on main.

Files with missing lines | Patch % | Lines
modelopt/onnx/autocast/referencerunner.py | 54.54% | 15 Missing ⚠️
modelopt/onnx/utils.py | 80.00% | 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #731      +/-   ##
==========================================
- Coverage   74.19%   74.15%   -0.04%     
==========================================
  Files         192      191       -1     
  Lines       19238    19258      +20     
==========================================
+ Hits        14273    14281       +8     
- Misses       4965     4977      +12     

☔ View full report in Codecov by Sentry.

Signed-off-by: Gal Hubara Agam <[email protected]>
Contributor

@gcunhase gcunhase left a comment

LGTM, thanks

@galagam galagam requested a review from a team as a code owner January 20, 2026 15:01
@galagam galagam requested a review from cjluo-nv January 20, 2026 15:01
Signed-off-by: Gal Hubara Agam <[email protected]>
@galagam galagam force-pushed the dev-gagam-narrowing-error branch from 2ae99d0 to 00ea80c on January 20, 2026 15:14
@galagam galagam requested a review from ajrasane January 20, 2026 15:23
Comment on lines +157 to +162
tmp_file = tempfile.NamedTemporaryFile(suffix=".onnx", delete=False)
tmp_file.close()
tmp_file_path = tmp_file.name
onnx_utils.save_onnx(modified_model, tmp_file_path, save_as_external_data=True)
logger.debug(f"Model with all outputs saved to {tmp_file_path}")
session = ort.InferenceSession(tmp_file_path, providers=self.providers)


FYI Polygraphy's SaveOnnx can handle models with external data. Also SessionFromOnnx can accept paths.
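
A minimal sketch of that combination, matching the signatures used in the suggestion further down (model is the ModelProto to serialize; the temp directory must outlive the session):

import os
import tempfile

from polygraphy.backend.onnx import save_onnx
from polygraphy.backend.onnxrt import SessionFromOnnx

out_dir = tempfile.TemporaryDirectory()
path = os.path.join(out_dir.name, "model.onnx")
# save_onnx writes tensor data to a sidecar file when external_data_path is set
save_onnx(model, path, external_data_path="ext.data")
# SessionFromOnnx accepts a path as well as serialized bytes
build_session = SessionFromOnnx(path, providers=["CPUExecutionProvider"])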

Contributor Author

Thanks @pranavm-nvidia. I'll take a look and see if I can refactor.
If it's a quick fix for you, feel free to push a commit to this PR and I'll review.

@pranavm-nvidia

Left a suggestion with the change. One thing I'm not sure about - does your onnx_utils.save_onnx do anything special besides saving the model? On quick inspection, it seems like it's also setting a custom IR version. If that's still required, you'll probably need to add a line like:

modified_model.ir_version = 10

prior to calling Polygraphy's save_onnx.

try:
    # Try to estimate the size by serializing the model.
    # If it fails or exceeds 2GB, we need the file-based approach.
    model_size = len(self.model.SerializeToString())


Suggested change
-    model_size = len(self.model.SerializeToString())
+    model_size = model.ByteSize()
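
The distinction matters for the 2 GB case: ByteSize() computes the encoded size without materializing the bytes, whereas SerializeToString() raises once the message crosses protobuf's hard limit. A hedged sketch (threshold name illustrative):

import onnx

TWO_GB = 2 * 1024**3

def needs_file_based_path(model: onnx.ModelProto) -> bool:
    # ByteSize() walks the proto and sums field sizes without allocating the
    # serialized buffer, so it works even for models over the 2 GB limit.
    return model.ByteSize() > TWO_GB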


def _get_ort_runner(self, model):
    import onnxruntime as ort
    from polygraphy.backend.onnx import BytesFromOnnx

@pranavm-nvidia pranavm-nvidia Jan 20, 2026


Suggested change
-    from polygraphy.backend.onnx import BytesFromOnnx
+    from polygraphy.backend.onnx import BytesFromOnnx, save_onnx

Comment on lines +151 to +170
if has_external_data:
    logger.debug("Model has external data, using file-based approach")
    # Get the actual ONNX ModelProto from ModifyOutputs wrapper
    modified_model = model()

    # Use a persistent temp file to handle external data files properly
    tmp_file = tempfile.NamedTemporaryFile(suffix=".onnx", delete=False)
    tmp_file.close()
    tmp_file_path = tmp_file.name
    onnx_utils.save_onnx(modified_model, tmp_file_path, save_as_external_data=True)
    logger.debug(f"Model with all outputs saved to {tmp_file_path}")
    session = ort.InferenceSession(tmp_file_path, providers=self.providers)
    runners = [OnnxrtRunner(lambda: session)]

else:
    # For models without external data, use the original BytesFromOnnx approach (no tmp files)
    logger.debug("Model has no external data, using BytesFromOnnx approach")
    serialize_onnx = BytesFromOnnx(model)
    build_onnxrt_session = SessionFromOnnx(serialize_onnx, providers=self.providers)
    runners = [OnnxrtRunner(build_onnxrt_session)]


Suggested change
-if has_external_data:
-    logger.debug("Model has external data, using file-based approach")
-    # Get the actual ONNX ModelProto from ModifyOutputs wrapper
-    modified_model = model()
-    # Use a persistent temp file to handle external data files properly
-    tmp_file = tempfile.NamedTemporaryFile(suffix=".onnx", delete=False)
-    tmp_file.close()
-    tmp_file_path = tmp_file.name
-    onnx_utils.save_onnx(modified_model, tmp_file_path, save_as_external_data=True)
-    logger.debug(f"Model with all outputs saved to {tmp_file_path}")
-    session = ort.InferenceSession(tmp_file_path, providers=self.providers)
-    runners = [OnnxrtRunner(lambda: session)]
-else:
-    # For models without external data, use the original BytesFromOnnx approach (no tmp files)
-    logger.debug("Model has no external data, using BytesFromOnnx approach")
-    serialize_onnx = BytesFromOnnx(model)
-    build_onnxrt_session = SessionFromOnnx(serialize_onnx, providers=self.providers)
-    runners = [OnnxrtRunner(build_onnxrt_session)]
+if has_external_data:
+    logger.debug("Model has external data, using file-based approach")
+    # Get the actual ONNX ModelProto from ModifyOutputs wrapper
+    modified_model = model()
+    # Use a persistent temp file to handle external data files properly
+    outdir = tempfile.TemporaryDirectory()
+    tmp_file_path = os.path.join(outdir.name, "tmp_model.onnx")
+    save_onnx(modified_model, tmp_file_path, external_data_path="ext.data")
+    logger.debug(f"Model with all outputs saved to {tmp_file_path}")
+    build_onnxrt_session = SessionFromOnnx(tmp_file_path, providers=self.providers)
+else:
+    # For models without external data, use the original BytesFromOnnx approach (no tmp files)
+    logger.debug("Model has no external data, using BytesFromOnnx approach")
+    serialize_onnx = BytesFromOnnx(model)
+    build_onnxrt_session = SessionFromOnnx(serialize_onnx, providers=self.providers)
+runners = [OnnxrtRunner(build_onnxrt_session)]
