
Conversation

@galagam galagam commented Dec 28, 2025

What does this PR do?

Type of change: Bug fix

Overview: Fix AutoCast ReferenceRunner to handle large models.
Models larger than 2 GB cannot be serialized to a string, which is what Polygraphy does under the hood. Use a temporary file instead to save the modified ONNX model with all tensors marked as outputs.
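
A minimal sketch of the workaround, assuming only onnx and onnxruntime (the helper name and error handling are illustrative, not this PR's actual code):

import os
import tempfile

import onnx
import onnxruntime as ort

def build_session(model: onnx.ModelProto, providers) -> ort.InferenceSession:
    # Hypothetical helper: protobuf refuses to serialize messages over 2 GB,
    # so large models must go through a file with weights stored externally.
    try:
        return ort.InferenceSession(model.SerializeToString(), providers=providers)
    except ValueError:
        # Sketch only: the temp dir should be kept alive / cleaned up by the caller.
        tmp_dir = tempfile.mkdtemp()
        path = os.path.join(tmp_dir, "model.onnx")
        onnx.save(model, path, save_as_external_data=True)
        return ort.InferenceSession(path, providers=providers)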

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes
  • Did you write any new necessary tests?: No
  • Did you add or update any necessary documentation?: No
  • Did you update Changelog?: No

Additional Information

Summary by CodeRabbit

  • Improvements
    • Enhanced model processing to better support large ONNX models during validation and runtime execution
    • Added diagnostic logging of model sizes at key processing stages for improved debugging and performance monitoring


copy-pr-bot bot commented Dec 28, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

Comment on lines 131 to 134
has_external_data = any(
    init.HasField("data_location") and init.data_location == onnx.TensorProto.EXTERNAL
    for init in self.model.graph.initializer
)
Contributor

Can we reuse this function to check for external data?

def has_external_data(onnx_model_path: str):

Contributor Author

@ajrasane I'm not sure I want to introduce dependencies from modelopt.torch here just for this.
Since modelopt/torch/_deploy/utils/torch_onnx.py is already importing quite a few utils from modelopt.onnx.utils, how about I move this function to modelopt.onnx.utils and import it in modelopt.torch?
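
For reference, a sketch of what the relocated helper could look like in modelopt.onnx.utils, assuming it keeps the path-based signature quoted above (body illustrative):

import onnx

def has_external_data(onnx_model_path: str) -> bool:
    """Return True if any initializer stores its tensor data in an external file."""
    # load_external_data=False reads only the proto metadata, so this stays
    # cheap even for multi-GB models.
    model = onnx.load_model(onnx_model_path, load_external_data=False)
    return any(
        init.HasField("data_location") and init.data_location == onnx.TensorProto.EXTERNAL
        for init in model.graph.initializer
    )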

Contributor Author

Done

Contributor Author

@ajrasane please revisit - since I edited torch utils, I now need a review from modelopt-torch-deploy-codeowners 🙏

@galagam galagam force-pushed the dev-gagam-narrowing-error branch from 1a0711c to 7a2d91a on January 19, 2026 19:22
coderabbitai bot commented Jan 19, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

The changes enhance ONNX model handling with external data support by introducing detection and branching logic in ReferenceRunner, adding model size logging to PrecisionConverter, and updating utils to support external data workflows with modified validation and return signatures.

Changes

  • Model Size Logging (modelopt/onnx/autocast/precisionconverter.py):
    Added a print_byte_size(label: str) helper method to serialize and log model byte sizes. Updated _sanity_check to log sizes before and after ONNX validation. Changed model validation in convert to an in-place check instead of an assignment.
  • External Data Detection & Handling (modelopt/onnx/autocast/referencerunner.py):
    Introduced a _get_ort_runner(model) method that detects external data requirements (via data_location checks and size thresholds) and branches into either temporary-file-based loading with external data or the in-memory path. Updated the run method to delegate session construction to the new helper. Added imports: tempfile, onnx_utils.
  • Model Validation & Utility Updates (modelopt/onnx/utils.py):
    Changed the check_model signature from returning onnx.ModelProto to returning None; it now performs in-place validation without returning a modified model. Updated external data handling to force the external-data flag in the save workflow. Upgraded the logging level from debug to warning for size/external-data reporting. Added a copy import.
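
A hedged sketch of the described check_model change (in-place validation, None return; the actual body in modelopt/onnx/utils.py may differ):

import os
import tempfile

import onnx

def check_model(model: onnx.ModelProto) -> None:
    # Validates in place and returns None, so callers keep their own
    # reference instead of rebinding to a returned ModelProto.
    if model.ByteSize() < onnx.checker.MAXIMUM_PROTOBUF:
        onnx.checker.check_model(model)
    else:
        # Models over the protobuf limit must be checked from disk.
        with tempfile.TemporaryDirectory() as tmp_dir:
            path = os.path.join(tmp_dir, "model.onnx")
            onnx.save(model, path, save_as_external_data=True)
            onnx.checker.check_model(path)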

Sequence Diagram(s)

sequenceDiagram
    participant Runner as ReferenceRunner
    participant GetRunner as _get_ort_runner()
    participant Model as ONNX Model
    participant FS as File System
    participant ORT as ONNXRuntime

    Runner->>GetRunner: run(model)
    GetRunner->>Model: Check initializers for external data
    alt External Data Detected or Size > 2GB
        GetRunner->>FS: Create temp .onnx file
        GetRunner->>Model: Save with external data enabled
        GetRunner->>FS: Write temp ONNX file
        GetRunner->>ORT: Create InferenceSession from file
        ORT-->>GetRunner: Session ready
    else In-Memory Path
        GetRunner->>ORT: Create Session via BytesFromOnnx
        ORT-->>GetRunner: Session ready
    end
    GetRunner-->>Runner: Return OnnxrtRunner(session)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 2 passed | ❌ 1 failed

❌ Failed checks (1 warning)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 72.73%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)
  • Title check (✅ Passed): The title '[5725362] AutoCast Fixes for models with external data' directly and clearly describes the main change: fixing AutoCast to handle models with external data. This aligns with all three commits and the file changes.
  • Description check (✅ Passed): Check skipped - CodeRabbit's high-level summary is enabled.


@galagam galagam marked this pull request as ready for review January 19, 2026 19:30
@galagam galagam requested review from a team as code owners January 19, 2026 19:31
@galagam galagam requested a review from gcunhase January 19, 2026 19:31
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@modelopt/onnx/autocast/precisionconverter.py`:
- Around line 85-88: Remove the temporary debug method print_byte_size: it uses
print() with a "GAGAM" prefix and lacks a docstring; either delete this method
from the class or replace it with a properly documented utility that uses the
module logger (e.g., logging.getLogger(__name__)) and a D102-compliant
docstring, and if keeping functionality, rename to something descriptive (e.g.,
log_model_byte_size) and call self.model.SerializeToString() to compute size
before logging the result instead of printing.
- Around line 1298-1303: Remove the debug instrumentation by deleting the calls
to self.print_byte_size(...) inside the _sanity_check method and remove the
print_byte_size method definition entirely; ensure _sanity_check only performs
onnx_utils.check_model(self.model) (and any existing sanity_ok handling) without
emitting byte-size prints, and run tests to confirm no remaining references to
print_byte_size remain in the class.

In `@modelopt/onnx/autocast/referencerunner.py`:
- Around line 160-165: The temp ONNX file created with
tempfile.NamedTemporaryFile(delete=False) and passed to
onnx_utils.save_onnx/InferenceSession is never removed; store the path (e.g.,
self._temp_model_path) and ensure cleanup once the session is no longer needed
by either creating the temp inside a TemporaryDirectory context (mirroring
utils.check_model) that you keep open for the session lifetime or add a
dedicated cleanup method (e.g., close()/cleanup_temp_model) that deletes the
file and call it when the runner is disposed; optionally register that cleanup
with atexit as a fallback to avoid accumulating files (see the sketch after this list).

In `@modelopt/onnx/utils.py`:
- Line 661: The line force-setting save_as_external_data = True is a temporary
debug override that bypasses the prior size-based logic; remove this hardcoded
assignment (the "save_as_external_data = True # GAGAM: for debug" statement) so
the function/method that computes save_as_external_data earlier in the scope can
determine the value normally, and ensure no other debug-only overrides remain in
the same function or surrounding block.
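
For the referencerunner.py cleanup item above, a minimal sketch of the TemporaryDirectory variant that ties the temp file's lifetime to the session holder (names hypothetical, mirroring utils.check_model):

import os
import tempfile

import onnx
import onnxruntime as ort

class SessionWithTempModel:
    """Keeps the temp directory alive as long as the session is in use."""

    def __init__(self, model: onnx.ModelProto, providers):
        self._tmp_dir = tempfile.TemporaryDirectory()  # removed on cleanup()
        path = os.path.join(self._tmp_dir.name, "model.onnx")
        onnx.save(model, path, save_as_external_data=True)
        self.session = ort.InferenceSession(path, providers=providers)

    def cleanup(self):
        self._tmp_dir.cleanup()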
🧹 Nitpick comments (3)
modelopt/onnx/utils.py (1)

647-649: Consider reverting the log level from warning to debug.

This message logs model size and external data usage for every save operation. Using warning level will generate noise in production logs for what is essentially informational/diagnostic output. The original debug level was more appropriate unless there's a specific reason to always surface this information.

Suggested change
-        logger.warning(
+        logger.debug(
             f"Model size: {model_size} bytes, using external data: {save_as_external_data}"
         )
modelopt/onnx/autocast/referencerunner.py (2)

131-152: Consider extracting external data detection to a shared utility.

This logic duplicates functionality that exists in modelopt/torch/_deploy/utils/torch_onnx.py (has_external_data and check_model_uses_external_data). Per a past review comment, consider moving the detection logic to modelopt/onnx/utils.py so it can be reused across the codebase.

The current implementation is correct but consolidating would improve maintainability.


166-166: Lambda closure captures session correctly, but consider clarity.

The lambda lambda: session works because session is captured by reference. However, this pattern can be confusing. The OnnxrtRunner expects a callable that returns a session — passing the session directly via a lambda is acceptable but slightly unusual.
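
For comparison, binding the session through a parameter rather than a free variable makes the capture explicit (illustrative only, reusing the OnnxrtRunner and session names from the diff):

def make_session_provider(session):
    # OnnxrtRunner expects a zero-argument callable that returns the session;
    # binding via a parameter avoids late-binding surprises if this code
    # ever moves into a loop.
    return lambda: session

runners = [OnnxrtRunner(make_session_provider(session))]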

Signed-off-by: Gal Hubara Agam <[email protected]>
codecov bot commented Jan 19, 2026

Codecov Report

❌ Patch coverage is 58.97436% with 16 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.15%. Comparing base (391f6cb) to head (00ea80c).
⚠️ Report is 2 commits behind head on main.

Files with missing lines | Patch % | Lines
modelopt/onnx/autocast/referencerunner.py | 54.54% | 15 Missing ⚠️
modelopt/onnx/utils.py | 80.00% | 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #731      +/-   ##
==========================================
- Coverage   74.19%   74.15%   -0.04%     
==========================================
  Files         192      191       -1     
  Lines       19238    19258      +20     
==========================================
+ Hits        14273    14281       +8     
- Misses       4965     4977      +12     

☔ View full report in Codecov by Sentry.

Signed-off-by: Gal Hubara Agam <[email protected]>
Contributor

@gcunhase gcunhase left a comment

LGTM, thanks

@galagam galagam requested a review from a team as a code owner January 20, 2026 15:01
@galagam galagam requested a review from cjluo-nv January 20, 2026 15:01
Signed-off-by: Gal Hubara Agam <[email protected]>
@galagam galagam force-pushed the dev-gagam-narrowing-error branch from 2ae99d0 to 00ea80c on January 20, 2026 15:14
@galagam galagam requested a review from ajrasane January 20, 2026 15:23
Comment on lines +157 to +162
tmp_file = tempfile.NamedTemporaryFile(suffix=".onnx", delete=False)
tmp_file.close()
tmp_file_path = tmp_file.name
onnx_utils.save_onnx(modified_model, tmp_file_path, save_as_external_data=True)
logger.debug(f"Model with all outputs saved to {tmp_file_path}")
session = ort.InferenceSession(tmp_file_path, providers=self.providers)


FYI Polygraphy's SaveOnnx can handle models with external data. Also SessionFromOnnx can accept paths.
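
A minimal sketch of that combination, matching the signatures used in the suggestion further down (model is the ModelProto to serialize; the temp directory must outlive the session):

import os
import tempfile

from polygraphy.backend.onnx import save_onnx
from polygraphy.backend.onnxrt import SessionFromOnnx

out_dir = tempfile.TemporaryDirectory()
path = os.path.join(out_dir.name, "model.onnx")
# save_onnx writes tensor data to a sidecar file when external_data_path is set
save_onnx(model, path, external_data_path="ext.data")
# SessionFromOnnx accepts a path as well as serialized bytes
build_session = SessionFromOnnx(path, providers=["CPUExecutionProvider"])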

Contributor Author

Thanks @pranavm-nvidia. I'll take a look and see if I can refactor.
If it's a quick fix for you, feel free to push a commit to this PR and I'll review.

@pranavm-nvidia

Left a suggestion with the change. One thing I'm not sure about - does your onnx_utils.save_onnx do anything special besides saving the model? On quick inspection, it seems like it's also setting a custom IR version. If that's still required, you'll probably need to add a line like:

modified_model.ir_version = 10

prior to calling Polygraphy's save_onnx.

try:
    # Try to estimate the size by serializing the model.
    # If it fails or exceeds 2GB, we need the file-based approach.
    model_size = len(self.model.SerializeToString())


Suggested change
-    model_size = len(self.model.SerializeToString())
+    model_size = model.ByteSize()
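
The distinction matters for the 2 GB case: ByteSize() computes the encoded size without materializing the bytes, whereas SerializeToString() raises once the message crosses protobuf's hard limit. A hedged sketch (threshold name illustrative):

import onnx

TWO_GB = 2 * 1024**3

def needs_file_based_path(model: onnx.ModelProto) -> bool:
    # ByteSize() walks the proto and sums field sizes without allocating the
    # serialized buffer, so it works even for models over the 2 GB limit.
    return model.ByteSize() > TWO_GB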


def _get_ort_runner(self, model):
    import onnxruntime as ort
    from polygraphy.backend.onnx import BytesFromOnnx

@pranavm-nvidia pranavm-nvidia Jan 20, 2026


Suggested change
-    from polygraphy.backend.onnx import BytesFromOnnx
+    from polygraphy.backend.onnx import BytesFromOnnx, save_onnx

Comment on lines +151 to +170
if has_external_data:
    logger.debug("Model has external data, using file-based approach")
    # Get the actual ONNX ModelProto from ModifyOutputs wrapper
    modified_model = model()

    # Use a persistent temp file to handle external data files properly
    tmp_file = tempfile.NamedTemporaryFile(suffix=".onnx", delete=False)
    tmp_file.close()
    tmp_file_path = tmp_file.name
    onnx_utils.save_onnx(modified_model, tmp_file_path, save_as_external_data=True)
    logger.debug(f"Model with all outputs saved to {tmp_file_path}")
    session = ort.InferenceSession(tmp_file_path, providers=self.providers)
    runners = [OnnxrtRunner(lambda: session)]

else:
    # For models without external data, use the original BytesFromOnnx approach (no tmp files)
    logger.debug("Model has no external data, using BytesFromOnnx approach")
    serialize_onnx = BytesFromOnnx(model)
    build_onnxrt_session = SessionFromOnnx(serialize_onnx, providers=self.providers)
    runners = [OnnxrtRunner(build_onnxrt_session)]


Suggested change
-if has_external_data:
-    logger.debug("Model has external data, using file-based approach")
-    # Get the actual ONNX ModelProto from ModifyOutputs wrapper
-    modified_model = model()
-    # Use a persistent temp file to handle external data files properly
-    tmp_file = tempfile.NamedTemporaryFile(suffix=".onnx", delete=False)
-    tmp_file.close()
-    tmp_file_path = tmp_file.name
-    onnx_utils.save_onnx(modified_model, tmp_file_path, save_as_external_data=True)
-    logger.debug(f"Model with all outputs saved to {tmp_file_path}")
-    session = ort.InferenceSession(tmp_file_path, providers=self.providers)
-    runners = [OnnxrtRunner(lambda: session)]
-else:
-    # For models without external data, use the original BytesFromOnnx approach (no tmp files)
-    logger.debug("Model has no external data, using BytesFromOnnx approach")
-    serialize_onnx = BytesFromOnnx(model)
-    build_onnxrt_session = SessionFromOnnx(serialize_onnx, providers=self.providers)
-    runners = [OnnxrtRunner(build_onnxrt_session)]
+if has_external_data:
+    logger.debug("Model has external data, using file-based approach")
+    # Get the actual ONNX ModelProto from ModifyOutputs wrapper
+    modified_model = model()
+    # Use a persistent temp file to handle external data files properly
+    outdir = tempfile.TemporaryDirectory()
+    tmp_file_path = os.path.join(outdir.name, "tmp_model.onnx")
+    save_onnx(modified_model, tmp_file_path, external_data_path="ext.data")
+    logger.debug(f"Model with all outputs saved to {tmp_file_path}")
+    build_onnxrt_session = SessionFromOnnx(tmp_file_path, providers=self.providers)
+else:
+    # For models without external data, use the original BytesFromOnnx approach (no tmp files)
+    logger.debug("Model has no external data, using BytesFromOnnx approach")
+    serialize_onnx = BytesFromOnnx(model)
+    build_onnxrt_session = SessionFromOnnx(serialize_onnx, providers=self.providers)
+runners = [OnnxrtRunner(build_onnxrt_session)]
