Skip to content

[6058907] Fix ShapeInferenceError in ONNX int8+fp16 quantization of weakly-typed models#1627

Open
ajrasane wants to merge 3 commits into
mainfrom
ajrasane/onnx-stale-shape-typeinfer-fix
Open

[6058907] Fix ShapeInferenceError in ONNX int8+fp16 quantization of weakly-typed models#1627
ajrasane wants to merge 3 commits into
mainfrom
ajrasane/onnx-stale-shape-typeinfer-fix

Conversation

@ajrasane

@ajrasane ajrasane commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

What does this PR do?

Type of change: Bug fix

ONNX INT8 + FP16 quantization (--quantize_mode int8 --high_precision_dtype fp16) crashed with a ShapeInferenceError on weakly-typed models (e.g. TensorFlow exports). Two root causes, both fixed in modelopt/onnx/utils.py:

  • Stale rank-0 output shapes. Such models can declare a graph.output rank that conflicts with the graph topology — typically a leftover rank-0 (scalar) annotation on a tensor that is really rank-2+. A stale rank-0 passes onnx.checker (a scalar is valid) but poisons downstream shape inference: ORT fails while augmenting the model for INT8 calibration (axis must be in [-rank, rank-1]. Input rank was 0), and onnx.shape_inference(strict_mode=True) raises Inferred shape and existing shape differ in rank during FP16 autocast. clear_stale_value_info now reconciles stale output shapes — it clears and re-derives them from the operator graph via ORT symbolic shape inference (falling back to the size-aware infer_shapes wrapper) and adopts the inferred shape. A graph output is never left without a shape field (which onnx.checker requires); an output whose shape cannot be re-derived keeps its original declaration.

  • Ops ONNX static shape inference can't resolve. The same models can contain ops (e.g. TopK) that ONNX's static shape inference gives up on, leaving downstream tensors untyped and breaking AutoCast's type lookups. infer_types now falls back to the schema-based standalone type inferencer when ONNX shape inference raises or leaves tensors untyped, running the fallback on the shape-inferred model so any shapes ONNX did derive are preserved.

Healthy models are unaffected: re-inference reproduces their existing output shapes, and a fully typed graph skips the fallback.

Usage

python -m modelopt.onnx.quantization \
  --quantize_mode int8 --high_precision_dtype fp16 \
  --onnx_path model.onnx --output_path model_int8_fp16.onnx

Testing

  • Added CPU-only unit tests in tests/unit/onnx/test_onnx_utils.py: stale rank-0 output reconciliation, preservation of a valid output shape, and the type-inference fallback on a TopK-overflow model. Run: CUDA_VISIBLE_DEVICES="" pytest tests/unit/onnx/test_onnx_utils.py.
  • Verified end-to-end on a weakly-typed object-detection model that previously failed: the command now completes and produces a valid quantized FP16 model (onnx.checker passes, ORT loads it, QuantizeLinear/DequantizeLinear nodes inserted, FP16 initializers present, all graph-output ranks correct).
  • Full tests/unit/onnx suite: no new failures versus the base branch.

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

  • Is this change backward compatible?: ✅
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A
  • Did you write any new necessary tests?: ✅
  • Did you update Changelog?: ✅

Summary by CodeRabbit

  • Bug Fixes

    • Improved mixed-precision/FP16 conversions to prevent stale or missing shape/type metadata. Output tensor rank/shape declarations are reconciled after clearing stale info, and stricter ONNX shape/type inference now falls back to standalone type inference when certain ops can’t be resolved.
  • Tests

    • Expanded ONNX utility and autocast test coverage for fixing rank-0 output shapes, preserving valid static and dynamic dim_param output declarations, and verifying strict-mode fallback for unresolved TopK during FP16 conversion.

@copy-pr-bot

copy-pr-bot Bot commented Jun 4, 2026

Copy link
Copy Markdown

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@coderabbitai

coderabbitai Bot commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Run ONNX shape inference in strict mode with a try/except fallback to standalone type inference; add output-shape reconciliation to fix stale output declarations; extract shape metadata clearing to a shared utility; integrate strict-mode inference into AutoCast conversions; add unit tests and a changelog entry.

Changes

Shape inference robustness and reconciliation

Layer / File(s) Summary
Shape metadata clearing refactoring
modelopt/onnx/autocast/utils.py, modelopt/onnx/autocast/precisionconverter.py
Extract recursive type/shape clearing logic from PrecisionConverter._clear_types_and_shapes_recursive() into a shared utility clear_types_and_shapes_recursive() that traverses main graphs and subgraphs, clearing types and optionally shapes while preserving only subgraph-produced intermediates in value_info.
Core shape/type inference refactoring
modelopt/onnx/utils.py
infer_types() wraps ONNX shape inference in try/except and falls back to _infer_types_only(model) on errors; added _reconcile_stale_output_shapes() to re-derive and patch stale graph.output shapes using ORT symbolic inference with fallback; clear_stale_value_info() includes fixed shape count in its return value.
AutoCast strict-mode integration
modelopt/onnx/autocast/convert.py
convert_to_mixed_precision() and convert_to_f16() call infer_types(..., strict_mode=True) to ensure strict ONNX shape inference with fallback when unresolvable ops are encountered.
Test coverage for inference and reconciliation
tests/unit/onnx/test_onnx_utils.py, tests/unit/onnx/autocast/test_autocast.py
Added tests verifying stale rank-0 output reconciliation to correct ranks, preservation of valid shapes and dim_param names, fallback to standalone type inference when ONNX shape inference fails (TopK overflow), and AutoCast conversion with unresolvable ops.
Changelog documentation
CHANGELOG.rst
Adds a bugfix note describing the ShapeInferenceError fix for ONNX INT8+FP16 quantization on weakly-typed models and the shape reconciliation and strict-mode inference changes.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • jenchen13
  • cjluo-nv
  • gcunhase
🚥 Pre-merge checks | ✅ 5 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly and precisely summarizes the main fix: addressing ShapeInferenceError in ONNX INT8+FP16 quantization on weakly-typed models, which is the central objective of this PR.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Security Anti-Patterns ✅ Passed No security anti-patterns detected in PR. No torch.load, numpy.load with allow_pickle, trust_remote_code, eval/exec on input, nosec bypasses, or new PIP dependencies introduced. All code uses prope...

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch ajrasane/onnx-stale-shape-typeinfer-fix

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

github-actions Bot commented Jun 4, 2026

Copy link
Copy Markdown
Contributor
PR Preview Action v1.8.1

QR code for preview link

🚀 View preview at
https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1627/

Built to branch gh-pages at 2026-06-04 21:58 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

@codecov

codecov Bot commented Jun 4, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 76.62338% with 18 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.22%. Comparing base (ca7eb64) to head (20562db).
⚠️ Report is 76 commits behind head on main.

Files with missing lines Patch % Lines
modelopt/onnx/utils.py 78.43% 11 Missing ⚠️
modelopt/onnx/autocast/utils.py 69.56% 7 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1627      +/-   ##
==========================================
- Coverage   77.39%   77.22%   -0.17%     
==========================================
  Files         482      482              
  Lines       52960    53090     +130     
==========================================
+ Hits        40986    40998      +12     
- Misses      11974    12092     +118     
Flag Coverage Δ
examples 42.88% <6.49%> (+0.88%) ⬆️
gpu 59.44% <68.83%> (-0.57%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@ajrasane ajrasane force-pushed the ajrasane/onnx-stale-shape-typeinfer-fix branch 2 times, most recently from 6d53f4d to e6576a6 Compare June 4, 2026 19:40

@cjluo-nv cjluo-nv left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bot review — DM the bot to share feedback.

Targeted bug fix with three new tests. Two concerns worth a human look before merging:

  1. Dynamic-dim collateral damage in _reconcile_stale_output_shapes. The helper runs unconditionally on every model that goes through _preprocess_onnxclear_stale_value_info (not just weakly-typed exports), and it calls ORT's SymbolicShapeInference.infer_shapes(model, auto_merge=True, guess_output_rank=True). With those flags, ORT routinely rewrites user-named dim_params (e.g. batch, seq_len) into internal unk__N symbols, or commits a guessed rank when it can't fully propagate. The PR description says "re-inference reproduces their existing output shapes" and "healthy models are unaffected", but the bytes-level _sig comparison will treat a batchunk__0 rename as a change and overwrite the declared shape, which can break downstream consumers that look up output dims by name (export pipelines, TRT profile shapes, etc.). The added tests only cover fully static shapes ([3,4] / [3,5] / [1,3]), so this case is untested. Worth either (a) skipping the overwrite when only dim_param names differ but ranks/values agree, (b) only invoking the reconciler when the declared shape is actually inconsistent with the topology (the original motivation), or (c) at minimum adding a dynamic-batch regression test.

  2. strict_mode=True is now the default for infer_types callers in autocast/convert.py. The fallback to _infer_types_only on any exception is a behavior broadening — previously-tolerated weakly-typed graphs now silently route through the standalone path instead. The standalone inferencer is a reasonable fallback, but this is a change in the AutoCast hot path with no new AutoCast-level test (the new test_infer_types_falls_back_to_standalone_when_onnx_fails exercises infer_types directly, not convert_to_mixed_precision / convert_to_f16). Worth confirming the existing AutoCast test suite still passes against a representative weakly-typed model and not just the synthetic TopK fixture.

Fix itself is well-scoped and the docstrings/comments are clear; flagging mainly so an ONNX-pipeline owner can sign off on the dynamic-shape behavior change.

@ajrasane ajrasane marked this pull request as ready for review June 4, 2026 21:21
@ajrasane ajrasane requested review from a team as code owners June 4, 2026 21:22
@ajrasane ajrasane requested a review from galagam June 4, 2026 21:22
@ajrasane ajrasane force-pushed the ajrasane/onnx-stale-shape-typeinfer-fix branch from e6576a6 to c7dc86b Compare June 4, 2026 21:44
@ajrasane

ajrasane commented Jun 4, 2026

Copy link
Copy Markdown
Contributor Author

Thanks for the careful review — addressed both in the latest revision:

1. Dynamic-dim collateral damage. The reconciler now overwrites a declared output shape only when it's genuinely stale (a rank mismatch or a conflicting concrete dim); outputs that merely differ in symbolic dim_param names (e.g. a re-derived unk__0 vs a declared batch) keep their declaration, so healthy models with dynamic dims are left untouched. I also dropped guess_output_rank=True — it's unnecessary for the fix (symbolic inference still derives the affected output ranks without it) and was the source of the guessed-rank risk you noted. Added test_clear_stale_value_info_preserves_dynamic_dim_names as a regression test (declares ["my_batch", 4] where inference derives ["batch", 4] and asserts the declared dim_param survives).

2. strict_mode=True / AutoCast coverage. Added test_convert_to_f16_falls_back_on_unresolvable_op, which exercises the full convert_to_f16 path (used by INT8 + --high_precision_dtype fp16) on a weakly-typed model whose TopK defeats strict ONNX shape inference, asserting it converts via the standalone type-inference fallback rather than crashing. The existing AutoCast suite (tests/unit/onnx/autocast/) still passes, and the original weakly-typed model is verified end-to-end.

…weakly-typed models

Weakly-typed ONNX models (e.g. TensorFlow exports) can carry graph.output
entries whose stored rank disagrees with the graph topology -- most commonly a
leftover rank-0 (scalar) annotation on a tensor that is really rank-2+. A stale
rank-0 passes onnx.checker (a scalar is valid) but poisons downstream shape
inference: ORT fails while augmenting the model for INT8 calibration
("axis must be in [-rank, rank-1]. Input rank was 0") and
onnx.shape_inference(strict_mode=True) raises "Inferred shape and existing shape
differ in rank" during fp16 autocast. Such models can also contain ops (e.g.
TopK) that ONNX's static shape inference cannot resolve, leaving downstream
tensors untyped and breaking AutoCast's type lookups.

Fixes:

- modelopt/onnx/utils.py: clear_stale_value_info now reconciles stale
  graph.output shapes. It snapshots and clears each output shape, re-derives
  shapes from the operator graph via ORT symbolic shape inference (falling back
  to the size-aware infer_shapes wrapper), and overwrites a declaration only when
  it is genuinely stale -- a rank mismatch or a conflicting concrete dim.
  Outputs that merely differ in symbolic dim_param names (e.g. a re-derived
  unk__0 vs a declared batch) keep their declaration, so healthy models with
  dynamic dims are untouched. A graph output is never left without a shape field
  (onnx.checker requires it).

- modelopt/onnx/utils.py: infer_types falls back to the schema-based standalone
  type inferencer when ONNX shape inference fails (e.g. raises in strict mode on
  an op it cannot resolve), so the returned model is still fully typed.

- modelopt/onnx/autocast/convert.py: convert_to_mixed_precision and
  convert_to_f16 now run the initial infer_types in strict mode, so an
  unresolvable op surfaces as an exception (triggering the fallback above)
  instead of silently leaving tensors untyped.

Healthy models are unaffected: strict shape inference succeeds for them and
output-shape reconciliation preserves their declared (incl. dynamic) shapes.

Adds unit tests for stale rank-0 output reconciliation, preservation of a valid
output shape, preservation of named dynamic dims, the infer_types fallback on a
TopK-overflow model, and an AutoCast-level convert_to_f16 fallback test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
@ajrasane ajrasane force-pushed the ajrasane/onnx-stale-shape-typeinfer-fix branch from c7dc86b to cf2d65a Compare June 4, 2026 21:52

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
tests/unit/onnx/autocast/test_autocast.py (1)

345-357: ⚡ Quick win

Assert the strict-inference failure precondition before conversion.

This test currently proves conversion succeeds, but it doesn’t explicitly prove strict ONNX inference fails first (the condition that should trigger fallback). Add a precondition assertion so the test remains a precise fallback regression test.

♻️ Suggested test hardening
 def test_convert_to_f16_falls_back_on_unresolvable_op(weakly_typed_topk_model):
@@
-    converted_model = convert_to_f16(weakly_typed_topk_model, keep_io_types=True)
+    with pytest.raises(Exception):
+        onnx.shape_inference.infer_shapes(weakly_typed_topk_model, strict_mode=True)
+
+    converted_model = convert_to_f16(weakly_typed_topk_model, keep_io_types=True)

As per coding guidelines, “Checked-in tests should document expected behavior [and] protect against regressions.”

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/onnx/autocast/test_autocast.py` around lines 345 - 357, Add a
precondition assertion that strict ONNX inference fails on the
weakly_typed_topk_model before calling convert_to_f16: wrap a call to ONNX
strict inference (e.g., onnx.shape_inference.infer_shapes or infer_types with
strict=True) in pytest.raises(...) to assert it raises for
weakly_typed_topk_model, then proceed to call
convert_to_f16(weakly_typed_topk_model, keep_io_types=True) as before; reference
the test function name test_convert_to_f16_falls_back_on_unresolvable_op and the
fixture weakly_typed_topk_model so the failure-before-conversion condition is
explicit.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@tests/unit/onnx/autocast/test_autocast.py`:
- Around line 345-357: Add a precondition assertion that strict ONNX inference
fails on the weakly_typed_topk_model before calling convert_to_f16: wrap a call
to ONNX strict inference (e.g., onnx.shape_inference.infer_shapes or infer_types
with strict=True) in pytest.raises(...) to assert it raises for
weakly_typed_topk_model, then proceed to call
convert_to_f16(weakly_typed_topk_model, keep_io_types=True) as before; reference
the test function name test_convert_to_f16_falls_back_on_unresolvable_op and the
fixture weakly_typed_topk_model so the failure-before-conversion condition is
explicit.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 34432f62-d82c-4657-b227-1e7dfd5a3692

📥 Commits

Reviewing files that changed from the base of the PR and between c7dc86b and cf2d65a.

📒 Files selected for processing (5)
  • CHANGELOG.rst
  • modelopt/onnx/autocast/convert.py
  • modelopt/onnx/utils.py
  • tests/unit/onnx/autocast/test_autocast.py
  • tests/unit/onnx/test_onnx_utils.py
✅ Files skipped from review due to trivial changes (1)
  • CHANGELOG.rst
🚧 Files skipped from review as they are similar to previous changes (3)
  • modelopt/onnx/autocast/convert.py
  • tests/unit/onnx/test_onnx_utils.py
  • modelopt/onnx/utils.py

@cjluo-nv cjluo-nv left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bot review — DM the bot to share feedback.

Both critical concerns from the previous review have been addressed:

  1. Dynamic-dim collateral damage (cjluo-nv): _is_stale now only flags rank mismatches or conflicting concrete dims; symbolic dim_param renames are intentionally ignored. guess_output_rank=True was also dropped. The new test_clear_stale_value_info_preserves_dynamic_dim_names test declares [my_batch, 4] while inference would derive [batch, 4] and asserts my_batch is preserved — exactly the regression case raised before.

  2. strict_mode=True AutoCast hot-path: The new test_convert_to_f16_falls_back_on_unresolvable_op exercises the full convert_to_f16 entry point (not just infer_types) on a weakly-typed TopK model, confirming the standalone fallback fires end-to-end.

The local SymbolicShapeInference import inside _reconcile_stale_output_shapes is justified (optional onnxruntime dep with a graceful fallback to infer_shapes). Fix is well-scoped, no design-review concerns (additive fallback paths only, no new subsystem).

Complex PR: spans 5 directories (≥ 5). Looping in a human for approval.

Comment thread modelopt/onnx/utils.py
Comment on lines +1878 to +1879
def _reconcile_stale_output_shapes(model: onnx.ModelProto) -> int:
"""Re-derive stale ``graph.output`` shapes from the operator graph.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ajrasane There's similar logic in PrecisionConverter::_clear_types_and_shapes_recursive. In which we don't check what's stale and what's not, just clear everything since we're going to infer types (and optionally shapes) from the graph.
Perhaps we can extract it to utils + add the fallback to standalone type inference on shape inference expcetion?

def _clear_types_and_shapes_recursive(
self, graph: onnx.GraphProto, is_subgraph: bool = False
) -> None:
"""Recursively clear type/shape information for a graph and all its subgraphs.
If use_standalone_type_inference is True, we clear only types, not shapes.
For subgraphs, input types/shapes are cleared, so that the input types/shapes are propagated
from the main graph.
Args:
graph: The ONNX graph to clear types and shapes for.
is_subgraph: Whether this is a subgraph (True) or the main graph (False).
"""
def _clear_callback(g: onnx.GraphProto, parent: onnx.NodeProto, is_sub: bool) -> None:
logger.debug(
f"Clearing types/shapes in {'subgraph' if is_sub else 'main graph'}: {g.name}"
)
# Clear type/shape information for inputs (only for subgraphs, not main graph inputs)
if is_sub:
for inp in g.input:
if inp.type.HasField("tensor_type"):
inp.type.tensor_type.elem_type = onnx.TensorProto.UNDEFINED
if not self.use_standalone_type_inference:
for idx, d in enumerate(inp.type.tensor_type.shape.dim):
if d.dim_value:
inp.type.tensor_type.shape.dim[idx].dim_param = "unk"
if is_sub:
# Identify which tensors are produced by nodes in this subgraph
subgraph_outputs = set()
for node in g.node:
subgraph_outputs.update(node.output)
# Clear value_info only for intermediates produced by nodes in this subgraph
for vi in g.value_info:
if vi.name in subgraph_outputs:
vi.type.tensor_type.elem_type = onnx.TensorProto.UNDEFINED
if not self.use_standalone_type_inference:
for idx, d in enumerate(vi.type.tensor_type.shape.dim):
if d.dim_value:
vi.type.tensor_type.shape.dim[idx].dim_param = "unk"
else:
for vi in g.value_info:
vi.type.tensor_type.elem_type = onnx.TensorProto.UNDEFINED
for idx, d in enumerate(vi.type.tensor_type.shape.dim):
if d.dim_value:
vi.type.tensor_type.shape.dim[idx].dim_param = "unk"
# Clear outputs for both main graph and subgraphs
for out in g.output:
out.type.tensor_type.elem_type = onnx.TensorProto.UNDEFINED
if not self.use_standalone_type_inference:
for idx, d in enumerate(out.type.tensor_type.shape.dim):
if d.dim_value:
out.type.tensor_type.shape.dim[idx].dim_param = "unk"
utils.walk_subgraphs_recursive(graph, _clear_callback, is_subgraph=is_subgraph)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @galagam — good call, addressed in 7374672.

1. Extracted the clear logic to utils. _clear_types_and_shapes_recursive is now a module-level clear_types_and_shapes_recursive(graph, clear_shapes=...) in autocast/utils.py, right next to walk_subgraphs_recursive (its only dependency). PrecisionConverter calls it with clear_shapes=not self.use_standalone_type_inference. I kept it in the autocast utils rather than the general onnx/utils.py because the latter can only reach walk_subgraphs_recursive through the lazy import it already does in _infer_types_only — moving it there would force the same circular-import dance.

2. Fallback to standalone type inference on shape-inference exception is centralized in onnx_utils.infer_types (the try/except in this PR). The AutoCast precision-conversion path already routes through it via infer_types(..., strict_mode=True), so it gets the fallback for free.

One thing I deliberately did not fold into the wholesale clear: the output-shape reconciliation (_reconcile_stale_output_shapes). clear_types_and_shapes_recursive only renames concrete dims to unk and sets elem_type=UNDEFINED — it never changes a tensor's rank. The bug here is a stale rank-0 output on a tensor that's really rank-2+, so clear-and-reinfer leaves the rank-0 in place and it re-poisons inference. We also can't just ClearField("shape") the output: a graph output with no shape fails onnx.checker, and the type-only fallback won't repopulate it. So reconciliation re-derives a present, correct-rank shape (ORT symbolic inference → infer_shapes) and overwrites only genuinely-stale declarations.

@kevalmorabia97 kevalmorabia97 added the cherry-pick-0.45.0 After code freeze, cherry-pick to release branch for next rc (bulk update). Only for bug fixes / doc label Jun 7, 2026
…utils

Move the recursive type/shape clearing out of PrecisionConverter into
modelopt/onnx/autocast/utils.py as a shared clear_types_and_shapes_recursive()
next to walk_subgraphs_recursive; PrecisionConverter now delegates to it.
Behavior is unchanged (clear_shapes = not use_standalone_type_inference).

The standalone type-inference fallback on shape-inference failure already
lives in infer_types(), which the AutoCast path exercises via strict_mode=True.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
@ajrasane ajrasane requested a review from a team as a code owner June 16, 2026 15:57
@ajrasane ajrasane requested review from jenchen13 and meenchen June 16, 2026 15:57

@cjluo-nv cjluo-nv left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bot review — DM the bot to share feedback.

Re-review of the ONNX INT8+FP16 ShapeInferenceError fix (7 files, +338/-74). All critical concerns from the prior review have been addressed:

  • 💬 Dynamic-dim collateral damage (cjluo-nv, critical): Resolved. _is_stale now overwrites a declared output shape only on a rank mismatch or a conflicting concrete dim; symbolic dim_param renames are intentionally ignored, and guess_output_rank=True was dropped. New test_clear_stale_value_info_preserves_dynamic_dim_names declares ["my_batch", 4] vs inferred ["batch", 4] and asserts the declaration survives — exactly the regression case raised.
  • 💬 strict_mode=True AutoCast hot path (cjluo-nv, critical): Resolved. test_convert_to_f16_falls_back_on_unresolvable_op exercises the full convert_to_f16 entry point on a weakly-typed TopK model, confirming the standalone fallback fires end-to-end (not just infer_types in isolation).
  • 💬 Duplicate clear logic (galagam, important): Resolved. _clear_types_and_shapes_recursive was extracted to a module-level clear_types_and_shapes_recursive(graph, clear_shapes=...) in autocast/utils.py; PrecisionConverter now calls it with clear_shapes=not self.use_standalone_type_inference, and the shape-inference fallback is centralized in infer_types. Author gave a clear rationale for keeping _reconcile_stale_output_shapes separate (clear-and-reinfer never repairs a stale rank). I confirmed the old method definition and call site are gone in the PR branch and replaced by the shared helper.

Remaining minor item: the coderabbit nitpick to assert strict ONNX inference fails before convert_to_f16 (making the fallback precondition explicit) was not adopted; the author's success-via-fallback assertion is a reasonable alternative.

Flagging for human sign-off because this changes a shared AutoCast code path (strict_mode default + new output-shape reconciler running on every model through clear_stale_value_info, not just weakly-typed exports) and the end-to-end / object-detection validation was done manually on a private model not in CI. New tests are CPU-only and synthetic; an ONNX-pipeline owner should confirm the existing AutoCast suite + a representative dynamic-shape/real model still behave correctly.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Warning

CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.

Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.

👉 Steps to fix this

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@modelopt/onnx/autocast/utils.py`:
- Around line 169-173: The code in the loop iterating over g.value_info is
unconditionally modifying shape dimensions by rewriting dim_value to dim_param
without respecting the clear_shapes parameter. Add a conditional check so that
the shape dimension rewriting (the nested loop that iterates over
vi.type.tensor_type.shape.dim and sets dim_param to "unk") only executes when
clear_shapes=True, ensuring the function respects its contract of only clearing
shapes when explicitly requested.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 37d0d224-630b-4dbf-a8e3-ea060f8e35d2

📥 Commits

Reviewing files that changed from the base of the PR and between cf2d65a and 7374672.

📒 Files selected for processing (2)
  • modelopt/onnx/autocast/precisionconverter.py
  • modelopt/onnx/autocast/utils.py

Comment thread modelopt/onnx/autocast/utils.py Outdated
Comment on lines +169 to +173
for vi in g.value_info:
vi.type.tensor_type.elem_type = onnx.TensorProto.UNDEFINED
for idx, d in enumerate(vi.type.tensor_type.shape.dim):
if d.dim_value:
vi.type.tensor_type.shape.dim[idx].dim_param = "unk"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

clear_shapes is ignored for main-graph value_info dims.

Line 171 currently rewrites shape dims even when clear_shapes=False, which contradicts the function contract and can erase shape metadata in standalone type-inference mode.

Suggested patch
         else:
             for vi in g.value_info:
                 vi.type.tensor_type.elem_type = onnx.TensorProto.UNDEFINED
-                for idx, d in enumerate(vi.type.tensor_type.shape.dim):
-                    if d.dim_value:
-                        vi.type.tensor_type.shape.dim[idx].dim_param = "unk"
+                if clear_shapes:
+                    for idx, d in enumerate(vi.type.tensor_type.shape.dim):
+                        if d.dim_value:
+                            vi.type.tensor_type.shape.dim[idx].dim_param = "unk"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@modelopt/onnx/autocast/utils.py` around lines 169 - 173, The code in the loop
iterating over g.value_info is unconditionally modifying shape dimensions by
rewriting dim_value to dim_param without respecting the clear_shapes parameter.
Add a conditional check so that the shape dimension rewriting (the nested loop
that iterates over vi.type.tensor_type.shape.dim and sets dim_param to "unk")
only executes when clear_shapes=True, ensuring the function respects its
contract of only clearing shapes when explicitly requested.

Trim the docstring to the module's density and factor the repeated
elem_type/shape clearing into a local helper. No functional change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
@ajrasane ajrasane force-pushed the ajrasane/onnx-stale-shape-typeinfer-fix branch from 7e4cfe9 to 20562db Compare June 16, 2026 16:29

@cjluo-nv cjluo-nv left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bot review — DM the bot to share feedback.

Re-review of the ONNX INT8+FP16 ShapeInferenceError fix (7 files, +318/-74). All prior critical comments are resolved; flagging for human ONNX-pipeline sign-off because this touches a shared AutoCast hot path and E2E validation was manual/private.

Prior comments — status:

  • 💬 Dynamic-dim collateral damage (cjluo-nv, critical): Resolved. _is_stale now overwrites a declared output shape only on a rank mismatch or a conflicting concrete dim; symbolic dim_param renames are ignored and guess_output_rank=True was dropped. New test_clear_stale_value_info_preserves_dynamic_dim_names declares ["my_batch", 4] vs inferred ["batch", 4] and asserts the declaration survives — exactly the regression raised. Still warrants human sign-off because the reconciler runs on every model through clear_stale_value_info, not just weakly-typed exports, and CI coverage is synthetic CPU-only.
  • 💬 strict_mode=True AutoCast hot path (cjluo-nv, critical): Resolved. test_convert_to_f16_falls_back_on_unresolvable_op exercises the full convert_to_f16 entry point on a weakly-typed TopK model, confirming the standalone fallback fires end-to-end — still flagging because the strict_mode default broadening reroutes previously-tolerated graphs and only a synthetic fixture exercises it.
  • 💬 Duplicate clear logic (galagam, important): Resolved. _clear_types_and_shapes_recursive was extracted to module-level clear_types_and_shapes_recursive(graph, clear_shapes=...) in autocast/utils.py; old method and call site removed (verified). Author gave a clear rationale for keeping _reconcile_stale_output_shapes separate (clear-and-reinfer never repairs a stale rank).
  • CodeRabbit nitpick (assert strict inference fails before convert): Minor, not adopted; success-via-fallback assertion is a reasonable equivalent.

Verified — not a regression (re CodeRabbit's "clear_shapes ignored for main-graph value_info"): I diffed the extracted helper against the original. The main-graph value_info branch cleared shapes unconditionally in the original too; the author preserved that with clear_shape=True. CodeRabbit's suggested patch would actually change behavior. The only wart is that the docstring's clear_shapes contract now reads as if it gates all shape clearing when it doesn't for main-graph value_info — non-blocking.

Answer to "how to simplify while keeping functionality": A few options, all optional: (1) _reconcile_stale_output_shapes does declared→clear→infer→compare with _sig bytes-comparison and a separate _is_stale; since _is_stale already encodes the overwrite policy, the trailing _sig(decl) != _sig(new_shape) change-count could be derived directly from whether _is_stale fired, dropping the _sig helper. (2) The two inference attempts (SymbolicShapeInference then infer_shapes) duplicate the output-shape extraction dict-comprehension — extract a small _outputs_with_shapes(model) helper. (3) The reconciler currently runs on every model; gating it on "only when at least one declared output has a rank that conflicts with topology" (the original motivation) would both simplify the common-case path and shrink blast radius — but that's a behavior trade-off the owner should weigh, not a free simplification.

No licensing concerns (standard NVIDIA header). No prompt-injection in the PR content.

@kevalmorabia97 kevalmorabia97 removed the cherry-pick-0.45.0 After code freeze, cherry-pick to release branch for next rc (bulk update). Only for bug fixes / doc label Jun 17, 2026
Comment thread CHANGELOG.rst

**Bug Fixes**

- Fix ``ShapeInferenceError`` during ONNX INT8 + FP16 quantization (``--high_precision_dtype fp16``) of weakly-typed models (e.g. TensorFlow exports) that carry stale rank-0 ``graph.output`` shapes or ops such as ``TopK`` that ONNX's static shape inference cannot resolve. ``clear_stale_value_info`` now reconciles stale output shapes via symbolic shape inference (keeping every output's shape field populated), and AutoCast runs ONNX shape inference in strict mode and falls back to schema-based standalone type inference when it fails, so unresolved ops no longer leave tensors untyped.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

goes in 0.46 section

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants