Add YAML based AutoQuantize recipe (currently only CLI is supported) by juhi10071998 · Pull Request #1523 · NVIDIA/Model-Optimizer

juhi10071998 · 2026-05-21T00:38:11Z

Add AutoQuantize YAML based recipe support to `mtq.auto_quantize`

What does this PR do?

Type of change: New feature.

Extends the recipe system (PR #1423) to support mtq.auto_quantize. Users
can now run autoquant via a single --recipe <name> flag instead of
combining --auto_quantize_bits, --qformat, --auto_quantize_method,
etc. The recipe carries the full search spec — candidate formats, budget,
scoring method, KV cache scheme — as a typed YAML.

Mirrors the existing PTQ recipe pattern (PR #1423): recipe is authoritative
for the search; CLI flags supply runtime concerns (dataset, calib size,
batch size).

Usage

# Before:
python hf_ptq.py --pyt_ckpt_path Qwen/Qwen3-8B \
    --qformat nvfp4,fp8 --auto_quantize_bits 4.8 \
    --auto_quantize_method gradient --kv_cache_qformat fp8_cast \
    --calib_size 512 --export_path ./out

# After:
python hf_ptq.py --pyt_ckpt_path Qwen/Qwen3-8B \
    --recipe general/auto_quantize/nvfp4_fp8_at_4p8bits-kv_fp8_cast \
    --calib_size 512 --export_path ./out

Example recipe (modelopt_recipes/general/auto_quantize/nvfp4_fp8_at_4p8bits-kv_fp8_cast.yaml):

imports:
  nvfp4: configs/ptq/presets/model/nvfp4
  fp8: configs/ptq/presets/model/fp8
  kv_fp8_cast: configs/ptq/units/kv_fp8_cast

metadata:
  recipe_type: auto_quantize
  description: Mixed NVFP4 + FP8 at 4.8 effective bits with FP8 KV cache (cast).

auto_quantize:
  constraints:
    effective_bits: 4.8

  candidate_formats:
    - $import: nvfp4
    - $import: fp8

  kv_cache:
    quant_cfg:
      - $import: kv_fp8_cast

  method: gradient
  num_score_steps: 128

  disabled_layers:
    - "*lm_head*"

Key design points

#	Decision	Choice
1	`constraints` shape	Mirror upstream `mtq.auto_quantize` nested dict exactly — zero-transformation dispatch via `.model_dump(exclude_none=True)`. Future-compat with PR #1497 (cost models).
2	KV cache placement	Top-level optional `kv_cache.qformat` field, not per-candidate `$import` (avoids duplication when KV is shared across candidates).
3	CLI override policy	Recipe is strict-authoritative for LP search fields (`effective_bits`, candidates, etc.). CLI may fall back only for orthogonal post-step fields — today only `kv_cache.qformat`. `--auto_quantize_bits + --recipe` errors out explicitly.
4	`auto_quantize()` helper layout	Helper is a leaf orchestrator — does not know whether inputs came from CLI or recipe. All resolution happens at the dispatch site in `quantize_main`.

Testing

Unit tests (tests/unit/recipe/test_loader.py, 7 tests):

Built-in recipe loads, type dispatch is correct
Pydantic defaults applied (method=gradient, num_score_steps=128, score_checkpoint=None)
$imported candidates byte-identical to mtq.NVFP4_DEFAULT_CFG / FP8_DEFAULT_CFG (single source of truth)
Schema validation rejects: missing auto_quantize section, <2 candidates, effective_bits outside (0, 16]
kv_cache field is optional
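
The validation rules listed above can be sketched without any dependencies. This is a minimal stand-in that uses stdlib dataclasses rather than the Pydantic models the PR adds; the class names mirror the PR, but the field layout and error messages here are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AutoQuantizeConstraints:
    """Stand-in for the PR's Pydantic model (assumed shape)."""

    effective_bits: float

    def __post_init__(self):
        # Accept only budgets in the half-open interval (0, 16].
        if not (0 < self.effective_bits <= 16):
            raise ValueError(
                f"effective_bits must be in (0, 16], got {self.effective_bits}"
            )


@dataclass
class AutoQuantizeConfig:
    """Stand-in for the PR's Pydantic model (assumed shape)."""

    constraints: AutoQuantizeConstraints
    candidate_formats: list
    # Defaults quoted in the PR description.
    method: str = "gradient"
    num_score_steps: int = 128
    score_checkpoint: Optional[str] = None
    kv_cache: Optional[dict] = None  # optional section

    def __post_init__(self):
        # A search over fewer than two candidates is not a search.
        if len(self.candidate_formats) < 2:
            raise ValueError("candidate_formats needs at least two entries")


cfg = AutoQuantizeConfig(AutoQuantizeConstraints(4.8), ["nvfp4", "fp8"])
```

With this sketch, `AutoQuantizeConstraints(0)`, `AutoQuantizeConstraints(16.5)`, and a config with a single candidate all raise `ValueError`, while omitting `kv_cache` is accepted.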

Equivalence smoke on Qwen/Qwen3-8B at --calib_size 512:

                            CLI (--auto_quantize_bits 6.0)   Recipe (effective_bits: 6.0)
quant_algo                  MIXED_PRECISION                  MIXED_PRECISION
kv_cache_quant_algo         FP8                              FP8
Total quantized layers      252                              252
NVFP4 layers                157                              157
FP8 layers                  95                               95
hf_quant_config.json hash   4b0564bf1f613132                 4b0564bf1f613132

hf_quant_config.json is byte-identical between the two paths.
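
A byte-identity check like the one in the table can be reproduced with a short script. This is a sketch only: the PR does not say which hash produced the 16-character value, so truncated SHA-256 is an assumption, and `short_digest` and `same_config` are hypothetical helper names.

```python
import hashlib
from pathlib import Path


def short_digest(path, length=16):
    """Return the first `length` hex characters of the file's SHA-256 digest.

    Assumption: the PR's hash column may use a different algorithm.
    """
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()[:length]


def same_config(path_a, path_b):
    """True when both files hash to the same truncated digest."""
    return short_digest(path_a) == short_digest(path_b)
```

For example, `same_config("out_cli/hf_quant_config.json", "out_recipe/hf_quant_config.json")` would compare the two export directories; both paths are placeholders.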

Backward compatibility

✅ Yes. All four existing flows preserved:

Flow	Path	Status
CLI PTQ (`--qformat nvfp4`)	unchanged	✓
CLI autoquant (`--auto_quantize_bits 4.8`)	dispatch site resolves args inline; helper is pure orchestrator now (no behavior change)	✓
PTQ recipe (`--recipe general/ptq/...`)	recipe-load gate widened to accept PTQ + AutoQuantize	✓
AutoQuantize recipe (NEW)	new dispatch branch	✓

One new explicit error: --auto_quantize_bits + --recipe (previously would silently honor recipe). Fails fast with a clear message.
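
The fail-fast behavior can be sketched with `argparse`. The flag names come from the PR; the parser layout and the error text are assumptions, not the code in `hf_ptq.py`.

```python
import argparse


def parse_args(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument("--recipe", default=None)
    parser.add_argument("--auto_quantize_bits", type=float, default=None)
    # Runtime concern: stays on the CLI even when a recipe is used.
    parser.add_argument("--calib_size", type=int, default=512)
    args = parser.parse_args(argv)

    # The recipe owns the search budget, so reject the conflicting flag
    # instead of silently preferring one source.
    if args.recipe is not None and args.auto_quantize_bits is not None:
        parser.error(
            "--auto_quantize_bits cannot be combined with --recipe; "
            "set effective_bits in the recipe instead"
        )
    return args
```

`parser.error` prints the message with usage text and exits with status 2, so the conflict surfaces before any model is loaded.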

Files changed

modelopt/recipe/config.py — Pydantic schema (AutoQuantizeConfig, etc.) + RecipeType.AUTO_QUANTIZE enum + dispatch entry
examples/llm_ptq/hf_ptq.py — dispatch site resolves recipe/CLI knobs and passes them to auto_quantize() as kw-only kwargs; helper signature is pure value-driven
modelopt_recipes/general/auto_quantize/nvfp4_fp8_at_4p8bits-kv_fp8_cast.yaml — example recipe
tests/unit/recipe/test_loader.py — 7 unit tests

Checklist

Backward compatible
Signed commits (git commit -s -S)
Pre-commit clean
New unit tests added
CHANGELOG (n/a — new feature, but maintainer to decide)
Claude review (/claude review)

Summary by CodeRabbit

New Features
- First-class auto-quantize recipe type and a built-in mixed-precision recipe targeting ~4.8 effective bits; calibration can include labels for gradient-based auto-quantize and recipes may override KV-cache qformat.
Bug Fixes
- Unified recipe- and CLI-driven auto-quantize flows with earlier validation, clearer incompatible-option rejection, and broader batch-size probing to trigger recipe-driven runs; auto-quantize now receives resolved tuning knobs explicitly.
Documentation
- Clarified help text on budget ownership and KV-cache override behavior.
Tests
- Added tests for recipe loading, defaults, validation, and error cases.

coderabbitai · 2026-05-21T00:38:25Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

📝 Walkthrough

Walkthrough

Adds an AutoQuantize recipe type with Pydantic schemas, an example recipe, recipe-aware calibration and CLI integration, refactors the auto_quantize orchestrator to accept resolved knobs, maps recipe candidate formats to canonical presets, and adds unit tests for recipe loading and validation.

Changes

AutoQuantize Recipe Feature

Layer / File(s)	Summary
AutoQuantize Recipe Schema and Types `modelopt/recipe/config.py`	Introduces `RecipeType.AUTO_QUANTIZE` and adds `AutoQuantizeKVCache`, `AutoQuantizeConstraints`, `AutoQuantizeConfig`, and `ModelOptAutoQuantizeRecipe` Pydantic models with validators enforcing `effective_bits` range `(0,16]` and requiring at least two `candidate_formats`. Updates `RECIPE_TYPE_TO_CLASS`.
AutoQuantize Example Recipe `modelopt_recipes/general/auto_quantize/nvfp4_fp8_at_4p8bits-kv_fp8_cast.yaml`	Adds an NVFP4+FP8 mixed-precision AutoQuantize recipe targeting 4.8 effective bits, gradient scoring with 128 steps, KV-cache `fp8_cast`, and disabled `lm_head` pattern.
Recipe-Aware Calibration Setup `examples/llm_ptq/hf_ptq.py`	Expands recipe imports, adds optional `recipe` parameter to `make_calib_dataloader()`, and updates `include_labels` logic to include labels when either CLI or a `ModelOptAutoQuantizeRecipe` requests gradient-based scoring.
Auto-Quantize Function Refactoring `examples/llm_ptq/hf_ptq.py`	Refactors `auto_quantize()` into a keyword-only orchestrator that accepts resolved knobs (`auto_quantize_method`, `auto_quantize_score_size`, `auto_quantize_checkpoint`, `constraints`, `quantization_formats`, `disabled_layers`, `kv_cache_qformat`) and forwards them to `mtq.auto_quantize()` with KV-cache qformat handling.
Recipe-Driven Auto-Quantize Orchestration `examples/llm_ptq/hf_ptq.py`	Loads and validates recipes in `quantize_main()`, rejects `--auto_quantize_bits` when `--recipe` is provided, broadens batch-size probing to trigger for AutoQuantize recipes, resolves constraints and candidate formats from recipe (mapping dumps back to `QUANT_CFG_CHOICES`) or CLI, resolves KV-cache qformat from recipe or CLI fallback, and invokes the refactored `auto_quantize()`. Updates `--recipe` help text.
AutoQuantize Recipe Tests `tests/unit/recipe/test_loader.py`	Adds imports and tests for loading built-in and custom AutoQuantize recipes, validating `effective_bits`, candidate formats, `kv_cache` parsing/defaults, matching candidate presets against `modelopt.torch.quantization`, and negative cases (missing section, <2 candidates, out-of-range bits, invalid kv_cache qformat).

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Suggested reviewers

sychen52
cjluo-nv
shengliangxu
yeyu-nvidia

🚥 Pre-merge checks | ✅ 5 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 55.56% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (5 passed)

Check name	Status	Explanation
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Security Anti-Patterns	✅ Passed	No security issues found: no torch.load/numpy.load problems, trust_remote_code is safe CLI arg, no eval/exec, no nosec comments, safe YAML parsing, no new dependencies.
Title check	✅ Passed	The title accurately summarizes the primary change: adding YAML-based AutoQuantize recipe support. It is specific, concise, and directly matches the main changeset objective.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

copy-pr-bot · 2026-05-21T00:38:49Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

github-actions · 2026-05-21T00:41:57Z

PR Preview Action v1.8.1
🚀 View preview at https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1523/
Built to branch `gh-pages` at 2026-05-22 22:49 UTC. Preview will be ready when the GitHub Pages deployment is complete.

coderabbitai

Warning

CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.

Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.

👉 Steps to fix this

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@examples/llm_ptq/hf_ptq.py`:
- Line 1087: The conditional that chooses the auto-quantize branch incorrectly
treats falsy numeric values as "unset" — change the check so it explicitly tests
presence of the CLI value: in the expression that currently reads "if
isinstance(recipe, ModelOptAutoQuantizeRecipe) or args.auto_quantize_bits",
replace the truthy check with an explicit presence check for
args.auto_quantize_bits (e.g., use "args.auto_quantize_bits is not None") so
that values like 0 or 0.0 are honored; keep the ModelOptAutoQuantizeRecipe
isinstance check as-is.

In `@modelopt/recipe/config.py`:
- Around line 112-117: The qformat field currently accepts any string but should
be validated against the allowed keys; update the ModeloptField declaration for
qformat (and/or add a pydantic validator on the recipe class handling kv_cache)
to reject values not in KV_QUANT_CFG_CHOICES or the literal 'none' (allowing
None), raising a clear schema/validation error at recipe-load time instead of
allowing a later KeyError; ensure you reference the qformat field,
ModeloptField, and KV_QUANT_CFG_CHOICES when implementing the check so invalid
inputs are caught early.

In `@tests/unit/recipe/test_loader.py`:
- Around line 286-293: The test contains a function-local import "import
modelopt.torch.quantization as mtq" inside
test_load_recipe_autoquantize_candidates_match_presets; move that import to
module scope (top of tests/unit/recipe/test_loader.py) so mtq is imported during
collection rather than inside the test, then remove the local import from the
function and leave the assertions using mtq.NVFP4_DEFAULT_CFG and
mtq.FP8_DEFAULT_CFG unchanged.
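
The first finding above (truthiness versus an explicit presence check) can be illustrated in isolation. The two helper functions are hypothetical and only model the branch condition; they are not code from `hf_ptq.py`.

```python
def takes_autoquant_branch_truthy(recipe_is_autoquant, auto_quantize_bits):
    # Original pattern: a falsy number such as 0.0 looks like "unset".
    return bool(recipe_is_autoquant or auto_quantize_bits)


def takes_autoquant_branch_explicit(recipe_is_autoquant, auto_quantize_bits):
    # Suggested pattern: only None means "flag not provided".
    return recipe_is_autoquant or auto_quantize_bits is not None


# A user who passes 0.0 skips the branch under the truthy check...
skipped = takes_autoquant_branch_truthy(False, 0.0)
# ...but reaches it (and any later validation) under the explicit check.
reached = takes_autoquant_branch_explicit(False, 0.0)
```

With the explicit check, an invalid value like `0.0` reaches the branch and can be rejected with a clear message instead of silently falling through to a different code path.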

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 48396567-3825-425a-b877-f63b60bb6545

📥 Commits

Reviewing files that changed from the base of the PR and between c9098b6 and 75e16d2.

📒 Files selected for processing (4)

examples/llm_ptq/hf_ptq.py
modelopt/recipe/config.py
modelopt_recipes/general/auto_quantize/nvfp4_fp8_at_4p8bits-kv_fp8_cast.yaml
tests/unit/recipe/test_loader.py

codecov · 2026-05-21T00:51:06Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.79%. Comparing base (3ff15cc) to head (635227b).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1523      +/-   ##
==========================================
+ Coverage   76.68%   76.79%   +0.10%     
==========================================
  Files         476      476              
  Lines       51891    51923      +32     
==========================================
+ Hits        39795    39876      +81     
+ Misses      12096    12047      -49

Flag	Coverage Δ
examples	`41.66% <84.84%> (+2.61%)`	⬆️
gpu	`59.51% <84.84%> (-0.61%)`	⬇️
regression	`15.24% <75.75%> (+0.08%)`	⬆️
unit	`52.74% <100.00%> (+0.01%)`	⬆️

realAsma

Proposal looks good; What is the plan to plugin these yaml recipes to auto_quantize API?

juhi10071998 · 2026-05-21T17:20:20Z

Proposal looks good; What is the plan to plugin these yaml recipes to auto_quantize API?

Thanks for the review Asma!
Currently I resolve the recipe at the dispatch site, so CLI and recipe inputs are both resolved there and then passed to the auto_quantize helper inside hf_ptq.py. Some details are here, which we can review in today's meeting.

As I understand it, the actual mtq.auto_quantize API will stay as is, similar to the quantize API (the API shouldn't differentiate whether the args came from the CLI or a recipe).

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

examples/llm_ptq/hf_ptq.py (1)
1123-1123: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Respect an explicit recipe disabled_layers: [] (don’t collapse it via or).

In examples/llm_ptq/hf_ptq.py, the call site:
disabled_layers=aq.disabled_layers or default_disabled_layers,
treats an explicitly provided empty list ([]) as falsy and replaces it with default_disabled_layers, so the recipe can’t author “disable nothing”.

Update the logic to distinguish “field omitted” vs “field provided” (e.g., use Pydantic field-set tracking):

disabled_layers=aq.disabled_layers if "disabled_layers" in aq.model_fields_set else default_disabled_layers
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/llm_ptq/hf_ptq.py` at line 1123, The call uses
"disabled_layers=aq.disabled_layers or default_disabled_layers" which treats an
explicit empty list as falsy; change it to detect whether the field was provided
on the Pydantic model (e.g., use aq.model_fields_set) and only fall back to
default_disabled_layers when the field was omitted — update the assignment to
use something like: if "disabled_layers" in aq.model_fields_set then use
aq.disabled_layers else use default_disabled_layers so an explicit [] is
preserved; adjust the call site in hf_ptq.py where disabled_layers is passed and
reference aq and default_disabled_layers.
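
The "omitted versus explicitly empty" distinction can be shown with plain dictionaries. This is a dependency-free sketch: the review suggests Pydantic's `model_fields_set`, while this version checks key presence in the parsed YAML mapping; `resolve_disabled_layers` and the default list are hypothetical.

```python
DEFAULT_DISABLED_LAYERS = ["*lm_head*"]  # hypothetical default for illustration


def resolve_with_or(section, default):
    # Buggy pattern: an explicit [] is falsy and gets replaced.
    return section.get("disabled_layers") or default


def resolve_disabled_layers(section, default):
    # Fall back only when the recipe author left the field out entirely.
    if "disabled_layers" in section:
        return section["disabled_layers"]
    return default


explicit_empty = {"disabled_layers": []}
omitted = {}
```

Here `resolve_with_or(explicit_empty, DEFAULT_DISABLED_LAYERS)` returns the default list, whereas `resolve_disabled_layers` preserves the author's empty list and only uses the default for `omitted`.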

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 9ad220cd-8ec5-4eeb-8977-9eb19219c0d6

📥 Commits

Reviewing files that changed from the base of the PR and between 75e16d2 and 31606d1.

📒 Files selected for processing (3)

examples/llm_ptq/hf_ptq.py
modelopt/recipe/config.py
tests/unit/recipe/test_loader.py

Edwardf0t1 · 2026-05-21T21:12:19Z

+def test_load_recipe_autoquantize_missing_section_raises(tmp_path):
+    """An AutoQuantize recipe missing the ``auto_quantize`` section is rejected."""
+    bad = tmp_path / "bad.yml"
+    bad.write_text("metadata:\n  recipe_type: auto_quantize\n")
+    with pytest.raises(ValueError, match="auto_quantize"):
+        load_recipe(bad)


Should we add a RecipeType.AUTO_QUANTIZE: "auto_quantize" in REQUIRED_SECTION_PER_RECIPE_TYPE in modelopt/recipe/loader.py?

cc @shengliangxu @realAsma

agreed, makes sense, let me update that
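
A registry-driven check of this kind might look like the following. This is a sketch under assumptions: only the names `RecipeType.AUTO_QUANTIZE` and `REQUIRED_SECTION_PER_RECIPE_TYPE` come from the thread; the enum definition, the `check_required_section` helper, and the error text are hypothetical, and the real loader may be structured differently.

```python
from enum import Enum


class RecipeType(str, Enum):
    # Only the member discussed in this thread; the real enum has more.
    AUTO_QUANTIZE = "auto_quantize"


# Maps each recipe type to the top-level YAML section it must contain.
REQUIRED_SECTION_PER_RECIPE_TYPE = {
    RecipeType.AUTO_QUANTIZE: "auto_quantize",
}


def check_required_section(data):
    """Raise ValueError when the parsed recipe lacks its required section."""
    recipe_type = RecipeType(data["metadata"]["recipe_type"])
    section = REQUIRED_SECTION_PER_RECIPE_TYPE[recipe_type]
    if section not in data:
        raise ValueError(
            f"Recipe of type '{recipe_type.value}' requires a '{section}' section"
        )
    return recipe_type
```

Because the message names the missing section, a test such as `pytest.raises(ValueError, match="auto_quantize")` would pass against this helper.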

shengliangxu · 2026-05-21T21:51:56Z

+        description="Path to save/restore search state for resume or cheap re-solve.",
+    )
+
+    kv_cache: AutoQuantizeKVCache | None = ModeloptField(


let's not use a hard coded string, let's also make it a quant cfg so user can customize

I see, so if I understand correctly, they could be some presets that we have defined? https://github.com/NVIDIA/Model-Optimizer/tree/main/modelopt_recipes/configs/ptq/presets/kv?

just use the raw config as what you do for candidate_formats: list[QuantizeConfig]

shengliangxu · 2026-05-22T01:43:27Z

+    # All auto_quantize() knobs are resolved here before calling the helper.
+    # Helper is a leaf orchestrator — it does not know whether inputs came from
+    # CLI args or a recipe.
+    if isinstance(recipe, ModelOptAutoQuantizeRecipe) or args.auto_quantize_bits is not None:


I think auto_quantize may not have that many users, so we may want to just remove the args support for auto quantize and move completely to YAML recipes. What do you think, @realAsma @meenchen?

I agree, I think we can remove the auto_quantize specific args that are part of the proposed recipe now

realAsma · 2026-05-22T17:15:22Z

Can we add support to specify the per format effective bit cost?

Currently we have a heuristic which uses 4.0 for FP4 and 8 for FP8;

The actual effective bits of FP4 is 4.5.

It will be nice if we can specify the FP4 cost from recipe libraries.
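
The 4.5 figure can be reproduced with simple arithmetic, assuming NVFP4 stores 4-bit elements plus one 8-bit scale per block of 16 elements and ignoring any per-tensor scale. The block layout is an assumption stated for illustration, and `effective_bits_per_element` is a hypothetical helper.

```python
def effective_bits_per_element(element_bits, scale_bits, block_size):
    """Bits per element once the per-block scale is amortized over the block."""
    return element_bits + scale_bits / block_size


# Assumed NVFP4 layout: 4-bit values, 8-bit scale, 16 elements per block.
nvfp4_bits = effective_bits_per_element(4, 8, 16)

# Compression relative to a 16-bit baseline, the form the search budget uses.
nvfp4_ratio = nvfp4_bits / 16.0
```

Under these assumptions the amortized scale adds half a bit per element, which is the gap between the heuristic's 4.0 and the 4.5 quoted above.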

Thanks Asma, yes that makes sense.
I am thinking of something where we specify this information in the existing recipe libraries (presets), and then we import them in the candidate_formats in auto_quant recipe.

Something like this-

imports:
  base_disable_all: configs/ptq/units/base_disable_all
  default_disabled_quantizers: configs/ptq/units/default_disabled_quantizers
  w4a4_nvfp4_nvfp4: configs/ptq/units/w4a4_nvfp4_nvfp4
  fp8: configs/ptq/presets/model/fp8

auto_quantize:
  candidate_formats:
    # Inline NVFP4 with custom effective_bits
    - algorithm: max
      effective_bits: 4.7  # ← inline value
      quant_cfg:
        - $import: base_disable_all
        - $import: w4a4_nvfp4_nvfp4
        - $import: default_disabled_quantizers
    # Use FP8 preset directly
    - $import: fp8

Then, in estimate_quant_compression, we apply the override and skip the heuristic compression estimate if the user already provides the effective bits in the QuantRecipe:

def estimate_quant_compression(quant_cfg: QuantizeConfig) -> float:
    # NEW: if user supplied effective_bits, use it
    if quant_cfg.effective_bits is not None:
        return quant_cfg.effective_bits / 16.0
    # FALLBACK: existing heuristic (today's behavior)
    cfgs = [e.get("cfg") for e in quant_cfg.quant_cfg if e.get("enable", True)]
    return min(estimate_quant_compression_for_quantizer(c) for c in cfgs)
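
The override-then-fallback logic can be exercised end to end with stand-in types. This is a sketch, not the library code: `QuantizeConfig` here is a minimal dataclass, and `estimate_for_quantizer` is a hypothetical heuristic that treats `num_bits` as a plain integer.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class QuantizeConfig:
    """Minimal stand-in for the real config class."""

    quant_cfg: list = field(default_factory=list)
    effective_bits: Optional[float] = None


def estimate_for_quantizer(cfg):
    # Hypothetical heuristic: storage bits relative to a 16-bit baseline.
    return cfg["num_bits"] / 16.0


def estimate_quant_compression(config):
    # User-supplied effective_bits takes precedence over the heuristic.
    if config.effective_bits is not None:
        return config.effective_bits / 16.0
    # Otherwise use the most aggressive enabled quantizer entry.
    cfgs = [e["cfg"] for e in config.quant_cfg if e.get("enable", True)]
    return min(estimate_for_quantizer(c) for c in cfgs)


entries = [{"cfg": {"num_bits": 4}}, {"cfg": {"num_bits": 8}, "enable": False}]
heuristic = estimate_quant_compression(QuantizeConfig(entries))
overridden = estimate_quant_compression(QuantizeConfig(entries, effective_bits=4.5))
```

In this sketch the heuristic path yields 4/16 and the override path yields 4.5/16, so configs without the new field behave as they did before.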

cc- @shengliangxu , @Edwardf0t1

It is still an open item whether we want to include the effective bits as part of the QuantConfig or the AutoQuantize class.

Implemented in commit. Adds an optional effective_bits: float | None field on QuantizeConfig; when set, estimate_quant_compression() uses it directly, bypassing the heuristic. Backward compatible: existing recipes without the field hit the heuristic exactly as before.

For now I added it to QuantizeConfig instead of the auto_quantize config. The two options were:

1. Add effective_bits to QuantizeConfig: preset YAMLs in configs/ptq/presets/... carry the override; recipes inherit it via $import, or override it inline through sibling-field merge.

2. Add an autoquant-only override (e.g. a candidate_effective_bits: dict[str, float] map on AutoQuantizeConfig), strictly autoquant scope.

For now I have picked option 1 because users get the inline form shown below; with option 2 we may have to add another wrapper class in the auto_quantize config (when the QuantRecipe gets constructed for auto_quantize).

candidate_formats:
  - $import: nvfp4
    effective_bits: 4.5  # inline override on the candidate itself
  - $import: fp8
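
One way to read "sibling-field merge" is that keys written next to `$import` override the imported preset. The sketch below assumes that reading; `resolve_candidate` and the `PRESETS` table are hypothetical, and the real recipe loader may resolve imports differently.

```python
import copy

# Hypothetical preset table; real presets live in YAML files.
PRESETS = {
    "nvfp4": {"algorithm": "max", "quant_cfg": ["<nvfp4 entries>"]},
    "fp8": {"algorithm": "max", "quant_cfg": ["<fp8 entries>"]},
}


def resolve_candidate(entry, presets):
    """Expand one candidate entry, letting sibling keys win over the import."""
    entry = dict(entry)  # do not mutate the caller's mapping
    name = entry.pop("$import", None)
    # Deep-copy so an override never leaks back into the shared preset.
    resolved = copy.deepcopy(presets[name]) if name is not None else {}
    resolved.update(entry)
    return resolved


candidates = [
    resolve_candidate({"$import": "nvfp4", "effective_bits": 4.5}, PRESETS),
    resolve_candidate({"$import": "fp8"}, PRESETS),
]
```

Under this reading the first candidate carries `effective_bits: 4.5` on top of the NVFP4 preset, while the FP8 candidate has no such key and would fall back to the heuristic.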

Let me know if this aligns with what you had in mind, or we can discuss further.

cc- @shengliangxu , @Edwardf0t1 , @meenchen

Signed-off-by: Juhi Mittal <juhim@nvidia.com>

… update the kv_cache pydantic type in YAML str -> QuantizeConfig, also update the dispatch in hf_ptq.py now, also add REQUIRED_SECTION_PER_RECIPE_TYPE for Autoquantize and fix a minor bug there Signed-off-by: Juhi Mittal <juhim@nvidia.com>

…cost num_bits per recipe Signed-off-by: Juhi Mittal <juhim@nvidia.com>

juhi10071998 requested review from a team as code owners May 21, 2026 00:38

juhi10071998 requested a review from realAsma May 21, 2026 00:38

juhi10071998 marked this pull request as draft May 21, 2026 00:38

coderabbitai Bot reviewed May 21, 2026

Comment thread examples/llm_ptq/hf_ptq.py Outdated

Comment thread modelopt/recipe/config.py Outdated

Comment thread tests/unit/recipe/test_loader.py

realAsma reviewed May 21, 2026

juhi10071998 requested review from Edwardf0t1 and shengliangxu May 21, 2026 18:34

juhi10071998 marked this pull request as ready for review May 21, 2026 19:17

coderabbitai Bot reviewed May 21, 2026

coderabbitai Bot approved these changes May 21, 2026

juhi10071998 force-pushed the juhim/autoquant-recipe branch 3 times, most recently from 27d1a12 to 74d3c66 Compare May 21, 2026 20:53

Edwardf0t1 reviewed May 21, 2026

juhi10071998 changed the title ~~Add AutoQuantize recipe support to mtq.auto_quantize~~ Add YAML based AutoQuantize recipe (currently only CLI is supported) May 21, 2026

shengliangxu reviewed May 21, 2026

Comment thread modelopt/recipe/config.py Outdated

shengliangxu reviewed May 21, 2026

juhi10071998 force-pushed the juhim/autoquant-recipe branch from 1596ec2 to 31bb878 Compare May 22, 2026 00:18

shengliangxu reviewed May 22, 2026

juhi10071998 force-pushed the juhim/autoquant-recipe branch from 31bb878 to 74d17c9 Compare May 22, 2026 15:55

realAsma reviewed May 22, 2026

juhi10071998 added 3 commits May 22, 2026 20:43

wip: autoquant recipe schema + hf_ptq dispatch

1a99d25

Signed-off-by: Juhi Mittal <juhim@nvidia.com>

address review comments

e5953d9

Signed-off-by: Juhi Mittal <juhim@nvidia.com>

juhi10071998 force-pushed the juhim/autoquant-recipe branch from 74d17c9 to 8b1d3c6 Compare May 22, 2026 20:43

add effective bits in the QuantRecipe field to override the estimate …

635227b

…cost num_bits per recipe Signed-off-by: Juhi Mittal <juhim@nvidia.com>

juhi10071998 requested a review from a team as a code owner May 22, 2026 22:45

juhi10071998 requested a review from sugunav14 May 22, 2026 22:45
