Add YAML based AutoQuantize recipe (currently only CLI is supported)#1523
Add YAML based AutoQuantize recipe (currently only CLI is supported)#1523juhi10071998 wants to merge 4 commits into
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds an AutoQuantize recipe type with Pydantic schemas, an example recipe, recipe-aware calibration and CLI integration, refactors the auto_quantize orchestrator to accept resolved knobs, maps recipe candidate formats to canonical presets, and adds unit tests for recipe loading and validation. ChangesAutoQuantize Recipe Feature
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Suggested reviewers
🚥 Pre-merge checks | ✅ 5 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
|
There was a problem hiding this comment.
Warning
CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.
Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@examples/llm_ptq/hf_ptq.py`:
- Line 1087: The conditional that chooses the auto-quantize branch incorrectly
treats falsy numeric values as "unset" — change the check so it explicitly tests
presence of the CLI value: in the expression that currently reads "if
isinstance(recipe, ModelOptAutoQuantizeRecipe) or args.auto_quantize_bits",
replace the truthy check with an explicit presence check for
args.auto_quantize_bits (e.g., use "args.auto_quantize_bits is not None") so
that values like 0 or 0.0 are honored; keep the ModelOptAutoQuantizeRecipe
isinstance check as-is.
In `@modelopt/recipe/config.py`:
- Around line 112-117: The qformat field currently accepts any string but should
be validated against the allowed keys; update the ModeloptField declaration for
qformat (and/or add a pydantic validator on the recipe class handling kv_cache)
to reject values not in KV_QUANT_CFG_CHOICES or the literal 'none' (allowing
None), raising a clear schema/validation error at recipe-load time instead of
allowing a later KeyError; ensure you reference the qformat field,
ModeloptField, and KV_QUANT_CFG_CHOICES when implementing the check so invalid
inputs are caught early.
In `@tests/unit/recipe/test_loader.py`:
- Around line 286-293: The test contains a function-local import "import
modelopt.torch.quantization as mtq" inside
test_load_recipe_autoquantize_candidates_match_presets; move that import to
module scope (top of tests/unit/recipe/test_loader.py) so mtq is imported during
collection rather than inside the test, then remove the local import from the
function and leave the assertions using mtq.NVFP4_DEFAULT_CFG and
mtq.FP8_DEFAULT_CFG unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 48396567-3825-425a-b877-f63b60bb6545
📒 Files selected for processing (4)
examples/llm_ptq/hf_ptq.pymodelopt/recipe/config.pymodelopt_recipes/general/auto_quantize/nvfp4_fp8_at_4p8bits-kv_fp8_cast.yamltests/unit/recipe/test_loader.py
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #1523 +/- ##
==========================================
+ Coverage 76.68% 76.79% +0.10%
==========================================
Files 476 476
Lines 51891 51923 +32
==========================================
+ Hits 39795 39876 +81
+ Misses 12096 12047 -49
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
realAsma
left a comment
There was a problem hiding this comment.
Proposal looks good; What is the plan to plugin these yaml recipes to auto_quantize API?
Thanks for the review Asma! I believe the actual |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
examples/llm_ptq/hf_ptq.py (1)
1123-1123:⚠️ Potential issue | 🟠 Major | ⚡ Quick winRespect an explicit recipe
disabled_layers: [](don’t collapse it viaor).In
examples/llm_ptq/hf_ptq.py, the call site:disabled_layers=aq.disabled_layers or default_disabled_layers,treats an explicitly provided empty list (
[]) as falsy and replaces it withdefault_disabled_layers, so the recipe can’t author “disable nothing”.Update the logic to distinguish “field omitted” vs “field provided” (e.g., use Pydantic field-set tracking):
disabled_layers=aq.disabled_layers if "disabled_layers" in aq.model_fields_set else default_disabled_layers🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/llm_ptq/hf_ptq.py` at line 1123, The call uses "disabled_layers=aq.disabled_layers or default_disabled_layers" which treats an explicit empty list as falsy; change it to detect whether the field was provided on the Pydantic model (e.g., use aq.model_fields_set) and only fall back to default_disabled_layers when the field was omitted — update the assignment to use something like: if "disabled_layers" in aq.model_fields_set then use aq.disabled_layers else use default_disabled_layers so an explicit [] is preserved; adjust the call site in hf_ptq.py where disabled_layers is passed and reference aq and default_disabled_layers.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@examples/llm_ptq/hf_ptq.py`:
- Line 1123: The call uses "disabled_layers=aq.disabled_layers or
default_disabled_layers" which treats an explicit empty list as falsy; change it
to detect whether the field was provided on the Pydantic model (e.g., use
aq.model_fields_set) and only fall back to default_disabled_layers when the
field was omitted — update the assignment to use something like: if
"disabled_layers" in aq.model_fields_set then use aq.disabled_layers else use
default_disabled_layers so an explicit [] is preserved; adjust the call site in
hf_ptq.py where disabled_layers is passed and reference aq and
default_disabled_layers.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 9ad220cd-8ec5-4eeb-8977-9eb19219c0d6
📒 Files selected for processing (3)
examples/llm_ptq/hf_ptq.pymodelopt/recipe/config.pytests/unit/recipe/test_loader.py
27d1a12 to
74d3c66
Compare
| def test_load_recipe_autoquantize_missing_section_raises(tmp_path): | ||
| """An AutoQuantize recipe missing the ``auto_quantize`` section is rejected.""" | ||
| bad = tmp_path / "bad.yml" | ||
| bad.write_text("metadata:\n recipe_type: auto_quantize\n") | ||
| with pytest.raises(ValueError, match="auto_quantize"): | ||
| load_recipe(bad) |
There was a problem hiding this comment.
Should we add a RecipeType.AUTO_QUANTIZE: "auto_quantize" in REQUIRED_SECTION_PER_RECIPE_TYPE in modelopt/recipe/loader.py?
There was a problem hiding this comment.
agreed, makes sense, let me update that
| description="Path to save/restore search state for resume or cheap re-solve.", | ||
| ) | ||
|
|
||
| kv_cache: AutoQuantizeKVCache | None = ModeloptField( |
There was a problem hiding this comment.
let's not use a hard coded string, let's also make it a quant cfg so user can customize
There was a problem hiding this comment.
I see, so if I understand correctly, they could be some presets that we have defined? https://github.com/NVIDIA/Model-Optimizer/tree/main/modelopt_recipes/configs/ptq/presets/kv?
There was a problem hiding this comment.
just use the raw config as what you do for candidate_formats: list[QuantizeConfig]
1596ec2 to
31bb878
Compare
| # All auto_quantize() knobs are resolved here before calling the helper. | ||
| # Helper is a leaf orchestrator — it does not know whether inputs came from | ||
| # CLI args or a recipe. | ||
| if isinstance(recipe, ModelOptAutoQuantizeRecipe) or args.auto_quantize_bits is not None: |
There was a problem hiding this comment.
I agree, I think we can remove the auto_quantize specific args that are part of the proposed recipe now
31bb878 to
74d17c9
Compare
There was a problem hiding this comment.
Can we add support to specify the per format effective bit cost?
Currently we have a heuristic which uses 4.0 for FP4 and 8 for FP8;
The actual effective bits of FP4 is 4.5.
It will be nice if we can specify the FP4 cost from recipe libraries.
There was a problem hiding this comment.
Thanks Asma, yes that makes sense.
I am thinking of something where we specify this information in the existing recipe libraries (presets), and then we import them in the candidate_formats in auto_quant recipe.
Something like this-
imports:
base_disable_all: configs/ptq/units/base_disable_all
default_disabled_quantizers: configs/ptq/units/default_disabled_quantizers
w4a4_nvfp4_nvfp4: configs/ptq/units/w4a4_nvfp4_nvfp4
fp8: configs/ptq/presets/model/fp8
auto_quantize:
candidate_formats:
# Inline NVFP4 with custom effective_bits
- algorithm: max
effective_bits: 4.7 # ← inline value
quant_cfg:
- $import: base_disable_all
- $import: w4a4_nvfp4_nvfp4
- $import: default_disabled_quantizers
# Use FP8 preset directly
- $import: fp8
And then in the estimate_quant_compression, we will perform the override and skip the quant compression computation if the user already provides the effective bits in the QuantRecipe
def estimate_quant_compression(quant_cfg: QuantizeConfig) -> float:
# NEW: if user supplied effective_bits, use it
if quant_cfg.effective_bits is not None:
return quant_cfg.effective_bits / 16.0
# FALLBACK: existing heuristic (today's behavior)
cfgs = [e.get("cfg") for e in quant_cfg.quant_cfg if e.get("enable", True)]
return min(estimate_quant_compression_for_quantizer(c) for c in cfgs)
cc- @shengliangxu , @Edwardf0t1
It is still open item whether we want to include the effective bits as part of the QuantConfig or of the AutoQuantize class
There was a problem hiding this comment.
Implemented in commit. Adds an optional effective_bits: float | None field on QuantizeConfig; when set, estimate_quant_compression() uses it directly, bypassing the heuristic. Backward compatible: existing recipes without the field hit the heuristic exactly as before.
For now I added in the QuantizeConfig instead of the auto_quantize config.
-
Add effective_bits to QuantizeConfig— preset YAMLs in configs/ptq/presets/... carry the override; recipes inherit via $import; or override inline through sibling-field merge.
-
Add an autoquant-only override (e.g. a candidate_effective_bits: dict[str, float] map on AutoQuantizeConfig) — strictly autoquant scope.
For now I have picked option1 as users get the inline form as below, and in option2 we may have to add more wrapper class in the auto_quantize config {when the QuantRecipe gets constructed for auto_quantize)
candidate_formats:
- $import: nvfp4
effective_bits: 4.5 # inline override on the candidate itself
- $import: fp8
Let me know your thoughts if this aligns with what you had in mind or we can discuss further
cc- @shengliangxu , @Edwardf0t1 , @meenchen
Signed-off-by: Juhi Mittal <juhim@nvidia.com>
Signed-off-by: Juhi Mittal <juhim@nvidia.com>
… update the kv_cache pydantic type in YAML str -> QuantizeConfig, also update the dispatch in hf_ptq.py now, also add REQUIRED_SECTION_PER_RECIPE_TYPE for Autoquantize and fix a minor bug there Signed-off-by: Juhi Mittal <juhim@nvidia.com>
74d17c9 to
8b1d3c6
Compare
…cost num_bits per recipe Signed-off-by: Juhi Mittal <juhim@nvidia.com>
Add AutoQuantize YAML based recipe support to
mtq.auto_quantizeWhat does this PR do?
Type of change: New feature.
Extends the recipe system (PR #1423) to support
mtq.auto_quantize. Userscan now run autoquant via a single
--recipe <name>flag instead ofcombining
--auto_quantize_bits,--qformat,--auto_quantize_method,etc. The recipe carries the full search spec — candidate formats, budget,
scoring method, KV cache scheme — as a typed YAML.
Mirrors the existing PTQ recipe pattern (PR #1423): recipe is authoritative
for the search; CLI flags supply runtime concerns (dataset, calib size,
batch size).
Usage
Example recipe (
modelopt_recipes/general/auto_quantize/nvfp4_fp8_at_4p8bits-kv_fp8_cast.yaml):Key design points
constraintsshapemtq.auto_quantizenested dict exactly — zero-transformation dispatch via.model_dump(exclude_none=True). Future-compat with PR #1497 (cost models).kv_cache.qformatfield, not per-candidate$import(avoids duplication when KV is shared across candidates).effective_bits, candidates, etc.). CLI may fall back only for orthogonal post-step fields — today onlykv_cache.qformat.--auto_quantize_bits + --recipeerrors out explicitly.auto_quantize()helper layoutquantize_main.Testing
Unit tests (
tests/unit/recipe/test_loader.py, 7 tests):method=gradient,num_score_steps=128,score_checkpoint=None)$importedcandidates byte-identical tomtq.NVFP4_DEFAULT_CFG/FP8_DEFAULT_CFG(single source of truth)auto_quantizesection,<2candidates,effective_bitsoutside(0, 16]kv_cachefield is optionalEquivalence smoke on
Qwen/Qwen3-8Bat--calib_size 512:hf_quant_config.jsonis byte-identical between the two paths.Backward compatibility
✅ Yes. All four existing flows preserved:
--qformat nvfp4)--auto_quantize_bits 4.8)--recipe general/ptq/...)One new explicit error:
--auto_quantize_bits + --recipe(previously would silently honor recipe). Fails fast with a clear message.Files changed
modelopt/recipe/config.py— Pydantic schema (AutoQuantizeConfig, etc.) +RecipeType.AUTO_QUANTIZEenum + dispatch entryexamples/llm_ptq/hf_ptq.py— dispatch site resolves recipe/CLI knobs and passes them toauto_quantize()as kw-only kwargs; helper signature is pure value-drivenmodelopt_recipes/general/auto_quantize/nvfp4_fp8_at_4p8bits-kv_fp8_cast.yaml— example recipetests/unit/recipe/test_loader.py— 7 unit testsChecklist
git commit -s -S)/claude review)Summary by CodeRabbit