Drive hf_ptq qformat choices from preset YAMLs (remove hardcoded CLI quant configs) #1525
shengliangxu wants to merge 4 commits into
Replace the hardcoded QUANT_CFG_CHOICES / KV_QUANT_CFG_CHOICES dicts in
examples/llm_ptq/hf_ptq.py with a lazy Mapping that discovers available
qformat names by listing modelopt_recipes/configs/ptq/presets/{model,kv}/
and loads each YAML on first access via the existing
load_config(..., schema_type=QuantizeConfig) path.
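The lazy-discovery idea described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the class name LazyPresetChoices and the loader parameter are hypothetical, and loader merely stands in for the load_config(..., schema_type=QuantizeConfig) call mentioned in the commit message.

```python
from collections.abc import Iterator, Mapping
from pathlib import Path
from typing import Any, Callable


class LazyPresetChoices(Mapping):
    """Read-only mapping from preset name to config, loaded on first access.

    Hypothetical sketch: `loader` is an assumption standing in for the
    real config loader; it receives the path of one preset file.
    """

    def __init__(self, preset_dir: Path, loader: Callable[[Path], Any], suffix: str = ".yaml"):
        self._loader = loader
        # Discovery only lists file names; nothing is parsed at this point.
        self._paths = {p.stem: p for p in sorted(preset_dir.glob(f"*{suffix}"))}
        self._cache: dict = {}

    def __getitem__(self, name: str) -> Any:
        if name not in self._paths:
            raise KeyError(name)
        if name not in self._cache:
            # First access: parse the file once and remember the result.
            self._cache[name] = self._loader(self._paths[name])
        return self._cache[name]

    def __iter__(self) -> Iterator[str]:
        return iter(self._paths)

    def __len__(self) -> int:
        return len(self._paths)
```

Because the keys come straight from the directory listing, iterating the mapping (for example to build CLI choices) never triggers a file parse; only a lookup by name does.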
A small _QFORMAT_ALIASES table keeps the previously-supported short CLI
names (int8_sq, nvfp4_awq, fp8_pb_wo, ...) working as deprecation
shims; the table is documented as not-for-extension since new formats
should land as preset YAMLs (or, longer term, as full recipes).
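One way such an alias shim might resolve names is sketched below. The function name canonical_qformat and the contents of the tables are illustrative assumptions; only the int8_sq to int8_smoothquant pairing is taken from the discussion on this PR, and the real _QFORMAT_ALIASES table may differ.

```python
# Hypothetical alias table: legacy short name -> canonical preset basename.
# Only this one pairing is mentioned in the PR discussion.
ALIASES = {"int8_sq": "int8_smoothquant"}

# Illustrative stand-in for the names discovered from the preset directory.
DISCOVERED = {"int8_smoothquant", "fp8", "nvfp4"}


def canonical_qformat(name, aliases=ALIASES, discovered=DISCOVERED):
    """Return the canonical preset name for `name`, or None if unknown.

    Canonical names pass through unchanged; legacy names are translated
    only if their target preset actually exists.
    """
    if name in discovered:
        return name
    target = aliases.get(name)
    return target if target in discovered else None
```

With this shape, both spellings lead to the same preset, so a caller can accept either form while the preset files remain the single source of configuration.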
Also add presets/kv/fp8_cast.yaml and presets/kv/nvfp4_cast.yaml so
fp8_cast / nvfp4_cast become first-class KV presets composed from the
existing kv_fp8_cast / kv_nvfp4_cast unit fragments. This drops the
KV alias entries and lets us delete the runtime _set_kv_cache_constant_amax
helper and all three of its call sites; use_constant_amax is now
authoritative in the YAML.
Side effect: every preset YAML under presets/model/ (mxfp4, mxfp6,
mxint8, nvfp4_awq_full, nvfp4_fp8_mha, mamba_moe_*, ...) is now
automatically exposed as a valid --qformat value with no further
code change.
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
📝 Walkthrough
This PR refactors quantization configuration discovery in a PTQ example script from static
Changes
PTQ quantization preset refactoring
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## main #1525 +/- ##
==========================================
+ Coverage 76.75% 77.11% +0.36%
==========================================
Files 476 476
Lines 51811 51811
==========================================
+ Hits 39767 39954 +187
+ Misses 12044 11857 -187
Actionable comments posted: 1
🧹 Nitpick comments (1)
examples/llm_ptq/hf_ptq.py (1)
130-133: ⚡ Quick win
Derive KV calibration skip from config semantics, not hardcoded format names.
Line 476 hardcodes cast-format names via _KV_CAST_FORMATS. Since presets are YAML-driven now, this risks drift when presets evolve. Prefer checking whether the selected KV config actually needs calibration.
Suggested refactor
-    if args.kv_cache_qformat not in _KV_CAST_FORMATS:
+    if need_calibration({"quant_cfg": kv_cache_quant_cfg, "algorithm": "max"}):
         # Calibrate only the KV cache quantizers; disable all others.
         with mtq.set_quantizer_by_cfg_context(
             language_model,
             [{"quantizer_name": "*", "enable": False}, *kv_cache_quant_cfg],
         ):
             mtq.calibrate(language_model, algorithm="max", forward_loop=calibrate_loop)
Also applies to: 476-483
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/llm_ptq/hf_ptq.py` around lines 130 - 133, Replace the hardcoded _KV_CAST_FORMATS check with a semantic check on the chosen KV preset: instead of testing the format name via _KV_CAST_FORMATS, inspect the selected KV configuration object (the loaded preset used for KV, e.g., the variable that selects the KV preset in this module—refer to the code that chooses the "kv" preset) and decide to skip calibration when that config explicitly pins use_constant_amax (or an equivalent flag like requires_calibration/use_constant_amax) — remove the frozenset usage and branch on the KV config's semantic field so YAML-driven presets control the behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@examples/llm_ptq/hf_ptq.py`:
- Around line 194-211: The CLI validation currently tests raw tokens against
_AUTO_QUANTIZE_QFORMATS, which rejects valid canonical names because
canonical/alias resolution happens later via QUANT_CFG_CHOICES; change the
validation to check against the full set of accepted keys (e.g., use
QUANT_CFG_CHOICES.keys() or build a normalized set of canonical names/aliases)
or resolve each token through the same lookup used later before rejecting.
Update the checks that reference _AUTO_QUANTIZE_QFORMATS (and any logic around
parsing auto-quantize tokens) to use QUANT_CFG_CHOICES (or a derived normalized
set) so canonical names and aliases are accepted consistently.
---
Nitpick comments:
In `@examples/llm_ptq/hf_ptq.py`:
- Around line 130-133: Replace the hardcoded _KV_CAST_FORMATS check with a
semantic check on the chosen KV preset: instead of testing the format name via
_KV_CAST_FORMATS, inspect the selected KV configuration object (the loaded
preset used for KV, e.g., the variable that selects the KV preset in this
module—refer to the code that chooses the "kv" preset) and decide to skip
calibration when that config explicitly pins use_constant_amax (or an equivalent
flag like requires_calibration/use_constant_amax) — remove the frozenset usage
and branch on the KV config's semantic field so YAML-driven presets control the
behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 61cb3534-3f30-4331-b8b7-3a3cf32cca68
📒 Files selected for processing (3)
examples/llm_ptq/hf_ptq.py
modelopt_recipes/configs/ptq/presets/kv/fp8_cast.yaml
modelopt_recipes/configs/ptq/presets/kv/nvfp4_cast.yaml
- Deepcopy in _PresetCfgChoices.__getitem__ so callers can freely mutate the returned quant_cfg without poisoning the cache.
- Assert that _KV_NONE does not collide with any discovered KV preset.
- Expand the comment on _AUTO_QUANTIZE_QFORMATS explaining why it stays hardcoded (auto_quantize compatibility is an export-path property, not a YAML-derivable one).
- Add CHANGELOG entry for the qformat discovery refactor and the fp8_cast / nvfp4_cast preset promotion (including the note that out-of-tree recipes targeting cast KV must set use_constant_amax themselves now that the runtime override is gone).
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
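The deepcopy point above guards against a caller editing a cached config in place. A minimal sketch of the pattern, assuming a hypothetical CachedChoices class and a loader callable (neither is the PR's actual _PresetCfgChoices implementation):

```python
import copy


class CachedChoices:
    """Cache loaded configs but hand out independent copies.

    Hypothetical sketch: `loader` maps a preset name to a config object.
    """

    def __init__(self, loader):
        self._loader = loader
        self._cache = {}

    def __getitem__(self, name):
        if name not in self._cache:
            self._cache[name] = self._loader(name)  # load once
        # Return a deep copy so mutations by the caller never reach the cache.
        return copy.deepcopy(self._cache[name])
```

A shallow copy would not be enough here, because a quant_cfg is a nested structure (a list of dicts in the snippets quoted in this review), and appending to or editing an inner entry would still alter the cached object.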
Codify the policy that the preset directory listing IS the CLI vocabulary: there is intentionally no separate allow-list. New presets are CLI-visible the moment they land in the directory; this is a feature, not an oversight.
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
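As a rough illustration of "the listing is the vocabulary", the sketch below builds argparse choices directly from a set of discovered KV preset names and adds a none sentinel with a collision check. The function name build_kv_parser and the constant KV_NONE are hypothetical, and the meaning attached to the sentinel is an assumption; the script's real parser setup is not shown in this thread.

```python
import argparse
from collections.abc import Iterable

# Sentinel value; assumed here to mean "leave the KV cache unquantized".
KV_NONE = "none"


def build_kv_parser(kv_preset_names: Iterable[str]) -> argparse.ArgumentParser:
    """Build a parser whose --kv_cache_qformat choices mirror the presets."""
    names = sorted(kv_preset_names)
    # A preset file named like the sentinel would make the CLI ambiguous.
    assert KV_NONE not in names, f"preset named {KV_NONE!r} collides with the sentinel"
    parser = argparse.ArgumentParser()
    parser.add_argument("--kv_cache_qformat", choices=[KV_NONE, *names])
    return parser
```

Since there is no separate allow-list, adding a new preset name to the iterable is all it takes for argparse to accept it, and any other value is rejected with the usual invalid-choice error.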
1. Auto-quantize validation: _AUTO_QUANTIZE_QFORMATS previously only listed the
short alias names, so passing canonical preset basenames (e.g. int8_smoothquant
instead of int8_sq) would be rejected even though the underlying configs are
identical. Switch the set to canonical names and canonicalize incoming tokens
via a new _canonical_qformat() helper so both forms are accepted.
2. KV cast detection: replace the hardcoded _KV_CAST_FORMATS = {fp8_cast,
nvfp4_cast} name set with a semantic check (_kv_cfg_uses_constant_amax) that
inspects the loaded KV cfg's *[kv]_bmm_quantizer entry for use_constant_amax.
This makes "should we skip KV calibration?" YAML-driven: any future cast-style
KV preset works without touching this script.
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
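The two fixes in this commit could look roughly like the sketch below. All function names are hypothetical, and the flat entry layout (use_constant_amax stored directly on the quantizer entry) is an assumption borrowed from the reviewer's suggested diff; the real config schema may nest this field differently.

```python
def validate_auto_quantize_qformats(raw, canonicalize, supported):
    """Canonicalize comma-separated tokens, then check them against `supported`.

    `canonicalize` maps a token to its canonical preset name (or None);
    `supported` holds canonical names only.
    """
    tokens = [t.strip() for t in raw.split(",") if t.strip()]
    if not tokens:
        raise ValueError("No quantization formats provided")
    canonical = [canonicalize(t) for t in tokens]
    bad = [t for t, c in zip(tokens, canonical) if c not in supported]
    if bad:
        raise ValueError(f"Unsupported formats for auto_quantize: {bad}")
    return canonical


def kv_cfg_uses_constant_amax(quant_cfg, kv_name="*[kv]_bmm_quantizer"):
    """Return True if the KV quantizer entry pins use_constant_amax.

    Assumes `quant_cfg` is a list of dicts keyed by "quantizer_name".
    """
    for entry in quant_cfg:
        if entry.get("quantizer_name") == kv_name:
            return bool(entry.get("use_constant_amax", False))
    return False
```

A caller would then run KV calibration only when kv_cfg_uses_constant_amax returns False, so the skip decision follows the loaded config rather than the preset's name.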
♻️ Duplicate comments (1)
examples/llm_ptq/hf_ptq.py (1)
403-407: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Normalize auto-quantize qformats before checking _AUTO_QUANTIZE_QFORMATS.
Line 406 validates raw CLI tokens, but Line 465 resolves them through QUANT_CFG_CHOICES, which now accepts both canonical preset basenames and legacy aliases. A canonical preset like int8_smoothquant is therefore accepted later but rejected here first.
💡 Suggested fix
-    qformat_list = args.qformat.split(",")
+    qformat_list = [q.strip() for q in args.qformat.split(",")]
     assert qformat_list, "No quantization formats provided"
-    # Check if all provided quantization formats are supported
-    assert all(qformat in _AUTO_QUANTIZE_QFORMATS for qformat in qformat_list), (
+    canonical_qformats = [
+        QUANT_CFG_CHOICES._canonical(qformat) if isinstance(QUANT_CFG_CHOICES, _PresetCfgChoices) else qformat
+        for qformat in qformat_list
+    ]
+    assert all(qformat is not None for qformat in canonical_qformats), (
+        "Unsupported quantization format provided"
+    )
+    assert all(qformat in _AUTO_QUANTIZE_QFORMATS for qformat in canonical_qformats), (
         "One or more quantization formats provided are not supported for unified checkpoint export"
     )
Also applies to: 465-465
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/llm_ptq/hf_ptq.py` around lines 403 - 407, The assertion is checking raw CLI tokens in qformat_list against _AUTO_QUANTIZE_QFORMATS before they are normalized later; update the validation to normalize each args.qformat token using the same resolution used at Line 465 (QUANT_CFG_CHOICES/its alias mapping) and then check the normalized canonical names against _AUTO_QUANTIZE_QFORMATS. Concretely, transform qformat_list by mapping each entry through the QUANT_CFG_CHOICES lookup (or its alias→canonical resolver) to produce canonical_qformats, then assert canonical_qformats is non-empty and that all entries are in _AUTO_QUANTIZE_QFORMATS (referencing qformat_list, args.qformat, QUANT_CFG_CHOICES, and _AUTO_QUANTIZE_QFORMATS).
🧹 Nitpick comments (1)
examples/llm_ptq/hf_ptq.py (1)
487-499: ⚡ Quick win
Drive KV-cache calibration skipping from the preset config, not the preset name.
Line 493 reintroduces a hardcoded name check after this refactor made YAML authoritative. If a future KV preset sets use_constant_amax, it will be CLI-exposed here and then immediately recalibrated anyway.
💡 Suggested fix
     kv_cache_quant_cfg = copy.deepcopy(KV_QUANT_CFG_CHOICES[args.kv_cache_qformat]["quant_cfg"])
     kv_cache_quant_cfg = [
         e for e in kv_cache_quant_cfg if e["quantizer_name"] != "*"
     ]  # keep other quantizers from auto_quantize
     mtq.set_quantizer_by_cfg(language_model, quant_cfg=kv_cache_quant_cfg)
-    if args.kv_cache_qformat not in _KV_CAST_FORMATS:
+    needs_kv_calibration = any(
+        not entry.get("use_constant_amax", False) for entry in kv_cache_quant_cfg
+    )
+    if needs_kv_calibration:
         # Calibrate only the KV cache quantizers; disable all others.
         with mtq.set_quantizer_by_cfg_context(
             language_model,
             [{"quantizer_name": "*", "enable": False}, *kv_cache_quant_cfg],
         ):
             mtq.calibrate(language_model, algorithm="max", forward_loop=calibrate_loop)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/llm_ptq/hf_ptq.py` around lines 487 - 499, The code currently decides whether to skip KV-cache calibration by checking the preset name (args.kv_cache_qformat not in _KV_CAST_FORMATS); instead, inspect the actual preset config (kv_cache_quant_cfg) and skip calibration when the preset indicates constant amax behavior. Replace the name-based condition with a config-based check (e.g., if not any(entry.get("use_constant_amax") for entry in kv_cache_quant_cfg): ... ) so mtq.calibrate(...) runs only when none of the KV quantizer entries specify use_constant_amax; reference kv_cache_quant_cfg, KV_QUANT_CFG_CHOICES, args.kv_cache_qformat, and mtq.calibrate in your change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@examples/llm_ptq/hf_ptq.py`:
- Around line 403-407: The assertion is checking raw CLI tokens in qformat_list
against _AUTO_QUANTIZE_QFORMATS before they are normalized later; update the
validation to normalize each args.qformat token using the same resolution used
at Line 465 (QUANT_CFG_CHOICES/its alias mapping) and then check the normalized
canonical names against _AUTO_QUANTIZE_QFORMATS. Concretely, transform
qformat_list by mapping each entry through the QUANT_CFG_CHOICES lookup (or its
alias→canonical resolver) to produce canonical_qformats, then assert
canonical_qformats is non-empty and that all entries are in
_AUTO_QUANTIZE_QFORMATS (referencing qformat_list, args.qformat,
QUANT_CFG_CHOICES, and _AUTO_QUANTIZE_QFORMATS).
---
Nitpick comments:
In `@examples/llm_ptq/hf_ptq.py`:
- Around line 487-499: The code currently decides whether to skip KV-cache
calibration by checking the preset name (args.kv_cache_qformat not in
_KV_CAST_FORMATS); instead, inspect the actual preset config
(kv_cache_quant_cfg) and skip calibration when the preset indicates constant
amax behavior. Replace the name-based condition with a config-based check (e.g.,
if not any(entry.get("use_constant_amax") for entry in kv_cache_quant_cfg): ...
) so mtq.calibrate(...) runs only when none of the KV quantizer entries specify
use_constant_amax; reference kv_cache_quant_cfg, KV_QUANT_CFG_CHOICES,
args.kv_cache_qformat, and mtq.calibrate in your change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: a74f2b60-827e-4805-a9ff-ac9644c33ec5
📒 Files selected for processing (1)
examples/llm_ptq/hf_ptq.py
What does this PR do?
Type of change: Refactor
Replace the hardcoded QUANT_CFG_CHOICES / KV_QUANT_CFG_CHOICES dicts in examples/llm_ptq/hf_ptq.py with a lazy Mapping that discovers available qformat names by listing modelopt_recipes/configs/ptq/presets/{model,kv}/ and loads each YAML on first access via the existing load_config(..., schema_type=QuantizeConfig) path. The directory listing becomes the source of truth for the --qformat / --kv_cache_qformat CLI vocabulary.
A small _QFORMAT_ALIASES table preserves previously-supported short CLI names (int8_sq, nvfp4_awq, fp8_pb_wo, ...) as deprecation shims. It is documented as not-for-extension: new formats land as preset YAMLs, and longer term, configurations should be authored as full recipes (--recipe).
Also adds presets/kv/fp8_cast.yaml and presets/kv/nvfp4_cast.yaml, composed from the existing kv_fp8_cast / kv_nvfp4_cast unit fragments. This promotes fp8_cast / nvfp4_cast to first-class KV presets and lets us delete the runtime _set_kv_cache_constant_amax helper and all three of its call sites; use_constant_amax is now authoritative in the YAML.
Side effect: every preset YAML under presets/model/ (mxfp4, mxfp6, mxint8, nvfp4_awq_full, nvfp4_fp8_mha, mamba_moe_*, ...) is now automatically exposed as a valid --qformat value with no further code change.
Usage
Testing
Verified locally with both .venv (uv, py3.13) and the dev-py310-modelopt conda env:
- --qformat short names resolve and produce dicts that are exactly equal to the corresponding mtq.X_DEFAULT_CFG constants.
- KV presets (fp8, fp8_cast, fp8_affine, nvfp4, nvfp4_cast, nvfp4_affine, nvfp4_rotate) resolve and match.
- fp8_cast / nvfp4_cast YAML presets now contain use_constant_amax: true baked into the [kv]_bmm_quantizer cfg.
- Non-cast presets (fp8, nvfp4) still do not set use_constant_amax (data-driven calibration preserved).
- argparse accepts --kv_cache_qformat none plus all cast / affine / rotate variants.
- Unknown names produce a KeyError at lookup time and an argparse choice error at the CLI.
Before your PR is "Ready for review"
- --qformat and --kv_cache_qformat values continue to work via the alias table; output configs are bit-equivalent to the prior hardcoded path.
- CONTRIBUTING.md: N/A (no new deps).
- mtq.X_DEFAULT_CFG constants.
- /claude review once ready.
Additional Information
- modelopt_recipes/configs/ptq/presets/kv/{fp8_cast,nvfp4_cast}.yaml.
- _set_kv_cache_constant_amax helper + all 3 call sites in hf_ptq.py.
- multinode_ptq.py is intentionally untouched (out of scope for this branch).
Summary by CodeRabbit
New Features
Refactor
Documentation