Drive hf_ptq qformat choices from preset YAMLs (remove hardcoded CLI quant configs)#1525

Open
shengliangxu wants to merge 4 commits into main from shengliangx/hf-ptq-dereference-hardcoded-configs

Collaborator

shengliangxu commented May 21, 2026

What does this PR do?

Type of change: Refactor

Replace the hardcoded QUANT_CFG_CHOICES / KV_QUANT_CFG_CHOICES dicts in examples/llm_ptq/hf_ptq.py with a lazy Mapping that discovers available qformat names by listing modelopt_recipes/configs/ptq/presets/{model,kv}/ and loads each YAML on first access via the existing load_config(..., schema_type=QuantizeConfig) path. The directory listing becomes the source of truth for --qformat / --kv_cache_qformat CLI vocabulary.
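The lazy-mapping idea described above can be sketched as follows. This is a minimal illustration, not the PR's `_PresetCfgChoices` class: the name `LazyPresetMapping` and the injected `loader` callable are hypothetical stand-ins (the real code goes through `load_config(..., schema_type=QuantizeConfig)`, whose exact signature is not shown here).

```python
from collections.abc import Iterator, Mapping
from pathlib import Path
from typing import Any, Callable


class LazyPresetMapping(Mapping):
    """Read-only mapping whose keys are the YAML basenames in one directory.

    Hypothetical sketch: `loader` stands in for whatever parses and validates
    a preset file; it is only called the first time a key is accessed.
    """

    def __init__(self, preset_dir: Path, loader: Callable[[Path], Any]) -> None:
        self._loader = loader
        # Only the directory listing is read eagerly; files are parsed later.
        self._paths = {p.stem: p for p in sorted(preset_dir.glob("*.yaml"))}
        self._cache: dict[str, Any] = {}

    def __getitem__(self, name: str) -> Any:
        if name not in self._paths:
            raise KeyError(name)
        if name not in self._cache:
            # First access: parse the file and remember the result.
            self._cache[name] = self._loader(self._paths[name])
        return self._cache[name]

    def __iter__(self) -> Iterator[str]:
        return iter(self._paths)

    def __len__(self) -> int:
        return len(self._paths)
```

Because the class implements the `Mapping` protocol, `keys()`, `in`, and `len()` work without touching any file contents, which is what lets the directory listing double as the CLI vocabulary.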

A small _QFORMAT_ALIASES table preserves previously-supported short CLI names (int8_sq, nvfp4_awq, fp8_pb_wo, ...) as deprecation shims. It is documented as not-for-extension — new formats land as preset YAMLs, and longer term, configurations should be authored as full recipes (--recipe).
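One way such an alias shim could behave is sketched below. The `int8_sq` to `int8_smoothquant` pair is taken from the usage examples in this description; the targets of the other aliases are not stated here, so they are left out. The function name `canonical_qformat` and the choice to emit a `DeprecationWarning` are assumptions for illustration, not confirmed behavior of the PR.

```python
import warnings

# Hypothetical alias table: legacy CLI name -> canonical preset basename.
# Only the pair that appears in the usage examples is included.
ALIASES = {"int8_sq": "int8_smoothquant"}


def canonical_qformat(name: str, presets: set) -> str:
    """Resolve a CLI token to a preset basename, accepting legacy aliases."""
    if name in presets:
        return name  # already canonical
    target = ALIASES.get(name)
    if target is not None and target in presets:
        # Assumption: the shim warns; the PR may stay silent instead.
        warnings.warn(
            f"qformat {name!r} is a deprecated alias; use {target!r}",
            DeprecationWarning,
            stacklevel=2,
        )
        return target
    raise KeyError(name)
```

Resolving aliases in one helper keeps every later lookup, including allow-list checks, working on canonical names only.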

Also adds presets/kv/fp8_cast.yaml and presets/kv/nvfp4_cast.yaml, composed from the existing kv_fp8_cast / kv_nvfp4_cast unit fragments. This promotes fp8_cast / nvfp4_cast to first-class KV presets and lets us delete the runtime _set_kv_cache_constant_amax helper and all three of its call sites — use_constant_amax is now authoritative in the YAML.

Side effect: every preset YAML under presets/model/ (mxfp4, mxfp6, mxint8, nvfp4_awq_full, nvfp4_fp8_mha, mamba_moe_*, ...) is now automatically exposed as a valid --qformat value with no further code change.

Usage

# Old short names still work via the alias shim
python examples/llm_ptq/hf_ptq.py \
    --pyt_ckpt_path <model> \
    --qformat int8_sq \
    --kv_cache_qformat fp8_cast \
    --export_path out/

# New canonical preset basenames work directly
python examples/llm_ptq/hf_ptq.py \
    --pyt_ckpt_path <model> \
    --qformat int8_smoothquant \
    --kv_cache_qformat fp8_cast \
    --export_path out/

# Newly-exposed presets (previously not on the CLI)
python examples/llm_ptq/hf_ptq.py \
    --pyt_ckpt_path <model> \
    --qformat nvfp4_awq_full \
    --export_path out/

Testing

Verified locally with both .venv (uv, py3.13) and the dev-py310-modelopt conda env:

  • All 20 previously-supported --qformat short names resolve and produce dicts that are exactly equal to the corresponding mtq.X_DEFAULT_CFG constants.
  • All 7 KV qformat names (fp8, fp8_cast, fp8_affine, nvfp4, nvfp4_cast, nvfp4_affine, nvfp4_rotate) resolve and match.
  • fp8_cast / nvfp4_cast YAML presets now contain use_constant_amax: true baked into the [kv]_bmm_quantizer cfg.
  • Non-cast variants (fp8, nvfp4) still do not set use_constant_amax (data-driven calibration preserved).
  • argparse accepts --kv_cache_qformat none plus all cast / affine / rotate variants.
  • Unknown qformats raise KeyError at lookup time and argparse choice error at the CLI.
  • All pre-commit hooks pass (ruff, mypy, bandit, license, yaml format, recipe validation).
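The two failure modes in the list above (argparse choice error at the CLI, `KeyError` at lookup time) can be illustrated with a small sketch. The parser below is hypothetical and much smaller than the one in hf_ptq.py; the sentinel value `"none"` and the helper names are assumptions.

```python
import argparse

KV_NONE = "none"  # assumed value of the "no KV quantization" sentinel


def build_parser(kv_presets) -> argparse.ArgumentParser:
    """Build a parser whose KV choices come from the discovered preset names."""
    parser = argparse.ArgumentParser(prog="hf_ptq_sketch")
    # --qformat may be a comma-separated list, so it is validated by lookup.
    parser.add_argument("--qformat", default="fp8")
    parser.add_argument(
        "--kv_cache_qformat",
        choices=[KV_NONE, *sorted(kv_presets)],
        default=KV_NONE,
    )
    return parser


def resolve_qformats(raw: str, presets: dict) -> list:
    """Look up each comma-separated token; an unknown name raises KeyError."""
    names = [tok.strip() for tok in raw.split(",")]
    return [presets[name] for name in names]
```

With this shape, an unknown `--kv_cache_qformat` is rejected by argparse before any model is loaded, while an unknown `--qformat` token surfaces as a `KeyError` from the mapping.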

Before your PR is "Ready for review"

  • Is this change backward compatible?: ✅ — all previously-valid --qformat and --kv_cache_qformat values continue to work via the alias table; output configs are bit-equivalent to the prior hardcoded path.
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A — no new deps.
  • Did you write any new necessary tests?: ❌ — existing PTQ integration tests exercise these qformats; the refactor is config-equivalence-preserving and was spot-verified against mtq.X_DEFAULT_CFG constants.
  • Did you update Changelog?: ❌ — happy to add an entry; treating this as an internal refactor for now.
  • Did you get Claude approval on this PR?: ❌ — will run /claude review once ready.

Additional Information

  • Two new YAML presets: modelopt_recipes/configs/ptq/presets/kv/{fp8_cast,nvfp4_cast}.yaml.
  • Deletes: _set_kv_cache_constant_amax helper + all 3 call sites in hf_ptq.py.
  • multinode_ptq.py is intentionally untouched (out of scope for this branch).

Summary by CodeRabbit

  • New Features

    • Added FP8-cast and NVFP4-cast KV-cache quantization presets to expand supported quantization options.
  • Refactor

    • qformat and KV-cache qformat choices are now discovered from preset files (with backward-compatible aliases preserved).
    • KV-cache calibration behavior is driven by presets; the runtime post-edit override was removed. CLI validation and help now reflect preset-driven choices.
  • Documentation

    • Changelog updated to describe preset-driven options and new KV presets.

Review Change Stack

Replace the hardcoded QUANT_CFG_CHOICES / KV_QUANT_CFG_CHOICES dicts in
examples/llm_ptq/hf_ptq.py with a lazy Mapping that discovers available
qformat names by listing modelopt_recipes/configs/ptq/presets/{model,kv}/
and loads each YAML on first access via the existing
load_config(..., schema_type=QuantizeConfig) path.

A small _QFORMAT_ALIASES table keeps the previously-supported short CLI
names (int8_sq, nvfp4_awq, fp8_pb_wo, ...) working as deprecation
shims; the table is documented as not-for-extension since new formats
should land as preset YAMLs (or, longer term, as full recipes).

Also add presets/kv/fp8_cast.yaml and presets/kv/nvfp4_cast.yaml so
fp8_cast / nvfp4_cast become first-class KV presets composed from the
existing kv_fp8_cast / kv_nvfp4_cast unit fragments. This drops the
KV alias entries and lets us delete the runtime _set_kv_cache_constant_amax
helper and all three of its call sites; use_constant_amax is now
authoritative in the YAML.

Side effect: every preset YAML under presets/model/ (mxfp4, mxfp6,
mxint8, nvfp4_awq_full, nvfp4_fp8_mha, mamba_moe_*, ...) is now
automatically exposed as a valid --qformat value with no further
code change.

Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>

copy-pr-bot Bot commented May 21, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.


Contributor

coderabbitai Bot commented May 21, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: d899e5dd-ed90-4333-ad14-3718286a89d4

📥 Commits

Reviewing files that changed from the base of the PR and between 76fc552 and 29f03fa.

📒 Files selected for processing (1)
  • examples/llm_ptq/hf_ptq.py

📝 Walkthrough

This PR refactors quantization configuration discovery in a PTQ example script from static mtq.*_CFG dictionaries to lazy-loaded YAML presets. It introduces a memoized preset loader, applies it across all quantization pipelines, removes post-hoc configuration overrides, and adds new KV-cache cast preset files.

Changes

PTQ quantization preset refactoring

| Layer / File(s) | Summary |
| --- | --- |
| Preset infrastructure foundation: examples/llm_ptq/hf_ptq.py (lines 21, 59, 70–75, 95–229) | Imports support for lazy mappings and recipe utilities. Defines BUILTIN_CONFIG_ROOT, _PresetCfgChoices lazy mapping class, preset directory constants, backward-compatible qformat aliases, _KV_NONE sentinel, and _AUTO_QUANTIZE_QFORMATS validation set. Replaces static QUANT_CFG_CHOICES and KV_QUANT_CFG_CHOICES dicts with discovered preset mappings. |
| Using presets across quantization pipelines: examples/llm_ptq/hf_ptq.py (lines 406–408, 484–491, 517–524, 1174, 1186–1193, 1206) | Auto-quantize, low-memory, and mono quantization paths now retrieve KV-cache configs from KV_QUANT_CFG_CHOICES preset mappings instead of mtq module lookups. KV-cache enabling switches from string "none" to _KV_NONE sentinel. Removes _set_kv_cache_constant_amax helper and post-hoc override logic that forced use_constant_amax for cast formats. Updates error messages and validation to reflect the new preset mapping type. |
| CLI argument updates: examples/llm_ptq/hf_ptq.py (line 1356) | --kv_cache_qformat argument choices now include _KV_NONE and dynamic keys from KV_QUANT_CFG_CHOICES preset mapping instead of static dict keys. |
| New KV-cache cast presets & changelog: modelopt_recipes/configs/ptq/presets/kv/fp8_cast.yaml, modelopt_recipes/configs/ptq/presets/kv/nvfp4_cast.yaml, CHANGELOG.rst | Adds FP8 E4M3 and NVFP4 KV-cache cast preset YAML files (each with imports and $import usage in quant_cfg) and updates the changelog describing preset-driven CLI discovery and removal of runtime use_constant_amax patching. |

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 5 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title accurately describes the main refactoring: moving from hardcoded quant configs to YAML-preset-driven discovery for --qformat and --kv_cache_qformat choices. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Security Anti-Patterns | ✅ Passed | No critical security anti-patterns found. No torch.load/numpy.load/eval/exec/hardcoded trust_remote_code/nosec comments. YAML loading uses safe_load_all with Pydantic validation. |

shengliangxu changed the title from "Drive hf_ptq qformat choices from preset YAMLs" to "Drive hf_ptq qformat choices from preset YAMLs (remove hardcoded CLI quant configs)" May 21, 2026
shengliangxu marked this pull request as ready for review May 21, 2026 19:12
shengliangxu requested review from a team as code owners May 21, 2026 19:12

codecov Bot commented May 21, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 77.11%. Comparing base (c9098b6) to head (29f03fa).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1525      +/-   ##
==========================================
+ Coverage   76.75%   77.11%   +0.36%     
==========================================
  Files         476      476              
  Lines       51811    51811              
==========================================
+ Hits        39767    39954     +187     
+ Misses      12044    11857     -187     
| Flag | Coverage Δ |
| --- | --- |
| examples | 40.26% <ø> (-0.48%) ⬇️ |
| unit | 52.63% <ø> (+<0.01%) ⬆️ |


Contributor

@coderabbitai coderabbitai Bot left a comment

Warning

CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.

Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.

👉 Steps to fix this

Actionable comments posted: 1

🧹 Nitpick comments (1)
examples/llm_ptq/hf_ptq.py (1)

130-133: ⚡ Quick win

Derive KV calibration skip from config semantics, not hardcoded format names.

Line 476 hardcodes cast-format names via _KV_CAST_FORMATS. Since presets are YAML-driven now, this risks drift when presets evolve. Prefer checking whether the selected KV config actually needs calibration.

Suggested refactor
-        if args.kv_cache_qformat not in _KV_CAST_FORMATS:
+        if need_calibration({"quant_cfg": kv_cache_quant_cfg, "algorithm": "max"}):
             # Calibrate only the KV cache quantizers; disable all others.
             with mtq.set_quantizer_by_cfg_context(
                 language_model,
                 [{"quantizer_name": "*", "enable": False}, *kv_cache_quant_cfg],
             ):
                 mtq.calibrate(language_model, algorithm="max", forward_loop=calibrate_loop)

Also applies to: 476-483

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/llm_ptq/hf_ptq.py` around lines 130 - 133, Replace the hardcoded
_KV_CAST_FORMATS check with a semantic check on the chosen KV preset: instead of
testing the format name via _KV_CAST_FORMATS, inspect the selected KV
configuration object (the loaded preset used for KV, e.g., the variable that
selects the KV preset in this module—refer to the code that chooses the "kv"
preset) and decide to skip calibration when that config explicitly pins
use_constant_amax (or an equivalent flag like
requires_calibration/use_constant_amax) — remove the frozenset usage and branch
on the KV config's semantic field so YAML-driven presets control the behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@examples/llm_ptq/hf_ptq.py`:
- Around line 194-211: The CLI validation currently tests raw tokens against
_AUTO_QUANTIZE_QFORMATS, which rejects valid canonical names because
canonical/alias resolution happens later via QUANT_CFG_CHOICES; change the
validation to check against the full set of accepted keys (e.g., use
QUANT_CFG_CHOICES.keys() or build a normalized set of canonical names/aliases)
or resolve each token through the same lookup used later before rejecting.
Update the checks that reference _AUTO_QUANTIZE_QFORMATS (and any logic around
parsing auto-quantize tokens) to use QUANT_CFG_CHOICES (or a derived normalized
set) so canonical names and aliases are accepted consistently.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 61cb3534-3f30-4331-b8b7-3a3cf32cca68

📥 Commits

Reviewing files that changed from the base of the PR and between c9098b6 and aae0fe1.

📒 Files selected for processing (3)
  • examples/llm_ptq/hf_ptq.py
  • modelopt_recipes/configs/ptq/presets/kv/fp8_cast.yaml
  • modelopt_recipes/configs/ptq/presets/kv/nvfp4_cast.yaml

Comment thread examples/llm_ptq/hf_ptq.py
- Deepcopy in _PresetCfgChoices.__getitem__ so callers can freely mutate
  the returned quant_cfg without poisoning the cache.
- Assert that _KV_NONE does not collide with any discovered KV preset.
- Expand the comment on _AUTO_QUANTIZE_QFORMATS explaining why it stays
  hardcoded (auto_quantize compatibility is an export-path property, not
  a YAML-derivable one).
- Add CHANGELOG entry for the qformat discovery refactor and the
  fp8_cast / nvfp4_cast preset promotion (including the note that
  out-of-tree recipes targeting cast KV must set use_constant_amax
  themselves now that the runtime override is gone).

Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
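The first two bullets of this commit (deepcopy on read, and the sentinel collision assert) can be sketched as below. `CopyOnReadCache`, `check_no_sentinel_collision`, and the `"none"` sentinel value are hypothetical names and values chosen for illustration; the PR's own implementation lives in `_PresetCfgChoices.__getitem__`.

```python
import copy
from typing import Any, Callable

KV_NONE = "none"  # assumed sentinel value


class CopyOnReadCache:
    """Memoize loaded configs but hand each caller a private deep copy."""

    def __init__(self, loader: Callable[[str], Any]) -> None:
        self._loader = loader
        self._cache: dict = {}

    def __getitem__(self, name: str) -> Any:
        if name not in self._cache:
            self._cache[name] = self._loader(name)
        # Deep copy so in-place edits by the caller never reach the cache.
        return copy.deepcopy(self._cache[name])


def check_no_sentinel_collision(preset_names) -> None:
    """Fail fast if a preset file would shadow the 'no KV quantization' token."""
    assert KV_NONE not in preset_names, (
        f"KV preset named {KV_NONE!r} collides with the sentinel"
    )
```

The copy costs a little on every access, but it means code paths that filter or append to `quant_cfg` cannot change what the next caller sees.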
Codify the policy that the preset directory listing IS the CLI vocabulary —
there is intentionally no separate allow-list. New presets are CLI-visible
the moment they land in the directory; this is a feature, not an oversight.

Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
1. Auto-quantize validation: _AUTO_QUANTIZE_QFORMATS previously only listed the
   short alias names, so passing canonical preset basenames (e.g. int8_smoothquant
   instead of int8_sq) would be rejected even though the underlying configs are
   identical. Switch the set to canonical names and canonicalize incoming tokens
   via a new _canonical_qformat() helper so both forms are accepted.

2. KV cast detection: replace the hardcoded _KV_CAST_FORMATS = {fp8_cast,
   nvfp4_cast} name set with a semantic check (_kv_cfg_uses_constant_amax) that
   inspects the loaded KV cfg's *[kv]_bmm_quantizer entry for use_constant_amax.
   This makes "should we skip KV calibration?" YAML-driven: any future cast-style
   KV preset works without touching this script.

Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
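The semantic KV check from item 2 might look roughly like the sketch below. It assumes `quant_cfg` is a list of dict entries keyed by `quantizer_name`, as in the review snippets on this page. Where exactly `use_constant_amax` sits inside an entry is not shown here, so the sketch checks both the entry itself and a nested `cfg` dict; both locations are assumptions, and the function name only mirrors the one mentioned in the commit message.

```python
def kv_cfg_uses_constant_amax(quant_cfg: list) -> bool:
    """Return True if the KV BMM quantizer entry pins use_constant_amax.

    Hypothetical sketch: the entry layout (flag at top level or under a
    nested "cfg" dict) is assumed, not taken from the real preset schema.
    """
    for entry in quant_cfg:
        if entry.get("quantizer_name") != "*[kv]_bmm_quantizer":
            continue
        nested = entry.get("cfg")
        if not isinstance(nested, dict):
            nested = {}
        if entry.get("use_constant_amax") or nested.get("use_constant_amax"):
            return True
    return False
```

A caller would then skip the KV calibration pass when this returns `True`, so any future cast-style preset is handled by its YAML content rather than by its file name.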
Contributor

@coderabbitai coderabbitai Bot left a comment

♻️ Duplicate comments (1)
examples/llm_ptq/hf_ptq.py (1)

403-407: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Normalize auto-quantize qformats before checking _AUTO_QUANTIZE_QFORMATS.

Line 406 validates raw CLI tokens, but Line 465 resolves them through QUANT_CFG_CHOICES, which now accepts both canonical preset basenames and legacy aliases. A canonical preset like int8_smoothquant is therefore accepted later but rejected here first.

💡 Suggested fix
-    qformat_list = args.qformat.split(",")
+    qformat_list = [q.strip() for q in args.qformat.split(",")]
     assert qformat_list, "No quantization formats provided"
-    # Check if all provided quantization formats are supported
-    assert all(qformat in _AUTO_QUANTIZE_QFORMATS for qformat in qformat_list), (
+    canonical_qformats = [
+        QUANT_CFG_CHOICES._canonical(qformat) if isinstance(QUANT_CFG_CHOICES, _PresetCfgChoices) else qformat
+        for qformat in qformat_list
+    ]
+    assert all(qformat is not None for qformat in canonical_qformats), (
+        "Unsupported quantization format provided"
+    )
+    assert all(qformat in _AUTO_QUANTIZE_QFORMATS for qformat in canonical_qformats), (
         "One or more quantization formats provided are not supported for unified checkpoint export"
     )

Also applies to: 465-465

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/llm_ptq/hf_ptq.py` around lines 403 - 407, The assertion is checking
raw CLI tokens in qformat_list against _AUTO_QUANTIZE_QFORMATS before they are
normalized later; update the validation to normalize each args.qformat token
using the same resolution used at Line 465 (QUANT_CFG_CHOICES/its alias mapping)
and then check the normalized canonical names against _AUTO_QUANTIZE_QFORMATS.
Concretely, transform qformat_list by mapping each entry through the
QUANT_CFG_CHOICES lookup (or its alias→canonical resolver) to produce
canonical_qformats, then assert canonical_qformats is non-empty and that all
entries are in _AUTO_QUANTIZE_QFORMATS (referencing qformat_list, args.qformat,
QUANT_CFG_CHOICES, and _AUTO_QUANTIZE_QFORMATS).
🧹 Nitpick comments (1)
examples/llm_ptq/hf_ptq.py (1)

487-499: ⚡ Quick win

Drive KV-cache calibration skipping from the preset config, not the preset name.

Line 493 reintroduces a hardcoded name check after this refactor made YAML authoritative. If a future KV preset sets use_constant_amax, it will be CLI-exposed here and then immediately recalibrated anyway.

💡 Suggested fix
         kv_cache_quant_cfg = copy.deepcopy(KV_QUANT_CFG_CHOICES[args.kv_cache_qformat]["quant_cfg"])
         kv_cache_quant_cfg = [
             e for e in kv_cache_quant_cfg if e["quantizer_name"] != "*"
         ]  # keep other quantizers from auto_quantize

         mtq.set_quantizer_by_cfg(language_model, quant_cfg=kv_cache_quant_cfg)
-        if args.kv_cache_qformat not in _KV_CAST_FORMATS:
+        needs_kv_calibration = any(
+            not entry.get("use_constant_amax", False) for entry in kv_cache_quant_cfg
+        )
+        if needs_kv_calibration:
             # Calibrate only the KV cache quantizers; disable all others.
             with mtq.set_quantizer_by_cfg_context(
                 language_model,
                 [{"quantizer_name": "*", "enable": False}, *kv_cache_quant_cfg],
             ):
                 mtq.calibrate(language_model, algorithm="max", forward_loop=calibrate_loop)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/llm_ptq/hf_ptq.py` around lines 487 - 499, The code currently
decides whether to skip KV-cache calibration by checking the preset name
(args.kv_cache_qformat not in _KV_CAST_FORMATS); instead, inspect the actual
preset config (kv_cache_quant_cfg) and skip calibration when the preset
indicates constant amax behavior. Replace the name-based condition with a
config-based check (e.g., if not any(entry.get("use_constant_amax") for entry in
kv_cache_quant_cfg): ... ) so mtq.calibrate(...) runs only when none of the KV
quantizer entries specify use_constant_amax; reference kv_cache_quant_cfg,
KV_QUANT_CFG_CHOICES, args.kv_cache_qformat, and mtq.calibrate in your change.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: a74f2b60-827e-4805-a9ff-ac9644c33ec5

📥 Commits

Reviewing files that changed from the base of the PR and between 7ff7c9e and 76fc552.

📒 Files selected for processing (1)
  • examples/llm_ptq/hf_ptq.py
