autotune target_bits example for llama recipe #2344
Conversation
Signed-off-by: He, Xin3 <[email protected]>
PR Reviewer Guide 🔍
PR Code Suggestions ✨
Pull request overview
This PR enhances AutoRound quantization configuration by improving parameter management and enabling flexible mixed-precision quantization through the target_bits parameter.
Key Changes:
- Modified `target_bits` from `int` to `float` in AutoRoundConfig to support fractional bit-width targets
- Refactored config classes to remove redundant `params_list` attributes and use dynamic generation instead
- Added a `non_tunable_params` mechanism in `TorchBaseConfig` to exclude specific parameters from tuning
- Added an `output_dir` parameter to AutoRoundConfig for managing temporary file storage
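To make the fractional target concrete, here is a minimal, hedged sketch of constructing the updated config; the `target_bits` and `output_dir` keywords come from this PR's description, while the values chosen are illustrative assumptions rather than the example shipped here.

```python
# Hedged sketch, not the shipped example: constructing an AutoRoundConfig with
# the new options described above (target_bits as float, output_dir).
from neural_compressor.torch.quantization import AutoRoundConfig

config = AutoRoundConfig(
    # A fractional target lets AutoRound mix per-layer bit-widths (e.g. some
    # layers at 4 bits, others at 8) so the average lands near 5.5 bits.
    target_bits=5.5,
    # Where temporary export files are written during quantization.
    output_dir="./tmp_autoround",
)

# Per this PR, internal/non-tunable fields are filtered out of the dict view.
print(config.to_dict())
```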
Reviewed changes
Copilot reviewed 10 out of 12 changed files in this pull request and generated 2 comments.
Summary per file:

| File | Description |
|---|---|
| neural_compressor/torch/quantization/config.py | Refactored config classes: removed hardcoded `params_list`, added `non_tunable_params` initialization in `TorchBaseConfig.__init__`, changed `target_bits` type to float, added `output_dir` parameter |
| neural_compressor/common/base_config.py | Updated tuning-parameter filtering logic to check against `non_tunable_params`, added internal parameter filtering in `to_dict()` |
| neural_compressor/torch/algorithms/weight_only/autoround.py | Refactored device handling to store the accelerator object, added model reloading logic for specific export formats with memory cleanup |
| neural_compressor/torch/utils/auto_accelerator.py | Enhanced the CPU accelerator's `empty_cache()` to call `gc.collect()` instead of being a no-op |
| examples/pytorch/nlp/huggingface_models/language-modeling/quantization/auto_round/llama3/quantize.py | New example script demonstrating AutoRound quantization with the `target_bits` parameter |
| examples/pytorch/nlp/huggingface_models/language-modeling/quantization/auto_round/llama3/README.md | New comprehensive documentation for Llama3 quantization recipes and inference |
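For readers unfamiliar with the `non_tunable_params` idea referenced in the table above, the following is an illustrative, simplified sketch of the mechanism (tunable parameters derived dynamically, with an exclusion list). The class and method names are hypothetical and not the actual neural-compressor code.

```python
# Illustrative sketch only: a base config that derives its tunable parameters
# dynamically instead of keeping a hardcoded params_list, and skips entries
# registered as non-tunable (the approach this PR describes, not its exact code).
class BaseConfigSketch:
    def __init__(self):
        # Parameters named here are excluded when building the tuning space,
        # e.g. output_dir is bookkeeping rather than a quantization knob.
        self.non_tunable_params = ["output_dir"]

    def tunable_params(self):
        excluded = set(self.non_tunable_params) | {"non_tunable_params"}
        # Anything that is not private and not explicitly excluded is tunable.
        return {
            name: value
            for name, value in vars(self).items()
            if not name.startswith("_") and name not in excluded
        }


class AutoRoundConfigSketch(BaseConfigSketch):
    def __init__(self, target_bits=8.0, output_dir="./tmp"):
        super().__init__()
        self.target_bits = target_bits  # tunable
        self.output_dir = output_dir    # excluded via non_tunable_params


cfg = AutoRoundConfigSketch(target_bits=5.5)
assert "output_dir" not in cfg.tunable_params()
```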
yiliu30 left a comment:
Overall, LGTM.
It would be better not to mix example changes and new features in one PR.
Signed-off-by: He, Xin3 <[email protected]>
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Tang Kaihui <[email protected]>
Right, most of the issues were found while enabling the example, and they are mixed in here to speed up development.
Please update this readme with the new examples.
https://github.com/intel/neural-compressor/tree/xinhe/vllm/examples#quantization
The CI issue looks unrelated to this PR; I will fix it in another PR.
PR Type
Enhancement
Description
- Add `target_bits` as float in AutoRoundConfig
- Remove redundant `params_list` in multiple classes
- Introduce `non_tunable_params` in `TorchBaseConfig`
- Add `output_dir` to AutoRoundConfig
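Given the PR title (autotuning `target_bits` for a Llama recipe), a hedged sketch of what such a tuning loop could look like follows. `autotune` and `TuningConfig` are the generic neural-compressor 3.x tuning entry points; the model choice, candidate bit targets, and the evaluation stub are assumptions rather than the quantize.py added by this PR.

```python
# Hedged sketch, not the quantize.py shipped in this PR: try a few fractional
# target_bits candidates with autotune and keep the best-scoring one.
from transformers import AutoModelForCausalLM
from neural_compressor.torch.quantization import AutoRoundConfig, TuningConfig, autotune

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct"  # illustrative model, not necessarily the recipe's
)


def eval_fn(q_model) -> float:
    # Placeholder evaluation hook: swap in a real benchmark (e.g. lm-eval-harness)
    # that returns a single accuracy-like score for the candidate model.
    return 0.0


# Candidate configs differ only in the (now float) target_bits value.
config_set = [
    AutoRoundConfig(target_bits=bits, output_dir="./tmp_autoround")
    for bits in (4.0, 5.5, 6.0)
]

best_model = autotune(
    model=model,
    tune_config=TuningConfig(config_set=config_set),
    eval_fn=eval_fn,
)
```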
Diagram Walkthrough

File Walkthrough
- 1 file: Modify AutoRoundConfig and clean up config classes
- 11 files