
Commit 30e803d

xin3he and xinhe3 authored

fix bug and update readme (#2051)

* fix bug and update readme

---------

Signed-off-by: xinhe3 <[email protected]>
Co-authored-by: xinhe3 <[email protected]>

1 parent 7062eeb commit 30e803d

File tree

2 files changed: +16 −11 lines

  • examples/3.x_api/pytorch/nlp/huggingface_models/language-modeling/quantization/weight_only
  • neural_compressor/evaluation/lm_eval/models


examples/3.x_api/pytorch/nlp/huggingface_models/language-modeling/quantization/weight_only/README.md

Lines changed: 14 additions & 10 deletions

````diff
@@ -37,7 +37,7 @@ Below is the current support status on Intel® Xeon® Scalable Processor with Py
 
 `run_clm_no_trainer.py` quantizes the large language models using the dataset [NeelNanda/pile-10k](https://huggingface.co/datasets/NeelNanda/pile-10k) calibration and validates datasets accuracy provided by lm_eval, an example command is as follows.
 
-### Quantization
+### Quantization (CPU & HPU)
 
 ```bash
 python run_clm_no_trainer.py \
@@ -53,9 +53,10 @@ python run_clm_no_trainer.py \
 --gptq_use_max_length \
 --output_dir saved_results
 ```
-### Evaluation
 
-> Note: The SRAM_SLICER_SHARED_MME_INPUT_EXPANSION_ENABLED=false is an experimental flag which yields better performance for uint4, and it will be removed in a future release.
+> Note: `--gptq_actorder` is not supported by HPU.
+
+### Evaluation (CPU)
 
 ```bash
 # original model
@@ -65,30 +66,33 @@ python run_clm_no_trainer.py \
 --batch_size 8 \
 --tasks "lambada_openai"
 
-# quantized model
-SRAM_SLICER_SHARED_MME_INPUT_EXPANSION_ENABLED=false ENABLE_EXPERIMENTAL_FLAGS=1 python run_clm_no_trainer.py \
+python run_clm_no_trainer.py \
 --model meta-llama/Llama-2-7b-hf \
 --accuracy \
 --batch_size 8 \
 --tasks "lambada_openai" \
 --load \
 --output_dir saved_results
-```
+```
 
-### Benchmark
+### Evaluation (HPU)
+
+> Note: The SRAM_SLICER_SHARED_MME_INPUT_EXPANSION_ENABLED=false is an experimental flag which yields better performance for uint4, and it will be removed in a future release.
 
 ```bash
 # original model
 python run_clm_no_trainer.py \
 --model meta-llama/Llama-2-7b-hf \
---performance \
---batch_size 8
+--accuracy \
+--batch_size 8 \
+--tasks "lambada_openai"
 
 # quantized model
 SRAM_SLICER_SHARED_MME_INPUT_EXPANSION_ENABLED=false ENABLE_EXPERIMENTAL_FLAGS=1 python run_clm_no_trainer.py \
 --model meta-llama/Llama-2-7b-hf \
---performance \
+--accuracy \
 --batch_size 8 \
+--tasks "lambada_openai" \
 --load \
 --output_dir saved_results
 ```
````
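The HPU evaluation command in the README passes the experimental flags inline as environment variables. As a minimal sketch (not part of this commit), the same invocation can be driven from Python by populating the child process environment before launch; the flag names come from the README above, and whether they are honored depends on the installed Habana software stack:

```python
import os
import subprocess
import sys

# Sketch: set the experimental HPU flags in a copied environment instead of
# prefixing them on the shell command line (assumption: the Habana runtime
# reads them at process startup, as the README's inline usage implies).
env = os.environ.copy()
env["SRAM_SLICER_SHARED_MME_INPUT_EXPANSION_ENABLED"] = "false"
env["ENABLE_EXPERIMENTAL_FLAGS"] = "1"

# Same quantized-model accuracy run as the README's HPU example.
cmd = [
    sys.executable, "run_clm_no_trainer.py",
    "--model", "meta-llama/Llama-2-7b-hf",
    "--accuracy",
    "--batch_size", "8",
    "--tasks", "lambada_openai",
    "--load",
    "--output_dir", "saved_results",
]
# subprocess.run(cmd, env=env, check=True)  # uncomment to actually launch
```

This keeps the flag settings scoped to the child process rather than mutating the caller's own environment.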

neural_compressor/evaluation/lm_eval/models/huggingface.py

Lines changed: 2 additions & 1 deletion

```diff
@@ -885,7 +885,8 @@ def find_bucket(self, length):
             exit(0)
         else:
             if self.last_bucket != suitable_buckets[0]:
-                self.model.clear_cache()  # clear graph cache to avoid OOM
+                if hasattr(self.model, "clear_cache"):
+                    self.model.clear_cache()  # clear HPU graph cache to avoid OOM
                 self.last_bucket = suitable_buckets[0]
         return self.last_bucket
```
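The fix above guards `clear_cache()` with `hasattr`, so models that do not expose the method (e.g. plain CPU models rather than HPU graph-compiled ones) no longer raise `AttributeError` when the bucket changes. A minimal sketch of the pattern, using hypothetical stand-in classes rather than the real model wrappers:

```python
# Sketch of the hasattr guard added in this commit. HpuModel and CpuModel are
# hypothetical stand-ins; only the guard pattern mirrors the real change.
class HpuModel:
    def __init__(self):
        self.cleared = 0

    def clear_cache(self):
        # Stand-in for HPU graph-cache eviction.
        self.cleared += 1


class CpuModel:
    pass  # no clear_cache attribute at all


def maybe_clear_cache(model):
    # Before the fix: model.clear_cache() was called unconditionally and
    # raised AttributeError for models without the method. After the fix:
    if hasattr(model, "clear_cache"):
        model.clear_cache()  # clear HPU graph cache to avoid OOM


hpu, cpu = HpuModel(), CpuModel()
maybe_clear_cache(hpu)  # cache cleared once
maybe_clear_cache(cpu)  # silently skipped, no AttributeError
```

Duck-typing with `hasattr` keeps the evaluation path device-agnostic without importing HPU-specific types just to do an `isinstance` check.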
