File: examples/3.x_api/pytorch/nlp/huggingface_models/language-modeling/quantization/weight_only/README.md

`run_clm_no_trainer.py` quantizes large language models using the [NeelNanda/pile-10k](https://huggingface.co/datasets/NeelNanda/pile-10k) dataset for calibration and validates accuracy on the datasets provided by lm_eval. An example command is shown below.
### Quantization (CPU & HPU)
```bash
python run_clm_no_trainer.py \
    ... \
--gptq_use_max_length \
--output_dir saved_results
```
### Evaluation

> Note: `--gptq_actorder` is not supported by HPU.
> Note: `SRAM_SLICER_SHARED_MME_INPUT_EXPANSION_ENABLED=false` is an experimental flag that yields better performance for uint4; it will be removed in a future release.
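For readers unfamiliar with the storage format these options target, here is a minimal, hypothetical sketch of group-wise uint4 (4-bit) weight-only quantization in plain Python. It shows only a round-to-nearest baseline: the GPTQ algorithm configured by the `--gptq_*` flags additionally compensates quantization error using second-order statistics, and this snippet is illustrative, not the repository's implementation.

```python
# Hypothetical sketch: group-wise uint4 weight-only quantization.
# Each group of weights shares one float scale and one integer zero point;
# every weight is stored as a 4-bit code in [0, 15].

def quantize_uint4(weights, group_size=4):
    """Quantize a flat list of floats to uint4 codes with per-group scale/zero."""
    assert len(weights) % group_size == 0
    codes, scales, zeros = [], [], []
    for i in range(0, len(weights), group_size):
        group = weights[i:i + group_size]
        lo, hi = min(group), max(group)
        scale = (hi - lo) / 15 or 1.0          # 4 bits -> 16 levels (0..15)
        zero = round(-lo / scale)              # integer zero point
        codes.extend(min(15, max(0, round(w / scale + zero))) for w in group)
        scales.append(scale)
        zeros.append(zero)
    return codes, scales, zeros

def dequantize_uint4(codes, scales, zeros, group_size=4):
    """Reconstruct approximate float weights from uint4 codes."""
    return [(c - zeros[i // group_size]) * scales[i // group_size]
            for i, c in enumerate(codes)]

# Toy example: two groups of four weights.
w = [0.12, -0.5, 0.33, 0.9, -1.2, 0.05, 0.7, -0.3]
codes, scales, zeros = quantize_uint4(w)
w_hat = dequantize_uint4(codes, scales, zeros)
max_err = max(abs(a - b) for a, b in zip(w, w_hat))
```

The reconstruction error is bounded by roughly half the per-group scale, which is why smaller group sizes (at the cost of more metadata) improve uint4 accuracy.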