86 changes: 86 additions & 0 deletions docs/compression.md
@@ -0,0 +1,86 @@
# Model Compression

------------------------------------------------------------------------------------------

## **Introduction**

PaddleFleetX integrates the common compression methods from PaddleSlim: Quantization-Aware Training (QAT), Structured Pruning (SP), and Knowledge Distillation (KD). This document describes how to use these features in PaddleFleetX to compress a model and export the compressed result.

## **Features**

- <a href="https://github.com/PaddlePaddle/PaddleSlim/tree/release/2.4/demo/dygraph/quant">Quantization-aware training</a>: speeds up inference by replacing the floating-point matrix multiplications in fully connected layers with INT8 integer computation (a sketch of the `abs_max` idea follows this list);
- <a href="https://github.com/PaddlePaddle/PaddleSlim/tree/release/2.4/demo/dygraph/pruning">Structured pruning</a>: speeds up inference by pruning channels from the weights of fully connected layers;
- <a href="#知识蒸馏">Knowledge distillation</a>: improves the accuracy of a small, low-accuracy model (the student) by distilling knowledge from a large, high-accuracy model (the teacher).
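
To make the quantization idea concrete, here is a minimal, self-contained sketch of symmetric `abs_max` INT8 quantization. This is plain numpy for illustration only, not PaddleSlim's implementation, and the function names are ours:

```python
import numpy as np

# Illustrative sketch (not PaddleSlim code): symmetric 'abs_max' INT8
# quantization scales a float tensor by its maximum absolute value so that
# values land in [-127, 127], then rounds to integers.
def abs_max_quantize(w, bits=8):
    qmax = 2 ** (bits - 1) - 1                  # 127 for 8-bit
    scale = max(np.abs(w).max(), 1e-8)          # per-tensor quantization scale
    q = np.clip(np.round(w / scale * qmax), -qmax, qmax).astype(np.int8)
    return q, scale

def abs_max_dequantize(q, scale, bits=8):
    qmax = 2 ** (bits - 1) - 1
    return q.astype(np.float32) * scale / qmax

w = np.random.randn(1024, 1024).astype(np.float32)
q, scale = abs_max_quantize(w)
w_hat = abs_max_dequantize(q, scale)            # reconstruction used at inference
print("max quantization error:", np.abs(w - w_hat).max())
```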



## **Configuration**

Model compression is switched on through the Compress field, and the path to the pretrained model parameters is specified by pretrained. Nested under it are the technique-specific parameters for quantization-aware training, structured pruning, and knowledge distillation.

```yaml
Compress:
  pretrained: # path to the pretrained model parameters

  Quantization: # quantization-aware training parameters

  Prune: # structured pruning parameters

  Distillation: # knowledge distillation parameters
```

**Note**: Joint use of the three compression methods above is still under development; for now, please use each method on its own.

### **Quantization-Aware Training Parameters**

```yaml
Compress:
  pretrained:
  Quantization:
    enable: True
    weight_quantize_type: 'abs_max'
    activation_quantize_type: 'moving_average_abs_max'
    weight_preprocess_type: None
    activation_preprocess_type: 'PACT'
    weight_bits: 8
    activation_bits: 8
    quantizable_layer_type: ['Linear', 'ColumnParallelLinear', 'RowParallelLinear']
    onnx_format: True
```

Parameter descriptions:

| **Parameter** | **Description** |
|-----------------------------|-----------------------------------------|
| pretrained | Directory the pretrained model is loaded from. If set, the pretrained model is loaded before quantization; to load already-quantized parameters instead, set this to None and set Engine.save_load.ckpt_dir directly |
| enable | Whether to enable quantization-aware training |
| weight_quantize_type | Weight quantization method; defaults to `channel_wise_abs_max`, with `abs_max` also supported |
| activation_quantize_type | Activation quantization method; defaults to `moving_average_abs_max` |
| weight_preprocess_type | Weight preprocessing method; defaults to None, meaning no preprocessing; set to `PACT` to use the `PACT` method |
| activation_preprocess_type | Activation preprocessing method; defaults to None, meaning no preprocessing |
| weight_bits | Number of bits for weight quantization; defaults to 8 |
| activation_bits | Number of bits for activation quantization; defaults to 8 |
| quantizable_layer_type | Operator types to be quantized |
| onnx_format | Whether to use the new quantization format; defaults to False |

A more detailed description of the quantization-aware training parameters is available in the [PaddleSlim dygraph QAT API documentation](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/api_cn/dygraph/quanter/qat.rst).
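
As a rough sketch of how the YAML fields above map onto PaddleSlim's dygraph interface, the example below builds a `QAT` quanter from an equivalent config dict and applies it to a small stand-in network. This assumes a PaddleSlim 2.x installation; the toy `paddle.nn.Sequential` model is a placeholder for the real GPT network, and the linked qat.rst remains the authoritative reference:

```python
import paddle
from paddleslim import QAT

# Quantization config mirroring the YAML fields documented above.
quant_config = {
    'weight_quantize_type': 'abs_max',
    'activation_quantize_type': 'moving_average_abs_max',
    'weight_preprocess_type': None,
    'activation_preprocess_type': 'PACT',
    'weight_bits': 8,
    'activation_bits': 8,
    'quantizable_layer_type': ['Linear'],
}

# Toy stand-in for the real network; PaddleFleetX builds this from the Model config.
model = paddle.nn.Sequential(
    paddle.nn.Linear(1024, 4096),
    paddle.nn.GELU(),
    paddle.nn.Linear(4096, 1024),
)

quanter = QAT(config=quant_config)
quanter.quantize(model)  # insert fake-quant ops; then run the usual training loop

# After training, export an inference model with the quantization ops folded in.
quanter.save_quantized_model(
    model, './qat_out/model',
    input_spec=[paddle.static.InputSpec(shape=[None, 1024], dtype='float32')])
```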

### **Structured Pruning Parameters**

```yaml
Compress:
  pretrained:
  Prune:
    enable: True
    criterion: l1_norm
    ratio: 0.125
```

Parameter descriptions:

| **Parameter** | **Description** |
|-----------------------------|-----------------------------------------|
| pretrained | Directory the pretrained model is loaded from |
| enable | Whether to enable structured-pruning training |
| criterion | Importance metric for weights; currently supports l1_norm and l2_norm |
| ratio | Fraction of weights to prune; for example, 0.125 means 12.5% of the weights are pruned away |
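
For intuition about `criterion: l1_norm` and `ratio: 0.125`, the illustrative sketch below (plain numpy, not the PaddleSlim implementation) scores the output channels of a single weight matrix by L1 norm and removes the lowest-scoring 12.5%:

```python
import numpy as np

# Illustrative sketch: rank output channels of a Linear weight by L1 norm
# and keep the (1 - ratio) most important ones, as structured pruning does.
def l1_channel_mask(w, ratio=0.125):
    scores = np.abs(w).sum(axis=0)           # L1 norm of each output channel
    n_prune = int(ratio * w.shape[1])        # number of channels to remove
    pruned = np.argsort(scores)[:n_prune]    # lowest-importance channels
    mask = np.ones(w.shape[1], dtype=bool)
    mask[pruned] = False
    return mask

w = np.random.randn(1024, 4096).astype(np.float32)
mask = l1_channel_mask(w, ratio=0.125)
w_pruned = w[:, mask]                        # 12.5% of columns removed
print(w.shape, '->', w_pruned.shape)         # (1024, 4096) -> (1024, 3584)
```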
29 changes: 29 additions & 0 deletions ppfleetx/configs/nlp/gpt/eval_pruned_gpt_345M_single_card.yaml
@@ -0,0 +1,29 @@
_base_: ./pretrain_gpt_345M_single_card.yaml


Engine:
  save_load:
    ckpt_dir: output/epoch_0_step_1000/


Model:
  module: GPTEvalModule
  hidden_dropout_prob: 0.1
  attention_probs_dropout_prob: 0.1


Compress:
  Prune:
    enable: True
    cal_sens: False
    criterion: l1_norm
    ratio: 0.125


Offline_Eval:
  eval_path: ./lambada_test.jsonl
  cloze_eval: True
  overlapping_eval: 32
  batch_size: 8
  max_seq_len: 1024
  logging_freq: 10
@@ -0,0 +1,18 @@
_base_: ./pretrain_gpt_345M_single_card.yaml

Model:
  module: GPTGenerationModule

Prune:
  enable: True
  criterion: l1_norm
  ratio: 0.125

Generation:
  top_k: 50
  top_p: 0.75
  temperature: 1.0
  min_dec_len: 1
  max_dec_len: 200
  num_return_sequences: 1
  decode_strategy: "sampling"
@@ -0,0 +1,57 @@
_base_: ./generation_gpt_345M_single_card.yaml


Inference:
  model_dir: output/epoch_7500
  mp_degree: 1

Engine:
  save_load:
    ckpt_dir: output/epoch_7500

Prune:
  enable: False
  criterion: l1_norm
  ratio: 0.125

Model:
  vocab_size: 50304
  hidden_size: 3584
  num_layers: 32
  num_attention_heads: 28
  ffn_hidden_size: 14336
  hidden_dropout_prob: 0.1
  attention_probs_dropout_prob: 0.1
  max_position_embeddings: 1024
  type_vocab_size: 16
  initializer_range: 0.02
  use_recompute: False
  recompute_granularity:
  no_recompute_layers:

Distributed:
  dp_degree:
  mp_degree: 1
  pp_degree: 1
  sharding:
    sharding_degree: 1
    sharding_stage: 1
    sharding_offload: False
    reduce_overlap: False
    broadcast_overlap: False

Data:
  Test:
    dataset:
      name: GPTDataset
      input_dir: ./data/
      split: [949, 50, 1]
      max_seq_len: 1024
    sampler:
      name: GPTBatchSampler
      shuffle: False
      drop_last: True
    loader:
      num_workers: 1
      return_list: False
      collate_fn: gpt_collate_fn
78 changes: 78 additions & 0 deletions ppfleetx/configs/nlp/gpt/prune_gpt_345M_single_card.yaml
@@ -0,0 +1,78 @@
_base_: ./pretrain_gpt_base.yaml

Global:
  global_batch_size:
  local_batch_size: 8
  micro_batch_size: 8


Model:
  vocab_size: 50304
  hidden_size: 1024
  num_layers: 24
  num_attention_heads: 16
  ffn_hidden_size: 4096
  hidden_dropout_prob: 0.0
  attention_probs_dropout_prob: 0.0
  max_position_embeddings: 1024
  type_vocab_size: 16
  initializer_range: 0.02
  use_recompute: False
  recompute_granularity:
  no_recompute_layers:


Data:
  Train:
    dataset:
      name: GPTDataset
      input_dir: ./data/
      split: [949, 50, 1]
      max_seq_len: 1024
    sampler:
      name: GPTBatchSampler
      shuffle: False
      drop_last: True
    loader:
      num_workers: 1
      return_list: False
      collate_fn: gpt_collate_fn

  Eval:
    dataset:
      name: GPTDataset
      input_dir: ./data/

Distributed:
  dp_degree: 1
  mp_degree: 1
  pp_degree: 1
  sharding:
    sharding_degree: 1
    sharding_stage: 1
    sharding_offload: False
    comm_overlap: False

Engine:
  max_steps: 100000
  save_load:
    save_steps: 1000
    save_epoch: 1
    output_dir: ./output
    ckpt_dir: ./PaddleFleetX_GPT_345M_220826

Optimizer:
  weight_decay: 0.0
  lr:
    decay_steps: 90000
    warmup_rate: 0.00
    max_lr: 2.5e-5
    min_lr: 5.0e-6

Compress:
  pretrained: ./PaddleFleetX_GPT_345M_220826
  Prune:
    enable: True
    cal_sens: False
    criterion: l1_norm
    ratio: 0.125
19 changes: 10 additions & 9 deletions ppfleetx/configs/nlp/gpt/qat_gpt_345M_mp8.yaml
@@ -34,12 +34,13 @@ Distributed:
    broadcast_overlap: False


-Quantization:
-  enable: True
-  pretrained:
-  weight_quantize_type: 'abs_max'
-  activation_quantize_type: 'moving_average_abs_max'
-  weight_bits: 8
-  activation_bits: 8
-  quantizable_layer_type: ['Conv2D', 'Linear', 'Conv2DTranspose', 'ColumnParallelLinear', 'RowParallelLinear']
-  onnx_format: True
+Compress:
+  pretrained:
+  Quantization:
+    enable: True
+    weight_quantize_type: 'abs_max'
+    activation_quantize_type: 'moving_average_abs_max'
+    weight_bits: 8
+    activation_bits: 8
+    quantizable_layer_type: ['Conv2D', 'Linear', 'Conv2DTranspose', 'ColumnParallelLinear', 'RowParallelLinear']
+    onnx_format: True
19 changes: 10 additions & 9 deletions ppfleetx/configs/nlp/gpt/qat_gpt_6.7B_sharding16.yaml
@@ -43,12 +43,13 @@ Optimizer:
  tensor_fusion: True


-Quantization:
-  enable: True
-  pretrained:
-  weight_quantize_type: 'abs_max'
-  activation_quantize_type: 'moving_average_abs_max'
-  weight_bits: 8
-  activation_bits: 8
-  quantizable_layer_type: ['Conv2D', 'Linear', 'Conv2DTranspose', 'ColumnParallelLinear', 'RowParallelLinear']
-  onnx_format: True
+Compress:
+  pretrained:
+  Quantization:
+    enable: True
+    weight_quantize_type: 'abs_max'
+    activation_quantize_type: 'moving_average_abs_max'
+    weight_bits: 8
+    activation_bits: 8
+    quantizable_layer_type: ['Conv2D', 'Linear', 'Conv2DTranspose', 'ColumnParallelLinear', 'RowParallelLinear']
+    onnx_format: True
@@ -123,11 +123,12 @@ Data:
      use_shared_memory: True


-Quantization:
-  enable: True
-  weight_quantize_type: 'abs_max'
-  activation_quantize_type: 'moving_average_abs_max'
-  activation_preprocess_type: 'PACT'
-  weight_bits: 8
-  activation_bits: 8
-  onnx_format: True
+Compress:
+  Quantization:
+    enable: True
+    weight_quantize_type: 'abs_max'
+    activation_quantize_type: 'moving_average_abs_max'
+    activation_preprocess_type: 'PACT'
+    weight_bits: 8
+    activation_bits: 8
+    onnx_format: True
@@ -122,11 +122,12 @@ Data:
      num_workers: 8
      use_shared_memory: True

-Quantization:
-  enable: True
-  weight_quantize_type: 'abs_max'
-  activation_quantize_type: 'moving_average_abs_max'
-  activation_preprocess_type: 'PACT'
-  weight_bits: 8
-  activation_bits: 8
-  onnx_format: True
+Compress:
+  Quantization:
+    enable: True
+    weight_quantize_type: 'abs_max'
+    activation_quantize_type: 'moving_average_abs_max'
+    activation_preprocess_type: 'PACT'
+    weight_bits: 8
+    activation_bits: 8
+    onnx_format: True