Commit 9779454

Authored by njzjz, coderabbitai[bot], and wanghan-iapcm
docs: improve docs, examples, and error messages (#4338)
1. Refactor type embedding docs;
2. Refactor model compression docs;
3. Refactor DPA-1 docs;
4. Add error messages when type embedding is set in other backends;
5. Bump `sel` in the DPA-2 example.

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->

## Summary by CodeRabbit

## Release Notes

- **New Features**
  - Introduced enhanced error handling for model parameters across various backends, clarifying the usage of `type_embedding`.
  - Added sections on "Type embedding" and "Model compression" in multiple documentation files to improve user guidance.
  - New input file `input_torch_compressible.json` added for the DPA-2 model, providing a compressible configuration.
- **Bug Fixes**
  - Updated serialization and deserialization methods to prevent unsupported configurations in model handling.
- **Documentation**
  - Revised documentation to clarify type embedding support and model compression conditions.
  - Removed references to outdated system formats in the documentation.
- **Configuration Updates**
  - Increased selection parameters for three-body interactions and representation transformers in example configuration files.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Jinzhe Zeng <[email protected]>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: Han Wang <[email protected]>
1 parent a83c98d commit 9779454

22 files changed: +236 -35 lines changed

deepmd/dpmodel/model/model.py

Lines changed: 4 additions & 0 deletions
@@ -36,6 +36,10 @@ def get_standard_model(data: dict) -> EnergyModel:
     data : dict
         The data to construct the model.
     """
+    if "type_embedding" in data:
+        raise ValueError(
+            "In the DP backend, type_embedding is not at the model level, but within the descriptor. See type embedding documentation for details."
+        )
     data["descriptor"]["type_map"] = data["type_map"]
     data["descriptor"]["ntypes"] = len(data["type_map"])
     fitting_type = data["fitting_net"].pop("type")

deepmd/jax/model/model.py

Lines changed: 4 additions & 0 deletions
@@ -35,6 +35,10 @@ def get_standard_model(data: dict):
         The data to construct the model.
     """
     data = deepcopy(data)
+    if "type_embedding" in data:
+        raise ValueError(
+            "In the JAX backend, type_embedding is not at the model level, but within the descriptor. See type embedding documentation for details."
+        )
     descriptor_type = data["descriptor"].pop("type")
     data["descriptor"]["type_map"] = data["type_map"]
     data["descriptor"]["ntypes"] = len(data["type_map"])

deepmd/pt/model/model/__init__.py

Lines changed: 4 additions & 0 deletions
@@ -76,6 +76,10 @@


 def _get_standard_model_components(model_params, ntypes):
+    if "type_embedding" in model_params:
+        raise ValueError(
+            "In the PyTorch backend, type_embedding is not at the model level, but within the descriptor. See type embedding documentation for details."
+        )
     # descriptor
     model_params["descriptor"]["ntypes"] = ntypes
     model_params["descriptor"]["type_map"] = copy.deepcopy(model_params["type_map"])
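The guards added to the DP, JAX, and PyTorch backends above share one pattern: reject a model-level `type_embedding` key before the descriptor is built. A minimal standalone sketch of that pattern follows. The helper name `reject_model_level_type_embedding`, its `backend` argument, and the sample configs are hypothetical and exist only for illustration; they are not part of DeePMD-kit.

```python
def reject_model_level_type_embedding(model_params: dict, backend: str) -> None:
    """Raise if ``type_embedding`` appears at the model level.

    Hypothetical helper that mirrors the guard in this commit.
    """
    if "type_embedding" in model_params:
        raise ValueError(
            f"In the {backend} backend, type_embedding is not at the model level, "
            "but within the descriptor. See type embedding documentation for details."
        )


# Made-up configs for illustration only.
ok_config = {"type_map": ["O", "H"], "descriptor": {"type": "se_e2_a"}}
bad_config = {**ok_config, "type_embedding": {"neuron": [8]}}

# A config without the model-level key passes silently.
reject_model_level_type_embedding(ok_config, "PyTorch")

# A config with the model-level key is rejected.
try:
    reject_model_level_type_embedding(bad_config, "PyTorch")
except ValueError as err:
    print(f"rejected: {err}")
```

Failing early with a message that names the backend is presumably meant to catch TensorFlow-style inputs that are reused unchanged with another backend.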

deepmd/tf/model/model.py

Lines changed: 4 additions & 0 deletions
@@ -842,6 +842,10 @@ def serialize(self, suffix: str = "") -> dict:
             Name suffix to identify this descriptor
         """
         if self.typeebd is not None:
+            if not self.descrpt.explicit_ntypes:
+                raise RuntimeError(
+                    "type embedding for descriptors without mixed types is not supported in other backends"
+                )
             self.descrpt.type_embedding = self.typeebd
             self.fitting.tebd_dim = self.typeebd.neuron[-1]
         if self.spin is not None:
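The TensorFlow change above refuses to serialize a model that has a model-level type embedding together with a descriptor whose `explicit_ntypes` is false. The sketch below isolates that condition with stand-in objects. `StubDescriptor` and `check_serializable` are hypothetical names, and reading `explicit_ntypes` as "the descriptor uses mixed types" is an interpretation taken from the error message, not from the DeePMD-kit source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StubDescriptor:
    # Stand-in for a TF descriptor; holds only the attribute the guard reads.
    explicit_ntypes: bool


def check_serializable(typeebd: Optional[object], descrpt: StubDescriptor) -> None:
    """Hypothetical mirror of the guard added to ``serialize``."""
    if typeebd is not None and not descrpt.explicit_ntypes:
        raise RuntimeError(
            "type embedding for descriptors without mixed types "
            "is not supported in other backends"
        )


# No type embedding: always fine.
check_serializable(None, StubDescriptor(explicit_ntypes=False))

# Type embedding with a descriptor that sets explicit_ntypes: fine.
check_serializable(object(), StubDescriptor(explicit_ntypes=True))

# Type embedding with a descriptor that does not: rejected.
try:
    check_serializable(object(), StubDescriptor(explicit_ntypes=False))
except RuntimeError as err:
    print(f"cannot serialize: {err}")
```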

deepmd/utils/argcheck.py

Lines changed: 1 addition & 1 deletion
@@ -1772,7 +1772,7 @@ def model_args(exclude_hybrid=False):
     doc_data_stat_nbatch = "The model determines the normalization from the statistics of the data. This key specifies the number of `frames` in each `system` used for statistics."
     doc_data_stat_protect = "Protect parameter for atomic energy regression."
     doc_data_bias_nsample = "The number of training samples in a system to compute and change the energy bias."
-    doc_type_embedding = "The type embedding."
+    doc_type_embedding = "The type embedding. In other backends, the type embedding is already included in the descriptor."
     doc_modifier = "The modifier of model output."
     doc_use_srtab = "The table for the short-range pairwise interaction added on top of DP. The table is a text data file with (N_t + 1) * N_t / 2 + 1 columes. The first colume is the distance between atoms. The second to the last columes are energies for pairs of certain types. For example we have two atom types, 0 and 1. The columes from 2nd to 4th are for 0-0, 0-1 and 1-1 correspondingly."
     doc_smin_alpha = "The short-range tabulated interaction will be switched according to the distance of the nearest neighbor. This distance is calculated by softmin. This parameter is the decaying parameter in the softmin. It is only required when `use_srtab` is provided."

doc/data/system.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 # System

-DeePMD-kit takes a **system** as the data structure. A snapshot of a system is called a **frame**. A system may contain multiple frames with the same atom types and numbers, i.e. the same formula (like `H2O`). To contains data with different formulas, one usually needs to divide data into multiple systems, which may sometimes result in sparse-frame systems. See a [new system format](../model/train-se-atten.md#data-format) to further combine different systems with the same atom numbers, when training with descriptor `se_atten`.
+DeePMD-kit takes a **system** as the data structure. A snapshot of a system is called a **frame**. A system may contain multiple frames with the same atom types and numbers, i.e. the same formula (like `H2O`). To contains data with different formulas, one usually needs to divide data into multiple systems, which may sometimes result in sparse-frame systems.

 A system should contain system properties, input frame properties, and labeled frame properties. The system property contains the following property:

doc/freeze/compress.md

Lines changed: 2 additions & 3 deletions
@@ -112,9 +112,8 @@ The model compression interface requires the version of DeePMD-kit used in the o

 **Acceptable descriptor type**

-Descriptors with `se_e2_a`, `se_e3`, `se_e2_r` and `se_atten_v2` types are supported by the model compression feature. `Hybrid` mixed with the above descriptors is also supported.
-
-Notice: Model compression for the `se_atten_v2` descriptor is exclusively designed for models with the training parameter {ref}`attn_layer <model[standard]/descriptor[se_atten_v2]/attn_layer>` set to 0.
+Not any descriptor supports model compression.
+See the documentation of a specific descriptor to see whether it supports model compression.

 **Available activation functions for descriptor:**

doc/model/dpa2.md

Lines changed: 10 additions & 0 deletions
@@ -31,3 +31,13 @@ See the example `examples/water/lmp/jax_dpa2.lammps`.
 ## Data format

 DPA-2 supports both the [standard data format](../data/system.md) and the [mixed type data format](../data/system.md#mixed-type).
+
+## Type embedding
+
+Type embedding is within this descriptor with the {ref}`tebd_dim <model[standard]/descriptor[dpa2]/tebd_dim>` argument.
+
+## Model compression
+
+Model compression is supported when {ref}`repinit/tebd_input_mode <model[standard]/descriptor[dpa2]/repinit/tebd_input_mode>` is `strip`, but only the `repinit` part is compressed.
+An example is given in `examples/water/dpa2/input_torch_compressible.json`.
+The performance improvement will be limited if other parts are more expensive.
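The compression condition stated for DPA-2 can be checked against a descriptor config before training. The sketch below is a rough illustration: `dpa2_repinit_compressible` is a hypothetical helper, the sample dicts are made up, and treating a missing `tebd_input_mode` key as "not `strip`" is an assumption of this sketch rather than a statement about the default value.

```python
def dpa2_repinit_compressible(descriptor: dict) -> bool:
    """Return True if repinit/tebd_input_mode is explicitly set to "strip".

    Hypothetical helper; a missing key is treated as not compressible,
    which is an assumption of this sketch.
    """
    repinit = descriptor.get("repinit", {})
    return repinit.get("tebd_input_mode") == "strip"


# Made-up descriptor sections for illustration only.
compressible = {"type": "dpa2", "repinit": {"tebd_input_mode": "strip"}}
not_compressible = {"type": "dpa2", "repinit": {"tebd_input_mode": "concat"}}

print(dpa2_repinit_compressible(compressible))      # True
print(dpa2_repinit_compressible(not_compressible))  # False
```

Even when the check passes, only the `repinit` part is compressed, so the speed-up depends on how much of the total cost that part accounts for.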

doc/model/train-hybrid.md

Lines changed: 10 additions & 0 deletions
@@ -48,3 +48,13 @@ A complete training input script of this example can be found in the directory
 ```bash
 $deepmd_source_dir/examples/water/hybrid/input.json
 ```
+
+## Type embedding
+
+Type embedding is different between the TensorFlow backend and other backends.
+In the TensorFlow backend, all descriptors share the same descriptor that defined in the model level.
+In other backends, each descriptor has its own type embedding and their parameters may be different.
+
+## Model compression
+
+Model compression is supported if all sub-descriptors support model compression.
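The rule for hybrid descriptors is a conjunction over the sub-descriptors. A minimal sketch, assuming the sub-descriptors sit under a `list` key of the hybrid descriptor section, follows. `hybrid_supports_compression` and the toy predicate `toy_supports` are hypothetical, and the predicate's answers are placeholders rather than the real per-descriptor rules.

```python
from typing import Callable


def hybrid_supports_compression(
    descriptor: dict, supports: Callable[[dict], bool]
) -> bool:
    """True only if every sub-descriptor passes the caller-supplied check.

    Hypothetical helper; assumes sub-descriptors are stored under "list".
    """
    return all(supports(sub) for sub in descriptor["list"])


def toy_supports(sub: dict) -> bool:
    # Placeholder rule for illustration; consult each descriptor's
    # documentation for the real condition.
    return sub.get("compressible", False)


all_ok = {"type": "hybrid", "list": [{"compressible": True}, {"compressible": True}]}
one_bad = {"type": "hybrid", "list": [{"compressible": True}, {"compressible": False}]}

print(hybrid_supports_compression(all_ok, toy_supports))   # True
print(hybrid_supports_compression(one_bad, toy_supports))  # False
```

Passing the per-descriptor check in as a callable keeps the hybrid rule separate from the individual descriptor conditions, which are documented per descriptor.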

doc/model/train-se-a-mask.md

Lines changed: 8 additions & 0 deletions
@@ -84,3 +84,11 @@ And the `loss` section in the training input script should be set as follows.
     "_comment": " that's all"
 }
 ```
+
+## Type embedding
+
+Same as [`se_e2_a`](./train-se-e2-a.md).
+
+## Model compression
+
+Same as [`se_e2_a`](./train-se-e2-a.md).
