Description
Current behaviour
Recently, the license string checker helper script was added to CI procedures, see #3699.
The script checks the license.attribution field values and makes sure that they match a controlled vocabulary of allowed values.
This works very well. However, nothing here is specific to licenses, so the script could easily be generalised to check other fields whose values should come from a controlled vocabulary, such as collision type, MC categories, etc.
Possible improvements
Let's generalise the license checker script so that it can check more metadata fields in a similar way.
For example, one could introduce a configuration file listing fields and their desired values to check against, such as:
license:
  attribution:
    - Apache-2.0
    - BSD-3-Clause
    - CC0-1.0
    - GPL-3.0-only
    - MIT
collision_information:
  type:
    - e+e-
    - pp
    - pPb
    - PbPb
experiment:
  - ALICE
  - ATLAS
  - CMS
  - DELPHI
  - LHCb
  - OPERA
  - PHENIX
  - TOTEM
The content curators could define all the controlled vocabularies of interest, and the script would check all the field and subfield values against them and report any mismatches.
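As a rough illustration of the idea, here is a minimal Python sketch of such a generalised checker. It is not the existing license checker script: the names VOCABULARIES and check_record, the sample record, and the choice to skip absent fields are all assumptions made for this example. The vocabularies are written as a Python dict mirroring the configuration above; loading them from a YAML file is left out.

```python
from typing import Any, Iterator

# Hypothetical controlled vocabularies, mirroring the example configuration.
# In a real script this mapping would be loaded from the configuration file.
VOCABULARIES: dict[str, Any] = {
    "license": {
        "attribution": [
            "Apache-2.0", "BSD-3-Clause", "CC0-1.0", "GPL-3.0-only", "MIT",
        ],
    },
    "collision_information": {
        "type": ["e+e-", "pp", "pPb", "PbPb"],
    },
    "experiment": [
        "ALICE", "ATLAS", "CMS", "DELPHI", "LHCb", "OPERA", "PHENIX", "TOTEM",
    ],
}


def check_record(
    record: dict[str, Any], vocabularies: dict[str, Any], path: str = ""
) -> Iterator[str]:
    """Yield one message per value that is not in its controlled vocabulary."""
    for field, allowed in vocabularies.items():
        if field not in record:
            continue  # assumption: absent fields are not reported as problems
        field_path = f"{path}.{field}" if path else field
        value = record[field]
        if isinstance(allowed, dict):
            # Nested vocabulary: descend into the subfields.
            if isinstance(value, dict):
                yield from check_record(value, allowed, field_path)
            else:
                yield f"{field_path}: expected subfields, got {value!r}"
        else:
            # Accept both a single value and a list of values.
            items = value if isinstance(value, list) else [value]
            for item in items:
                if item not in allowed:
                    yield f"{field_path}: {item!r} is not an allowed value"


# Hypothetical record with two deliberate mistakes.
record = {
    "license": {"attribution": "GPL-3.0"},
    "collision_information": {"type": "pp"},
    "experiment": ["CMS", "Atlas"],
}
problems = list(check_record(record, VOCABULARIES))
for problem in problems:
    print(problem)
```

For the sample record this reports the unknown license string and the misspelt experiment name, each prefixed with the dotted field path. Adding a new controlled vocabulary would then only require a new entry in the configuration, with no change to the checking code.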
Notes
The above YAML configuration file example is for illustration only; the actual implementation could use any other technique, for example JSON Schema with embedded enum types:
"enum": [
  "Apache-2.0",
  "BSD-3-Clause",
  "CC0-1.0",
  "GPL-3.0-only",
  "MIT"
]
Advantage: fast JSON Schema validators already exist, so there is nothing to write on our end. Disadvantage: we would have to update the metadata schema version for each newly allowed controlled vocabulary value, leading to a bit of a version jungle.