This utility enables the automated creation and execution of COSMO_CLM^2 simulations. It is written in Python 3, with some automatically generated bash scripts for interaction with scheduling systems, and is designed to be extensible to different machines (so far Piz Daint at CSCS and Mistral at DKRZ are supported).
The utility provides essentially one command, cc2_create_case, which builds a case in a dedicated directory based on the user's input, provided either through command line arguments or an xml setup file. During case creation, the following steps occur:
- all necessary files (input data files, executables and potentially modified namelists) are transferred to the case directory, or placed in a subdirectory according to namelist specifications when needed. Based on the user's specifications, either all of the input data or only the part needed for the first chunk of the simulation (for long simulations needing restarts) is transferred.
- bash job scripts are created for running, transferring input data and archiving output files.
- case properties not stored in namelists (e.g. length of chunks, archiving options, run/transfer status, etc.) are written to a local xml configuration file (not to be confused with the user's xml setup file).
The first run job is then submitted; run, transfer and archive jobs are subsequently orchestrated as shown in the following chart.
- When installing on Piz Daint, first load the Python module:
    module load cray-python
- Install COSMO_CLM2_tools:
    pip install --user git+https://github.com/COSMO-RESM/COSMO_CLM2_tools.git
  Use --upgrade for later updates:
    pip install --user --upgrade git+https://github.com/COSMO-RESM/COSMO_CLM2_tools.git
- Make sure ~/.local/bin is in your PATH.
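If it is not, a line like the following in your shell profile adds it (a minimal sketch for bash; adapt to your shell):
    export PATH="$HOME/.local/bin:$PATH"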
In this section we explain in more detail how to use the utility. As mentioned in the description section, it mostly provides the cc2_create_case command to the user. cc2_control_case is also provided but is mostly useful for the utility itself. Note that cc2_compile_clm is also provided to flexibly compile the Community Land Model. It is independent from running a COSMO_CLM^2 simulation and is described later.
The first thing to describe is how user specifications are provided to the cc2_create_case command. Almost all options can be passed either as command line arguments or read from an xml setup file. The latter is given by the -s, --setup_file option. The overall idea is that the user can store the most "stable" options in the setup file and try other options by providing them directly on the command line. It is up to the user to make use of this flexibility, keeping in mind that *any option provided through the command line has precedence over its setup file counterpart*. As exemplified below, options are grouped under different nodes in the xml setup file:
- the machine node contains only the machine name. This is subject to change and might eventually move to the main node.
- the main node contains machine-independent options.
- machine-specific options are stored under the node named after the machine. Note that options common to several machines, like scheduler-related options, are also stored there, as the default value might vary from machine to machine.
<?xml version="1.0" encoding="utf-8"?>
<setup>
<machine></machine>
<main>
<name>COSMO_CLM2</name>
<install_dir></install_dir>
<archive_dir></archive_dir>
<cosmo_only></cosmo_only>
<start_date></start_date>
<end_date></end_date>
<run_length></run_length>
<cos_in>./COSMO_input</cos_in>
<cos_nml>./COSMO_nml</cos_nml>
<cos_exe>./cosmo</cos_exe>
<cesm_in>./CESM_input</cesm_in>
<cesm_nml>./CESM_nml</cesm_nml>
<cesm_exe>./cesm.exe</cesm_exe>
<oas_in>./OASIS_input</oas_in>
<oas_nml>./OASIS_nml</oas_nml>
<ncosx type="int"></ncosx>
<ncosy type="int"></ncosy>
<ncosio type="int"></ncosio>
<ncesm type="int"></ncesm>
<gpu_mode type="py_eval">False</gpu_mode>
<dummy_day type="py_eval">False</dummy_day>
<transfer_all type="py_eval">False</transfer_all>
<input_type>file</input_type>
</main>
<daint>
<account></account>
<partition></partition>
<modules_opt>switch</modules_opt>
<pgi_version></pgi_version>
<shebang>#!/bin/bash</shebang>
<run_time>24:00:00</run_time>
<transfer_time>02:00:00</transfer_time>
<archive_time>03:00:00</archive_time>
</daint>
<mistral>
<account></account>
<partition></partition>
<run_time>10:00:00</run_time>
</mistral>
</setup>
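For example, one might keep the stable options above in a setup file and override a few of them on the command line (hypothetical file name and values):
    cc2_create_case -s setup.xml --name TEST_RUN --run_length 1m
Here --name and --run_length take precedence over their setup file counterparts.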
The command line help, cc2_create_case --help, also displays options following a similar structure.
Here we describe all options in detail. An option --option_bla on the command line has the node <option_bla>value</option_bla> as counterpart in the xml setup file. In the latter, if the option value has to be interpreted as something other than a string, the type must be provided as an attribute of the option node (see the example from the previous section). It can be either "py_eval", for directly evaluating the string with Python, or any valid Python type.
For boolean options you will see "type: bool, using anything Python can parse as a boolean" in the command line help, instead of an option that doesn't require an argument. So for instance you might have to specify --gpu_mode 1 or --gpu_mode bla instead of the more usual --gpu_mode alone. In the xml file, you can specify booleans in two ways: either with type="py_eval" as attribute and True or False as the value, or with type="bool" and anything Python can parse as a boolean as the value. This is due to the internals of the code and how defaults are implemented.
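For instance, both of the following nodes should enable gpu mode (a minimal sketch based on the rules above):
    <gpu_mode type="py_eval">True</gpu_mode>
    <gpu_mode type="bool">1</gpu_mode>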
-s, --setup_file
    path to the xml setup file. Beware that all relative paths provided in the setup file or directly on the command line are relative to where the cc2_create_case command is executed.
--machine
    specify the machine name. It has to be given either on the command line or in the setup file.
--name
    case name. The working directory is named after the case name. It also affects CESM output file names.
--install_dir
    the case working directory is created as INSTALL_DIR/CASE_NAME.
--start_date
    simulation start date, formatted as 'YYYY-MM-DD-HH'.
--end_date
    simulation end date, formatted as 'YYYY-MM-DD-HH'.
--run_length
    simulation length if end_date is not specified, or run length between restarts otherwise. It can be given in one of the following forms: 'N1yN2m', 'N1y', 'N2m' or 'N3d', where N1, N2 and N3 are arbitrary integers (N2 > 12 is possible) and 'y', 'm' and 'd' stand for years, months and days respectively.
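As an illustration (hypothetical values), --run_length 1y6m requests chunks of 18 months, while --run_length 10d requests chunks of 10 days.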
So far the following options have default values, but these defaults might disappear in favor of an error being thrown when neither the setup file nor the command line arguments provide them.
--cos_in
    COSMO input files directory (default: ./COSMO_input)
--cos_nml
    COSMO namelists directory (default: ./COSMO_nml)
--cos_exe
    path to the COSMO executable (default: ./cosmo)
--cesm_in
    CESM input files directory (default: ./CESM_input)
--cesm_nml
    CESM namelists directory (default: ./CESM_nml)
--cesm_exe
    path to the CESM executable (default: ./cesm.exe)
--oas_in
    OASIS input files directory (default: ./OASIS_input)
--oas_nml
    OASIS namelists directory (default: ./OASIS_nml). WARNING: it must contain a namcouple_tmpl file in which there has to be a _runtime_ placeholder so that the tool can insert the right run time at each restart.
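A minimal sketch of the relevant part of such a namcouple_tmpl, assuming the standard OASIS $RUNTIME entry (the tool replaces _runtime_ with the chunk length in seconds):
    $RUNTIME
      _runtime_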
--ncosx
    number of COSMO subdomains along the 'x-axis' (type: int, default: from the INPUT_ORG namelist)
--ncosy
    number of COSMO subdomains along the 'y-axis' (type: int, default: from the INPUT_ORG namelist)
--ncosio
    number of COSMO tasks dedicated to i/o work, not tested (type: int, default: from the INPUT_ORG namelist)
--ncesm
    number of CESM subdomains (type: int, default: from the drv_in namelist)
The user has to make sure that the total number of tasks, ncosx * ncosy + ncosio + ncesm, adds up to an integer multiple of the number of tasks per node on the machine. When COSMO is run in gpu mode, ncesm is ignored and all available tasks are assigned to CESM, i.e. n_nodes * (n_tasks_per_node - 1).
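For example, on a hypothetical machine with 12 tasks per node, ncosx=4, ncosy=5, ncosio=0 and ncesm=4 gives 4 * 5 + 0 + 4 = 24 tasks, i.e. exactly 2 nodes.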
--cosmo_only
    run only COSMO with the built-in soil model TERRA (type: bool, using anything Python can parse as a boolean, default: False). Warning: provide a COSMO executable compiled accordingly.
--start_mode
    specify the type of start requested (choices: 'startup', 'continue', 'restart', default: 'startup'). See the example command after this list.
    - 'startup' is for simulations with a classical initial state.
    - 'continue' is for continuing an existing simulation. Warning: the original and continued cases need to have the same name.
    - 'restart' is for restarting from another case.
    Use both 'continue' and 'restart' in conjunction with the restart_date, cos_rst and cesm_rst options. Do not modify the start_date option: it needs to correspond to the original case you're continuing/restarting; use restart_date to specify when to continue/restart.
--restart_date
    restart/continue date, formatted as 'YYYY-MM-DD-HH'
--cos_rst
    path to the COSMO restart file. Compressed restart files with extension '.gz' or '.bz2' are accepted.
--cesm_rst
    path to the directory containing CESM restart files. Archives, compressed or not, with extension '.tar', '.tgz', '.tar.gz', '.tbz' or '.tar.bz2' are accepted.
--gpu_mode
    run COSMO on gpu (type: bool, using anything Python can parse as a boolean, default: False). Warning: provide a COSMO executable compiled accordingly.
--dummy_day
    extend the last chunk by 1 day in order to get the last COSMO output (type: bool, using anything Python can parse as a boolean, default: True). Warning: make sure the corresponding input files are available in the COSMO input directory.
--gen_oasis
    generate the OASIS auxiliary files. The simulation will crash after generating these files. This is normal, just transfer the new files back to where you need them. This is a command line only option; it cannot be set in the setup file. Usually this option should be used twice when a new domain is used: 1) create mask_clm.nc to reshape the mask in the CESM input files, which requires compiling both executables with IOASISDEBUGLVL = 2; 2) create the final OASIS files (IOASISDEBUGLVL = 0).
--no_submit
    do not submit the job after case install. This is useful for debugging or checking, but also if one needs to modify the run, transfer or archive job scripts. The case can then be submitted by hand from the case directory. This is a command line only option; it cannot be set in the setup file.
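For instance, continuing an existing case from a given date might look like this (a sketch with hypothetical paths):
    cc2_create_case -s setup.xml --start_mode continue --restart_date 2005-01-01-00 \
        --cos_rst ./restarts/cosmo_restart.gz --cesm_rst ./restarts/cesm_restarts.tar.gz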
--transfer_all
    transfer all model input files at once before starting the simulation; if not, only transfer the data needed to run the first chunk (type: bool, using anything Python can parse as a boolean, default: True). This default will most probably be switched to False in the near future.
--input_type
    either 'file' or 'symlink'. In the second case, only a link to the original input file is created in the working directory instead of an actual file. Warning: the file system where the original input files are stored has to be accessible from the compute nodes. Use in conjunction with --transfer_all=1.
--archive_dir
    directory where output and restart files are archived (default: None). If not provided either on the command line or in the setup file, no archiving is performed.
--archive_rm
    remove the original output files from the case directory when archiving (type: bool, using anything Python can parse as a boolean, default: False). Note that this option has no effect on the archiving of restart files, which are by definition needed by the potential next run.
--archive_compression
    specify which compression algorithm is used before transferring the archive (available choices: 'none', 'gzip' and 'bzip2', default: 'none'). For simulations with heavy output, you might be better off compressing the archived output data yourself.
--archive_cesm
    whether or not to archive the CESM output (type: bool, using anything Python can parse as a boolean, default: True). The idea is that a CESM output stream might contain more than one time slice, so depending on how this is specified, the output file might be needed upon restart.
Options for the scheduling system. In the xml setup file, these have to be put under the machine-specific node.
--run_time
    reserved time for the run job (default: '24:00:00' on daint, '08:00:00' on mistral)
--transfer_time
    reserved time for the transfer job (default: '02:00:00')
--archive_time
    reserved time for the archive job (default: '03:00:00')
Options specific to the SLURM scheduling system. In the xml setup file, these have to be put under the machine-specific node.
--account
    account to use for submitted job scripts (default: inferred from $PROJECT on daint, None on mistral)
--partition
    queue to which the run job gets submitted, mostly useful for debugging (default: None)
--modules_opt
    option for loading modules at run time: either 'switch', 'none' or 'purge' (default: 'switch')
--pgi_version
    specify the pgi compiler version at run time (default: None)
--shebang
    run job script shebang (default: '#!/bin/bash')
None so far
The idea here is that one can perform tests or sensitivity analyses without touching the original project namelists. This is not available from the command line.
Any namelist parameter can be changed by adding a <change_par> node directly under the root node, with attributes following this example:
<change_par file="INPUT_ORG" block="runctl" param="lreproduce" type="py_eval">True</change_par>
- The value of the node is the new value of the namelist parameter.
- Don't give the namelist file path; only the file name is needed.
- The type attribute can be any of the valid Python types or "py_eval", in which case Python will interpret the value. The default type is string.
- An "n" attribute starting at 1 (not 0) can also be given to target one of several blocks sharing the same name in a namelist file, e.g. "gribout" blocks in INPUT_IO.
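For instance, the following hypothetical node would change the hcomb parameter of the second gribout block of INPUT_IO:
    <change_par file="INPUT_IO" block="gribout" param="hcomb" n="2" type="py_eval">[0, 8760, 24]</change_par>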
In a similar way, any namelist parameter can be deleted by adding an empty <del_par> node directly under the root node, with attributes following this example:
<del_par file="INPUT_ORG" block="runctl" param="lreproduce" />
- Don't give the namelist file path; only the file name is needed.
- An "n" attribute starting at 1 (not 0) can also be given to target one of several blocks sharing the same name in a namelist file, e.g. "gribout" blocks in INPUT_IO.
- Obviously, any value given to that node is ignored.
In this section we briefly describe how the utility is implemented and how one can add options or support for a new machine.