CWL Workflows
There is a set of CWL workflow scripts in the repository (/scripts/cwl_workflows
) for each realm. Each workflow breaks the input files up into manageable segment size and perform all the required input processing needed before invoking e3sm_to_cmip
. These scripts have been designed to run on a SLURM cluster in parallel and will process an arbitrarily large set of simulation data in whatever chunk size required.
Setting up your CWL environment
To use the CWL workflows you will need additional dependencies in your environment:
conda install -c conda-forge cwltool nodejs
When CWL runs it needs somewhere to store its intermediate files. By default it will use the systems $TMPDIR
but in some cases that wont work, for example on NERSC the compute nodes wont have access to the login nodes /tmp directory.
An easy solution for this is to create a directory on a shared mount, and run export TMPDIR=/path/to/shared/location
and
then when running the cwltool use the --tmpdir-prefix=$TMPDIR
argument.
Using the CWL Workflows
Each of the directories under scripts/cwl_workflows
holds a single self-contained workflow.
The name of the workflow matches the name of the directory, for example under the mpaso directory is a file named mpaso.cwl
which contains the workflow.
The beginning of each workflow contains an inputs
section which defines the required parameters, for example
inputs:
data_path: string
metadata: File
workflow_output: string
mapfile: File
frequency: int
namelist_path: string
region_path: string
restart_path: string
tables_path: string
cmor_var_list: string[]
timeout: int
partition: string
account: string
Along with each of the cwl workflows is an example yaml parameter file, for example along with mpaso.cwl
is
mpaso-job.yaml
which contains the following:
data_path: /p/user_pub/e3sm/staging/prepub/1_1_ECA/ssp585-BDRD//1deg_atm_60-30km_ocean/ocean/native/model-output/mon/ens1/v0/
workflow_output: /p/user_pub/e3sm/baldwin32/workshop/ssp585/ssp585/output/pp/cmor/ssp585/2015_2100
metadata:
class: File
path: /p/user_pub/e3sm/baldwin32/workshop/ssp585/ssp585/output/pp/cmor/ssp585/2015_2100/user_metadata.json
mapfile:
class: File
path: /export/zender1/data/maps/map_oEC60to30v3_to_cmip6_180x360_aave.20181001.nc
frequency: 5
namelist_path: /p/user_pub/e3sm/baldwin32/workshop/E3SM-1-1-ECA.hist-bgc/mpaso_in
region_path: /p/user_pub/e3sm/baldwin32/resources/oEC60to30v3_Atlantic_region_and_southern_transect.nc
restart_path: /p/user_pub/e3sm/baldwin32/workshop/E3SM-1-1-ECA.hist-bgc/mpaso.rst.1851-01-01_00000.nc
tables_path: /export/baldwin32/projects/cmor/Tables
timeout: 10:00:00
account: e3sm
partition: debug
cmor_var_list: [masso, volo, thetaoga, tosga, soga, sosga, zos, masscello, tos, tob, sos, sob, mlotst, fsitherm, wfo, sfdsi, hfds, tauuo, tauvo, thetao, so, uo, vo, wo, hfsifrazil, zhalfo]
Once the parameter file is complete, the workflow can be executed by calling the cwltool
cwltool --tmpdir-prefix=$TMPDIR ~/projects/e3sm_to_cmip/scripts/cwl_workflows/mpaso/mpaso.cwl mpaso-job.yaml