CESM2: Configuration
The Community Earth System Model, version 2 (CESM2) is a flexible, modular Earth system simulator. The code is (presently) available for download, free of charge. Once downloaded, some assembly is required: the CESM2 codebase has several dependencies, which can be challenging to compile in such a way that they work together. We address some of those challenges in this document.
CESM uses a somewhat unusual workflow, in which scripting tools compile customized executables that include user-defined collections of modules and data sets. A subsequent script then submits jobs to the scheduler, which run those executables against those data. Accordingly, it is necessary to provide configuration data describing the compilers, compute machines, and scheduler.
In addition to this site, a great place to get started is the CESM Quickstart Guide: https://escomp.github.io/CESM/versions/master/html/index.html#
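For orientation, the overall case workflow looks roughly like the sketch below; the compset, resolution, machine name, and case path are placeholders (the machine name, for example, must match an entry in your config_machines.xml, described later on this page), so treat this as an outline rather than a recipe.
# Sketch of the usual CESM2 case workflow (all values are illustrative):
cd my_cesm_sandbox/cime/scripts
./create_newcase --case ${HOME}/cesm_cases/test_case --compset X --res f19_g17 --machine CBASE
cd ${HOME}/cesm_cases/test_case
./case.setup    # generate case scripts and namelists
./case.build    # compile the customized executable
./case.submit   # submit the run to the scheduler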
Downloading CESM2
The best way to download CESM2 is to clone the git repository. As per the Quickstart Guide: https://escomp.github.io/CESM/versions/master/html/downloading_cesm.html
Note that these steps require the git and subversion version control systems. On Sherlock, git is available by default; newer versions of git and subversion are available via the module system, e.g.:
module load git/
module load subversion/
git clone https://github.com/ESCOMP/CESM.git my_cesm_sandbox
cd my_cesm_sandbox
Optionally, check out a branch or a previous release, e.g.:
git tag --list 'release-cesm2*'
git checkout release-cesm2.0.1
Then, check out the model components (note: this step requires subversion):
./manage_externals/checkout_externals
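If the checkout stalls or errors out, the state of the external components can be inspected; to the best of our recollection, manage_externals supports a status flag, though the exact interface may vary between CESM versions:
./manage_externals/checkout_externals --status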
Configuration files
The three main configuration files are:
config_batch.xml
config_compilers.xml
config_machines.xml
These files are located in *both*:
${CESM_PATH}/cime/config/cesm/machines/
${HOME}/.cime
Here, CESM_PATH is the directory into which you cloned the CESM2 repository in the earlier step, and $HOME is your home directory. It may be necessary to create the ${HOME}/.cime directory and the configuration files within it.
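Creating the local configuration area is straightforward; a minimal sketch (the file contents come from the examples later on this page):
mkdir -p ${HOME}/.cime
cd ${HOME}/.cime
touch config_batch.xml config_compilers.xml config_machines.xml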
Some documentation will suggest that CESM2 can be configured by modifying either set of files. It will most likely be quickly discovered, however, that it is better practice to leave the files in ${CESM_PATH}/cime/config/cesm/machines/ untouched and to define local machines, batch system(s), and compilers in the ~/.cime location.
Configurations are hierarchical. For example, the compiler configuration <compiler COMPILER="gnu">, in the .../machines path, defines all of the gnu compiler flags as they are required by CESM2 in general. A refined compiler definition can then be written in ~/.cime/config_compilers.xml. For example, <compiler COMPILER="gnu" MACH="CBASE"> defines additional configurations for the gnu compiler on a CBASE type machine, where MACH should correspond to a machine defined in config_machines.xml. Note that the machine names, and various other metadata elements, are user defined, which gives the user some discretion with respect to CESM2 configuration. Of course, this discretion comes somewhat at the expense of guidance.
Here, we leave the .../machines configuration files completely untouched and provide working example configuration files for Stanford Research Computing’s Sherlock HPC platform. Note that there are numerous ways to accomplish the same task, so these examples may not be optimal. For example, in the config_compilers.xml example, we define our MPI flags in the <CFLAGS> section, where CESM2 might actually be capable of parsing MPI (and other) options more directly; see the example in the comments of that file.
config_compilers.xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Some options examples:
<MPI_LIB_NAME MPILIB="mvapich2"> mpich</MPI_LIB_NAME>
<SLIBS>
<append MPILIB="mvapich2"> -mkl=cluster </append>
</SLIBS>
# NOTE: I think we handle MPI via pkg-config. Also, maybe we just support mpich for now; we can do this by creating just one
# hierarchical entry. If we want to support multiple MPI libraries, we have to either have multiple machines (serc_cbase_{mpi})
# or multiple full COMPILER entries.
-->
<config_compilers version="2.0">
<compiler>
<CPPDEFS>
<append>-DHAVE_IEEE_ARITHMETIC</append>
<append>-DCPRGNU</append>
</CPPDEFS>
</compiler>
<compiler COMPILER="gnu" MACH="CBASE">
<MPIFC> mpifort </MPIFC>
<MPICC> mpicc </MPICC>
<MPICXX> mpicxx </MPICXX>
<SCC> gcc </SCC>
<SCXX> g++ </SCXX>
<SFC> gfortran </SFC>
<CFLAGS>
<append> -fPIC </append>
<append>$SHELL{nc-config --cflags}</append>
<append>$SHELL{nf-config --cflags}</append>
<append MPILIB="mpich">$SHELL{pkg-config --cflags mpich}</append>
<append MPILIB="mpich_monkey"> $SHELL(pkg-config --cflags $SHELL(dirname $SHELL(dirname $MPICC))/lib/pkgconfig/mpich.pc)</append>
</CFLAGS>
<FFLAGS>
<append>-fallow-argument-mismatch</append>
<append>-fallow-invalid-boz</append>
<append>-fPIC</append>
<append>$SHELL{nf-config --fflags}</append>
<append>-I$ENV{PARALLELIO_INC}</append>
<append>-ffree-line-length-none</append>
<append>-fcray-pointer</append>
</FFLAGS>
<CXX_LDFLAGS>
<append>$SHELL{nc-config --cflags}</append>
</CXX_LDFLAGS>
<NETCDF_C_PATH>$SHELL{nc-config --prefix}</NETCDF_C_PATH>
<NETCDF_FORTRAN_PATH>$SHELL{nf-config --prefix}</NETCDF_FORTRAN_PATH>
<SLIBS>
<append>-L$ENV{GPTL_LIB} -lgptl</append>
<append>$SHELL{nf-config --flibs}</append>
<append>$SHELL{nc-config --libs}</append>
<append>-L$ENV{NETLIB_LAPACK_LIB64} -lblas -llapack </append>
<append MPILIB="mpich"> $SHELL{pkg-config --libs mpich}</append>
</SLIBS>
</compiler>
<compiler MACH="CBASE_DEV" COMPILER="gnu_dev">
<MPIFC> mpifort </MPIFC>
<MPICC> mpicc </MPICC>
<MPICXX> mpicxx </MPICXX>
<SCC> gcc </SCC>
<SCXX> g++ </SCXX>
<SFC> gfortran </SFC>
<CFLAGS>
<append> -fPIC </append>
<append> $SHELL{nc-config --cflags}</append>
<append> $SHELL{nf-config --cflags}</append>
<append MPILIB="mpich_monkey"> $(shell pkg-config --cflags $(shell dirname $(shell dirname $MPICC))/lib/pkgconfig/mpich.pc)</append>
</CFLAGS>
<FFLAGS>
<append>-fallow-argument-mismatch</append>
<append> -fallow-invalid-boz </append>
<append>-fPIC</append>
<append>$SHELL{nf-config --fflags}</append>
</FFLAGS>
<CXX_LDFLAGS>
<append> $SHELL{nc-config --cxx4flags} </append>
</CXX_LDFLAGS>
<NETCDF_C_PATH> $SHELL{nc-config --prefix} </NETCDF_C_PATH>
<NETCDF_FORTRAN_PATH> $SHELL{nf-config --prefix} </NETCDF_FORTRAN_PATH>
<SLIBS>
<append> $SHELL{nf-config --flibs} </append>
<append> $SHELL{nc-config --libs} </append>
<append MPILIB="mpich_monkey"> $(shell pkg-config --libs $(shell dirname $(shell dirname $MPICC))/lib/pkgconfig/mpich.pc) </append>
<append MPILIB="mpich2"> $(shell pkg-config --libs $(shell dirname $(shell dirname $(which mpicc)))/lib/pkgconfig/mpich.pc) </append>
<append MPILIB="mpich"> $SHELL{pkg-config --libs mpich} </append>
</SLIBS>
</compiler>
</config_compilers>
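Note that this example assumes several environment variables (GPTL_LIB, PARALLELIO_INC, NETLIB_LAPACK_LIB64) are already set, for example by the modules that provide those libraries. If you build those dependencies yourself, they might be exported along the following lines; the paths are placeholders, not actual Sherlock locations:
# Placeholder paths -- point these at your own installations:
export GPTL_LIB=/path/to/gptl/lib
export PARALLELIO_INC=/path/to/parallelio/include
export NETLIB_LAPACK_LIB64=/path/to/netlib-lapack/lib64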
config_machines.xml
<?xml version="1.0"?>
<config_machines version="2.0">
<machine MACH="CBASE">
<DESC>Stanford Sherlock SH3_CBASE</DESC>
<OS>LINUX</OS>
<COMPILERS>gnu</COMPILERS>
<MPILIBS compiler="gnu" >mpich</MPILIBS>
<CIME_OUTPUT_ROOT>$ENV{SCRATCH}/CESM2/cases</CIME_OUTPUT_ROOT>
<DIN_LOC_ROOT>$ENV{SCRATCH}/CESM2/cesm_input_data/data</DIN_LOC_ROOT>
<DIN_LOC_ROOT_CLMFORC>$ENV{SCRATCH}/CESM2/cesm_input_data/data_clmforc</DIN_LOC_ROOT_CLMFORC>
<DOUT_S_ROOT>$ENV{SCRATCH}/archive/$CASE</DOUT_S_ROOT>
<BASELINE_ROOT>$ENV{GROUP_HOME}/yoder/archive/cesm_input_data/cesm_baselines</BASELINE_ROOT>
<CCSM_CPRNC>$ENV{SCRATCH}/CESM2/cesm_input_data/cprnc</CCSM_CPRNC>
<GMAKE_J>8</GMAKE_J>
<BATCH_SYSTEM>slurm</BATCH_SYSTEM>
<SUPPORTED_BY>sherlock</SUPPORTED_BY>
<MAX_TASKS_PER_NODE>32</MAX_TASKS_PER_NODE>
<MAX_MPITASKS_PER_NODE>32</MAX_MPITASKS_PER_NODE>
<mpirun mpilib="default">
<executable>srun</executable>
</mpirun>
<module_system type="module" allow_error="true">
<init_path lang="perl">$LMOD_ROOT/lmod/init/perl</init_path>
<init_path lang="python">$LMOD_ROOT/lmod/init/env_modules_python.py</init_path>
<init_path lang="sh">$LMOD_ROOT/lmod/init/sh</init_path>
<init_path lang="csh">$LMOD_ROOT/lmod/init/csh</init_path>
<cmd_path lang="perl">$LMOD_ROOT/8.7.30/libexec/lmod perl</cmd_path>
<cmd_path lang="python">$LMOD_ROOT/8.7.30/libexec/lmod python</cmd_path>
<cmd_path lang="sh">module</cmd_path>
<cmd_path lang="csh">module</cmd_path>
</module_system>
<environment_variables>
<env name="OMP_STACKSIZE">64M</env>
</environment_variables>
</machine>
<machine MACH="SH02">
<DESC>Stanford Sherlock SH02</DESC>
<OS>LINUX</OS>
<COMPILERS>gnu</COMPILERS>
<MPILIBS compiler="gnu" >mpich</MPILIBS>
<CIME_OUTPUT_ROOT>$ENV{SCRATCH}/CESM2/cases</CIME_OUTPUT_ROOT>
<DIN_LOC_ROOT>$ENV{SCRATCH}/CESM2/cesm_input_data/data</DIN_LOC_ROOT>
<DIN_LOC_ROOT_CLMFORC>$ENV{SCRATCH}/CESM2/cesm_input_data/data_clmforc</DIN_LOC_ROOT_CLMFORC>
<DOUT_S_ROOT>$ENV{SCRATCH}/archive/$CASE</DOUT_S_ROOT>
<BASELINE_ROOT>$ENV{GROUP_HOME}/archive/cesm_input_data/cesm_baselines</BASELINE_ROOT>
<CCSM_CPRNC>$ENV{SCRATCH}/CESM2/cesm_input_data/cprnc</CCSM_CPRNC>
<GMAKE_J>8</GMAKE_J>
<BATCH_SYSTEM>slurm</BATCH_SYSTEM>
<SUPPORTED_BY>sherlock</SUPPORTED_BY>
<MAX_TASKS_PER_NODE>24</MAX_TASKS_PER_NODE>
<MAX_MPITASKS_PER_NODE>24</MAX_MPITASKS_PER_NODE>
<mpirun mpilib="default">
<executable>srun</executable>
</mpirun>
<module_system type="module" allow_error="true">
<init_path lang="perl">$LMOD_ROOT/lmod/init/perl</init_path>
<init_path lang="python">$LMOD_ROOT/lmod/init/env_modules_python.py</init_path>
<init_path lang="sh">$LMOD_ROOT/lmod/init/sh</init_path>
<init_path lang="csh">$LMOD_ROOT/lmod/init/csh</init_path>
<cmd_path lang="perl">$LMOD_ROOT/8.7.30/libexec/lmod perl</cmd_path>
<cmd_path lang="python">$LMOD_ROOT/8.7.30/libexec/lmod python</cmd_path>
<cmd_path lang="sh">module</cmd_path>
<cmd_path lang="csh">module</cmd_path>
</module_system>
<environment_variables>
<env name="OMP_STACKSIZE">64M</env>
</environment_variables>
</machine>
</config_machines>
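The machine names defined here (CBASE, SH02) are the values passed to create_newcase via its --machine argument. In the CIME versions we have used, the known machines (including any defined in ~/.cime) can be listed with the query_config script; treat this as a sketch, since the script location and flags can change between releases:
cd ${CESM_PATH}/cime/scripts
./query_config --machines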
config_batch.xml
<?xml version="1.0"?>
<config_batch version="2.1">
<batch_system type="template" >
<batch_query args=""></batch_query>
<batch_submit></batch_submit>
<batch_redirect></batch_redirect>
<batch_directive></batch_directive>
<directives>
<directive></directive>
</directives>
</batch_system>
<!-- -->
<!-- augment default SLURM: -->
<!-- Eventually... use this job construction, so we don't have to wait for a full node? -->
<!-- but to do this, we'll want to construct a different config_machines entry as well. -->
<batch_system type="slurm" >
<batch_submit>sbatch</batch_submit>
<batch_cancel>scancel</batch_cancel>
<batch_directive>#SBATCH</batch_directive>
<jobid_pattern>(\d+)$</jobid_pattern>
<depend_string> --dependency=afterok:jobid</depend_string>
<depend_allow_string> --dependency=afterany:jobid</depend_allow_string>
<depend_separator>,</depend_separator>
<walltime_format>%H:%M:%S</walltime_format>
<batch_mail_flag>--mail-user</batch_mail_flag>
<batch_mail_type_flag>--mail-type</batch_mail_type_flag>
<batch_mail_type>none, all, begin, end, fail</batch_mail_type>
<directives>
<directive> --job-name=CESM_ </directive>
<directive> --ntasks= </directive>
<directive> --output=CESM_ </directive>
<directive> --error=CESM_ </directive>
</directives>
</batch_system>
<batch_system MACH="CBASE" type="slurm">
<directives>
<directive> --constraint="[CLASS:SH3_CBASE|CLASS:SH3_CBASE.1]"</directive>
</directives>
<!--<walltime_format>%H:%M:%S</walltime_format>-->
<queues>
<queue walltimemax="168:00:00" nodemin="1" nodemax="500" default="true">serc</queue>
<!--<queue walltimemax="7-00:00:00" default="true">serc</queue>-->
<queue walltimemax="24:00:00">normal</queue>
<queue walltimemax="24:00:00">owners</queue>
</queues>
</batch_system>
<batch_system MACH="SERC" type="slurm">
<directives>
<directive> --constraint="[CLASS:SH3_CBASE|CLASS:SH3_CBASE.1|CLASS:SH3_CPERF|CPU_GEN:SKX]"</directive>
</directives>
<!--<walltime_format>%H:%M:%S</walltime_format>-->
<queues>
<queue walltimemax="168:00:00" nodemin="1" nodemax="500" default="true">serc</queue>
<!--<queue walltimemax="7-00:00:00" default="true">serc</queue>-->
<queue walltimemax="24:00:00">normal</queue>
<queue walltimemax="24:00:00">owners</queue>
</queues>
</batch_system>
<batch_system MACH="CPERF" type="slurm">
<directives>
<directive> --job-name=CESM_ </directive>
<directive> --ntasks= </directive>
<directive> --output=CESM_ </directive>
<directive> --error=CESM_ </directive>
<directive> --constraint=CLASS:SH3_CPERF</directive>
</directives>
<queues>
<queue walltimemax="7-00:00:00" default="true">serc</queue>
<queue walltimemax="24:00:00">normal</queue>
<queue walltimemax="24:00:00">owners</queue>
</queues>
</batch_system>
</config_batch>
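These batch definitions provide defaults; individual cases can usually override the queue and wall-clock limit from the case directory via xmlchange. A sketch, assuming a hypothetical case directory and the serc queue defined above (variable names follow the usual CIME env_batch.xml conventions):
cd ${HOME}/cesm_cases/test_case
./xmlchange JOB_QUEUE=serc
./xmlchange JOB_WALLCLOCK_TIME=24:00:00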
CESM2: Dependencies
Most large codes depend on several (mostly) standard packages, and CESM2 is no exception. In fact, CESM2’s dependencies are numerous and somewhat difficult to compile. Here, we review the basic dependencies, an environment available to SDSS users via the module system, and how to use the Spack package manager to build such an environment.
Overview
The basic dependencies for CESM2 include:
- MPI (some implementation…)
- NetCDF-C and NetCDF-Fortran
- HDF5 (should bundle with NetCDF)
- Others?
Sherlock module
A module, accessible to SDSS students, is available on the Sherlock HPC platform. To use this module:
module use /home/groups/sh_s-dss/share/sdss/modules/modulefiles/
module load cesm2/
This should load all of the compilers, MPI libraries, and other dependencies necessary to compile and run CESM2 projects.
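Once the module is loaded, it is worth confirming that the expected tools are on your PATH; the utilities below ship with the NetCDF and MPI packages, though the exact versions reported will depend on the module contents:
which mpicc mpifort gfortran
nc-config --version
nf-config --version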
Build your own
Good luck with this.
The typical problem with dependencies is that only a few might be recognized by a developer, because certain dependencies are available by default (e.g., they can be found in /usr/lib), or because some components bundle semi-automatically; for example, NetCDF requires HDF5. The standard approach, then, is:
- Attempt to compile
- Scroll up through the error messages to find the first error
- Identify the missing component
- Compile the missing component
- Repeat
Note that this may be a nested process, several layers deep.
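For long build logs, it can help to jump straight to the first error rather than scrolling; a generic example, where build.log is a placeholder for whatever log file the failed build produced:
# Show the first few lines that mention "error", with line numbers:
grep -in -m 5 "error" build.log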
Note additionally that any Fortran codes will require a highly consistent software stack. This is to say that (almost) all of the components will need to be compiled with the same version of the same compiler. As such, it can be difficult to cobble together a long list of dependencies from some HPC software module kits.
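A quick sanity check is to compare the compiler reported by the NetCDF wrappers against the compiler you intend to use; the config utilities below are standard parts of the NetCDF-C and NetCDF-Fortran distributions:
# Compilers used to build NetCDF-C and NetCDF-Fortran:
nc-config --cc
nf-config --fc
# Compilers you are about to build CESM2 with:
gfortran --version
mpifort --version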
There are a lot of good reasons to build these dependencies manually on your own, but be prepared to put in some time. Be patient; be methodical. Alternatively, consider using an aid, like Spack, to map out and compile the dependencies.
Spack
What is Spack?
Spack is a software package manager designed for scientific and high-performance computing (HPC). Spack can be extremely helpful when working with complex dependency graphs. Software “recipes” are written, in Python, for individual software packages. These recipes define dependencies, compiler flags, patches, and other data in such a way that Spack’s solver can produce a graph of dependent software packages and a script to install one or more packages. In short, Spack is magic, or at least close to it.
There are four basic approaches to providing a software package via Spack:
- As part of a large, general-purpose SW stack, where a package recipe exists
- As a small, package-specific environment and SW stack, where a package recipe exists
- Build an environment that satisfies the package’s dependencies; then compile the SW manually
- Write a Spack recipe to satisfy the dependencies and/or the package itself.
Spack currently does not have a package recipe for CESM2, so options 1 and 2 are not available. Additionally, the CESM2 workflow is not really compatible with those options. We focus on the latter two options: one way or another, build an environment that will support CESM2’s compile scripts.
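As an illustration of the third approach, a Spack environment along the following lines would cover the basic dependencies listed above. The package names are real Spack packages, but the variants and the absence of version or compiler constraints are assumptions; adapt the specs to your site:
# Create and activate a named environment for the CESM2 dependencies:
spack env create cesm2-deps
spack env activate cesm2-deps
# Add the basic dependency specs (variants are illustrative):
spack add mpich netcdf-c+mpi netcdf-fortran hdf5+fortran+hl
# Concretize and build everything:
spack install
# Expose the installed packages in the current shell:
spack load mpich netcdf-c netcdf-fortran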