Our Experiments
Overview
Teaching: min
Exercises: min
Questions
How can I confirm if everything ran correctly?
How can I troubleshoot errors?
Objectives
We have three cases (or experiments) we are working with. We will take a look at what happened with each one of them.
-
- b.day1.0
- This was our first case, which initially ran for 5 days. We then set CONTINUE_RUN=TRUE and ran it again for another 5 days. This run should have completed successfully. How can we confirm this?
Check your CaseStatus file
Go to your case directory for this case and look at the end of the CaseStatus file:
tail CaseStatus
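The same check can be scripted. The following is a minimal sketch, assuming that CaseStatus entries look like the `timestamp: event status` lines in the CaseStatus excerpt shown later in this lesson. The helper name `last_case_event` is hypothetical and is not part of CESM; the sample text is copied from that excerpt (the branch case, which failed).

```python
import re

# Matches entries such as "2023-03-05 18:27:00: case.run error".
# Assumption: each status entry starts with a timestamp, followed by an
# event name (one or more words) and one of three status words.
EVENT = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): (.+?) (starting|success|error)\b"
)


def last_case_event(text):
    """Return (timestamp, event, status) for the last status entry, or None."""
    last = None
    for line in text.splitlines():
        match = EVENT.match(line.strip())
        if match:
            last = match.groups()
    return last


sample = """\
2023-03-05 18:26:57: model execution starting
 ---------------------------------------------------
2023-03-05 18:27:00: model execution success
 ---------------------------------------------------
2023-03-05 18:27:00: case.run error
"""

print(last_case_event(sample))
```

To apply it to a real case, you could pass in the contents of the file, for example `last_case_event(open("CaseStatus").read())`, run from your case directory.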
Now let’s check out the output from this case. Remember, it is located in the DOUT_S_ROOT directory.
cases/b.day1.0> ./xmlquery DOUT_S_ROOT
DOUT_S_ROOT: /glade/scratch/cstan/archive/b.day1.0
We will go there and see if we now have 10 days of data.
cd /glade/scratch/cstan/archive/b.day1.0
cd ocn/hist
We now have two output files for the ocean: b.day1.0.pop.h.nday1.0001-01-01.nc and b.day1.0.pop.h.nday1.0001-01-06.nc.
b.day1.0.pop.h.nday1.0001-01-01.nc is the same file we looked at last week, which has the first 5 days.
b.day1.0.pop.h.nday1.0001-01-06.nc contains the next 5 days that we ran by setting CONTINUE_RUN=TRUE.
Use ncview to look at these files.
-
- Second case
- This is our case that we are running for 4 years with daily precipitation and standard monthly output to use for Assignment #3. Assuming the configuration and namelist changes were entered correctly, this run should have completed successfully.
- Check your CaseStatus file.
- If there are errors, check your log file.
- If there are no errors, check your output:
There should be monthly and daily output for the atmosphere. Let’s confirm:
cd /glade/scratch/cstan/archive/run.2/atm/hist
ls
The run.2.cam.h0.*.nc files contain monthly averaged data.
The run.2.cam.h1.*.nc files contain daily averaged data.
What is in these files?
We will look at each file using ncdump -h to understand what is in the files.
What variables are in the h0 files? What variables are in the h1 files?
We set this with the namelist options
fincl2 = 'PRECC', 'PRECL'
nhtfrq = 0, -24
How many times are in the h0 files?
How many times are in the h1 files?
We set this with the namelist option
mfilt = 1,1
Remember, you can look up namelist options:
- mfilt
- Array containing the maximum number of time samples written to a history file. The first value applies to the primary history file, the second through tenth to the auxiliary history files. Default: 1,30,30,30,30,30,30,30,30,30
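As a rough cross-check on the questions above, the expected number of time samples and files can be worked out from the namelist settings. This is a sketch under stated assumptions: the run is exactly 4 years long, the model uses a 365-day (no-leap) calendar, nhtfrq = 0 means monthly averages, and a negative nhtfrq gives the output interval in hours. The function name `expected_files` is hypothetical.

```python
import math


def expected_files(n_years, nhtfrq, mfilt, days_per_year=365):
    """Estimate (time samples, number of files) for one history stream.

    nhtfrq == 0 -> monthly averages; nhtfrq < 0 -> one sample every
    |nhtfrq| hours. Positive values (model time steps) are not handled
    in this sketch.
    """
    if nhtfrq == 0:
        samples = 12 * n_years
    elif nhtfrq < 0:
        samples = n_years * days_per_year * 24 // abs(nhtfrq)
    else:
        raise ValueError("positive nhtfrq (time steps) is not handled here")
    return samples, math.ceil(samples / mfilt)


# Settings from this lesson: nhtfrq = 0, -24 and mfilt = 1, 1
for stream, freq, per_file in [("h0", 0, 1), ("h1", -24, 1)]:
    samples, files = expected_files(4, freq, per_file)
    print(stream, samples, "samples in", files, "files")
```

With mfilt = 1,1 every file holds a single time sample, so the daily stream alone produces well over a thousand files. With 30 samples per file for the h1 stream, the same estimate gives 49 files.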
We have lots of history files. We can look at each of them using ncview, but that is not very useful.
We can read in all the files using Python xarray, but it's a lot of data.
There are some useful tools for postprocessing the data to get timeseries files and easily take a look at some common diagnostics. We will learn about those later in this class.
-
- BRANCH case:
- This is the branch case we ran with lots of configuration changes and namelist changes. This run produces an error with the configuration provided.
We can review how we created and set up our run by looking at the first line of the README.case file:
cases/branchwrong> head -n1 README.case
2023-03-05 17:55:30: ./create_newcase --case /glade/u/home/cstan/cases/branchwrong --res f19_g17 --compset B1850 --project UGMU0035
You can see that I initially made a mistake in my create_newcase command by mistyping the project number.
We can review any changes we made to the configuration of the run by looking at CaseStatus:
cases/branchwrong> more CaseStatus
2023-03-05 18:01:16: xmlchange success <command> ./xmlchange RUN_TYPE=branch,RUN_REFCASE=b.day1.0,RUN_REFDATE=0001-01-05,CLM_NAMELIST_OPTS=,GET_REFCASE=FALSE,STOP_OPTION=nmonths,STOP_N=1,RESUBMIT=1,CCSM_CO2_PPMV=569.4 </command>
---------------------------------------------------
2023-03-05 18:01:46: xmlchange success <command> ./xmlchange JOB_WALLCLOCK_TIME=2:00:00 </command>
---------------------------------------------------
2023-03-05 18:02:13: case.setup starting
---------------------------------------------------
2023-03-05 18:02:16: case.setup success
---------------------------------------------------
2023-03-05 18:03:08: case.build starting
---------------------------------------------------
2023-03-05 18:03:10: case.build error
ERROR: Missing required pointer_file /glade/scratch/cstan/branchwrong/run/rpointer.ocn.restart ---has pop initial data been prestaged to /glade/scratch/cstan/branchwrong/run?
---------------------------------------------------
2023-03-05 18:08:31: case.build starting
---------------------------------------------------
CESM version is cesm2.1.3-rc.01
Processing externals description file : Externals.cfg
Processing externals description file : Externals_CLM.cfg
Processing externals description file : Externals_POP.cfg
Processing externals description file : Externals_CISM.cfg
Processing externals description file : Externals_CAM.cfg
Checking status of externals: clm, fates, ptclm, mosart, ww3, cime, cice, pop, cvmix, marbl, cism, source_cism, rtm, cam, clubb, carma, cosp2, chem_proc,
./cime
clean sandbox, on cime5.6.32
./components/cam
clean sandbox, on cam_cesm2_1_rel_41
./components/cam/chem_proc
clean sandbox, on tools/proc_atm/chem_proc/release_tags/chem_proc5_0_03_rel
./components/cam/src/physics/carma/base
clean sandbox, on carma/release_tags/carma3_49_rel
./components/cam/src/physics/clubb
clean sandbox, on vendor_clubb_r8099_n03
./components/cam/src/physics/cosp2/src
clean sandbox, on CFMIP/COSPv2.0/tags/v2.1.4cesm/src
./components/cice
clean sandbox, on cice5_cesm2_1_1_20190321
./components/cism
clean sandbox, on cism-release-cesm2.1.2_02
./components/cism/source_cism
clean sandbox, on release-cism2.1.03
./components/clm
clean sandbox, on release-clm5.0.30
./components/clm/src/fates
clean sandbox, on sci.1.30.0_api.8.0.0
./components/clm/tools/PTCLM
clean sandbox, on PTCLM2_20200121
./components/mosart
clean sandbox, on release-cesm2.0.04
./components/pop
clean sandbox, on pop2_cesm2_1_rel_n09
./components/pop/externals/CVMix
clean sandbox, on v0.93-beta
./components/pop/externals/MARBL
clean sandbox, on cesm2.1-n00
./components/rtm
clean sandbox, on release-cesm2.0.04
./components/ww3
clean sandbox, on ww3_181001
2023-03-05 18:19:23: case.build success
---------------------------------------------------
2023-03-05 18:20:23: case.submit starting
---------------------------------------------------
2023-03-05 18:20:29: case.submit error
ERROR: Command: 'qsub -q regular -l walltime=2:00:00 -A UGMU0035 -v ARGS_FOR_SCRIPT='--resubmit' .case.run' failed with error 'qsub: Invalid account, available accounts:
Project, Status, Active
UGMU0041, Normal, True' from dir '/glade/u/home/cstan/cases/branchwrong'
---------------------------------------------------
2023-03-05 18:25:52: xmlchange success <command> ./xmlchange PROJECT=UGMU0041 </command>
---------------------------------------------------
2023-03-05 18:26:38: case.submit starting
---------------------------------------------------
2023-03-05 18:26:45: case.submit success case.run:8818830.chadmin1.ib0.cheyenne.ucar.edu, case.st_archive:8818831.chadmin1.ib0.cheyenne.ucar.edu
---------------------------------------------------
2023-03-05 18:26:50: case.run starting
---------------------------------------------------
2023-03-05 18:26:57: model execution starting
---------------------------------------------------
2023-03-05 18:27:00: model execution success
---------------------------------------------------
2023-03-05 18:27:00: case.run error
ERROR: RUN FAIL: Command 'mpiexec_mpt -p "%g:" -np 576 omplace -tm open64 /glade/scratch/cstan/branchwrong/bld/cesm.exe >> cesm.log.$LID 2>&1 ' failed
See log file for details: /glade/scratch/cstan/branchwrong/run/cesm.log.8818830.chadmin1.ib0.cheyenne.ucar.edu.230305-182650
---------------------------------------------------
Another thing we did that is not documented automatically is to copy the restart files from our b.day1.0 case to our new run directory. This was so the model has a set of restart files to start the run from.
cp /glade/scratch/cstan/archive/b.day1.0/rest/0001-01-06-00000/* /glade/scratch/cstan/branchwrong/run/
How do we figure out what went wrong?
Look at your log file and use grep -i to find errors.
cases/branchwrong> grep -i error /glade/scratch/cstan/branchwrong/run/cesm.log.8818830.chadmin1.ib0.cheyenne.ucar.edu.230305-182650
16: ERROR: GETFIL: FAILED to get b.day1.0.cam.r.0001-01-05-00000.nc
4: ERROR: GETFIL: FAILED to get b.day1.0.cam.r.0001-01-05-00000.nc
29: ERROR: GETFIL: FAILED to get b.day1.0.cam.r.0001-01-05-00000.nc
33: ERROR: GETFIL: FAILED to get b.day1.0.cam.r.0001-01-05-00000.nc
34: ERROR: GETFIL: FAILED to get b.day1.0.cam.r.0001-01-05-00000.nc
3: ERROR: GETFIL: FAILED to get b.day1.0.cam.r.0001-01-05-00000.nc
22: ERROR: GETFIL: FAILED to get b.day1.0.cam.r.0001-01-05-00000.nc
18: ERROR: GETFIL: FAILED to get b.day1.0.cam.r.0001-01-05-00000.nc
...
Why is the error repeated many times?
The model runs on many processors. Each one is reporting the error.
What does the error mean?
This tells us that the model is trying to get a file called b.day1.0.cam.r.0001-01-05-00000.nc and is unable to find it.
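Because every processor reports the same message, it can help to collapse the grep output into unique messages. The following is a minimal sketch, assuming each log line starts with a processor number followed by a colon, as in the output above. The function name `unique_errors` is hypothetical.

```python
import re
from collections import Counter

# Assumption: lines begin with "<processor number>: " as in the grep output.
RANK_PREFIX = re.compile(r"^\s*\d+:\s*")


def unique_errors(lines):
    """Count distinct error messages after removing the processor prefix."""
    counts = Counter()
    for line in lines:
        if "error" in line.lower():  # same idea as grep -i error
            counts[RANK_PREFIX.sub("", line).strip()] += 1
    return counts


log_lines = [
    "16: ERROR: GETFIL: FAILED to get b.day1.0.cam.r.0001-01-05-00000.nc",
    "4: ERROR: GETFIL: FAILED to get b.day1.0.cam.r.0001-01-05-00000.nc",
    "29: ERROR: GETFIL: FAILED to get b.day1.0.cam.r.0001-01-05-00000.nc",
]

for message, count in unique_errors(log_lines).items():
    print(count, "x", message)
```

For a real log you could pass the open file instead of the list, for example `unique_errors(open(path_to_log))`, where `path_to_log` is the cesm.log file named in CaseStatus.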
Let’s go back to our configuration and think back about how we set up this experiment. What do all the configuration changes mean?
We can take a look in env_run.xml to confirm what each setting means.
- RUN_TYPE=branch
- This is a branch run
- RUN_REFCASE=b.day1.0
- Reference case for hybrid or branch runs
- RUN_REFDATE=0001-01-05
- Reference date for hybrid or branch runs (yyyy-mm-dd)
- CLM_NAMELIST_OPTS=''
- CLM-specific namelist settings for -namelist option in the CLM build-namelist. CLM_NAMELIST_OPTS is normally set as a compset variable and in general should not be modified for supported compsets. It is recommended that if you want to modify this value for your experiment, you should use your own user-defined component sets via using create_newcase with a compset_file argument. This is an advanced flag and should only be used by expert users.
It seems this option was provided in the NCAR tutorial example, but is not necessary.
- GET_REFCASE=FALSE
- Flag for automatically prestaging the refcase restart dataset. If TRUE, then the refcase data is prestaged into the executable directory
- STOP_OPTION=nmonths
- Sets the run length along with STOP_N and STOP_DATE
- STOP_N=1
- Provides a numerical count for $STOP_OPTION.
- RESUBMIT=1
- If RESUBMIT is greater than 0, then the case will automatically resubmit. Since we later set our queue time to only 2 hours, there may be a need to resubmit to complete the run.
- CCSM_CO2_PPMV=569.4
- Mechanism for setting the CO2 value in ppmv for CLM if CLM_CO2_TYPE is constant or for POP if OCN_CO2_TYPE is constant. This is the CO2 value that gets propagated to the ocean and land models.
- JOB_WALLCLOCK_TIME=2:00:00
- The machine wallclock setting. This means how long we tell it to run in the queue. The maximum and default are 12:00:00, but we can get our run in more quickly if we tell it we need less time.
Do you see anything in the configuration that could have led to our error?
Look at the RUN_REFDATE and the date we used for our restart file.
Solution
The restart files we copied are dated 0001-01-06, so RUN_REFDATE must be 0001-01-06:
./xmlchange RUN_REFDATE='0001-01-06'
Fix it!
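A mismatch like this can be caught before submitting by comparing RUN_REFDATE with the dates embedded in the restart file names. This is a minimal sketch, assuming restart file names contain the date as yyyy-mm-dd followed by a five-digit seconds field, as in the file named in the error message. The function names and the list of copied files are illustrative, not taken from the actual run directory.

```python
import re

# Assumption: names look like "b.day1.0.cam.r.0001-01-06-00000.nc".
DATE_IN_NAME = re.compile(r"\.(\d{4}-\d{2}-\d{2})-\d{5}\.")


def restart_dates(filenames):
    """Return the set of dates found in restart file names."""
    dates = set()
    for name in filenames:
        match = DATE_IN_NAME.search(name)
        if match:
            dates.add(match.group(1))
    return dates


def refdate_matches(run_refdate, filenames):
    """True if every dated restart file carries exactly run_refdate."""
    return restart_dates(filenames) == {run_refdate}


# Illustrative file names; files without a date (rpointer files) are ignored.
copied = [
    "b.day1.0.cam.r.0001-01-06-00000.nc",
    "b.day1.0.pop.r.0001-01-06-00000.nc",
    "rpointer.atm",
]

print(refdate_matches("0001-01-05", copied))
print(refdate_matches("0001-01-06", copied))
```

In practice the list of names could come from `os.listdir()` on your run directory.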
What did those namelist changes do?
In user_nl_cam
- co2vmr=569.4e-6
- CO2 volume mixing ratio. This is used as the time invariant surface value of CO2 if no time varying values are specified. Default: set by build-namelist.
- ch4vmr = 1583.2e-9
- CH4 volume mixing ratio. This is used as the time invariant surface value of CH4 if no time varying values are specified. Default: set by build-namelist.
- inithist='MONTHLY'
- Frequency that initial files will be output. This produces initial condition files monthly.
Nothing looks questionable there.
Resubmit your case!
Key Points
History and Setup for Diagnostics
Overview
Teaching: 0 min
Exercises: 0 min
Questions
What is in our history files?
How do I set up to run the postprocessing and diagnostics packages?
Objectives
We will now return to the output from our 4-year case. Let’s go to the atmospheric history directory for our case. If your 4-year case did not run to completion, you are welcome to look at mine.
cd /glade/scratch/cstan/archive/run.2/atm/hist
History vs. Timeseries Files
- History files
- contain all of the variables for a component for a particular frequency and are output directly by the model.
- Timeseries files
- usually span a number of timesteps and contain only one major variable. They are created offline.
When NCAR provides output from their model simulations publicly, they typically provide timeseries files for a select set of variables.
Examples:
A history file: f40_test.cam.h0.1993-11.nc
- 1 monthly timestep (Nov 1993)
- 200+ CAM variables
A timeseries file: f40_test.cam.h0.PSL.199001-199912.nc
- 120 monthly timesteps (Jan 1990-Dec 1999)
- 1 CAM variable (PSL), along with coordinate variables like time, lat, lon, etc.
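The timeseries file name itself tells you the variable and the time span. The sketch below parses the example name above and recovers the 120 monthly timesteps. It assumes names follow the pattern case.model.stream.variable.YYYYMM-YYYYMM.nc, which matches the example but may not cover every file. The function name `describe_timeseries` is hypothetical.

```python
import re

# Assumed pattern: <case>.<model>.<stream>.<variable>.<YYYYMM>-<YYYYMM>.nc
PATTERN = re.compile(
    r"^(?P<case>.+)\.(?P<model>[a-z0-9]+)\.(?P<stream>h\d*)\."
    r"(?P<var>\w+)\.(?P<start>\d{6})-(?P<end>\d{6})\.nc$"
)


def describe_timeseries(filename):
    """Return (case, variable, number of months) parsed from the file name."""
    match = PATTERN.match(filename)
    if match is None:
        raise ValueError("unexpected file name: " + filename)
    start, end = match["start"], match["end"]
    n_months = (
        (int(end[:4]) - int(start[:4])) * 12
        + int(end[4:]) - int(start[4:]) + 1
    )
    return match["case"], match["var"], n_months


print(describe_timeseries("f40_test.cam.h0.PSL.199001-199912.nc"))
```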
CESM Time Variable
The time coordinate variable in CESM history and timeseries files represents the end of the averaging period for variables that are averages. As a result, the time that gets resolved when the data are read in does not match the date in the filename. For monthly averaged data, the date in the filename is the correct one. This can be a source of much confusion.
Example: run.2.cam.h0.0001-05.nc
- This is a history file for May of year 1 of our run.
- When you read in this file, the first time is resolved as 0001-06-01. This means that Jun 1 of year 0001 is the end of the averaging period and the data contains the average for May of year 1.
- To verify the averaging period in the files, consult the time_bnds, time_bound, or time_bounds variable. Always check!
- This is a convention used by CESM to allow averaged and instantaneous variables to be stored in the same file.
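To make the convention concrete, the sketch below takes the two time bounds for the May average of year 1 and compares the stored time (the end of the period) with the midpoint of the period. It uses Python's standard datetime, which follows the Gregorian calendar; CESM output commonly uses a no-leap calendar, but the two agree for this example. The function name `averaging_midpoint` is hypothetical.

```python
from datetime import datetime


def averaging_midpoint(start, end):
    """Midpoint of an averaging period given its two time bounds."""
    return start + (end - start) / 2


# Time bounds for the May average of year 1, as in the example above.
bounds = (datetime(1, 5, 1), datetime(1, 6, 1))

stored_time = bounds[1]            # what the time variable holds
mid = averaging_midpoint(*bounds)  # a time that falls inside May

print("stored month:", stored_time.month)
print("midpoint month:", mid.month)
```

Labelling each sample by the midpoint of its bounds (or by the start of the period) gives a month that matches the file name.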
Postprocessing
Postprocessing is the process of going from history files to timeseries files and of converting 3D atmospheric data from the model coordinate system to selected pressure levels. We will learn how to use the CESM Postprocessing Tools, which are primarily written in NCAR Command Language (NCL). NCAR is in the process of moving from NCL to Python, but for now, we can use the prepared NCL scripts without having to know too much NCL.
Diagnostics Packages
There is a large suite of postprocessing and diagnostic packages developed by NCAR using a combination of NCL and Python scripts. They automatically generate a variety of plots from model output files and are used to evaluate a simulation. They all compute a series of pre-defined metrics and display the plots via a website.
There are five main diagnostics packages:
- Atmosphere
- Ice
- Land
- Ocean
- Climate Variability and Diagnostics Package (CVDP)
Postprocessing and Diagnostics Packages Setup
We will set up everything necessary for you to be able to run the postprocessing and diagnostics packages on the NCAR computers.
Set up your .profile or .tcshrc
If you have never set up a .profile or .tcshrc on Cheyenne:
cp /glade/u/home/cstan/.profile ~/.profile
If you already have a .profile (bash users) or a .tcshrc (tcsh users), look at the corresponding file and add the necessary items from the .profile file to your file. The .profile file is located in: ~cstan/
Copy the post-processing scripts to the correct location:
Go to your home directory:
cd
Create a scripts directory and go to it:
mkdir scripts
cd scripts
Copy all files needed:
cp -R /glade/u/home/asphilli/CESM_tutorial/* .
You may get an error about not being able to copy a particular file. You can ignore the error.
Put the configuration file into the correct location:
mv hluresfile ../.hluresfile
Set up the Python environment for the CESM diagnostics and post-processing scripts:
cesm_pp_activate
Create a directory for the CESM postprocessing code:
mkdir /glade/scratch/cstan/cesm-postprocess
Run the postprocessing setup using create_postprocess and tell it the name of your 4-year case:
create_postprocess --caseroot /glade/scratch/cstan/cesm-postprocess/run.2
Go to the postprocessing directory:
cd /glade/scratch/cstan/cesm-postprocess/run.2
Set the location of the model data:
./pp_config --set DOUT_S_ROOT=/glade/scratch/cstan/archive/run.2
Tell the diagnostics what kinds of grids to expect. Our version uses:
./pp_config --set ATM_GRID=1.9x2.5
./pp_config --set LND_GRID=1.9x2.5
./pp_config --set ICE_GRID=gx1v7
./pp_config --set OCN_GRID=gx1v7
./pp_config --set ICE_NX=320
./pp_config --set ICE_NY=384
Key Points
Model Diagnostics Packages
Overview
Teaching: 0 min
Exercises: 0 min
Questions
How do I run the model diagnostics packages?
Objectives
Some requirements for the diagnostics packages
Each component diagnostics package has minimum requirements for how much data must be available to run them:
- Ocean: 12 months
- Atmosphere, Land: 14 months
- Ice: 24 months
Some other requirements:
- Only complete years can be analyzed by the packages.
- There must be an additional Dec before the 1st analyzed year or an additional Jan and Feb after the last analyzed year.
- If you have 4 complete years, you must set the first year to analyze to 1 and the last year to 3 or the first year to 2 and the last year to 4.
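The rule about the extra December or the extra January and February can be written as a small check. This is a sketch under a simplifying assumption: the extra months are taken only from complete years of output, so a partial year that happens to contain them is not considered. The function name `window_is_valid` is hypothetical.

```python
def window_is_valid(first_yr, last_yr, avail_first, avail_last):
    """Check an analysis window against the requirements listed above.

    avail_first..avail_last are the complete years of model output.
    """
    if first_yr > last_yr or first_yr < avail_first or last_yr > avail_last:
        return False
    has_prior_dec = first_yr - 1 >= avail_first   # Dec before first analyzed year
    has_next_jan_feb = last_yr + 1 <= avail_last  # Jan and Feb after last year
    return has_prior_dec or has_next_jan_feb


# Our 4-year case has complete years 1 through 4.
for first, last in [(1, 3), (2, 4), (1, 4)]:
    print(first, last, window_is_valid(first, last, 1, 4))
```

Years 1 to 3 and years 2 to 4 both pass, while analyzing all four years fails because no extra months are left on either side.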
Run a Diagnostics Package
Select and run a diagnostics package of interest to you:
Atmosphere Diagnostics Package
Edit the settings for the env_diags_atm.xml file using pp_config:
./pp_config --set ATMDIAG_OUTPUT_ROOT_PATH=/glade/scratch/cstan/diagnostics-output/atm
./pp_config --set ATMDIAG_test_first_yr=1
./pp_config --set ATMDIAG_test_nyrs=3
Run the monthly climatologies
qsub -A UGMU0041 atm_averages
You can monitor your job status using
qstat -u
Check the log file in logs to make sure everything ran OK. This should run relatively quickly (only a few minutes).
Once the averages are done, you can submit the diagnostics script:
qsub -A UGMU0041 atm_diagnostics
This will also run relatively quickly (few minutes).
Once it is done, you can go to the location of the diagnostics and look at the output via a webpage:
cd /glade/scratch/cstan/diagnostics-output/atm/diag/run.2-obs.1_3
firefox index.html &
It may be slow for your web browser window to launch and display depending on your bandwidth.
For more information on the Atmosphere (AMWG) Diagnostics Package: http://www.cesm.ucar.edu/working_groups/Atmosphere/amwg-diagnostics-package/
Land Diagnostics Package
Edit the settings for the env_diags_land.xml file using pp_config:
./pp_config --set LNDDIAG_OUTPUT_ROOT_PATH=/glade/scratch/cstan/diagnostics-output/lnd
./pp_config --set LNDDIAG_clim_first_yr_1=1
./pp_config --set LNDDIAG_clim_num_yrs_1=3
./pp_config --set LNDDIAG_trends_first_yr_1=1
./pp_config --set LNDDIAG_trends_num_yrs_1=3
Run the monthly climatologies
qsub -A UGMU0041 lnd_averages
You can monitor your job status using
qstat -u
Check the log file in logs to make sure everything ran OK. This should run relatively quickly (only a few minutes).
Once the averages are done, you can submit the diagnostics script:
qsub -A UGMU0041 lnd_diagnostics
This will also run relatively quickly (few minutes). Once it is done, you can go to the location of the diagnostics and look at the output via a webpage:
cd /glade/scratch/cstan/diagnostics-output/lnd/diag/run.2-obs.1_3
firefox setsIndex.html &
It may be slow for your web browser window to launch and display depending on your bandwidth.
For more information on the Land (LMWG) Diagnostics Package: http://www.cesm.ucar.edu/models/cesm1.2/clm/clm_diagpackage.html
Ocean Diagnostics Package
Edit the settings for the env_diags_ocn.xml file using pp_config:
./pp_config --set OCNDIAG_YEAR0=1
./pp_config --set OCNDIAG_YEAR1=3
./pp_config --set OCNDIAG_TSERIES_YEAR0=1
./pp_config --set OCNDIAG_TSERIES_YEAR1=3
./pp_config --set OCNDIAG_TAVGDIR=/glade/scratch/cstan/diagnostics-output/ocn/climo/tavg.\$OCNDIAG_YEAR0.\$OCNDIAG_YEAR1
./pp_config --set OCNDIAG_WORKDIR=/glade/scratch/cstan/diagnostics-output/ocn/diag/run.2.\$OCNDIAG_YEAR0.\$OCNDIAG_YEAR1
Run the monthly climatologies
qsub -A UGMU0041 ocn_averages
You can monitor your job status using
qstat -u
Check the log file in logs to make sure everything ran OK. This should run relatively quickly (only a few minutes).
Once the averages are done, you can submit the diagnostics script:
qsub -A UGMU0041 ocn_diagnostics
This will also run relatively quickly (few minutes). Once it is done, you can go to the location of the diagnostics and look at the output via a webpage:
cd /glade/scratch/cstan/diagnostics-output/ocn/diag/run.2.1_3
firefox index.html &
It may be slow for your web browser window to launch and display depending on your bandwidth.
Ice Diagnostics Package
Edit the settings for the env_diags_ice.xml file using pp_config:
./pp_config --set ICEDIAG_BEGYR_CONT=1
./pp_config --set ICEDIAG_ENDYR_CONT=3
./pp_config --set ICEDIAG_YRS_TO_AVG=3
./pp_config --set ICEDIAG_PATH_CLIMO_CONT=/glade/scratch/cstan/diagnostics-output/ice/climo/\$ICEDIAG_CASE_TO_CONT/
./pp_config --set ICEDIAG_DIAG_ROOT=/glade/scratch/cstan/diagnostics-output/ice/diag/\$ICEDIAG_CASE_TO_CONT/
Run the monthly climatologies
qsub -A UGMU0041 ice_averages
You can monitor your job status using
qstat -u
Check the log file in logs to make sure everything ran OK. This should run relatively quickly (only a few minutes).
Once the averages are done, you can submit the diagnostics script:
qsub -A UGMU0041 ice_diagnostics
This will also run relatively quickly (few minutes). Once it is done, you can go to the location of the diagnostics and look at the output via a webpage:
cd /glade/scratch/cstan/diagnostics-output/ice/diag/run.2.1_3
firefox index.html &
It may be slow for your web browser window to launch and display depending on your bandwidth.
Try the CVDP
Our runs are not long enough to run the CVDP, but you can test it on an existing long simulation, the CESM Large Ensemble.
On Cheyenne, we need to get an analysis node on Casper:
execdav --account=UGMU0041
cd ~/scripts/CVDP
Open the file namelist using your preferred text editor.
The format of the file is:
Run Name | Path to all data for a simulation | Analysis start year | Analysis end year
Modify the rows so that the analysis start and end years are 1979 and 2015.
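If the namelist has many rows, the edit can be scripted instead of done by hand. This is a minimal sketch, assuming each row has exactly the four pipe-separated fields described above. The function name `set_analysis_years` is hypothetical, and the example row uses a placeholder run name and path, not real entries from the file.

```python
def set_analysis_years(row, start_year, end_year):
    """Replace the last two pipe-separated fields of a namelist row."""
    fields = [field.strip() for field in row.split("|")]
    if len(fields) != 4:
        raise ValueError("expected 4 fields, got %d" % len(fields))
    fields[2], fields[3] = str(start_year), str(end_year)
    return " | ".join(fields)


# Placeholder row for illustration only.
row = "Example Run | /path/to/example/data/ | 1920 | 2100"
print(set_analysis_years(row, 1979, 2015))
```

Editing the rows by hand in a text editor works just as well for a short file.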
Open the file driver.ncl using your preferred text editor.
On line 7: replace user with your username.
On line 19: change False to True to output calculations in netCDF.
Run the CVDP by typing
ncl driver.ncl
It will take ~20 minutes. Once it is complete, go to the output directory and open a Firefox window:
cd /glade/scratch/cstan/CVDP
firefox index.html &
Key Points
Postprocessing
Overview
Teaching: 0 min
Exercises: 0 min
Questions
What are some common run time configuration changes?
How do I make these changes?
Objectives
Postprocessing is the process of going from history files to timeseries files and of converting 3D atmospheric data from the model coordinate system to selected pressure levels. We will learn how to use the CESM Postprocessing Tools.
The post processing scripts are located in your ~/scripts/ directory.
You can find them using ls *create*.
Open the script in a text editor (e.g. gedit, vi, emacs).
Change the lines of the script relevant for your run, for example, in atm.create_timeseries.ncl:
run_name = "run.2"
styr = 1
enyr = 4
work_dir = "/glade/scratch/cstan/"
archive_dir = "/glade/scratch/cstan/archive/"+run_name+"/atm/hist"
Run the postprocessing script
ncl atm.create_timeseries.ncl
The timeseries files are located in:
/glade/scratch/cstan/processed/<case name>/
You can take a quick look at them in ncview.
To do Assignment #3, you can read them in using xarray.
Run the post-processing for whichever component is of interest to you.
Key Points