Loading data directly from ESGF nodes
This notebook demonstrates searching for and loading CMIP simulations from ESGF, using the intake_esgf package.
This code is taken from the intake-esgf tutorial. Please refer there for a fuller explanation of each step.
The data search uses the standard CMIP directory structure; see here for an explanation. We make use of xarray; see this tutorial for an introduction.
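As a rough illustration of how the CMIP directory structure maps onto search facets, the sketch below splits a CMIP6-style path into named facets. The helper `parse_drs_path` and the example path are hypothetical and only for illustration; a real path also carries a version directory and a file name after `grid_label`, which this sketch ignores.

```python
# Facet order of the CMIP6 directory structure, using the facet names that
# intake-esgf reports in its search summaries.
DRS_FACETS = (
    "mip_era",
    "activity_drs",
    "institution_id",
    "source_id",
    "experiment_id",
    "member_id",
    "table_id",
    "variable_id",
    "grid_label",
)


def parse_drs_path(path: str) -> dict[str, str]:
    """Split a CMIP6-style directory path into a facet dictionary (hypothetical helper)."""
    parts = path.strip("/").split("/")
    if len(parts) < len(DRS_FACETS):
        raise ValueError(f"expected at least {len(DRS_FACETS)} path components, got {len(parts)}")
    # zip stops at the shortest input, so trailing components (version, file name) are dropped
    return dict(zip(DRS_FACETS, parts))


facets = parse_drs_path("CMIP6/CMIP/NCAR/CESM2/historical/r1i1p1f1/Amon/tas/gn")
print(facets["source_id"], facets["table_id"], facets["variable_id"])
```

The same facet names are the keyword arguments accepted by the searches later in this notebook.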
# on starting the server, we need to run the line below once, as intake_esgf is not yet in our standard pangeo environment
%pip install intake_esgf
Collecting intake_esgf
Downloading intake_esgf-2024.12.7-py3-none-any.whl.metadata (4.0 kB)
Requirement already satisfied: pandas in /srv/conda/envs/notebook/lib/python3.12/site-packages (from intake_esgf) (2.2.3)
Requirement already satisfied: dask in /srv/conda/envs/notebook/lib/python3.12/site-packages (from intake_esgf) (2025.1.0)
Requirement already satisfied: xarray in /srv/conda/envs/notebook/lib/python3.12/site-packages (from intake_esgf) (2025.1.1)
Requirement already satisfied: netCDF4 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from intake_esgf) (1.7.2)
Collecting globus-sdk (from intake_esgf)
Downloading globus_sdk-3.55.0-py3-none-any.whl.metadata (3.3 kB)
Requirement already satisfied: requests in /srv/conda/envs/notebook/lib/python3.12/site-packages (from intake_esgf) (2.32.3)
Requirement already satisfied: tqdm[notebook] in /srv/conda/envs/notebook/lib/python3.12/site-packages (from intake_esgf) (4.67.1)
Requirement already satisfied: pyyaml in /srv/conda/envs/notebook/lib/python3.12/site-packages (from intake_esgf) (6.0.2)
Requirement already satisfied: click>=8.1 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from dask->intake_esgf) (8.1.8)
Requirement already satisfied: cloudpickle>=3.0.0 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from dask->intake_esgf) (3.1.1)
Requirement already satisfied: fsspec>=2021.09.0 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from dask->intake_esgf) (2024.12.0)
Requirement already satisfied: packaging>=20.0 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from dask->intake_esgf) (24.2)
Requirement already satisfied: partd>=1.4.0 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from dask->intake_esgf) (1.4.2)
Requirement already satisfied: toolz>=0.10.0 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from dask->intake_esgf) (1.0.0)
Requirement already satisfied: pyjwt<3.0.0,>=2.0.0 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from pyjwt[crypto]<3.0.0,>=2.0.0->globus-sdk->intake_esgf) (2.10.1)
Requirement already satisfied: cryptography!=3.4.0,>=3.3.1 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from globus-sdk->intake_esgf) (43.0.1)
Requirement already satisfied: charset_normalizer<4,>=2 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from requests->intake_esgf) (3.4.1)
Requirement already satisfied: idna<4,>=2.5 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from requests->intake_esgf) (3.10)
Requirement already satisfied: urllib3<3,>=1.21.1 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from requests->intake_esgf) (1.26.19)
Requirement already satisfied: certifi>=2017.4.17 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from requests->intake_esgf) (2024.12.14)
Requirement already satisfied: cftime in /srv/conda/envs/notebook/lib/python3.12/site-packages (from netCDF4->intake_esgf) (1.6.4)
Requirement already satisfied: numpy in /srv/conda/envs/notebook/lib/python3.12/site-packages (from netCDF4->intake_esgf) (2.0.2)
Requirement already satisfied: python-dateutil>=2.8.2 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from pandas->intake_esgf) (2.9.0)
Requirement already satisfied: pytz>=2020.1 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from pandas->intake_esgf) (2024.1)
Requirement already satisfied: tzdata>=2022.7 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from pandas->intake_esgf) (2025.1)
Requirement already satisfied: ipywidgets>=6 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from tqdm[notebook]->intake_esgf) (8.1.5)
Requirement already satisfied: cffi>=1.12 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from cryptography!=3.4.0,>=3.3.1->globus-sdk->intake_esgf) (1.17.1)
Requirement already satisfied: comm>=0.1.3 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from ipywidgets>=6->tqdm[notebook]->intake_esgf) (0.2.2)
Requirement already satisfied: ipython>=6.1.0 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from ipywidgets>=6->tqdm[notebook]->intake_esgf) (8.17.2)
Requirement already satisfied: traitlets>=4.3.1 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from ipywidgets>=6->tqdm[notebook]->intake_esgf) (5.14.3)
Requirement already satisfied: widgetsnbextension~=4.0.12 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from ipywidgets>=6->tqdm[notebook]->intake_esgf) (4.0.13)
Requirement already satisfied: jupyterlab_widgets~=3.0.12 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from ipywidgets>=6->tqdm[notebook]->intake_esgf) (3.0.13)
Requirement already satisfied: locket in /srv/conda/envs/notebook/lib/python3.12/site-packages (from partd>=1.4.0->dask->intake_esgf) (1.0.0)
Requirement already satisfied: six>=1.5 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from python-dateutil>=2.8.2->pandas->intake_esgf) (1.17.0)
Requirement already satisfied: pycparser in /srv/conda/envs/notebook/lib/python3.12/site-packages (from cffi>=1.12->cryptography!=3.4.0,>=3.3.1->globus-sdk->intake_esgf) (2.22)
Requirement already satisfied: decorator in /srv/conda/envs/notebook/lib/python3.12/site-packages (from ipython>=6.1.0->ipywidgets>=6->tqdm[notebook]->intake_esgf) (5.1.1)
Requirement already satisfied: jedi>=0.16 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from ipython>=6.1.0->ipywidgets>=6->tqdm[notebook]->intake_esgf) (0.19.2)
Requirement already satisfied: matplotlib-inline in /srv/conda/envs/notebook/lib/python3.12/site-packages (from ipython>=6.1.0->ipywidgets>=6->tqdm[notebook]->intake_esgf) (0.1.7)
Requirement already satisfied: prompt-toolkit!=3.0.37,<3.1.0,>=3.0.30 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from ipython>=6.1.0->ipywidgets>=6->tqdm[notebook]->intake_esgf) (3.0.38)
Requirement already satisfied: pygments>=2.4.0 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from ipython>=6.1.0->ipywidgets>=6->tqdm[notebook]->intake_esgf) (2.19.1)
Requirement already satisfied: stack-data in /srv/conda/envs/notebook/lib/python3.12/site-packages (from ipython>=6.1.0->ipywidgets>=6->tqdm[notebook]->intake_esgf) (0.6.3)
Requirement already satisfied: pexpect>4.3 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from ipython>=6.1.0->ipywidgets>=6->tqdm[notebook]->intake_esgf) (4.9.0)
Requirement already satisfied: parso<0.9.0,>=0.8.4 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from jedi>=0.16->ipython>=6.1.0->ipywidgets>=6->tqdm[notebook]->intake_esgf) (0.8.4)
Requirement already satisfied: ptyprocess>=0.5 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from pexpect>4.3->ipython>=6.1.0->ipywidgets>=6->tqdm[notebook]->intake_esgf) (0.7.0)
Requirement already satisfied: wcwidth in /srv/conda/envs/notebook/lib/python3.12/site-packages (from prompt-toolkit!=3.0.37,<3.1.0,>=3.0.30->ipython>=6.1.0->ipywidgets>=6->tqdm[notebook]->intake_esgf) (0.2.13)
Requirement already satisfied: executing>=1.2.0 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from stack-data->ipython>=6.1.0->ipywidgets>=6->tqdm[notebook]->intake_esgf) (2.1.0)
Requirement already satisfied: asttokens>=2.1.0 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from stack-data->ipython>=6.1.0->ipywidgets>=6->tqdm[notebook]->intake_esgf) (3.0.0)
Requirement already satisfied: pure_eval in /srv/conda/envs/notebook/lib/python3.12/site-packages (from stack-data->ipython>=6.1.0->ipywidgets>=6->tqdm[notebook]->intake_esgf) (0.2.3)
Downloading intake_esgf-2024.12.7-py3-none-any.whl (36 kB)
Downloading globus_sdk-3.55.0-py3-none-any.whl (408 kB)
Installing collected packages: globus-sdk, intake_esgf
Successfully installed globus-sdk-3.55.0 intake_esgf-2024.12.7
Note: you may need to restart the kernel to use updated packages.
import intake_esgf
from intake_esgf import ESGFCatalog
cat = ESGFCatalog()
cat.variable_info("temperature air surface")
variable_id | cf_standard_name | variable_units | variable_long_name |
---|---|---|---|
hfls | surface_upward_latent_heat_flux | W m-2 | Surface Upward Latent Heat Flux |
hfss | surface_upward_sensible_heat_flux | W m-2 | Surface Upward Sensible Heat Flux |
rlds | surface_downwelling_longwave_flux_in_air | W m-2 | Surface Downwelling Longwave Radiation |
rsds | surface_downwelling_shortwave_flux_in_air | W m-2 | Surface Downwelling Shortwave Radiation |
sfcWind | wind_speed | m s-1 | Near-Surface Wind Speed |
ta | air_temperature | K | Air Temperature |
tas | air_temperature | K | Near-Surface Air Temperature |
tasmax | air_temperature | K | Daily Maximum Near-Surface Air Temperature |
tasmin | air_temperature | K | Daily Minimum Near-Surface Air Temperature |
vas | northward_wind | m s-1 | Northward Near-Surface Wind |
cat.search(variable_id="tas", experiment_id="historical")
Summary information for 1687 results:
mip_era [CMIP6]
activity_drs [CMIP]
institution_id [IPSL, NASA-GISS, NCAR, MPI-M, MRI, CNRM-CERFA...
source_id [IPSL-CM6A-LR, GISS-E2-1-H, CESM2, GISS-E2-1-G...
experiment_id [historical]
member_id [r9i1p1f1, r7i1p1f1, r4i1p1f1, r27i1p1f1, r5i1...
table_id [ImonGre, ImonAnt, Amon, day, 3hr, 6hrPlevPt, ...
variable_id [tas]
grid_label [grg, gra, gr, gn, gr1, gr2]
dtype: object
cat.search(
    project='CMIP6',
    experiment_id='historical',
    source_id='CESM2',
    variable_id='tas',         # surface air temperature
    table_id='Amon',           # monthly atmospheric data
    variant_label='r1i1p1f1'   # ensemble member
)
Summary information for 1 results:
mip_era [CMIP6]
activity_drs [CMIP]
institution_id [NCAR]
source_id [CESM2]
experiment_id [historical]
member_id [r1i1p1f1]
table_id [Amon]
variable_id [tas]
grid_label [gn]
dtype: object
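The ensemble member label used above follows the CMIP6 `r<k>i<l>p<m>f<n>` pattern (realization, initialization, physics and forcing indices). A minimal sketch of decoding it is shown below; `parse_variant_label` is a hypothetical helper, not part of intake-esgf, and it does not handle member IDs that carry a sub-experiment prefix.

```python
import re

# r = realization, i = initialization, p = physics, f = forcing
_VARIANT_PATTERN = re.compile(r"^r(\d+)i(\d+)p(\d+)f(\d+)$")


def parse_variant_label(label: str) -> dict[str, int]:
    """Decode a CMIP6 variant label such as 'r1i1p1f1' (hypothetical helper)."""
    match = _VARIANT_PATTERN.match(label)
    if match is None:
        raise ValueError(f"not a CMIP6 variant label: {label!r}")
    keys = ("realization", "initialization", "physics", "forcing")
    return dict(zip(keys, map(int, match.groups())))


print(parse_variant_label("r1i1p1f1"))
```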
dsd = cat.to_dataset_dict() # dsd is a dictionary of xarray datasets
Downloading 243.0 [Mb]...
ds = dsd['tas'] # Dataset: indexing the dictionary by its key gives the xarray Dataset holding the tas data
da = dsd['tas']['tas'] # DataArray: selecting the variable tas from that Dataset gives the xarray DataArray
print('DataSet dictionary: ', dsd)
print('DataSet: ', ds)
print('DataArray: ', da)
DataSet dictionary: {'tas': <xarray.Dataset> Size: 438MB
Dimensions: (time: 1980, lat: 192, lon: 288, nbnd: 2)
Coordinates:
* lat (lat) float64 2kB -90.0 -89.06 -88.12 -87.17 ... 88.12 89.06 90.0
* lon (lon) float64 2kB 0.0 1.25 2.5 3.75 ... 355.0 356.2 357.5 358.8
* time (time) object 16kB 1850-01-15 12:00:00 ... 2014-12-15 12:00:00
Dimensions without coordinates: nbnd
Data variables:
tas (time, lat, lon) float32 438MB 245.3 245.3 245.3 ... 245.0 245.0
time_bnds (time, nbnd) object 32kB ...
lat_bnds (lat, nbnd) float32 2kB ...
lon_bnds (lon, nbnd) float32 2kB ...
areacella (lat, lon) float32 221kB ...
Attributes: (12/47)
Conventions: CF-1.7 CMIP-6.2
activity_id: CMIP
case_id: 15
cesm_casename: b.e21.BHIST.f09_g17.CMIP6-historical.001
contact: cesm_cmip6@ucar.edu
creation_date: 2019-01-16T23:34:05Z
... ...
branch_time_in_parent: 219000.0
branch_time_in_child: 674885.0
branch_method: standard
further_info_url: https://furtherinfo.es-doc.org/CMIP6.NCAR.CESM2.h...
activity_drs: CMIP
member_id: r1i1p1f1}
DataSet: <xarray.Dataset> Size: 438MB
Dimensions: (time: 1980, lat: 192, lon: 288, nbnd: 2)
Coordinates:
* lat (lat) float64 2kB -90.0 -89.06 -88.12 -87.17 ... 88.12 89.06 90.0
* lon (lon) float64 2kB 0.0 1.25 2.5 3.75 ... 355.0 356.2 357.5 358.8
* time (time) object 16kB 1850-01-15 12:00:00 ... 2014-12-15 12:00:00
Dimensions without coordinates: nbnd
Data variables:
tas (time, lat, lon) float32 438MB 245.3 245.3 245.3 ... 245.0 245.0
time_bnds (time, nbnd) object 32kB ...
lat_bnds (lat, nbnd) float32 2kB ...
lon_bnds (lon, nbnd) float32 2kB ...
areacella (lat, lon) float32 221kB ...
Attributes: (12/47)
Conventions: CF-1.7 CMIP-6.2
activity_id: CMIP
case_id: 15
cesm_casename: b.e21.BHIST.f09_g17.CMIP6-historical.001
contact: cesm_cmip6@ucar.edu
creation_date: 2019-01-16T23:34:05Z
... ...
branch_time_in_parent: 219000.0
branch_time_in_child: 674885.0
branch_method: standard
further_info_url: https://furtherinfo.es-doc.org/CMIP6.NCAR.CESM2.h...
activity_drs: CMIP
member_id: r1i1p1f1
DataArray: <xarray.DataArray 'tas' (time: 1980, lat: 192, lon: 288)> Size: 438MB
array([[[245.32208, 245.32208, ..., 245.32208, 245.32208],
[246.10596, 246.06238, ..., 246.15019, 246.12573],
...,
[245.02821, 245.0406 , ..., 244.99951, 245.01454],
[244.50035, 244.50319, ..., 244.49379, 244.49722]],
[[232.51073, 232.51073, ..., 232.51073, 232.51073],
[233.30011, 233.26118, ..., 233.32066, 233.31026],
...,
[244.68976, 244.70775, ..., 244.64677, 244.6693 ],
[243.6899 , 243.6928 , ..., 243.68317, 243.68669]],
...,
[[234.63194, 234.63194, ..., 234.63194, 234.63194],
[235.37543, 235.35039, ..., 235.38136, 235.37898],
...,
[256.5771 , 256.58975, ..., 256.5506 , 256.56418],
[256.69495, 256.69467, ..., 256.69556, 256.69522]],
[[246.79817, 246.79817, ..., 246.79817, 246.79817],
[247.46426, 247.42882, ..., 247.48152, 247.47386],
...,
[244.81926, 244.83385, ..., 244.78955, 244.80447],
[245.01997, 245.01904, ..., 245.02213, 245.021 ]]], dtype=float32)
Coordinates:
* lat (lat) float64 2kB -90.0 -89.06 -88.12 -87.17 ... 88.12 89.06 90.0
* lon (lon) float64 2kB 0.0 1.25 2.5 3.75 5.0 ... 355.0 356.2 357.5 358.8
* time (time) object 16kB 1850-01-15 12:00:00 ... 2014-12-15 12:00:00
Attributes: (12/19)
cell_measures: area: areacella
cell_methods: area: time: mean
comment: near-surface (usually, 2 meter) air temperature
description: near-surface (usually, 2 meter) air temperature
frequency: mon
id: tas
... ...
time_label: time-mean
time_title: Temporal mean
title: Near-Surface Air Temperature
type: real
units: K
variable_id: tas
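As a quick sanity check on the output above, the reported array size and the length of the time axis can be reproduced by hand. The sketch below uses only the dimensions and dtype printed in the repr, and assumes the reported size is in decimal megabytes.

```python
from math import prod

# Dimensions and dtype as printed in the DataArray repr above
dims = {"time": 1980, "lat": 192, "lon": 288}
bytes_per_value = 4  # float32

n_values = prod(dims.values())
n_bytes = n_values * bytes_per_value

# Monthly data from January 1850 to December 2014 inclusive
n_months = (2014 - 1850 + 1) * 12

print(f"{n_values:,} values, about {n_bytes / 1e6:.0f} MB in memory, {n_months} monthly time steps")
```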
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(6, 4), tight_layout=True)
# time-mean near-surface air temperature, converted from K to °C
tas_mean = dsd["tas"]["tas"].mean(dim="time") - 273.15
tas_mean.plot(ax=ax, cmap="bwr", vmin=-40, vmax=40,
              extend="both", cbar_kwargs={"label": "tas [°C]"})
<matplotlib.collections.QuadMesh at 0x7fc8d2b19a90>

Widening search criteria if data aren’t found
By default, intake-esgf does not search all possible sources of data (for performance reasons). As a result, a search may come back empty-handed even though the simulation data are available somewhere.
For example, the code below raises a “NoSearchResults” error even though this simulation is available on the CEDA ESGF node:
cat.search(
    project='CMIP6',
    experiment_id='G6sulfur',
    source_id='UKESM1-0-LL',
    variable_id='tas',           # surface air temperature
    table_id='Amon',             # monthly atmospheric data
    variant_label=['r1i1p1f2']   # ensemble member
)
---------------------------------------------------------------------------
NoSearchResults Traceback (most recent call last)
Cell In[3], line 1
----> 1 cat.search(
2 project='CMIP6',
3 experiment_id='G6sulfur',
4 source_id='UKESM1-0-LL',
5 variable_id='tas', # surface air temperature
6 table_id='Amon', # monthly atmospheric data
7 variant_label=['r1i1p1f2'] # ensemble member
8 )
File /srv/conda/envs/notebook/lib/python3.12/site-packages/intake_esgf/catalog.py:336, in ESGFCatalog.search(self, quiet, **search)
334 search_time = time.time()
335 dfs = ThreadPool(len(self.indices)).imap_unordered(_search, self.indices)
--> 336 self.df = base.combine_results(
337 tqdm(
338 dfs,
339 disable=quiet,
340 bar_format=base.bar_format,
341 unit="index",
342 unit_scale=False,
343 desc="Searching indices",
344 ascii=False,
345 total=len(self.indices),
346 )
347 )
348 self._set_project()
350 # even though we are using latest=True, because the search is distributed, we
351 # may have different versions from different indices.
File /srv/conda/envs/notebook/lib/python3.12/site-packages/intake_esgf/base.py:213, in combine_results(dfs)
211 if len(df) == 0:
212 logger.info("\x1b[36;32msearch end \x1b[91;20mno results\033[0m")
--> 213 raise NoSearchResults()
214 # retrieve project information about how to combine results
215 project_id = df["project"].unique()
NoSearchResults:
The solution is to run the following two lines, which turn on all ESGF nodes for the search:
## run the following line to widen search criteria to include all ESGF nodes
intake_esgf.conf.set(all_indices=True)
cat = ESGFCatalog()
cat.search(
    project='CMIP6',
    experiment_id='G6sulfur',
    source_id='UKESM1-0-LL',
    variable_id='tas',           # surface air temperature
    table_id='Amon',             # monthly atmospheric data
    variant_label=['r1i1p1f2']   # ensemble member
)
Summary information for 1 results:
mip_era [CMIP6]
activity_drs [GeoMIP]
institution_id [MOHC]
source_id [UKESM1-0-LL]
experiment_id [G6sulfur]
member_id [r1i1p1f2]
table_id [Amon]
variable_id [tas]
grid_label [gn]
dtype: object
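The two-stage approach in this section (try the default indices first, widen only if nothing is found) could be wrapped in a small helper. The sketch below is one possible pattern rather than anything provided by intake-esgf: `search_with_fallback` is a hypothetical name, and the search, the widening step and the "no results" exception are passed in as arguments so the helper itself makes no assumptions about the library.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def search_with_fallback(
    search: Callable[[], T],
    widen: Callable[[], None],
    empty_error: type[BaseException],
) -> T:
    """Run `search`; if it raises `empty_error`, call `widen` and try once more."""
    try:
        return search()
    except empty_error:
        widen()
        return search()


# Possible wiring with intake_esgf (not executed here). The import path of
# NoSearchResults is an assumption; check it against your installed version.
#
#   import intake_esgf
#   from intake_esgf import ESGFCatalog
#   from intake_esgf.exceptions import NoSearchResults
#
#   facets = dict(project="CMIP6", experiment_id="G6sulfur", source_id="UKESM1-0-LL",
#                 variable_id="tas", table_id="Amon", variant_label=["r1i1p1f2"])
#   cat = search_with_fallback(
#       search=lambda: ESGFCatalog().search(**facets),
#       widen=lambda: intake_esgf.conf.set(all_indices=True),
#       empty_error=NoSearchResults,
#   )
```

Creating a fresh `ESGFCatalog()` inside the `search` callable mirrors the cells above, where the catalog is re-created after the configuration change.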