Loading data directly from ESGF nodes
This notebook demonstrates searching for and loading CMIP simulations from ESGF, using the intake_esgf package.
This code is taken from the intake-esgf tutorial. Please refer there for a fuller explanation of each step.
The data search uses the standard CMIP directory structure; see here for an explanation. We make use of xarray; see this tutorial for an introduction.
Key steps:
Install and configure intake_esgf for ESGF data access
Search for specific climate variables and experiments
Load data into xarray datasets for analysis
Handle cases where data might be on different ESGF nodes
Install intake_esgf
The intake_esgf package provides a Python interface to search and load data from ESGF nodes. We install it here since it’s not part of the standard Pangeo environment.
%pip install intake_esgf
Collecting intake_esgf
Downloading intake_esgf-2025.9.26-py3-none-any.whl.metadata (6.2 kB)
Requirement already satisfied: dask>=2024.12.0 in /Users/susannebaur/miniconda3/envs/CloudHub/lib/python3.13/site-packages (from intake_esgf) (2025.9.1)
Collecting globus-sdk>=3.49.0 (from intake_esgf)
Downloading globus_sdk-3.64.0-py3-none-any.whl.metadata (2.1 kB)
Collecting netcdf4>=1.7.2 (from intake_esgf)
Downloading netCDF4-1.7.2-cp313-cp313-macosx_14_0_arm64.whl.metadata (1.8 kB)
Requirement already satisfied: pandas>=2.2.3 in /Users/susannebaur/miniconda3/envs/CloudHub/lib/python3.13/site-packages (from intake_esgf) (2.3.2)
Collecting pystac-client>=0.8.6 (from intake_esgf)
Downloading pystac_client-0.9.0-py3-none-any.whl.metadata (3.1 kB)
Requirement already satisfied: pyyaml>=6.0.2 in /Users/susannebaur/miniconda3/envs/CloudHub/lib/python3.13/site-packages (from intake_esgf) (6.0.2)
Requirement already satisfied: requests>=2.32.3 in /Users/susannebaur/miniconda3/envs/CloudHub/lib/python3.13/site-packages (from intake_esgf) (2.32.4)
Collecting requests-cache>=1.2.1 (from intake_esgf)
Downloading requests_cache-1.2.1-py3-none-any.whl.metadata (9.9 kB)
Requirement already satisfied: tqdm>=4.67.1 in /Users/susannebaur/miniconda3/envs/CloudHub/lib/python3.13/site-packages (from intake_esgf) (4.67.1)
Requirement already satisfied: xarray>=2024.11.0 in /Users/susannebaur/miniconda3/envs/CloudHub/lib/python3.13/site-packages (from intake_esgf) (2025.4.0)
Requirement already satisfied: click>=8.1 in /Users/susannebaur/miniconda3/envs/CloudHub/lib/python3.13/site-packages (from dask>=2024.12.0->intake_esgf) (8.2.1)
Requirement already satisfied: cloudpickle>=3.0.0 in /Users/susannebaur/miniconda3/envs/CloudHub/lib/python3.13/site-packages (from dask>=2024.12.0->intake_esgf) (3.1.1)
Requirement already satisfied: fsspec>=2021.09.0 in /Users/susannebaur/miniconda3/envs/CloudHub/lib/python3.13/site-packages (from dask>=2024.12.0->intake_esgf) (2025.7.0)
Requirement already satisfied: packaging>=20.0 in /Users/susannebaur/miniconda3/envs/CloudHub/lib/python3.13/site-packages (from dask>=2024.12.0->intake_esgf) (25.0)
Requirement already satisfied: partd>=1.4.0 in /Users/susannebaur/miniconda3/envs/CloudHub/lib/python3.13/site-packages (from dask>=2024.12.0->intake_esgf) (1.4.2)
Requirement already satisfied: toolz>=0.10.0 in /Users/susannebaur/miniconda3/envs/CloudHub/lib/python3.13/site-packages (from dask>=2024.12.0->intake_esgf) (1.0.0)
Collecting pyjwt<3.0.0,>=2.0.0 (from pyjwt[crypto]<3.0.0,>=2.0.0->globus-sdk>=3.49.0->intake_esgf)
Using cached PyJWT-2.10.1-py3-none-any.whl.metadata (4.0 kB)
Collecting cryptography!=3.4.0,>=3.3.1 (from globus-sdk>=3.49.0->intake_esgf)
Downloading cryptography-46.0.1-cp311-abi3-macosx_10_9_universal2.whl.metadata (5.7 kB)
Requirement already satisfied: charset_normalizer<4,>=2 in /Users/susannebaur/miniconda3/envs/CloudHub/lib/python3.13/site-packages (from requests>=2.32.3->intake_esgf) (3.4.3)
Requirement already satisfied: idna<4,>=2.5 in /Users/susannebaur/miniconda3/envs/CloudHub/lib/python3.13/site-packages (from requests>=2.32.3->intake_esgf) (3.10)
Requirement already satisfied: urllib3<3,>=1.21.1 in /Users/susannebaur/miniconda3/envs/CloudHub/lib/python3.13/site-packages (from requests>=2.32.3->intake_esgf) (2.5.0)
Requirement already satisfied: certifi>=2017.4.17 in /Users/susannebaur/miniconda3/envs/CloudHub/lib/python3.13/site-packages (from requests>=2.32.3->intake_esgf) (2025.8.3)
Collecting cffi>=2.0.0 (from cryptography!=3.4.0,>=3.3.1->globus-sdk>=3.49.0->intake_esgf)
Downloading cffi-2.0.0-cp313-cp313-macosx_11_0_arm64.whl.metadata (2.6 kB)
Requirement already satisfied: pycparser in /Users/susannebaur/miniconda3/envs/CloudHub/lib/python3.13/site-packages (from cffi>=2.0.0->cryptography!=3.4.0,>=3.3.1->globus-sdk>=3.49.0->intake_esgf) (2.23)
Requirement already satisfied: cftime in /Users/susannebaur/miniconda3/envs/CloudHub/lib/python3.13/site-packages (from netcdf4>=1.7.2->intake_esgf) (1.6.4.post1)
Requirement already satisfied: numpy in /Users/susannebaur/miniconda3/envs/CloudHub/lib/python3.13/site-packages (from netcdf4>=1.7.2->intake_esgf) (2.3.2)
Requirement already satisfied: python-dateutil>=2.8.2 in /Users/susannebaur/miniconda3/envs/CloudHub/lib/python3.13/site-packages (from pandas>=2.2.3->intake_esgf) (2.9.0.post0)
Requirement already satisfied: pytz>=2020.1 in /Users/susannebaur/miniconda3/envs/CloudHub/lib/python3.13/site-packages (from pandas>=2.2.3->intake_esgf) (2025.2)
Requirement already satisfied: tzdata>=2022.7 in /Users/susannebaur/miniconda3/envs/CloudHub/lib/python3.13/site-packages (from pandas>=2.2.3->intake_esgf) (2025.2)
Requirement already satisfied: locket in /Users/susannebaur/miniconda3/envs/CloudHub/lib/python3.13/site-packages (from partd>=1.4.0->dask>=2024.12.0->intake_esgf) (1.0.0)
Collecting pystac>=1.10.0 (from pystac[validation]>=1.10.0->pystac-client>=0.8.6->intake_esgf)
Downloading pystac-1.14.1-py3-none-any.whl.metadata (4.7 kB)
Requirement already satisfied: jsonschema~=4.18 in /Users/susannebaur/miniconda3/envs/CloudHub/lib/python3.13/site-packages (from pystac[validation]>=1.10.0->pystac-client>=0.8.6->intake_esgf) (4.25.0)
Requirement already satisfied: attrs>=22.2.0 in /Users/susannebaur/miniconda3/envs/CloudHub/lib/python3.13/site-packages (from jsonschema~=4.18->pystac[validation]>=1.10.0->pystac-client>=0.8.6->intake_esgf) (25.3.0)
Requirement already satisfied: jsonschema-specifications>=2023.03.6 in /Users/susannebaur/miniconda3/envs/CloudHub/lib/python3.13/site-packages (from jsonschema~=4.18->pystac[validation]>=1.10.0->pystac-client>=0.8.6->intake_esgf) (2025.4.1)
Requirement already satisfied: referencing>=0.28.4 in /Users/susannebaur/miniconda3/envs/CloudHub/lib/python3.13/site-packages (from jsonschema~=4.18->pystac[validation]>=1.10.0->pystac-client>=0.8.6->intake_esgf) (0.36.2)
Requirement already satisfied: rpds-py>=0.7.1 in /Users/susannebaur/miniconda3/envs/CloudHub/lib/python3.13/site-packages (from jsonschema~=4.18->pystac[validation]>=1.10.0->pystac-client>=0.8.6->intake_esgf) (0.27.0)
Requirement already satisfied: six>=1.5 in /Users/susannebaur/miniconda3/envs/CloudHub/lib/python3.13/site-packages (from python-dateutil>=2.8.2->pandas>=2.2.3->intake_esgf) (1.17.0)
Collecting cattrs>=22.2 (from requests-cache>=1.2.1->intake_esgf)
Downloading cattrs-25.2.0-py3-none-any.whl.metadata (8.4 kB)
Requirement already satisfied: platformdirs>=2.5 in /Users/susannebaur/miniconda3/envs/CloudHub/lib/python3.13/site-packages (from requests-cache>=1.2.1->intake_esgf) (4.3.8)
Collecting url-normalize>=1.4 (from requests-cache>=1.2.1->intake_esgf)
Downloading url_normalize-2.2.1-py3-none-any.whl.metadata (5.6 kB)
Requirement already satisfied: typing-extensions>=4.12.2 in /Users/susannebaur/miniconda3/envs/CloudHub/lib/python3.13/site-packages (from cattrs>=22.2->requests-cache>=1.2.1->intake_esgf) (4.14.1)
Downloading intake_esgf-2025.9.26-py3-none-any.whl (61 kB)
Downloading globus_sdk-3.64.0-py3-none-any.whl (416 kB)
Using cached PyJWT-2.10.1-py3-none-any.whl (22 kB)
Downloading cryptography-46.0.1-cp311-abi3-macosx_10_9_universal2.whl (7.3 MB)
Downloading cffi-2.0.0-cp313-cp313-macosx_11_0_arm64.whl (181 kB)
Downloading netCDF4-1.7.2-cp313-cp313-macosx_14_0_arm64.whl (2.5 MB)
Downloading pystac_client-0.9.0-py3-none-any.whl (41 kB)
Downloading pystac-1.14.1-py3-none-any.whl (207 kB)
Downloading requests_cache-1.2.1-py3-none-any.whl (61 kB)
Downloading cattrs-25.2.0-py3-none-any.whl (70 kB)
Downloading url_normalize-2.2.1-py3-none-any.whl (14 kB)
Installing collected packages: url-normalize, pyjwt, cffi, cattrs, requests-cache, pystac, netcdf4, cryptography, globus-sdk, pystac-client, intake_esgf
Attempting uninstall: cffi
Found existing installation: cffi 1.17.1
Uninstalling cffi-1.17.1:
Successfully uninstalled cffi-1.17.1
Successfully installed cattrs-25.2.0 cffi-2.0.0 cryptography-46.0.1 globus-sdk-3.64.0 intake_esgf-2025.9.26 netcdf4-1.7.2 pyjwt-2.10.1 pystac-1.14.1 pystac-client-0.9.0 requests-cache-1.2.1 url-normalize-2.2.1
Note: you may need to restart the kernel to use updated packages.
Initialize the ESGF catalog and explore available variables
We create an ESGFCatalog object that provides methods to search for and load climate model data from ESGF nodes. This catalog acts as our interface to the distributed climate data archive.
Before searching for specific data, we can explore what variables are available. The variable_info() method helps us understand the naming conventions and find the right variable IDs for our search.
import intake_esgf
cat = intake_esgf.ESGFCatalog()
cat.variable_info("temperature air surface")
| variable_id | cf_standard_name | variable_units | variable_long_name |
|---|---|---|---|
| hfls | surface_upward_latent_heat_flux | W m-2 | Surface Upward Latent Heat Flux |
| hfss | surface_upward_sensible_heat_flux | W m-2 | Surface Upward Sensible Heat Flux |
| rlds | surface_downwelling_longwave_flux_in_air | W m-2 | Surface Downwelling Longwave Radiation |
| rsds | surface_downwelling_shortwave_flux_in_air | W m-2 | Surface Downwelling Shortwave Radiation |
| sfcWind | wind_speed | m s-1 | Near-Surface Wind Speed |
| ta | air_temperature | K | Air Temperature |
| tas | air_temperature | K | Near-Surface Air Temperature |
| tasmax | air_temperature | K | Daily Maximum Near-Surface Air Temperature |
| tasmin | air_temperature | K | Daily Minimum Near-Surface Air Temperature |
| vas | northward_wind | m s-1 | Northward Near-Surface Wind |
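The table above mixes temperature variables with fluxes and winds. As a minimal sketch of narrowing it down, the snippet below filters a few rows copied from that output by their `cf_standard_name`. The rows are held in plain dictionaries so the example runs offline, and the helper name `filter_by_standard_name` is hypothetical, not part of intake_esgf.

```python
# A subset of rows copied from the variable_info() output above.
rows = [
    {"variable_id": "hfls", "cf_standard_name": "surface_upward_latent_heat_flux"},
    {"variable_id": "sfcWind", "cf_standard_name": "wind_speed"},
    {"variable_id": "ta", "cf_standard_name": "air_temperature"},
    {"variable_id": "tas", "cf_standard_name": "air_temperature"},
    {"variable_id": "tasmax", "cf_standard_name": "air_temperature"},
    {"variable_id": "tasmin", "cf_standard_name": "air_temperature"},
]


def filter_by_standard_name(rows, standard_name):
    """Return the variable IDs whose CF standard name matches exactly."""
    return [
        row["variable_id"]
        for row in rows
        if row["cf_standard_name"] == standard_name
    ]


temperature_ids = filter_by_standard_name(rows, "air_temperature")
print(temperature_ids)
```

Of the matching IDs, tas (near-surface air temperature) is the one used in the searches that follow.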
cat.search(variable_id="tas", experiment_id="historical")
Summary information for 1687 results:
mip_era [CMIP6]
activity_drs [CMIP]
institution_id [IPSL, NASA-GISS, NCAR, MPI-M, MRI, CNRM-CERFA...
source_id [IPSL-CM6A-LR, GISS-E2-1-H, CESM2, GISS-E2-1-G...
experiment_id [historical]
member_id [r9i1p1f1, r7i1p1f1, r4i1p1f1, r27i1p1f1, r5i1...
table_id [ImonGre, ImonAnt, Amon, day, 3hr, 6hrPlevPt, ...
variable_id [tas]
grid_label [grg, gra, gr, gn, gr1, gr2]
dtype: object
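The member_id values in the summary above, such as r9i1p1f1, are CMIP6 variant labels: four indices for realization (r), initialization (i), physics (p) and forcing (f). The sketch below splits such a label into its parts. The function name `parse_variant_label` is hypothetical, and the sketch only handles plain labels of the form r<k>i<l>p<m>f<n>, not member IDs that carry a sub-experiment prefix.

```python
import re

# CMIP6 variant labels have the form r<k>i<l>p<m>f<n>.
VARIANT_PATTERN = re.compile(r"^r(\d+)i(\d+)p(\d+)f(\d+)$")


def parse_variant_label(label):
    """Split a CMIP6 variant label into its four integer indices."""
    match = VARIANT_PATTERN.match(label)
    if match is None:
        raise ValueError(f"not a plain CMIP6 variant label: {label!r}")
    realization, initialization, physics, forcing = map(int, match.groups())
    return {
        "realization": realization,
        "initialization": initialization,
        "physics": physics,
        "forcing": forcing,
    }


print(parse_variant_label("r27i1p1f1"))
```

Members that differ only in the realization index share the same model configuration and forcing and differ in their initial conditions, which is why a single member such as r1i1p1f1 is often enough for a first look at the data.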
cat.search(
    project='CMIP6',
    experiment_id='historical',
    source_id='CESM2',
    variable_id='tas',         # surface air temperature
    table_id='Amon',           # monthly atmospheric data
    variant_label='r1i1p1f1',  # ensemble member
)
Summary information for 1 results:
mip_era [CMIP6]
activity_drs [CMIP]
institution_id [NCAR]
source_id [CESM2]
experiment_id [historical]
member_id [r1i1p1f1]
table_id [Amon]
variable_id [tas]
grid_label [gn]
dtype: object
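The facets in the summary above are the building blocks of the CMIP directory structure mentioned in the introduction. As a rough sketch, assuming the dot-separated ordering used in CMIP6 dataset identifiers, the snippet below joins the facets of the CESM2 result into an identifier and splits it back. The helper names are hypothetical. Real identifiers on ESGF also carry a version component, which is left out here because the summary does not show it.

```python
# Facet order as listed in the search summary above.
FACETS = [
    "mip_era",
    "activity_drs",
    "institution_id",
    "source_id",
    "experiment_id",
    "member_id",
    "table_id",
    "variable_id",
    "grid_label",
]


def to_dataset_id(facets):
    """Join facet values into a dot-separated identifier (version omitted)."""
    return ".".join(facets[name] for name in FACETS)


def from_dataset_id(dataset_id):
    """Split an identifier built by to_dataset_id back into named facets."""
    parts = dataset_id.split(".")
    if len(parts) != len(FACETS):
        raise ValueError(f"expected {len(FACETS)} components, got {len(parts)}")
    return dict(zip(FACETS, parts))


# Facet values taken from the single search result above.
cesm2 = {
    "mip_era": "CMIP6",
    "activity_drs": "CMIP",
    "institution_id": "NCAR",
    "source_id": "CESM2",
    "experiment_id": "historical",
    "member_id": "r1i1p1f1",
    "table_id": "Amon",
    "variable_id": "tas",
    "grid_label": "gn",
}

dataset_id = to_dataset_id(cesm2)
print(dataset_id)
```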
Load data into xarray
The to_dataset_dict() method downloads the data and loads it into xarray datasets. It returns a dictionary whose values are xarray datasets containing the actual climate data; for the single-result search above, the key is the variable name (tas).
Note: This step downloads data from remote ESGF nodes, so it may take some time depending on your internet connection and the size of the dataset.
dsd = cat.to_dataset_dict() # dsd is a dictionary of xarray datasets
Downloading 243.0 [Mb]...
ds = dsd['tas']  # Dataset: indexing the dictionary by variable name gives the xarray Dataset that holds tas
da = ds['tas']   # DataArray: selecting tas from the Dataset gives the xarray DataArray of tas values
print('DataSet dictionary: ', dsd)
print('DataSet: ', ds)
print('DataArray: ', da)
DataSet dictionary: {'tas': <xarray.Dataset> Size: 438MB
Dimensions: (time: 1980, lat: 192, lon: 288, nbnd: 2)
Coordinates:
* lat (lat) float64 2kB -90.0 -89.06 -88.12 -87.17 ... 88.12 89.06 90.0
* lon (lon) float64 2kB 0.0 1.25 2.5 3.75 ... 355.0 356.2 357.5 358.8
* time (time) object 16kB 1850-01-15 12:00:00 ... 2014-12-15 12:00:00
Dimensions without coordinates: nbnd
Data variables:
tas (time, lat, lon) float32 438MB 245.3 245.3 245.3 ... 245.0 245.0
time_bnds (time, nbnd) object 32kB ...
lat_bnds (lat, nbnd) float32 2kB ...
lon_bnds (lon, nbnd) float32 2kB ...
areacella (lat, lon) float32 221kB ...
Attributes: (12/47)
Conventions: CF-1.7 CMIP-6.2
activity_id: CMIP
case_id: 15
cesm_casename: b.e21.BHIST.f09_g17.CMIP6-historical.001
contact: cesm_cmip6@ucar.edu
creation_date: 2019-01-16T23:34:05Z
... ...
branch_time_in_parent: 219000.0
branch_time_in_child: 674885.0
branch_method: standard
further_info_url: https://furtherinfo.es-doc.org/CMIP6.NCAR.CESM2.h...
activity_drs: CMIP
member_id: r1i1p1f1}
DataSet: <xarray.Dataset> Size: 438MB
Dimensions: (time: 1980, lat: 192, lon: 288, nbnd: 2)
Coordinates:
* lat (lat) float64 2kB -90.0 -89.06 -88.12 -87.17 ... 88.12 89.06 90.0
* lon (lon) float64 2kB 0.0 1.25 2.5 3.75 ... 355.0 356.2 357.5 358.8
* time (time) object 16kB 1850-01-15 12:00:00 ... 2014-12-15 12:00:00
Dimensions without coordinates: nbnd
Data variables:
tas (time, lat, lon) float32 438MB 245.3 245.3 245.3 ... 245.0 245.0
time_bnds (time, nbnd) object 32kB ...
lat_bnds (lat, nbnd) float32 2kB ...
lon_bnds (lon, nbnd) float32 2kB ...
areacella (lat, lon) float32 221kB ...
Attributes: (12/47)
Conventions: CF-1.7 CMIP-6.2
activity_id: CMIP
case_id: 15
cesm_casename: b.e21.BHIST.f09_g17.CMIP6-historical.001
contact: cesm_cmip6@ucar.edu
creation_date: 2019-01-16T23:34:05Z
... ...
branch_time_in_parent: 219000.0
branch_time_in_child: 674885.0
branch_method: standard
further_info_url: https://furtherinfo.es-doc.org/CMIP6.NCAR.CESM2.h...
activity_drs: CMIP
member_id: r1i1p1f1
DataArray: <xarray.DataArray 'tas' (time: 1980, lat: 192, lon: 288)> Size: 438MB
array([[[245.32208, 245.32208, ..., 245.32208, 245.32208],
[246.10596, 246.06238, ..., 246.15019, 246.12573],
...,
[245.02821, 245.0406 , ..., 244.99951, 245.01454],
[244.50035, 244.50319, ..., 244.49379, 244.49722]],
[[232.51073, 232.51073, ..., 232.51073, 232.51073],
[233.30011, 233.26118, ..., 233.32066, 233.31026],
...,
[244.68976, 244.70775, ..., 244.64677, 244.6693 ],
[243.6899 , 243.6928 , ..., 243.68317, 243.68669]],
...,
[[234.63194, 234.63194, ..., 234.63194, 234.63194],
[235.37543, 235.35039, ..., 235.38136, 235.37898],
...,
[256.5771 , 256.58975, ..., 256.5506 , 256.56418],
[256.69495, 256.69467, ..., 256.69556, 256.69522]],
[[246.79817, 246.79817, ..., 246.79817, 246.79817],
[247.46426, 247.42882, ..., 247.48152, 247.47386],
...,
[244.81926, 244.83385, ..., 244.78955, 244.80447],
[245.01997, 245.01904, ..., 245.02213, 245.021 ]]], dtype=float32)
Coordinates:
* lat (lat) float64 2kB -90.0 -89.06 -88.12 -87.17 ... 88.12 89.06 90.0
* lon (lon) float64 2kB 0.0 1.25 2.5 3.75 5.0 ... 355.0 356.2 357.5 358.8
* time (time) object 16kB 1850-01-15 12:00:00 ... 2014-12-15 12:00:00
Attributes: (12/19)
cell_measures: area: areacella
cell_methods: area: time: mean
comment: near-surface (usually, 2 meter) air temperature
description: near-surface (usually, 2 meter) air temperature
frequency: mon
id: tas
... ...
time_label: time-mean
time_title: Temporal mean
title: Near-Surface Air Temperature
type: real
units: K
variable_id: tas
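The sizes in the output above can be checked by hand from the dimensions. The short calculation below reproduces the 438MB figure for tas and the length of the time axis, assuming that the reported size uses decimal megabytes and counts four bytes per float32 value.

```python
# Dimension sizes reported in the Dataset output above.
n_time, n_lat, n_lon = 1980, 192, 288
bytes_per_value = 4  # tas is stored as float32

n_values = n_time * n_lat * n_lon
size_bytes = n_values * bytes_per_value
size_mb = size_bytes / 1e6  # decimal megabytes (assumption about the repr)

# Monthly data (table Amon): 12 time steps per year.
n_years = n_time // 12

print(f"{n_values} values, {size_mb:.1f} MB, {n_years} years")
```

The 165 years match the time coordinate, which runs from 1850 through 2014. The download reported earlier (243.0 [Mb]) is smaller than the in-memory size, which would be consistent with the files being stored compressed, although the output does not say so.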
We create a simple map showing the time-averaged surface air temperature.
Calculate temporal means
Convert from Kelvin to Celsius
Create map
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(6, 4), tight_layout=True)
# Time-mean near-surface air temperature, converted from K to °C
tas_mean = dsd["tas"]["tas"].mean(dim="time") - 273.15
tas_mean.plot(ax=ax, cmap="bwr", vmin=-40, vmax=40,
              extend="both", cbar_kwargs={"label": "tas [°C]"})
<matplotlib.collections.QuadMesh at 0x7fc8d2b19a90>
Troubleshooting: When searches return no results
Sometimes a search returns no results even when the data exists. This often happens because:
Limited search scope: By default, intake_esgf only searches a subset of ESGF nodes for performance
Data on different nodes: The data might be available on a different ESGF node than the default ones
For example, the search below for G6sulfur experiment data will likely fail with the default configuration, even though the data is available on the CEDA ESGF node.
cat.search(
    project='CMIP6',
    experiment_id='G6sulfur',
    source_id='UKESM1-0-LL',
    variable_id='tas',           # surface air temperature
    table_id='Amon',             # monthly atmospheric data
    variant_label=['r1i1p1f2'],  # ensemble member
)
---------------------------------------------------------------------------
NoSearchResults Traceback (most recent call last)
Cell In[3], line 1
----> 1 cat.search(
2 project='CMIP6',
3 experiment_id='G6sulfur',
4 source_id='UKESM1-0-LL',
5 variable_id='tas', # surface air temperature
6 table_id='Amon', # monthly atmospheric data
7 variant_label=['r1i1p1f2'] # ensemble member
8 )
File /srv/conda/envs/notebook/lib/python3.12/site-packages/intake_esgf/catalog.py:336, in ESGFCatalog.search(self, quiet, **search)
334 search_time = time.time()
335 dfs = ThreadPool(len(self.indices)).imap_unordered(_search, self.indices)
--> 336 self.df = base.combine_results(
337 tqdm(
338 dfs,
339 disable=quiet,
340 bar_format=base.bar_format,
341 unit="index",
342 unit_scale=False,
343 desc="Searching indices",
344 ascii=False,
345 total=len(self.indices),
346 )
347 )
348 self._set_project()
350 # even though we are using latest=True, because the search is distributed, we
351 # may have different versions from different indices.
File /srv/conda/envs/notebook/lib/python3.12/site-packages/intake_esgf/base.py:213, in combine_results(dfs)
211 if len(df) == 0:
212 logger.info("\x1b[36;32msearch end \x1b[91;20mno results\033[0m")
--> 213 raise NoSearchResults()
214 # retrieve project information about how to combine results
215 project_id = df["project"].unique()
NoSearchResults:
Solution: Widening search criteria
To search across all available ESGF nodes, we need to:
Set all_indices=True in the intake_esgf configuration
Create a new catalog instance with the updated configuration
This ensures we search all available data sources, though it may take longer to complete.
# Widen the search to include all ESGF nodes
intake_esgf.conf.set(all_indices=True)
# Create a new catalog so that it picks up the updated configuration
cat = intake_esgf.ESGFCatalog()
cat.search(
    project='CMIP6',
    experiment_id='G6sulfur',
    source_id='UKESM1-0-LL',
    variable_id='tas',           # surface air temperature
    table_id='Amon',             # monthly atmospheric data
    variant_label=['r1i1p1f2'],  # ensemble member
)
Summary information for 1 results:
mip_era [CMIP6]
activity_drs [GeoMIP]
institution_id [MOHC]
source_id [UKESM1-0-LL]
experiment_id [G6sulfur]
member_id [r1i1p1f2]
table_id [Amon]
variable_id [tas]
grid_label [gn]
dtype: object
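The two-step workflow above (search with the default configuration, then widen the search only if nothing is found) can be wrapped in a small helper. The sketch below shows the pattern with stub functions so that it runs offline. The `NoSearchResults` class here is a local stand-in for the exception of the same name seen in the traceback, and `search_with_fallback`, `default_search` and `wide_search` are hypothetical names. In a real notebook the two callables would wrap cat.search(...) before and after intake_esgf.conf.set(all_indices=True).

```python
class NoSearchResults(Exception):
    """Local stand-in for the exception intake_esgf raises on an empty search."""


def search_with_fallback(search_default, search_all_nodes):
    """Try the default search first; widen to all nodes only if it finds nothing.

    Both arguments are zero-argument callables. Returns the results together
    with a label saying which search produced them.
    """
    try:
        return search_default(), "default"
    except NoSearchResults:
        return search_all_nodes(), "all_indices"


# Stub searches that mimic the behaviour seen above.
def default_search():
    raise NoSearchResults()


def wide_search():
    return ["UKESM1-0-LL G6sulfur Amon tas r1i1p1f2"]


results, scope = search_with_fallback(default_search, wide_search)
print(scope, results)
```

Trying the default nodes first keeps common searches fast, and the slower all-node search is paid for only when it is needed.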