Loading data directly from ESGF nodes#

  • This notebook demonstrates searching for and loading CMIP simulations from ESGF, using the intake_esgf package.

  • This code is taken from the intake-esgf tutorial. Please refer there for a fuller explanation of each step.

  • The data search uses the standard CMIP directory structure; see here for an explanation. We make use of xarray; see this tutorial for an introduction.

Key steps:

  1. Install and configure intake_esgf for ESGF data access

  2. Search for specific climate variables and experiments

  3. Load data into xarray datasets for analysis

  4. Handle cases where data might be on different ESGF nodes

Install intake_esgf#

The intake_esgf package provides a Python interface to search and load data from ESGF nodes. We install it here since it’s not part of the standard Pangeo environment.

%pip install intake_esgf
Successfully installed cattrs-25.2.0 cffi-2.0.0 cryptography-46.0.1 globus-sdk-3.64.0 intake_esgf-2025.9.26 netcdf4-1.7.2 pyjwt-2.10.1 pystac-1.14.1 pystac-client-0.9.0 requests-cache-1.2.1 url-normalize-2.2.1
Note: you may need to restart the kernel to use updated packages.
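
After installing, it can help to confirm which version the running kernel actually sees, particularly if a restart was needed. The following is a minimal sketch using only the standard library; the helper name installed_version is hypothetical and not part of intake_esgf.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# The distribution name is the one passed to pip above.
version = installed_version("intake_esgf")
if version is None:
    print("intake_esgf is not installed in this environment")
else:
    print(f"intake_esgf {version} is installed")
```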

Initialize the ESGF catalog and explore available variables#

We create an ESGFCatalog object that provides methods to search for and load climate model data from ESGF nodes. This catalog acts as our interface to the distributed climate data archive.

Before searching for specific data, we can explore what variables are available. The variable_info() method helps us understand the naming conventions and find the right variable IDs for our search.

import intake_esgf
cat = intake_esgf.ESGFCatalog()
cat.variable_info("temperature air surface")
variable_id  cf_standard_name                           variable_units  variable_long_name
hfls         surface_upward_latent_heat_flux            W m-2           Surface Upward Latent Heat Flux
hfss         surface_upward_sensible_heat_flux          W m-2           Surface Upward Sensible Heat Flux
rlds         surface_downwelling_longwave_flux_in_air   W m-2           Surface Downwelling Longwave Radiation
rsds         surface_downwelling_shortwave_flux_in_air  W m-2           Surface Downwelling Shortwave Radiation
sfcWind      wind_speed                                 m s-1           Near-Surface Wind Speed
ta           air_temperature                            K               Air Temperature
tas          air_temperature                            K               Near-Surface Air Temperature
tasmax       air_temperature                            K               Daily Maximum Near-Surface Air Temperature
tasmin       air_temperature                            K               Daily Minimum Near-Surface Air Temperature
vas          northward_wind                             m s-1           Northward Near-Surface Wind
cat.search(variable_id="tas", experiment_id="historical")
Summary information for 1687 results:
mip_era                                                     [CMIP6]
activity_drs                                                 [CMIP]
institution_id    [IPSL, NASA-GISS, NCAR, MPI-M, MRI, CNRM-CERFA...
source_id         [IPSL-CM6A-LR, GISS-E2-1-H, CESM2, GISS-E2-1-G...
experiment_id                                          [historical]
member_id         [r9i1p1f1, r7i1p1f1, r4i1p1f1, r27i1p1f1, r5i1...
table_id          [ImonGre, ImonAnt, Amon, day, 3hr, 6hrPlevPt, ...
variable_id                                                   [tas]
grid_label                             [grg, gra, gr, gn, gr1, gr2]
dtype: object
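
The member_id values in this summary follow the CMIP6 variant-label pattern r<N>i<N>p<N>f<N>, whose four integers are the realization, initialization, physics, and forcing indices. As a small sketch, the label can be split into its parts with a regular expression. The helper parse_variant_label is hypothetical (it is not an intake_esgf function), and it deliberately rejects member IDs that carry a sub-experiment prefix.

```python
import re

# r<realization>i<initialization>p<physics>f<forcing>
_VARIANT = re.compile(r"r(\d+)i(\d+)p(\d+)f(\d+)")


def parse_variant_label(label):
    """Split a plain CMIP6 variant label into its four integer indices."""
    match = _VARIANT.fullmatch(label)
    if match is None:
        raise ValueError(f"not a plain CMIP6 variant label: {label!r}")
    keys = ("realization", "initialization", "physics", "forcing")
    return dict(zip(keys, map(int, match.groups())))


print(parse_variant_label("r27i1p1f1"))
# {'realization': 27, 'initialization': 1, 'physics': 1, 'forcing': 1}
```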
cat.search(
    project='CMIP6',
    experiment_id='historical',
    source_id='CESM2',
    variable_id='tas',  # surface air temperature
    table_id='Amon',    # monthly atmospheric data
    variant_label='r1i1p1f1'  # ensemble member
)
Summary information for 1 results:
mip_era                [CMIP6]
activity_drs            [CMIP]
institution_id          [NCAR]
source_id              [CESM2]
experiment_id     [historical]
member_id           [r1i1p1f1]
table_id                [Amon]
variable_id              [tas]
grid_label                [gn]
dtype: object
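
The facets in this summary are listed in the same order as the components of the CMIP6 data reference syntax (DRS), which is also how the directory structure is laid out. As an illustration, joining the values of the single result gives the dataset's DRS identifier without its version; the version is not shown in the summary, so it is left out here. The helper drs_id is hypothetical, and the key activity_drs is taken from the summary as printed.

```python
# Facet order as printed in the search summary (version omitted).
DRS_ORDER = (
    "mip_era", "activity_drs", "institution_id", "source_id",
    "experiment_id", "member_id", "table_id", "variable_id", "grid_label",
)


def drs_id(facets, sep="."):
    """Join facet values in DRS order, e.g. with '.' for an ID or '/' for a path."""
    missing = [key for key in DRS_ORDER if key not in facets]
    if missing:
        raise KeyError(f"missing facets: {missing}")
    return sep.join(facets[key] for key in DRS_ORDER)


result = {
    "mip_era": "CMIP6", "activity_drs": "CMIP", "institution_id": "NCAR",
    "source_id": "CESM2", "experiment_id": "historical", "member_id": "r1i1p1f1",
    "table_id": "Amon", "variable_id": "tas", "grid_label": "gn",
}
print(drs_id(result))           # CMIP6.CMIP.NCAR.CESM2.historical.r1i1p1f1.Amon.tas.gn
print(drs_id(result, sep="/"))  # CMIP6/CMIP/NCAR/CESM2/historical/r1i1p1f1/Amon/tas/gn
```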

Load data into xarray#

The to_dataset_dict() method downloads the data and loads it into xarray datasets. It returns a dictionary whose keys identify the datasets (here the variable name, tas) and whose values are the xarray datasets containing the climate data.

Note: This step downloads data from remote ESGF nodes, so it may take some time depending on your internet connection and the size of the dataset.

dsd = cat.to_dataset_dict() # dsd is a dictionary of xarray datasets
Downloading 243.0 [Mb]...
ds = dsd['tas']         # Dataset: indexing the dictionary by its key returns the xarray Dataset that holds tas
da = dsd['tas']['tas']  # DataArray: selecting the variable tas from the Dataset returns the tas DataArray
print('DataSet dictionary: ', dsd)
print('DataSet: ', ds)
print('DataArray: ', da)
DataSet dictionary:  {'tas': <xarray.Dataset> Size: 438MB
Dimensions:    (time: 1980, lat: 192, lon: 288, nbnd: 2)
Coordinates:
  * lat        (lat) float64 2kB -90.0 -89.06 -88.12 -87.17 ... 88.12 89.06 90.0
  * lon        (lon) float64 2kB 0.0 1.25 2.5 3.75 ... 355.0 356.2 357.5 358.8
  * time       (time) object 16kB 1850-01-15 12:00:00 ... 2014-12-15 12:00:00
Dimensions without coordinates: nbnd
Data variables:
    tas        (time, lat, lon) float32 438MB 245.3 245.3 245.3 ... 245.0 245.0
    time_bnds  (time, nbnd) object 32kB ...
    lat_bnds   (lat, nbnd) float32 2kB ...
    lon_bnds   (lon, nbnd) float32 2kB ...
    areacella  (lat, lon) float32 221kB ...
Attributes: (12/47)
    Conventions:            CF-1.7 CMIP-6.2
    activity_id:            CMIP
    case_id:                15
    cesm_casename:          b.e21.BHIST.f09_g17.CMIP6-historical.001
    contact:                cesm_cmip6@ucar.edu
    creation_date:          2019-01-16T23:34:05Z
    ...                     ...
    branch_time_in_parent:  219000.0
    branch_time_in_child:   674885.0
    branch_method:          standard
    further_info_url:       https://furtherinfo.es-doc.org/CMIP6.NCAR.CESM2.h...
    activity_drs:           CMIP
    member_id:              r1i1p1f1}
DataSet:  <xarray.Dataset> Size: 438MB
Dimensions:    (time: 1980, lat: 192, lon: 288, nbnd: 2)
Coordinates:
  * lat        (lat) float64 2kB -90.0 -89.06 -88.12 -87.17 ... 88.12 89.06 90.0
  * lon        (lon) float64 2kB 0.0 1.25 2.5 3.75 ... 355.0 356.2 357.5 358.8
  * time       (time) object 16kB 1850-01-15 12:00:00 ... 2014-12-15 12:00:00
Dimensions without coordinates: nbnd
Data variables:
    tas        (time, lat, lon) float32 438MB 245.3 245.3 245.3 ... 245.0 245.0
    time_bnds  (time, nbnd) object 32kB ...
    lat_bnds   (lat, nbnd) float32 2kB ...
    lon_bnds   (lon, nbnd) float32 2kB ...
    areacella  (lat, lon) float32 221kB ...
Attributes: (12/47)
    Conventions:            CF-1.7 CMIP-6.2
    activity_id:            CMIP
    case_id:                15
    cesm_casename:          b.e21.BHIST.f09_g17.CMIP6-historical.001
    contact:                cesm_cmip6@ucar.edu
    creation_date:          2019-01-16T23:34:05Z
    ...                     ...
    branch_time_in_parent:  219000.0
    branch_time_in_child:   674885.0
    branch_method:          standard
    further_info_url:       https://furtherinfo.es-doc.org/CMIP6.NCAR.CESM2.h...
    activity_drs:           CMIP
    member_id:              r1i1p1f1
DataArray:  <xarray.DataArray 'tas' (time: 1980, lat: 192, lon: 288)> Size: 438MB
array([[[245.32208, 245.32208, ..., 245.32208, 245.32208],
        [246.10596, 246.06238, ..., 246.15019, 246.12573],
        ...,
        [245.02821, 245.0406 , ..., 244.99951, 245.01454],
        [244.50035, 244.50319, ..., 244.49379, 244.49722]],

       [[232.51073, 232.51073, ..., 232.51073, 232.51073],
        [233.30011, 233.26118, ..., 233.32066, 233.31026],
        ...,
        [244.68976, 244.70775, ..., 244.64677, 244.6693 ],
        [243.6899 , 243.6928 , ..., 243.68317, 243.68669]],

       ...,

       [[234.63194, 234.63194, ..., 234.63194, 234.63194],
        [235.37543, 235.35039, ..., 235.38136, 235.37898],
        ...,
        [256.5771 , 256.58975, ..., 256.5506 , 256.56418],
        [256.69495, 256.69467, ..., 256.69556, 256.69522]],

       [[246.79817, 246.79817, ..., 246.79817, 246.79817],
        [247.46426, 247.42882, ..., 247.48152, 247.47386],
        ...,
        [244.81926, 244.83385, ..., 244.78955, 244.80447],
        [245.01997, 245.01904, ..., 245.02213, 245.021  ]]], dtype=float32)
Coordinates:
  * lat      (lat) float64 2kB -90.0 -89.06 -88.12 -87.17 ... 88.12 89.06 90.0
  * lon      (lon) float64 2kB 0.0 1.25 2.5 3.75 5.0 ... 355.0 356.2 357.5 358.8
  * time     (time) object 16kB 1850-01-15 12:00:00 ... 2014-12-15 12:00:00
Attributes: (12/19)
    cell_measures:  area: areacella
    cell_methods:   area: time: mean
    comment:        near-surface (usually, 2 meter) air temperature
    description:    near-surface (usually, 2 meter) air temperature
    frequency:      mon
    id:             tas
    ...             ...
    time_label:     time-mean
    time_title:     Temporal mean
    title:          Near-Surface Air Temperature
    type:           real
    units:          K
    variable_id:    tas
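
The sizes reported above can be checked by hand from the dimensions. The short calculation below reproduces the 438MB figure for tas and the 1980 monthly time steps. It assumes only what the output shows: float32 values (4 bytes each) and monthly data from 1850 through 2014. The download was smaller (243.0 Mb), which would be consistent with compressed files on disk, although the output does not say so.

```python
# Dimensions as printed in the Dataset summary.
n_time, n_lat, n_lon = 1980, 192, 288
bytes_per_value = 4  # float32

# Monthly data from 1850 through 2014 inclusive.
n_years = 2014 - 1850 + 1
assert n_years * 12 == n_time

size_bytes = n_time * n_lat * n_lon * bytes_per_value
print(f"{size_bytes:,} bytes = {size_bytes / 1e6:.0f} MB")
# 437,944,320 bytes = 438 MB
```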

We create a simple map showing the time-averaged surface air temperature.

  • Calculate temporal means

  • Convert from Kelvin to Celsius

  • Create map

import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(6, 4), tight_layout=True)
# Average over time, then convert from Kelvin to degrees Celsius
tas_mean_c = dsd["tas"]["tas"].mean(dim="time") - 273.15
tas_mean_c.plot(ax=ax, cmap="bwr", vmin=-40, vmax=40,
                extend='both', cbar_kwargs={"label": "tas [°C]"})
<matplotlib.collections.QuadMesh at 0x7fc8d2b19a90>
[Figure: map of time-mean near-surface air temperature (tas) in degrees Celsius]

Troubleshooting: When searches return no results#

Sometimes a search returns no results even when the data exists. This often happens because:

  1. Limited search scope: By default, intake_esgf only searches a subset of ESGF nodes for performance

  2. Data on different nodes: The data might be available on a different ESGF node than the default ones

For example, the search below for G6sulfur experiment data will likely fail with the default configuration, even though the data is available on the CEDA ESGF node.

cat.search(
    project='CMIP6',
    experiment_id='G6sulfur',
    source_id='UKESM1-0-LL',
    variable_id='tas',  # surface air temperature
    table_id='Amon',    # monthly atmospheric data
    variant_label=['r1i1p1f2']  # ensemble member
)
---------------------------------------------------------------------------
NoSearchResults                           Traceback (most recent call last)
Cell In[3], line 1
----> 1 cat.search(
      2     project='CMIP6',
      3     experiment_id='G6sulfur',
      4     source_id='UKESM1-0-LL',
      5     variable_id='tas',  # surface air temperature
      6     table_id='Amon',    # monthly atmospheric data
      7     variant_label=['r1i1p1f2']  # ensemble member
      8 )

File /srv/conda/envs/notebook/lib/python3.12/site-packages/intake_esgf/catalog.py:336, in ESGFCatalog.search(self, quiet, **search)
    334 search_time = time.time()
    335 dfs = ThreadPool(len(self.indices)).imap_unordered(_search, self.indices)
--> 336 self.df = base.combine_results(
    337     tqdm(
    338         dfs,
    339         disable=quiet,
    340         bar_format=base.bar_format,
    341         unit="index",
    342         unit_scale=False,
    343         desc="Searching indices",
    344         ascii=False,
    345         total=len(self.indices),
    346     )
    347 )
    348 self._set_project()
    350 # even though we are using latest=True, because the search is distributed, we
    351 # may have different versions from different indices.

File /srv/conda/envs/notebook/lib/python3.12/site-packages/intake_esgf/base.py:213, in combine_results(dfs)
    211 if len(df) == 0:
    212     logger.info("\x1b[36;32msearch end \x1b[91;20mno results\033[0m")
--> 213     raise NoSearchResults()
    214 # retrieve project information about how to combine results
    215 project_id = df["project"].unique()

NoSearchResults: 

Solution: Widening search criteria#

To search across all available ESGF nodes, we need to:

  1. Set all_indices=True in the intake_esgf configuration

  2. Create a new catalog instance with the updated configuration

This ensures we search all available data sources, though it may take longer to complete.

# Widen the search to include all ESGF nodes
intake_esgf.conf.set(all_indices=True)
cat = intake_esgf.ESGFCatalog()  # new catalog so the updated configuration takes effect


cat.search(
    project='CMIP6',
    experiment_id='G6sulfur',
    source_id='UKESM1-0-LL',
    variable_id='tas',  # surface air temperature
    table_id='Amon',    # monthly atmospheric data
    variant_label=['r1i1p1f2']  # ensemble member
)
Summary information for 1 results:
mip_era                 [CMIP6]
activity_drs           [GeoMIP]
institution_id           [MOHC]
source_id         [UKESM1-0-LL]
experiment_id        [G6sulfur]
member_id            [r1i1p1f2]
table_id                 [Amon]
variable_id               [tas]
grid_label                 [gn]
dtype: object
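
The two attempts above can be folded into one "try the default indices, then widen" pattern. The sketch below is written against generic callables so that it does not depend on any particular intake_esgf version; search_with_fallback and its parameter names are hypothetical.

```python
def search_with_fallback(make_catalog, widen, no_results_error, **facets):
    """Search with the default configuration; widen and retry once if nothing is found."""
    cat = make_catalog()
    try:
        cat.search(**facets)
    except no_results_error:
        widen()               # e.g. switch on all indices
        cat = make_catalog()  # a new catalog picks up the changed configuration
        cat.search(**facets)  # a second failure propagates to the caller
    return cat
```

With intake_esgf, this might be called with make_catalog=intake_esgf.ESGFCatalog, widen=lambda: intake_esgf.conf.set(all_indices=True), and no_results_error set to the NoSearchResults class named in the traceback. Where that class is imported from (possibly intake_esgf.exceptions) is an assumption to check against the installed version. Searching the default indices first keeps the common case fast and only pays for the slower full search when it is needed.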