Loading data directly from ESGF nodes

  • This notebook demonstrates searching for and loading CMIP simulations from ESGF, using the intake_esgf package.

  • This code is taken from the intake-esgf tutorial. Please refer there for a fuller explanation of each step.

  • The data search uses the standard CMIP directory structure; see here for an explanation. We make use of xarray; see this tutorial for an introduction.
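As a rough illustration of how the CMIP6 naming structure maps onto the search facets used in this notebook, the sketch below splits a dot-separated dataset identifier into named parts. The helper name `parse_cmip6_dataset_id` and the example identifier are hypothetical (the version string in particular is made up), and the intake-esgf summary output labels the activity facet `activity_drs` rather than `activity_id`.

```python
# Facet names in the order they appear in a CMIP6 dataset identifier.
CMIP6_FACETS = (
    "mip_era", "activity_id", "institution_id", "source_id", "experiment_id",
    "member_id", "table_id", "variable_id", "grid_label", "version",
)


def parse_cmip6_dataset_id(dataset_id: str) -> dict[str, str]:
    """Split a dot-separated CMIP6 dataset identifier into named facets."""
    parts = dataset_id.split(".")
    if len(parts) != len(CMIP6_FACETS):
        raise ValueError(
            f"expected {len(CMIP6_FACETS)} dot-separated parts, got {len(parts)}"
        )
    return dict(zip(CMIP6_FACETS, parts))


# Hypothetical identifier: the version string is invented for illustration.
example_id = "CMIP6.CMIP.NCAR.CESM2.historical.r1i1p1f1.Amon.tas.gn.v00000000"
facets = parse_cmip6_dataset_id(example_id)
print(facets["source_id"], facets["table_id"], facets["variable_id"])
```

The keys of the returned dictionary are the kind of keyword arguments passed to `cat.search(...)` in the cells that follow.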

# on starting the server, we need to run the line below once, as intake_esgf is not yet in our standard pangeo environment
%pip install intake_esgf
Collecting intake_esgf
  Downloading intake_esgf-2024.12.7-py3-none-any.whl.metadata (4.0 kB)
Requirement already satisfied: pandas in /srv/conda/envs/notebook/lib/python3.12/site-packages (from intake_esgf) (2.2.3)
Requirement already satisfied: dask in /srv/conda/envs/notebook/lib/python3.12/site-packages (from intake_esgf) (2025.1.0)
Requirement already satisfied: xarray in /srv/conda/envs/notebook/lib/python3.12/site-packages (from intake_esgf) (2025.1.1)
Requirement already satisfied: netCDF4 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from intake_esgf) (1.7.2)
Collecting globus-sdk (from intake_esgf)
  Downloading globus_sdk-3.55.0-py3-none-any.whl.metadata (3.3 kB)
Requirement already satisfied: requests in /srv/conda/envs/notebook/lib/python3.12/site-packages (from intake_esgf) (2.32.3)
Requirement already satisfied: tqdm[notebook] in /srv/conda/envs/notebook/lib/python3.12/site-packages (from intake_esgf) (4.67.1)
Requirement already satisfied: pyyaml in /srv/conda/envs/notebook/lib/python3.12/site-packages (from intake_esgf) (6.0.2)
Requirement already satisfied: click>=8.1 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from dask->intake_esgf) (8.1.8)
Requirement already satisfied: cloudpickle>=3.0.0 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from dask->intake_esgf) (3.1.1)
Requirement already satisfied: fsspec>=2021.09.0 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from dask->intake_esgf) (2024.12.0)
Requirement already satisfied: packaging>=20.0 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from dask->intake_esgf) (24.2)
Requirement already satisfied: partd>=1.4.0 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from dask->intake_esgf) (1.4.2)
Requirement already satisfied: toolz>=0.10.0 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from dask->intake_esgf) (1.0.0)
Requirement already satisfied: pyjwt<3.0.0,>=2.0.0 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from pyjwt[crypto]<3.0.0,>=2.0.0->globus-sdk->intake_esgf) (2.10.1)
Requirement already satisfied: cryptography!=3.4.0,>=3.3.1 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from globus-sdk->intake_esgf) (43.0.1)
Requirement already satisfied: charset_normalizer<4,>=2 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from requests->intake_esgf) (3.4.1)
Requirement already satisfied: idna<4,>=2.5 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from requests->intake_esgf) (3.10)
Requirement already satisfied: urllib3<3,>=1.21.1 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from requests->intake_esgf) (1.26.19)
Requirement already satisfied: certifi>=2017.4.17 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from requests->intake_esgf) (2024.12.14)
Requirement already satisfied: cftime in /srv/conda/envs/notebook/lib/python3.12/site-packages (from netCDF4->intake_esgf) (1.6.4)
Requirement already satisfied: numpy in /srv/conda/envs/notebook/lib/python3.12/site-packages (from netCDF4->intake_esgf) (2.0.2)
Requirement already satisfied: python-dateutil>=2.8.2 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from pandas->intake_esgf) (2.9.0)
Requirement already satisfied: pytz>=2020.1 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from pandas->intake_esgf) (2024.1)
Requirement already satisfied: tzdata>=2022.7 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from pandas->intake_esgf) (2025.1)
Requirement already satisfied: ipywidgets>=6 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from tqdm[notebook]->intake_esgf) (8.1.5)
Requirement already satisfied: cffi>=1.12 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from cryptography!=3.4.0,>=3.3.1->globus-sdk->intake_esgf) (1.17.1)
Requirement already satisfied: comm>=0.1.3 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from ipywidgets>=6->tqdm[notebook]->intake_esgf) (0.2.2)
Requirement already satisfied: ipython>=6.1.0 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from ipywidgets>=6->tqdm[notebook]->intake_esgf) (8.17.2)
Requirement already satisfied: traitlets>=4.3.1 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from ipywidgets>=6->tqdm[notebook]->intake_esgf) (5.14.3)
Requirement already satisfied: widgetsnbextension~=4.0.12 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from ipywidgets>=6->tqdm[notebook]->intake_esgf) (4.0.13)
Requirement already satisfied: jupyterlab_widgets~=3.0.12 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from ipywidgets>=6->tqdm[notebook]->intake_esgf) (3.0.13)
Requirement already satisfied: locket in /srv/conda/envs/notebook/lib/python3.12/site-packages (from partd>=1.4.0->dask->intake_esgf) (1.0.0)
Requirement already satisfied: six>=1.5 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from python-dateutil>=2.8.2->pandas->intake_esgf) (1.17.0)
Requirement already satisfied: pycparser in /srv/conda/envs/notebook/lib/python3.12/site-packages (from cffi>=1.12->cryptography!=3.4.0,>=3.3.1->globus-sdk->intake_esgf) (2.22)
Requirement already satisfied: decorator in /srv/conda/envs/notebook/lib/python3.12/site-packages (from ipython>=6.1.0->ipywidgets>=6->tqdm[notebook]->intake_esgf) (5.1.1)
Requirement already satisfied: jedi>=0.16 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from ipython>=6.1.0->ipywidgets>=6->tqdm[notebook]->intake_esgf) (0.19.2)
Requirement already satisfied: matplotlib-inline in /srv/conda/envs/notebook/lib/python3.12/site-packages (from ipython>=6.1.0->ipywidgets>=6->tqdm[notebook]->intake_esgf) (0.1.7)
Requirement already satisfied: prompt-toolkit!=3.0.37,<3.1.0,>=3.0.30 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from ipython>=6.1.0->ipywidgets>=6->tqdm[notebook]->intake_esgf) (3.0.38)
Requirement already satisfied: pygments>=2.4.0 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from ipython>=6.1.0->ipywidgets>=6->tqdm[notebook]->intake_esgf) (2.19.1)
Requirement already satisfied: stack-data in /srv/conda/envs/notebook/lib/python3.12/site-packages (from ipython>=6.1.0->ipywidgets>=6->tqdm[notebook]->intake_esgf) (0.6.3)
Requirement already satisfied: pexpect>4.3 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from ipython>=6.1.0->ipywidgets>=6->tqdm[notebook]->intake_esgf) (4.9.0)
Requirement already satisfied: parso<0.9.0,>=0.8.4 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from jedi>=0.16->ipython>=6.1.0->ipywidgets>=6->tqdm[notebook]->intake_esgf) (0.8.4)
Requirement already satisfied: ptyprocess>=0.5 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from pexpect>4.3->ipython>=6.1.0->ipywidgets>=6->tqdm[notebook]->intake_esgf) (0.7.0)
Requirement already satisfied: wcwidth in /srv/conda/envs/notebook/lib/python3.12/site-packages (from prompt-toolkit!=3.0.37,<3.1.0,>=3.0.30->ipython>=6.1.0->ipywidgets>=6->tqdm[notebook]->intake_esgf) (0.2.13)
Requirement already satisfied: executing>=1.2.0 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from stack-data->ipython>=6.1.0->ipywidgets>=6->tqdm[notebook]->intake_esgf) (2.1.0)
Requirement already satisfied: asttokens>=2.1.0 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from stack-data->ipython>=6.1.0->ipywidgets>=6->tqdm[notebook]->intake_esgf) (3.0.0)
Requirement already satisfied: pure_eval in /srv/conda/envs/notebook/lib/python3.12/site-packages (from stack-data->ipython>=6.1.0->ipywidgets>=6->tqdm[notebook]->intake_esgf) (0.2.3)
Downloading intake_esgf-2024.12.7-py3-none-any.whl (36 kB)
Downloading globus_sdk-3.55.0-py3-none-any.whl (408 kB)
Installing collected packages: globus-sdk, intake_esgf
Successfully installed globus-sdk-3.55.0 intake_esgf-2024.12.7
Note: you may need to restart the kernel to use updated packages.
import intake_esgf
from intake_esgf import ESGFCatalog  # ESGFCatalog must be imported by name to be used below

cat = ESGFCatalog()
cat.variable_info("temperature air surface")
variable_id  cf_standard_name                           variable_units  variable_long_name
hfls         surface_upward_latent_heat_flux            W m-2           Surface Upward Latent Heat Flux
hfss         surface_upward_sensible_heat_flux          W m-2           Surface Upward Sensible Heat Flux
rlds         surface_downwelling_longwave_flux_in_air   W m-2           Surface Downwelling Longwave Radiation
rsds         surface_downwelling_shortwave_flux_in_air  W m-2           Surface Downwelling Shortwave Radiation
sfcWind      wind_speed                                 m s-1           Near-Surface Wind Speed
ta           air_temperature                            K               Air Temperature
tas          air_temperature                            K               Near-Surface Air Temperature
tasmax       air_temperature                            K               Daily Maximum Near-Surface Air Temperature
tasmin       air_temperature                            K               Daily Minimum Near-Surface Air Temperature
vas          northward_wind                             m s-1           Northward Near-Surface Wind
cat.search(variable_id="tas", experiment_id="historical")
Summary information for 1687 results:
mip_era                                                     [CMIP6]
activity_drs                                                 [CMIP]
institution_id    [IPSL, NASA-GISS, NCAR, MPI-M, MRI, CNRM-CERFA...
source_id         [IPSL-CM6A-LR, GISS-E2-1-H, CESM2, GISS-E2-1-G...
experiment_id                                          [historical]
member_id         [r9i1p1f1, r7i1p1f1, r4i1p1f1, r27i1p1f1, r5i1...
table_id          [ImonGre, ImonAnt, Amon, day, 3hr, 6hrPlevPt, ...
variable_id                                                   [tas]
grid_label                             [grg, gra, gr, gn, gr1, gr2]
dtype: object
cat.search(
    project='CMIP6',
    experiment_id='historical',
    source_id='CESM2',
    variable_id='tas',  # surface air temperature
    table_id='Amon',    # monthly atmospheric data
    variant_label='r1i1p1f1'  # ensemble member
)
Summary information for 1 results:
mip_era                [CMIP6]
activity_drs            [CMIP]
institution_id          [NCAR]
source_id              [CESM2]
experiment_id     [historical]
member_id           [r1i1p1f1]
table_id                [Amon]
variable_id              [tas]
grid_label                [gn]
dtype: object
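The ensemble member label used in the search (`r1i1p1f1`) encodes four indices: realization (r), initialization (i), physics (p) and forcing (f). The following is a minimal sketch of how such a label could be unpacked; `VariantLabel` and `parse_variant_label` are hypothetical names introduced here and are not part of intake-esgf.

```python
import re
from typing import NamedTuple


class VariantLabel(NamedTuple):
    realization: int
    initialization: int
    physics: int
    forcing: int


# Labels have the form r<N>i<N>p<N>f<N>, e.g. r1i1p1f1.
_VARIANT_PATTERN = re.compile(r"r(\d+)i(\d+)p(\d+)f(\d+)")


def parse_variant_label(label: str) -> VariantLabel:
    """Unpack a CMIP6 variant label into its four integer indices."""
    match = _VARIANT_PATTERN.fullmatch(label)
    if match is None:
        raise ValueError(f"not a valid variant label: {label!r}")
    return VariantLabel(*(int(group) for group in match.groups()))


print(parse_variant_label("r1i1p1f1"))
```

This can be handy when comparing members across models, since the forcing index differs between some of them (for example `r1i1p1f2` for UKESM1-0-LL later in this notebook).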
dsd = cat.to_dataset_dict() # dsd is a dictionary of xarray datasets
Downloading 243.0 [Mb]...
ds = dsd['tas']  # Dataset: indexing the dictionary by its key returns the xarray Dataset that holds the tas data
da = ds['tas']   # DataArray: selecting the variable tas from the Dataset returns the xarray DataArray of tas values
print('DataSet dictionary: ', dsd)
print('DataSet: ', ds)
print('DataArray: ', da)
DataSet dictionary:  {'tas': <xarray.Dataset> Size: 438MB
Dimensions:    (time: 1980, lat: 192, lon: 288, nbnd: 2)
Coordinates:
  * lat        (lat) float64 2kB -90.0 -89.06 -88.12 -87.17 ... 88.12 89.06 90.0
  * lon        (lon) float64 2kB 0.0 1.25 2.5 3.75 ... 355.0 356.2 357.5 358.8
  * time       (time) object 16kB 1850-01-15 12:00:00 ... 2014-12-15 12:00:00
Dimensions without coordinates: nbnd
Data variables:
    tas        (time, lat, lon) float32 438MB 245.3 245.3 245.3 ... 245.0 245.0
    time_bnds  (time, nbnd) object 32kB ...
    lat_bnds   (lat, nbnd) float32 2kB ...
    lon_bnds   (lon, nbnd) float32 2kB ...
    areacella  (lat, lon) float32 221kB ...
Attributes: (12/47)
    Conventions:            CF-1.7 CMIP-6.2
    activity_id:            CMIP
    case_id:                15
    cesm_casename:          b.e21.BHIST.f09_g17.CMIP6-historical.001
    contact:                cesm_cmip6@ucar.edu
    creation_date:          2019-01-16T23:34:05Z
    ...                     ...
    branch_time_in_parent:  219000.0
    branch_time_in_child:   674885.0
    branch_method:          standard
    further_info_url:       https://furtherinfo.es-doc.org/CMIP6.NCAR.CESM2.h...
    activity_drs:           CMIP
    member_id:              r1i1p1f1}
DataSet:  <xarray.Dataset> Size: 438MB
Dimensions:    (time: 1980, lat: 192, lon: 288, nbnd: 2)
Coordinates:
  * lat        (lat) float64 2kB -90.0 -89.06 -88.12 -87.17 ... 88.12 89.06 90.0
  * lon        (lon) float64 2kB 0.0 1.25 2.5 3.75 ... 355.0 356.2 357.5 358.8
  * time       (time) object 16kB 1850-01-15 12:00:00 ... 2014-12-15 12:00:00
Dimensions without coordinates: nbnd
Data variables:
    tas        (time, lat, lon) float32 438MB 245.3 245.3 245.3 ... 245.0 245.0
    time_bnds  (time, nbnd) object 32kB ...
    lat_bnds   (lat, nbnd) float32 2kB ...
    lon_bnds   (lon, nbnd) float32 2kB ...
    areacella  (lat, lon) float32 221kB ...
Attributes: (12/47)
    Conventions:            CF-1.7 CMIP-6.2
    activity_id:            CMIP
    case_id:                15
    cesm_casename:          b.e21.BHIST.f09_g17.CMIP6-historical.001
    contact:                cesm_cmip6@ucar.edu
    creation_date:          2019-01-16T23:34:05Z
    ...                     ...
    branch_time_in_parent:  219000.0
    branch_time_in_child:   674885.0
    branch_method:          standard
    further_info_url:       https://furtherinfo.es-doc.org/CMIP6.NCAR.CESM2.h...
    activity_drs:           CMIP
    member_id:              r1i1p1f1
DataArray:  <xarray.DataArray 'tas' (time: 1980, lat: 192, lon: 288)> Size: 438MB
array([[[245.32208, 245.32208, ..., 245.32208, 245.32208],
        [246.10596, 246.06238, ..., 246.15019, 246.12573],
        ...,
        [245.02821, 245.0406 , ..., 244.99951, 245.01454],
        [244.50035, 244.50319, ..., 244.49379, 244.49722]],

       [[232.51073, 232.51073, ..., 232.51073, 232.51073],
        [233.30011, 233.26118, ..., 233.32066, 233.31026],
        ...,
        [244.68976, 244.70775, ..., 244.64677, 244.6693 ],
        [243.6899 , 243.6928 , ..., 243.68317, 243.68669]],

       ...,

       [[234.63194, 234.63194, ..., 234.63194, 234.63194],
        [235.37543, 235.35039, ..., 235.38136, 235.37898],
        ...,
        [256.5771 , 256.58975, ..., 256.5506 , 256.56418],
        [256.69495, 256.69467, ..., 256.69556, 256.69522]],

       [[246.79817, 246.79817, ..., 246.79817, 246.79817],
        [247.46426, 247.42882, ..., 247.48152, 247.47386],
        ...,
        [244.81926, 244.83385, ..., 244.78955, 244.80447],
        [245.01997, 245.01904, ..., 245.02213, 245.021  ]]], dtype=float32)
Coordinates:
  * lat      (lat) float64 2kB -90.0 -89.06 -88.12 -87.17 ... 88.12 89.06 90.0
  * lon      (lon) float64 2kB 0.0 1.25 2.5 3.75 5.0 ... 355.0 356.2 357.5 358.8
  * time     (time) object 16kB 1850-01-15 12:00:00 ... 2014-12-15 12:00:00
Attributes: (12/19)
    cell_measures:  area: areacella
    cell_methods:   area: time: mean
    comment:        near-surface (usually, 2 meter) air temperature
    description:    near-surface (usually, 2 meter) air temperature
    frequency:      mon
    id:             tas
    ...             ...
    time_label:     time-mean
    time_title:     Temporal mean
    title:          Near-Surface Air Temperature
    type:           real
    units:          K
    variable_id:    tas
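As a quick sanity check on the sizes reported in the output above, the in-memory size of the `tas` array can be estimated from its dimensions and data type. The sketch below only uses the numbers printed by xarray (1980 x 192 x 288 values of float32); the variable names are chosen here for illustration.

```python
from math import prod

# Dimension sizes copied from the xarray output above.
dims = {"time": 1980, "lat": 192, "lon": 288}
bytes_per_value = 4  # float32 uses 4 bytes per value

n_values = prod(dims.values())
size_mb = n_values * bytes_per_value / 1e6  # decimal megabytes
n_years = dims["time"] // 12                # monthly data: 12 steps per year

print(f"{n_values:,} values, {size_mb:.0f} MB, {n_years} years of monthly data")
# prints: 109,486,080 values, 438 MB, 165 years of monthly data
```

The 165 years match the 1850 to 2014 span of the time coordinate. The estimate is larger than the 243 Mb download reported earlier, which is likely because the files on disk are compressed.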
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(6, 4), tight_layout=True)
# time-mean near-surface air temperature, converted from K to degrees C
tas_mean = dsd["tas"]["tas"].mean(dim="time") - 273.15
tas_mean.plot(ax=ax, cmap="bwr", vmin=-40, vmax=40,
              extend='both', cbar_kwargs={"label": "tas [C]"})
[Figure: map of time-mean tas in degrees C, colour scale from -40 to 40]

Widening search criteria if data aren’t found

By default, intake-esgf does not search every possible source of data (for performance reasons). As a result, a search can come back empty-handed even though the simulation data are available somewhere.

For example, the code below raises a “NoSearchResults” error even though this simulation is available on the CEDA ESGF node:

cat.search(
    project='CMIP6',
    experiment_id='G6sulfur',
    source_id='UKESM1-0-LL',
    variable_id='tas',  # surface air temperature
    table_id='Amon',    # monthly atmospheric data
    variant_label=['r1i1p1f2']  # ensemble member
)
---------------------------------------------------------------------------
NoSearchResults                           Traceback (most recent call last)
Cell In[3], line 1
----> 1 cat.search(
      2     project='CMIP6',
      3     experiment_id='G6sulfur',
      4     source_id='UKESM1-0-LL',
      5     variable_id='tas',  # surface air temperature
      6     table_id='Amon',    # monthly atmospheric data
      7     variant_label=['r1i1p1f2']  # ensemble member
      8 )

File /srv/conda/envs/notebook/lib/python3.12/site-packages/intake_esgf/catalog.py:336, in ESGFCatalog.search(self, quiet, **search)
    334 search_time = time.time()
    335 dfs = ThreadPool(len(self.indices)).imap_unordered(_search, self.indices)
--> 336 self.df = base.combine_results(
    337     tqdm(
    338         dfs,
    339         disable=quiet,
    340         bar_format=base.bar_format,
    341         unit="index",
    342         unit_scale=False,
    343         desc="Searching indices",
    344         ascii=False,
    345         total=len(self.indices),
    346     )
    347 )
    348 self._set_project()
    350 # even though we are using latest=True, because the search is distributed, we
    351 # may have different versions from different indices.

File /srv/conda/envs/notebook/lib/python3.12/site-packages/intake_esgf/base.py:213, in combine_results(dfs)
    211 if len(df) == 0:
    212     logger.info("\x1b[36;32msearch end \x1b[91;20mno results\033[0m")
--> 213     raise NoSearchResults()
    214 # retrieve project information about how to combine results
    215 project_id = df["project"].unique()

NoSearchResults: 

The solution is to run the following two lines, which turn on all ESGF nodes for the search:

# widen the search to include all ESGF nodes, then create a new catalog
intake_esgf.conf.set(all_indices=True)
cat = ESGFCatalog()


cat.search(
    project='CMIP6',
    experiment_id='G6sulfur',
    source_id='UKESM1-0-LL',
    variable_id='tas',  # surface air temperature
    table_id='Amon',    # monthly atmospheric data
    variant_label=['r1i1p1f2']  # ensemble member
)
Summary information for 1 results:
mip_era                 [CMIP6]
activity_drs           [GeoMIP]
institution_id           [MOHC]
source_id         [UKESM1-0-LL]
experiment_id        [G6sulfur]
member_id            [r1i1p1f2]
table_id                 [Amon]
variable_id               [tas]
grid_label                 [gn]
dtype: object
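The two steps above (try the default search, then widen it if nothing is found) can be combined into a small retry helper. This is only a sketch: `search_with_fallback` is a hypothetical function, not part of intake-esgf, and the stand-in classes below exist solely so the example runs without network access. With the real package, the wiring would presumably be `ESGFCatalog` as the catalog factory, `lambda: intake_esgf.conf.set(all_indices=True)` as the widening step, and the `NoSearchResults` exception seen in the traceback (its import path from `intake_esgf.exceptions` is an assumption).

```python
def search_with_fallback(make_catalog, widen, no_results_error, **facets):
    """Search with the default settings; if nothing is found, widen and retry once."""
    try:
        return make_catalog().search(**facets)
    except no_results_error:
        widen()  # e.g. switch on all ESGF indices
        # Build a fresh catalog so the widened setting is picked up.
        return make_catalog().search(**facets)


# --- Stand-ins for demonstration only (not the intake_esgf API) ---
class FakeNoResults(Exception):
    pass


fake_config = {"all_indices": False}


class FakeCatalog:
    def search(self, **facets):
        # Pretend the data are only visible once all indices are enabled.
        if not fake_config["all_indices"]:
            raise FakeNoResults()
        return facets


result = search_with_fallback(
    FakeCatalog,
    lambda: fake_config.update(all_indices=True),
    FakeNoResults,
    experiment_id="G6sulfur",
    source_id="UKESM1-0-LL",
)
print(result)
```

Keeping the default search as the first attempt preserves the faster path for data that the default nodes already hold.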