Working with the GeoMIP (and CMIP6) cloud-store#

This notebook demonstrates how to access and analyze GeoMIP G6sulfur data hosted on Google Cloud Storage in cloud-optimized (Zarr) form.

  1. Connect to the Google Cloud CMIP6 catalog using intake-esm

  2. Search for G6sulfur and comparison scenarios (SSP2-4.5, SSP5-8.5)

  3. Load cloud-optimized Zarr data with preprocessing

  4. Create multi-model ensemble visualizations

  5. Recreate figures from published climate literature (Visioni et al., 2021)

The data was ingested into the store from ESGF by Julius Busecke, supported by Reflective, in August 2025.

See here for an overview of how to work with this kind of data: https://pangeo-data.github.io/pangeo-cmip6-cloud/accessing_data.html#loading-an-esm-collection

And see leap-stc/cmip6-leap-feedstock for some more specific guidance.

Connect to the Google Cloud CMIP6 catalog#

We use intake-esm to connect to the Google Cloud-hosted CMIP6 catalog. This catalog contains metadata about available datasets and provides access to cloud-optimized Zarr data.

Catalog options:

  • catalog.json: Only stores that pass current quality control tests (recommended)

  • catalog_noqc.json: Stores that fail quality control tests (for testing)

  • catalog_retracted.json: Stores that have been retracted by ESGF

The catalog acts as a searchable index of all available climate model data.

import intake # see https://intake-esm.readthedocs.io/en/stable/index.html

# uncomment/comment lines to swap catalogs; use the first, unless deliberately testing 
url = "https://storage.googleapis.com/cmip6/cmip6-pgf-ingestion-test/catalog/catalog.json" # Only stores that pass current tests
#url = "https://storage.googleapis.com/cmip6/cmip6-pgf-ingestion-test/catalog/catalog_noqc.json" # Only stores that fail current tests
# url = "https://storage.googleapis.com/cmip6/cmip6-pgf-ingestion-test/catalog/catalog_retracted.json" # Only stores that have been retracted by ESGF
col = intake.open_esm_datastore(url)
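
Once connected, the datastore wraps a pandas DataFrame of dataset metadata that can be inspected directly; a quick sketch of a few sanity checks (the exact columns depend on the catalog version):

# the datastore exposes its index as a pandas DataFrame via col.df
print(len(col.df))              # number of Zarr stores in the catalog
print(col.df.columns.tolist())  # available search facets
col.df.head()                   # first few catalog entries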

Search for GeoMIP and comparison data#

We search for surface air temperature (tas) data from:

  • Experiments: SSP2-4.5, SSP5-8.5, and G6sulfur

  • Frequency: Monthly atmospheric data (Amon)

  • Variable: Surface air temperature (tas)

Key filtering step: We use require_all_on=['source_id', 'institution_id'] to ensure we only get models that have data for ALL three scenarios. This allows for fair comparison across scenarios.

The resulting summary shows how many unique experiments, variables, and tables each model provides.

# form query dictionary
query = dict(experiment_id=['ssp245', 'ssp585', 'G6sulfur'],
             table_id='Amon',
             variable_id=['tas'],
             )

# subset catalog and get some metrics grouped by 'source_id'
col_subset = col.search(require_all_on=['source_id', 'institution_id'], **query)
col_subset.df.groupby('source_id')[['experiment_id', 'variable_id', 'table_id']].nunique() # this line just shows us the data we have
               experiment_id  variable_id  table_id
source_id
CNRM-ESM2-1                3            1         1
IPSL-CM6A-LR               3            1         1
MPI-ESM1-2-LR              3            1         1
UKESM1-0-LL                3            1         1
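
To double-check the filter, the subset's DataFrame can be queried directly with pandas; a small sketch listing which model/experiment pairs survived require_all_on:

# each source_id should appear once per experiment after the filter
pairs = col_subset.df[['source_id', 'experiment_id']].drop_duplicates()
print(pairs.sort_values(['source_id', 'experiment_id']).to_string(index=False))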

Load data with preprocessing#

The to_dataset_dict() method with combined_preprocessing:

  • Opens cloud-optimized Zarr data from Google Cloud Storage (lazily, so data is only read when actually needed)

  • Applies xmip preprocessing to standardize variable names and coordinates

  • Creates a dictionary with keys in the format: activity_id.institution_id.source_id.experiment_id.table_id.grid_label

from xmip.preprocessing import combined_preprocessing
ddict = col_subset.to_dataset_dict(preprocess=combined_preprocessing)
--> The keys in the returned dictionary of datasets are constructed as follows:
	'activity_id.institution_id.source_id.experiment_id.table_id.grid_label'
100.00% [12/12 00:07<00:00]
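
Each key identifies one model/experiment combination; listing them shows exactly what was returned:

# twelve datasets: four models, each with three experiments
for key in sorted(ddict.keys()):
    print(key)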

Example Usage#

Recreation of Figure 1 from Visioni et al. (2021): https://acp.copernicus.org/articles/21/10039/2021/acp-21-10039-2021.html. We first create a simple map to verify that the data loaded correctly and to get a first look at the temperature patterns. The map shows the time-averaged surface air temperature for the G6sulfur scenario from the UKESM1-0-LL model.

# the xarray datasets themselves are stored in the dictionary ddict, which has keys in the format shown below:
ds = ddict['GeoMIP.MOHC.UKESM1-0-LL.G6sulfur.Amon.gn']
ds.mean(dim=['time', 'variant_label']).tas.plot()
<matplotlib.collections.QuadMesh at 0x7fe8883afbc0>
[Figure: time-averaged surface air temperature (tas) map for the UKESM1-0-LL G6sulfur scenario]
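
Before moving on, it can help to inspect the preprocessed dataset's structure; a quick sketch (exact dimension sizes depend on the model grid):

# xmip renames the horizontal dims to x/y and provides lon/lat coordinates
print(ds.sizes)                   # e.g. sizes of time, y, x, variant_label
print(ds.tas.attrs.get('units'))  # surface air temperature, typically in K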

Multi-model ensemble visualization#

We create a figure that compares temperature trajectories across:

  • Multiple models: Different climate models

  • Three scenarios: SSP2-4.5, SSP5-8.5, and G6sulfur

  • Ensemble members: Multiple realizations from each model

Visualization approach:

  • Each subplot represents one climate model

  • Solid lines show ensemble means

  • Shaded areas show ensemble spread (min to max)

  • Color coding distinguishes between scenarios

  • Time period: 2015-2100 for future projections

This allows us to see how different models respond to geoengineering and compare their projections.

## we can iterate over the returned datasets to plot across models, scenarios and ensemble members:

import matplotlib.pyplot as plt
import xarray as xr
import numpy as np

fig, axs = plt.subplots(2, 2, figsize=(8, 8), sharey=True, sharex=True)

models = list(set([key.split('.')[2] for key in ddict.keys()])) # get a list of the unique models in the ddict
ax_dict = dict(zip(models, range(len(models)))) # we are going to loop over the datasets, not the models, so define a dict to decide which ax to plot onto

color_dict = {'G6sulfur':'darkred', 'ssp585':'gray', 'ssp245':'seagreen'}
light_color_dict = {'G6sulfur':'lightcoral', 'ssp585':'lightgray', 'ssp245':'lightgreen'}

for key, ds in ddict.items():
    print(f"Processing: {key}")
    model, experiment = key.split('.')[2], key.split('.')[3]
    
    # quick look: unweighted spatial mean over the grid (no area weighting)
    global_mean_tas = ds.sel(time=slice('2015', '2100'))['tas'].mean(dim=['y', 'x', 'sub_experiment_id'], skipna=True)
    annual_global_mean_tas = global_mean_tas.groupby(global_mean_tas.time.dt.year).mean() # strictly, this should be weighted by month length, but we skip that for this quick visualisation; see the weighted sketch at the end of this section

    ens_mean = annual_global_mean_tas.mean('variant_label')
    ens_max = annual_global_mean_tas.max('variant_label')
    ens_min = annual_global_mean_tas.min('variant_label')
    
    ax = axs.flatten()[ax_dict[model]]
    ax.set_title(model)
    ax.plot(ens_mean.year.values, ens_mean.values, label=experiment, c=color_dict[experiment])
    ax.fill_between(ens_max.year.values, ens_min.values, ens_max.values, color=light_color_dict[experiment], alpha=0.5)
        
for ax in axs.flatten():
    ax.legend()
    
plt.show()
Processing: GeoMIP.CNRM-CERFACS.CNRM-ESM2-1.G6sulfur.Amon.gr
Processing: ScenarioMIP.IPSL.IPSL-CM6A-LR.ssp585.Amon.gr
Processing: ScenarioMIP.MPI-M.MPI-ESM1-2-LR.ssp245.Amon.gn
Processing: ScenarioMIP.CNRM-CERFACS.CNRM-ESM2-1.ssp245.Amon.gr
Processing: GeoMIP.IPSL.IPSL-CM6A-LR.G6sulfur.Amon.gr
Processing: ScenarioMIP.CNRM-CERFACS.CNRM-ESM2-1.ssp585.Amon.gr
Processing: GeoMIP.MPI-M.MPI-ESM1-2-LR.G6sulfur.Amon.gn
Processing: GeoMIP.MOHC.UKESM1-0-LL.G6sulfur.Amon.gn
Processing: ScenarioMIP.IPSL.IPSL-CM6A-LR.ssp245.Amon.gr
Processing: ScenarioMIP.MPI-M.MPI-ESM1-2-LR.ssp585.Amon.gn
Processing: ScenarioMIP.MOHC.UKESM1-0-LL.ssp585.Amon.gn
Processing: ScenarioMIP.MOHC.UKESM1-0-LL.ssp245.Amon.gn
[Figure: annual global-mean surface air temperature, 2015-2100, one panel per model; solid lines show ensemble means and shading shows the ensemble min-max range for ssp245, ssp585 and G6sulfur]
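
The quick plot above uses unweighted means. For a more careful version, xarray's weighted reductions can apply area weights (approximated here by cos(latitude)) and month-length weights. A minimal sketch, assuming ds is one of the xmip-preprocessed datasets from ddict, so it exposes lat and time coordinates as above:

import numpy as np

tas = ds['tas'].sel(time=slice('2015', '2100'))

# area-weighted spatial mean: weight each grid cell by cos(latitude),
# a common approximation when true cell areas are not available
weights = np.cos(np.deg2rad(ds.lat))
global_mean = tas.weighted(weights).mean(dim=['x', 'y'])

# month-length-weighted annual mean: weight each month by its number of days
days = global_mean.time.dt.days_in_month
annual_mean = ((global_mean * days).groupby('time.year').sum()
               / days.groupby('time.year').sum())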