Data API#

This tutorial is separated into three main parts: the first two parts show how to find and get data for impact calculations and should be enough for most users. The third part provides more detailed information on how the API is built.

Finding datasets#

from climada.util.api_client import Client
client = Client()

Data types and data type groups#

The datasets are first separated into ‘data_type_groups’, which represent the main classes of CLIMADA (exposures, hazard, vulnerability, …). So far, data is available for exposures and hazard. The data is then further separated into data_types, representing the different hazards and exposures available in CLIMADA.

import pandas as pd
data_types = client.list_data_type_infos()

dtf = pd.DataFrame(data_types)
dtf.sort_values(['data_type_group', 'data_type'])
data_type data_type_group status description properties
3 crop_production exposures active None [{'property': 'crop', 'mandatory': True, 'desc...
0 litpop exposures active None [{'property': 'res_arcsec', 'mandatory': False...
5 centroids hazard active None []
2 river_flood hazard active None [{'property': 'res_arcsec', 'mandatory': False...
4 storm_europe hazard active None [{'property': 'country_iso3alpha', 'mandatory'...
1 tropical_cyclone hazard active None [{'property': 'res_arcsec', 'mandatory': True,...

Datasets and Properties#

For each data type, the single datasets can be differentiated based on properties. The following function provides a table listing the properties and their possible values. This table does not show which properties can be combined, but the search can be refined in order to find the properties that identify a unique dataset. Note that a maximum of 10 property values are shown here; many more countries are available, for example.

litpop_dataset_infos = client.list_dataset_infos(data_type='litpop')
all_properties = client.get_property_values(litpop_dataset_infos)
all_properties.keys()
dict_keys(['res_arcsec', 'exponents', 'fin_mode', 'spatial_coverage', 'country_iso3alpha', 'country_name', 'country_iso3num'])
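
To inspect the possible values of a single property, index into this dictionary; the search can also be refined by fixing some property values, as done further below. A minimal sketch (the choice of 'Haiti' is purely illustrative):

# possible values of one property
all_properties['fin_mode']

# refine the listing once some property values are fixed
client.get_property_values(litpop_dataset_infos, known_property_values={'country_name': 'Haiti'})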

Basic impact calculation#

Here we show how to make a basic impact calculation with tropical cyclones for Haiti, for the year 2040, climate scenario RCP4.5, generated with 10 synthetic tracks. For more technical details on the API, see below.

Wrapper functions to open datasets as CLIMADA objects#

The wrapper function client.get_hazard()#

gets the dataset information, downloads the data and opens it as a hazard instance.

tc_dataset_infos = client.list_dataset_infos(data_type='tropical_cyclone')
client.get_property_values(tc_dataset_infos, known_property_values = {'country_name':'Haiti'})
{'res_arcsec': ['150'],
 'climate_scenario': ['rcp26', 'rcp45', 'rcp85', 'historical', 'rcp60'],
 'ref_year': ['2040', '2060', '2080'],
 'nb_synth_tracks': ['50', '10'],
 'spatial_coverage': ['country'],
 'tracks_year_range': ['1980_2020'],
 'country_iso3alpha': ['HTI'],
 'country_name': ['Haiti'],
 'country_iso3num': ['332'],
 'resolution': ['150 arcsec']}
client = Client()
tc_haiti = client.get_hazard('tropical_cyclone', properties={'country_name': 'Haiti', 'climate_scenario': 'rcp45', 'ref_year':'2040', 'nb_synth_tracks':'10'})
tc_haiti.plot_intensity(0);
https://climada.ethz.ch/data-api/v1/dataset	climate_scenario=rcp45	country_name=Haiti	data_type=tropical_cyclone	limit=100000	name=None	nb_synth_tracks=10	ref_year=2040	status=active	version=None
2022-07-01 15:55:23,593 - climada.util.api_client - WARNING - Download failed: /Users/szelie/climada/data/hazard/tropical_cyclone/tropical_cyclone_10synth_tracks_150arcsec_rcp45_HTI_2040/v1/tropical_cyclone_10synth_tracks_150arcsec_rcp45_HTI_2040.hdf5 has the wrong size:8189651 instead of 7781902, retrying...
2022-07-01 15:55:26,786 - climada.hazard.base - INFO - Reading /Users/szelie/climada/data/hazard/tropical_cyclone/tropical_cyclone_10synth_tracks_150arcsec_rcp45_HTI_2040/v1/tropical_cyclone_10synth_tracks_150arcsec_rcp45_HTI_2040.hdf5
2022-07-01 15:55:27,129 - climada.util.plot - WARNING - Error parsing coordinate system 'GEOGCRS["WGS 84",ENSEMBLE["World Geodetic System 1984 ensemble",MEMBER["World Geodetic System 1984 (Transit)"],MEMBER["World Geodetic System 1984 (G730)"],MEMBER["World Geodetic System 1984 (G873)"],MEMBER["World Geodetic System 1984 (G1150)"],MEMBER["World Geodetic System 1984 (G1674)"],MEMBER["World Geodetic System 1984 (G1762)"],ELLIPSOID["WGS 84",6378137,298.257223563,LENGTHUNIT["metre",1]],ENSEMBLEACCURACY[2.0]],PRIMEM["Greenwich",0,ANGLEUNIT["degree",0.0174532925199433]],CS[ellipsoidal,2],AXIS["geodetic latitude (Lat)",north,ORDER[1],ANGLEUNIT["degree",0.0174532925199433]],AXIS["geodetic longitude (Lon)",east,ORDER[2],ANGLEUNIT["degree",0.0174532925199433]],USAGE[SCOPE["Horizontal component of 3D system."],AREA["World."],BBOX[-90,-180,90,180]],ID["EPSG",4326]]'. Using projection PlateCarree in plot.
[Figure: tropical cyclone intensity map over Haiti]

The wrapper function client.get_litpop()#

gets the default LitPop exposure, with exponents (1,1) and ‘produced capital’ as financial mode. If no country is given, the global dataset will be downloaded.

litpop_default = client.get_property_values(litpop_dataset_infos, known_property_values = {'fin_mode':'pc', 'exponents':'(1,1)'})
litpop = client.get_litpop(country='Haiti')
https://climada.ethz.ch/data-api/v1/dataset	country_name=Haiti	data_type=litpop	exponents=(1,1)	limit=100000	name=None	status=active	version=None
2022-07-01 15:55:31,047 - climada.entity.exposures.base - INFO - Reading /Users/szelie/climada/data/exposures/litpop/LitPop_150arcsec_HTI/v1/LitPop_150arcsec_HTI.hdf5

Get the default impact function for tropical cyclones#

from climada.entity.impact_funcs import ImpactFuncSet, ImpfTropCyclone

imp_fun = ImpfTropCyclone.from_emanuel_usa()
imp_fun.check()
imp_fun.plot()

imp_fun_set = ImpactFuncSet([imp_fun])

litpop.impact_funcs = imp_fun_set
2022-01-31 22:30:21,359 - climada.entity.impact_funcs.base - WARNING - For intensity = 0, mdd != 0 or paa != 0. Consider shifting the origin of the intensity scale. In impact.calc the impact is always null at intensity = 0.
[Figure: default tropical cyclone impact function (from_emanuel_usa)]

Calculate the impact#

from climada.engine import ImpactCalc
impact = ImpactCalc(litpop, imp_fun_set, tc_haiti).impact()
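
For a first look at the result one can, for instance, inspect the aggregated average annual impact (a minimal sketch; aai_agg is an attribute of the resulting Impact object):

# aggregated average annual impact over all exposure points
impact.aai_agg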

Getting other Exposures#

crop_dataset_infos = client.list_dataset_infos(data_type='crop_production')

client.get_property_values(crop_dataset_infos)
{'crop': ['whe', 'soy', 'ric', 'mai'],
 'irrigation_status': ['noirr', 'firr'],
 'unit': ['USD', 'Tonnes'],
 'spatial_coverage': ['global']}
rice_exposure = client.get_exposures(exposures_type='crop_production', properties = {'crop':'ric', 'unit': 'USD','irrigation_status': 'noirr'})

Getting base centroids to generate new hazard files#

centroids = client.get_centroids()
centroids.plot()
https://climada.ethz.ch/data-api/v1/dataset	data_type=centroids	extent=(-180, 180, -90, 90)	limit=100000	name=None	res_arcsec_land=150	res_arcsec_ocean=1800	status=active	version=None
2022-07-01 15:59:42,013 - climada.hazard.centroids.centr - INFO - Reading /Users/szelie/climada/data/centroids/earth_centroids_150asland_1800asoceans_distcoast_regions/v1/earth_centroids_150asland_1800asoceans_distcoast_region.hdf5
2022-07-01 15:59:44,273 - climada.util.plot - WARNING - Error parsing coordinate system 'GEOGCRS["WGS 84",ENSEMBLE["World Geodetic System 1984 ensemble",MEMBER["World Geodetic System 1984 (Transit)"],MEMBER["World Geodetic System 1984 (G730)"],MEMBER["World Geodetic System 1984 (G873)"],MEMBER["World Geodetic System 1984 (G1150)"],MEMBER["World Geodetic System 1984 (G1674)"],MEMBER["World Geodetic System 1984 (G1762)"],MEMBER["World Geodetic System 1984 (G2139)"],ELLIPSOID["WGS 84",6378137,298.257223563,LENGTHUNIT["metre",1]],ENSEMBLEACCURACY[2.0]],PRIMEM["Greenwich",0,ANGLEUNIT["degree",0.0174532925199433]],CS[ellipsoidal,2],AXIS["geodetic latitude (Lat)",north,ORDER[1],ANGLEUNIT["degree",0.0174532925199433]],AXIS["geodetic longitude (Lon)",east,ORDER[2],ANGLEUNIT["degree",0.0174532925199433]],USAGE[SCOPE["Horizontal component of 3D system."],AREA["World."],BBOX[-90,-180,90,180]],ID["EPSG",4326]]'. Using projection PlateCarree in plot.
<GeoAxesSubplot:>
[Figure: global centroids, 150 arcsec over land and 1800 arcsec over the ocean]

For many hazards, limiting the latitude extent to [-60, 60] is sufficient and will reduce the computational resources required.

centroids_nopoles = client.get_centroids(extent=[-180,180,-60,50])
centroids_nopoles.plot()
https://climada.ethz.ch/data-api/v1/dataset	data_type=centroids	extent=(-180, 180, -90, 90)	limit=100000	name=None	res_arcsec_land=150	res_arcsec_ocean=1800	status=active	version=None
2022-07-01 15:59:27,602 - climada.hazard.centroids.centr - INFO - Reading /Users/szelie/climada/data/centroids/earth_centroids_150asland_1800asoceans_distcoast_regions/v1/earth_centroids_150asland_1800asoceans_distcoast_region.hdf5
2022-07-01 15:59:29,255 - climada.util.plot - WARNING - Error parsing coordinate system 'GEOGCRS["WGS 84",ENSEMBLE["World Geodetic System 1984 ensemble",MEMBER["World Geodetic System 1984 (Transit)"],MEMBER["World Geodetic System 1984 (G730)"],MEMBER["World Geodetic System 1984 (G873)"],MEMBER["World Geodetic System 1984 (G1150)"],MEMBER["World Geodetic System 1984 (G1674)"],MEMBER["World Geodetic System 1984 (G1762)"],MEMBER["World Geodetic System 1984 (G2139)"],ELLIPSOID["WGS 84",6378137,298.257223563,LENGTHUNIT["metre",1]],ENSEMBLEACCURACY[2.0]],PRIMEM["Greenwich",0,ANGLEUNIT["degree",0.0174532925199433]],CS[ellipsoidal,2],AXIS["geodetic latitude (Lat)",north,ORDER[1],ANGLEUNIT["degree",0.0174532925199433]],AXIS["geodetic longitude (Lon)",east,ORDER[2],ANGLEUNIT["degree",0.0174532925199433]],USAGE[SCOPE["Horizontal component of 3D system."],AREA["World."],BBOX[-90,-180,90,180]],ID["EPSG",4326]]'. Using projection PlateCarree in plot.
<GeoAxesSubplot:>
[Figure: centroids restricted to the selected latitude extent]

Centroids are also available per country:

centroids_hti = client.get_centroids(country='HTI')
https://climada.ethz.ch/data-api/v1/dataset	data_type=centroids	extent=(-180, 180, -90, 90)	limit=100000	name=None	res_arcsec_land=150	res_arcsec_ocean=1800	status=active	version=None
2022-07-01 16:01:24,328 - climada.hazard.centroids.centr - INFO - Reading /Users/szelie/climada/data/centroids/earth_centroids_150asland_1800asoceans_distcoast_regions/v1/earth_centroids_150asland_1800asoceans_distcoast_region.hdf5

Technical Information#

For programmatic access to the CLIMADA data API there is a specific REST call wrapper class: climada.util.api_client.Client.

Server#

The CLIMADA data file server is hosted on https://data.iac.ethz.ch and can be accessed via a REST API at https://climada.ethz.ch. For REST API details, see the documentation.

Client#

Client?
Init signature: Client()
Docstring:     
Python wrapper around REST calls to the CLIMADA data API server.
    
Init docstring:
Constructor of Client.

Data API host and chunk_size (for download) are configurable values.
Default values are 'climada.ethz.ch' and 8096 respectively.
File:           c:\users\me\polybox\workshop\climada_python\climada\util\api_client.py
Type:           type
Subclasses:     
client = Client()
client.chunk_size
8192

The URL of the API server and the chunk size for the file download can be configured in ‘climada.conf’. Just replace the corresponding default values:

    "data_api": {
        "host": "https://climada.ethz.ch",
        "chunk_size": 8192,
        "cache_db": "{local_data.system}/.downloads.db"
    }

The other configuration value affecting the data_api client, cache_db, is the path to an SQLite database file which keeps track of the files that have been successfully downloaded from the API server. Before the Client attempts to download any file from the server, it checks whether the file has been downloaded before and, if so, whether the previously downloaded file still looks good (i.e., size and time stamp are as expected). If all of this is the case, the file is simply read from disk without submitting another request.

Metadata#

Unique Identifiers#

Any dataset can be identified by data_type, name and version. The combination of the three is unique in the API server’s underlying database. However, sometimes the name alone is already enough for identification. All datasets also have a UUID, a universally unique identifier, which is part of their individual URL. E.g., the UUID of the dataset https://climada.ethz.ch/rest/dataset/b1c76120-4e60-4d8f-99c0-7e1e7b7860ec is “b1c76120-4e60-4d8f-99c0-7e1e7b7860ec”. One can retrieve its metadata by:

client.get_dataset_info_by_uuid('b1c76120-4e60-4d8f-99c0-7e1e7b7860ec')
DatasetInfo(uuid='b1c76120-4e60-4d8f-99c0-7e1e7b7860ec', data_type=DataTypeShortInfo(data_type='litpop', data_type_group='exposures'), name='LitPop_assets_pc_150arcsec_SGS', version='v1', status='active', properties={'res_arcsec': '150', 'exponents': '(3,0)', 'fin_mode': 'pc', 'spatial_coverage': 'country', 'date_creation': '2021-09-23', 'climada_version': 'v2.2.0', 'country_iso3alpha': 'SGS', 'country_name': 'South Georgia and the South Sandwich Islands', 'country_iso3num': '239'}, files=[FileInfo(uuid='b1c76120-4e60-4d8f-99c0-7e1e7b7860ec', url='https://data.iac.ethz.ch/climada/b1c76120-4e60-4d8f-99c0-7e1e7b7860ec/LitPop_assets_pc_150arcsec_SGS.hdf5', file_name='LitPop_assets_pc_150arcsec_SGS.hdf5', file_format='hdf5', file_size=1086488, check_sum='md5:27bc1846362227350495e3d946dfad5e')], doi=None, description="LitPop asset value exposure per country: Gridded physical asset values by country, at a resolution of 150 arcsec. Values are total produced capital values disaggregated proportionally to the cube of nightlight intensity (Lit^3, based on NASA Earth at Night). The following values were used as parameters in the LitPop.from_countries() method:{'total_values': 'None', 'admin1_calc': 'False','reference_year': '2018', 'gpw_version': '4.11'}Reference: Eberenz et al., 2020. https://doi.org/10.5194/essd-12-817-2020", license='Attribution 4.0 International (CC BY 4.0)', activation_date='2021-09-13 09:08:28.358559+00:00', expiration_date=None)

or by filtering:
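A sketch of such a filter query (the dataset name is taken from the output above; this assumes list_dataset_infos also accepts the name and version arguments that appear in the query URLs shown earlier):

# filter by data type, name and version instead of by UUID
client.list_dataset_infos(data_type='litpop', name='LitPop_assets_pc_150arcsec_SGS', version='v1')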

Data Set Status#

The datasets of climada.ethz.ch may have the following statuses:

  • active: the default for real life data

  • preliminary: when the dataset is already uploaded but some information or file is still missing

  • expired: when a dataset is inactivated again

  • test_dataset: datasets that are used in unit or integration tests have this status in order not to be mistaken for real data by accident

When collecting a list of datasets with list_dataset_infos, the default dataset status filter is ‘active’. With the argument status=None this filter can be turned off, as sketched below.
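
A minimal sketch of turning the status filter off (listing litpop datasets of any status):

# include datasets regardless of their status
client.list_dataset_infos(data_type='litpop', status=None)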

DatasetInfo Objects and DataFrames#

As stated above, get_dataset_info_by_uuid returns a DatasetInfo object and list_dataset_infos a list thereof.

from climada.util.api_client import DatasetInfo
DatasetInfo?
Init signature:
DatasetInfo(
    uuid: str,
    data_type: climada.util.api_client.DataTypeShortInfo,
    name: str,
    version: str,
    status: str,
    properties: dict,
    files: list,
    doi: str,
    description: str,
    license: str,
    activation_date: str,
    expiration_date: str,
) -> None
Docstring:      dataset data from CLIMADA data API.
File:           c:\users\me\polybox\workshop\climada_python\climada\util\api_client.py
Type:           type
Subclasses:     

where files is a list of FileInfo objects:

from climada.util.api_client import FileInfo
FileInfo?
Init signature:
FileInfo(
    uuid: str,
    url: str,
    file_name: str,
    file_format: str,
    file_size: int,
    check_sum: str,
) -> None
Docstring:      file data from CLIMADA data API.
File:           c:\users\me\polybox\workshop\climada_python\climada\util\api_client.py
Type:           type
Subclasses:     

Convert into DataFrame#

There are convenience functions to easily convert datasets into pandas DataFrames, into_datasets_df and into_files_df:

client.into_datasets_df?
Signature: client.into_datasets_df(dataset_infos)
Docstring:
Convenience function providing a DataFrame of datasets with properties.

Parameters
----------
dataset_infos : list of DatasetInfo
     as returned by list_dataset_infos

Returns
-------
pandas.DataFrame
    of datasets with properties as found in query by arguments
File:      c:\users\me\polybox\workshop\climada_python\climada\util\api_client.py
Type:      function
from climada.util.api_client import Client
client = Client()
litpop_datasets = client.list_dataset_infos(data_type='litpop', properties={'country_name': 'South Georgia and the South Sandwich Islands'})
litpop_df = client.into_datasets_df(litpop_datasets)
litpop_df
data_type data_type_group uuid name version status doi description license activation_date expiration_date res_arcsec exponents fin_mode spatial_coverage date_creation climada_version country_iso3alpha country_name country_iso3num
0 litpop exposures b1c76120-4e60-4d8f-99c0-7e1e7b7860ec LitPop_assets_pc_150arcsec_SGS v1 active None LitPop asset value exposure per country: Gridd... Attribution 4.0 International (CC BY 4.0) 2021-09-13 09:08:28.358559+00:00 None 150 (3,0) pc country 2021-09-23 v2.2.0 SGS South Georgia and the South Sandwich Islands 239
1 litpop exposures 3d516897-5f87-46e6-b673-9e6c00d110ec LitPop_pop_150arcsec_SGS v1 active None LitPop population exposure per country: Gridde... Attribution 4.0 International (CC BY 4.0) 2021-09-13 09:09:10.634374+00:00 None 150 (0,1) pop country 2021-09-23 v2.2.0 SGS South Georgia and the South Sandwich Islands 239
2 litpop exposures a6864a65-36a2-4701-91bc-81b1355103b5 LitPop_150arcsec_SGS v1 active None LitPop asset value exposure per country: Gridd... Attribution 4.0 International (CC BY 4.0) 2021-09-13 09:09:30.907938+00:00 None 150 (1,1) pc country 2021-09-23 v2.2.0 SGS South Georgia and the South Sandwich Islands 239
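
Analogously, the file information of these datasets can be expanded into a DataFrame with one row per file; a minimal sketch, assuming into_files_df mirrors into_datasets_df:

# one row per file rather than per dataset
litpop_files_df = client.into_files_df(litpop_datasets)
litpop_files_df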

Download#

The wrapper functions get_exposures or get_hazard fetch the information, download the file and open the file as a CLIMADA object. But one can also just download dataset files using the method download_dataset, which takes a DatasetInfo object as argument and downloads all files of the dataset to a directory in the local file system.

client.download_dataset?
Signature:
client.download_dataset(
    dataset,
    target_dir=WindowsPath('C:/Users/me/climada/data'),
    organize_path=True,
)
Docstring:
Download all files from a given dataset to a given directory.

Parameters
----------
dataset : DatasetInfo
    the dataset
target_dir : Path, optional
    target directory for download, by default `climada.util.constants.SYSTEM_DIR`
organize_path: bool, optional
    if set to True the files will end up in subdirectories of target_dir:
    [target_dir]/[data_type_group]/[data_type]/[name]/[version]
    by default True

Returns
-------
download_dir : Path
    the path to the directory containing the downloaded files,
    will be created if organize_path is True
downloaded_files : list of Path
    the downloaded files themselves

Raises
------
Exception
    when one of the files cannot be downloaded
File:      c:\users\me\polybox\workshop\climada_python\climada\util\api_client.py
Type:      method

Cache#

The method avoids superfluous downloads by keeping track of all downloads in an SQLite database file. The client makes sure that the same file is never downloaded to the same target twice.

Examples#

# Let's have a look at an example for downloading a litpop dataset first
ds = litpop_datasets[0]  # litpop_datasets is a list and download_dataset expects a single object as argument.
download_dir, ds_files = client.download_dataset(ds)
ds_files[0], ds_files[0].is_file()
(WindowsPath('C:/Users/me/climada/data/exposures/litpop/LitPop_assets_pc_150arcsec_SGS/v1/LitPop_assets_pc_150arcsec_SGS.hdf5'),
 True)
# Another example for downloading a hazard (tropical cyclone) dataset
ds_tc = tc_dataset_infos[0] 
download_dir, ds_files = client.download_dataset(ds_tc)
ds_files[0], ds_files[0].is_file()
(PosixPath('/home/yuyue/climada/data/hazard/tropical_cyclone/tropical_cyclone_50synth_tracks_150arcsec_rcp26_BRA_2040/v1/tropical_cyclone_50synth_tracks_150arcsec_rcp26_BRA_2040.hdf5'),
 True)

If the dataset contains only one file (which is most commonly the case) this file can also be downloaded and accessed in a single step, using the get_dataset_file method:

from climada.util.api_client import Client
Client().get_dataset_file(
    data_type='litpop',
    properties={'country_name': 'South Georgia and the South Sandwich Islands', 'fin_mode': 'pop'})
WindowsPath('C:/Users/me/climada/data/exposures/litpop/LitPop_pop_150arcsec_SGS/v1/LitPop_pop_150arcsec_SGS.hdf5')

Local File Cache#

By default, the API Client downloads files into the ~/climada/data directory.

In the course of time, obsolete files may accumulate within this directory, because a newer version of these files has become available from the CLIMADA data API, or because the corresponding dataset has expired altogether.
To prevent file rot and to free disk space, it is possible to remove all outdated files at once by simply calling Client().purge_cache(). This will remove all files that were ever downloaded with the api_client.Client and for which a newer version exists, even when the newer version has not been downloaded yet.
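
A minimal sketch of such a cleanup (using the purge_cache method mentioned above):

from climada.util.api_client import Client
# remove locally cached files that are superseded by a newer version or whose dataset expired
Client().purge_cache()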

Offline Mode#

The API Client is silently used in many methods and functions of CLIMADA, including the installation test that is run to see whether the CLIMADA installation was successful. Most methods of the client send GET requests to the API server assuming the latter is accessible through a working internet connection. If this is not the case, the functionality of CLIMADA is severely limited if not altogether lost. Often this is an unnecessary restriction, e.g., when a user wants to access a file through the API Client that is already downloaded and available in the local filesystem.

In such cases the API Client runs in offline mode: it falls back to previously cached results for the same call when there is no internet connection or the server is not accessible.

To turn this feature off and make sure that all results are current and up to date - at the cost of failing when there is no internet connection - one has to disable the cache. This can be done programmatically, by initializing the API Client with the optional argument cache_enabled:

client = Client(cache_enabled=False)

Or it can be done through configuration. Edit the climada.conf file in the working directory or in ~/climada/ and change the “cache_enabled” value, like this:

...
    "data_api": {
        ...
        "cache_enabled": false
    },
...

While cache_enabled is true (the default), every result from the server is stored as a JSON file in ~/climada/data/.apicache/ under a unique name derived from the method and arguments of the call. If the very same call is made again later, at a time when the server is not accessible, the client simply falls back to the cached result from the previous call.