Data API

This tutorial is separated into three main parts: the first two parts show how to find and get data to do impact calculations and should be enough for most users. The third part provides more detailed information on how the API is built.

Finding datasets

[1]:
from climada.util.api_client import Client
client = Client()

Data types and data type groups

The datasets are first separated into ‘data_type_groups’, which represent the main classes of CLIMADA (exposures, hazard, vulnerability, …). So far, data is available for exposures and hazard. Within each group, the data is then separated into data_types, which represent the different hazards and exposures available in CLIMADA.

[2]:
import pandas as pd
data_types = client.list_data_type_infos()

dtf = pd.DataFrame(data_types)
dtf.sort_values(['data_type_group', 'data_type'])
[2]:
   data_type         data_type_group  status  description  properties
3  crop_production   exposures        active  None         [{'property': 'crop', 'mandatory': True, 'desc...
0  litpop            exposures        active  None         [{'property': 'res_arcsec', 'mandatory': False...
5  centroids         hazard           active  None         []
2  river_flood       hazard           active  None         [{'property': 'res_arcsec', 'mandatory': False...
4  storm_europe      hazard           active  None         [{'property': 'country_iso3alpha', 'mandatory'...
1  tropical_cyclone  hazard           active  None         [{'property': 'res_arcsec', 'mandatory': True,...

Datasets and Properties

For each data type, the individual datasets are distinguished by their properties. The following function provides a table listing the properties and their possible values. The table does not show which property values can be combined, but the search can be refined until the chosen properties identify a unique dataset. Note that at most 10 values are shown per property here, although many more are available, for example for countries.

[3]:
litpop_dataset_infos = client.list_dataset_infos(data_type='litpop')
[4]:
all_properties = client.get_property_values(litpop_dataset_infos)
[5]:
all_properties.keys()
[5]:
dict_keys(['res_arcsec', 'exponents', 'fin_mode', 'spatial_coverage', 'country_iso3alpha', 'country_name', 'country_iso3num'])

Refining the search:

[6]:
# as datasets are usually available per country, choosing a country or the global dataset reduces the options
# here we want to see which datasets are available for litpop globally:
client.get_property_values(litpop_dataset_infos, known_property_values = {'spatial_coverage':'global'})
[6]:
{'res_arcsec': ['150'],
 'exponents': ['(0,1)', '(1,1)', '(3,0)'],
 'fin_mode': ['pop', 'pc'],
 'spatial_coverage': ['global']}
[7]:
# and here for Switzerland:
client.get_property_values(litpop_dataset_infos, known_property_values = {'country_name':'Switzerland'})
[7]:
{'res_arcsec': ['150'],
 'exponents': ['(3,0)', '(0,1)', '(1,1)'],
 'fin_mode': ['pc', 'pop'],
 'spatial_coverage': ['country'],
 'country_iso3alpha': ['CHE'],
 'country_name': ['Switzerland'],
 'country_iso3num': ['756']}
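
The refinement shown above can be pictured with plain dictionaries. The following is a minimal sketch, not the client's implementation: the helper `collect_property_values` and the toy metadata are assumptions made for illustration, with each dictionary standing in for the properties of one dataset.

```python
from collections import defaultdict


def collect_property_values(datasets, known_property_values=None):
    """Collect the distinct values of every property across matching datasets.

    `datasets` is a list of plain property dicts (a stand-in for the
    properties of real dataset metadata, assumed for this sketch).
    """
    known = known_property_values or {}
    values = defaultdict(list)
    for props in datasets:
        # Skip datasets that contradict any already known property value.
        if any(props.get(key) != val for key, val in known.items()):
            continue
        for key, val in props.items():
            if val not in values[key]:
                values[key].append(val)
    return dict(values)


# Toy metadata, loosely modelled on the litpop outputs shown above.
toy_datasets = [
    {"fin_mode": "pc", "exponents": "(1,1)", "spatial_coverage": "global"},
    {"fin_mode": "pop", "exponents": "(0,1)", "spatial_coverage": "global"},
    {"fin_mode": "pc", "exponents": "(1,1)", "spatial_coverage": "country",
     "country_name": "Switzerland"},
]

global_values = collect_property_values(
    toy_datasets, known_property_values={"spatial_coverage": "global"}
)
print(global_values)
```

Each additional known property value removes datasets from the pool, so the remaining value lists shrink until a single dataset is left.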

Basic impact calculation

Here we show how to make a basic impact calculation with tropical cyclones for Haiti, for the year 2040 under RCP4.5, using a hazard set generated with 10 synthetic tracks. For more technical details on the API, see below.

Wrapper functions to open datasets as CLIMADA objects

The wrapper function client.get_hazard() gets the dataset information, downloads the data and opens it as a hazard instance.

[8]:
tc_dataset_infos = client.list_dataset_infos(data_type='tropical_cyclone')
client.get_property_values(tc_dataset_infos, known_property_values = {'country_name':'Haiti'})
[8]:
{'res_arcsec': ['150'],
 'climate_scenario': ['rcp26', 'rcp45', 'rcp85', 'historical', 'rcp60'],
 'ref_year': ['2040', '2060', '2080'],
 'nb_synth_tracks': ['50', '10'],
 'spatial_coverage': ['country'],
 'tracks_year_range': ['1980_2020'],
 'country_iso3alpha': ['HTI'],
 'country_name': ['Haiti'],
 'country_iso3num': ['332'],
 'resolution': ['150 arcsec']}
[9]:
client = Client()
tc_haiti = client.get_hazard('tropical_cyclone', properties={'country_name': 'Haiti', 'climate_scenario': 'rcp45', 'ref_year':'2040', 'nb_synth_tracks':'10'})
tc_haiti.plot_intensity(0)
[9]:
<GeoAxesSubplot:title={'center':'TC max intensity at each point'}>
[Figure: TC max intensity at each point]

The wrapper function client.get_litpop_default() gets the default litpop, with exponents (1,1) and ‘produced capital’ as financial mode. If no country is given, the global dataset will be downloaded.

[10]:
litpop_default = client.get_property_values(litpop_dataset_infos, known_property_values = {'fin_mode':'pc', 'exponents':'(1,1)'})
[11]:
litpop = client.get_litpop_default(country='Haiti')

Get the default impact function for tropical cyclones

[12]:
from climada.entity.impact_funcs import ImpactFuncSet, ImpfTropCyclone

imp_fun = ImpfTropCyclone.from_emanuel_usa()
imp_fun.check()
imp_fun.plot()

imp_fun_set = ImpactFuncSet()
imp_fun_set.append(imp_fun)

litpop.impact_funcs = imp_fun_set
2022-01-31 22:30:21,359 - climada.entity.impact_funcs.base - WARNING - For intensity = 0, mdd != 0 or paa != 0. Consider shifting the origin of the intensity scale. In impact.calc the impact is always null at intensity = 0.
[Figure: plot of the tropical cyclone impact function]

Calculate the impact

[13]:
from climada.engine import Impact
impact = Impact()
impact.calc(litpop, imp_fun_set, tc_haiti)

Getting other Exposures

[14]:
crop_dataset_infos = client.list_dataset_infos(data_type='crop_production')

client.get_property_values(crop_dataset_infos)
[14]:
{'crop': ['whe', 'soy', 'ric', 'mai'],
 'irrigation_status': ['noirr', 'firr'],
 'unit': ['USD', 'Tonnes'],
 'spatial_coverage': ['global']}
[15]:
rice_exposure = client.get_exposures(exposures_type='crop_production', properties = {'crop':'ric', 'unit': 'USD','irrigation_status': 'noirr'})

Technical Information

For programmatic access to the CLIMADA data API there is a dedicated REST call wrapper class: climada.util.api_client.Client.

Server

The CLIMADA data files are hosted on https://data.iac.ethz.ch and can be accessed via a REST API at https://climada.ethz.ch. For REST API details, see the documentation.

Client

[16]:
Client?
Init signature: Client()
Docstring:
Python wrapper around REST calls to the CLIMADA data API server.

Init docstring:
Constructor of Client.

Data API host and chunk_size (for download) are configurable values.
Default values are 'climada.ethz.ch' and 8096 respectively.
File:           c:\users\me\polybox\workshop\climada_python\climada\util\api_client.py
Type:           type
Subclasses:

[17]:
client = Client()
client.chunk_size
[17]:
8192
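
To illustrate what a chunk size means for a download, the sketch below copies a binary stream in pieces of at most `chunk_size` bytes. It is not the client's code: the helper `copy_in_chunks` is hypothetical and an in-memory buffer stands in for the HTTP response.

```python
import io


def copy_in_chunks(source, target, chunk_size=8192):
    """Copy a binary stream to another in pieces of at most `chunk_size` bytes.

    Returns the number of chunks that were written.
    """
    n_chunks = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # an empty read signals the end of the stream
            break
        target.write(chunk)
        n_chunks += 1
    return n_chunks


payload = b"x" * 20000  # stand-in for a file served by the data API
source, target = io.BytesIO(payload), io.BytesIO()
n_chunks = copy_in_chunks(source, target, chunk_size=8192)
print(n_chunks, len(target.getvalue()))
```

A larger chunk size means fewer read and write calls at the cost of more memory held per call.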

The URL of the API server and the chunk size for the file download can be configured in ‘climada.conf’. Just replace the corresponding default values:

"data_api": {
    "host": "https://climada.ethz.ch",
    "chunk_size": 8192,
    "cache_db": "{local_data.system}/.downloads.db"
}

The other configuration value affecting the data_api client, cache_db, is the path to an SQLite database file, which keeps track of the files that have been successfully downloaded from the API server. Before the Client attempts to download any file from the server, it checks whether the file has been downloaded before and, if so, whether the previously downloaded file still looks good (i.e., size and time stamp are as expected). If all of this is the case, the file is simply read from disk without submitting another request.
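
The bookkeeping described here can be sketched with the standard library alone. The table layout, the helper names (`init_cache`, `record_download`, `cached_file`) and the placeholder URL are assumptions made for this example and do not reflect the actual schema used by the client.

```python
import sqlite3
import tempfile
from pathlib import Path


def init_cache(db_path):
    """Create the (hypothetical) bookkeeping table if it does not exist yet."""
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS downloads ("
        "url TEXT PRIMARY KEY, path TEXT, size INTEGER, mtime REAL)"
    )
    return con


def record_download(con, url, path):
    """Remember size and modification time of a freshly downloaded file."""
    stat = Path(path).stat()
    con.execute(
        "INSERT OR REPLACE INTO downloads VALUES (?, ?, ?, ?)",
        (url, str(path), stat.st_size, stat.st_mtime),
    )
    con.commit()


def cached_file(con, url):
    """Return the local path if the recorded file still looks intact, else None."""
    row = con.execute(
        "SELECT path, size, mtime FROM downloads WHERE url = ?", (url,)
    ).fetchone()
    if row is None:
        return None  # never downloaded before
    path, size, mtime = Path(row[0]), row[1], row[2]
    if not path.is_file():
        return None  # file was removed in the meantime
    stat = path.stat()
    if stat.st_size != size or stat.st_mtime != mtime:
        return None  # file was modified, download again
    return path


with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "example.hdf5"
    data_file.write_bytes(b"some data")
    con = init_cache(Path(tmp) / "downloads.db")
    url = "https://example.org/example.hdf5"  # placeholder URL

    miss_before = cached_file(con, url)       # nothing recorded yet
    record_download(con, url, data_file)
    hit_after = cached_file(con, url)         # recorded and unchanged

    data_file.write_bytes(b"changed content")  # size no longer matches
    stale = cached_file(con, url)
    con.close()
```

Only a hit allows the download to be skipped; a missing or altered file is treated the same way as a file that was never downloaded.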

Metadata

Unique Identifiers

Any dataset can be identified by data_type, name and version. The combination of the three is unique in the API server’s underlying database. However, sometimes the name is already enough for identification. All datasets have a UUID, a universally unique identifier, which is part of their individual URL. E.g., the UUID of the dataset https://climada.ethz.ch/rest/dataset/b1c76120-4e60-4d8f-99c0-7e1e7b7860ec is “b1c76120-4e60-4d8f-99c0-7e1e7b7860ec”. One can retrieve its metadata by:

[18]:
client.get_dataset_info_by_uuid('b1c76120-4e60-4d8f-99c0-7e1e7b7860ec')
[18]:
DatasetInfo(uuid='b1c76120-4e60-4d8f-99c0-7e1e7b7860ec', data_type=DataTypeShortInfo(data_type='litpop', data_type_group='exposures'), name='LitPop_assets_pc_150arcsec_SGS', version='v1', status='active', properties={'res_arcsec': '150', 'exponents': '(3,0)', 'fin_mode': 'pc', 'spatial_coverage': 'country', 'date_creation': '2021-09-23', 'climada_version': 'v2.2.0', 'country_iso3alpha': 'SGS', 'country_name': 'South Georgia and the South Sandwich Islands', 'country_iso3num': '239'}, files=[FileInfo(uuid='b1c76120-4e60-4d8f-99c0-7e1e7b7860ec', url='https://data.iac.ethz.ch/climada/b1c76120-4e60-4d8f-99c0-7e1e7b7860ec/LitPop_assets_pc_150arcsec_SGS.hdf5', file_name='LitPop_assets_pc_150arcsec_SGS.hdf5', file_format='hdf5', file_size=1086488, check_sum='md5:27bc1846362227350495e3d946dfad5e')], doi=None, description="LitPop asset value exposure per country: Gridded physical asset values by country, at a resolution of 150 arcsec. Values are total produced capital values disaggregated proportionally to the cube of nightlight intensity (Lit^3, based on NASA Earth at Night). The following values were used as parameters in the LitPop.from_countries() method:{'total_values': 'None', 'admin1_calc': 'False','reference_year': '2018', 'gpw_version': '4.11'}Reference: Eberenz et al., 2020. https://doi.org/10.5194/essd-12-817-2020", license='Attribution 4.0 International (CC BY 4.0)', activation_date='2021-09-13 09:08:28.358559+00:00', expiration_date=None)

or by filtering:
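
Filtering on data_type, name and version amounts to the following, sketched here on plain dictionaries rather than on the live API. The helper `find_datasets` is hypothetical; the records reuse dataset names that appear elsewhere in this tutorial.

```python
def find_datasets(records, data_type=None, name=None, version=None):
    """Return the records matching every criterion that is not None."""
    criteria = {"data_type": data_type, "name": name, "version": version}
    return [
        rec for rec in records
        if all(rec[key] == val for key, val in criteria.items() if val is not None)
    ]


records = [
    {"data_type": "litpop", "name": "LitPop_assets_pc_150arcsec_SGS", "version": "v1"},
    {"data_type": "litpop", "name": "LitPop_pop_150arcsec_SGS", "version": "v1"},
    {"data_type": "litpop", "name": "LitPop_150arcsec_SGS", "version": "v1"},
]

matches = find_datasets(
    records, data_type="litpop", name="LitPop_assets_pc_150arcsec_SGS", version="v1"
)
print(matches)
```

With all three criteria given, at most one record can match, which mirrors the uniqueness of the combination in the server's database.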

Data Set Status

The datasets of climada.ethz.ch may have the following statuses:

- active: the default for real-life data
- preliminary: when the dataset is already uploaded but some information or file is still missing
- expired: when a dataset is inactivated again
- test_dataset: datasets that are used in unit or integration tests have this status so that they are not taken seriously by accident

When collecting a list of datasets with list_dataset_infos, the default dataset status will be ‘active’. With the argument status=None this filter can be turned off.

DatasetInfo Objects and DataFrames

As stated above, get_dataset_info (or get_dataset_info_by_uuid) returns a DatasetInfo object, and list_dataset_infos returns a list thereof.

[19]:
from climada.util.api_client import DatasetInfo
DatasetInfo?
Init signature:
DatasetInfo(
    uuid: str,
    data_type: climada.util.api_client.DataTypeShortInfo,
    name: str,
    version: str,
    status: str,
    properties: dict,
    files: list,
    doi: str,
    description: str,
    license: str,
    activation_date: str,
    expiration_date: str,
) -> None
Docstring:      dataset data from CLIMADA data API.
File:           c:\users\me\polybox\workshop\climada_python\climada\util\api_client.py
Type:           type
Subclasses:

where files is a list of FileInfo objects:

[20]:
from climada.util.api_client import FileInfo
FileInfo?
Init signature:
FileInfo(
    uuid: str,
    url: str,
    file_name: str,
    file_format: str,
    file_size: int,
    check_sum: str,
) -> None
Docstring:      file data from CLIMADA data API.
File:           c:\users\me\polybox\workshop\climada_python\climada\util\api_client.py
Type:           type
Subclasses:
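
The check_sum field shown in the metadata above has the form 'md5:<hexdigest>'. A downloaded file could be compared against such a string as sketched below. The helper `matches_check_sum` is an assumption for illustration and is not part of the client.

```python
import hashlib
import tempfile
from pathlib import Path


def matches_check_sum(path, check_sum, chunk_size=8192):
    """Compare a file against a check sum string of the form '<algorithm>:<hexdigest>'."""
    algorithm, expected = check_sum.split(":", 1)
    digest = hashlib.new(algorithm)
    with open(path, "rb") as stream:
        # Read in chunks so that large files need not fit into memory.
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected.lower()


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"hello world")
    good = matches_check_sum(sample, "md5:" + hashlib.md5(b"hello world").hexdigest())
    bad = matches_check_sum(sample, "md5:" + "0" * 32)
```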

Convert into DataFrame

There are convenience functions to easily convert datasets into pandas DataFrames, into_datasets_df and into_files_df:

[21]:
client.into_datasets_df?
Signature: client.into_datasets_df(dataset_infos)
Docstring:
Convenience function providing a DataFrame of datasets with properties.

Parameters
----------
dataset_infos : list of DatasetInfo
     as returned by list_dataset_infos

Returns
-------
pandas.DataFrame
    of datasets with properties as found in query by arguments
File:      c:\users\me\polybox\workshop\climada_python\climada\util\api_client.py
Type:      function

[22]:
from climada.util.api_client import Client
client = Client()
litpop_datasets = client.list_dataset_infos(data_type='litpop', properties={'country_name': 'South Georgia and the South Sandwich Islands'})
litpop_df = client.into_datasets_df(litpop_datasets)
litpop_df
[22]:
data_type data_type_group uuid name version status doi description license activation_date expiration_date res_arcsec exponents fin_mode spatial_coverage date_creation climada_version country_iso3alpha country_name country_iso3num
0 litpop exposures b1c76120-4e60-4d8f-99c0-7e1e7b7860ec LitPop_assets_pc_150arcsec_SGS v1 active None LitPop asset value exposure per country: Gridd... Attribution 4.0 International (CC BY 4.0) 2021-09-13 09:08:28.358559+00:00 None 150 (3,0) pc country 2021-09-23 v2.2.0 SGS South Georgia and the South Sandwich Islands 239
1 litpop exposures 3d516897-5f87-46e6-b673-9e6c00d110ec LitPop_pop_150arcsec_SGS v1 active None LitPop population exposure per country: Gridde... Attribution 4.0 International (CC BY 4.0) 2021-09-13 09:09:10.634374+00:00 None 150 (0,1) pop country 2021-09-23 v2.2.0 SGS South Georgia and the South Sandwich Islands 239
2 litpop exposures a6864a65-36a2-4701-91bc-81b1355103b5 LitPop_150arcsec_SGS v1 active None LitPop asset value exposure per country: Gridd... Attribution 4.0 International (CC BY 4.0) 2021-09-13 09:09:30.907938+00:00 None 150 (1,1) pc country 2021-09-23 v2.2.0 SGS South Georgia and the South Sandwich Islands 239
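
The table above has one column per metadata field and one per property. The flattening behind such a table can be sketched as follows; `ToyDatasetInfo` is a cut-down stand-in for DatasetInfo and `to_rows` is a hypothetical helper, so neither reflects the actual implementation of into_datasets_df.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDatasetInfo:
    """Cut-down stand-in for DatasetInfo (only a few of its fields)."""
    uuid: str
    name: str
    version: str
    status: str
    properties: dict = field(default_factory=dict)


def to_rows(dataset_infos):
    """Flatten each dataset into one dict, lifting its properties to top-level keys."""
    rows = []
    for ds in dataset_infos:
        row = {"uuid": ds.uuid, "name": ds.name, "version": ds.version, "status": ds.status}
        row.update(ds.properties)
        rows.append(row)
    return rows


toy_infos = [
    ToyDatasetInfo("b1c76120-4e60-4d8f-99c0-7e1e7b7860ec", "LitPop_assets_pc_150arcsec_SGS",
                   "v1", "active", {"exponents": "(3,0)", "fin_mode": "pc"}),
    ToyDatasetInfo("3d516897-5f87-46e6-b673-9e6c00d110ec", "LitPop_pop_150arcsec_SGS",
                   "v1", "active", {"exponents": "(0,1)", "fin_mode": "pop"}),
]
rows = to_rows(toy_infos)
print(rows[0])
```

A list of dictionaries like `rows` can be passed directly to `pandas.DataFrame` to obtain a table with one column per key.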

Download

The wrapper functions get_exposures and get_hazard fetch the information, download the file and open it as a CLIMADA object. But one can also just download the dataset files using the method download_dataset, which takes a DatasetInfo object as argument and downloads all files of the dataset to a directory in the local file system.

[23]:
client.download_dataset?
Signature:
client.download_dataset(
    dataset,
    target_dir=WindowsPath('C:/Users/me/climada/data'),
    organize_path=True,
)
Docstring:
Download all files from a given dataset to a given directory.

Parameters
----------
dataset : DatasetInfo
    the dataset
target_dir : Path, optional
    target directory for download, by default `climada.util.constants.SYSTEM_DIR`
organize_path: bool, optional
    if set to True the files will end up in subdirectories of target_dir:
    [target_dir]/[data_type_group]/[data_type]/[name]/[version]
    by default True

Returns
-------
download_dir : Path
    the path to the directory containing the downloaded files,
    will be created if organize_path is True
downloaded_files : list of Path
    the downloaded files themselves

Raises
------
Exception
    when one of the files cannot be downloaded
File:      c:\users\me\polybox\workshop\climada_python\climada\util\api_client.py
Type:      method
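
The directory layout that the docstring gives for organize_path=True can be reproduced with pathlib. The helper `organized_download_dir` and the relative target directory are illustrative assumptions; only the order of the path components is taken from the docstring.

```python
from pathlib import Path


def organized_download_dir(target_dir, data_type_group, data_type, name, version):
    """Build [target_dir]/[data_type_group]/[data_type]/[name]/[version]."""
    return Path(target_dir) / data_type_group / data_type / name / version


download_dir = organized_download_dir(
    "climada/data", "exposures", "litpop", "LitPop_assets_pc_150arcsec_SGS", "v1"
)
print(download_dir.as_posix())
```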

Cache

The method avoids superfluous downloads by keeping track of all downloads in an SQLite database file. The client makes sure that the same file is never downloaded to the same target twice.

Examples

[24]:
ds = litpop_datasets[0]
download_dir, ds_files = client.download_dataset(ds)
ds_files[0], ds_files[0].is_file()
[24]:
(WindowsPath('C:/Users/me/climada/data/exposures/litpop/LitPop_assets_pc_150arcsec_SGS/v1/LitPop_assets_pc_150arcsec_SGS.hdf5'),
 True)