Hazard Emulator

Given a database of hazard events, the subpackage climada.hazard.emulator provides tools to sample time series of events according to a climate scenario in a specific georegion.

The given event database is supposed to be divided into a (smaller) set of observed hazard events and a (much larger) set of simulated hazard events. The database of observed events is used to statistically fit the frequency and intensity of events in a fixed georegion to (observed) climate indices. Then, given a hypothetical (future) time series of these climate indices (a “climate scenario”), a “hazard emulator” can draw random samples from the larger database of simulated hazard events that mimic the expected occurrence of events under the given climate scenario in the specified georegion.

The concept and algorithm as applied to tropical cyclones is originally due to Tobias Geiger (unpublished as of now) and has been generalized within this package by Thomas Vogt.

This notebook illustrates the functionality through the example of tropical cyclones in the Pacific Ocean under the RCP 2.6 climate scenario according to the MIROC5 global circulation model (GCM).

About the input data used for this notebook

For historical reasons, this example loads tropical cyclone windfields that have been precomputed with the old MATLAB version of CLIMADA. However, the computation can be done with current Python-based versions of CLIMADA, as well. Since windfield computation is quite time-consuming, the windfield computation is not part of this notebook, but precomputed windfields are used.

The example is based on simulated TC tracks provided by Kerry Emanuel for ISIMIP (version 2b). The tracks and precomputed windfields are placed in the following directories:

$CLIMADA_DIR/data/emulator/tracks/*.mat
$CLIMADA_DIR/data/emulator/windfields/*.mat

Precomputed windfields for the IBTrACS TCs are in

$CLIMADA_DIR/data/emulator/windfields/GLB_0360as_hazard_1950-2015.mat

The climate index time series for the different GCMs and RCPs should be available in

$CLIMADA_DIR/data/emulator/climate_index/*.csv

Accordingly, we define an input data directory as follows:

[1]:
from climada.util.constants import DEMO_DIR
EMULATOR_DATA_DIR = DEMO_DIR.joinpath("emulator")
2020-08-07 09:52:01,780 - climada - DEBUG - Loading default config file: /home/tovogt/code/climada_python/climada/conf/defaults.conf

Load the input data

First, we choose the georegion of interest: a TC ocean basin (Eastern North Pacific). Only hazard intensities observable within this region will be loaded:

[2]:
from climada.hazard.emulator.geo import TCRegion
reg = TCRegion(tc_basin="EP")

Next, we load the database of observed events which is made up of IBTrACS storms within a known reliable time period:

[3]:
import datetime as dt
import numpy as np
import shapely
from climada.hazard import TropCyclone, TCTracks
from climada.hazard.base import DEF_VAR_MAT
from climada.hazard.emulator.const import TC_BASIN_NORM_PERIOD

def _ibtracs_id2meta(id_int):
    """Derive storm meta data from ibtracs storm ID (int)"""
    id_str = str(int(id_int))
    hemisphere = 'N' if id_str[7] == '0' else 'S'
    id_str = id_str[:7] + hemisphere + id_str[8:]
    year = int(id_str[:4])
    days = int(id_str[4:7])
    date = dt.datetime(year, 1, 1) + dt.timedelta(days - 1)
    return (id_str, year, date.month, date.day, hemisphere)

def ibtracs_windfields(region, period=None):
    """Load subset of precomputed windfields for ibtracs TCs (1950-2015)

    Parameters
    ----------
    region : TCRegion object
        The geographical region to consider.
    period : pair of ints (minyear, maxyear)
        First and last year to consider.

    Returns
    -------
    windfields : climada.hazard.TropCyclone object
    """
    var_names = DEF_VAR_MAT
    var_names['var_name']['even_id'] = "ID_no"

    fname = 'GLB_0360as_hazard_1950-2015.mat'
    path = EMULATOR_DATA_DIR.joinpath("windfields", fname)
    windfields = TropCyclone()
    windfields.read_mat(path, var_names=var_names)
    ibtracs_meta = [_ibtracs_id2meta(i) for i in windfields.event_id]
    dates = [dt.date(*m[1:4]).toordinal() for m in ibtracs_meta]
    windfields.date = np.array(dates, dtype=np.int64)
    windfields.event_name = [m[0] for m in ibtracs_meta]
    windfields.event_id = np.arange(len(ibtracs_meta))

    # identify centroids in specified region
    lat, lon = windfields.centroids.lat, windfields.centroids.lon
    windfields.centroids.region_id \
        = shapely.vectorized.contains(region.shape, lon, lat)

    # select windfields in specified period and region
    if period is not None:
        period = [f"{period[0]}-01-01", f"{period[0]}-12-31"]
    windfields = windfields.select(date=period, reg_id=1)

    return windfields

def precompute_ibtracs_windfields():
    """This is how you would precompute the IBTrACS windfields in climada_python"""
    tracks = TCTracks()
    tracks.read_ibtracs_netcdf(year_range=(1950, 2019), estimate_missing=True)
    tracks.equal_timestep(time_step_h=1)
    fname = 'GLB_0360as_hazard_1950-2019.hdf5'
    path = EMULATOR_DATA_DIR.joinpath("windfields", fname)
    windfields = TropCyclone()
    windfields.set_from_tracks(tracks)
    windfields.write_hdf5(path)

norm_period = TC_BASIN_NORM_PERIOD[reg.tc_basin[:2]]
windfields_obs = ibtracs_windfields(reg, period=norm_period)
2020-08-07 09:52:04,652 - climada.hazard.base - INFO - Reading /home/tovogt/code/climada_python/data/emulator/windfields/GLB_0360as_hazard_1950-2015.mat
2020-08-07 09:52:06,129 - climada.hazard.centroids.centr - INFO - Reading /home/tovogt/code/climada_python/data/emulator/windfields/GLB_0360as_hazard_1950-2015.mat

As a database of simulated TC events we use the TC tracks provided by Kerry Emanuel for ISIMIP2b:

[4]:
import pandas as pd

def emanuel_meta():
    meta_path = EMULATOR_DATA_DIR.joinpath("emanuel_fnames.csv")
    if meta_path.exists():
        return pd.read_csv(meta_path)

    pattern = "(temp_|Trial(?P<trial>[0-9])_GB_dk)" \
              "(?P<gcm>[0-9a-z]+)_?" \
              "((?P<rcp>piControl|20th|rcp[0-9]{2})cal)(|_full)_" \
              "(?P<hemisphere>N|S)_0360as\.mat"
    prog = re.compile(pattern)
    df = []
    for path in glob.glob(str(EMULATOR_DATA_DIR.joinpath("windfields", "*.mat"))):
        m = prog.match(path.name)
        try:
            haz = h5py.File(path, "r")['hazard']
        except OSError:
            continue
        is_rcp85 = "rcp85" if m.group("trial") is None else m.group("rcp")
        df.append({
            "basename": fname[:-13],
            "windfield_fname": fname,
            "minyear": int(haz['yyyy'][0,0]),
            "maxyear": int(haz['yyyy'][-1,0]),
            "gcm": gcm_trans_inv(m.group("gcm"), is_rcp85),
            "rcp": m.group("rcp"),
            "hemisphere": m.group("hemisphere"),
            "trial": 0 if is_rcp85 == "rcp85" else int(m.group("trial")),
            "tracks_per_year": 600 if is_rcp85 == "rcp85" else 300,
        })
    cols = ["basename", "windfield_fname", "minyear", "maxyear",
            "gcm", "rcp",  "hemisphere", "trial", "tracks_per_year"]
    df = pd.DataFrame(df, columns=cols)
    df = df.sort_values(by=["gcm", "rcp", "minyear", "hemisphere"])
    df.to_csv(meta_path, index=None)
    return df

def emanuel_windfields(region, gcm=None, rcp=None, period=None, trial=None):
    """ Load pre-calculated windfields for simulated storm tracks

    Parameters
    ----------
    region : TCRegion object
        The geographical region to consider. This is not optional since
        windfields are separated by hemisphere.
    gcm : list of str, optional
        Name of GCMs, such as "MPI-ESM-MR".
    rcp : list of str, optional
        Name of RCPs, such as "rcp26". The historical data ("20th") doesn't need
        to be selected explicitly.
    period : pair of ints (minyear, maxyear), optional
        First and last year to consider.
    trial : list of int, optional
        Trials to include in the selection. By default, 2 and 3 are excluded
        and 0 is only used for rcp85.

    Returns
    -------
    windfields : climada.hazard.TropCyclone object
    """
    meta = emanuel_meta()
    meta = meta[meta['hemisphere'] == region.hemisphere]

    if trial is None:
        trial = [1, 4]
        if rcp is not None and "rcp85" in rcp:
            trial.append(0)
    meta = meta[meta['trial'].isin(trial)]

    if gcm is not None:
        meta = meta[meta['gcm'].isin(gcm)]

    if rcp is not None:
        meta = meta[(meta['rcp'] == '20th') | meta['rcp'].isin(rcp)]

    # intersection with specified period
    if period is not None:
        meta = meta[(period[0] <= meta['maxyear']) & (meta['minyear'] <= period[1])]

    if meta.shape[0] == 0:
        raise Exception("Given gcm/rcp/period matches no trials!")

    hazards = []
    for idx, row in meta.iterrows():
        fname = row['windfield_fname']
        path = EMULATOR_DATA_DIR.joinpath("windfields", fname)
        haz = TropCyclone()
        haz.read_mat(path)
        haz.event_name = [f"{fname}-{n}" for n in haz.event_name]
        # some datasets include centroids beyond 60° that are irrelevant for TC hazards
        cutidx = 901186 if region.hemisphere == 'N' else 325229
        haz.centroids.region_id = np.zeros_like(haz.centroids.lat)
        haz.centroids.region_id[:cutidx] = 1
        haz = haz.select(reg_id=1)
        hazards.append(haz)
    windfields = TropCyclone()
    windfields.concatenate(hazards)

    # identify centroids in specified region
    lat, lon = windfields.centroids.lat, windfields.centroids.lon
    windfields.centroids.region_id \
        = shapely.vectorized.contains(region.shape, lon, lat)

    # select windfields in specified period and region
    if period is not None:
        period = (f"{period[0]}-01-01", f"{period[1]}-12-31")
    windfields = windfields.select(date=period, reg_id=1)

    return windfields

# one database for sampling, and one for the statistical calibration (bias correction) according to the chosen climate scenario:
windfields_pool = emanuel_windfields(reg, gcm=["MIROC5"], period=(1950, 2100))
windfields_rcp = emanuel_windfields(reg, gcm=["MIROC5"], rcp=["rcp26"], period=(1950, 2015))
2020-08-07 09:52:09,031 - climada.hazard.base - INFO - Reading /home/tovogt/code/climada_python/data/emulator/windfields/Trial1_GB_dkmiroc_20thcal_N_0360as.mat
2020-08-07 09:52:09,802 - climada.hazard.centroids.centr - INFO - Reading /home/tovogt/code/climada_python/data/emulator/windfields/Trial1_GB_dkmiroc_20thcal_N_0360as.mat
2020-08-07 09:52:13,037 - climada.hazard.base - INFO - Reading /home/tovogt/code/climada_python/data/emulator/windfields/Trial1_GB_dkmiroc_rcp26cal_N_0360as.mat
2020-08-07 09:52:14,463 - climada.hazard.centroids.centr - INFO - Reading /home/tovogt/code/climada_python/data/emulator/windfields/Trial1_GB_dkmiroc_rcp26cal_N_0360as.mat
2020-08-07 09:52:20,945 - climada.hazard.base - INFO - Reading /home/tovogt/code/climada_python/data/emulator/windfields/Trial1_GB_dkmiroc_rcp60cal_N_0360as.mat
2020-08-07 09:52:22,863 - climada.hazard.centroids.centr - INFO - Reading /home/tovogt/code/climada_python/data/emulator/windfields/Trial1_GB_dkmiroc_rcp60cal_N_0360as.mat
2020-08-07 09:52:29,945 - climada.hazard.base - INFO - Reading /home/tovogt/code/climada_python/data/emulator/windfields/Trial1_GB_dkmiroc_rcp85cal_N_0360as.mat
2020-08-07 09:52:31,705 - climada.hazard.centroids.centr - INFO - Reading /home/tovogt/code/climada_python/data/emulator/windfields/Trial1_GB_dkmiroc_rcp85cal_N_0360as.mat
2020-08-07 09:52:41,342 - climada.hazard.base - INFO - Reading /home/tovogt/code/climada_python/data/emulator/windfields/Trial1_GB_dkmiroc_20thcal_N_0360as.mat
2020-08-07 09:52:42,160 - climada.hazard.centroids.centr - INFO - Reading /home/tovogt/code/climada_python/data/emulator/windfields/Trial1_GB_dkmiroc_20thcal_N_0360as.mat
2020-08-07 09:52:45,797 - climada.hazard.base - INFO - Reading /home/tovogt/code/climada_python/data/emulator/windfields/Trial1_GB_dkmiroc_rcp26cal_N_0360as.mat
2020-08-07 09:52:47,438 - climada.hazard.centroids.centr - INFO - Reading /home/tovogt/code/climada_python/data/emulator/windfields/Trial1_GB_dkmiroc_rcp26cal_N_0360as.mat

Extract events that affect the region of interest

From the Hazard objects, we extract those events that actually “affect” the georegion of interest and store for each the maximum intensity observed within the region:

[5]:
from climada.hazard.emulator.stats import haz_max_events

# for this example, we regard regions as `affected` if they face at least 34 knots wind speeds
KNOTS_2_MS = 0.514444
MIN_WIND_KT = 34
MIN_WIND_MS = MIN_WIND_KT * KNOTS_2_MS

tc_events_obs = haz_max_events(windfields_obs, min_thresh=MIN_WIND_MS)
tc_events_pool = haz_max_events(windfields_pool, min_thresh=MIN_WIND_MS)
tc_events_rcp = haz_max_events(windfields_rcp, min_thresh=MIN_WIND_MS)
2020-08-07 09:52:55,232 - climada.hazard.emulator.stats - INFO - Condensing 5688 hazards to 316 max events ...
2020-08-07 09:52:56,148 - climada.hazard.emulator.stats - INFO - Condensing 58401 hazards to 15474 max events ...
2020-08-07 09:52:56,908 - climada.hazard.emulator.stats - INFO - Condensing 10908 hazards to 2921 max events ...

From the simulated TC tracks in ISIMIP we can extract a time series of expected global annual TC frequencies under the given RCP:

[6]:
import scipy.io

def emanuel_frequency_normalization(gcm, rcp, period):
    """ Frequency normalization factors for given GCM and RCP, in 1950-2100

    Parameters
    ----------
    gcm : str
        Name of GCM, such as "MPI-ESM-MR".
    rcp : str
        Name of RCP, such as "rcp26".
    period : pair of ints (minyear, maxyear)
        First and last year to consider.

    Returns
    -------
    freq_norm : DataFrame { year, freq }
        Information about the relative surplus of simulated events, i.e.,
        if `freq_norm` specifies the value 0.2 in some year, then it is
        assumed that the number of events simulated for that year is 5 times as
        large as it is estimated to be.
    """
    meta = emanuel_meta()
    meta = meta[meta['hemisphere'] == 'N']
    meta = meta[(meta['trial'] != 2) & (meta['trial'] != 3)]
    if rcp != "rcp85":
        meta = meta[meta['trial'] != 0]
    meta = meta[(meta['gcm'] == gcm)]
    meta = meta[(meta['rcp'] == '20th') | (meta['rcp'] == rcp)]
    freq = []
    for idx, row in meta.iterrows():
        path = EMULATOR_DATA_DIR.joinpath("tracks", f"{row['basename']}.mat")
        tracks = scipy.io.loadmat(path, variable_names=['yearstore', 'freqyear'])
        freq.append(pd.DataFrame({
            'year': np.unique(tracks['yearstore'].ravel()),
            'freq': tracks['freqyear'].ravel() / row['tracks_per_year'],
        }))
    freq = pd.concat(freq, ignore_index=True)
    freq = freq[(period[0] <= freq['year']) & (freq['year'] <= period[1])]
    freq = freq.sort_values(by=["year"]).reset_index(drop=True)
    return freq

freq = emanuel_frequency_normalization("MIROC5", "rcp26", (1950, 2015))

Initialize and calibrate the hazard emulator

We have all data that is required to set up a hazard emulator:

[7]:
from climada.hazard.emulator.emulator import EventPool, HazardEmulator
em = HazardEmulator(tc_events_rcp, tc_events_obs, reg, freq, pool=EventPool(tc_events_pool))
2020-08-07 09:52:57,405 - climada.hazard.emulator.random - INFO - Results of intensity normalization by subsampling:
2020-08-07 09:52:57,405 - climada.hazard.emulator.random - INFO - - drop 66% of entries satisfying 'intensity > 37.89053267580431'
2020-08-07 09:52:57,406 - climada.hazard.emulator.random - INFO - - mean intensity of simulated events before dropping is 37.8905
2020-08-07 09:52:57,406 - climada.hazard.emulator.random - INFO - - mean intensity of simulated events after dropping is 33.1730
2020-08-07 09:52:57,406 - climada.hazard.emulator.random - INFO - - mean intensity of observed events is 32.5577

We calibrate the emulator, i.e., we determine a statistical connection between climate indices (GMT and ENSO in this example) and tc_events_rcp:

[8]:
def climate_index(gcm, rcp, index, running_mean=21):
    """ Load time series of a climate index (e.g. GMT) for a given GCM/RCP

    The time period is 1861-2100 (1861-2299 for rcp26)

    The data is concatenated from historical and future datasets, applying a
    21-year running mean in the case of GMT-based indices.

    CAUTION: For the running mean, the data is *extended* at the edges by
    repeating the edge values; thereby any trend present in the data will
    become attenuated at the edges!

    GMT data is relative to piControl mean over 500 year reference period.

    Parameters
    ----------
    gcm : str
        Name of GCM, such as "MPI-ESM-MR".
    rcp : str
        Name of RCP, such as "rcp26".
    index : str
        Name of index, one of ["gmt", "gmtTR", "esoi", "nao", "nino34", "pdo"],

            GMT : Global mean (surface) temperature
            GMT TR : GMT in the tropics, between -30 and +30 degrees latitude
            ESOI : El Nino southern oscillation index
            NAO : North Atlantic Oscillation
            NINO34 : Nino 3.4 sea surface temperature index
            PDO : Pacific decadal oscillation
    running_mean : int
        For GMT data, the running mean period. Defaults to 21.

    Returns
    -------
    ci : DataFrame { year, month, `index` }
        Monthly data of given climate index.
    """
    index_path = EMULATOR_DATA_DIR.joinpath("climate_index")
    base_min, base_max, avg_interval = ({
        'gmt': (1971, 2000, ''),
        'gmtTR': (1971, 2000, ''),
        'esoi': (1950, 1979, '_3m'),
        'nao': (1950, 1979, '_3m'),
        'nino34': (1950, 1979, '_3m'),
        'pdo': (1971, 2000, ''),
    })[index]

    allmin = 1861
    allmax = 2299 if rcp == 'rcp26' else 2100

    ci = pd.DataFrame()
    periods = [('historical', allmin, 2005), (rcp, 2006, allmax)]
    for pname, minyear, maxyear in periods:
        fname = f"{index}-index_monthly_{gcm}-{pname}_{minyear}-{maxyear}" \
                f"_base-{base_min}-{base_max}{avg_interval}.csv"
        path = index_path.joinpath(fname)
        if index == 'pdo':
            tmp = pd.read_csv(path, delim_whitespace=True, skiprows=1, header=None)
            cols = ['time', 'pdo']
            tmp.columns = cols
        else:
            tmp = pd.read_csv(path)
            if 'Unnamed: 0' in tmp.columns:
                del tmp['Unnamed: 0']
        ci = ci.append(tmp).reset_index(drop=True)
    year_month_day = ci['time'].str.split("-", expand=True)
    ci['year'] = year_month_day[0].astype(int)
    ci['month'] = year_month_day[1].astype(int)
    ci = ci.drop(labels=['time'], axis=1)
    if ci['year'].max() == 2099:
        ci2100 = ci[ci['year'] == 2099]
        ci2100['year'] = 2100
        ci = ci.append(ci2100)

    if index in ['gmt', 'gmtTR']:
        # define GMT change wrt piControl mean and apply running mean
        minyear, maxyear = ({
            'GFDL-ESM2M': (1, 500),
            'IPSL-CM5A-LR': (1800, 2299),
            'MIROC5': (2000, 2499),
            'HadGEM2-ES': (1900, 2399),
        })[gcm]
        fname = f"{index}-index_monthly_{gcm}-piControl_{minyear}-{maxyear}" \
                f"_base-{minyear}-{maxyear}.csv"
        path = index_path.joinpath(fname)
        ci[index] = ci[index] - pd.read_csv(path)['gmt'].mean()

        N = running_mean
        halfwin = N // 2
        padded_data = np.pad(ci[index].to_numpy(), halfwin, mode='edge')
        ci[index] = np.convolve(padded_data, np.ones(N)/N, mode='valid')

    return ci

ci = [climate_index("MIROC5", "rcp26", ci) for ci in ["gmt", "esoi"]]
em.calibrate_statistics(ci)

Now that the emulator is calibrated, we use GMT and ENSO time series to predict TC statistics under the chosen climate scenario:

[9]:
em.predict_statistics(ci)
2020-08-07 09:52:57,505 - climada.hazard.emulator.emulator - INFO - Predicting TCs with new climate index dataset...

Draw samples according to climate scenario

The emulator can now be used to sample hypothetical events within an arbitrary time period covered by the climate index time series used above:

[10]:
draws = em.draw_realizations(100, (2020, 2050))
2020-08-07 09:52:57,527 - climada.hazard.emulator.emulator - INFO - Drawing 100 realizations for period  (2020, 2050)
2020 ... 2050 ... 2050

The returned object draws is a DataFrame with each row corresponding to a storm event from the hazard pool windfields_pool (see above): The column real_id assigns one of 100 realizations to each of the events while the columns id and name are the unique ID and name used in windfields_pool to identify this hazard event. The column year indicates the year in which the event would occur under the hypothetical climate scenario and will usually differ from the date associated with the event in windfields_pool.

[11]:
print(draws[:25])
       id                                           name  year  real_id
0   29344   Trial1_GB_dkmiroc_rcp60cal_N_0360as.mat-6076  2020        0
1   28893   Trial1_GB_dkmiroc_rcp60cal_N_0360as.mat-5272  2020        0
2   50512  Trial1_GB_dkmiroc_rcp85cal_N_0360as.mat-14879  2020        0
3     210     Trial1_GB_dkmiroc_20thcal_N_0360as.mat-328  2020        0
4   30111   Trial1_GB_dkmiroc_rcp60cal_N_0360as.mat-7503  2020        0
5   27807   Trial1_GB_dkmiroc_rcp60cal_N_0360as.mat-3499  2020        0
6   25513  Trial1_GB_dkmiroc_rcp26cal_N_0360as.mat-27994  2020        0
7    8970   Trial1_GB_dkmiroc_20thcal_N_0360as.mat-16451  2020        1
8   41882  Trial1_GB_dkmiroc_rcp60cal_N_0360as.mat-28187  2020        1
9   33033  Trial1_GB_dkmiroc_rcp60cal_N_0360as.mat-12694  2020        1
10  13170   Trial1_GB_dkmiroc_rcp26cal_N_0360as.mat-7212  2020        1
11  37879  Trial1_GB_dkmiroc_rcp60cal_N_0360as.mat-21292  2020        1
12   1252    Trial1_GB_dkmiroc_20thcal_N_0360as.mat-2161  2020        1
13  20287  Trial1_GB_dkmiroc_rcp26cal_N_0360as.mat-19475  2020        2
14  30980   Trial1_GB_dkmiroc_rcp60cal_N_0360as.mat-9074  2020        2
15  32587  Trial1_GB_dkmiroc_rcp60cal_N_0360as.mat-11892  2020        2
16  56887  Trial1_GB_dkmiroc_rcp85cal_N_0360as.mat-25691  2020        2
17  42985   Trial1_GB_dkmiroc_rcp85cal_N_0360as.mat-1651  2020        2
18  54592  Trial1_GB_dkmiroc_rcp85cal_N_0360as.mat-21784  2020        2
19   2511    Trial1_GB_dkmiroc_20thcal_N_0360as.mat-4481  2020        3
20   4082    Trial1_GB_dkmiroc_20thcal_N_0360as.mat-7410  2020        3
21   9685    Trial1_GB_dkmiroc_rcp26cal_N_0360as.mat-960  2020        3
22  50617  Trial1_GB_dkmiroc_rcp85cal_N_0360as.mat-15050  2020        3
23  20518  Trial1_GB_dkmiroc_rcp26cal_N_0360as.mat-19822  2020        3
24  41643  Trial1_GB_dkmiroc_rcp60cal_N_0360as.mat-27806  2020        3