Hazard Emulator¶
Given a database of hazard events, the subpackage climada.hazard.emulator
provides tools to sample time series of events according to a climate scenario in a specific georegion.
The given event database is supposed to be divided into a (smaller) set of observed hazard events and a (much larger) set of simulated hazard events. The database of observed events is used to statistically fit the frequency and intensity of events in a fixed georegion to (observed) climate indices. Then, given a hypothetical (future) time series of these climate indices (a “climate scenario”), a “hazard emulator” can draw random samples from the larger database of simulated hazard events that mimic the expected occurrence of events under the given climate scenario in the specified georegion.
The concept and algorithm as applied to tropical cyclones is originally due to Tobias Geiger (unpublished as of now) and has been generalized within this package by Thomas Vogt.
This notebook illustrates the functionality through the example of tropical cyclones in the Pacific Ocean under the RCP 2.6 climate scenario according to the MIROC5 global circulation model (GCM).
About the input data used for this notebook¶
For historical reasons, this example loads tropical cyclone windfields that have been precomputed with the old MATLAB version of CLIMADA. However, the computation can be done with current Python-based versions of CLIMADA, as well. Since windfield computation is quite time-consuming, the windfield computation is not part of this notebook, but precomputed windfields are used.
The example is based on simulated TC tracks provided by Kerry Emanuel for ISIMIP (version 2b). The tracks and precomputed windfields are placed in the following directories:
$CLIMADA_DIR/data/emulator/tracks/*.mat
$CLIMADA_DIR/data/emulator/windfields/*.mat
Precomputed windfields for the IBTrACS TCs are in
$CLIMADA_DIR/data/emulator/windfields/GLB_0360as_hazard_1950-2015.mat
The climate index time series for the different GCMs and RCPs should be available in
$CLIMADA_DIR/data/emulator/climate_index/*.csv
Accordingly, we define an input data directory as follows:
[1]:
from climada.util.constants import DEMO_DIR
EMULATOR_DATA_DIR = DEMO_DIR.joinpath("emulator")
2020-08-07 09:52:01,780 - climada - DEBUG - Loading default config file: /home/tovogt/code/climada_python/climada/conf/defaults.conf
Load the input data¶
First, we choose the georegion of interest: a TC ocean basin (Eastern North Pacific). Only hazard intensities observable within this region will be loaded:
[2]:
from climada.hazard.emulator.geo import TCRegion
reg = TCRegion(tc_basin="EP")
Next, we load the database of observed events which is made up of IBTrACS storms within a known reliable time period:
[3]:
import datetime as dt
import numpy as np
import shapely
from climada.hazard import TropCyclone, TCTracks
from climada.hazard.base import DEF_VAR_MAT
from climada.hazard.emulator.const import TC_BASIN_NORM_PERIOD
def _ibtracs_id2meta(id_int):
"""Derive storm meta data from ibtracs storm ID (int)"""
id_str = str(int(id_int))
hemisphere = 'N' if id_str[7] == '0' else 'S'
id_str = id_str[:7] + hemisphere + id_str[8:]
year = int(id_str[:4])
days = int(id_str[4:7])
date = dt.datetime(year, 1, 1) + dt.timedelta(days - 1)
return (id_str, year, date.month, date.day, hemisphere)
def ibtracs_windfields(region, period=None):
"""Load subset of precomputed windfields for ibtracs TCs (1950-2015)
Parameters
----------
region : TCRegion object
The geographical region to consider.
period : pair of ints (minyear, maxyear)
First and last year to consider.
Returns
-------
windfields : climada.hazard.TropCyclone object
"""
var_names = DEF_VAR_MAT
var_names['var_name']['even_id'] = "ID_no"
fname = 'GLB_0360as_hazard_1950-2015.mat'
path = EMULATOR_DATA_DIR.joinpath("windfields", fname)
windfields = TropCyclone()
windfields.read_mat(path, var_names=var_names)
ibtracs_meta = [_ibtracs_id2meta(i) for i in windfields.event_id]
dates = [dt.date(*m[1:4]).toordinal() for m in ibtracs_meta]
windfields.date = np.array(dates, dtype=np.int64)
windfields.event_name = [m[0] for m in ibtracs_meta]
windfields.event_id = np.arange(len(ibtracs_meta))
# identify centroids in specified region
lat, lon = windfields.centroids.lat, windfields.centroids.lon
windfields.centroids.region_id \
= shapely.vectorized.contains(region.shape, lon, lat)
# select windfields in specified period and region
if period is not None:
period = [f"{period[0]}-01-01", f"{period[0]}-12-31"]
windfields = windfields.select(date=period, reg_id=1)
return windfields
def precompute_ibtracs_windfields():
"""This is how you would precompute the IBTrACS windfields in climada_python"""
tracks = TCTracks()
tracks.read_ibtracs_netcdf(year_range=(1950, 2019), estimate_missing=True)
tracks.equal_timestep(time_step_h=1)
fname = 'GLB_0360as_hazard_1950-2019.hdf5'
path = EMULATOR_DATA_DIR.joinpath("windfields", fname)
windfields = TropCyclone()
windfields.set_from_tracks(tracks)
windfields.write_hdf5(path)
norm_period = TC_BASIN_NORM_PERIOD[reg.tc_basin[:2]]
windfields_obs = ibtracs_windfields(reg, period=norm_period)
2020-08-07 09:52:04,652 - climada.hazard.base - INFO - Reading /home/tovogt/code/climada_python/data/emulator/windfields/GLB_0360as_hazard_1950-2015.mat
2020-08-07 09:52:06,129 - climada.hazard.centroids.centr - INFO - Reading /home/tovogt/code/climada_python/data/emulator/windfields/GLB_0360as_hazard_1950-2015.mat
As a database of simulated TC events we use the TC tracks provided by Kerry Emanuel for ISIMIP2b:
[4]:
import pandas as pd
def emanuel_meta():
meta_path = EMULATOR_DATA_DIR.joinpath("emanuel_fnames.csv")
if meta_path.exists():
return pd.read_csv(meta_path)
pattern = "(temp_|Trial(?P<trial>[0-9])_GB_dk)" \
"(?P<gcm>[0-9a-z]+)_?" \
"((?P<rcp>piControl|20th|rcp[0-9]{2})cal)(|_full)_" \
"(?P<hemisphere>N|S)_0360as\.mat"
prog = re.compile(pattern)
df = []
for path in glob.glob(str(EMULATOR_DATA_DIR.joinpath("windfields", "*.mat"))):
m = prog.match(path.name)
try:
haz = h5py.File(path, "r")['hazard']
except OSError:
continue
is_rcp85 = "rcp85" if m.group("trial") is None else m.group("rcp")
df.append({
"basename": fname[:-13],
"windfield_fname": fname,
"minyear": int(haz['yyyy'][0,0]),
"maxyear": int(haz['yyyy'][-1,0]),
"gcm": gcm_trans_inv(m.group("gcm"), is_rcp85),
"rcp": m.group("rcp"),
"hemisphere": m.group("hemisphere"),
"trial": 0 if is_rcp85 == "rcp85" else int(m.group("trial")),
"tracks_per_year": 600 if is_rcp85 == "rcp85" else 300,
})
cols = ["basename", "windfield_fname", "minyear", "maxyear",
"gcm", "rcp", "hemisphere", "trial", "tracks_per_year"]
df = pd.DataFrame(df, columns=cols)
df = df.sort_values(by=["gcm", "rcp", "minyear", "hemisphere"])
df.to_csv(meta_path, index=None)
return df
def emanuel_windfields(region, gcm=None, rcp=None, period=None, trial=None):
""" Load pre-calculated windfields for simulated storm tracks
Parameters
----------
region : TCRegion object
The geographical region to consider. This is not optional since
windfields are separated by hemisphere.
gcm : list of str, optional
Name of GCMs, such as "MPI-ESM-MR".
rcp : list of str, optional
Name of RCPs, such as "rcp26". The historical data ("20th") doesn't need
to be selected explicitly.
period : pair of ints (minyear, maxyear), optional
First and last year to consider.
trial : list of int, optional
Trials to include in the selection. By default, 2 and 3 are excluded
and 0 is only used for rcp85.
Returns
-------
windfields : climada.hazard.TropCyclone object
"""
meta = emanuel_meta()
meta = meta[meta['hemisphere'] == region.hemisphere]
if trial is None:
trial = [1, 4]
if rcp is not None and "rcp85" in rcp:
trial.append(0)
meta = meta[meta['trial'].isin(trial)]
if gcm is not None:
meta = meta[meta['gcm'].isin(gcm)]
if rcp is not None:
meta = meta[(meta['rcp'] == '20th') | meta['rcp'].isin(rcp)]
# intersection with specified period
if period is not None:
meta = meta[(period[0] <= meta['maxyear']) & (meta['minyear'] <= period[1])]
if meta.shape[0] == 0:
raise Exception("Given gcm/rcp/period matches no trials!")
hazards = []
for idx, row in meta.iterrows():
fname = row['windfield_fname']
path = EMULATOR_DATA_DIR.joinpath("windfields", fname)
haz = TropCyclone()
haz.read_mat(path)
haz.event_name = [f"{fname}-{n}" for n in haz.event_name]
# some datasets include centroids beyond 60° that are irrelevant for TC hazards
cutidx = 901186 if region.hemisphere == 'N' else 325229
haz.centroids.region_id = np.zeros_like(haz.centroids.lat)
haz.centroids.region_id[:cutidx] = 1
haz = haz.select(reg_id=1)
hazards.append(haz)
windfields = TropCyclone()
windfields.concatenate(hazards)
# identify centroids in specified region
lat, lon = windfields.centroids.lat, windfields.centroids.lon
windfields.centroids.region_id \
= shapely.vectorized.contains(region.shape, lon, lat)
# select windfields in specified period and region
if period is not None:
period = (f"{period[0]}-01-01", f"{period[1]}-12-31")
windfields = windfields.select(date=period, reg_id=1)
return windfields
# one database for sampling, and one for the statistical calibration (bias correction) according to the chosen climate scenario:
windfields_pool = emanuel_windfields(reg, gcm=["MIROC5"], period=(1950, 2100))
windfields_rcp = emanuel_windfields(reg, gcm=["MIROC5"], rcp=["rcp26"], period=(1950, 2015))
2020-08-07 09:52:09,031 - climada.hazard.base - INFO - Reading /home/tovogt/code/climada_python/data/emulator/windfields/Trial1_GB_dkmiroc_20thcal_N_0360as.mat
2020-08-07 09:52:09,802 - climada.hazard.centroids.centr - INFO - Reading /home/tovogt/code/climada_python/data/emulator/windfields/Trial1_GB_dkmiroc_20thcal_N_0360as.mat
2020-08-07 09:52:13,037 - climada.hazard.base - INFO - Reading /home/tovogt/code/climada_python/data/emulator/windfields/Trial1_GB_dkmiroc_rcp26cal_N_0360as.mat
2020-08-07 09:52:14,463 - climada.hazard.centroids.centr - INFO - Reading /home/tovogt/code/climada_python/data/emulator/windfields/Trial1_GB_dkmiroc_rcp26cal_N_0360as.mat
2020-08-07 09:52:20,945 - climada.hazard.base - INFO - Reading /home/tovogt/code/climada_python/data/emulator/windfields/Trial1_GB_dkmiroc_rcp60cal_N_0360as.mat
2020-08-07 09:52:22,863 - climada.hazard.centroids.centr - INFO - Reading /home/tovogt/code/climada_python/data/emulator/windfields/Trial1_GB_dkmiroc_rcp60cal_N_0360as.mat
2020-08-07 09:52:29,945 - climada.hazard.base - INFO - Reading /home/tovogt/code/climada_python/data/emulator/windfields/Trial1_GB_dkmiroc_rcp85cal_N_0360as.mat
2020-08-07 09:52:31,705 - climada.hazard.centroids.centr - INFO - Reading /home/tovogt/code/climada_python/data/emulator/windfields/Trial1_GB_dkmiroc_rcp85cal_N_0360as.mat
2020-08-07 09:52:41,342 - climada.hazard.base - INFO - Reading /home/tovogt/code/climada_python/data/emulator/windfields/Trial1_GB_dkmiroc_20thcal_N_0360as.mat
2020-08-07 09:52:42,160 - climada.hazard.centroids.centr - INFO - Reading /home/tovogt/code/climada_python/data/emulator/windfields/Trial1_GB_dkmiroc_20thcal_N_0360as.mat
2020-08-07 09:52:45,797 - climada.hazard.base - INFO - Reading /home/tovogt/code/climada_python/data/emulator/windfields/Trial1_GB_dkmiroc_rcp26cal_N_0360as.mat
2020-08-07 09:52:47,438 - climada.hazard.centroids.centr - INFO - Reading /home/tovogt/code/climada_python/data/emulator/windfields/Trial1_GB_dkmiroc_rcp26cal_N_0360as.mat
Extract events that affect the region of interest¶
From the Hazard objects, we extract those events that actually “affect” the georegion of interest and store for each the maximum intensity observed within the region:
[5]:
from climada.hazard.emulator.stats import haz_max_events
# for this example, we regard regions as `affected` if they face at least 34 knots wind speeds
KNOTS_2_MS = 0.514444
MIN_WIND_KT = 34
MIN_WIND_MS = MIN_WIND_KT * KNOTS_2_MS
tc_events_obs = haz_max_events(windfields_obs, min_thresh=MIN_WIND_MS)
tc_events_pool = haz_max_events(windfields_pool, min_thresh=MIN_WIND_MS)
tc_events_rcp = haz_max_events(windfields_rcp, min_thresh=MIN_WIND_MS)
2020-08-07 09:52:55,232 - climada.hazard.emulator.stats - INFO - Condensing 5688 hazards to 316 max events ...
2020-08-07 09:52:56,148 - climada.hazard.emulator.stats - INFO - Condensing 58401 hazards to 15474 max events ...
2020-08-07 09:52:56,908 - climada.hazard.emulator.stats - INFO - Condensing 10908 hazards to 2921 max events ...
From the simulated TC tracks in ISIMIP we can extract a time series of expected global annual TC frequencies under the given RCP:
[6]:
import scipy.io
def emanuel_frequency_normalization(gcm, rcp, period):
""" Frequency normalization factors for given GCM and RCP, in 1950-2100
Parameters
----------
gcm : str
Name of GCM, such as "MPI-ESM-MR".
rcp : str
Name of RCP, such as "rcp26".
period : pair of ints (minyear, maxyear)
First and last year to consider.
Returns
-------
freq_norm : DataFrame { year, freq }
Information about the relative surplus of simulated events, i.e.,
if `freq_norm` specifies the value 0.2 in some year, then it is
assumed that the number of events simulated for that year is 5 times as
large as it is estimated to be.
"""
meta = emanuel_meta()
meta = meta[meta['hemisphere'] == 'N']
meta = meta[(meta['trial'] != 2) & (meta['trial'] != 3)]
if rcp != "rcp85":
meta = meta[meta['trial'] != 0]
meta = meta[(meta['gcm'] == gcm)]
meta = meta[(meta['rcp'] == '20th') | (meta['rcp'] == rcp)]
freq = []
for idx, row in meta.iterrows():
path = EMULATOR_DATA_DIR.joinpath("tracks", f"{row['basename']}.mat")
tracks = scipy.io.loadmat(path, variable_names=['yearstore', 'freqyear'])
freq.append(pd.DataFrame({
'year': np.unique(tracks['yearstore'].ravel()),
'freq': tracks['freqyear'].ravel() / row['tracks_per_year'],
}))
freq = pd.concat(freq, ignore_index=True)
freq = freq[(period[0] <= freq['year']) & (freq['year'] <= period[1])]
freq = freq.sort_values(by=["year"]).reset_index(drop=True)
return freq
freq = emanuel_frequency_normalization("MIROC5", "rcp26", (1950, 2015))
Initialize and calibrate the hazard emulator¶
We have all data that is required to set up a hazard emulator:
[7]:
from climada.hazard.emulator.emulator import EventPool, HazardEmulator
em = HazardEmulator(tc_events_rcp, tc_events_obs, reg, freq, pool=EventPool(tc_events_pool))
2020-08-07 09:52:57,405 - climada.hazard.emulator.random - INFO - Results of intensity normalization by subsampling:
2020-08-07 09:52:57,405 - climada.hazard.emulator.random - INFO - - drop 66% of entries satisfying 'intensity > 37.89053267580431'
2020-08-07 09:52:57,406 - climada.hazard.emulator.random - INFO - - mean intensity of simulated events before dropping is 37.8905
2020-08-07 09:52:57,406 - climada.hazard.emulator.random - INFO - - mean intensity of simulated events after dropping is 33.1730
2020-08-07 09:52:57,406 - climada.hazard.emulator.random - INFO - - mean intensity of observed events is 32.5577
We calibrate the emulator, i.e., we determine a statistical connection between climate indices (GMT and ENSO in this example) and tc_events_rcp
:
[8]:
def climate_index(gcm, rcp, index, running_mean=21):
""" Load time series of a climate index (e.g. GMT) for a given GCM/RCP
The time period is 1861-2100 (1861-2299 for rcp26)
The data is concatenated from historical and future datasets, applying a
21-year running mean in the case of GMT-based indices.
CAUTION: For the running mean, the data is *extended* at the edges by
repeating the edge values; thereby any trend present in the data will
become attenuated at the edges!
GMT data is relative to piControl mean over 500 year reference period.
Parameters
----------
gcm : str
Name of GCM, such as "MPI-ESM-MR".
rcp : str
Name of RCP, such as "rcp26".
index : str
Name of index, one of ["gmt", "gmtTR", "esoi", "nao", "nino34", "pdo"],
GMT : Global mean (surface) temperature
GMT TR : GMT in the tropics, between -30 and +30 degrees latitude
ESOI : El Nino southern oscillation index
NAO : North Atlantic Oscillation
NINO34 : Nino 3.4 sea surface temperature index
PDO : Pacific decadal oscillation
running_mean : int
For GMT data, the running mean period. Defaults to 21.
Returns
-------
ci : DataFrame { year, month, `index` }
Monthly data of given climate index.
"""
index_path = EMULATOR_DATA_DIR.joinpath("climate_index")
base_min, base_max, avg_interval = ({
'gmt': (1971, 2000, ''),
'gmtTR': (1971, 2000, ''),
'esoi': (1950, 1979, '_3m'),
'nao': (1950, 1979, '_3m'),
'nino34': (1950, 1979, '_3m'),
'pdo': (1971, 2000, ''),
})[index]
allmin = 1861
allmax = 2299 if rcp == 'rcp26' else 2100
ci = pd.DataFrame()
periods = [('historical', allmin, 2005), (rcp, 2006, allmax)]
for pname, minyear, maxyear in periods:
fname = f"{index}-index_monthly_{gcm}-{pname}_{minyear}-{maxyear}" \
f"_base-{base_min}-{base_max}{avg_interval}.csv"
path = index_path.joinpath(fname)
if index == 'pdo':
tmp = pd.read_csv(path, delim_whitespace=True, skiprows=1, header=None)
cols = ['time', 'pdo']
tmp.columns = cols
else:
tmp = pd.read_csv(path)
if 'Unnamed: 0' in tmp.columns:
del tmp['Unnamed: 0']
ci = ci.append(tmp).reset_index(drop=True)
year_month_day = ci['time'].str.split("-", expand=True)
ci['year'] = year_month_day[0].astype(int)
ci['month'] = year_month_day[1].astype(int)
ci = ci.drop(labels=['time'], axis=1)
if ci['year'].max() == 2099:
ci2100 = ci[ci['year'] == 2099]
ci2100['year'] = 2100
ci = ci.append(ci2100)
if index in ['gmt', 'gmtTR']:
# define GMT change wrt piControl mean and apply running mean
minyear, maxyear = ({
'GFDL-ESM2M': (1, 500),
'IPSL-CM5A-LR': (1800, 2299),
'MIROC5': (2000, 2499),
'HadGEM2-ES': (1900, 2399),
})[gcm]
fname = f"{index}-index_monthly_{gcm}-piControl_{minyear}-{maxyear}" \
f"_base-{minyear}-{maxyear}.csv"
path = index_path.joinpath(fname)
ci[index] = ci[index] - pd.read_csv(path)['gmt'].mean()
N = running_mean
halfwin = N // 2
padded_data = np.pad(ci[index].to_numpy(), halfwin, mode='edge')
ci[index] = np.convolve(padded_data, np.ones(N)/N, mode='valid')
return ci
ci = [climate_index("MIROC5", "rcp26", ci) for ci in ["gmt", "esoi"]]
em.calibrate_statistics(ci)
Now that the emulator is calibrated, we use GMT and ENSO time series to predict TC statistics under the chosen climate scenario:
[9]:
em.predict_statistics(ci)
2020-08-07 09:52:57,505 - climada.hazard.emulator.emulator - INFO - Predicting TCs with new climate index dataset...
Draw samples according to climate scenario¶
The emulator can now be used to sample hypothetical events within an arbitrary time period covered by the climate index time series used above:
[10]:
draws = em.draw_realizations(100, (2020, 2050))
2020-08-07 09:52:57,527 - climada.hazard.emulator.emulator - INFO - Drawing 100 realizations for period (2020, 2050)
2020 ... 2050 ... 2050
The returned object draws
is a DataFrame
with each row corresponding to a storm event from the hazard pool windfields_pool
(see above): The column real_id
assigns one of 100 realizations to each of the events while the columns id
and name
are the unique ID and name used in windfields_pool
to identify this hazard event. The column year
indicates the year in which the event would occur under the hypothetical climate scenario and will usually differ from the date
associated with the event in windfields_pool
.
[11]:
print(draws[:25])
id name year real_id
0 29344 Trial1_GB_dkmiroc_rcp60cal_N_0360as.mat-6076 2020 0
1 28893 Trial1_GB_dkmiroc_rcp60cal_N_0360as.mat-5272 2020 0
2 50512 Trial1_GB_dkmiroc_rcp85cal_N_0360as.mat-14879 2020 0
3 210 Trial1_GB_dkmiroc_20thcal_N_0360as.mat-328 2020 0
4 30111 Trial1_GB_dkmiroc_rcp60cal_N_0360as.mat-7503 2020 0
5 27807 Trial1_GB_dkmiroc_rcp60cal_N_0360as.mat-3499 2020 0
6 25513 Trial1_GB_dkmiroc_rcp26cal_N_0360as.mat-27994 2020 0
7 8970 Trial1_GB_dkmiroc_20thcal_N_0360as.mat-16451 2020 1
8 41882 Trial1_GB_dkmiroc_rcp60cal_N_0360as.mat-28187 2020 1
9 33033 Trial1_GB_dkmiroc_rcp60cal_N_0360as.mat-12694 2020 1
10 13170 Trial1_GB_dkmiroc_rcp26cal_N_0360as.mat-7212 2020 1
11 37879 Trial1_GB_dkmiroc_rcp60cal_N_0360as.mat-21292 2020 1
12 1252 Trial1_GB_dkmiroc_20thcal_N_0360as.mat-2161 2020 1
13 20287 Trial1_GB_dkmiroc_rcp26cal_N_0360as.mat-19475 2020 2
14 30980 Trial1_GB_dkmiroc_rcp60cal_N_0360as.mat-9074 2020 2
15 32587 Trial1_GB_dkmiroc_rcp60cal_N_0360as.mat-11892 2020 2
16 56887 Trial1_GB_dkmiroc_rcp85cal_N_0360as.mat-25691 2020 2
17 42985 Trial1_GB_dkmiroc_rcp85cal_N_0360as.mat-1651 2020 2
18 54592 Trial1_GB_dkmiroc_rcp85cal_N_0360as.mat-21784 2020 2
19 2511 Trial1_GB_dkmiroc_20thcal_N_0360as.mat-4481 2020 3
20 4082 Trial1_GB_dkmiroc_20thcal_N_0360as.mat-7410 2020 3
21 9685 Trial1_GB_dkmiroc_rcp26cal_N_0360as.mat-960 2020 3
22 50617 Trial1_GB_dkmiroc_rcp85cal_N_0360as.mat-15050 2020 3
23 20518 Trial1_GB_dkmiroc_rcp26cal_N_0360as.mat-19822 2020 3
24 41643 Trial1_GB_dkmiroc_rcp60cal_N_0360as.mat-27806 2020 3