Unsequa - a module for uncertainty and sensitivity analysis#

This is a tutorial for the unsequa module in CLIMADA. A detailed description can be found in Kropf (2021).


Uncertainty and sensitivity analysis#

Before doing an uncertainty quantification in CLIMADA, it is imperative that you first get comfortable with the different notions of uncertainty in the modelling world (see e.g. Pianosi (2016) or Douglas-Smith (2020) for a review). In particular, note that the uncertainty values will only be as good as the input from the user. In addition, not all uncertainties can be numerically quantified, and even worse, some unknowns are unknown. This means that quantifying uncertainty can sometimes lead to false confidence in the output! For a more philosophical discussion about the types of uncertainties in climate research, see Knüsel (2020) and Otth (2022).

In this module, it is possible to perform a global uncertainty analysis as well as a sensitivity analysis. The word global is meant in opposition to the ‘one-factor-at-a-time’ (OAT) strategy. The OAT strategy, which consists of analyzing the effect of varying one model input factor at a time while keeping all others fixed, is popular among modellers but has major shortcomings (Saltelli (2010), Saltelli (2019)) and should not be used.

A rough schema of how to perform uncertainty and sensitivity analysis (taken from Kropf (2021)):

[Figure: Uncertainty and sensitivity analysis workflow]
  1. Kropf, C.M. et al. Uncertainty and sensitivity analysis for global probabilistic weather and climate risk modelling: an implementation in the CLIMADA platform (2021)

  2. Pianosi, F. et al. Sensitivity analysis of environmental models: A systematic review with practical workflow. Environmental Modelling & Software 79, 214–232 (2016)

  3. Douglas-Smith, D., Iwanaga, T., Croke, B. F. W. & Jakeman, A. J. Certain trends in uncertainty and sensitivity analysis: An overview of software tools and techniques. Environmental Modelling & Software 124, 104588 (2020)

  4. Knüsel, B. Epistemological Issues in Data-Driven Modeling in Climate Research. (ETH Zurich, 2020)

  5. Saltelli, A. et al. Why so many published sensitivity analyses are false: A systematic review of sensitivity analysis practices. Environmental Modelling & Software 114, 29–39 (2019)

  6. Saltelli, A. & Annoni, P. How to avoid a perfunctory sensitivity analysis. Environmental Modelling & Software 25, 1508–1517 (2010)

Unsequa Module Structure#

The unsequa module contains several key classes.

The model input parameters and their distribution are specified as

  • InputVar: defines input uncertainty variables

The input parameter sampling, Monte-Carlo uncertainty distribution calculation and the sensitivity index computation are done in

  • CalcImpact: compute uncertainties for outputs of climada.engine.impact.calc (child class of Calc)

  • CalcCostBenefit: compute uncertainties for outputs of climada.engine.cost_benefit.calc (child class of Calc)

The results are stored in

  • UncOutput: store the uncertainty and sensitivity analysis results. It also contains several plotting methods. This class only stores data.

  • UncImpactOutput: subclass with dataframes specifically for climada.engine.impact.calc uncertainty and sensitivity analysis results.

  • UncCostBenefitOutput: subclass with dataframes specifically for climada.engine.cost_benefit.calc uncertainty and sensitivity analysis results.
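Taken together, a typical unsequa workflow chains these classes as follows (a minimal sketch; exp_iv, impf_iv and haz are constructed as in the CalcImpact section below):

from climada.engine.unsequa import CalcImpact

# exp_iv, impf_iv: InputVar objects; haz: a Hazard (see the CalcImpact section)
calc_imp = CalcImpact(exp_iv, impf_iv, haz)
output = calc_imp.make_sample(N=2**7)    # sample the input parameters -> UncOutput
output = calc_imp.uncertainty(output)    # propagate: distributions of the impact metrics
output = calc_imp.sensitivity(output)    # compute the sensitivity indices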

InputVar#

The InputVar class is used to define uncertainty variables.

Attribute

Type

Description

func

function

Model variable defined as a function of the uncertainty input parameters

distr_dict

dict

Dictionary of the probability density distributions of the uncertainty input parameters

An input uncertainty parameter is a numerical input value that has a certain probability density distribution in your model, such as the total exposure asset value, the slope of the vulnerability function, the exponents of the LitPop exposure, the value of the discount rate, or the cost of an adaptation measure.

The probability density distributions (values of distr_dict) of the input uncertainty parameters (keyword arguments of func and keys of distr_dict) can be any of the ones defined in scipy.stats.

Several helper methods exist to make generic InputVar objects for Exposures, ImpactFuncSet, Hazard, and Entity (including DiscRates and Measures). These are described in detail in the tutorial Helper methods for InputVar and are a good basis for your own computations.
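For instance, a ±10% uncertainty on the total asset value can be obtained directly from the exposures helper (a sketch assuming the bounds_totval argument described in that tutorial; exp_base is an Exposures object as defined below):

from climada.engine.unsequa import InputVar
# Assumed helper call: bounds_totval gives a uniform multiplicative
# factor on the total asset value of the exposures in the list
exp_iv_helper = InputVar.exp([exp_base], bounds_totval=[0.9, 1.1])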

Example - custom continuous uncertainty parameter#

Suppose we assume that the GDP value used to scale the exposure has a relative error of ±10%, i.e. a multiplicative factor uniformly distributed on [0.9, 1.1].

import warnings
warnings.filterwarnings('ignore') #Ignore warnings for making the tutorial's pdf. 

#Define the base exposure
from climada.util.constants import EXP_DEMO_H5
from climada.entity import Exposures
exp_base = Exposures.from_hdf5(EXP_DEMO_H5)
2023-08-03 13:57:46,512 - climada.entity.exposures.base - INFO - Reading /Users/ckropf/climada/demo/data/exp_demo_today.h5
# Define the function that returns an exposure with scaled total asset value
# Here x_exp is the input uncertainty parameter and exp_func the inputvar.func.
def exp_func(x_exp, exp_base=exp_base):
    exp = exp_base.copy()
    exp.gdf.value *= x_exp
    return exp
# Define the Uncertainty Variable with +-10% total asset value
# The probability density distribution of the input uncertainty parameter x_exp is
# sp.stats.uniform(0.9, 0.2), i.e. uniform on [0.9, 1.1] (scipy's uniform takes loc and scale)
from climada.engine.unsequa import InputVar
import scipy as sp

exp_distr = {"x_exp": sp.stats.uniform(0.9, 0.2),
              }
exp_iv = InputVar(exp_func, exp_distr)
# Uncertainty parameters
exp_iv.labels
['x_exp']
# Evaluate for a given value of the uncertainty parameters
exp095 = exp_iv.func(x_exp = 0.95)
print(f"Base value is {exp_base.gdf['value'].sum()}, and the value for x_exp=0.95 is {exp095.gdf['value'].sum()}")
Base value is 657053294559.9105, and the value for x_exp=0.95 is 624200629831.9148
# Defined distribution
exp_iv.plot(figsize=(5, 3));

Example - custom categorical uncertainty parameter#

Suppose we want to test different exponents (m=1,2; n=1,2) for the LitPop exposure of the country Switzerland.

from climada.entity import LitPop

m_min, m_max = (1, 2)
n_min, n_max = (1, 2)

# Define the function
# Note that this works, but might be slow because LitPop.from_countries is called
# every time the function is evaluated, and LitPop is relatively slow.
def litpop_cat(m, n):
    exp = LitPop.from_countries('CHE', res_arcsec=150, exponents=[m, n])
    return exp
# A faster method is to first create a dictionary with all the exposures. This, however,
# requires more memory and precomputation time (here ~3-4 mins)
exp = LitPop()
litpop_dict = {}
for m in range(m_min, m_max + 1):
    for n in range(n_min, n_max + 1):
        exp_mn = LitPop.from_countries('CHE', res_arcsec=150, exponents=[m, n]);
        litpop_dict[(m, n)] = exp_mn
        
def litpop_cat(m, n, litpop_dict=litpop_dict):
    return litpop_dict[(m, n)]
#Define the distribution dictionary
import scipy as sp
from climada.engine.unsequa import InputVar

distr_dict = {
    'm': sp.stats.randint(low=m_min, high=m_max+1),
    'n': sp.stats.randint(low=n_min, high=n_max+1)
    }

cat_iv = InputVar(litpop_cat, distr_dict) # One can use either of the above definitions of litpop_cat
# Uncertainty parameters
cat_iv.labels
['m', 'n']
cat_iv.evaluate(m=1, n=2).plot_raster();
2023-08-03 13:57:57,563 - climada.util.coordinates - INFO - Raster from resolution 0.04166666000000063 to 0.04166666000000063.
cat_iv.plot(figsize=(10, 3));

UncOutput#

The UncOutput class is used to store data from sampling, uncertainty and sensitivity analysis. An UncOutput object can be saved to and loaded from .hdf5 files. The classes UncImpactOutput and UncCostBenefitOutput are extensions of UncOutput specific to CalcImpact and CalcCostBenefit, respectively.
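Saving and reloading is one line each (a sketch, assuming the to_hdf5 counterpart of the from_hdf5 method used below; 'my_unc_output.hdf5' is a placeholder path):

from climada.engine.unsequa import UncOutput
# unc_output: an existing UncOutput object
unc_output.to_hdf5('my_unc_output.hdf5')                # save to file
unc_output = UncOutput.from_hdf5('my_unc_output.hdf5')  # load it back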

Data attributes

| Attribute | Type | Description |
| --- | --- | --- |
| samples_df | pandas.DataFrame | Each row represents a sample obtained from the input parameters (one per column) distributions |

UncImpactOutput

| Attribute | Type | Description |
| --- | --- | --- |
| aai_agg_unc_df | pandas.DataFrame | Uncertainty data for aai_agg |
| freq_curve_unc_df | pandas.DataFrame | Uncertainty data for freq_curve. One return period per column. |
| eai_exp_unc_df | pandas.DataFrame | Uncertainty data for eai_exp. One exposure point per column. |
| at_event_unc_df | pandas.DataFrame | Uncertainty data for at_event. One event per column. |

UncCostBenefitOutput

| Attribute | Type | Description |
| --- | --- | --- |
| imp_meas_present_unc_df | pandas.DataFrame | Uncertainty data for imp_meas_present. One measure per column. |
| imp_meas_future_unc_df | pandas.DataFrame | Uncertainty data for imp_meas_future. One measure per column. |
| tot_climate_risk_unc_df | pandas.DataFrame | Uncertainty data for tot_climate_risk. |
| benefit_unc_df | pandas.DataFrame | Uncertainty data for benefit. One measure per column. |
| cost_ben_ratio_unc_df | pandas.DataFrame | Uncertainty data for cost_ben_ratio. One measure per column. |
| cost_benefit_kwargs | dict | Keyword arguments for climada.engine.cost_benefit.calc. |

Metadata and input data attributes

These attributes are used for book-keeping and characterize the sample, uncertainty and sensitivity data. These attributes are set by the methods from classes CalcImpact and CalcCostBenefit used to generate sample, uncertainty and sensitivity data.

| Attribute | Type | Description |
| --- | --- | --- |
| sampling_method | str | The sampling method as defined in SALib. Possible choices: ‘saltelli’, ‘fast_sampler’, ‘latin’, ‘morris’, ‘dgsm’, ‘ff’ |
| sampling_kwargs | dict | Keyword arguments for the sampling_method. |
| n_samples | int | Effective number of samples (number of rows of samples_df) |
| param_labels | list(str) | Names of all the uncertainty input parameters |
| problem_sa | dict | The description of the uncertainty variables and their distributions as used in SALib. |
| sensitivity_method | str | Sensitivity analysis method from SALib.analyze. Possible choices: ‘fast’, ‘rbd_fast’, ‘morris’, ‘sobol’, ‘delta’, ‘ff’. Note that in SALib, sampling methods and sensitivity analysis methods should be used in specific pairs. |
| sensitivity_kwargs | dict | Keyword arguments for sensitivity_method. |
| unit | str | Unit of the exposures value |
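After an analysis, these attributes can be read directly off the UncOutput object, e.g. to check which sampling/sensitivity pair was used (default: ‘saltelli’ with ‘sobol’):

# output_imp: an UncOutput produced by CalcImpact (see the sections below)
print(output_imp.sampling_method, output_imp.sensitivity_method)
print(output_imp.n_samples, output_imp.param_labels)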

Example from file#

Here we show an example loaded from file. This class is used extensively in the sections below, where further examples can be found.

# Download the test file from the API
# Requires internet connection
from climada.util.constants import TEST_UNC_OUTPUT_IMPACT
from climada.util.api_client import Client
apiclient = Client()
ds = apiclient.get_dataset_info(name=TEST_UNC_OUTPUT_IMPACT, status='test_dataset')
_target_dir, [filename] = apiclient.download_dataset(ds)
# If you produced your own data, you do not need the API. Just replace 'filename' with the path to your file.
from climada.engine.unsequa import UncOutput
unc_imp = UncOutput.from_hdf5(filename)
2023-08-03 13:58:02,377 - climada.engine.unsequa.unc_output - INFO - Reading /Users/ckropf/climada/data/unc_output/unc_output_impact/test_unc_output_impact/v1/test_unc_output_impact.hdf5
unc_imp.plot_uncertainty(metric_list=['aai_agg'], figsize=(12,5));
# Download the test file from the API
# Requires internet connection
from climada.util.constants import TEST_UNC_OUTPUT_COSTBEN
from climada.util.api_client import Client
apiclient = Client()
ds = apiclient.get_dataset_info(name=TEST_UNC_OUTPUT_COSTBEN, status='test_dataset')
_target_dir, [filename] = apiclient.download_dataset(ds)
# If you produced your own data, you do not need the API. Just replace 'filename' with the path to your file.
from climada.engine.unsequa import UncOutput
unc_cb = UncOutput.from_hdf5(filename)
2023-08-03 13:58:03,127 - climada.engine.unsequa.unc_output - INFO - Reading /Users/ckropf/climada/data/unc_output/unc_output_costben/test_unc_output_costben/v1/test_unc_output_costben.hdf5
unc_cb.get_uncertainty().tail()
Mangroves Benef Beach nourishment Benef Seawall Benef Building code Benef Mangroves CostBen Beach nourishment CostBen Seawall CostBen Building code CostBen no measure - risk - future no measure - risk_transf - future ... Beach nourishment - cost_ins - future Seawall - risk - future Seawall - risk_transf - future Seawall - cost_meas - future Seawall - cost_ins - future Building code - risk - future Building code - risk_transf - future Building code - cost_meas - future Building code - cost_ins - future tot_climate_risk
35 2.375510e+08 1.932608e+08 234557.682554 1.584398e+08 6.347120 10.277239 4.350910e+04 66.742129 6.337592e+08 0.0 ... 1 6.335246e+08 0 1.020539e+10 1 4.753194e+08 0 1.057461e+10 1 6.337592e+08
36 9.272772e+07 7.643803e+07 9554.257314 5.501366e+07 16.260133 25.984286 1.068151e+06 192.217876 2.200547e+08 0.0 ... 1 2.200451e+08 0 1.020539e+10 1 1.650410e+08 0 1.057461e+10 1 2.200547e+08
37 1.464219e+08 1.179927e+08 192531.748810 8.979471e+07 10.297402 16.833137 5.300629e+04 117.764285 3.591788e+08 0.0 ... 1 3.589863e+08 0 1.020539e+10 1 2.693841e+08 0 1.057461e+10 1 3.591788e+08
38 9.376369e+07 7.722882e+07 10681.112247 5.555413e+07 12.965484 20.736269 7.703765e+05 153.475031 2.222165e+08 0.0 ... 1 2.222058e+08 0 8.228478e+09 1 1.666624e+08 0 8.526172e+09 1 2.222165e+08
39 9.376369e+07 7.722882e+07 10681.112247 5.555413e+07 16.080478 25.718218 9.554617e+05 190.347852 2.222165e+08 0.0 ... 1 2.222058e+08 0 1.020539e+10 1 1.666624e+08 0 1.057461e+10 1 2.222165e+08

5 rows × 29 columns

CalcImpact#

Set the InputVars#

In this example, we model the impact function for tropical cyclones with the parametric function suggested in Emanuel (2015), which has 4 parameters. The exposures' total value varies between 80% and 120%. For the hazard, we assume to have no good error estimate and thus do not define an InputVar for it.

#Define the input variable functions
import numpy as np

from climada.entity import ImpactFunc, ImpactFuncSet, Exposures
from climada.util.constants import EXP_DEMO_H5, HAZ_DEMO_H5
from climada.hazard import Hazard

def impf_func(G=1, v_half=84.7, vmin=25.7, k=3, _id=1):
    
    def xhi(v, v_half, vmin):
        return max([(v - vmin), 0]) / (v_half - vmin)

    def sigmoid_func(v, G, v_half, vmin, k):
        return G * xhi(v, v_half, vmin)**k / (1 + xhi(v, v_half, vmin)**k)

    #In-function imports needed only for parallel computing on Windows
    import numpy as np 
    from climada.entity import ImpactFunc, ImpactFuncSet 
    intensity_unit = 'm/s'
    intensity = np.linspace(0, 150, num=100)
    mdd = np.repeat(1, len(intensity))
    paa = np.array([sigmoid_func(v, G, v_half, vmin, k) for v in intensity])
    imp_fun = ImpactFunc("TC", _id, intensity, mdd, paa, intensity_unit)
    imp_fun.check()
    impf_set = ImpactFuncSet([imp_fun])
    return impf_set

haz = Hazard.from_hdf5(HAZ_DEMO_H5)
exp_base = Exposures.from_hdf5(EXP_DEMO_H5)
#It is a good idea to assign the centroids to the base exposures in order to avoid repeating this
# potentially costly operation for each sample.
exp_base.assign_centroids(haz) 
def exp_base_func(x_exp, exp_base):
    exp = exp_base.copy()
    exp.gdf.value *= x_exp
    return exp
from functools import partial
exp_func = partial(exp_base_func, exp_base=exp_base)
2023-08-03 13:58:03,258 - climada.hazard.base - INFO - Reading /Users/ckropf/climada/demo/data/tc_fl_1990_2004.h5
2023-08-03 13:58:03,286 - climada.entity.exposures.base - INFO - Reading /Users/ckropf/climada/demo/data/exp_demo_today.h5
2023-08-03 13:58:03,306 - climada.entity.exposures.base - INFO - Matching 50 exposures with 2500 centroids.
2023-08-03 13:58:03,307 - climada.util.coordinates - INFO - No exact centroid match found. Reprojecting coordinates to nearest neighbor closer than the threshold = 100
# Visualization of the parametrized impact function
impf_func(G=0.8, v_half=80, vmin=30,k=5).plot();
#Define the InputVars

import scipy as sp
from climada.engine.unsequa import InputVar

exp_distr = {"x_exp": sp.stats.beta(10, 1.1)} #This is not really a reasonable distribution but is used 
                                              #here to show that you can use any scipy distribution.
              
exp_iv = InputVar(exp_func, exp_distr)

impf_distr = {
    "G": sp.stats.truncnorm(0.5, 1.5),
    "v_half": sp.stats.uniform(35, 65),
    "vmin": sp.stats.uniform(0, 15),
    "k": sp.stats.uniform(1, 4)
    }
impf_iv = InputVar(impf_func, impf_distr)
import matplotlib.pyplot as plt
ax = exp_iv.plot(figsize=(6,4));
plt.yticks(fontsize=16);
plt.xticks(fontsize=16);

Compute uncertainty and sensitivity using default methods#

First, we define the CalcImpact object with our uncertainty variables.

from climada.engine.unsequa import CalcImpact

calc_imp = CalcImpact(exp_iv, impf_iv, haz)

Next, we generate samples for the uncertainty parameters using the default methods. Note that depending on the chosen SALib method, the effective number of samples differs from the input value N. For the default ‘saltelli’, with calc_second_order=True, the effective number is N(2D+2), with D the number of uncertainty parameters. See SALib for more information.
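As a quick check of this formula: below we use N=2**7=128 and D=5 parameters (x_exp, G, v_half, vmin, k), so we expect 128 × (2·5 + 2) = 1536 effective samples, matching the log output of the next cell.

N, D = 2**7, 5
print(N * (2 * D + 2))  # 1536 effective 'saltelli' samples with calc_second_order=True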

output_imp = calc_imp.make_sample(N=2**7, sampling_kwargs={'skip_values': 2**8})
output_imp.get_samples_df().tail()
2023-08-03 13:58:05,110 - climada.engine.unsequa.calc_base - INFO - Effective number of made samples: 1536
x_exp G v_half vmin k
1531 0.876684 1.242977 53.662109 2.080078 4.539062
1532 0.876684 0.790617 44.013672 2.080078 4.539062
1533 0.876684 0.790617 53.662109 13.681641 4.539062
1534 0.876684 0.790617 53.662109 2.080078 3.960938
1535 0.876684 0.790617 53.662109 2.080078 4.539062

The resulting samples can be visualized in plots.

output_imp.plot_sample(figsize=(15,8));

Now we can compute the value of the impact metrics for all the samples. In this example, we additionally chose to restrict the return periods to 50, 100, and 250 years. By default, eai_exp and at_event are not stored.
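If you need them, they can be switched on with the corresponding flags (demonstrated in the non-default example further below):

# Variant that additionally stores the per-exposure-point and per-event distributions:
# output_imp = calc_imp.uncertainty(output_imp, rp=[50, 100, 250],
#                                   calc_eai_exp=True, calc_at_event=True)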

output_imp = calc_imp.uncertainty(output_imp, rp = [50, 100, 250])
2023-08-03 13:58:06,602 - climada.entity.exposures.base - INFO - Exposures matching centroids already found for TC
2023-08-03 13:58:06,604 - climada.engine.impact_calc - INFO - Calculating impact for 250 assets (>0) and 216 events.
2023-08-03 13:58:06,605 - climada.engine.impact_calc - INFO - cover and/or deductible columns detected, going to calculate insured impact
2023-08-03 13:58:06,611 - climada.engine.impact - WARNING - The Impact.tot_value attribute is deprecated.Use Exposures.affected_total_value to calculate the affected total exposure value based on a specific hazard intensity threshold
2023-08-03 13:58:06,612 - climada.engine.unsequa.calc_base - INFO - 

Estimated computaion time: 0:00:21.811200

The distributions of the metric outputs are stored as dictionaries of pandas DataFrames. The metrics are taken directly from the output of climada.engine.impact.calc. One DataFrame is made for each metric.

#All the computed uncertainty metrics attribute
output_imp.uncertainty_metrics
['aai_agg', 'freq_curve', 'tot_value']
#One uncertainty dataframe
output_imp.get_unc_df('aai_agg').tail()
aai_agg
1531 2.905571e+09
1532 3.755172e+09
1533 1.063119e+09
1534 2.248718e+09
1535 1.848139e+09

Accessing the uncertainty is in general done via the method get_uncertainty(). If no metrics are specified, all are returned.

output_imp.get_uncertainty().tail()
aai_agg rp50 rp100 rp250 tot_value
1531 2.905571e+09 8.324391e+10 1.162643e+11 1.510689e+11 5.760281e+11
1532 3.755172e+09 1.096005e+11 1.460838e+11 1.809413e+11 5.760281e+11
1533 1.063119e+09 2.892734e+10 4.720869e+10 6.807561e+10 5.760281e+11
1534 2.248718e+09 6.468855e+10 8.653474e+10 1.085266e+11 5.760281e+11
1535 1.848139e+09 5.294874e+10 7.395191e+10 9.609003e+10 5.760281e+11

The distributions of the one-dimensional metrics (eai_exp and at_event are never shown with this method) can be visualized with plots.

output_imp.plot_uncertainty(figsize=(12,12));
# Specific plot for the return period distributions
output_imp.plot_rp_uncertainty(figsize=(14.3,8));
No artists with labels found to put in legend.  Note that artists whose label start with an underscore are ignored when legend() is called with no argument.

Now that a distribution of the impact metrics has been computed for each sample, we can also compute the sensitivity indices for each metric to each uncertainty parameter. Note that the chosen method for the sensitivity analysis should correspond to its sampling partner as defined in the SALib package.

The sensitivity index dictionaries output by the SALib methods are stored in the same structure of nested dictionaries as the metric distributions. Note that depending on the chosen sensitivity analysis method, the returned dictionary will contain specific types of sensitivity indices with specific names. Please get familiar with SALib for more information.

Note that in our case, several of the second-order sensitivity indices are negative. For the default method sobol, this indicates that the algorithm has not converged and cannot give reliable values for these sensitivity indices. If this happens, please use a larger number of samples. Here we will focus on the first-order indices.
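Such a re-run with a larger sample is a one-parameter change (a sketch, not executed here; the computation time grows proportionally with the sample size):

# output_imp_big = calc_imp.make_sample(N=2**10, sampling_kwargs={'skip_values': 2**11})
# output_imp_big = calc_imp.uncertainty(output_imp_big, rp=[50, 100, 250])
# output_imp_big = calc_imp.sensitivity(output_imp_big)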

output_imp = calc_imp.sensitivity(output_imp)
/Users/ckropf/opt/anaconda3/envs/climada_333_shapely2/lib/python3.9/site-packages/pandas/core/dtypes/common.py:1687: DeprecationWarning: Converting `np.inexact` or `np.floating` to a dtype is deprecated. The current result is `float64` which is not strictly correct.
  npdtype = np.dtype(dtype)
(warning repeated five times)

Similarly to the uncertainty case, the data is stored in dataframe attributes.

output_imp.sensitivity_metrics
['aai_agg', 'freq_curve', 'tot_value']
output_imp.get_sens_df('aai_agg').tail()
si param param2 aai_agg
65 S2_conf k x_exp NaN
66 S2_conf k G NaN
67 S2_conf k v_half NaN
68 S2_conf k vmin NaN
69 S2_conf k k NaN

To obtain the sensitivity in terms of a particular sensitivity index, use the method get_sensitivity(). If none is specified, the value of the index for all metrics is returned.

output_imp.get_sensitivity('S1')
si param param2 aai_agg rp50 rp100 rp250 tot_value
0 S1 x_exp None 0.001040 0.000993 0.000930 0.001150 1.005253
1 S1 G None 0.073408 0.075781 0.084662 0.093718 0.000000
2 S1 v_half None 0.514220 0.553640 0.596659 0.619366 0.000000
3 S1 vmin None 0.012642 0.014407 0.012068 0.010065 0.000000
4 S1 k None 0.213491 0.189862 0.134867 0.095861 0.000000

Sometimes, it is useful to simply know the largest sensitivity index for each metric.

output_imp.get_largest_si(salib_si='S1')
metric param param2 si
0 aai_agg v_half None 0.514220
1 rp50 v_half None 0.553640
2 rp100 v_half None 0.596659
3 rp250 v_half None 0.619366
4 tot_value x_exp None 1.005253

The value of the sensitivity indices can be plotted for each metric that is one-dimensional (eai_exp and at_event are not shown in this plot).

We see that both the errors in freq_curve and in aai_agg are mostly determined by x_exp and v_half. Finally, we see small differences in the sensitivity of the different return periods.

# Default for 'sobol' is to plot 'S1' sensitivity index.
output_imp.plot_sensitivity(figsize=(12,8));

Instead of showing the first-order sensitivity ‘S1’, we can plot the total sensitivity ‘ST’.

output_imp.plot_sensitivity(salib_si = 'ST', figsize=(12,8));

One can also visualize the second-order sensitivity indices in the form of a correlation matrix.

output_imp.plot_sensitivity_second_order(figsize=(12,8));

A few non-default parameters#

We shall use the same uncertainty variables as in the previous section but show a few possibilities to use non-default method arguments.

# Sampling method 'latin' (latin hypercube) instead of 'saltelli'.
from climada.engine.unsequa import CalcImpact

calc_imp2 = CalcImpact(exp_iv, impf_iv, haz)
output_imp2 = calc_imp2.make_sample(N=1000, sampling_method='latin')
2023-08-03 13:58:25,948 - climada.engine.unsequa.calc_base - INFO - Effective number of made samples: 1000
output_imp2.plot_sample(figsize=(15,8));
# Compute also the distribution of the metric `eai_exp`
# To speed up the computations, we can use more than one process
# Note that for large datasets a single process might be more efficient
import time

calc_imp2 = CalcImpact(exp_iv, impf_iv, haz)
output_imp2 = calc_imp2.make_sample(N=1000, sampling_method='latin')

start = time.time()
output_imp2 = calc_imp2.uncertainty(output_imp2, rp = [50, 100, 250], calc_eai_exp=True, calc_at_event=True, processes=4)
end = time.time()
time_passed = end-start
print(f'Time passed with pool: {time_passed}')
2023-08-03 13:58:29,176 - climada.engine.unsequa.calc_base - INFO - Effective number of made samples: 1000
2023-08-03 13:58:29,183 - climada.entity.exposures.base - INFO - Exposures matching centroids already found for TC
2023-08-03 13:58:29,186 - climada.engine.impact_calc - INFO - Calculating impact for 250 assets (>0) and 216 events.
2023-08-03 13:58:29,187 - climada.engine.impact_calc - INFO - cover and/or deductible columns detected, going to calculate insured impact
2023-08-03 13:58:29,199 - climada.engine.impact - WARNING - The Impact.tot_value attribute is deprecated.Use Exposures.affected_total_value to calculate the affected total exposure value based on a specific hazard intensity threshold
2023-08-03 13:58:29,200 - climada.engine.unsequa.calc_base - INFO - 

Estimated computaion time: 0:00:05.650000

Time passed with pool: 9.783061027526855
from climada.engine.unsequa import CalcImpact
import time
calc_imp2 = CalcImpact(exp_iv, impf_iv, haz)
output_imp2 = calc_imp2.make_sample(N=1000, sampling_method='latin')

start2 = time.time()
output_imp2 = calc_imp2.uncertainty(output_imp2, rp = [50, 100, 250], calc_eai_exp=True, calc_at_event=True)
end2 = time.time()
time_passed_nopool = end2-start2
print(f'Time passed without pool: {time_passed_nopool}')
2023-08-03 13:58:40,759 - climada.engine.unsequa.calc_base - INFO - Effective number of made samples: 1000
2023-08-03 13:58:40,770 - climada.entity.exposures.base - INFO - Exposures matching centroids already found for TC
2023-08-03 13:58:40,773 - climada.engine.impact_calc - INFO - Calculating impact for 250 assets (>0) and 216 events.
2023-08-03 13:58:40,775 - climada.engine.impact_calc - INFO - cover and/or deductible columns detected, going to calculate insured impact
2023-08-03 13:58:40,787 - climada.engine.impact - WARNING - The Impact.tot_value attribute is deprecated.Use Exposures.affected_total_value to calculate the affected total exposure value based on a specific hazard intensity threshold
2023-08-03 13:58:40,788 - climada.engine.unsequa.calc_base - INFO - 

Estimated computaion time: 0:00:26.300000

Time passed without pool: 12.707012176513672
# Add the original value of the impacts (without uncertainty) to the uncertainty plot
from climada.engine import ImpactCalc
imp = ImpactCalc(exp_base, impf_func(), haz).impact(assign_centroids=False)
aai_agg_o = imp.aai_agg
freq_curve_o = imp.calc_freq_curve([50, 100, 250]).impact
orig_list = [aai_agg_o] + list(freq_curve_o) +[1]
2023-08-03 13:58:53,476 - climada.engine.impact_calc - INFO - Calculating impact for 250 assets (>0) and 216 events.
2023-08-03 13:58:53,478 - climada.engine.impact_calc - INFO - cover and/or deductible columns detected, going to calculate insured impact
# plot the aai_agg and freq_curve uncertainty only
# use logarithmic x-scale
output_imp2.plot_uncertainty(metric_list=['aai_agg', 'freq_curve'], orig_list=orig_list, log=True, figsize=(12,8));
# Use the method 'rbd_fast', which is recommended in pair with 'latin'. In addition,
# change one of the kwargs (M=15) of the SALib method.
output_imp2 = calc_imp2.sensitivity(output_imp2, sensitivity_method='rbd_fast', sensitivity_kwargs = {'M': 15})

Since we computed the distribution and sensitivity indices for the total impact at each exposure point, we can plot a map of the largest sensitivity index in each exposure location. For every location, the most sensitive parameter is v_half, meaning that the average annual impact at each location is most sensitive to the uncertainty in the impact function slope scaling parameter.

output_imp2.plot_sensitivity_map();
output_imp2.get_largest_si(salib_si='S1', metric_list=['eai_exp']).tail()
metric param param2 si
45 45 v_half None 0.472143
46 46 v_half None 0.472143
47 47 v_half None 0.472143
48 48 v_half None 0.467659
49 49 v_half None 0.472143

CalcCostBenefit#

The uncertainty and sensitivity analysis for CostBenefit is completely analogous to the Impact case. It is slightly more complex, as there are more input variables.

Set the Input Vars#

import copy
from climada.util.constants import ENT_DEMO_TODAY, ENT_DEMO_FUTURE, HAZ_DEMO_H5
from climada.entity import Entity
from climada.hazard import Hazard

# Entity today has an uncertainty in the total asset value
def ent_today_func(x_ent):
    #In-function imports needed only for parallel computing on Windows
    from climada.entity import Entity 
    from climada.util.constants import ENT_DEMO_TODAY 
    entity = Entity.from_excel(ENT_DEMO_TODAY)
    entity.exposures.ref_year = 2018
    entity.exposures.gdf.value *= x_ent
    return entity

# Entity in the future has a +- 10% uncertainty in the cost of all the adaptation measures
def ent_fut_func(m_fut_cost):
    #In-function imports needed only for parallel computing on Windows
    from climada.entity import Entity 
    from climada.util.constants import ENT_DEMO_FUTURE 
    entity = Entity.from_excel(ENT_DEMO_FUTURE)
    entity.exposures.ref_year = 2040  
    for meas in entity.measures.get_measure('TC'):
        meas.cost *= m_fut_cost
    return entity

haz_base = Hazard.from_hdf5(HAZ_DEMO_H5)
# The hazard intensity in the future is also uncertain, modelled by a multiplicative factor
def haz_fut(x_haz_fut, haz_base):
    #In-function imports needed only for parallel computing on Windows
    import copy 
    from climada.hazard import Hazard 
    from climada.util.constants import HAZ_DEMO_H5 
    haz = copy.deepcopy(haz_base)
    haz.intensity = haz.intensity.multiply(x_haz_fut)
    return haz
from functools import partial
haz_fut_func = partial(haz_fut, haz_base=haz_base)
2023-08-03 14:00:12,145 - climada.hazard.base - INFO - Reading /Users/ckropf/climada/demo/data/tc_fl_1990_2004.h5

Check that costs for measures are changed as desired.

costs_1 = [meas.cost for meas in ent_fut_func(1).measures.get_measure('TC')]
costs_05 = [meas.cost for meas in ent_fut_func(0.5).measures.get_measure('TC')]
print(f"\nThe cost for m_fut_cost=1 are {costs_1}\n"
     f"The cost for m_fut_cost=0.5 are {costs_05}");
The cost for m_fut_cost=1 are [1311768360.8515418, 1728000000.0, 8878779433.630093, 9200000000.0]
The cost for m_fut_cost=0.5 are [655884180.4257709, 864000000.0, 4439389716.815046, 4600000000.0]

Define the InputVars

import scipy as sp
from climada.engine.unsequa import InputVar

haz_today = haz_base

haz_fut_distr = {"x_haz_fut": sp.stats.uniform(1, 3),
              }
haz_fut_iv = InputVar(haz_fut_func, haz_fut_distr)

ent_today_distr = {"x_ent":  sp.stats.uniform(0.7, 1)}
ent_today_iv = InputVar(ent_today_func, ent_today_distr)

ent_fut_distr = {"m_fut_cost":  sp.stats.norm(1, 0.1)}
ent_fut_iv = InputVar(ent_fut_func, ent_fut_distr)
ent_avg = ent_today_iv.evaluate()
ent_avg.exposures.gdf.head()
latitude longitude value deductible cover impf_TC Value_2010
0 26.933899 -80.128799 1.671301e+10 0 1.392750e+10 1 5.139301e+09
1 26.957203 -80.098284 1.511528e+10 0 1.259606e+10 1 4.647994e+09
2 26.783846 -80.748947 1.511528e+10 0 1.259606e+10 1 4.647994e+09
3 26.645524 -80.550704 1.511528e+10 0 1.259606e+10 1 4.647994e+09
4 26.897796 -80.596929 1.511528e+10 0 1.259606e+10 1 4.647994e+09

Compute cost benefit uncertainty and sensitivity using default methods#

For examples of how to use non-default arguments, please see the impact example above.

from climada.engine.unsequa import CalcCostBenefit

unc_cb = CalcCostBenefit(haz_input_var=haz_today, ent_input_var=ent_today_iv,
                     haz_fut_input_var=haz_fut_iv, ent_fut_input_var=ent_fut_iv)
output_cb = unc_cb.make_sample(N=10, sampling_kwargs={'calc_second_order':False})
output_cb.get_samples_df().tail()
2023-08-03 14:00:13,114 - climada.engine.unsequa.calc_base - INFO - Effective number of made samples: 50
x_ent x_haz_fut m_fut_cost
45 1.35625 2.96875 0.813727
46 1.04375 2.96875 0.813727
47 1.35625 2.03125 0.813727
48 1.35625 2.96875 0.899001
49 1.04375 2.03125 0.899001

Note that with calc_second_order=False, the effective number of ‘saltelli’ samples is N(D+2); here 10 × (3 + 2) = 50, as reported in the log above. For longer computations, it is possible to use a pool for parallel computation.

#without pool
output_cb = unc_cb.uncertainty(output_cb)

#with pool
output_cb = unc_cb.uncertainty(output_cb, processes=4)

The output of CostBenefit.calc is rather complex in its structure. The metrics dictionary inherits this complexity.

#Top level metrics keys
macro_metrics = output_cb.uncertainty_metrics
macro_metrics
['imp_meas_present',
 'imp_meas_future',
 'tot_climate_risk',
 'benefit',
 'cost_ben_ratio']
# The benefits and cost_ben_ratio are available for each measure
output_cb.get_uncertainty(metric_list=['benefit', 'cost_ben_ratio']).tail()
Mangroves Benef Beach nourishment Benef Seawall Benef Building code Benef Mangroves CostBen Beach nourishment CostBen Seawall CostBen Building code CostBen
45 8.670468e+09 6.722992e+09 6.214684e+08 3.926190e+10 0.123110 0.209151 11.625533 0.190676
46 8.549601e+09 6.624301e+09 6.214684e+08 3.920155e+10 0.124850 0.212267 11.625533 0.190969
47 1.455086e+10 1.152385e+10 6.206260e+07 1.901856e+10 0.073358 0.122018 116.413127 0.393631
48 8.670468e+09 6.722992e+09 6.214684e+08 3.926190e+10 0.136011 0.231069 12.843826 0.210657
49 1.443000e+10 1.142516e+10 6.206260e+07 1.895821e+10 0.081724 0.135970 128.612593 0.436265
# The impact_meas_present and impact_meas_future provide values of the cost_meas, risk_transf, risk, 
# and cost_ins for each measure
output_cb.get_uncertainty(metric_list=['imp_meas_present']).tail()
no measure - risk - present no measure - risk_transf - present no measure - cost_meas - present no measure - cost_ins - present Mangroves - risk - present Mangroves - risk_transf - present Mangroves - cost_meas - present Mangroves - cost_ins - present Beach nourishment - risk - present Beach nourishment - risk_transf - present Beach nourishment - cost_meas - present Beach nourishment - cost_ins - present Seawall - risk - present Seawall - risk_transf - present Seawall - cost_meas - present Seawall - cost_ins - present Building code - risk - present Building code - risk_transf - present Building code - cost_meas - present Building code - cost_ins - present
45 1.040893e+08 0.0 0 0 5.197409e+07 0 1.311768e+09 1 6.153578e+07 0 1.728000e+09 1 1.040893e+08 0 8.878779e+09 1 7.806698e+07 0 9.200000e+09 1
46 8.010560e+07 0.0 0 0 3.999849e+07 0 1.311768e+09 1 4.735703e+07 0 1.728000e+09 1 8.010560e+07 0 8.878779e+09 1 6.007920e+07 0 9.200000e+09 1
47 1.040893e+08 0.0 0 0 5.197409e+07 0 1.311768e+09 1 6.153578e+07 0 1.728000e+09 1 1.040893e+08 0 8.878779e+09 1 7.806698e+07 0 9.200000e+09 1
48 1.040893e+08 0.0 0 0 5.197409e+07 0 1.311768e+09 1 6.153578e+07 0 1.728000e+09 1 1.040893e+08 0 8.878779e+09 1 7.806698e+07 0 9.200000e+09 1
49 8.010560e+07 0.0 0 0 3.999849e+07 0 1.311768e+09 1 4.735703e+07 0 1.728000e+09 1 8.010560e+07 0 8.878779e+09 1 6.007920e+07 0 9.200000e+09 1

We can plot the distributions for the top metrics of our choice.

# tot_climate_risk and benefit
output_cb.plot_uncertainty(metric_list=['benefit'], figsize=(12,8));

Analogously to the impact example, now that we have a metric distribution, we can compute the sensitivity indices. Since we used the default sampling method, we can use the default sensitivity analysis method. However, since we used calc_second_order = False for the sampling, we need to specify the same for the sensitivity analysis.

output_cb = unc_cb.sensitivity(output_cb, sensitivity_kwargs={'calc_second_order':False})
/Users/ckropf/opt/anaconda3/envs/climada_333_shapely2/lib/python3.9/site-packages/pandas/core/dtypes/common.py:1687: DeprecationWarning: Converting `np.inexact` or `np.floating` to a dtype is deprecated. The current result is `float64` which is not strictly correct.
  npdtype = np.dtype(dtype)
(warning repeated five times)

The sensitivity indices can be plotted. For the default method ‘sobol’, by default the ‘S1’ sensitivity index is plotted.

Note that since we have quite a few measures, the plot must be adjusted a bit or dropped. Also see that for many metrics, the sensitivity to certain uncertainty parameters appears to be 0. However, this result is to be treated with care. Indeed, for demonstration purposes we used a rather low number of samples, which is indicated by the large confidence intervals (vertical black lines) for most sensitivity indices. For a more robust result, the analysis should be repeated with more samples.

#plot only certain metrics
axes = output_cb.plot_sensitivity(metric_list=['cost_ben_ratio','tot_climate_risk','benefit'], figsize=(12,8));

Advanced examples#

Coupled variables#

In this example, we show how you can define correlated input variables. Suppose your exposures and hazards are conditioned on the same Shared Socio-economic Pathway (SSP). Then you want only exposures and hazards belonging to the same SSP to be combined in each sample.

In order to achieve this, you must simply define an uncertainty parameter that shares the same name and the same distribution in both the exposures and the hazard uncertainty variables, as sketched below.
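A minimal sketch (exp_ssp_list and haz_ssp_list are hypothetical lists holding one Exposures and one Hazard per SSP): the parameter ssp has the same name and distribution in both input variables, so the sampling treats it as a single shared parameter, exactly as for cnt in the next section.

import copy
import scipy as sp
from climada.engine.unsequa import InputVar

def exp_ssp_func(ssp, x_exp, exp_list=exp_ssp_list):  # exp_ssp_list: hypothetical, one Exposures per SSP
    exp = exp_list[int(ssp)].copy()
    exp.gdf.value *= x_exp
    return exp

def haz_ssp_func(ssp, i_haz, haz_list=haz_ssp_list):  # haz_ssp_list: hypothetical, one Hazard per SSP
    haz = copy.deepcopy(haz_list[int(ssp)])
    haz.intensity = haz.intensity.multiply(i_haz)
    return haz

ssp_distr = sp.stats.randint(low=0, high=3)  # e.g. three SSPs, sampled by index
exp_iv = InputVar(exp_ssp_func, {'x_exp': sp.stats.uniform(0.9, 0.2), 'ssp': ssp_distr})
haz_iv = InputVar(haz_ssp_func, {'i_haz': sp.stats.norm(1, 0.2), 'ssp': ssp_distr})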

Many scenarios of hazards and exposures#

In this example we look at the case where many scenarios are tested in the uncertainty analysis. For instance, suppose you have data for different Shared Socio-economic Pathways (SSP) and different Climate Change projections. From the SSPs, you have a number of Exposures, saved to files. From the climate projections, you have a number of Hazards, saved to file.

The task is to sample from the SSPs and the Climate change scenarios for the uncertainty and sensitivity analysis efficiently.

For demonstration purposes, we will use below as exposures the LitPop of three countries, and as hazards the winter storms for the same three countries. Instead of SSPs, we here simply want to combine only exposures and hazards of the same country.

from climada.util.api_client import Client
client = Client()
def get_litpop(iso):
    return client.get_litpop(country=iso)


def get_ws(iso):
    properties = {
        'country_iso3alpha': iso,
    }
    return client.get_hazard('storm_europe', properties=properties)
#Define list of exposures and/or of hazard files

exp_list = [get_litpop(iso) for iso in ['CHE', 'DEU', 'ITA']]
haz_list = [get_ws(iso) for iso in ['CHE', 'DEU', 'ITA']]
for exp, haz in zip(exp_list, haz_list):
    exp.gdf['impf_WS'] = 1
    exp.assign_centroids(haz)
#Define the input variable 
from climada.entity import ImpactFuncSet, Exposures
from climada.entity.impact_funcs.storm_europe import ImpfStormEurope
from climada.hazard import Hazard
from climada.engine.unsequa import InputVar
import scipy as sp
import copy

def exp_func(cnt, x_exp, exp_list=exp_list):
    exp = exp_list[int(cnt)].copy()
    exp.gdf.value *= x_exp
    return exp

exp_distr = {"x_exp": sp.stats.uniform(0.9, 0.2),
             "cnt": sp.stats.randint(low=0, high=len(exp_list))  #use the same parameter name accross input variables
              }
exp_iv = InputVar(exp_func, exp_distr)


def haz_func(cnt, i_haz, haz_list=haz_list):
    haz = copy.deepcopy(haz_list[int(cnt)])  #use the same parameter name across input variables
    haz.intensity *= i_haz
    return haz

haz_distr = {"i_haz": sp.stats.norm(1, 0.2),
             "cnt": sp.stats.randint(low=0, high=len(haz_list))
              }
haz_iv = InputVar(haz_func, haz_distr)

impf = ImpfStormEurope.from_schwierz()
impf_set = ImpactFuncSet()
impf_set.append(impf)
impf_iv = InputVar.impfset([impf_set], bounds_mdd = [0.9, 1.1])
from climada.engine.unsequa import CalcImpact

calc_imp = CalcImpact(exp_iv, impf_iv, haz_iv)
2023-08-04 16:34:35,948 - climada.engine.unsequa.calc_base - WARNING - 

The input parameter cnt is shared among at least 2 input variables. Their uncertainty is thus computed with the same samples for this input paramter.
output_imp = calc_imp.make_sample(N=2**2, sampling_kwargs={'skip_values': 2**3})
2023-08-04 16:34:35,987 - climada.engine.unsequa.calc_base - INFO - Effective number of made samples: 40
# as we can see, there is only a single input parameter "cnt" to select the country for both the exposures and the hazard
output_imp.samples_df.tail()
x_exp cnt MDD i_haz
35 0.9875 0.0 1.0375 1.097755
36 1.0625 1.0 1.0375 1.097755
37 1.0625 0.0 0.9375 1.097755
38 1.0625 0.0 1.0375 1.097755
39 1.0625 0.0 1.0375 1.097755
output_imp = calc_imp.uncertainty(output_imp)
2023-08-04 16:34:36,224 - climada.entity.exposures.base - INFO - Exposures matching centroids already found for WS
2023-08-04 16:34:36,227 - climada.engine.impact_calc - INFO - Calculating impact for 8397 assets (>0) and 4260 events.
2023-08-04 16:34:37,042 - climada.engine.unsequa.calc_base - INFO - 

Estimated computaion time: 0:00:10.300000
output_imp.aai_agg_unc_df.tail()
aai_agg
35 1.254818e+07
36 5.340193e+08
37 1.219989e+07
38 1.350121e+07
39 1.350121e+07

Input variable: Repeated loading of files made efficient#

Loading Hazards or Exposures from file is a rather lengthy operation. Thus, we want to minimize the reading operations, ideally reading each file only once. Simultaneously, Hazard and Exposures objects can be large in memory, and thus we would like to have at most one of each loaded at a time. Hence, we do not want to use the list capacity of the helper methods InputVar.exposures and InputVar.hazard.

For demonstration purposes, we will use below as exposures files the LitPop of three countries, and as hazard files the winter storms for the same three countries. Note that this does not make a lot of sense for an uncertainty analysis. For your use case, please replace the sets of exposures and/or hazard files with meaningful sets, for instance sets of exposures for different resolutions or hazards for different model runs.

from climada.util.api_client import Client
client = Client()
def get_litpop_path(iso):
    properties = {
        'country_iso3alpha': iso,
        'res_arcsec': '150',
        'exponents': '(1,1)',
        'fin_mode': 'pc'
    }
    litpop_datasets = client.list_dataset_infos(data_type='litpop', properties=properties)
    ds = litpop_datasets[0]
    download_dir, ds_files = client.download_dataset(ds)
    return ds_files[0]

def get_ws_path(iso):
    properties = {
        'country_iso3alpha': iso,
    }
    hazard_datasets = client.list_dataset_infos(data_type='storm_europe', properties=properties)
    ds = hazard_datasets[0]
    download_dir, ds_files = client.download_dataset(ds)
    return ds_files[0]
#Define list of exposures and/or of hazard files

f_exp_list = [get_litpop_path(iso) for iso in ['CHE', 'DEU', 'ITA']]
f_haz_list = [get_ws_path(iso) for iso in ['CHE', 'DEU', 'ITA']]
#Define the input variable for the loading files
#The trick is to not reload a file if it is already in memory. This is done using a global variable.
from climada.entity import ImpactFunc, ImpactFuncSet, Exposures
from climada.hazard import Hazard
from climada.engine.unsequa import InputVar
import scipy as sp
import copy

def exp_func(f_exp, x_exp, filename_list=f_exp_list):
    filename = filename_list[int(f_exp)]
    global exp_base
    if 'exp_base' in globals():
        if isinstance(exp_base, Exposures):
            if exp_base.gdf.filename != str(filename):
                exp_base = Exposures.from_hdf5(filename)
                exp_base.gdf.filename = str(filename)
    else:
        exp_base = Exposures.from_hdf5(filename)
        exp_base.gdf.filename = str(filename)

    exp = exp_base.copy()
    exp.gdf.value *= x_exp
    return exp

exp_distr = {"x_exp": sp.stats.uniform(0.9, 0.2),
             "f_exp": sp.stats.randint(low=0, high=len(f_exp_list))
              }
exp_iv = InputVar(exp_func, exp_distr)


def haz_func(f_haz, i_haz, filename_list=f_haz_list):
    filename = filename_list[int(f_haz)]
    global haz_base
    if 'haz_base' in globals():
        if isinstance(haz_base, Hazard):
            if haz_base.filename != str(filename):
                haz_base = Hazard.from_hdf5(filename)
                haz_base.filename = str(filename)
    else:
        haz_base = Hazard.from_hdf5(filename)
        haz_base.filename = str(filename)

    haz = copy.deepcopy(haz_base)
    haz.intensity *= i_haz
    return haz

haz_distr = {"i_haz": sp.stats.norm(1, 0.2),
             "f_haz": sp.stats.randint(low=0, high=len(f_haz_list))
              }
haz_iv = InputVar(haz_func, haz_distr)


def impf_func(G=1, v_half=84.7, vmin=25.7, k=3, _id=1):
    
    def xhi(v, v_half, vmin):
        return max([(v - vmin), 0]) / (v_half - vmin)

    def sigmoid_func(v, G, v_half, vmin, k):
        return G * xhi(v, v_half, vmin)**k / (1 + xhi(v, v_half, vmin)**k)

    #In-function imports needed only for parallel computing on Windows
    import numpy as np 
    from climada.entity import ImpactFunc, ImpactFuncSet 
    imp_fun = ImpactFunc()
    imp_fun.haz_type = 'WS'
    imp_fun.id = _id
    imp_fun.intensity_unit = 'm/s'
    imp_fun.intensity = np.linspace(0, 150, num=100)
    imp_fun.mdd = np.repeat(1, len(imp_fun.intensity))
    imp_fun.paa = np.array([sigmoid_func(v, G, v_half, vmin, k) for v in imp_fun.intensity])
    imp_fun.check()
    impf_set = ImpactFuncSet()
    impf_set.append(imp_fun)
    return impf_set

impf_distr = {
    "G": sp.stats.truncnorm(0.5, 1.5),
    "v_half": sp.stats.uniform(35, 65),
    "vmin": sp.stats.uniform(0, 15),
    "k": sp.stats.uniform(1, 4)
    }
impf_iv = InputVar(impf_func, impf_distr)
from climada.engine.unsequa import CalcImpact

calc_imp = CalcImpact(exp_iv, impf_iv, haz_iv)
2023-08-04 16:30:48,177 - climada.entity.exposures.base - INFO - Reading /Users/ckropf/climada/data/exposures/litpop/LitPop_150arcsec_DEU/v2/LitPop_150arcsec_DEU.hdf5
/Users/ckropf/opt/anaconda3/envs/climada_333_shapely2/lib/python3.9/pickle.py:1717: UserWarning: Unpickling a shapely <2.0 geometry object. Please save the pickle again; shapely 2.1 will not have this compatibility.
  setstate(state)

Now that the samples have been generated, it is crucial to order them so as to minimize the number of times files have to be loaded. In this case, loading the hazards takes more time than loading the exposures. We thus sort first by hazards (each of which then has to be loaded a single time), and then by exposures (each of which has to be loaded once per hazard).

# Ordering of the samples by hazard first and exposures second
output_imp = calc_imp.make_sample(N=2**2, sampling_kwargs={'skip_values': 2**3})
output_imp.order_samples(by=['f_haz', 'f_exp'])
2023-08-04 16:29:49,588 - climada.engine.unsequa.calc_base - INFO - Effective number of made samples: 72

We can verify how the samples are ordered. The graph below confirms that the hazards are ordered, and thus each hazard will be loaded exactly once. The exposures, on the other hand, change at most once per hazard.

import matplotlib.pyplot as plt
e = output_imp.samples_df['f_exp'].values
h = output_imp.samples_df['f_haz'].values

Note that due to the very small number of samples chosen here for illustrative purposes, not all combinations of hazard and exposures are part of the samples. This is due to the nature of the Sobol sequence (default sampling method).

plt.plot(e, label='exposures');
plt.plot(h, label='hazards');
plt.xlabel('samples');
plt.ylabel('file number');
plt.title('Order of exposures and hazards files in samples');
plt.legend(loc='upper right');
output_imp = calc_imp.uncertainty(output_imp)