Impact Function Calibration Module



Base Classes

Generic classes for defining the data structures of this module.

class climada.util.calibrate.base.Input(hazard: ~climada.hazard.base.Hazard, exposure: ~climada.entity.exposures.base.Exposures, data: ~pandas.core.frame.DataFrame, impact_func_creator: ~typing.Callable[[...], ~climada.entity.impact_funcs.impact_func_set.ImpactFuncSet], impact_to_dataframe: ~typing.Callable[[~climada.engine.impact.Impact], ~pandas.core.frame.DataFrame], cost_func: ~typing.Callable[[~pandas.core.frame.DataFrame, ~pandas.core.frame.DataFrame], ~numbers.Number], bounds: ~typing.Mapping[str, ~scipy.optimize._constraints.Bounds | ~typing.Tuple[~numbers.Number, ~numbers.Number]] | None = None, constraints: ~scipy.optimize._constraints.LinearConstraint | ~scipy.optimize._constraints.NonlinearConstraint | ~typing.Mapping | list[~scipy.optimize._constraints.LinearConstraint | ~scipy.optimize._constraints.NonlinearConstraint | ~typing.Mapping] | None = None, impact_calc_kwds: ~typing.Mapping[str, ~typing.Any] = <factory>, missing_data_value: float = nan, assign_centroids: dataclasses.InitVar[bool] = True)

Define the static input for a calibration task

hazard

Hazard object to compute impacts from

Hazard

exposure

Exposures object to compute impacts from

Exposures

data

The data to compare computed impacts to. Index: Event IDs matching the IDs of hazard. Columns: Arbitrary columns. NaN values in the data frame have a special meaning: the corresponding impact values computed by the model are ignored in the calibration.

pandas.DataFrame

impact_func_creator

Function that takes the parameters as keyword arguments and returns an impact function set. This will be called each time the optimization algorithm updates the parameters.

Callable (... -> ImpactFuncSet)

impact_to_dataframe

Function that takes an impact object as input and transforms its data into a pandas.DataFrame that is compatible with the format of data. The return value of this function will be passed to the cost_func as second argument.

Callable (Impact -> pandas.DataFrame)

cost_func

Function that takes two pandas.DataFrame objects and returns the scalar “cost” between them. The optimization algorithm will try to minimize this number. The first argument is the true/correct values (data), and the second argument is the estimated/predicted values.

Callable ((pandas.DataFrame, pandas.DataFrame) -> Number)

bounds

The bounds for the parameters. Keys: parameter names. Values: scipy.optimize.Bounds instance or tuple of minimum and maximum value. Unbounded parameters need not be specified here. See the documentation of the selected optimization algorithm for which data types are supported.

Mapping (str, {Bounds, tuple(float, float)}), optional

constraints

One or multiple instances of scipy.optimize.LinearConstraint, scipy.optimize.NonlinearConstraint, or a mapping. See the documentation of the selected optimization algorithm for which data types are supported.

Constraint or list of Constraint, optional

impact_calc_kwds

Keyword arguments to climada.engine.impact_calc.ImpactCalc.impact(). Defaults to {"assign_centroids": False} (by default, centroids are assigned here via the assign_centroids parameter, to avoid assigning them each time the impact is calculated).

Mapping (str, Any), optional

missing_data_value

If the impact model returns impact data for which no values exist in data, insert this value. Defaults to NaN, in which case the impact from the model is ignored. Set this to zero to explicitly calibrate to zero impacts in these cases.

float, optional

assign_centroids

If True (default), assign the hazard centroids to the exposure when this object is created.

bool, optional
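The argument order of cost_func matters: the data comes first and the prediction second, and NaN in the data marks values to ignore. A minimal sketch of a compatible cost function follows. To stay dependency-free it uses plain dicts keyed by event ID in place of the pandas.DataFrame arguments that cost_func actually receives; the name msle_cost and the choice of a mean squared log error are illustrative assumptions, not part of this module:

```python
import math
from typing import Mapping


def msle_cost(data: Mapping[int, float], predicted: Mapping[int, float]) -> float:
    """Mean squared log error between reference data and predicted impacts.

    Hypothetical stand-in for Input.cost_func: dicts keyed by event ID
    replace the DataFrame arguments. Data comes first, prediction second.
    """
    squared_errors = [
        (math.log1p(predicted.get(event_id, 0.0)) - math.log1p(observed)) ** 2
        for event_id, observed in data.items()
        # NaN in the data means "ignore this value", so skip those events
        if not math.isnan(observed)
    ]
    # Missing predictions count as zero impact (see .get above)
    return sum(squared_errors) / len(squared_errors) if squared_errors else 0.0


data = {1: 100.0, 2: float("nan"), 3: 0.0}
predicted = {1: 100.0, 2: 5000.0, 3: 0.0}
print(msle_cost(data, predicted))  # 0.0, event 2 is ignored and the rest match
```

Swapping the two arguments would silently change which values are skipped, which is why the order stated above is worth respecting in any custom cost function.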

impact_to_aligned_df(impact: Impact, fillna: float = nan) Tuple[DataFrame, DataFrame]

Create a dataframe from an impact and align it with the data.

When aligning, two general cases might occur, which are not mutually exclusive:

  1. There are data points for which no impact was computed. This will always be treated as an impact of zero.

  2. There are impacts for which no data points exist. For these points, the input data will be filled with the value of Input.missing_data_value.

This method performs the following steps:

  • Transform the impact into a dataframe using impact_to_dataframe.

  • Align the data with the impact dataframe, using missing_data_value as fill value.

  • Align the impact dataframe with the data, using zeros as fill value.

  • In the aligned impact, set all values to zero where the data is NaN.

  • Fill remaining NaNs in data with fillna.


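Parameters:

  • impact (Impact) – The impact computed by the model. It is transformed into a dataframe by Input.impact_to_dataframe.

  • fillna (float, optional) – Value used to fill the NaNs that remain in the data after alignment. Defaults to NaN.

Returns: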


  • data_aligned (pd.DataFrame) – The data aligned to the impact dataframe

  • impact_df_aligned (pd.DataFrame) – The impact transformed to a dataframe and aligned with the data
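The alignment steps can be traced on a small example. This is a sketch under simplifying assumptions: each table is a plain dict mapping event IDs to a single value rather than a pandas.DataFrame, and the union of event IDs stands in for DataFrame alignment. The helper align is hypothetical:

```python
import math
from typing import Dict, Tuple

# Stand-in for a single-column DataFrame: event ID -> value
Table = Dict[int, float]


def align(data: Table, impact: Table, missing_data_value: float = math.nan,
          fillna: float = math.nan) -> Tuple[Table, Table]:
    """Dict-based sketch of the alignment steps listed above."""
    event_ids = sorted(set(data) | set(impact))
    # Align the data with the impact, using missing_data_value as fill value
    data_aligned = {i: data.get(i, missing_data_value) for i in event_ids}
    # Align the impact with the data, using zeros as fill value
    impact_aligned = {i: impact.get(i, 0.0) for i in event_ids}
    # In the aligned impact, set all values to zero where the data is NaN
    for i, value in data_aligned.items():
        if math.isnan(value):
            impact_aligned[i] = 0.0
    # Fill remaining NaNs in the data with fillna
    data_aligned = {i: fillna if math.isnan(v) else v for i, v in data_aligned.items()}
    return data_aligned, impact_aligned


data = {1: 10.0, 2: math.nan}  # event 3 has no reference data
impact = {2: 7.0, 3: 4.0}      # event 1 has no modelled impact
print(align(data, impact, fillna=0.0))
# ({1: 10.0, 2: 0.0, 3: 0.0}, {1: 0.0, 2: 0.0, 3: 0.0})
print(align(data, impact, missing_data_value=0.0, fillna=0.0))
# ({1: 10.0, 2: 0.0, 3: 0.0}, {1: 0.0, 2: 0.0, 3: 4.0})
```

With the default missing_data_value of NaN, the modelled impact for event 3 is zeroed and thus ignored; setting it to zero keeps that impact in the comparison, which corresponds to explicitly calibrating to zero impacts as described for missing_data_value.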

__init__(hazard: ~climada.hazard.base.Hazard, exposure: ~climada.entity.exposures.base.Exposures, data: ~pandas.core.frame.DataFrame, impact_func_creator: ~typing.Callable[[...], ~climada.entity.impact_funcs.impact_func_set.ImpactFuncSet], impact_to_dataframe: ~typing.Callable[[~climada.engine.impact.Impact], ~pandas.core.frame.DataFrame], cost_func: ~typing.Callable[[~pandas.core.frame.DataFrame, ~pandas.core.frame.DataFrame], ~numbers.Number], bounds: ~typing.Mapping[str, ~scipy.optimize._constraints.Bounds | ~typing.Tuple[~numbers.Number, ~numbers.Number]] | None = None, constraints: ~scipy.optimize._constraints.LinearConstraint | ~scipy.optimize._constraints.NonlinearConstraint | ~typing.Mapping | list[~scipy.optimize._constraints.LinearConstraint | ~scipy.optimize._constraints.NonlinearConstraint | ~typing.Mapping] | None = None, impact_calc_kwds: ~typing.Mapping[str, ~typing.Any] = <factory>, missing_data_value: float = nan, assign_centroids: dataclasses.InitVar[bool] = True) None

class climada.util.calibrate.base.Output(params: Mapping[str, Number], target: Number)

Generic output of a calibration task

params

The optimal parameters

Mapping (str, Number)

target

The target function value for the optimal parameters

Number

to_hdf5(filepath: Path | str, mode: str = 'x')

Write the output into an H5 file

This stores the data as attributes because we only store single numbers, not arrays

  • filepath (Path or str) – The filepath to store the data.

  • mode (str, optional) – The mode for opening the file. Defaults to "x" (create file, fail if it exists).

classmethod from_hdf5(filepath: Path | str)

Create an output object from an H5 file

__init__(params: Mapping[str, Number], target: Number) None

class climada.util.calibrate.base.OutputEvaluator(input: Input, output: Output)

Evaluate the output of a calibration task

  • input (Input) – The input object for the optimization task.

  • output (Output) – The output object returned by the optimization task.


The impact function set built from the optimized parameters




An impact object calculated using the optimal impf_set



plot_at_event(data_transf: ~typing.Callable[[~pandas.core.frame.DataFrame], ~pandas.core.frame.DataFrame] = <function OutputEvaluator.<lambda>>, **plot_kwargs)

Create a bar plot comparing estimated model output and data per event.

Every row of the data is considered an event. The data to be plotted can be transformed with a generic function data_transf.

  • data_transf (Callable (pd.DataFrame -> pd.DataFrame), optional) – A function that transforms the data to plot before plotting. It receives a dataframe whose rows represent events and whose columns represent the modelled impact and the calibration data, respectively. By default, the data is not transformed.

  • plot_kwargs – Keyword arguments passed to the method.


ax – The plot axis returned by

Return type:



This plot does not include the ignored impact, see

plot_at_region(data_transf: ~typing.Callable[[~pandas.core.frame.DataFrame], ~pandas.core.frame.DataFrame] = <function OutputEvaluator.<lambda>>, **plot_kwargs)

Create a bar plot comparing estimated model output and data per region.

Every column of the data is considered a region. The data to be plotted can be transformed with a generic function data_transf.

  • data_transf (Callable (pd.DataFrame -> pd.DataFrame), optional) – A function that transforms the data to plot before plotting. It receives a dataframe whose rows represent regions and whose columns represent the modelled impact and the calibration data, respectively. By default, the data is not transformed.

  • plot_kwargs – Keyword arguments passed to the method.


ax – The plot axis returned by

Return type:



This plot does not include the ignored impact, see

plot_event_region_heatmap(data_transf: ~typing.Callable[[~pandas.core.frame.DataFrame], ~pandas.core.frame.DataFrame] = <function OutputEvaluator.<lambda>>, **plot_kwargs)

Plot a heatmap comparing all events per all regions

Every column of the data is considered a region, and every row is considered an event. The data to be plotted can be transformed with a generic function data_transf.

  • data_transf (Callable (pd.DataFrame -> pd.DataFrame), optional) – A function that transforms the data to plot before plotting. It receives a dataframe whose rows represent events and whose columns represent the regions, respectively. By default, the data is not transformed.

  • plot_kwargs – Keyword arguments passed to the method.


ax – The plot axis returned by

Return type:


__init__(input: Input, output: Output) None

class climada.util.calibrate.base.Optimizer(input: Input)

Abstract base class (interface) for an optimization

This defines the interface for optimizers in CLIMADA. New optimizers can be created by deriving from this class and overriding at least the run() method.

input

The input object for the optimization task. See Input.

Input

_target_func(data: DataFrame, predicted: DataFrame) Number

Target function for the optimizer

The default version of this function simply returns the value of the cost function evaluated on the arguments.

  • data (pandas.DataFrame) – The reference data used for calibration. By default, this is

  • predicted (pandas.DataFrame) – The impact predicted by the model, after it has been transformed into a dataframe by Input.impact_to_dataframe.

Returns:

The value of the target function for the optimizer.

_kwargs_to_impact_func_creator(*_, **kwargs) Dict[str, Any]

Define how the parameters to _opt_func() must be transformed

Optimizers may implement different ways of representing the parameters (e.g., key-value pairs, arrays, etc.). Depending on this representation, the parameters must be transformed to match the syntax of the impact function generator used, see Input.impact_func_creator.

In this default version, the method simply returns its keyword arguments as mapping. Override this method if the optimizer used does not represent parameters as key-value pairs.


kwargs – The parameters as key-value pairs.

Returns:

The parameters as key-value pairs.
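For an optimizer that passes all parameters as a single array, an override might map array positions back to parameter names. The sketch below is hypothetical: it assumes the parameter order follows the keys of a bounds mapping, and the class name and the parameter names v_thresh and v_half are placeholders:

```python
from typing import Any, Dict, Mapping, Sequence, Tuple


class ArrayParamOptimizerSketch:
    """Hypothetical optimizer whose algorithm passes parameters as one array."""

    def __init__(self, bounds: Mapping[str, Tuple[float, float]]):
        # Assumption: the parameter order is fixed by the order of the bounds keys
        self.param_names = list(bounds.keys())

    def _kwargs_to_impact_func_creator(self, x: Sequence[float], *_, **__) -> Dict[str, Any]:
        """Translate the array representation into keyword arguments."""
        if len(x) != len(self.param_names):
            raise ValueError("Expected one value per parameter")
        return dict(zip(self.param_names, x))


opt = ArrayParamOptimizerSketch({"v_thresh": (20.0, 40.0), "v_half": (40.0, 100.0)})
print(opt._kwargs_to_impact_func_creator([25.0, 60.0]))  # {'v_thresh': 25.0, 'v_half': 60.0}
```

The returned mapping can then be unpacked into the impact function creator as keyword arguments, which is the representation Input.impact_func_creator expects.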

_opt_func(*args, **kwargs) Number

The optimization function iterated by the optimizer

This function takes arbitrary arguments from the optimizer, generates a new set of impact functions from them, computes the impact, and finally calculates the target function value and returns it.


args, kwargs – Arbitrary arguments from the optimizer, including parameters

Returns:

Target function value for the given arguments

abstract run(**opt_kwargs) Output

Execute the optimization
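To illustrate how the methods of this interface fit together, here is a deliberately simplified, dependency-free grid-search optimizer. ToyInput, ToyOutput and GridSearchOptimizer are hypothetical stand-ins rather than CLIMADA classes: a plain callable of the hazard intensity replaces the impact function set, and a list comprehension replaces the impact calculation. Grid search is used only because it is short; it is not one of the optimizers in this module.

```python
import itertools
from dataclasses import dataclass
from typing import Any, Callable, Dict, Mapping, Optional, Sequence


@dataclass
class ToyInput:
    """Hypothetical, heavily simplified stand-in for Input."""

    impact_func_creator: Callable[..., Callable[[float], float]]  # params -> "impact function"
    intensity: Sequence[float]  # hazard intensity per event (replaces hazard and exposure)
    data: Sequence[float]  # reference impact per event
    cost_func: Callable[[Sequence[float], Sequence[float]], float]


@dataclass
class ToyOutput:
    """Hypothetical stand-in for Output."""

    params: Dict[str, float]
    target: float


class GridSearchOptimizer:
    """Sketch of an optimizer that follows the interface described above."""

    def __init__(self, inp: ToyInput):
        self.input = inp

    def _target_func(self, data: Sequence[float], predicted: Sequence[float]) -> float:
        # Default behaviour: the target is the cost function value
        return self.input.cost_func(data, predicted)

    def _kwargs_to_impact_func_creator(self, *_, **kwargs) -> Dict[str, Any]:
        # Parameters already arrive as key-value pairs
        return kwargs

    def _opt_func(self, *args, **kwargs) -> float:
        # Parameters -> impact function -> predicted impact -> target value
        params = self._kwargs_to_impact_func_creator(*args, **kwargs)
        impf = self.input.impact_func_creator(**params)
        predicted = [impf(i) for i in self.input.intensity]
        return self._target_func(self.input.data, predicted)

    def run(self, *, grid: Mapping[str, Sequence[float]]) -> Optional[ToyOutput]:
        # Evaluate every parameter combination and keep the lowest cost
        names = list(grid)
        best: Optional[ToyOutput] = None
        for values in itertools.product(*(grid[name] for name in names)):
            params = dict(zip(names, values))
            cost = self._opt_func(**params)
            if best is None or cost < best.target:
                best = ToyOutput(params=params, target=cost)
        return best


def sse(data: Sequence[float], predicted: Sequence[float]) -> float:
    """Sum of squared errors, data first and prediction second."""
    return sum((d - p) ** 2 for d, p in zip(data, predicted))


inp = ToyInput(
    impact_func_creator=lambda slope: (lambda x: slope * x),
    intensity=[1.0, 2.0, 3.0],
    data=[2.0, 4.0, 6.0],
    cost_func=sse,
)
print(GridSearchOptimizer(inp).run(grid={"slope": [1.0, 2.0, 3.0]}))
# ToyOutput(params={'slope': 2.0}, target=0.0)
```

Because the parameters arrive as key-value pairs in this sketch, the default behaviour of _kwargs_to_impact_func_creator is sufficient, and only run() contains strategy-specific logic.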

__init__(input: Input) None

Bayesian Optimizer

Calibration based on Bayesian optimization.

climada.util.calibrate.bayesian_optimizer.select_best(p_space_df: DataFrame, cost_limit: float, absolute: bool = True, cost_col=('Calibration', 'Cost Function')) DataFrame

Select the best parameter space samples defined by a cost function limit

The limit is a factor applied either to the minimum cost function value (absolute=True) or to the range of cost function values (absolute=False). A cost_limit of 0.1 will select all rows where the cost function is within

  • 110% of the minimum value if absolute=True.

  • 10% of the range between minimum and maximum cost function value, measured from the minimum, if absolute=False.


  • p_space_df (pd.DataFrame) – The parameter space to select from.

  • cost_limit (float) – The limit factor used for selection.

  • absolute (bool, optional) – Whether the limit factor is applied to the minimum value (True) or the range of values (False). Defaults to True.

  • cost_col (Column specifier, optional) – The column indicating cost function values. Defaults to ("Calibration", "Cost Function").
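The selection rule can be sketched on a plain mapping of sample labels to cost values. select_best_sketch is a hypothetical helper, not the module's select_best, and it assumes non-negative costs (for negative costs, scaling the minimum by 1 + cost_limit would tighten rather than widen the selection):

```python
from typing import Dict


def select_best_sketch(costs: Dict[str, float], cost_limit: float,
                       absolute: bool = True) -> Dict[str, float]:
    """Keep the samples whose cost lies close to the minimum."""
    c_min, c_max = min(costs.values()), max(costs.values())
    if absolute:
        # cost_limit=0.1 -> within 110% of the minimum
        threshold = c_min * (1.0 + cost_limit)
    else:
        # cost_limit=0.1 -> within 10% of the cost range, measured from the minimum
        threshold = c_min + cost_limit * (c_max - c_min)
    return {label: c for label, c in costs.items() if c <= threshold}


costs = {"a": 1.0, "b": 1.05, "c": 1.5, "d": 11.0}
print(select_best_sketch(costs, 0.1))                  # {'a': 1.0, 'b': 1.05}
print(select_best_sketch(costs, 0.1, absolute=False))  # {'a': 1.0, 'b': 1.05, 'c': 1.5}
```

The relative variant adapts to the spread of the sampled costs, so a single outlier such as sample "d" widens its selection, while the absolute variant depends only on the minimum.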


A subselection of the input data frame.

Return type:


class climada.util.calibrate.bayesian_optimizer.BayesianOptimizerOutput(params: Mapping[str, Number], target: Number, p_space: TargetSpace)

Bases: Output

Output of a calibration with BayesianOptimizer

p_space

The parameter space sampled by the optimizer.

TargetSpace

p_space_to_dataframe()

Return the sampled parameter space as pandas.DataFrame

Returns:

Data frame whose columns are the parameter values and the associated cost function value (Cost Function) and whose rows are the optimizer iterations.

Return type:

pandas.DataFrame

to_hdf5(filepath: Path | str, mode: str = 'x')

Write this output to an H5 file

classmethod from_hdf5(filepath: Path | str)

Read BayesianOptimizerOutput from an H5 file


This results in an object with a broken p_space attribute. Do not further modify this parameter space. This function is only intended to load the parameter space again for analysis/plotting.

plot_p_space(p_space_df: DataFrame | None = None, x: str | None = None, y: str | None = None, min_def: str | Tuple[str, str] | None = 'Cost Function', min_fmt: str = 'x', min_color: str = 'r', **plot_kwargs) Axes | List[Axes]

Plot the parameter space as scatter plot(s)

Produce a scatter plot where each point represents a parameter combination sampled by the optimizer. The coloring represents the cost function value. If there are more than two parameters in the input data frame, this method will produce one plot for each combination of two parameters. Explicit parameter names to plot can be given via the x and y arguments. If no data frame is provided as argument, the output of p_space_to_dataframe() is used.

  • p_space_df (pd.DataFrame, optional) – The parameter space to plot. Defaults to the one returned by p_space_to_dataframe()

  • x (str, optional) – The parameter to plot on the x-axis. If y is not given, this will plot x against all other parameters.

  • y (str, optional) – The parameter to plot on the y-axis. If x is not given, this will plot y against all other parameters.

  • min_def (str, optional) – The name of the column in p_space_df defining which parameter set represents the minimum, which is plotted separately. Defaults to "Cost Function". Set to None to avoid plotting the minimum.

  • min_fmt (str, optional) – Plot format string for plotting the minimum. Defaults to "x".

  • min_color (str, optional) – Color for plotting the minimum. Defaults to "r" (red).
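Which scatter plots are produced can be sketched by enumerating parameter pairs. plot_pairs is a hypothetical helper that only lists the (x, y) pairs; it draws nothing, and the axis order within each pair is an assumption:

```python
from itertools import combinations
from typing import List, Optional, Sequence, Tuple


def plot_pairs(params: Sequence[str], x: Optional[str] = None,
               y: Optional[str] = None) -> List[Tuple[str, str]]:
    """Return the (x, y) parameter pairs that would each get a scatter plot."""
    if x is not None and y is not None:
        return [(x, y)]
    if x is not None:
        # Plot x against all other parameters
        return [(x, other) for other in params if other != x]
    if y is not None:
        # Plot y against all other parameters
        return [(other, y) for other in params if other != y]
    # No explicit choice: one plot per combination of two parameters
    return list(combinations(params, 2))


print(plot_pairs(["a", "b", "c"]))         # [('a', 'b'), ('a', 'c'), ('b', 'c')]
print(plot_pairs(["a", "b", "c"], x="a"))  # [('a', 'b'), ('a', 'c')]
```

With N parameters and neither x nor y given, this yields N(N-1)/2 plots, so fixing x or y is a convenient way to keep the output manageable for larger parameter sets.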

__init__(params: Mapping[str, Number], target: Number, p_space: TargetSpace) None

class climada.util.calibrate.bayesian_optimizer.Improvement(iteration, sample, random, target, improvement)

Bases: tuple

count(value, /)

Return number of occurrences of value.

improvement

Alias for field number 4

index(value, start=0, stop=9223372036854775807, /)

Return first index of value.

Raises ValueError if the value is not present.

iteration

Alias for field number 0

random

Alias for field number 2

sample

Alias for field number 1

target

Alias for field number 3
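Because Improvement is a namedtuple, its fields can be read by name or by position. A short sketch that recreates a namedtuple with the documented field order; the field values are illustrative only:

```python
from collections import namedtuple

# Same field order as the documented signature
Improvement = namedtuple("Improvement", ["iteration", "sample", "random", "target", "improvement"])

# Illustrative values only
imp = Improvement(iteration=0, sample=5, random=False, target=-1.2, improvement=0.05)
print(imp.improvement == imp[4])  # True: "improvement" is an alias for field number 4
print(imp._fields)  # ('iteration', 'sample', 'random', 'target', 'improvement')
```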

exception climada.util.calibrate.bayesian_optimizer.StopEarly

Bases: Exception

An exception for stopping an optimization iteration early

__init__(*args, **kwargs)

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

class climada.util.calibrate.bayesian_optimizer.BayesianOptimizerController(init_points: int, n_iter: int, min_improvement: float = 0.001, min_improvement_count: int = 2, kappa: float = 2.576, kappa_min: float = 0.1, max_iterations: int = 10, utility_func_kwargs: dict[str, int | float | str] = <factory>, _last_it_improved: int = 0, _last_it_end: int = 0)

Bases: object

A class for controlling the iterations of a BayesianOptimizer.

Each iteration in the optimizer consists of a random sampling of the parameter space with init_points steps, followed by a Gaussian process sampling with n_iter steps. During the latter, the kappa parameter is reduced to reach kappa_min at the end of the iteration. The iteration is stopped prematurely if improvements of the best guess are below min_improvement for min_improvement_count consecutive times. At the beginning of the next iteration, kappa is reset to its original value.

Optimization stops if max_iterations is reached or if an entire iteration saw no improvement.

init_points

Number of randomly sampled points during each iteration.

int

n_iter

Maximum number of points using Gaussian process sampling during each iteration.

int

min_improvement

Minimal relative improvement. If improvements are below this value min_improvement_count consecutive times, the iteration is stopped.

float

min_improvement_count

Number of consecutive times the min_improvement must be undercut to stop the iteration.

int

kappa

Parameter controlling exploration of the upper-confidence-bound acquisition function of the sampling algorithm. Lower values mean less exploration of the parameter space and more exploitation of local information. This value is reduced throughout one iteration, reaching kappa_min at the last iteration step.

float

kappa_min

Minimal value of kappa after n_iter steps.

float

max_iterations

Maximum number of iterations before optimization is stopped, irrespective of convergence.

int

utility_func_kwargs

Further keyword arguments to the bayes_opt.UtilityFunction.

dict[str, int | float | str]
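The description above fixes only the start (kappa) and the end (kappa_min) of the reduction within one iteration. A sketch assuming a geometric decay, that is, a constant factor per Gaussian-process step; the actual decay rule may differ, and kappa_schedule is a hypothetical helper using the documented default values:

```python
from typing import List


def kappa_schedule(kappa: float = 2.576, kappa_min: float = 0.1, n_iter: int = 10) -> List[float]:
    """kappa before the first and after each of the n_iter Gaussian-process steps.

    Assumption: geometric decay by a constant factor per step.
    """
    decay = (kappa_min / kappa) ** (1.0 / n_iter)
    return [kappa * decay**step for step in range(n_iter + 1)]


schedule = kappa_schedule()
print([round(k, 3) for k in schedule])
```

A geometric schedule spends more steps at small kappa than a linear one would, that is, it shifts from exploration to exploitation early; this is a design choice of the sketch, not a documented property of the controller.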

classmethod from_input(inp: Input, sampling_base: float = 4, **kwargs)

Create a controller from a calibration input

This uses the number of parameters to determine the appropriate values for init_points and n_iter. Both values are set to \(b^N\), where \(b\) is the sampling_base parameter and \(N\) is the number of estimated parameters.
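As a worked example of the \(b^N\) rule: with the default sampling_base of 4 and two estimated parameters, each iteration draws 16 random points and up to 16 Gaussian-process points. The helper below is hypothetical; it assumes that \(N\) equals the number of entries in Input.bounds and that the result is rounded to an integer, and the parameter names are placeholders:

```python
from typing import Mapping, Tuple


def controller_sizes(bounds: Mapping[str, Tuple[float, float]],
                     sampling_base: float = 4) -> Tuple[int, int]:
    """Hypothetical helper returning (init_points, n_iter) as b**N."""
    num_params = len(bounds)  # assumption: N is the number of bounded parameters
    points = round(sampling_base**num_params)  # assumption: rounded to an integer
    return points, points


print(controller_sizes({"v_thresh": (20.0, 40.0), "v_half": (40.0, 100.0)}))  # (16, 16)
```

Three parameters already give 64 points of each kind per iteration, so the sampling effort grows exponentially with the number of estimated parameters.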

  • inp (Input) – Input to the calibration

  • sampling_base (float, optional) – Base for determining the sample size. Increase this for denser sampling. Defaults to 4.

  • kwargs – Keyword arguments passed to the default constructor.

optimizer_params() dict[str, int | float | str | UtilityFunction]

Return parameters for the optimizer

In the current implementation, these do not change.

update(event: str, instance: BayesianOptimization)

Update the step tracker of this instance.

For step events, check if the latest guess is the new maximum. Also check if the iteration will be stopped early.

For end events, check if any improvement occurred. If not, stop the optimization.

  • event (bayes_opt.Events) – The event descriptor

  • instance (bayes_opt.BayesianOptimization) – Optimization instance triggering the event

Raises:

  • StopEarly – If the optimization only achieves minimal improvement, stop the iteration early with this exception.

  • StopIteration – If an entire iteration did not achieve improvement, stop the optimization.
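The bookkeeping described for update() can be sketched as a small tracker. This is a hypothetical simplification: it maximizes the target (as the optimizer does), measures improvement relative to the absolute value of the previous best, and treats steps without a new maximum as neutral for the consecutive count; the real controller may differ in these details. StopEarly is redefined locally so that the block runs on its own:

```python
from typing import Optional


class StopEarly(Exception):
    """Local stand-in for the documented StopEarly exception."""


class ImprovementTracker:
    """Hypothetical sketch of the step/end bookkeeping described for update()."""

    def __init__(self, min_improvement: float = 0.001, min_improvement_count: int = 2):
        self.min_improvement = min_improvement
        self.min_improvement_count = min_improvement_count
        self.best: Optional[float] = None
        self.small_steps = 0  # consecutive improvements below min_improvement
        self.improved = False  # any new maximum in the current iteration?

    def step(self, target: float) -> None:
        """Handle a step event with the latest target value (higher is better)."""
        if self.best is not None and target <= self.best:
            return  # not a new maximum
        # First sample (or a zero best) counts as a large improvement
        rel = float("inf") if not self.best else (target - self.best) / abs(self.best)
        self.best = target
        self.improved = True
        if rel < self.min_improvement:
            self.small_steps += 1
            if self.small_steps >= self.min_improvement_count:
                raise StopEarly("improvement too small")
        else:
            self.small_steps = 0

    def end(self) -> None:
        """Handle an end event: stop everything if the iteration saw no improvement."""
        if not self.improved:
            raise StopIteration("no improvement in this iteration")
        self.improved = False
        self.small_steps = 0


tracker = ImprovementTracker()
try:
    for target in [-10.0, -5.0, -4.999, -4.998]:
        tracker.step(target)
except StopEarly:
    print("iteration stopped early at", tracker.best)
tracker.end()  # no exception: this iteration found new maxima
```

The last two steps improve the best guess by only about 0.02% each, which undercuts min_improvement twice in a row and ends the iteration early.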

improvements() DataFrame

Return improvements as nicely formatted data



Return type:


__init__(init_points: int, n_iter: int, min_improvement: float = 0.001, min_improvement_count: int = 2, kappa: float = 2.576, kappa_min: float = 0.1, max_iterations: int = 10, utility_func_kwargs: dict[str, int | float | str] = <factory>, _last_it_improved: int = 0, _last_it_end: int = 0) None

class climada.util.calibrate.bayesian_optimizer.BayesianOptimizer(input: Input, verbose: int = 0, random_state: dataclasses.InitVar[int] = 1, allow_duplicate_points: dataclasses.InitVar[bool] = True, bayes_opt_kwds: dataclasses.InitVar[Optional[Mapping[str, Any]]] = None)

Bases: Optimizer

An optimization using bayes_opt.BayesianOptimization

This optimizer reports the target function value for each parameter set and maximizes that value. Therefore, a higher target function value is better. The cost function, however, is still minimized: The target function is defined as the inverse of the cost function.

For details on the underlying optimizer, see bayesian-optimization/BayesianOptimization.
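The target is described as the inverse of the cost function. A common way to realize this is a sign flip, which is what the sketch below assumes: maximizing the negated cost finds the same parameter as minimizing the cost. Plain random search stands in for the Gaussian-process sampling of bayes_opt, and all names are hypothetical:

```python
import random
from typing import Callable, Dict, Tuple


def maximize_by_random_search(target: Callable[[float], float], bounds: Tuple[float, float],
                              n_samples: int, seed: int = 1) -> Dict[str, float]:
    """Toy maximizer: sample uniformly within bounds and keep the highest target."""
    rng = random.Random(seed)
    best: Dict[str, float] = {}
    for _ in range(n_samples):
        x = rng.uniform(*bounds)
        value = target(x)
        if not best or value > best["target"]:
            best = {"x": x, "target": value}
    return best


def cost(x: float) -> float:
    """Hypothetical cost function with its minimum at x = 3."""
    return (x - 3.0) ** 2


# Maximizing the negated cost is the same as minimizing the cost
best = maximize_by_random_search(lambda x: -cost(x), bounds=(0.0, 10.0), n_samples=2000)
print(f"x = {best['x']:.2f}, cost = {-best['target']:.4f}")
```

Note that the bounds are essential here as well: the sampler can only explore a finite region, which mirrors the requirement that Input.bounds be set for this optimizer.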

  • input (Input) – The input data for this optimizer. See the Notes below for input requirements.

  • verbose (int, optional) – Verbosity of the optimizer output. Defaults to 0. The output is not affected by the CLIMADA logging settings.

  • random_state (int, optional) – Seed for initializing the random number generator. Defaults to 1.

  • allow_duplicate_points (bool, optional) – Allow the optimizer to sample the same points in parameter space multiple times. This may happen if the parameter space is tightly bound or constrained. Defaults to True.

  • bayes_opt_kwds (Mapping, optional) – Additional keyword arguments passed to the BayesianOptimization constructor. Defaults to None.


The following requirements apply to the parameters of Input when using this class:

bounds

Setting bounds is required because the optimizer first “explores” the bounded parameter space and then narrows its search to regions where the cost function is low.

constraints

Must be an instance of scipy.optimize.LinearConstraint or scipy.optimize.NonlinearConstraint. See bayesian-optimization/BayesianOptimization for further information. Supplying constraints is optional.


The optimizer instance of this class.



run(controller: BayesianOptimizerController) BayesianOptimizerOutput

Execute the optimization

BayesianOptimization maximizes a target function. Therefore, this class inverts the cost function and uses the result as the target function. The cost function is still minimized.

  • controller (BayesianOptimizerController) – The controller instance used to set the optimization iteration parameters.

  • opt_kwargs – Further keyword arguments passed to BayesianOptimization.maximize.


output – Optimization output. BayesianOptimizerOutput.p_space stores data on the sampled parameter space.

Return type:


__init__(input: Input, verbose: int = 0, random_state: dataclasses.InitVar[int] = 1, allow_duplicate_points: dataclasses.InitVar[bool] = True, bayes_opt_kwds: dataclasses.InitVar[Optional[Mapping[str, Any]]] = None) None

class climada.util.calibrate.bayesian_optimizer.BayesianOptimizerOutputEvaluator(input: Input, output: BayesianOptimizerOutput)

Bases: OutputEvaluator

Evaluate the output of BayesianOptimizer.

  • input (Input) – The input object for the optimization task.

  • output (BayesianOptimizerOutput) – The output object returned by the Bayesian optimization task.


TypeError – If output is not of type BayesianOptimizerOutput

plot_impf_variability(p_space_df: DataFrame | None = None, plot_haz: bool = True, plot_opt_kws: dict | None = None, plot_impf_kws: dict | None = None, plot_hist_kws: dict | None = None, plot_axv_kws: dict | None = None)

Plot impact function variability with parameter combinations of almost equal cost function values

  • p_space_df (pd.DataFrame, optional) – Parameter space to plot functions from. If None, this uses the space returned by p_space_to_dataframe(). Use select_best() for a convenient subselection of parameters close to the optimum.

  • plot_haz (bool, optional) – Whether or not to plot the hazard intensity distribution. Defaults to True.

  • plot_opt_kws (dict, optional) – Keyword arguments for optimal impact function plot. Defaults to None.

  • plot_impf_kws (dict, optional) – Keyword arguments for all impact function plots. Defaults to None.

  • plot_hist_kws (dict, optional) – Keyword arguments for hazard intensity histogram plot. Defaults to None.

  • plot_axv_kws (dict, optional) – Keyword arguments for hazard intensity range plot (axvspan).

__init__(input: Input, output: BayesianOptimizerOutput) None

plot_at_event(data_transf: ~typing.Callable[[~pandas.core.frame.DataFrame], ~pandas.core.frame.DataFrame] = <function OutputEvaluator.<lambda>>, **plot_kwargs)

Create a bar plot comparing estimated model output and data per event.

Every row of the data is considered an event. The data to be plotted can be transformed with a generic function data_transf.

  • data_transf (Callable (pd.DataFrame -> pd.DataFrame), optional) – A function that transforms the data to plot before plotting. It receives a dataframe whose rows represent events and whose columns represent the modelled impact and the calibration data, respectively. By default, the data is not transformed.

  • plot_kwargs – Keyword arguments passed to the method.


ax – The plot axis returned by

Return type:



This plot does not include the ignored impact, see

plot_at_region(data_transf: ~typing.Callable[[~pandas.core.frame.DataFrame], ~pandas.core.frame.DataFrame] = <function OutputEvaluator.<lambda>>, **plot_kwargs)

Create a bar plot comparing estimated model output and data per region.

Every column of the data is considered a region. The data to be plotted can be transformed with a generic function data_transf.

  • data_transf (Callable (pd.DataFrame -> pd.DataFrame), optional) – A function that transforms the data to plot before plotting. It receives a dataframe whose rows represent regions and whose columns represent the modelled impact and the calibration data, respectively. By default, the data is not transformed.

  • plot_kwargs – Keyword arguments passed to the method.


ax – The plot axis returned by

Return type:



This plot does not include the ignored impact, see

plot_event_region_heatmap(data_transf: ~typing.Callable[[~pandas.core.frame.DataFrame], ~pandas.core.frame.DataFrame] = <function OutputEvaluator.<lambda>>, **plot_kwargs)

Plot a heatmap comparing all events per all regions

Every column of the data is considered a region, and every row is considered an event. The data to be plotted can be transformed with a generic function data_transf.

  • data_transf (Callable (pd.DataFrame -> pd.DataFrame), optional) – A function that transforms the data to plot before plotting. It receives a dataframe whose rows represent events and whose columns represent the regions, respectively. By default, the data is not transformed.

  • plot_kwargs – Keyword arguments passed to the method.


ax – The plot axis returned by

Return type:


Scipy Optimizer

Calibration based on the scipy.optimize module.

class climada.util.calibrate.scipy_optimizer.ScipyMinimizeOptimizerOutput(params: Mapping[str, Number], target: Number, result: OptimizeResult)

Bases: Output

Output of a calibration with ScipyMinimizeOptimizer

result

The OptimizeResult instance returned by scipy.optimize.minimize.

OptimizeResult

__init__(params: Mapping[str, Number], target: Number, result: OptimizeResult) None

classmethod from_hdf5(filepath: Path | str)

Create an output object from an H5 file

to_hdf5(filepath: Path | str, mode: str = 'x')

Write the output into an H5 file

This stores the data as attributes because we only store single numbers, not arrays

  • filepath (Path or str) – The filepath to store the data.

  • mode (str, optional) – The mode for opening the file. Defaults to "x" (create file, fail if it exists).

class climada.util.calibrate.scipy_optimizer.ScipyMinimizeOptimizer(input: Input)

Bases: Optimizer

An optimization using scipy.optimize.minimize

By default, this optimizer uses the "trust-constr" method. This is advertised as the most general minimization method of the scipy package and supports bounds and constraints on the parameters. Users are free to choose any method of the catalogue, but must be aware that they might require different input parameters. These can be supplied via additional keyword arguments to run().

See for details.


input (Input) – The input data for this optimizer. Supported data types for constraints might vary depending on the minimization method used.

run(**opt_kwargs) ScipyMinimizeOptimizerOutput

Execute the optimization


output – The output of the optimization. The ScipyMinimizeOptimizerOutput.result attribute stores the associated scipy.optimize.OptimizeResult instance.

Return type:


__init__(input: Input) None