CLIMADA coding conventions#

Dependencies (python packages)#

Python is extremely powerful thanks to the large number of available libraries, packages and modules. However, code that relies on many such packages accumulates dependencies, and these are very maintenance-intensive. Each package is updated and developed continuously by its maintainers. This means that certain code can become obsolete over time, stop working altogether, or become incompatible with other packages. Hence, it is crucial to keep to the following philosophy:

As many packages as needed, as few as possible.

Thus, when you are coding, follow these priorities:

  1. Python standard library

  2. Functions and methods already implemented in CLIMADA (do NOT introduce circular imports though)

  3. Packages already included in CLIMADA

  4. Before adding a new dependency:

Hence, first try to solve your problem with the standard library and the functions/methods already implemented in CLIMADA (see in particular the utility functions), then use the packages included in CLIMADA, and if this is not enough, propose the addition of a new package. Do not hesitate to propose new packages if this is needed for your work!

Class inheritance#

In Python, a class can inherit from other classes, which is a very useful mechanism in certain circumstances. However, it is wise to think about inheritance before implementing it. It is very important to note that CLIMADA classes DO NOT inherit from external library classes. For example, if the Exposure class inherited directly from a class of the external package Geopandas, an update of Geopandas could cause problems in CLIMADA.

CLIMADA classes shall NOT inherit classes from external modules.
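
One way to follow this rule is composition: the class stores the external object as an attribute and only exposes the functionality it needs. The sketch below is a minimal illustration under stated assumptions. `ExternalTable` is a hypothetical stand-in for a class from an external package and `ExposuresSketch` is a hypothetical CLIMADA-style class; neither reflects the actual CLIMADA implementation.

```python
class ExternalTable:
    """Hypothetical stand-in for a class provided by an external package."""

    def __init__(self, rows):
        self.rows = list(rows)

    def column(self, name):
        return [row[name] for row in self.rows]


class ExposuresSketch:
    """Hypothetical class that wraps the external object instead of inheriting."""

    def __init__(self, rows):
        # Composition: the external object is stored as an attribute.
        self.data = ExternalTable(rows)

    def total_value(self):
        # Only this method touches the external API, so a change in
        # ExternalTable has to be absorbed in one place.
        return sum(self.data.column("value"))


exposures = ExposuresSketch([{"value": 10.0}, {"value": 5.5}])
print(exposures.total_value())
```

If the external class changes its interface, only the wrapper methods need to be adapted, while the public interface of the wrapping class stays the same.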

Avoid attribute-style accesses#

CLIMADA developers shall use item-style access instead of attribute-style access (e.g. centroids.gdf["dist_coast"] instead of centroids.gdf.dist_coast) when accessing a column (in the example: "dist_coast") in a DataFrame or GeoDataFrame, or variables and attributes of xarray Datasets and DataArrays.

The reasons are: improved syntax highlighting, more consistency (in many cases attribute-style access is not possible, so you are forced to fall back to item-style access anyway), and no risk of mixing up attribute and column names.
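
The last reason can be illustrated without any external package. The sketch below uses `TableSketch`, a hypothetical and heavily simplified stand-in for a DataFrame-like container (it is not pandas and not part of CLIMADA). It only mimics the general idea that attribute-style access falls back to column names when no real attribute of that name exists.

```python
class TableSketch:
    """Hypothetical, minimal stand-in for a DataFrame-like container."""

    def __init__(self, columns):
        self._columns = dict(columns)

    def __getitem__(self, name):
        # Item-style access always refers to a column.
        return self._columns[name]

    def __getattr__(self, name):
        # Only called when no regular attribute or method is found.
        columns = self.__dict__.get("_columns", {})
        try:
            return columns[name]
        except KeyError:
            raise AttributeError(name) from None

    def count(self):
        """A regular method whose name collides with a column name."""
        return len(self._columns)


table = TableSketch({"dist_coast": [1.2, 3.4], "count": [7, 8]})

print(table["dist_coast"])  # the column
print(table["count"])  # the column
print(callable(table.count))  # True: the method, NOT the column
```

With item-style access, the column named "count" is always reachable, whereas attribute-style access silently returns the method of the same name.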

Code formatting#

Consistent code formatting is crucial for any project, especially open-source ones. It enhances readability, reduces cognitive load, and makes collaboration easier by ensuring that code looks the same regardless of who wrote it. Uniform formatting helps to avoid unnecessary differences in version control, so that reviews focus on functional changes rather than stylistic differences.

Pull requests checks#

Currently, the CI/CD pipeline checks that:

  1. Every file ends with a newline

  2. There is no trailing whitespace at the end of lines.

  3. All .py and .ipynb files are formatted following black convention

  4. Import statements are sorted following isort convention

Note that most text editors usually take care of 1. and 2. by default.

Please note that pull requests will not be merged if these checks fail. The easiest way to make sure they pass is to use pre-commit hooks, which allow you to both run the checks and apply fixes when creating a new commit. Following the advanced installation instructions will set up these hooks for you.
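
To make checks 1 and 2 concrete, here is a rough sketch of what they test for. It is not the code run by the CI/CD pipeline or by the pre-commit hooks; the function names `check_text` and `check_file` are hypothetical.

```python
from pathlib import Path


def check_text(text):
    """Return a list of problems found in ``text`` (empty list if clean)."""
    problems = []
    if text and not text.endswith("\n"):
        problems.append("file does not end with a newline")
    for number, line in enumerate(text.splitlines(), start=1):
        if line != line.rstrip():
            problems.append(f"line {number}: trailing whitespace")
    return problems


def check_file(path):
    """Apply ``check_text`` to the content of the file at ``path``."""
    return check_text(Path(path).read_text(encoding="utf-8"))


print(check_text("x = 1\n"))
print(check_text("x = 1  \ny = 2"))
```

The first call reports no problems. The second reports both a missing final newline and trailing whitespace on line 1.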

black#

We chose black as our formatter because it perfectly fits this need. Quoting directly from the project:

Black is the uncompromising Python code formatter. By using it, you agree to cede control over minutiae of hand-formatting. In return, Black gives you speed, determinism, and freedom from pycodestyle nagging about formatting. You will save time and mental energy for more important matters. Blackened code looks the same regardless of the project you’re reading. Formatting becomes transparent after a while and you can focus on the content instead. Black makes code review faster by producing the smallest diffs possible.

black automatically reformats your Python code to conform to the PEP 8 style guide, among other guidelines. It takes care of various aspects, including:

  • Line Length: By default, it wraps lines to 88 characters, though this can be adjusted.

  • Indentation: Ensures consistent use of 4 spaces for indentation.

  • String Quotes: Converts all strings to use double quotes by default.

  • Spacing: Adjusts spacing around operators and after commas to maintain readability.
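
As an illustration of the points above, the following sketch shows a small snippet before and after formatting. The "before" version is given in comments so that the block stays runnable. The "after" version is what black would be expected to produce with its default settings; the function and variable names are made up for this example.

```python
# Before formatting:
#
#   def scale(values,factor = 2):
#       return [v*factor for v in values]
#   names = {'a':1,'b':2}


# After formatting:
def scale(values, factor=2):
    return [v * factor for v in values]


names = {"a": 1, "b": 2}

print(scale([1, 2, 3]))
print(names)
```

Note the normalized spacing around operators and after commas, the double quotes, and the two blank lines after the function definition.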

For installation and more in-depth information on black, refer to its documentation.

Plugins executing black are available for our recommended IDEs:

isort#

isort is a Python utility that sorts imports alphabetically and automatically separates them into sections and by type.

Just like black, it ensures consistency of the code, focusing on the imports.

For installation and more in-depth information on isort, refer to its documentation.

A VSCode plugin is available.
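
The following sketch shows what a sorted import block typically looks like, assuming isort's default ordering within a section. Only standard library imports are actually executed here; the third-party and project imports are shown as comments and are purely illustrative.

```python
# Standard library section: plain "import x" statements come first,
# followed by "from x import y" statements, each group sorted alphabetically.
import json
import os
from collections import Counter
from pathlib import Path

# Third-party and project imports would follow in their own sections,
# each separated by a blank line, for example:
#
#   import numpy as np
#
#   from climada.util import coordinates

counts = Counter(Path("a", "b", "a").parts)
print(json.dumps(counts), os.sep)
```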

How do I update my branch if it is not up to date with the formatted CLIMADA?#

If you were developing a feature before CLIMADA switched to black formatting, you will need to follow a few steps to update your branch to the new formatting.

Given a feature branch YOUR_BRANCH, do the following:

  1. Update the repo to fetch the latest changes:

    git fetch -t
    git checkout develop-white
    git checkout develop-black
    
  2. Switch to your feature branch and merge develop-white (in order to get the latest changes in develop before switching to black):

    git checkout YOUR_BRANCH
    git pull
    pre-commit uninstall || pip install pre-commit
    git merge --no-ff develop-white
    

    If merge conflicts arise, resolve them and conclude the merge as instructed by Git. It also helps to check if the tests pass after the merge.

  3. Install and run the pre-commit hooks:

    pre-commit install
    pre-commit run --all-files
    
  4. Commit the changes applied by the hooks to your branch:

    git add -u
    git commit
    
  5. Now merge develop-black:

    git merge --no-ff develop-black
    

    Resolve all conflicts by choosing “Ours” over “Theirs” (“Current Change” over the “Incoming Change”).

    git checkout --ours .
    git add -u
    git commit
    
  6. Now, get up to date with the latest develop branch:

    git checkout develop
    git pull
    git checkout YOUR_BRANCH
    git merge --no-ff develop
    

    Again, fix merge conflicts if they arise and check if the tests pass. Accept the incoming changes for the tutorials 1_main, Exposures, LitPop Impact, Forecast and TropicalCyclone unless you made changes to those. The file most likely to have merge conflicts is CHANGELOG.md, which should probably be resolved by accepting both changes.

  7. Finally, push your latest changes:

    git push origin YOUR_BRANCH
    

Paper repository#

If you publish an application made with CLIMADA in the form of a paper or a report, you are very much encouraged to submit it to the climada/paper repository. You can either:

  • Prepare a well-commented Jupyter notebook with the code necessary to reproduce your results and upload it to the climada/paper repository. Note, however, that the repository cannot be used for storing data files.

  • Upload the code necessary to reproduce your results to a separate repository of your own. Then, add a link to your repository and to your publication to the readme file on the climada/paper repository.

Notes about DOI

Some journals require you to provide a DOI to the code and data used for your publication. In this case, we encourage you to create a separate repository for your code and create a DOI using Zenodo or any specific service from your institution (e.g. ETH Zürich).

The CLIMADA releases are also identified with a DOI.

Utility functions#

In CLIMADA, there is a set of utility functions defined in climada.util. A few examples are:

  • convert large monetary numbers into thousands, millions or billions together with the correct unit name

  • compute distances

  • load hdf5 files

  • convert ISO country codes between formats

Whenever you develop a module or review code, check whether a given functionality has already been implemented as a utility function. In addition, think carefully about whether a given function/method belongs in its module or is actually independent of any particular module and should be defined as a utility function.

It is very important not to reinvent the wheel and to avoid unnecessary redundancies in the code, because redundant code makes maintenance and debugging very tedious.
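
To illustrate what "independent of any particular module" means, here is a sketch of a helper in the spirit of the first example above. `to_monetary_unit` is a hypothetical function written for this illustration; it is not the implementation in climada.util, so check climada.util first before writing anything similar.

```python
def to_monetary_unit(value):
    """Scale ``value`` to thousands, millions or billions.

    Hypothetical helper for illustration only.
    Returns the scaled value and the matching unit name.
    """
    units = [(1e9, "billion"), (1e6, "million"), (1e3, "thousand")]
    for factor, name in units:
        if abs(value) >= factor:
            return value / factor, name
    return value, ""


print(to_monetary_unit(2_500_000))
print(to_monetary_unit(750))
```

The function knows nothing about hazards, exposures or impacts, which is what makes it a candidate for a utility function rather than a method of a specific class.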

Data dependencies#

Web APIs#

CLIMADA relies on open data available through web APIs such as those of the World Bank, Natural Earth, NASA and NOAA. You can run the test climada_python-x.y.z/test_data_api.py to check that all the APIs used are active. If any is out of service (temporarily or permanently), the test will indicate which one.

Manual download#

As indicated in the software and tutorials, other data might need to be downloaded manually by the user. The following table lists these data sources, the version used, where they are currently available, and where they are used within CLIMADA:

Name | Version | Link | CLIMADA class | CLIMADA version | CLIMADA tutorial reference
--- | --- | --- | --- | --- | ---
Fire Information for Resource Management System | | FIRMS | BushFire | > v1.2.5 | climada_hazard_BushFire.ipynb
Gridded Population of the World (GPW) | v4.11 | GPW4.11 | LitPop | > v1.2.3 | climada_entity_LitPop.ipynb

Side note on parameters#

Don’t use *args and **kwargs parameters without a very good reason.#

There are valid use cases for this kind of parameter notation.
In particular, *args comes in handy when an unknown number of arguments of the same type is to be passed, as in the pathlib.Path constructor.
But if the parameters are expected to be structured in any way, it is just a bad idea.

def f(x, y, z):
    return x + y + z


# bad in most cases
def g(*args, **kwargs):
    x = args[0]
    y = kwargs["y"]
    s = f(*args, **kwargs)
    print(x, y, s)


g(1, y=2, z=3)


# usually just fine
def g(x, y, z):
    s = f(x, y, z)
    print(x, y, s)


g(1, y=2, z=3)
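
For contrast, here is a sketch of a case where *args is a reasonable choice, because all arguments have the same type and the same role. `total_parts` is a hypothetical function made up for this example.

```python
from pathlib import Path


def total_parts(*paths):
    """Sum the number of path components over any number of paths."""
    return sum(len(Path(p).parts) for p in paths)


# pathlib.Path itself accepts any number of path segments.
path = Path("data", "hazard", "storms.hdf5")
print(path.parts)
print(total_parts("a/b", "c"))
```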

Decrease the number of parameters.#

Though CLIMADA’s pylint configuration .pylintrc allows 7 arguments for any method or function before it complains, it is advisable to aim for fewer. It is quite likely that a function with so many parameters has an inherent design flaw.

There are very well designed command line tools with innumerable optional arguments, e.g., rsync - but these are command line tools. There are also methods like pandas.DataFrame.plot() with countless optional arguments, and there it makes perfect sense.

But within the climada package it probably doesn’t. Divide et impera!

Whenever a method has more than 5 parameters, it is more than likely that it can be refactored pretty easily into two or more methods with fewer parameters and less complexity:

def f(a, b, c, d, e, f, g, h):
    print(f"f does many things with a lot of arguments: {a, b, c, d, e, f, g, h}")
    return sum([a, b, c, d, e, f, g, h])


f(1, 2, 3, 4, 5, 6, 7, 8)


def f1(a, b, c, d):
    print(f"f1 does fewer things with fewer arguments: {a, b, c, d}")
    return sum([a, b, c, d])


def f2(e, f, g, h):
    print(f"f2 ditto: {e, f, g, h}")
    return sum([e, f, g, h])


def f3(x, y):
    print(f"f3 ditto, but on a higher level: {x, y}")
    return sum([x, y])


f3(f1(1, 2, 3, 4), f2(5, 6, 7, 8))

This of course pleads the case on a strictly formal level. No real complexities have been reduced during the making of this example.
Nevertheless there is the benefit of reduced test case requirements. And in real life, real complexity will be reduced.
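
The point about test cases can be sketched as follows. The three functions are repeated without their print statements so that the block is self-contained; the test function names are illustrative and not taken from the CLIMADA test suite.

```python
def f1(a, b, c, d):
    return sum([a, b, c, d])


def f2(e, f, g, h):
    return sum([e, f, g, h])


def f3(x, y):
    return sum([x, y])


def test_f1():
    assert f1(1, 2, 3, 4) == 10


def test_f2():
    assert f2(5, 6, 7, 8) == 26


def test_f3():
    # f3 is tested with plain numbers; it does not need to know f1 or f2.
    assert f3(10, 26) == 36


for test in (test_f1, test_f2, test_f3):
    test()
print("all tests passed")
```

Each function can be tested in isolation with a handful of inputs, instead of covering all combinations of eight arguments in a single function.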