CLIMADA coding conventions#
Dependencies (python packages)#
Python is extremely powerful thanks to the large number of available libraries, packages and modules. However, maintaining code that relies on many packages creates dependencies that require constant care: each package is continuously updated and developed by its maintainers, so code can become obsolete over time, stop working altogether, or become incompatible with other packages. Hence, it is crucial to keep to the philosophy:
As many packages as needed, as few as possible.
Thus, when you are coding, follow these priorities:

1. Functions and methods already implemented in CLIMADA (do NOT introduce circular imports though)
2. The Python standard library
3. Packages already included in CLIMADA's dependencies
4. Only then, a new dependency

Before adding a new dependency:

- Contact a repository admin to get permission
- Open an issue
Hence, first try to solve your problem with the standard library and function/methods already implemented in CLIMADA (see in particular the utility functions) then use the packages included in CLIMADA, and if this is not enough, propose the addition of a new package. Do not hesitate to propose new packages if this is needed for your work!
Class inheritance#
In Python, a class can inherit from other classes, which is a very useful mechanism in certain circumstances. However, it is wise to think about inheritance before implementing it. Very importantly, CLIMADA classes DO NOT inherit from external library classes. For example, if the Exposures class inherited directly from the external package GeoPandas, an update of GeoPandas could break CLIMADA.
CLIMADA classes shall NOT inherit classes from external modules.
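As a sketch of the alternative, composition keeps the external object as an attribute instead of a base class. The snippet below uses a plain dict as a stand-in for a GeoDataFrame so it stays self-contained; CLIMADA's actual Exposures class is far more elaborate:

```python
class Exposures:
    """Toy sketch: hold the data object (here a dict standing in for a
    GeoDataFrame) via composition instead of inheriting from it."""

    def __init__(self, gdf):
        # The external object is an attribute; if its library changes,
        # only the code touching self.gdf needs adapting.
        self.gdf = gdf

    def total_value(self):
        return sum(self.gdf["value"])


exp = Exposures({"value": [1.0, 2.5, 3.5]})
print(exp.total_value())  # 7.0
```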
Avoid attribute-style accesses#
CLIMADA developers shall use item-style access instead of attribute-style access (e.g. centroids.gdf["dist_coast"] instead of centroids.gdf.dist_coast) when accessing a column (in the example: "dist_coast") of a DataFrame or GeoDataFrame, or variables and attributes of xarray Datasets and DataArrays.
Reasons are:

- improved syntax highlighting,
- more consistency (in many cases attribute-style access is not possible anyway, forcing a fall-back to item-style access),
- no risk of mixing up attribute names and column names.
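The mixing-up hazard can be shown with a toy stand-in for a DataFrame (pandas behaves analogously): a column whose name clashes with an existing class attribute is silently shadowed under attribute-style access.

```python
class FrameLike:
    """Minimal stand-in for a DataFrame supporting both access styles."""

    def __init__(self, columns):
        self._columns = columns

    def __getitem__(self, name):  # item-style: df["col"]
        return self._columns[name]

    def __getattr__(self, name):  # attribute-style: df.col
        if name in self._columns:
            return self._columns[name]
        raise AttributeError(name)

    @property
    def size(self):  # a genuine attribute of the class
        return len(self._columns)


df = FrameLike({"size": [10, 20], "dist_coast": [3.2, 5.4]})
print(df.size)     # 2 -- the class attribute shadows the "size" column!
print(df["size"])  # [10, 20] -- item-style always returns the column
```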
Code formatting#
Consistent code formatting is crucial for any project, especially open-source ones. It enhances readability, reduces cognitive load, and makes collaboration easier by ensuring that code looks the same regardless of who wrote it. Uniform formatting also avoids unnecessary differences in version control, focusing reviews on functional changes rather than stylistic ones.
Pull requests checks#
Currently, the CI/CD pipeline checks that:

1. Every file ends with a newline.
2. There is no trailing whitespace at the end of lines.
3. All .py and .ipynb files are formatted following the black convention.
4. Import statements are sorted following the isort convention.

Note that most text editors usually take care of 1. and 2. by default.
Please note that pull requests will not be merged if these checks fail. The easiest way to ensure they pass is to use pre-commit hooks, which run the checks and apply fixes when you create a new commit. Following the advanced installation instructions will set up these hooks for you.
black#
We chose black as our formatter because it perfectly fits this need. Quoting directly from the project:
Black is the uncompromising Python code formatter. By using it, you agree to cede control over minutiae of hand-formatting. In return, Black gives you speed, determinism, and freedom from pycodestyle nagging about formatting. You will save time and mental energy for more important matters. Blackened code looks the same regardless of the project you’re reading. Formatting becomes transparent after a while and you can focus on the content instead. Black makes code review faster by producing the smallest diffs possible.
black automatically reformats your Python code to conform to the PEP 8 style guide, among other guidelines. It takes care of various aspects, including:
Line Length: By default, it wraps lines to 88 characters, though this can be adjusted.
Indentation: Ensures consistent use of 4 spaces for indentation.
String Quotes: Converts all strings to use double quotes by default.
Spacing: Adjusts spacing around operators and after commas to maintain readability.
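As a small illustration of these rules (not an exhaustive account of black's behavior), a hand-formatted snippet and its blackened form might look like this:

```python
# Hand-formatted input (shown as comments):
#   x={ 'a':1 ,'b':2 }
#   def total( d ): return sum( d.values() )

# What black produces: double quotes, normalized spacing, the compound
# statement split onto its own indented line, two blank lines around defs.
x = {"a": 1, "b": 2}


def total(d):
    return sum(d.values())


print(total(x))  # 3
```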
For installation and more in-depth information on black, refer to its documentation.
Plugins executing black are available for our recommended IDEs:
VSCode: Black Formatter Plugin
Spyder: See this SO post
JupyterLab: Code Formatter Plugin
isort#
isort is a Python utility to sort imports alphabetically and automatically separate them into sections and by type. Just like black, it ensures consistency of the code, focusing on the imports.
For installation and more in-depth information on isort, refer to its documentation.
A VSCode plugin is available.
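As a sketch of the resulting layout (the third-party and local imports are commented out so the snippet stays self-contained; the local import name is illustrative), isort groups imports into sections separated by blank lines and alphabetizes within each section:

```python
# Section 1: Python standard library, alphabetized
import os
import sys
from pathlib import Path

# Section 2: third-party packages
# import numpy as np
# import pandas as pd

# Section 3: first-party / local imports
# from climada.util import ...  (illustrative)

print(Path(os.devnull))
```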
How do I update my branch if it is not up to date with the formatted CLIMADA?#
If you were developing a feature before CLIMADA switched to black formatting, you will need to follow a few steps to update your branch to the new formatting.
Given a feature branch YOUR_BRANCH, do the following:

Update the repo to fetch the latest changes:

```
git fetch -t
git checkout develop-white
git checkout develop-black
```
Switch to your feature branch and merge develop-white (in order to get the latest changes in develop before switching to black):

```
git checkout YOUR_BRANCH
git pull
pre-commit uninstall || pip install pre-commit
git merge --no-ff develop-white
```
If merge conflicts arise, resolve them and conclude the merge as instructed by Git. It also helps to check if the tests pass after the merge.
Install and run the pre-commit hooks:

```
pre-commit install
pre-commit run --all-files
```
Commit the changes applied by the hooks to your branch:

```
git add -u
git commit
```
Now merge develop-black:

```
git merge --no-ff develop-black
```
Resolve all conflicts by choosing “Ours” over “Theirs” (“Current Change” over the “Incoming Change”):

```
git checkout --ours .
git add -u
git commit
```
Now, get up to date with the latest develop branch:

```
git checkout develop
git pull
git checkout YOUR_BRANCH
git merge --no-ff develop
```
Again, fix merge conflicts if they arise and check that the tests pass. Accept the incoming changes for the tutorials 1_main, Exposures, LitPop, Impact, Forecast and TropicalCyclone unless you made changes to those. The file most likely to produce merge conflicts is CHANGELOG.md, which should probably be resolved by accepting both changes.
Finally, push your latest changes:

```
git push origin YOUR_BRANCH
```
Paper repository#
We strongly encourage you to submit applications made with CLIMADA that are published in the form of a paper or report to the climada/paper repository. You can either:
Prepare a well-commented jupyter notebook with the code necessary to reproduce your results and upload it to the climada/paper repository. Note however that the repository cannot be used for storing data files.
Upload the code necessary to reproduce your results to a separate repository of your own. Then, add a link to your repository and to your publication to the readme file on the climada/paper repository.
Notes about DOI
Some journals require you to provide a DOI to the code and data used for your publication. In this case, we encourage you to create a separate repository for your code and create a DOI using Zenodo or any specific service from your institution (e.g. ETH Zürich).
The CLIMADA releases are also identified with a DOI.
Utility functions#
In CLIMADA, there is a set of utility functions defined in climada.util. A few examples are:
convert large monetary numbers into thousands, millions or billions together with the correct unit name
compute distances
load hdf5 files
convert iso country numbers between formats
…
Whenever you develop a module or review code, check whether a given functionality has already been implemented as a utility function. In addition, think carefully about whether a given function/method belongs in its module or is actually independent of any particular module and should be defined as a utility function.
It is very important not to reinvent the wheel: unnecessary redundancies in the code make maintenance and debugging very tedious.
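As an illustration of the principle (the function name and signature below are made up for this example, not CLIMADA's actual API), a distance computation is independent of any particular hazard or exposure module and therefore belongs in a shared utility module:

```python
import math


def dist_great_circle(lat1, lon1, lat2, lon2):
    """Great-circle distance in km (haversine formula).

    Illustrative only: CLIMADA's real distance utilities live in
    climada.util and may differ in name and signature.
    """
    earth_radius_km = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = (
        math.sin(dlat / 2) ** 2
        + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    )
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


# One degree of longitude at the equator is roughly 111.2 km
print(round(dist_great_circle(0.0, 0.0, 0.0, 1.0), 1))
```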
Data dependencies#
Web APIs#
CLIMADA relies on open data available through web APIs such as those of the World Bank, Natural Earth, NASA and NOAA.
You might execute the test climada_python-x.y.z/test_data_api.py to check that all the APIs used are active. If any is out of service (temporarily or permanently), the test will indicate which one.
Manual download#
As indicated in the software and tutorials, other data might need to be downloaded manually by the user. The following table shows these data sources, the versions used, their current availability, and where they are used within CLIMADA:
Side note on parameters#
Don’t use *args and **kwargs parameters without a very good reason.#
There are valid use cases for this kind of parameter notation. In particular, *args comes in handy when there is an unknown number of equally typed arguments to be passed, e.g., in the pathlib.Path constructor.
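For instance, pathlib.Path accepts any number of path segments as positional arguments (PurePosixPath is used here so the output is platform-independent):

```python
from pathlib import PurePosixPath

# Each positional argument is one path segment; the number of
# segments is naturally unknown in advance, so *args fits well.
p = PurePosixPath("home", "user", "data", "results.csv")
print(p)  # home/user/data/results.csv
```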
But if the parameters are expected to be structured in any way, it is just a bad idea.
```python
def f(x, y, z):
    return x + y + z


# bad in most cases
def g(*args, **kwargs):
    x = args[0]
    y = kwargs["y"]
    s = f(*args, **kwargs)
    print(x, y, s)


g(1, y=2, z=3)
```

```python
# usually just fine
def g(x, y, z):
    s = f(x, y, z)
    print(x, y, s)


g(1, y=2, z=3)
```
Decrease the number of parameters.#
Though CLIMADA’s pylint configuration .pylintrc allows 7 arguments for any method or function before it complains, it is advisable to aim for fewer. It is quite likely that a function with so many parameters has an inherent design flaw.
There are very well designed command line tools with innumerable optional arguments, e.g., rsync, but these are command line tools. There are also methods like pandas.DataFrame.plot() with countless optional arguments, and there it makes perfect sense.
But within the climada package it probably doesn’t. Divide et impera!
Whenever a method has more than 5 parameters, it is more than likely that it can be refactored pretty easily into two or more methods with fewer parameters and lower complexity:
```python
def f(a, b, c, d, e, f, g, h):
    print(f"f does many things with a lot of arguments: {a, b, c, d, e, f, g, h}")
    return sum([a, b, c, d, e, f, g, h])


f(1, 2, 3, 4, 5, 6, 7, 8)
```

```python
def f1(a, b, c, d):
    print(f"f1 does fewer things with fewer arguments: {a, b, c, d}")
    return sum([a, b, c, d])


def f2(e, f, g, h):
    print(f"f2 ditto: {e, f, g, h}")
    return sum([e, f, g, h])


def f3(x, y):
    print(f"f3 ditto, but on a higher level: {x, y}")
    return sum([x, y])


f3(f1(1, 2, 3, 4), f2(5, 6, 7, 8))
```
This of course pleads the case on a strictly formal level; no real complexity has been reduced in the making of this example. Nevertheless, there is the benefit of reduced test case requirements, and in real life, real complexity will be reduced.
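Another common refactoring (a sketch, not something CLIMADA prescribes; all names here are made up) is to bundle parameters that belong together into a small dataclass, shrinking the signature without losing expressiveness:

```python
from dataclasses import dataclass


@dataclass
class Window:
    """Groups the four bounding-box parameters into one object."""

    lat_min: float
    lat_max: float
    lon_min: float
    lon_max: float


def crop(data, window, fill_value=0.0):
    # Three parameters instead of six; the bounding box travels as a unit.
    inside = [
        v
        for (lat, lon, v) in data
        if window.lat_min <= lat <= window.lat_max
        and window.lon_min <= lon <= window.lon_max
    ]
    return inside or [fill_value]


points = [(1.0, 1.0, 10.0), (5.0, 5.0, 20.0), (9.0, 9.0, 30.0)]
print(crop(points, Window(0.0, 6.0, 0.0, 6.0)))  # [10.0, 20.0]
```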