Development and Git and CLIMADA#
Git and GitHub#
Git’s not that scary
95% of your work on Git will be done with the same handful of commands (the other 5% will always be done with careful Googling)
Almost everything in Git can be undone by design (but use rebase, --force and --hard with care!)
Your favourite IDE (Spyder, PyCharm, …) will have a GUI for working with Git, or you can download a standalone one.
The Git Book is a great introduction to how Git works and to using it on the command line.
Consider using a GUI program such as “GitHub Desktop” or “GitKraken” to have a visual Git interface, particularly at the beginning. Your Python IDE is also likely to have a visual Git interface.
Feel free to ask for help
What we assume you know#
We’re assuming you’re all familiar with the basics of Git.
What (and why) is version control
How to clone a repository
How to make a commit and push it to GitHub
What a branch is, and how to make one
How to merge two branches
The basics of the GitHub website
If you’re not feeling great about this, we recommend
sending me a message so we can arrange an introduction with CLIMADA
exploring the Git Book
Terms we’ll be using today#
These are terms that will come up a lot, so let’s make sure we know them
local versus remote
Our remote repository is hosted on GitHub. This is the central location where all updates to CLIMADA that we want to share end up. If you’re updating CLIMADA for the community, your code will end up here too.
Your local repository is the copy you have on the machine you’re working on, and where you do your work.
Git calls the (first, default) remote the origin
(It’s possible to set more than one remote repository, e.g. you might set one up on a network-restricted computing cluster)
push, pull and pull request
You push your work when you send it from your local machine to the remote repository
You pull from the remote repository to update the code on your local machine
A pull request is a standardised review process on GitHub. Usually it ends with one branch merging into another
Conflict resolution
Sometimes two people have made changes to the same bit of code. Usually this comes up when you’re trying to merge branches. The changes have to be manually compared and the code edited to make sure the ‘correct’ version of the code is kept.
Gitflow#
Gitflow is a particular way of using git to organise projects that have
multiple developers
working on different features
with a release cycle
It means that
there’s always a stable version of the code available to the public
the chances of two developers’ code conflicting are reduced
the process of adding and reviewing features and fixes is more standardised for everyone
Gitflow is a convention, so you don’t need any additional software.
… but if you want you can get some: a popular extension to the git command line tool allows you to issue more intuitive commands for a Gitflow workflow.
Mac/Linux users can install git-flow from their package manager, and it’s included with Git for Windows
Gitflow works on the develop branch instead of main#
The critical difference between Gitflow and ‘standard’ git is that almost all of your work takes place on the develop branch, instead of the main (formerly master) branch.
The main branch is reserved for planned, stable product releases, and it’s what the general public download when they install CLIMADA. The developers almost never interact with it.
Gitflow is a feature-based workflow#
This is common to many workflows: when you want to add something new to the model you start a new branch, work on it locally, and then merge it back into develop with a pull request (which we’ll cover later).
By convention we name all CLIMADA feature branches feature/* (e.g. feature/meteorite).
Features can be anything, from entire hazard modules to a smarter way to do one line of a calculation. Most of the work you’ll do on CLIMADA will be a feature of one size or another.
We’ll talk more about developing CLIMADA features later!
Gitflow enables a regular release cycle#
A release is usually more complex than merging develop into main.
So for this a release-* branch is created from develop. We’ll all be notified repeatedly of the deadlines for submitting (and then reviewing) pull requests so that your work can be included in a release.
The core developer team (mostly Emanuel) will then make sure tests, bugfixes, documentation and compatibility requirements are met, merging any fixes back into develop.
On release day, the release branch is merged into main, the commit is tagged as a release and the release notes are published on GitHub at CLIMADA-project/climada_python
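The branch naming conventions described so far can be summarised in a few lines of Python. This is only an illustrative sketch: branch_kind is a hypothetical helper, not part of CLIMADA or git-flow, and it only knows the names mentioned above (main, develop, feature/* and release-*).

```python
from fnmatch import fnmatchcase


def branch_kind(name: str) -> str:
    """Classify a branch name using the conventions described in this guide.

    Hypothetical helper for illustration only.
    """
    if name in ("main", "develop"):
        # Long-lived branches: releases live on main, daily work on develop
        return "long-lived"
    if fnmatchcase(name, "feature/*"):
        return "feature"
    if fnmatchcase(name, "release-*"):
        return "release"
    return "other"


for branch in ("develop", "feature/meteorite", "release-x", "meteorite"):
    print(f"{branch}: {branch_kind(branch)}")
```

A branch that comes out as "other" is a hint that its name doesn't follow the convention, e.g. a feature branch missing the feature/ prefix.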
Everything else is hotfixes#
The other type of branch you’ll create is a hotfix.
Hotfixes are generally small changes to code that do one thing, fixing typos, small bugs, or updating docstrings. They’re done in much the same way as features, and are usually merged with a pull request.
The difference between features and hotfixes is fuzzy and you don’t need to worry about getting it right.
Hotfixes will occasionally be used to fix bugs on the main branch, in which case they will merge into both main and develop.
Some hotfixes are so simple - e.g. fixing a typo or a docstring - that they don’t need a pull request. Use your judgement, but as a rule, if you change what the code does, or how, you should be merging with a pull request.
Installing CLIMADA for development#
See Installation for instructions on how to install CLIMADA for developers. You might need to install additional environments contained in climada_python/requirements when using specific functionalities. Also see Apps for working with CLIMADA for an overview of which tools are useful for CLIMADA developers.
Pre-Commit Hooks#
CLIMADA developer dependencies include pre-commit hooks that help enforce code linting and formatting. See Code Formatting for our conventions regarding formatting. These hooks will run on all staged files and verify:
the absence of trailing whitespace
that files end in a newline and only a newline
the correct sorting of imports using isort
the correct formatting of the code using black
If you have installed the pre-commit hooks (see Install developer dependencies), they will be run each time you attempt to create a new commit, and the usual git flow can slightly change:
If any check fails, you will be warned and the hooks will apply corrections (such as reformatting the code with black). Because the hooks modify the files, you have to stage them again (hooks cannot stage their modifications, only you can) and commit again.
As an example, suppose you made an improvement to Centroids and want to commit these changes. You would run:
$ git status
On branch feature/<new_feature>
Your branch is up-to-date with 'origin/<new_feature>'.
Changes to be committed:
(use "git restore --staged <file>..." to unstage)
modified: climada/hazard/centroids/centr.py
Now trying to commit, and assuming that imports are not correctly sorted, and some of the code is not correctly formatted:
$ git commit -m "Add <new_feature> to centroids"
Fix End of Files.........................................................Passed
Trim Trailing Whitespace.................................................Passed
isort....................................................................Failed
- hook id: isort
- files were modified by this hook
Fixing [...]/climada_python/climada/hazard/centroids/centr.py
black-jupyter............................................................Failed
- hook id: black-jupyter
- files were modified by this hook
reformatted climada/hazard/centroids/centr.py
All done! ✨ 🍰 ✨
Note the commit was aborted, and the problems were fixed.
However, these changes added by the hooks are not staged yet.
You have to run git add again to stage them:
$ git status
On branch feature/<new_feature>
Your branch is up-to-date with 'origin/<new_feature>'.
Changes to be committed:
(use "git restore --staged <file>..." to unstage)
modified: climada/hazard/centroids/centr.py
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: climada/hazard/centroids/centr.py
$ git add climada/hazard/centroids/centr.py
After that, you can execute the commit and the hooks should pass:
$ git commit -m "Add <new_feature> to centroids"
Fix End of Files.........................................................Passed
Trim Trailing Whitespace.................................................Passed
isort....................................................................Passed
black-jupyter............................................................Passed
All done! ✨ 🍰 ✨
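To get a feeling for what the two simplest checks look for (trailing whitespace, and files ending in exactly one newline), here is a rough Python sketch. It is not the hooks’ actual implementation, and whitespace_problems is a made-up name; the real hooks also fix the files rather than only reporting.

```python
def whitespace_problems(text: str) -> list[str]:
    """Report trailing whitespace and end-of-file newline problems.

    Illustrative only: a simplified stand-in for what the hooks verify.
    """
    problems = []
    for number, line in enumerate(text.split("\n"), start=1):
        # A line has trailing whitespace if stripping its right side changes it
        if line != line.rstrip():
            problems.append(f"line {number}: trailing whitespace")
    if text and not text.endswith("\n"):
        problems.append("file does not end in a newline")
    elif text.endswith("\n\n"):
        problems.append("file ends in more than one newline")
    return problems


print(whitespace_problems("x = 1  \ny = 2"))
```

A file that passes both checks, such as "import numpy as np\n", gives an empty list.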
Does it belong in CLIMADA?#
When developing for CLIMADA, it is important to distinguish between core content and particular applications. Core content is meant to be included into the climada_python repository and will be subject to a code review. Any new addition should first be discussed with one of the repository admins. The purpose of this discussion is to see
How does the planned module fit into CLIMADA?
What is an optimal architecture for the new module?
What parts might already exist in other parts of the code?
Applications made with CLIMADA, such as an ECA study, can be stored in the paper repository once they have been published. For other types of work, consider making a separate repository that imports CLIMADA as an external package.
Features and branches#
Planning a new feature#
Here we’re talking about large features such as new modules, new data sources, or big methodological changes: any extension to CLIMADA that might affect other developers’ work, modify the CLIMADA core, or need a big code review.
Smaller feature branches don’t need such formalities. Use your judgment, and if in doubt, let people know.
Talk to the group#
Before you start coding a module, coordinate with one of the repo admins (Emanuel, Chahan or Lukas)
This is the chance to work out the Big Picture stuff that is better when it’s planned with the group - possible intersections with other projects, possible conflicts, changes to the CLIMADA core, additional dependencies
Also talk with others from the core development team (see the GitHub wiki).
Bring it to a developers meeting - people may be able to help/advise and are always interested in hearing about new projects. You can also find reviewers!
Also, keep talking! Your plans will change :)
Planning the work#
Does the project go in its own repository and import CLIMADA, or does it extend the main CLIMADA repository?
The way this is done is slowly changing, so definitely discuss it with the group.
Chahan will discuss this later!
Find a few people who will help to review your code.
Ask in a developers’ meeting, on Slack (for WCR developers) or message people on the development team (see the GitHub wiki).
Let them know roughly how much code will be in the reviews, and when you’ll be creating pull requests.
How can the work be split into manageable chunks?
A series of smaller pull requests is far more manageable than one big one (and takes off some of the pre-release pressure)
Reviewing and spotting issues/improvements/generalisations early is always a good thing.
It encourages modularisation of the code: smaller self-contained updates, with documentation and tests.
Will there be any changes to the CLIMADA core?
These should be planned carefully
Will you need any new dependencies? Are you sure?
Chahan will discuss this later!
Working on feature branches#
When developing a big new feature, consider creating a feature branch and merging smaller branches into that feature branch with pull requests, keeping the whole process separate from develop until it’s completed. This makes step-by-step code review nice and easy, and makes the final merge more easily tracked in the history.
e.g. developing the big feature/meteorite module you might write feature/meteorite-hazard and merge it in, then feature/meteorite-impact, then feature/meteorite-stochastic-events etc… before finally merging feature/meteorite into develop. Each of these could be a reviewable pull request.
Make a new branch#
For new features in Git flow:
git flow feature start feature_name
Which is equivalent to (in vanilla git, branching off develop):
git checkout -b feature/feature_name develop
Or work on an existing branch:
git checkout branch_name
Follow the Python do’s and don’ts and performance guides. Write small readable methods, classes and functions.#
get the latest data from the remote repository and update your branch
git pull
see your locally modified files
git status
add changes you want to include in the commit
git add climada/modified_file.py climada/test/test_modified_file.py
commit the changes
git commit -m "new functionality of .. implemented"
Write unit and integration tests for your code, preferably during development#
Pull requests#
We want every line of code that goes into the CLIMADA repository to be reviewed!
Code review:
catches bugs (there are always bugs)
lets you draw on the experience of the rest of the team
makes sure that more than one person knows how your code works
helps to unify and standardise CLIMADA’s code, so new users find it easier to read and navigate
creates an archived description and discussion of the changes you’ve made
When to make a pull request#
When you’ve finished writing a big new class or method (and its tests)
When you’ve fixed a bug or made an improvement you want to merge
When you want to merge a change of code into develop or main
When you want to discuss a bit of code you’ve been working on - pull requests aren’t only for merging branches
Not all pull requests have to be into develop - you can make a pull request into any active branch that suits you.
Pull requests need to be made at the latest two weeks before a release, see releases.
Step by step pull request!#
Let’s suppose you’ve developed a cool new module on the feature/meteorite branch and you’re ready to merge it into develop.
Checklist before you start#
Documentation
Tests
Tutorial (if a complete new feature)
Updated dependencies (if need be)
Added your name to the AUTHORS file
Added an entry to the CHANGELOG.md file. See https://keepachangelog.com for information on how this should look.
(Advanced, optional) interactively rebase/squash recent commits that aren’t yet on GitHub.
Steps#
Make sure the develop branch is up to date on your own machine
git checkout develop
git pull
Merge develop into your feature branch and resolve any conflicts
git checkout feature/meteorite
git merge develop
In the case of more complex conflicts, you may want to speak with others who worked on the same code. Your IDE should have a tool for conflict resolution.
Check all the tests pass locally
make unit_test
make integ_test
Perform a static code analysis using pylint with CLIMADA’s configuration .pylintrc (in the climada root directory). Jenkins executes it after every push.
To do it locally, your IDE probably provides a tool, or you can run make lint and see the output in pylint.log.
Push to GitHub. If you’re pushing this branch for the first time, use
git push -u origin feature/meteorite
and if you’re updating a branch that’s already on GitHub:
git push
Check all the tests pass on the WCR Jenkins server (https://ied-wcr-jenkins.ethz.ch). See Emanuel’s presentation for how to do this! You should regularly be pushing your code and checking this!
Create the pull request!
On the CLIMADA GitHub page, navigate to your feature branch (there’s a drop-down menu above the file structure, pointing by default to main).
Above the file structure is a branch summary and an icon to the right labelled “Pull request”.
Choose which branch you want to merge with. This will usually be develop, but may be another feature branch for more complex feature development.
Give your pull request an informative title (like a commit message).
Write a description of the pull request. This can usually be adapted from your branch’s commit messages (you wrote informative commit messages, didn’t you?), and should give a high-level summary of the changes, specific points you want the reviewers’ input on, and explanations for decisions you’ve made. The code documentation (and any references) should cover the more detailed stuff.
Assign reviewers in the page’s right hand sidebar. Tag anyone who might be interested in reading the code. You should already have found one or two people who are happy to read the whole request and sign it off (they could also be added to ‘Assignees’).
Create the pull request.
Contact the reviewers to let them know the request is live. GitHub’s settings mean that they may not be alerted automatically. Maybe also let people know on the WCR Slack!
Talk with your reviewers
Use the comment/chat functionality within GitHub’s pull requests - it’s useful to have an archive of discussions and the decisions made.
Take comments and suggestions on board, but you don’t need to agree with everything and you don’t need to implement everything.
If you feel someone is asking for too many changes, prioritise, especially if you don’t have time for complex rewrites.
If the suggested changes and/or features don’t block functionality and you don’t have time to fix them, they can be moved to Issues.
Chase people up if they’re slow. People are slow.
Once you implement the requested changes, respond to the comments with the corresponding commit implementing each requested change.
If the review takes a while, remember to merge develop back into the feature branch every now and again (and check the tests are still passing on Jenkins).
Anything pushed to the branch is added to the pull request.
Once everyone reviewing has said they’re satisfied with the code you can merge the pull request using the GitHub interface.
Delete the branch once it’s merged, there’s no reason to keep it. (Also try not to re-use that branch name later.)
Update the develop branch on your local machine.
Also see the Reviewer Guide and Reviewer Checklist!
General tips and tricks#
Ask for help with Git#
Git isn’t intuitive, and rewinding or resetting is always work. If you’re not certain what you’re doing, or if you think you’ve messed up, send someone a message.
Don’t push or commit to develop or main#
Almost all new additions to CLIMADA should be merged into the develop branch with a pull request.
You won’t merge into the main branch, except for emergency hotfixes (which should be communicated to the team).
You won’t merge into the develop branch without a pull request, except for small documentation updates and typos.
The above points mean you should never need to push the main or develop branches.
So if you find yourself on the main or develop branches typing git merge ... or git push stop and think again - you should probably be making a pull request.
This can be difficult to undo, so contact someone on the team if you’re unsure!
Commit more often than you think, and use informative commit messages#
Committing often makes mistakes less scary to undo
git reset --hard HEAD
Detailed commit messages make writing pull requests really easy
Yes it’s boring, but trust me, everyone (usually your future self) will love you when they’re rooting through the git history to try and understand why something was changed
Commit message syntax guidelines#
Basic syntax guidelines taken from here https://chris.beams.io/posts/git-commit/ (on 17.06.2020)
Limit the subject line to 50 characters
Capitalize the subject line
Do not end the subject line with a period
Use the imperative mood in the subject line (e.g. “Add new tests”)
Wrap the body at 72 characters (most editors will do this automatically)
Use the body to explain what and why vs. how
Separate the subject from the body with a blank line (this is easiest with a GUI or a text editor; on the command line, a second -m option, as in git commit -m "Subject" -m "Body", starts the body as a separate paragraph)
Put the name of the function/class/module/file that was edited
When fixing an issue, add the reference gh-ISSUENUMBER to the commit message e.g. “fixes gh-40.” or “Closes gh-40.” For more information see https://docs.github.com/en/enterprise/2.16/user/github/managing-your-work-on-github/closing-issues-using-keywords#about-issue-references.
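Several of these guidelines are mechanical enough to check automatically. The sketch below shows one way to do that in Python; check_commit_message is a hypothetical helper (it is not shipped with CLIMADA or Git), and it leaves out the rules that need human judgement, such as the imperative mood or explaining what and why.

```python
def check_commit_message(message: str) -> list[str]:
    """Return the guideline violations found in a commit message.

    Hypothetical helper for illustration; an empty list means that
    none of the mechanical checks failed.
    """
    lines = message.splitlines()
    subject = lines[0] if lines else ""
    if not subject.strip():
        return ["Subject line is empty"]

    problems = []
    if len(subject) > 50:
        problems.append("Subject line is longer than 50 characters")
    if not subject[0].isupper():
        problems.append("Subject line is not capitalized")
    if subject.endswith("."):
        problems.append("Subject line ends with a period")
    # The second line must be blank to separate subject and body
    if len(lines) > 1 and lines[1].strip():
        problems.append("No blank line between subject and body")
    for number, line in enumerate(lines[1:], start=2):
        if len(line) > 72:
            problems.append(f"Line {number} is longer than 72 characters")
    return problems


print(check_commit_message("added tests."))
```

A message like "Add meteorite hazard tests" followed by a blank line and a short body passes all of these checks.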
What not to commit#
There are a lot of things that don’t belong in the Git repository:
Don’t commit data, except for config files and very small files for tests.
Don’t commit anything containing passwords or authentication credentials or tokens. (These are annoying to remove from the Git history.) Contact the team if you need to manage authorisations within the code.
Don’t commit anything that can be created by the CLIMADA code itself
If files like this are going to be present for other users as well, add them to the repository’s .gitignore.
Jupyter Notebook metadata#
Git compares file versions by text tokens. Jupyter Notebooks typically contain a lot of metadata, along with binary data like image files. Simply re-running a notebook can change this metadata, which will be reported as file changes by Git. This causes excessive Diff reports that cannot be reviewed conveniently.
To avoid committing changes of unrelated metadata, open Jupyter Notebooks in a text editor instead of your browser renderer. When committing changes, make sure that you indeed only commit things you did change, and revert any changes to metadata that are not related to your code updates.
Several code editors use plugins to render Jupyter Notebooks. Here we collect the instructions to inspect Jupyter Notebooks as plain text when using them:
VSCode: Open the Jupyter Notebook. Then open the internal command prompt (Ctrl + Shift + P, or Cmd + Shift + P on macOS) and type/select ‘View: Reopen Editor with Text Editor’
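If you’re unsure whether a notebook really changed or was only re-run, it can help to compare just the cell sources. Notebooks are JSON files, so a small Python sketch using only the standard library is enough. notebook_sources is a hypothetical helper, not a CLIMADA tool, and it assumes the usual notebook layout with a top-level "cells" list.

```python
import json


def notebook_sources(notebook_json: str) -> list[tuple[str, str]]:
    """Return (cell_type, source) pairs, ignoring outputs and metadata.

    Hypothetical helper for illustration only.
    """
    notebook = json.loads(notebook_json)
    cells = []
    for cell in notebook.get("cells", []):
        source = cell.get("source", "")
        # The source may be stored as one string or as a list of lines
        if isinstance(source, list):
            source = "".join(source)
        cells.append((cell.get("cell_type", ""), source))
    return cells
```

If notebook_sources gives the same result for the old and the new version of a file, the differences Git reports are limited to outputs, execution counts and metadata, and are candidates for reverting before you commit.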
Log ideas and bugs as GitHub Issues#
If there’s a change you might want to see in the code - something that generalises, something that’s not quite right, or a cool new feature - it can be set up as a GitHub Issue. Issues are pages for conversations about changes to the codebase and for logging bugs, and act as a ‘backlog’ for the CLIMADA project.
For a bug, or a question about functionality, make a minimal working example, state which version of CLIMADA you are using, and post it with the Issue.
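One way to collect the version information for an Issue is to ask Python’s package metadata. This is a sketch under an assumption: that CLIMADA is installed under the distribution name climada. environment_report is a made-up helper name.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def environment_report(package: str = "climada") -> str:
    """Describe the installed version of a package and of Python.

    The default distribution name "climada" is an assumption.
    """
    try:
        found = version(package)
    except PackageNotFoundError:
        found = "not installed"
    return f"{package} {found}, Python {platform.python_version()}"


print(environment_report())
```

Paste the printed line at the top of the Issue, followed by your minimal working example.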
How not to mess up the timeline#
Git builds the repository through incremental edits. This means it’s great at keeping track of its history. But there are a few commands that edit this history, and if histories get out of sync on different copies of the repository you’re going to have a bad time.
Don’t rebase any commits that already exist remotely!
Don’t --force anything that exists remotely unless you know what you’re doing!
Otherwise, you’re unlikely to do anything irreversible
You can do what you like with commits that only exist on your machine.
That said, doing an interactive rebase to tidy up your commit history before you push it to GitHub is a nice friendly gesture :)
Do not fast forward merges#
(This shouldn’t be relevant - all your merges into develop should be through pull requests, which don’t fast forward. But:)
Don’t fast forward your merges unless your branch is a single commit. Use
git merge --no-ff ...
The exception is when you’re merging develop into your feature branch.
Merge the remote develop branch into your feature branch every now and again#
This way you’ll find conflicts early
git checkout develop
git pull
git checkout feature/myfeature
git merge develop
Create frequent pull requests#
I said this already:
It structures your workflow
It’s easier for reviewers
If you’re going to break something for other people you all know sooner
It saves work for the rest of the team right before a release
Whenever you do something with CLIMADA, make a new local branch#
You never know when a quick experiment will become something you want to save for later.
But do not do everything in the CLIMADA repository#
If you’re running CLIMADA rather than developing it, create a new folder, initialise a new repository with git init and store your scripts and data there
If you’re writing an extension to CLIMADA that doesn’t change the model core, create a new folder, initialise a new repository with git init and import CLIMADA. You can always add it to the model later if you need to.