Development and Git and CLIMADA

Chris Fairless

Table of Contents

  • 1  Development and Git and CLIMADA

    • 1.1  Introduction

    • 1.2  Git and GitHub

    • 1.3  Gitflow

    • 1.4  Installing CLIMADA for development

    • 1.5  Features and branches

    • 1.6  Pull Requests

    • 1.7  General tips and tricks

Introduction

Git and GitHub

  • Git’s not that scary

    • 95% of your work on Git will be done with the same handful of commands

    • (the other 5% will always be done with careful Googling)

    • Almost everything in Git can be undone by design (but use rebase, --force and --hard with care!)

    • Your favourite IDE (Spyder, PyCharm, …) will have a GUI for working with Git, or you can download a standalone one.

  • The Git Book is a great introduction to how Git works and to using it on the command line.

  • Consider using a GUI program such as “GitHub Desktop” or “GitKraken” to have a visual Git interface, particularly at the beginning. Your Python IDE is also likely to have a visual Git interface.

  • Feel free to ask for help

What I assume you know

I’m assuming you’re all familiar with the basics of Git.

  • What (and why) is version control

  • How to clone a repository

  • How to make a commit and push it to GitHub

  • What a branch is, and how to make one

  • How to merge two branches

  • The basics of the GitHub website

If you’re not feeling great about this, I recommend

  • sending me a message so we can arrange an introduction with CLIMADA

  • exploring the Git Book

Terms we’ll be using today

These are terms I’ll be using a lot today, so let’s make sure we know them

  • local versus remote

    • Our remote repository is hosted on GitHub. This is the central location where all updates to CLIMADA that we want to share end up. If you’re updating CLIMADA for the community, your code will end up here too.

    • Your local repository is the copy you have on the machine you’re working on, and where you do your work.

    • Git calls the (first, default) remote the origin

    • (It’s possible to set more than one remote repository, e.g. you might set one up on a network-restricted computing cluster)

  • push, pull and pull request

    • You push your work when you send it from your local machine to the remote repository

    • You pull from the remote repository to update the code on your local machine

    • A pull request is a standardised review process on GitHub. Usually it ends with one branch merging into another

  • Conflict resolution

    • Sometimes two people have made changes to the same bit of code. Usually this comes up when you’re trying to merge branches. The changes have to be manually compared and the code edited to make sure the ‘correct’ version of the code is kept.
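To make the conflict case concrete, here is a minimal sketch in a throwaway repository. The file name, branch names and contents are made up for illustration and have nothing to do with the CLIMADA code base.

```shell
set -eu
# Throwaway repository; every name below is illustrative.
cd "$(mktemp -d)"
git init -q conflict-demo
cd conflict-demo
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "radius = 1" > params.txt
git add params.txt
git commit -q -m "Add params"
git branch -M main

# Two branches change the same line in different ways.
git checkout -q -b feature/a
echo "radius = 2" > params.txt
git commit -q -am "Set radius to 2"

git checkout -q main
echo "radius = 3" > params.txt
git commit -q -am "Set radius to 3"

# The merge stops with a non-zero exit status and leaves conflict markers.
if git merge feature/a > /dev/null 2>&1; then
    merge_status="clean"
else
    merge_status="conflict"
fi
echo "merge result: $merge_status"
cat params.txt   # shows the <<<<<<< / ======= / >>>>>>> markers

# Resolve by editing the file to the version you want, then commit.
echo "radius = 2" > params.txt
git add params.txt
git commit -q -m "Merge feature/a, keep radius = 2"
```

In real life the "edit the file" step is where you compare both versions (your IDE's merge tool helps) rather than simply overwriting one of them.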

Gitflow

Gitflow is a particular way of using Git to organise projects that have

  • multiple developers

  • working on different features

  • with a release cycle

It means that

  • there’s always a stable version of the code available to the public

  • the chances of two developers’ code conflicting are reduced

  • the process of adding and reviewing features and fixes is more standardised for everyone

Gitflow is a convention, so you don’t need any additional software.

  • … but if you want you can get some: a popular extension to the git command line tool allows you to issue more intuitive commands for a Gitflow workflow.

  • Mac/Linux users can install git-flow from their package manager, and it’s included with Git for Windows.

Gitflow works on the develop branch instead of main

  • The critical difference between Gitflow and ‘standard’ git is that almost all of your work takes place on the develop branch, instead of the main (formerly master) branch.

  • The main branch is reserved for planned, stable product releases, and it’s what the general public download when they install CLIMADA. The developers almost never interact with it.

Gitflow is a feature-based workflow

  • This is common to many workflows: when you want to add something new to the model you start a new branch, work on it locally, and then merge it back into develop with a pull request (which we’ll cover later).

  • By convention we name all CLIMADA feature branches feature/* (e.g. feature/meteorite).

  • Features can be anything, from entire hazard modules to a smarter way to do one line of a calculation. Most of the work you’ll do on CLIMADA will be features of one size or another.

  • We’ll talk more about developing CLIMADA features later!

Gitflow enables a regular release cycle

  • A release is usually more complex than merging develop into main.

  • So for this a release-* branch is created from develop. Everyone will be notified (repeatedly) of the deadlines for submitting, and then reviewing, pull requests so that your work can be included in a release.

  • The core developer team (mostly Emanuel) will then make sure tests, bugfixes, documentation and compatibility requirements are met, merging any fixes back into develop.

  • On release day, the release branch is merged into main, the commit is tagged as a release and the release notes are published on GitHub at https://github.com/CLIMADA-project/climada_python/releases
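The release steps above can be sketched in a throwaway repository. This is only an illustration of the Gitflow pattern: the version number, the tag name v1.0.0 and the file are assumptions, and the real CLIMADA release involves more checks than shown here.

```shell
set -eu
# Throwaway repository; version number and file names are illustrative.
cd "$(mktemp -d)"
git init -q release-demo
cd release-demo
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "stable" > model.txt
git add model.txt
git commit -q -m "Initial stable version"
git branch -M main

# Day-to-day work accumulates on develop.
git checkout -q -b develop
echo "new feature" >> model.txt
git commit -q -am "Add new feature"

# A release branch is cut from develop; fixes found during testing go here.
git checkout -q -b release-1.0.0
echo "release fix" >> model.txt
git commit -q -am "Fix bug found during release testing"

# Release day: merge into main and tag the merge commit.
git checkout -q main
git merge -q --no-ff -m "Release 1.0.0" release-1.0.0
git tag -a v1.0.0 -m "Release 1.0.0"

# The release fixes also flow back into develop.
git checkout -q develop
git merge -q --no-ff -m "Merge release-1.0.0 back into develop" release-1.0.0

git --no-pager log --oneline --graph --all
```

The annotated tag (-a) records who tagged the release and when, which is what GitHub's releases page builds on.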

Everything else is hotfixes

  • The other type of branch you’ll create is a hotfix.

  • Hotfixes are generally small changes to code that do one thing: fixing typos or small bugs, or updating docstrings. They’re done in much the same way as features, and are usually merged with a pull request.

  • The difference between features and hotfixes is fuzzy and you don’t need to worry about getting it right.

  • Hotfixes will occasionally be used to fix bugs on the main branch, in which case they will merge into both main and develop.

  • Some hotfixes are so simple - e.g. fixing a typo or a docstring - that they don’t need a pull request. Use your judgement, but as a rule, if you change what the code does, or how, you should be merging with a pull request.
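For the rarer case of a hotfix that has to land on both main and develop, a rough sketch looks like this. The branch name hotfix/typo is an assumption (the naming convention for hotfix branches isn't spelled out here), and on GitHub each merge would normally be a pull request.

```shell
set -eu
# Throwaway repository; branch and file names are illustrative.
cd "$(mktemp -d)"
git init -q hotfix-demo
cd hotfix-demo
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "print('helo')" > hello.py
git add hello.py
git commit -q -m "Initial release"
git branch -M main
git branch develop

# develop moves on with unrelated work.
git checkout -q develop
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Add feature"

# The hotfix is branched from main, where the bug was released.
git checkout -q -b hotfix/typo main
echo "print('hello')" > hello.py
git commit -q -am "Fix typo in greeting"

# Merge the fix into main and into develop so it isn't lost at the next release.
git checkout -q main
git merge -q --no-ff -m "Merge hotfix/typo into main" hotfix/typo
git checkout -q develop
git merge -q --no-ff -m "Merge hotfix/typo into develop" hotfix/typo
```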

Installing CLIMADA for development

  1. Install Git and Anaconda (or Miniconda).

    Also consider installing Git flow. This is included with Git for Windows and has different implementations e.g. here for Windows and Mac.

  2. Clone (or fork) the project on GitHub

    From the location where you want to create the project folder, run in your terminal:

    ::

        git clone https://github.com/CLIMADA-project/climada_python.git
  3. Install the packages in climada_python/requirements/env_climada.yml and climada_python/requirements/env_developer.yml (see install). You might need to install additional environments contained in climada_python/requirements when using specific functionalities.

Features and branches

Planning a new feature

Here we’re talking about large features such as new modules, new data sources, or big methodological changes: any extension to CLIMADA that might affect other developers’ work, modify the CLIMADA core, or need a big code review.

Smaller feature branches don’t need such formalities. Use your judgment, and if in doubt, let people know.

Talk to the group

  • Before starting coding a module, do not forget to coordinate with one of the repo admins (Emanuel, Chahan or David)

  • This is the chance to work out the Big Picture stuff that is better when it’s planned with the group - possible intersections with other projects, possible conflicts, changes to the CLIMADA core, additional dependencies (see Chahan’s presentation later)

  • Also talk with others from the core development team (see the GitHub wiki).

  • Bring it to a developers meeting - people may be able to help/advise and are always interested in hearing about new projects. You can also find reviewers!

  • Also, keep talking! Your plans will change :)

Planning the work

  • Does the project go in its own repository and import CLIMADA, or does it extend the main CLIMADA repository?

    • The way this is done is slowly changing, so definitely discuss it with the group.

    • Chahan will discuss this later!

  • Find a few people who will help to review your code.

    • Ask in a developers’ meeting, on Slack (for WCR developers) or message people on the development team (see the GitHub wiki).

    • Let them know roughly how much code will be in the reviews, and when you’ll be creating pull requests.

  • How can the work be split into manageable chunks?

    • A series of smaller pull requests is far more manageable than one big one (and takes off some of the pre-release pressure)

    • Reviewing and spotting issues/improvements/generalisations early is always a good thing.

    • It encourages modularisation of the code: smaller self-contained updates, with documentation and tests.

  • Will there be any changes to the CLIMADA core?

    • These should be planned carefully

  • Will you need any new dependencies? Are you sure?

    • Chahan will discuss this later!

Working on feature branches

When developing a big new feature, consider creating a feature branch and merging smaller branches into that feature branch with pull requests, keeping the whole process separate from develop until it’s completed. This makes step-by-step code review nice and easy, and makes the final merge more easily tracked in the history.

e.g. developing the big feature/meteorite module you might write feature/meteorite-hazard and merge it in, then feature/meteorite-impact, then feature/meteorite-stochastic-events etc… before finally merging feature/meteorite into develop. Each of these could be a reviewable pull request.
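As a sketch of how that layering plays out, the following runs the meteorite example in a throwaway repository. The local merges stand in for pull requests on GitHub, and the file names are invented.

```shell
set -eu
# Throwaway repository; the local merges stand in for GitHub pull requests.
cd "$(mktemp -d)"
git init -q nested-demo
cd nested-demo
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "core" > core.txt
git add core.txt
git commit -q -m "Initial commit"
git branch -M develop

# Long-lived branch that collects the whole feature.
git checkout -q -b feature/meteorite

# First reviewable chunk, branched from the feature branch.
git checkout -q -b feature/meteorite-hazard
echo "hazard" > hazard.txt
git add hazard.txt
git commit -q -m "Add meteorite hazard"
git checkout -q feature/meteorite
git merge -q --no-ff -m "Merge meteorite hazard" feature/meteorite-hazard

# Second reviewable chunk.
git checkout -q -b feature/meteorite-impact
echo "impact" > impact.txt
git add impact.txt
git commit -q -m "Add meteorite impact"
git checkout -q feature/meteorite
git merge -q --no-ff -m "Merge meteorite impact" feature/meteorite-impact

# Only now does the complete feature go into develop.
git checkout -q develop
git merge -q --no-ff -m "Merge feature/meteorite" feature/meteorite
git --no-pager log --oneline --graph
```

The graph printed at the end shows each chunk as its own merge, which is what makes the history easy to follow later.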

Make a new branch

For new features in Git flow:

git flow feature start feature_name

Which is equivalent to (in vanilla git):

git checkout -b feature/feature_name develop

Or work on an existing branch:

git checkout branch_name

Follow the Python do’s and don’ts and performance guides. Write small readable methods, classes and functions.

get the latest data from the remote repository and update your branch

git pull

see your locally modified files

git status

add changes you want to include in the commit

git add climada/modified_file.py climada/test/test_modified_file.py

commit the changes

git commit -m "new functionality of .. implemented"

Write unit and integration tests for your code, preferably during development

see Guide on unit and integration tests

Pull requests

We want every line of code that goes into the CLIMADA repository to be reviewed!

Code review:

  • catches bugs (there are always bugs)

  • lets you draw on the experience of the rest of the team

  • makes sure that more than one person knows how your code works

  • helps to unify and standardise CLIMADA’s code, so new users find it easier to read and navigate

  • creates an archived description and discussion of the changes you’ve made

When to make a pull request

  • When you’ve finished writing a big new class or method (and its tests)

  • When you’ve fixed a bug or made an improvement you want to merge

  • When you want to merge a change of code into develop or main

  • When you want to discuss a bit of code you’ve been working on - pull requests aren’t only for merging branches

Not all pull requests have to be into develop - you can make a pull request into any active branch that suits you.

Pull requests need to be made no later than two weeks before a release; see releases.

Step by step pull request!

Let’s suppose you’ve developed a cool new module on the feature/meteorite branch and you’re ready to merge it into develop.

Checklist before you start

  • Documentation

  • Tests

  • Tutorial (if a complete new feature)

  • Updated dependencies (if need be)

  • Added your name to the AUTHORS file

  • (Advanced, optional) interactively rebase/squash recent commits that aren’t yet on GitHub.

Step by step pull request!

  1. Make sure the develop branch is up to date on your own machine

git checkout develop
git pull

  2. Merge develop into your feature branch and resolve any conflicts

git checkout feature/meteorite
git merge develop

In the case of more complex conflicts, you may want to speak with others who worked on the same code. Your IDE should have a tool for conflict resolution.

  3. Check all the tests pass locally

make unit_test
make integ_test

  4. Perform a static code analysis using pylint with CLIMADA’s configuration .pylintrc (in the climada root directory). Jenkins executes it after every push. To do it locally, your IDE probably provides a tool, or you can run make lint and see the output in pylint.log.

  5. Push to GitHub. If you’re pushing this branch for the first time, use

git push -u origin feature/meteorite

and if you’re updating a branch that’s already on GitHub:

git push

  6. Check all the tests pass on the WCR Jenkins server (https://ied-wcr-jenkins.ethz.ch). See Emanuel’s presentation for how to do this! You should regularly be pushing your code and checking this!

  7. Create the pull request!

  • On the CLIMADA GitHub page, navigate to your feature branch (there’s a drop-down menu above the file structure, pointing by default to main).

  • Above the file structure is a branch summary and an icon to the right labelled “Pull request”.

  • Choose which branch you want to merge with. This will usually be develop, but may be another feature branch for more complex feature development.

  • Give your pull request an informative title (like a commit message).

  • Write a description of the pull request. This can usually be adapted from your branch’s commit messages (you wrote informative commit messages, didn’t you?), and should give a high-level summary of the changes, specific points you want the reviewers’ input on, and explanations for decisions you’ve made. The code documentation (and any references) should cover the more detailed stuff.

  • Assign reviewers in the page’s right hand sidebar. Tag anyone who might be interested in reading the code. You should already have found one or two people who are happy to read the whole request and sign it off (they could also be added to ‘Assignees’).

  • Create the pull request.

  • Contact the reviewers to let them know the request is live. GitHub’s settings mean that they may not be alerted automatically. Maybe also let people know on the WCR Slack!

  8. Talk with your reviewers

  • Use the comment/chat functionality within GitHub’s pull requests - it’s useful to have an archive of discussions and the decisions made.

  • Take comments and suggestions on board, but you don’t need to agree with everything and you don’t need to implement everything.

  • If you feel someone is asking for too many changes, prioritise, especially if you don’t have time for complex rewrites.

  • If the suggested changes and/or features don’t block functionality and you don’t have time to fix them, they can be moved to Issues.

  • Chase people up if they’re slow. People are slow.

  9. Once you implement the requested changes, respond to the comments with the corresponding commit implementing each requested change.

  10. If the review takes a while, remember to merge develop back into the feature branch every now and again (and check the tests are still passing on Jenkins). Anything pushed to the branch is added to the pull request.

  11. Once everyone reviewing has said they’re satisfied with the code you can merge the pull request using the GitHub interface. Delete the branch once it’s merged, there’s no reason to keep it. (Also try not to re-use that branch name later.)

  12. Update the develop branch on your local machine.
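The last two steps (delete the merged branch, update your local develop) can be sketched end to end. Here a local bare repository stands in for GitHub and a second clone stands in for pressing the merge and delete buttons on the website; all paths and names are illustrative.

```shell
set -eu
root=$(mktemp -d)
cd "$root"

# A bare repository stands in for GitHub; all names are illustrative.
git init -q --bare origin.git
git init -q work
cd work
git config user.name "Demo User"
git config user.email "demo@example.com"
git remote add origin "$root/origin.git"

echo "core" > core.txt
git add core.txt
git commit -q -m "Initial commit"
git branch -M develop
git push -q -u origin develop

git checkout -q -b feature/meteorite
echo "meteorite" > meteorite.txt
git add meteorite.txt
git commit -q -m "Add meteorite module"
git push -q -u origin feature/meteorite

# Stand-in for "Merge pull request" and "Delete branch" on GitHub.
git clone -q -b develop "$root/origin.git" "$root/github-side"
cd "$root/github-side"
git config user.name "Demo Reviewer"
git config user.email "reviewer@example.com"
git merge -q --no-ff -m "Merge pull request: feature/meteorite" origin/feature/meteorite
git push -q origin develop
git push -q origin --delete feature/meteorite

# Back on your machine: update develop, then tidy up.
cd "$root/work"
git checkout -q develop
git pull -q
git branch -q -d feature/meteorite   # -d refuses to delete unmerged work
git fetch -q --prune                 # forget the deleted remote branch
git branch --all
```

Using -d rather than -D is a cheap safety net: Git will complain if the branch contains commits that never made it into the merge.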

How to review a pull request

  • Be friendly

  • Decide how much time you can spare and the detail you can work in. Tell the author!

  • Use the comment/chat functionality within GitHub’s pull requests - it’s useful to have an archive of discussions and the decisions made.

  • Fix the big things first! If there are more important issues, not every style guide has to be stuck to, not every slight increase in speed needs to be pointed out, and test coverage doesn’t have to be 100%.

  • Make it clear when a change is optional, or is a matter of opinion

At a minimum

  • Make sure unit and integration tests are passing on Jenkins

  • (For complete modules) Run the tutorial on your local machine and check it does what it says it does

  • Check everything is fully documented

At least one reviewer needs to

  • Review all the changes in the pull request. Read what it’s supposed to do, check it does that, and make sure the logic is sound.

  • Check that the code follows the CLIMADA style guidelines

  • If the code is implementing an algorithm it should be referenced in the documentation. Check it’s implemented correctly.

  • Try to think of edge cases and ways the code could break. See if there’s appropriate error handling in cases where the function might behave unexpectedly.

  • (Optional) suggest easy ways to speed up the code, and more elegant ways to achieve the same goal.

There are a few ways to suggest changes

  • As questions and comments on the pull request page

  • As code suggestions (max a few lines) in the code review tools on GitHub. The author can then approve and commit the changes from the GitHub pull request page. This is great for typos and little stylistic changes.

  • If you decide to help the author with changes, you can either push them to the same branch, or create a new branch and make a pull request with the changes back into the branch you’re reviewing. This lets the author review it and merge.

General tips and tricks

Ask for help with Git

  • Git isn’t intuitive, and rewinding or resetting is always work. If you’re not certain what you’re doing, or if you think you’ve messed up, send someone a message.

Don’t push or commit to develop or main

  • Almost all new additions to CLIMADA should be merged into the develop branch with a pull request.

  • You won’t merge into the main branch, except for emergency hotfixes (which should be communicated to the team).

  • You won’t merge into the develop branch without a pull request, except for small documentation updates and typos.

  • The above points mean you should never need to push the main or develop branches.

So if you find yourself on the main or develop branches typing git merge ... or git push, stop and think again - you should probably be making a pull request.

This can be difficult to undo, so contact someone on the team if you’re unsure!

Commit more often than you think, and use informative commit messages

  • Committing often makes mistakes less scary to undo

git reset --hard HEAD

  • Detailed commit messages make writing pull requests really easy

  • Yes it’s boring, but trust me, everyone (usually your future self) will love you when they’re rooting through the git history to try and understand why something was changed
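A small sketch of why frequent commits make git reset --hard HEAD less scary: the command throws away uncommitted changes to tracked files and returns you to the last commit, so the more recent that commit is, the less you lose. The file name and contents are invented.

```shell
set -eu
# Throwaway repository; file name and contents are illustrative.
cd "$(mktemp -d)"
git init -q reset-demo
cd reset-demo
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "working version" > script.py
git add script.py
git commit -q -m "Add working version"

# An experiment goes wrong...
echo "broken experiment" > script.py
git status --short   # lists script.py as modified

# ...so discard every uncommitted change to tracked files.
git reset -q --hard HEAD
cat script.py        # back to the committed contents
```

Be aware that the discarded edits are gone for good: they were never committed, so Git has no copy of them. Untracked files are left alone.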

Commit message syntax guidelines

Basic syntax guidelines taken from here https://chris.beams.io/posts/git-commit/ (on 17.06.2020)

  • Limit the subject line to 50 characters

  • Capitalize the subject line

  • Do not end the subject line with a period

  • Use the imperative mood in the subject line (e.g. “Add new tests”)

  • Wrap the body at 72 characters (most editors will do this automatically)

  • Use the body to explain what and why vs. how

  • Separate the subject from body with a blank line (This is easiest with a GUI or a text editor: run git commit without -m and Git opens your editor. On the command line you can also pass -m twice; the second message becomes the body, separated from the subject by a blank line)

  • Put the name of the function/class/module/file that was edited

  • When fixing an issue, add the reference gh-ISSUENUMBER to the commit message e.g. “fixes gh-40.” or “Closes gh-40.” For more information see https://docs.github.com/en/enterprise/2.16/user/github/managing-your-work-on-github/closing-issues-using-keywords#about-issue-references.
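Here is one way a message following these guidelines might be written from the command line. Git treats each -m as a separate paragraph, so the first becomes the subject and the second the body. The function name, file and issue number are made up for the example.

```shell
set -eu
# Throwaway repository; function name, file and issue number are illustrative.
cd "$(mktemp -d)"
git init -q message-demo
cd message-demo
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "def set_radius(): pass" > meteorite.py
git add meteorite.py

# First -m: short, capitalised, imperative subject without a trailing period.
# Second -m: body explaining what and why, placed after a blank line.
git commit -q \
    -m "Add set_radius to meteorite module" \
    -m "Impact area needs a configurable radius. Closes gh-40."

git --no-pager log -1 --format="subject: %s%nbody:    %b"
```

For longer bodies it is usually more comfortable to run git commit without -m and write the message in your editor, which also makes wrapping at 72 characters easier.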

What not to commit

There are a lot of things that don’t belong in the Git repository:

  • Don’t commit data, except for config files and very small files for tests.

  • Don’t commit anything containing passwords or authentication credentials or tokens. (These are annoying to remove from the Git history.) Contact the team if you need to manage authorisations within the code.

  • Don’t commit anything that can be created by the CLIMADA code itself

If files like this are going to be present for other users as well, add them to the repository’s .gitignore.
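As a rough illustration of how .gitignore keeps such files out of commits, the sketch below uses invented patterns and file names; they are not CLIMADA’s actual .gitignore entries.

```shell
set -eu
# Throwaway repository; patterns and file names are illustrative only.
cd "$(mktemp -d)"
git init -q ignore-demo
cd ignore-demo

cat > .gitignore <<'EOF'
# large data and generated results
data/
results/*.csv
# local credentials
.env
EOF

mkdir -p data results
touch data/tracks.nc results/impact.csv .env analysis.py

# check-ignore reports which pattern matches each ignored path.
git check-ignore -v data/tracks.nc results/impact.csv .env

# Only the files Git would still consider for a commit are listed here.
git status --short
```

Note that .gitignore only affects untracked files: something that has already been committed stays in the history even if you add a matching pattern afterwards.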

Log ideas and bugs as GitHub Issues

If there’s a change you might want to see in the code - something that generalises, something that’s not quite right, or a cool new feature - it can be set up as a GitHub Issue. Issues are pages for conversations about changes to the codebase and for logging bugs, and act as a ‘backlog’ for the CLIMADA project.

For a bug, or a question about functionality, make a minimal working example, state which version of CLIMADA you are using, and post it with the Issue.

How not to mess up the timeline

Git builds the repository through incremental edits. This means it’s great at keeping track of its history. But there are a few commands that edit this history, and if histories get out of sync on different copies of the repository you’re going to have a bad time.

  • Don’t rebase any commits that already exist remotely!

  • Don’t --force anything that exists remotely unless you know what you’re doing!

  • Otherwise, you’re unlikely to do anything irreversible

  • You can do what you like with commits that only exist on your machine.

That said, doing an interactive rebase to tidy up your commit history before you push it to GitHub is a nice friendly gesture :)
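Before rewriting anything, it helps to check which commits exist only on your machine. A minimal sketch, using a local bare repository as a stand-in for GitHub (all names are illustrative): the range @{u}..HEAD lists commits on your branch that its upstream does not have yet.

```shell
set -eu
root=$(mktemp -d)
cd "$root"

# A bare repository stands in for GitHub; all names are illustrative.
git init -q --bare origin.git
git init -q work
cd work
git config user.name "Demo User"
git config user.email "demo@example.com"
git remote add origin "$root/origin.git"

echo "v1" > notes.txt
git add notes.txt
git commit -q -m "Add notes"
git checkout -q -b feature/meteorite
git push -q -u origin feature/meteorite

# One more commit that has not been pushed yet.
echo "v2" >> notes.txt
git commit -q -am "Extend notes"

# Commits on this branch that the remote does not have:
unpushed=$(git log --oneline '@{u}..HEAD')
echo "$unpushed"
```

Anything in that list is fair game for tidying up; anything not in it already exists remotely and should be left alone.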

Don’t fast forward merges

(This shouldn’t be relevant - all your merges into develop should be through pull requests, which don’t fast forward. But:)

Don’t fast forward your merges unless your branch is a single commit. Use git merge --no-ff ...

The exception is when you’re merging develop into your feature branch.
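To see what --no-ff buys you, here is a small sketch in a throwaway repository (names are illustrative). Because develop hasn’t moved since the branch was created, a plain git merge would simply fast forward and the history would show no sign that a feature branch ever existed.

```shell
set -eu
# Throwaway repository; branch and file names are illustrative.
cd "$(mktemp -d)"
git init -q noff-demo
cd noff-demo
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "core" > core.txt
git add core.txt
git commit -q -m "Initial commit"
git branch -M develop

git checkout -q -b feature/meteorite
echo "hazard" > hazard.txt
git add hazard.txt
git commit -q -m "Add hazard"
echo "impact" > impact.txt
git add impact.txt
git commit -q -m "Add impact"

# --no-ff forces a merge commit even though a fast forward is possible,
# so the two feature commits stay grouped in the history.
git checkout -q develop
git merge -q --no-ff -m "Merge feature/meteorite into develop" feature/meteorite
git --no-pager log --oneline --graph
```

The merge commit also gives you a single commit to revert if the whole feature ever needs to be backed out.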

Merge the remote develop branch into your feature branch every now and again

  • This way you’ll find conflicts early

git checkout develop
git pull
git checkout feature/myfeature
git merge develop

Create frequent pull requests

I said this already:

  • It structures your workflow

  • It’s easier for reviewers

  • If you’re going to break something for other people you all know sooner

  • It saves work for the rest of the team right before a release

Whenever you do something with CLIMADA, make a new local branch

You never know when a quick experiment will become something you want to save for later.

But don’t do everything in the CLIMADA repository

  • If you’re running CLIMADA rather than developing it, create a new folder, initialise a new repository with git init and store your scripts and data there

  • If you’re writing an extension to CLIMADA that doesn’t change the model core, create a new folder, initialise a new repository with git init and import CLIMADA. You can always add it to the model later if you need to.

Questions

Git and Github logos https://xkcd.com/1597/