Intern:Git for Scripts

Motivation
In order to better organise the common scripts used by curators (and possibly also by non-curators), we decided (meeting on 17th January 2024, see the related ticket) to manage all common scripts in a single git repository. This requires some discipline, i.e. common workflows, to coordinate the work of different people on a common code base.

Use Cases
The common repository will contain small tools, curation scripts or code snippets that can be used to facilitate the data curation process. In the following, we will refer to such self-contained pieces of software as tools.

Regarding the way in which users will interact with the common repository, we have identified the following use cases:


 * 1) Installation: A user wants to find, install and run a given tool.
 * 2) Modify: A user wants to modify a tool and share the changes.
 * 3) Sharing: A user wants to share a given tool with other users.

Training
The common workflows will require some basic knowledge of git. Users can acquire this knowledge in internal trainings or question-and-answer meetings that can be organised according to need.

Implementation
The common tools will be managed in the gitlab repository data_processing. Different versions of the tools will be organised using git branches (see the official git book).

The repository has a main branch called master, which contains all the official versions of the tools. Users that only need to run tools from the repository will only see this branch.

Other branches are temporary: they are used to integrate (merge) the local work of a user into the official master branch.

Here is a sketch of how the use cases described in the previous section are mapped to workflows. Most of the workflows are executed in a shell. Under Linux or macOS you can use your standard shell. Under Windows, you can open the Git Bash shell that is installed along with Git.

In all the examples below, we assume that you will install / have installed the tools under.

Use case: Installation / update
Installing the tools amounts to making a local clone of the gitlab repository. In a shell, type:

$ cd /d/pangaea
$ git clone https://gitlab.awi.de/pangaea/data_processing
$ cd data_processing

After the tools have been installed locally, a newer version may become available on the gitlab repository. You can update your local installation with:

$ cd /d/pangaea/data_processing
$ git checkout master
$ git pull origin master

Important note: If you have made changes locally, you have to follow use case Modify, as explained below.
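
The check behind this note can be sketched as a small shell function. This is only an illustration: the function name update_tools is hypothetical and not part of the repository, and the --ff-only option is an addition to the commands above, so that git stops instead of creating a merge commit if your local master has diverged from the server.

```shell
# Hypothetical helper, not part of data_processing: update the local master
# branch only if the working tree contains no uncommitted changes.
update_tools() {
    repo_dir="$1"

    # 'git status --porcelain' prints one line per modified or untracked
    # file and nothing at all if the working tree is clean.
    if [ -n "$(git -C "$repo_dir" status --porcelain)" ]; then
        echo "Local changes in $repo_dir: follow use case Modify instead." >&2
        return 1
    fi

    git -C "$repo_dir" checkout master &&
        git -C "$repo_dir" pull --ff-only origin master
}
```

With the installation path used in the examples, you could call it as: update_tools /d/pangaea/data_processing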

Use case: Modify an existing tool
Precondition: You have already cloned the repository on your local computer.

Before you make any changes, make sure that your local master branch is up-to-date with its counterpart on the gitlab server:

$ cd /d/pangaea/data_processing
$ git checkout master
$ git pull origin master

Avoid making changes while you are on the master branch:
 * 1) You will not be able to push these changes to the gitlab server directly, since the master branch is protected on that server.
 * 2) If some other user adds changes to the master branch, you may have conflicts the next time you try to update your local copy.
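
If you have already edited files while on master but have not committed anything yet, the edits are not lost: git keeps uncommitted changes in the working directory when you switch to a new branch that starts at the same commit. A minimal sketch, assuming that nothing has been committed on master; the function name is hypothetical:

```shell
# Hypothetical helper: carry uncommitted edits from master over to a new
# working branch. Run it inside the repository directory.
move_edits_to_branch() {
    branch="$1"

    if [ "$(git rev-parse --abbrev-ref HEAD)" != "master" ]; then
        echo "You are not on master; nothing to do." >&2
        return 1
    fi

    # The new branch starts at the same commit as master, so the checkout
    # leaves the modified files in the working directory untouched.
    git branch "$branch" master &&
        git checkout "$branch"
}
```

For example: move_edits_to_branch my_branch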

Instead, make a local branch and switch to it:

$ git branch my_branch master
$ git checkout my_branch

Initially, your working branch is just a copy of master and is checked out locally. You can start making changes to your code until you have a version you are satisfied with.

You can commit your changes (create a new version containing your changes) with the following commands:

$ git add <file1>
$ git add <file2>
$ git add <file3>
$ ...
$ git commit -m 'My commit message.'

Now you have a new version in your local branch.
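
Before sharing, it can be useful to check what your branch contains that master does not. A small sketch using standard git commands; the function names are hypothetical:

```shell
# Hypothetical helper: count the commits that are on the given branch
# (default: the branch currently checked out) but not on master.
commits_ahead_of_master() {
    git rev-list --count "master..${1:-HEAD}"
}

# Hypothetical helper: list the same commits, one line each.
list_commits_ahead_of_master() {
    git log --oneline "master..${1:-HEAD}"
}
```

For example, list_commits_ahead_of_master my_branch shows the commits that a merge request for my_branch would contain, provided your local master is up-to-date.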

If you want to share the changes, you must merge them into the master branch. First, you have to push the changes to the gitlab repository:

$ git push origin my_branch

You then have to open gitlab and create a merge request for your branch. Please note the following points:
 * In the merge request, make sure that you specify the correct branches so that the request merges from your working branch (my_branch in the examples) into master.
 * You can specify a reviewer for your merge request if you think that another user should have a look at your changes before they are merged into the main branch.
 * You must assign the merge request to a user with maintainer role. Currently, the users  and   can be specified as assignee.

The assignee will have to finalise (merge) the request once the reviewer has approved it. If no reviewer has been specified, the assignee can merge a request immediately.

There are two sub-cases for a merge request.

The happy case: no merge conflicts
If no merge conflicts are detected, you just have to wait for the merge request to be processed. After that, your changes will be on the master branch on the gitlab repository. In order to have them in your local repository as well, type:

$ git checkout master
$ git pull origin master

The less happy case: diverging master on gitlab
This can happen if another user merged their own changes before you created your merge request. In this case, gitlab will complain that the branch that you want to merge is based on an old version of master and will refuse to process the merge request.

This problem can normally be solved automatically, unless the other user has been editing the same files as you and has introduced conflicting changes. In order to avoid this, try to coordinate your work with your colleagues and avoid working on the same files at the same time, if possible.

You can try and solve the problem of diverging branches with the following commands:

$ git checkout master
$ git pull origin master
$ git checkout my_branch
$ git rebase master
$ git push origin my_branch --force

After the git rebase command, your branch will be based on the current state of the master branch and the merge request should now be ready to be processed: you are now in the happy case.
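
The --force option overwrites the branch on the server unconditionally. If several people push to the same working branch, a more cautious variant is git's --force-with-lease option, which refuses to overwrite the remote branch if it contains commits that your local repository has not seen yet. This is an optional variant and not part of the agreed workflow; the function name is hypothetical:

```shell
# Hypothetical helper: push a rebased branch, but only if the branch on the
# server is still in the state that was last seen by this local repository.
push_rebased_branch() {
    git push --force-with-lease origin "$1"
}
```

For example: push_rebased_branch my_branch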

If the git rebase command reports any conflicts, you have to resolve them manually. It is not possible to provide general instructions for solving this kind of problem here. This can be a topic for a git training.
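
Without attempting a general recipe, the mechanics of a conflicting rebase can at least be sketched. In the sketch below (the function name is hypothetical), the rebase is attempted and, if it stops with conflicts, the affected files are listed and the rebase is aborted, which puts your branch back into the state it was in before. To resolve the conflicts instead of aborting, you would edit the listed files, mark each one as resolved with git add, and continue with git rebase --continue.

```shell
# Hypothetical helper: rebase the current branch onto master; on conflicts,
# report the conflicting files and restore the branch to its previous state.
try_rebase_on_master() {
    if git rebase master; then
        echo "Rebase succeeded."
    else
        echo "Rebase stopped. Files with conflicts:" >&2
        # --diff-filter=U restricts the list to unmerged (conflicting) files.
        git diff --name-only --diff-filter=U >&2
        # Undo the partial rebase; the branch is exactly as before.
        git rebase --abort
        return 1
    fi
}
```

Run it while your working branch is checked out, after updating your local master.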

What to do with the local branch?
After you have finished working with your local branch and your changes have been merged into master, you can delete it with:

$ git checkout master
$ git branch -d my_branch
$ git push origin my_branch --delete

If you decide to keep your local branch and reuse it several times, always make sure that it is up-to-date with the master branch before starting to work with it again:

$ git checkout master
$ git pull origin master
$ git checkout my_branch
$ git rebase master

A short illustration of the main workflow steps can be found in the following PDF file:

Use case: Sharing a tool
Technically speaking, this use case is the same as the previous one, since it amounts to adding the code for the new tool to the project while working on a temporary branch, pushing the temporary branch to gitlab and creating a merge request.

Of course, you should try to choose an appropriate path for the new tool. You can discuss this with other users before the change is merged, e.g. you can invite other users to write comments in your merge request.
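
Put together, sharing a new tool could look like the following sketch. The function name, and the branch name and tool path in the usage line, are hypothetical placeholders; the actual path should be agreed with other users as described above.

```shell
# Hypothetical helper: commit a new tool on a fresh working branch and push
# that branch to gitlab. Run it inside the repository directory.
share_new_tool() {
    branch="$1"      # name of the temporary working branch
    tool_path="$2"   # file or directory containing the new tool

    git checkout master &&
        git pull origin master &&
        git branch "$branch" master &&
        git checkout "$branch" &&
        git add "$tool_path" &&
        git commit -m "Add new tool: $tool_path" &&
        git push origin "$branch"
}
```

For example: share_new_tool add_my_tool my_tool. Afterwards, you create the merge request for the pushed branch in gitlab, as in the previous use case.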