Intern talk:Scripts

Motivation
In order to better organise the common scripts used by curators (and possibly also by non-curators), we decided (meeting on 17th January 2024, see the related ticket) to manage all common scripts in a single git repository. This requires some discipline, i.e. common workflows, to coordinate the work of different people on a common code base.

Use Cases
The common repository will contain small tools, curation scripts or code snippets that can be used to facilitate the data curation process. In the following, we will refer to such self-contained pieces of software as items.

Regarding the way in which users will interact with the common repository, we have identified the following use-cases:


 * 1) Installation: A user wants to find, install and run a given item.
 * 2) Modify: A user wants to modify an item and share the changes.
 * 3) Sharing: A user wants to share a given item with other users.

'''Comment from Flavia: I would suggest 'tool/curation script' here. Not all scripts are tools; often they are code snippets that are shared and reused in a similar form.'''

'''Giorgio: Would "item" perhaps be a suitable word? I have integrated it into the text as a suggestion.'''

Training
The common workflows require some basic knowledge of git. Users can acquire this knowledge in internal training sessions or question-and-answer meetings, which will be organised as needed.

Implementation
The common tools will be managed in the gitlab repository data_processing. Different versions of the tools will be organised using git branches (see the official git book).

The repository has a main branch called master, which contains all the official versions of the tools. Users that only need to run tools from the repository will only see this branch.

Other branches are temporary: they are used to integrate (merge) a user's local work into the official master branch.

Here is a sketch of how the use cases described above are mapped to workflows. Most of the workflows are executed in a shell. Under Linux or macOS you can use your standard shell. Under Windows, you can open the Git Bash shell that is installed along with Git. When this documentation is more mature, we can consider using other tools such as TortoiseGit.

In all the examples below, we assume that you will install / have installed the tools under /d/pangaea, the directory used in the commands.

Use case: Installation / update
Installing the tools amounts to making a local clone of the gitlab repository. In a shell, type:

 $ cd /d/pangaea
 $ git clone https://gitlab.awi.de/pangaea/data_processing
 $ cd data_processing

After the tools have been installed locally, a newer version may become available on the gitlab repository. You can update your local installation with:

 $ cd /d/pangaea/data_processing
 $ git checkout master
 $ git pull origin master

Important note: If you have made changes locally, you have to follow use case Modify, as explained below.
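
Before updating, it can help to check whether the local clone contains uncommitted changes. The following is only a minimal sketch and not part of the agreed workflow: it builds a throwaway repository in a temporary directory (so it does not touch your real clone) and uses git status --porcelain, which prints one line per changed file and nothing when the working tree is clean. The file name and user identity are hypothetical.

```shell
#!/bin/sh
# Sketch: detect uncommitted local changes before running "git pull".
# Everything happens in a throwaway repository; all names are hypothetical.
set -e
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.org"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.org"

sandbox=$(mktemp -d)
cd "$sandbox"
git init -q .
git symbolic-ref HEAD refs/heads/master   # make sure the branch is called master

echo "version 1" > tool.txt
git add tool.txt
git commit -q -m "Add tool.txt"

# Clean working tree: "git status --porcelain" prints nothing.
before=$(git status --porcelain)

# Simulate a local edit that has not been committed.
echo "local edit" >> tool.txt
after=$(git status --porcelain)

if [ -n "$after" ]; then
    echo "Local changes found: follow the Modify use case instead of pulling."
else
    echo "No local changes: safe to pull."
fi
```

In a real clone, only the git status --porcelain call and the test on its output would be needed.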

Use case: Modify an existing tool
Precondition: You have already cloned the data_processing repository on your local computer.

Before you make any changes, make sure that your local master branch is up-to-date with its counterpart on the gitlab server:

 $ cd /d/pangaea/data_processing
 $ git checkout master
 $ git pull origin master

Avoid making changes while you are on the master branch:
 * 1) You will not be able to push these changes to the gitlab server directly, since the master branch is protected on that server.
 * 2) If some other user adds changes to the master branch, you may have conflicts the next time you try to update your local copy.

'''Question from Flavia: Has this been implemented yet?'''

'''Answer from Giorgio: Yes, the permissions for the data_processing repository have been configured in the meantime. Only users with the "maintainer" role can complete a merge request. Users with the "developer" role can only create merge requests.'''

Instead, make a local branch and switch to it:

 $ git branch my_branch master
 $ git checkout my_branch
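
As a side note, the two steps (create the branch, then switch to it) can also be written as the single command git checkout -b. The sketch below tries this in a throwaway repository, so it is safe to run anywhere. The file name and user identity are hypothetical and only serve the illustration.

```shell
#!/bin/sh
# Sketch: create a working branch from master and switch to it in one step.
# Runs in a throwaway repository; all names are hypothetical.
set -e
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.org"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.org"

sandbox=$(mktemp -d)
cd "$sandbox"
git init -q .
git symbolic-ref HEAD refs/heads/master   # make sure the branch is called master

echo "version 1" > tool.txt
git add tool.txt
git commit -q -m "Add tool.txt"

# Equivalent to "git branch my_branch master" followed by "git checkout my_branch".
git checkout -q -b my_branch master

current=$(git rev-parse --abbrev-ref HEAD)
echo "Current branch: $current"

# Right after creation, both branches point to the same commit.
master_commit=$(git rev-parse master)
branch_commit=$(git rev-parse my_branch)
```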

Initially, your working branch is just a copy of master and is checked out locally. You can start making changes to your code until you have a version you are satisfied with.

You can commit your changes (create a new version containing your changes) with the following commands:

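 $ git add FILE_1
 $ git add FILE_2
 $ git add FILE_3
 $ ...
 $ git commit -m 'My commit message.'

Here FILE_1, FILE_2, ... are placeholders for the paths of the files you changed.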

Now you have a new version in your local branch.
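
Before committing, it can be useful to check which files are actually staged. The following is a sketch under assumptions, run in a throwaway repository with hypothetical file names: git diff --cached --name-only lists the files that will be part of the next commit, and git ls-files --others --exclude-standard lists files that git does not track yet.

```shell
#!/bin/sh
# Sketch: review what will be committed, then commit.
# Runs in a throwaway repository; all names are hypothetical.
set -e
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.org"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.org"

sandbox=$(mktemp -d)
cd "$sandbox"
git init -q .
git symbolic-ref HEAD refs/heads/master   # make sure the branch is called master

echo "version 1" > tool.txt
git add tool.txt
git commit -q -m "Add tool.txt"
git checkout -q -b my_branch master

# Change one tracked file and create a second file that is not tracked yet.
echo "improved" >> tool.txt
echo "notes" > notes.txt

# Stage only tool.txt.
git add tool.txt

# Files that will be part of the commit.
staged=$(git diff --cached --name-only)
# Files that git does not know about yet (they will NOT be committed).
untracked=$(git ls-files --others --exclude-standard)
echo "Staged: $staged"
echo "Untracked: $untracked"

git commit -q -m 'My commit message.'
subject=$(git log -1 --format=%s)
echo "Last commit: $subject"
```

In this sketch notes.txt stays out of the commit because it was never passed to git add.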

If you want to share the changes, you must merge them into the master branch. First, you have to push the changes to the gitlab repository:

$ git push origin my_branch

You then have to open gitlab and create a merge request for your branch. More details on this will follow.

There are now two sub-cases.

The happy case: no merge conflicts
If no merge conflicts are detected, you just have to wait for the merge request to be processed. After that, your changes will be on the master branch on the gitlab repository. In order to have them in your local repository as well, type:

 $ git checkout master
 $ git pull origin master

The less happy case: merge conflicts have been detected
This can happen if another user merged their own changes before you created your merge request. In this case, gitlab will complain that the branch that you want to merge is based on an old version of master and will refuse to process the merge request.

This problem can normally be solved automatically, unless the other user has been editing the same files as you and has introduced conflicting changes. In order to avoid this, try to coordinate your work with your colleagues and avoid working on the same files at the same time, if possible.

You can try to resolve the merge conflicts with the following commands:

 $ git checkout master
 $ git pull origin master
 $ git checkout my_branch
 $ git rebase master
 $ git push origin my_branch --force

After the rebase command, your branch will be based on the current state of the master branch and the merge request should now be ready to be processed: you are now in the happy case.

If the rebase command reports any conflicts, you have to resolve them manually. It is not possible to provide general instructions for solving this kind of problem here. This can be a topic for a git training.
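
The rebase sequence can be rehearsed without touching the real repository. The sketch below is an illustration under assumptions, not the official procedure: a local bare repository stands in for the gitlab server, two clones stand in for two users, and all file and directory names are hypothetical. The colleague's change touches a different file, so the rebase completes without conflicts.

```shell
#!/bin/sh
# Sketch: rehearse the rebase workflow in a sandbox.
# A bare repository stands in for the gitlab server; all names are hypothetical.
set -e
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.org"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.org"

sandbox=$(mktemp -d)
cd "$sandbox"

# Build the "server" with one initial commit on master.
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/master
echo "version 1" > tool_a.txt
git add tool_a.txt
git commit -q -m "Add tool_a"
cd "$sandbox"
git clone -q --bare seed server.git

# Two users clone the server.
git clone -q server.git me
git clone -q server.git colleague

# I work on my_branch and push it.
cd "$sandbox/me"
git checkout -q -b my_branch master
echo "my change" >> tool_a.txt
git commit -q -a -m "Change tool_a"
git push -q origin my_branch

# Meanwhile a colleague's change reaches master on the server.
cd "$sandbox/colleague"
echo "version 1" > tool_b.txt
git add tool_b.txt
git commit -q -m "Add tool_b"
git push -q origin master

# The sequence from the text: update master, rebase my_branch, force-push.
cd "$sandbox/me"
git checkout -q master
git pull -q origin master
git checkout -q my_branch
git rebase -q master
git push -q origin my_branch --force
echo "my_branch is now based on the current master"

# If "git rebase" stops with conflicts, the standard ways out are:
#   git rebase --continue   (after editing and "git add"-ing the conflicting files)
#   git rebase --abort      (return to the state before the rebase)
```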

What to do with the local branch?
After you have finished working with your local branch and your changes have been merged into master, you can delete it, both locally and on the gitlab server, with:

 $ git checkout master
 $ git branch -d my_branch
 $ git push origin my_branch --delete
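
If you are unsure whether a branch has really been merged, git branch --merged master lists the local branches that are fully contained in master. The sketch below demonstrates this in a throwaway repository. The local git merge is only a stand-in for a processed merge request followed by git pull origin master, and all names are hypothetical.

```shell
#!/bin/sh
# Sketch: confirm that a branch is contained in master before deleting it.
# Runs in a throwaway repository; all names are hypothetical.
set -e
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.org"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.org"

sandbox=$(mktemp -d)
cd "$sandbox"
git init -q .
git symbolic-ref HEAD refs/heads/master   # make sure the branch is called master

echo "version 1" > tool.txt
git add tool.txt
git commit -q -m "Add tool.txt"

git checkout -q -b my_branch master
echo "improved" >> tool.txt
git commit -q -a -m "Improve tool"

# Stand-in for the processed merge request followed by "git pull origin master".
git checkout -q master
git merge -q my_branch

# List the local branches that are fully contained in master.
merged=$(git branch --merged master)
echo "$merged"

# my_branch is listed, so deleting it cannot lose any commits.
git branch -q -d my_branch
remaining=$(git for-each-ref --format='%(refname:short)' refs/heads)
echo "Remaining branches: $remaining"
```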

If you decide to keep your local branch and reuse it several times, always make sure that it is up-to-date with the master branch before you start working on it:

 $ git checkout master
 $ git pull origin master
 $ git checkout my_branch
 $ git rebase master

A short illustration of the main workflow steps can be found in the following PDF file:

Use case: Sharing a tool
Technically speaking, this use case is the same as the previous one, since it amounts to adding the code for the new tool to the project while working on a temporary branch, pushing the temporary branch to gitlab and creating a merge request.

Of course, you should try to choose an appropriate path for the new tool. You can discuss this with other users before the change is merged, e.g. you can invite other users to write comments in your merge request.

Question from Flavia: Do I understand correctly that we decide ourselves who should review the merge request?

Answer from Giorgio:
Good question. In my opinion there are two aspects:
 * 1) Does a merge request conform to our workflow, e.g. is the branch correctly derived from master? This is a rather technical question and would best be checked by a developer. The git repository can be configured in gitlab so that only certain users (maintainers) can complete a merge request. A user with the "maintainer" role would then check that the workflow is being followed.
 * 2) Is the content of a merge request correct? For example, if user A wants to change a script that was developed by user B, user A can invite user B as a "reviewer" to get feedback before the changes land in the master branch.

The procedure would then be roughly as follows:
 * 1) User A creates a merge request and sets user B as reviewer.
 * 2) Users A and B discuss the changes (via comments on gitlab, by e-mail, chat, ...). User A can add further commits. These are shown automatically in the merge request. When users A and B are satisfied with the changes, user B can mark the merge request as "approved".
 * 3) User C (with the "maintainer" role) sees that an open merge request has been approved and uses the "merge" function of gitlab to close the merge request.

So A, B and C would be the three roles that can be involved. It is also possible that user A developed the script themselves (A = B), or that A also has the "maintainer" role (A = C), and so on.