Intern:File Upload

PANGAEA File Upload Workflow

This is a short overview of the current state of file uploads and file archiving.

Upload limits

 * Via ticket: max size attachments 100 MB / max 20 files
 * Via uploader: max size 15 GB per file, no limit in total volume (for curators no limit in file size)
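
As a rough aid for deciding which route a submission needs, the limits above can be written down as a small check. This is only a sketch: the function name is made up for illustration, the 100 MB limit is read as a per-attachment limit, and decimal units (1 MB = 10^6 bytes) are assumed.

```python
# Limits as listed above. Assumptions: decimal units, and the ticket
# limit of 100 MB applies to each attachment separately.
TICKET_MAX_FILE_BYTES = 100 * 10**6
TICKET_MAX_FILES = 20
UPLOADER_MAX_FILE_BYTES = 15 * 10**9


def choose_upload_route(sizes_in_bytes):
    """Return 'ticket', 'uploader' or 'curator' for a list of file sizes."""
    sizes = list(sizes_in_bytes)
    fits_ticket = (
        len(sizes) <= TICKET_MAX_FILES
        and all(size <= TICKET_MAX_FILE_BYTES for size in sizes)
    )
    if fits_ticket:
        return "ticket"
    if all(size <= UPLOADER_MAX_FILE_BYTES for size in sizes):
        return "uploader"
    # At least one file exceeds the per-file uploader limit;
    # according to the list above, curators have no file size limit.
    return "curator"
```

For example, 21 small files no longer fit into a ticket and go to the uploader, while a single 16 GB file would have to be uploaded by a curator.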

Create data submission ticket
An author creates a submission request: https://www.pangaea.de/submit/

As curator, check submissions: https://issues.pangaea.de/projects/PDI

Request “upload link” for large file uploads
If files exceed the submission limit, as curator click on “Request file upload”.

Example: https://issues.pangaea.de/browse/PDI-24552

Upload files
Author follows generated link and uploads files to, e.g.

https://issues.pangaea.de/upload/?code=5ef052c2338474.86421682

The web page must stay open while uploading. A page reload cancels uploads in progress, so wait until they have finished!

Author clicks “Confirm file upload” to comment on submission ticket.

Check uploaded files
As curator, check files and file names e.g. for invalid characters.
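
The check for invalid characters can be partly automated. The sketch below lists every file or folder name that contains characters outside a conservative set. Which characters count as invalid is not defined on this page, so the allowed set (ASCII letters, digits, dot, underscore, hyphen) and the function name are assumptions to adapt.

```python
import re
from pathlib import Path

# Assumption: only these characters are considered safe in names.
SAFE_NAME = re.compile(r"[A-Za-z0-9._-]+")


def find_suspicious_names(folder):
    """Return relative paths whose file or folder name has other characters."""
    root = Path(folder)
    flagged = []
    for path in sorted(root.rglob("*")):
        if not SAFE_NAME.fullmatch(path.name):
            flagged.append(str(path.relative_to(root)))
    return flagged
```

Run it on a local copy or on the mounted upload folder of the ticket; an empty list means no name was flagged.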

Uploaded files are located on Isilon and can be edited there. This is the source.

can be opened and edited for example in file explorer.

When using WinSCP, the files are located here:

/pangaea/ext/isilon/upload/

After editing, e.g. file rename, you can check the upload overview:

https://issues.pangaea.de/upload/?code=5ef052c2338474.86421682

You can also connect to the server via ssh or filezilla, see below: Get file list for editorial

Binary File Import

 * There are two options: A Files are on your PC or B Files are in the upload tool
 * When importing files or folders from servers, the access rights of these files must be correct. If, for example, you copied or added a file to the upload folder yourself, it will not have the required rights (see Setting rights).

A: Files are in the upload tool

 * Go to the upload tool and click on 'download Import Matrix'.
 * You now have the import file (the paths will start with ), which can be edited and completed with additional parameters or multiple Events.
 * Open 4D and go to Import > Data > Open.
 * Choose the import file.
 * You are then also asked to choose the folder, but this time click 'Abbrechen' (Cancel), because the files are already staged.
 * Do everything else as usual (transfer from issue, ...).
 * After the import, you can add further parameters by adding the binary object parameter to the lower half several times and then changing the copies to MD5 or file size.
 * Done.
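
If you want to cross-check the MD5 and file size parameters mentioned in the last steps against a local copy of a file, both values can be computed as sketched below. It is an assumption that the values shown in 4D are the plain hexadecimal MD5 digest of the file content and the size in bytes; the function name is illustrative only.

```python
import hashlib


def md5_and_size(path, chunk_size=1024 * 1024):
    """Return (hex MD5 digest, size in bytes) of a file, read in chunks."""
    digest = hashlib.md5()
    size = 0
    with open(path, "rb") as handle:
        # Read in chunks so that very large files do not fill the memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    return digest.hexdigest(), size
```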

B: Files archived under isilon project folders (e.g., /isibhv/projects/p_mosaic_als/)
This is relevant for archiving data from AWI Projects.
 * The user pangaeaadm must be added by the project owner through cloud.awi.de (e.g., project p_mosaic_als).
 * The header contains one or multiple parameters, or other parameter of Data type 4
 * The links pointing to /isi/projects start with, followed by the complete path to individual files (according to Roland Koppe, 2.2.23)
 * Useful also for data located at PANGAEA Cloud (according to Roland Koppe, 2.2.23)
 * Transfer from project folders on other isilon instances (e.g., Potsdam = isipd) is established by modifying the link to: (according to Roland Koppe, 16.1.24)

Before using options C or D, please check this section: Staging folders

C: Files archived under AWI-Servers (e.g., hs/projects/)
This is relevant for archiving data from AWI-Geophysic-Group.
 * The file folder on hs must be shared with the user pangaeaadm. The sharing is managed by the user via cloud.awi.de (seismicsea is already set up).
 * When archiving multiple files, staging is recommended: see Staging folders
 * The header contains one or multiple parameters, or other parameter of Data type 4
 * The links pointing to /hs/gsys/ start with, followed by "server path"

D: Files archived under /hs/platforms (WORM)
This is relevant for archiving raw data from expeditions stored at /hs/platforms (WORM) - this is a special case.
 * For AWI platforms, typically the import files with lists of links are prepared by the AWI Data Logistics Support group and submitted in ticket from  (user: o2a-ingest)
 * When archiving multiple files, staging is recommended: see Staging folders
 * The header contains one or multiple parameters, or other parameter of Data type 4
 * The links pointing to /hs/platforms start with, followed by "sensor path"
 * using  creates a symbolic link; the data are not copied; only use for WORM!

E: Files on FTP/SFTP/SCP/HTTP Server
(not tested; please edit or confirm that this works)

This should not be the usual way! Only for very large files that are not shipped via hard drive.


 * User puts files on an FTP/SFTP/SCP/HTTP Server and sends us the link. Login information should be included in the link.
 * The header contains one or multiple parameters, or other parameter of Data type 4.
 * The links can be used as is: http(s):// or ftp://
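
Before importing a list of such links, it can help to sort out entries that do not use one of the schemes named above. The following is a minimal sketch with an illustrative function name; that sftp:// or scp:// links are not accepted as is, is inferred only from their absence in the last bullet and has not been confirmed.

```python
from urllib.parse import urlsplit

# Schemes named in the text above; everything else goes to manual review.
ACCEPTED_SCHEMES = {"http", "https", "ftp"}


def split_links_by_scheme(links):
    """Split links into (accepted, rejected) according to their URL scheme."""
    accepted, rejected = [], []
    for link in links:
        scheme = urlsplit(link.strip()).scheme.lower()
        if scheme in ACCEPTED_SCHEMES:
            accepted.append(link)
        else:
            rejected.append(link)
    return accepted, rejected
```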

Access restrictions
During (or also after) import, access restrictions can be set or removed in 4D.

Exchanging files
Individual files cannot be exchanged. Replace the old dataset with a completely new import.

Staging folders (Files archived under AWI-Servers, e.g., hs/gsys/, WORM)
Background information: the documentation on archiving and storing data in the HSM (hssrv1) is available in the RZ Confluence. When accessing more than a few files, prior staging is strongly recommended, because it makes the processing much faster. After staging, the files are in the cache and are online. When the cache is almost full, files are deleted from it and are then only available in the archive (they are offline).

Access to HS

 * Connect to AWI VPN (if not in internal AWI network)
 * Open a terminal window (e.g. Eingabeaufforderung (Command Prompt) under Windows or Terminal on Mac).
 * Connect to . Use your AWI user password; the first time, you might need to type   to confirm you trust the connection.
 * Alternatively, when your PC account user name is not identical with the AWI user name, add the short version of the AWI user name to the command:
 * A message starting with "Welcome to Ubuntu..." will appear after successful login.

You can check whether the files in a folder are online or offline:

 * Are files in my folder offline (need to be staged)?
 * Are some files in my folder online?

Staging

 * Before staging a folder containing your files for import, you can inspect the content of your folder on hs with the list command.
 * Stage all files in the entire folder (incl. subfolders):
 * Alternatively, if this leads to the error "command not found":
 * Check the staging status: . When the "user" column of the report contains your user name or pangaeaadm, you need to wait (see Fig.)
 * After staging is completed, import the list of files (see above sections C and D).

Alternatively, when only some of the files within one folder should be staged (some are already online or only certain extensions should be staged):
 * Stage only the files in the folder that are currently offline:
 * You can first check which files will actually be staged by using the command echo. It prints a list of the individual staging commands instead of executing them:
 * If the commands are all correct, echo can be removed.
 * Stage only files with a certain extension (e.g. netCDF files) using -name .*nc:
 * Check the staging status: . When the "user" column of the report contains your user name or pangaeaadm, you need to wait (see Fig.)
 * After staging is completed, import the list of files (see above sections C and D).

Setting rights
When editors add data to the Uploader folders, or after unzipping, the files lack the group rights needed by the importer. The rights can be corrected with either of the two methods below:

Linux commands in JupyterHub terminal

 * Open a new terminal window in https://jupyterhub.awi.de (while connected via AWI VPN).
 * Change directory to the relevant folder.
 * Before changing the rights, you can inspect the content (and rights) of your folder with the list command.
 * Change the rights for all files in the folder:  or only for a selection (e.g., all nc files):
 * After changing the rights, you can inspect the content (and rights) of your folder with the list command again.
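
The same change can also be scripted, for example in a JupyterHub notebook. The sketch below is not the command used on this page; it is an illustration that sets mode 770 (the target value named in the Filezilla steps) on all regular files matching a pattern. The function name, the recursion into subfolders, and the dry-run default are assumptions.

```python
import os
from pathlib import Path


def set_mode(folder, pattern="*", mode=0o770, dry_run=True):
    """Set `mode` on regular files below `folder` whose names match `pattern`.

    With dry_run=True the files are only listed and nothing is changed.
    Returns the list of matching paths.
    """
    matched = []
    for path in sorted(Path(folder).rglob(pattern)):
        if not path.is_file():
            continue  # skip directories and other non-regular entries
        if dry_run:
            print(f"would set {oct(mode)} on {path}")
        else:
            os.chmod(path, mode)
        matched.append(path)
    return matched
```

Call it first with the default dry_run=True to see which files would be affected, e.g. set_mode(".", "*.nc"), and repeat with dry_run=False once the list looks right.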

Filezilla
Step 1:
 * add the pangaea-im2 server to your Filezilla (only for the first time)
 * Protocol: SFTP - SSH
 * Server: pangaea-im2.awi.de
 * Your user name and password
 * select the pangaea-im2 server and connect

Step 2:
 * Find the destination folder on the right: /isi/pangaea/upload/PDI-XXX
 * Copy files (left to right) or when existing, inspect their rights

Step 3:
 * Select the files for which the rights need to be changed (right click, Dateiberechtigungen (file permissions))

Step 4:
 * Change the group rights (the result should be 770)

Links

 * Handout "Hands-on training AWI cloud infrastructure 2022-06-22"