Binary object


Binary objects (BO, or binary large object = BLOB) are digital files such as photos, graphics, films, acoustic recordings or any other binary data in specific formats. BOs are stored in a file system on a combined hard disk/tape robot system (hs = hierarchical storage). The metadata describing the BOs are stored in Pangaea and contain a link to the BO.

BOs are archived in three steps:

(A) Upload of binary objects

  1. Upload the files to the storage system, using standard formats as far as possible. Proprietary formats should follow scientific de facto standards, e.g. segy, NetCDF. Photos may contain a description in the IPTC metadata fields, which ensures that the photo is always accompanied by its descriptive information. Access to the storage system is provided via the data librarian or the data curator of AWI.
  2. Keep a record of the path; it is needed for the metadata definitions and the import.

(B) Metadata definition or import

  1. If locations of the BO are related to a campaign/cruise/expedition, the required fields need to be filled in the campaign table.
  2. If a BO is georeferenced, the event/station/site/profile needs to be imported/defined in the event table.
  3. Define authors in the staff table and publications in the reference table (if applicable).

(C) Data import

Data are archived with one data description per event. Required description fields are: data set title, principal investigator (PI), method, comment, and reference(s) if published. A detailed description of the BO can be added as a PDF or TXT file.

There are different ways of archiving BOs:

a) The event is a station with one (or two) pairs of latitude/longitude -> all BOs are added to this single event, but may be differentiated by the geocode DATE/TIME.

b) The event is a profile described in detail by several pairs of lat/long, giving each BO its own georeference -> in this case a list must be imported, containing the columns:

  • Event label
  • date/time
  • latitude
  • longitude (latitude and longitude both in decimal degrees)
  • path to the BOs as URL
  • further columns with analytical and/or technical data may be added as required.
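As a minimal sketch of such a list, the following Python snippet writes one row per BO with the columns named above. The tab-delimited layout, the exact column headers, the helper name `build_import_table` and the example URL path are assumptions made for illustration; the source only specifies which columns the list must contain.

```python
import csv
import io

# Column headers are illustrative; only their content is prescribed above.
COLUMNS = ["Event label", "Date/Time", "Latitude", "Longitude", "URL"]


def build_import_table(rows):
    """Return a tab-delimited table (assumed layout) with one line per BO."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    for row in rows:
        lat, lon = float(row["lat"]), float(row["lon"])
        # Latitude/longitude are expected in decimal degrees.
        if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
            raise ValueError(f"coordinates out of range: {lat}, {lon}")
        writer.writerow(
            [row["event"], row["datetime"], f"{lat:.5f}", f"{lon:.5f}", row["url"]]
        )
    return buffer.getvalue()


# Hypothetical example row; the directory "example" is not a real path.
example_rows = [
    {
        "event": "GeoB1234-5",
        "datetime": "2001-05-17T10:15:00",
        "lat": 54.1234,
        "lon": 7.9876,
        "url": "http://hs.pangaea.de/Images/example/GeoB1234-5_0001.jpg",
    }
]
print(build_import_table(example_rows))
```

Further columns with analytical or technical data could be appended to `COLUMNS` and to each row in the same way.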

c) If a BO is not georeferenced, the data description is defined in the following steps:

  1. open the Data sets table, click on New
  2. add Author(s), Title, Source or Reference and Project on the Basics card
  3. add Comment, Keywords and Topologic type on the Details card
  4. on the Web card, check the static box (which means that you will add a static link)
  5. add the path linking the BO in the URL field
  6. add the path to a comment file in the URL Data details field
  7. leave the field Export filename empty
  8. press Save

If tables containing links to BOs are imported as data files, filenames should follow the syntax EventLabel_describer.extension. The describer can be chosen freely and may contain a code (date/time, ordinal number, depth, length, ...), the type of the BO (photo, x-ray, description, video, ...), or a combination of both.

Examples for filenames of BOs:

  • Photo EventLabel_photo.jpg
  • Photo of a segment EventLabel_begin-end_photo.jpg (GeoB1234-5_678-778_photo.jpg)
  • Photo, x-ray EventLabel_x-ray.tiff (GIK12345-6_000-100_x-ray.jpg)
  • Photo with time EventLabel_hh_mm_ss.png
  • Photo with date/time EventLabel_yyyy-mm-ddThh_mm_ss.jpg
  • Photo with ordinal number EventLabel_0034.png
  • Graphic of a segment EventLabel_begin-end.gif
  • Description as pdf EventLabel_begin-end_descr.pdf (GeoB1234-5_678-778_descr.pdf)
  • Plain text EventLabel_comment.txt (PS58/012-5_agecontrol.txt)
  • Film/sound sequence EventLabel_begin-end_video.avi
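The naming syntax can be sketched in Python as follows. The function names and the regular expression are hypothetical; in particular, the sketch assumes that the event label itself contains no underscore and that the describer consists only of letters, digits, hyphens and underscores, which the source does not state explicitly.

```python
import re

# Assumed pattern: EventLabel (no underscore) + "_" + describer + "." + extension
FILENAME_PATTERN = re.compile(
    r"^(?P<event>[^_]+)_(?P<describer>[A-Za-z0-9_-]+)\.(?P<ext>[A-Za-z0-9]+)$"
)


def make_bo_filename(event_label, describer, extension):
    """Compose EventLabel_describer.extension and check it against the pattern."""
    if "_" in event_label:
        # Assumption of this sketch: the first underscore ends the event label.
        raise ValueError("event label must not contain an underscore")
    name = f"{event_label}_{describer}.{extension.lstrip('.')}"
    if FILENAME_PATTERN.match(name) is None:
        raise ValueError(f"filename does not follow the syntax: {name}")
    return name


def split_bo_filename(filename):
    """Return (event label, describer, extension), or None if the syntax is not met."""
    match = FILENAME_PATTERN.match(filename)
    if match is None:
        return None
    return match.group("event"), match.group("describer"), match.group("ext")


print(make_bo_filename("GeoB1234-5", "678-778_photo", "jpg"))
print(split_bo_filename("GeoB1234-5_678-778_descr.pdf"))
```

Splitting at the first underscore keeps combined describers such as `678-778_photo` intact.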

In case several events occurred at one site (e.g. a box corer followed by a piston corer), one directory, named after the site, should contain the documentation BOs of all events. This allows a synoptic overview of the availability of all photos and graphics from the same site; however, a data set description must be defined for each event (in this case all pointing to the same folder).

File types/MIME types of BO

data

  • -.nc (netCDF)

images

  • -.tif or -.tiff
  • -.jpg
  • -.png
  • -.gif
  • -.pict (should be removed)

documents

  • -.pdf
  • -.txt
  • -.xml

audio/video

  • -.ogg
  • -.wav
  • -.mp2
  • MPEG-4 in container -.mp4 (ISO/IEC-14496)

seismic

  • -.sgy or -.segy

archive

  • -.zip
  • -.tar

See also MIME Media Types
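A hedged sketch of how a file could be checked against the list of file types above before upload. The dictionary simply mirrors the list; the function name `classify_bo` and the case-insensitive comparison are assumptions of this example, not part of any PANGAEA tool.

```python
from pathlib import PurePosixPath

# Extensions copied from the list above, grouped by its category headings.
ACCEPTED_EXTENSIONS = {
    "data": {".nc"},
    # .pict is listed above with the note "should be removed"
    "images": {".tif", ".tiff", ".jpg", ".png", ".gif", ".pict"},
    "documents": {".pdf", ".txt", ".xml"},
    "audio/video": {".ogg", ".wav", ".mp2", ".mp4"},
    "seismic": {".sgy", ".segy"},
    "archive": {".zip", ".tar"},
}


def classify_bo(filename):
    """Return the category of a BO by its extension, or None if it is not listed."""
    suffix = PurePosixPath(filename).suffix.lower()
    for category, extensions in ACCEPTED_EXTENSIONS.items():
        if suffix in extensions:
            return category
    return None


print(classify_bo("GeoB1234-5_678-778_photo.jpg"))
print(classify_bo("notes.docx"))
```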

Workflow for import (images)

  • upload images/photos
    • from inside AWI: to smb://pcsrv1.dmawi.de/pangaea
    • from outside AWI: to a local FTP server
  • send an email to [1] announcing that the files have been provided; for FTP transfers, include their location on the external FTP server
  • provide metadata
  • generate thumbnails (250 x 250 px)
  • store thumbs in the long-term area of the Pangaea web server at http://store.pangaea.de/Images/...
  • upload images to the storage system at /hs/gwp/pangaea/Images (will link to http://hs.pangaea.de/Images/...)
  • import metadata/data set including the links to the images
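The relation between the storage path and the public link mentioned in the workflow can be sketched as follows. That the mapping is a plain prefix replacement is an assumption drawn from the two paths given above; the function name and the example sub-directory `cruise` are hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import quote

STORAGE_ROOT = PurePosixPath("/hs/gwp/pangaea/Images")
PUBLIC_BASE = "http://hs.pangaea.de/Images"


def storage_path_to_url(path):
    """Translate a path below STORAGE_ROOT into the assumed public HTTP link."""
    # relative_to() raises ValueError if the path is outside the storage root.
    relative = PurePosixPath(path).relative_to(STORAGE_ROOT)
    # Percent-encode characters such as spaces, but keep the slashes.
    return f"{PUBLIC_BASE}/{quote(relative.as_posix())}"


print(storage_path_to_url("/hs/gwp/pangaea/Images/cruise/GeoB1234-5_photo.jpg"))
```

URLs produced this way are what would go into the link column of the imported metadata.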

Login/Authorization for BOs

As BOs are not stored inside the PANGAEA database, it is not possible to restrict access through the native database functions. For this reason, the files inside http://hs.pangaea.de and http://store.pangaea.de are linked to the dataset ID, and the web server enforces the authorization against the dataset PIs / explicitly defined users (see Login).

It is recommended to link all directories in HS/STORE to the corresponding data set (even if no authorization is required)!

To link a directory on HS/STORE to a dataset, just place an empty file with the following name in the directory: ".datasetXXXXX", where XXXXX is the ID of the data set. This file works like a "marker". It tells the web server to use this dataset ID for enforcing access constraints on all files and sub-directories of the directory where the .datasetXXXXX file resides (do not forget the leading dot in the file name, which means "hidden" on UNIX).

Important: Although the web server is also available via HTTPS, please create links in datasets using HTTP only. This lowers CPU usage on both client and server while downloading huge files. If the files require a login/password, the user is automatically upgraded to HTTPS for login and download. Therefore it is recommended to make datasets with files on HS freely accessible.

Important: The restriction only works for the whole directory and all its sub-directories, so you should put every dataset in a separate directory!

If you want to enable directory listing, you can also place a ".dirlist" file inside the directory, which tells the web server to supply a listing (for all files and sub-folders).
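Creating the marker files can be sketched in Python as below. The function name is hypothetical, and the check that a directory carries only one ".datasetXXXXX" marker is an assumption derived from the advice to keep every dataset in a separate directory. The demonstration runs in a temporary directory rather than on HS/STORE.

```python
import tempfile
from pathlib import Path


def link_directory_to_dataset(directory, dataset_id, enable_listing=False):
    """Place an empty .datasetXXXXX marker (and optionally .dirlist) in a directory."""
    directory = Path(directory)
    if not directory.is_dir():
        raise NotADirectoryError(str(directory))
    marker = directory / f".dataset{int(dataset_id)}"
    # Assumption: one dataset per directory, so refuse a second, different marker.
    others = [
        p.name
        for p in directory.iterdir()
        if p.name.startswith(".dataset") and p.name != marker.name
    ]
    if others:
        raise ValueError(f"directory already linked: {others}")
    marker.touch()  # empty file; the leading dot makes it hidden on UNIX
    if enable_listing:
        (directory / ".dirlist").touch()
    return marker


# Demonstration in a throwaway directory, using a dataset ID from the examples below.
with tempfile.TemporaryDirectory() as tmp:
    created = link_directory_to_dataset(tmp, 108215, enable_listing=True)
    print(created.name, sorted(p.name for p in Path(tmp).iterdir()))
```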

Please note: ".htaccess" files were disabled on HS/STORE. If such a file exists, the web server responds with HTTP 500 Internal Server Error.

Using this mechanism, it is also possible to restrict access to metadata or other versions of the data set (e.g. a PDF file). Just place these files in a folder on HS/STORE and place a ".datasetXXXXX" file in the directory. The files can then only be downloaded if the user is logged in according to the dataset's restrictions.

Examples of data sets linked to BO

  • doi:10.1594/PANGAEA.108215: the dataset consists of graphics and photos documenting a geological sediment core. The header lists the metainformation fields described above. In the HTML view the graphic objects are shown as thumbnails; a click on a thumbnail starts the download of the file in full resolution.
  • doi:10.1594/PANGAEA.198686 shows photos from the ocean floor along a transect. Because the relation between photo and precise position is missing, all photos are related to a single pair of latitude/longitude only; in addition, the lat/longs describing the track are given at one-minute time resolution as an additional text file. For a convenient download of all photos an extra link is added.
  • doi:10.1594/PANGAEA.119185 is a fully georeferenced data set containing the description of a cruise track with links to BOs as seismic and graphic files from certain segments of the track. Columns with further technical data are added. The URL shows the path to the mass storage system for BOs used by PANGAEA.