Format

From PANGAEA Wiki

PANGAEA accepts and publishes a wide spectrum of data formats. We classify these file formats into three categories (documentation formats, tabular data formats, and binary data formats). The classification depends on how each format is treated during editorial processing and how it is represented in our data publications.

Important: If you are considering submitting data and supplementary documentation to PANGAEA, please provide them in open (ideally non-proprietary) formats that are widely accepted and endorsed in your scientific community. This keeps the data accessible and reusable over long time scales and with openly available tools.

Documentation

We consider as documentation all files in typical text formats that supplement or further describe a data submission (whether in tabular or binary format), such as processing reports, instrument calibration protocols, and standard operating procedures.

Accepted formats for documentation are:

Tabular data

PANGAEA is particularly well suited to (or, put differently, able to make the most of) tabular field observation data. We invest considerable effort in harmonizing parameter/variable names, methods, dimensions, and units during editorial processing, and this kind of data is stored in a relational database (PostgreSQL). As a result, users can easily compile specific parameters/variables of interest from many similar studies (i.e., related individual publications) into meta-studies that address new research questions and contexts. They can do this with our Data Warehouse, or script similar functionality with the Python module pangaeapy or its R counterpart pangaear (see our Tools site for details).
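Harmonized parameter names matter because they let you pull the same variable out of many tables by name. The following minimal, standard-library-only Python sketch does exactly that for a few tab-delimited tables. It is not pangaeapy, pangaear, or the Data Warehouse, and it does not call any PANGAEA service. The study labels, column headers (e.g. "Depth water [m]", "Temp [°C]"), and values are hypothetical examples.

```python
import csv
import io

# Hypothetical tab-delimited tables from two related studies. Because the
# column headers are harmonized, the same parameter carries the same name
# (and unit) in both tables, even though study_B has an extra column.
TABLES = {
    "study_A": "Depth water [m]\tTemp [°C]\n0\t12.5\n10\t11.8\n",
    "study_B": "Depth water [m]\tTemp [°C]\tSal\n5\t10.2\t35.1\n",
}


def compile_parameter(tables, parameter, depth_column="Depth water [m]"):
    """Collect (source, depth, value) tuples for one parameter from all tables."""
    rows = []
    for source, text in tables.items():
        reader = csv.DictReader(io.StringIO(text), delimiter="\t")
        for record in reader:
            value = record.get(parameter)
            # Skip tables (or rows) that do not contain the requested parameter.
            if value:
                rows.append((source, float(record[depth_column]), float(value)))
    return rows
```

For example, `compile_parameter(TABLES, "Temp [°C]")` gathers temperature values from both hypothetical studies. With inconsistent names (say "Temp" in one table and "Temperature" in the other) the same lookup would silently miss data, which is the problem editorial harmonization avoids.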

The preferred formats for data tables are:

  • TAB-delimited text files (UTF-8 encoded), or
  • (open) spreadsheet file formats (MS Excel .xlsx, OpenOffice and LibreOffice Calc .ods, etc.).
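The following minimal sketch uses Python's standard `csv` module to write a table as a TAB-delimited, UTF-8 encoded text file and reads it back to confirm that it parses as expected. The column headers, values, and the file name `example_table.txt` are hypothetical placeholders.

```python
import csv
import os
import tempfile

# Hypothetical column headers and rows; replace them with your own
# parameters, units and values.
HEADER = ["Date/Time", "Latitude", "Longitude", "Temp [°C]"]
ROWS = [
    ["2021-06-01T12:00", "54.1", "7.9", "14.2"],
    ["2021-06-02T12:00", "54.2", "7.8", "14.6"],
]


def write_tab_delimited(path, header, rows):
    """Write a table as a TAB-delimited, UTF-8 encoded text file."""
    # newline="" lets the csv module handle line endings itself.
    with open(path, "w", encoding="utf-8", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows)


def read_tab_delimited(path):
    """Read a TAB-delimited, UTF-8 encoded text file into a list of rows."""
    with open(path, encoding="utf-8", newline="") as fh:
        return list(csv.reader(fh, delimiter="\t"))


def roundtrip_check(header, rows):
    """Write the table to a temporary file and return what is read back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example_table.txt")
        write_tab_delimited(path, header, rows)
        return read_tab_delimited(path)
```

Calling `write_tab_delimited("example_table.txt", HEADER, ROWS)` produces a plain-text file that opens in any text editor or spreadsheet program. Writing with an explicit `encoding="utf-8"` avoids depending on the platform's default encoding, which matters for characters such as "°".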

Please note that tables are not accepted as encapsulated objects (e.g., in .mat files).

Example tabular dataset: https://doi.org/10.1594/PANGAEA.934148

Binary data

Binary objects and documentation files are usually stored on a combination of hard-drive arrays (for immediate, fast access) and tape archives. File formats should follow ISO standards or, at a minimum, de facto standards. Online previews are available for raster graphics and videos (e.g., .tif, .png, .jpeg, .mp4).

Example: https://doi.org/10.1594/PANGAEA.936185

Images

Video

See: http://de.wikipedia.org/wiki/Digital_Video

Audio

Seismic data

  • SEG-Y (.segy)

ADCP

Large, array-oriented scientific data (excluding model data; please see our corresponding Wiki article "Model data and PANGAEA")

Compression

  • ZIP is an ISO standard and is supported. *.tar, *.rar, and *.7z are NOT standard, and at least the latter two are not supported by PANGAEA.
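Related files can be bundled into a ZIP archive with Python's standard `zipfile` module. The following minimal sketch creates an archive, checks its integrity with `ZipFile.testzip()`, and lists its contents. The file names (`data_table.txt`, `readme.txt`, `submission.zip`) and their placeholder content are hypothetical.

```python
import os
import tempfile
import zipfile


def make_zip(archive_path, file_paths):
    """Pack the given files into a ZIP archive using DEFLATE compression."""
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in file_paths:
            # Store each file under its base name only, without local directories.
            zf.write(path, arcname=os.path.basename(path))
    return archive_path


def list_zip(archive_path):
    """Return the sorted member names of a ZIP archive after an integrity check."""
    with zipfile.ZipFile(archive_path) as zf:
        # testzip() returns the name of the first corrupt member, or None.
        bad = zf.testzip()
        if bad is not None:
            raise ValueError(f"corrupt member in archive: {bad}")
        return sorted(zf.namelist())


def demo():
    """Create two small hypothetical files, zip them, and list the archive."""
    with tempfile.TemporaryDirectory() as tmp:
        files = []
        for name in ("data_table.txt", "readme.txt"):
            path = os.path.join(tmp, name)
            with open(path, "wb") as fh:
                fh.write(b"placeholder content")
            files.append(path)
        archive = make_zip(os.path.join(tmp, "submission.zip"), files)
        return list_zip(archive)
```

Storing members under their base names keeps local directory paths out of the archive. If a directory structure is meant to be part of the submission, pass a relative `arcname` instead.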

Proprietary formats

If you submit files in a proprietary format, include a reference (preferably with a DOI) to open-source software (e.g., on GitHub or pypi.org) that can open such files.