Format
PANGAEA accepts and publishes a wide spectrum of data formats. We classify file formats into three categories (documentation, tabular data, and binary data) based on how they are handled during editorial processing and how they are represented in our data publications.
Important: If you plan to submit data and supplementary documentation to PANGAEA, please provide them in open (ideally non-proprietary) formats that are widely accepted and endorsed in your scientific community. This supports long-term accessibility and (re-)usability with openly available tools.
Documentation
As documentation we consider all files in typical text formats that supplement or further describe data submissions (whether tabular or binary), such as processing reports, instrument calibration protocols, and standard operating procedures.
Accepted formats for documentation are:
- PDF/A (ISO 19005) - http://en.wikipedia.org/wiki/PDF/A
- RTF or ODF (ISO/IEC 26300) - http://en.wikipedia.org/wiki/OpenDocument
- MS Office files in standard OOXML (ISO/IEC 29500:2008, since Office 2013 - https://de.wikipedia.org/wiki/Microsoft_Office), e.g. *.docx for Word documents and *.xlsx for Excel spreadsheets
- or (our favourite and recommended) plain UTF-8 encoded text files - https://en.wikipedia.org/wiki/UTF-8
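Before submitting a plain text file, it can help to confirm that it really is UTF-8 encoded. The following is a minimal sketch using only the Python standard library; the function name `is_valid_utf8` is our own choice for illustration and is not part of any PANGAEA tool.

```python
from pathlib import Path


def is_valid_utf8(path):
    """Return True if the file at `path` decodes cleanly as UTF-8."""
    try:
        # Decoding raises UnicodeDecodeError on the first invalid byte sequence.
        Path(path).read_bytes().decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

Note that a file containing only ASCII characters also passes this check, because ASCII is a subset of UTF-8. The whole file is read into memory, so this sketch is meant for documentation-sized files.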
Tabular data
PANGAEA is particularly well suited to tabular field observation data. During editorial processing we harmonize parameter/variable names, methods, dimensions and units, and we store this kind of data in a relational database (PostgreSQL). As a result, users can easily compile specific parameters/variables of interest from many similar studies (i.e. related individual publications) into meta-studies that address new research questions and contexts, using our Data Warehouse or similar functionality scripted with the Python module pangaeapy or its R counterpart pangaear (see our Tools site for details).
The preferred formats for data tables are:
- tab-delimited text files (UTF-8 encoded), or
- (open) spreadsheet file formats (MS Excel .xlsx, OpenOffice & LibreOffice Calc .ods, etc.).
Please note that tables are not accepted as encapsulated objects (e.g., in .mat files).
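As an illustration of the preferred text format, the sketch below writes and reads a tab-delimited, UTF-8 encoded table with Python's standard `csv` module. The function names are our own, and the column names used in the usage note are hypothetical; they are not prescribed by PANGAEA.

```python
import csv


def write_tab_delimited(path, header, rows):
    """Write a table as a UTF-8 encoded, tab-delimited text file."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", encoding="utf-8", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows)


def read_tab_delimited(path):
    """Read the file back as a list of rows (each a list of strings)."""
    with open(path, encoding="utf-8", newline="") as handle:
        return list(csv.reader(handle, delimiter="\t"))
```

For example, `write_tab_delimited("table.txt", ["Event", "Depth water [m]", "Temp [°C]"], [["ST01", 10, 4.2]])` produces one header line and one data line, with columns separated by single tab characters. All values come back as strings when the file is read again.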
Example tabular dataset: https://doi.org/10.1594/PANGAEA.934148
Binary data
Binary objects and documentation are usually stored on a combination of hard-drive arrays (for immediate, fast access) and tape archives. File formats should follow ISO standards or at least de facto standards. Online preview is available for raster graphics and videos (e.g. .tif, .png, .jpeg, .mp4).
Example: https://doi.org/10.1594/PANGAEA.936185
Images
- tiff
- jpeg
- png
Video
see: http://de.wikipedia.org/wiki/Digital_Video
- MPG Container
- MP3
- MPEG2 (for PAL)
- MP4 Container
Audio
- MP3
- WAVE (WAV)
  - description: http://en.wikipedia.org/wiki/WAV
  - example: doi:10.1594/PANGAEA.339110
Seismic data
- segy
ADCP
- proprietary binary ping-format, archived on hs, linked to metadescription in PANGAEA
- final processed data in UTF-8, archived in data numeric of PANGAEA (file size 100-500 MB!)
- Example doi:10.1594/PANGAEA.701279
Large array-oriented, scientific data (no models - please see our corresponding Wiki article "Model data and PANGAEA"!)
- Network Common Data Form (NetCDF)
  - description: http://en.wikipedia.org/wiki/NetCDF
  - Unidata/NSF: http://www.unidata.ucar.edu/software/netcdf/
  - example: https://doi.org/10.1594/PANGAEA.940846
  - viewer: Panoply, http://www.giss.nasa.gov/tools/panoply/
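A quick way to check that a file with a NetCDF extension really is what it claims to be is to look at its first bytes. The sketch below uses only the Python standard library; the function name `sniff_netcdf` and the returned labels are our own. It relies on the documented file signatures: `CDF` followed by a version byte for the classic variants, and the HDF5 signature for netCDF-4 files.

```python
def sniff_netcdf(path):
    """Guess the NetCDF flavour of a file from its first bytes.

    Returns a short label, or None if the signature is not recognised.
    """
    with open(path, "rb") as handle:
        head = handle.read(8)
    if head.startswith(b"CDF\x01"):
        return "netCDF classic"
    if head.startswith(b"CDF\x02"):
        return "netCDF 64-bit offset"
    if head.startswith(b"CDF\x05"):
        return "netCDF 64-bit data (CDF-5)"
    if head == b"\x89HDF\r\n\x1a\n":
        # netCDF-4 files are HDF5 files, so this matches any HDF5 file.
        # Only offset 0 is checked; HDF5 files with a user block are missed.
        return "HDF5 container (possibly netCDF-4)"
    return None
```

This only inspects the signature. It does not validate the rest of the file, so a dedicated tool such as Panoply remains the better check for actual content.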
Compression
- zip is an ISO standard and is supported. *.tar, *.rar and *.7z are NOT standards, and at least the latter two are not supported by PANGAEA.
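If your files are currently bundled as .rar or .7z, one option is to unpack them and repack the directory as a zip archive. The following is a minimal sketch with Python's standard `zipfile` module; the function name `zip_directory` is our own choice for illustration.

```python
import zipfile
from pathlib import Path


def zip_directory(source_dir, archive_path):
    """Pack every file below `source_dir` into a zip archive."""
    source = Path(source_dir)
    target = Path(archive_path).resolve()
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for file_path in sorted(source.rglob("*")):
            # Skip directories, and skip the archive itself if it lives in source_dir.
            if not file_path.is_file() or file_path.resolve() == target:
                continue
            # Store paths relative to the source directory, with forward slashes.
            archive.write(file_path, arcname=file_path.relative_to(source).as_posix())
    return archive_path
```

Paths inside the archive are stored relative to `source_dir`, so the directory layout is preserved when the archive is unpacked.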
Proprietary formats
If proprietary formats cannot be avoided, include a reference, preferably with a DOI, to open-source software (e.g. on GitHub or pypi.org) that can be used to open such files.