Format
PANGAEA accepts and publishes a wide spectrum of data formats. We classify these file formats into three categories (documentation formats, tabular data formats, and binary data formats) based on how they are treated and processed during our editorial work and how they are represented in our data publications.
Important: If you are considering submitting data and supplementary documentation to PANGAEA, please provide open (ideally non-proprietary) formats that are widely accepted and endorsed in your scientific community, so that the data remain accessible and (re-)usable over long time scales and with openly available tools.
Documentation
As documentation we consider all files in typical text formats that supplement or further describe a data submission (whether in tabular or binary format), such as processing reports, instrument calibration protocols, and standard operating procedures.
Accepted formats for documentation are:
- PDF/A (ISO 19005) - http://en.wikipedia.org/wiki/PDF/A
- RTF or ODF (ISO/IEC 26300) - http://en.wikipedia.org/wiki/OpenDocument
- Microsoft Office files such as *.docx for MS Word and *.xlsx for Excel spreadsheet documents, compliant with the OOXML standard (ISO/IEC 29500:2008, since Office 2013) - https://de.wikipedia.org/wiki/Microsoft_Office
- or (our favourite and recommended) plain UTF-8 encoded text files - https://en.wikipedia.org/wiki/UTF-8
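Before submitting a plain text file, it can help to confirm that it really is UTF-8 encoded. The following is a minimal sketch using only the Python standard library; the function name `is_valid_utf8` is an assumption made for illustration and is not part of any PANGAEA tool.

```python
from pathlib import Path


def is_valid_utf8(path):
    """Return True if the file at `path` decodes cleanly as UTF-8.

    Hypothetical helper for illustration; not a PANGAEA API.
    """
    data = Path(path).read_bytes()
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # At least one byte sequence is not valid UTF-8
        # (typical for files saved as Latin-1 or Windows-1252).
        return False
    return True
```

Note that a pure ASCII file also passes this check, since ASCII is a subset of UTF-8.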
Tabular data
PANGAEA is particularly good at handling (or, put differently, is able to make the most of) tabular field observation data. Because we put considerable effort into harmonizing parameter/variable names, methods, dimensions and units during editorial processing, and because this kind of data is stored in a relational database (PostgreSQL) at PANGAEA, users can easily compile specific parameters/variables of interest from many similar studies (i.e. related individual publications) into meta-studies targeting new research questions and contexts using our Data Warehouse. Similar functionality is available through the Python module pangaeapy or its R counterpart pangaear (see our Tools site for details).
The preferred formats for data tables are:
- TAB-delimited text files (UTF-8 encoded), or
- (open) spreadsheet file formats (MS Excel .xlsx, OpenOffice & LibreOffice Calc .ods, etc.).
Please note that data tables are not accepted as encapsulated objects (e.g., in .mat files).
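As an illustration of the TAB-delimited, UTF-8 encoded text format, the sketch below writes a small table with Python's standard `csv` module. The function name `write_tsv`, the column names, and all values are hypothetical and chosen only for this example; they are not prescribed PANGAEA parameter names.

```python
import csv


def write_tsv(path, header, rows):
    """Write `rows` under `header` as a TAB-delimited, UTF-8 encoded text file."""
    with open(path, "w", encoding="utf-8", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(header)   # one header line with the column names
        writer.writerows(rows)    # one line per observation


# Hypothetical column names and values, for illustration only.
header = ["Event", "Latitude", "Longitude", "Depth water [m]", "Temp [°C]"]
rows = [
    ["Station-1", 54.1833, 7.9, 5, 12.4],
    ["Station-1", 54.1833, 7.9, 10, 11.9],
]
```

Calling `write_tsv("example_table.txt", header, rows)` would then produce a file with one header line and one line per row, which spreadsheet software and most analysis tools can open directly.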
Example for a tabular dataset: https://doi.org/10.1594/PANGAEA.934148
Binary data
Binary objects and documentation files are usually stored on a combination of hard-drive arrays (for immediate, high-performance access) and tape archives. File formats should follow ISO standards or at least de facto standards. Online preview is available for raster graphics and videos (e.g. .tif, .png, .jpeg, .mp4).
Example: https://doi.org/10.1594/PANGAEA.936185
Images
- tiff
- jpeg
- png
Video
see: http://de.wikipedia.org/wiki/Digital_Video
- MPG Container
- MP3
- MPEG2 (for PAL)
- MP4 Container
Audio
- MP3
- WAVE (WAV)
  - description: http://en.wikipedia.org/wiki/WAV
  - example: doi:10.1594/PANGAEA.339110
Seismic data
- segy
ADCP
- proprietary binary ping format, archived on hs, linked to the metadata description in PANGAEA
- final processed data in UTF-8, archived in data numeric of PANGAEA (file size 100-500 MB!)
- example: doi:10.1594/PANGAEA.701279
Large array-oriented scientific data (no model data; please see our corresponding Wiki article "Model data and PANGAEA")
- Network Common Data Form (NetCDF)
  - description: http://en.wikipedia.org/wiki/NetCDF
  - Unidata/NSF: http://www.unidata.ucar.edu/software/netcdf/
  - example: https://doi.org/10.1594/PANGAEA.940846
  - viewer (Panoply): http://www.giss.nasa.gov/tools/panoply/
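To check quickly whether a file looks like NetCDF before submission, one can inspect its leading bytes. The sketch below relies on the published file signatures: classic-format files begin with `CDF` followed by a version byte, and netCDF-4 files are HDF5 containers. The function name `sniff_netcdf_variant` and the returned labels are assumptions for illustration. The check is only a heuristic: it does not validate the file, and an HDF5 signature alone does not prove the content is netCDF-4.

```python
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"

# Version byte that follows the leading b"CDF" in classic-model files.
CLASSIC_VARIANTS = {1: "classic", 2: "64-bit offset", 5: "CDF-5"}


def sniff_netcdf_variant(path):
    """Guess the NetCDF variant from the first bytes of a file.

    Returns a short label, or None if no known signature matches.
    Hypothetical helper for illustration; it does not validate the file.
    """
    with open(path, "rb") as handle:
        head = handle.read(8)
    if len(head) >= 4 and head[:3] == b"CDF":
        return CLASSIC_VARIANTS.get(head[3])
    if head == HDF5_SIGNATURE:
        return "HDF5-based (possibly netCDF-4)"
    return None
```

For a real inspection of dimensions, variables and attributes, a viewer such as Panoply or a NetCDF library is the better tool.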
Compression
- ZIP is an ISO standard and is supported. *.tar, *.rar and *.7z are NOT standard and (at least the latter two) are not supported by PANGAEA.
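A directory of files can be packed into a ZIP archive with Python's standard `zipfile` module, as in the following minimal sketch. The function name `bundle_as_zip` is hypothetical, and the sketch assumes the archive is written outside the source directory so that it does not try to include itself.

```python
import zipfile
from pathlib import Path


def bundle_as_zip(source_dir, archive_path):
    """Pack every file below `source_dir` into a ZIP archive at `archive_path`.

    Assumes `archive_path` lies outside `source_dir`.
    """
    source_dir = Path(source_dir)
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for file_path in sorted(source_dir.rglob("*")):
            if file_path.is_file():
                # Store paths relative to source_dir so the archive unpacks cleanly.
                archive.write(file_path, arcname=file_path.relative_to(source_dir).as_posix())
    return archive_path
```

Storing relative paths keeps the directory layout intact without leaking local folder names into the archive.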
Proprietary formats
If proprietary data formats cannot be avoided, please include a reference to open source software, preferably with a DOI, that can be used to open such files (e.g. at GitHub, pypi.org).