Authors Guides

From PANGAEA Wiki

These guides provide essential information for data submitters and authors on how to prepare and submit their data for publication with PANGAEA. We recommend that you read the following information carefully before submitting data to us. These instructions cover the scope of PANGAEA, editorial criteria and processes, and preparation guides for metadata and data.

I. Mission and Scope

The information system PANGAEA is operated as an Open Access library aimed at archiving, publishing and distributing georeferenced data from earth and environmental science, in compliance with our Terms of Use. PANGAEA guarantees long-term availability of its content for at least 10 years (~75% is actually older than that). PANGAEA is open to any project, institution, or individual scientist to archive and publish research data.

PANGAEA focuses on georeferenced observational and experimental research data. Citability, comprehensive metadata descriptions, interoperability of data and metadata, a high degree of structural and semantic harmonization of the data inventory as well as the long-term commitment of the hosting institutions (AWI & MARUM) ensure the FAIRness (Wilkinson et al., 2016) of archived data both for use by humans and machines (e.g. tools and scripts, federated infrastructures, data portals and aggregators).

Most of the data published on PANGAEA are freely available and can be used under the terms of the license mentioned on the dataset description. A few password-protected datasets are under moratorium due to ongoing projects. The metadata for all published datasets is always open-access under the CC0 license and includes the Principal Investigator (PI) who can be contacted for individual access.

Each dataset can be identified, shared, published and cited by the data citation, which includes a Digital Object Identifier (DOI). PANGAEA also allows data to be published as supplements to science articles (example) or as citable data collections in combination with data journals such as Nature Scientific Data, Geoscience Data Journal, Earth System Science Data and others.

The PANGAEA data editorial ensures the integrity and authenticity as well as a high usability of your data. Archived data are machine readable and mirrored into our Data Warehouse which allows efficient compilations of data.

If you find PANGAEA useful for your work, please help us in maintaining our service by citing:

Felden, Janine; Möller, Lars; Schindler, Uwe; Huber, Robert; Schumacher, Stefanie; Koppe, Roland; Diepenbroek, Michael; Glöckner, Frank Oliver (2023): PANGAEA - Data Publisher for Earth & Environmental Science. Sci Data 10, 347 (2023). https://doi.org/10.1038/s41597-023-02269-x

II. Editorial Criteria and Processes

General Information

PANGAEA is committed to publishing high quality datasets in maximum compliance with the FAIR Data Principles. During the publication process, data and metadata are checked for completeness and plausibility, and are structurally harmonized. This harmonization and standardization promotes a high degree of reusability and interoperability of the data stock and, among other things, supports the optimal readability and further processability of the data by machines and algorithms (Felden et al., 2023). Following standardized procedures, the PANGAEA Editorial Team systematically reviews incoming data submissions and decides whether the submissions are sufficiently mature and of the appropriate quality to be published with PANGAEA. Data submissions that do not meet the scope and/or our quality requirements will be rejected.

Before submitting data to us, please check if there is a community-specific certified FAIR-data-repository for your data type. Community-specific data repositories may be able to better describe, represent and publish your type of data, or bring your data into a discipline-specific context. The repository search platform re3data may be very helpful in this regard.

Data types and file formats accepted by PANGAEA

PANGAEA publishes primary/validated data from many fields of Earth and Environmental Science as well as Biodiversity research. This includes georeferenced observational (example: https://doi.pangaea.de/10.1594/PANGAEA.967645) and experimental data (example: https://doi.org/10.1594/PANGAEA.966520). PANGAEA is specialized in field observation and experimental data in two-dimensional tabular format with parameters/variables measured provided in columns.

Preferred formats for data are TAB-delimited text files in UTF-8 encoding, or (open) spreadsheet file formats (MS Excel .xlsx, OpenOffice & LibreOffice Calc .ods etc. - please see the corresponding wiki article for more information). Example: https://doi.org/10.1594/PANGAEA.937808. Tables are not accepted as proprietary or encapsulated file types (e.g., Matlab files .mat or PDF files).

Binary objects such as NetCDF files, seismic data files (e.g. segy), photos/images and videos are also accepted as long as they are fully described with metadata. In order to follow the FAIR data principles and guarantee reusability for PANGAEA data publications, all such binary files must be usable with open source software. Example: https://doi.org/10.1594/PANGAEA.936185

As an addition to numerical (or binary) data, supporting documentation on datasets can be archived and published (e.g. processing reports, instrument calibration protocols, standard operating procedures). These can be submitted as PDF/A, plain text or open document formats like RTF, ODF or MS Office documents (.docx, .xlsx). Links to such documentation already published elsewhere are also possible. In this case, please provide a complete reference.

Data types and formats PANGAEA does not accept (or no longer accepts)

Raw Data: Raw data without metadata (Processing level 0) are not accepted in PANGAEA. Raw data with their metadata (Processing level 1) may be accepted under certain circumstances and should be accompanied by their primary/validated data.

Sequence data: PANGAEA does not archive molecular sequence data, but will accept related (meta)data and create cross-links to, e.g., the European Nucleotide Archive (ENA). For more information, please read: Molecular data in PANGAEA. If your molecular data are accompanied by environmental parameters, we recommend that you submit your data to GFBio's free and multidisciplinary publication service.

Model data: Many, but not all, types of model and simulation data will no longer be accepted for publication. Please read more about our constraints, definitions and explanations in our corresponding Wiki article. Data outputs from models that rely entirely on algorithms and (process) generalizations, and have no concrete (and clearly specifiable) spatial reference to field observations, will not be accepted by PANGAEA. For climate modeling/simulation data, the World Data Center for Climate (WDCC), run by the German Climate Computing Center (DKRZ), provides an established long-term archival and publication service.

Software/Code: PANGAEA is not a suitable platform to publish software. In general, we recommend storing and managing software products or any kind of scripts and code on specialized platforms such as GitHub in combination with versioned publishing in, e.g., Zenodo. Zenodo provides persistent identifiers, which can be cross-linked to your dataset published in PANGAEA. This is the preferred method to combine PANGAEA datasets and relevant versions of code.

Other formats: Data presented exclusively as plots/figures, standalone PDF or MS Word documents will not be published. Tables in device-specific (e.g. CTD sensor output) or proprietary formats such as Matlab .mat files and R-files or other program-specific formats will not be accepted for publication. The same applies to topic/community-specific formats, which cannot be reused with open source software. These file types require transformation into accepted file formats (see our wiki article Data types and formats).

Usual turn-over times, timelines and publishing options

Depending on the extent and complexity of your data submission, the editorial process and the minting of DOI names may take up to several months for submissions that are not affiliated with our hosting institutions (MARUM or AWI) or with our project and institutional partners (our front offices). Temporary access keys for journal reviewers can be provided once our (initial formal and subsequent in-depth) review stages have been completed and the data have been successfully ingested into the PANGAEA system. Usually this takes 6-8 weeks after the initial acceptance of the submission for publication. A data citation including the DOI is created at the very end of the curation workflow. Therefore, we strongly recommend submitting data as early as possible so that the respective citation and DOI can be generated in time to be included in your scholarly publications. For submitters who need higher processing priority, PANGAEA is open to project and institutional cooperations, including human resources.

Regarding (optional) associated moratoria and updates to data during the paper publication process, we offer several options for data publications:

Dataset option for moratorium and open access
No | Dataset options | Status | DOI
1 | In review & access restricted | Dataset is open for corrections, citation is preliminary, metadata are available to the public, data not available to the public (e.g. during article review) | DOI not registered, changes/updates in metadata and data possible
2 | In review & no access restriction | Dataset is open for corrections, citation is preliminary, metadata and data already available to the public (e.g. public review) | DOI not registered, changes/updates in metadata and data possible
3 | Published & access restricted | Dataset is final and fully citable, metadata are available to the public, data not yet available to the public (moratorium) | DOI registered, no more changes except for information regarding your paper publication
4 | Published & no access restriction | Dataset is final and fully citable, metadata and data are open access under the CC0 and CC-BY license, respectively | DOI registered, no more changes except for information regarding your paper publication
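
As a reading aid only, the four options above can be seen as combinations of two independent switches: review status and access restriction. The sketch below assumes this reading; the class, field, and variable names are hypothetical and do not correspond to any PANGAEA interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetOption:
    """One row of the options table; all names here are illustrative only."""
    number: int
    published: bool          # False means the dataset is still "in review"
    access_restricted: bool  # True means the data are not available to the public

    @property
    def doi_registered(self) -> bool:
        # Per the table, the DOI is registered only for published datasets.
        return self.published

    @property
    def data_changes_possible(self) -> bool:
        # Changes to data and metadata are possible only while in review;
        # after publication only paper-related information can be updated.
        return not self.published


OPTIONS = {
    1: DatasetOption(1, published=False, access_restricted=True),
    2: DatasetOption(2, published=False, access_restricted=False),
    3: DatasetOption(3, published=True, access_restricted=True),
    4: DatasetOption(4, published=True, access_restricted=False),
}
```

For example, `OPTIONS[3]` describes a published dataset under moratorium: the DOI is registered, while the data remain restricted.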

The editorial process

Workflow overview of a data publication

The workflow for a data publication from source to publication is similar to the process established in the scientific literature (submission > review > editing > publication). The editorial process follows a two-step review procedure and is coordinated by the Editor-in-Chief and our Data Editors. The workflow and communication of each data submission is documented and tracked through our ticket system.

The workflow is an interaction between the (corresponding) author and the editorial team and consists of 8 steps:

  1. Data submission - Authors submit their dataset and a contextual description of their dataset (metadata) using the submission online form. They follow the guidelines provided in this document and, if necessary, project or institution specific data policies.
  2. Initial review - Editorial staff will review the submission to determine acceptance for further evaluation. Consultation with our expert editors may be part of this decision. The main focus of this review stage is to assess the scope and significance of the data submitted for publication with PANGAEA, to evaluate the data submission for completeness of metadata, and to assess the validity and format of the data provided. If the necessary requirements are not met, a reminder will be sent to the author.
  3. Acceptance/Rejection - Once the submission is considered complete and the dataset is accepted for publication in PANGAEA, the author will be notified via the ticket system and associated emails. If the submission does not meet the requirements of PANGAEA, it will be rejected. In this case, the author will be informed.
  4. Editorial Review - Once the submission is due for processing, it is assigned to an expert data editor. The editor thoroughly reviews the metadata and data for validity and plausibility. The editor will contact the author if there are open questions about the submission. Please note: If the data and metadata do not meet PANGAEA's quality standards or the submitting author does not respond to the editors’ requests, the submission may be rejected.
  5. Processing/Data import - Data and metadata are prepared for import into the relational database, or for file archiving on our servers. During this process, the metadata and data are structurally harmonized and aligned with standardized terminologies. Submitted data can be reformatted by the editor to comply with the PANGAEA data model. This step may involve transposing, merging or splitting tables, adding metadata columns (such as official event labels and geocodes), etc. After import, the editor performs a final check of the dataset.
  6. Dataset proof - The editor sends a temporary link of the dataset landing page to the author(s) and asks for a proofread. The DOI is assigned, but not yet registered ("activated"). The dataset status is set to "in review" and data remains password protected at this stage (option #1 in the table above). Associated metadata is always open access (CC0 license).
  7. Corrections - Through an iterative process between author and editor, the dataset is edited until the final approval by the author.
  8. Publication - The dataset status is set to "published"; the DOI will be activated four weeks after the final edits and will then be part of the official dataset citation. At the author's request, password protection may be maintained (or set up) for a maximum of two years (option #3). Otherwise metadata and data are open access under the respective license (option #4). If the dataset is under moratorium, a temporary access link with an expiration date can be provided at the request of the author, e.g. to share the data with individuals or groups, such as co-authors or anonymous reviewers.

Costs

Basic operations are covered by institutional and public funding, but in order to ensure high quality processing and archiving of new data, PANGAEA requires additional funding. In the case that the data are submitted as part of a project with funding available for publication costs (e.g. as part of the costs for Open Access publications at the DFG), PANGAEA would appreciate a financial contribution of 500.– € (net) per data submission. Other forms of funded collaborations are highly appreciated (e.g. as project partner). Please contact us for further information.

III. Guidelines for data submission and formatting

This chapter describes how to prepare your metadata and data for submission to PANGAEA. We recommend that you read these guides before submitting your data, and that you familiarize yourself with the PANGAEA publication style by reading about the scope of PANGAEA and by searching for and viewing datasets typical for your research field.

Please note that by registering to PANGAEA and by submitting data to PANGAEA you have agreed to our Terms of Use.

PANGAEA is an international data publisher, therefore we accept data submissions (including all data and metadata as well as any supplementary information) only in English. All resulting publications and our communication with data authors will also be in English.

PANGAEA datasets are intended to be self-contained and self-explanatory, i.e. a potential user of the data should be able to judge the quality and suitability for re-use (fit-for-use/fit-for-purpose). Therefore, complete metadata describing the dataset comprehensively and according to the FAIR principles must be available.

For more guidance on how to properly prepare metadata and data please see the guidelines below and our video tutorials. We also offer community workshops twice a year to support our users. The winter workshop focuses on data submission issues. The workshop usually held in early summer focuses on data search and access (including automated access) for re-use of PANGAEA publications. If you are interested, please subscribe to our training mailing list here.

Prepare your data and metadata for submission - a step by step guide through our submission form

All data must be submitted using our online submission form. Data submitted by any other means will not be processed or passed on. If you have any questions or comments about your data submission, please use either the comment field in the online submission form (step 5) or our contact form. You can also leave a comment in the submission ticket that is automatically created when you complete the form.

A commented guide through the submission form

Step 1 - Basic information

  • Title: Provide a dataset title that briefly describes what was measured, observed, or calculated, when, where, and how. The title must be independent of the title of the manuscript/paper.
  • Authors: List all authors of the dataset. Use full names, not initials. Authors' names are case-sensitive; do not use all capital letters for last names (example: Roe, Jane). Please provide the correct e-mail address for each author, with no duplicates. If an author really has no e-mail address, no-reply@pangaea.de can be entered. Fill in the affiliation field (use full names, no abbreviations, ideally according to the Research Organization Registry (ROR)).
  • Keywords: Provide suitable keywords
  • Abstract: Add a dataset abstract that is independent of the manuscript/paper abstract. The abstract should provide a concise and method-oriented description of the observation or measurement, i.e. what, when, where, why and how the data was collected. The summary should consist of meaningful running text. The format of the dataset abstract is the same as for paper abstracts. We expect more than two sentences, and ideally the length should be limited to 5000 characters. Avoid including interpretations of the data. For further information please refer to the documentation on data abstracts for PANGAEA.
  • License: Select the appropriate license for your dataset. We recommend the CC-BY 4.0 license option. Please read our wiki article to understand why.
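
Some of the formal rules in Step 1 can be checked before filling in the form. The following is a minimal sketch under stated assumptions: the function name and the "Last, First" input convention are hypothetical, the sentence count is only a rough heuristic, and treating the placeholder address as exempt from the duplicate check is an assumption, not a documented PANGAEA rule.

```python
import re

MAX_ABSTRACT_CHARS = 5000                  # ideal upper length from the Abstract item
PLACEHOLDER_EMAIL = "no-reply@pangaea.de"  # placeholder named in the Authors item


def check_basic_info(authors, abstract):
    """Return a list of problems found; an empty list means nothing was flagged.

    `authors` is a list of (name, email) tuples with names given as "Last, First".
    """
    problems = []
    seen_emails = set()
    for name, email in authors:
        last_name = name.split(",")[0].strip()
        given = name.split(",")[1].strip() if "," in name else ""
        if last_name.isupper():
            problems.append(f"last name in all capitals: {name}")
        # Flag given names that consist only of single letters, such as "J." or "J. R."
        if given and all(len(part.strip(".")) == 1 for part in given.split()):
            problems.append(f"given name appears to be initials only: {name}")
        key = email.strip().lower()
        # Assumption: the placeholder address may be shared by several authors.
        if key != PLACEHOLDER_EMAIL and key in seen_emails:
            problems.append(f"duplicate e-mail address: {email}")
        seen_emails.add(key)
    if len(abstract) > MAX_ABSTRACT_CHARS:
        problems.append("abstract is longer than 5000 characters")
    # Rough sentence count: split at ., ! or ? followed by whitespace or end of text.
    sentences = [s for s in re.split(r"[.!?](?:\s+|$)", abstract) if s.strip()]
    if len(sentences) <= 2:
        problems.append("abstract should contain more than two sentences")
    return problems
```

Abbreviations such as "e.g." inflate the sentence count, so the result is a hint for the author rather than a verdict.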

Step 2 - References

  • References: Include all relevant references as full citations (not just a DOI), including the manuscript(s) to which the data belong. Include any additional references mentioned in the data, methods or abstract. Include SOPs, processing or calibration reports, AWI Registry handles/links, or any other complementary documentation, if available.

Step 3 - Projects and Grants

  • Projects: Provide names and references to related projects, grants and awards.

Step 4 - Upload

  • Upload: Upload your data files here. Please see below how to prepare your data files.
    • More than 20 Files? → Please check the "Request upload link" box. You will receive an upload link within one to three days. Submissions containing more than 20 files may be rejected without further notice. Please replace spaces in file names before uploading.
    • Files larger than 100 MB? → Please check the "Request upload link" box. Individual files must be less than 15 GB in size, but multiple files can be uploaded at the same time using the Uploader. Please replace any spaces in file names before uploading them.
  • File description: This is where you describe your files. If your submission consists of more than one data table or dataset, please provide a title, authors, and abstract for each.
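
The upload rules in Step 4 can be summarized in a short pre-upload check. This is a sketch only: the function and constant names are hypothetical, the limits are interpreted as decimal megabytes and gigabytes (an assumption), and replacing spaces with underscores is one possible choice, since the text only asks that spaces be replaced.

```python
MAX_FILES_DIRECT = 20          # more files than this: request an upload link
MAX_SIZE_DIRECT = 100 * 10**6  # 100 MB, assuming decimal megabytes
MAX_SIZE_SINGLE = 15 * 10**9   # 15 GB, assuming decimal gigabytes


def plan_upload(files):
    """Summarize what the Step 4 rules imply for a set of files.

    `files` maps each file name to its size in bytes. Returns a tuple
    (renamed, needs_upload_link, too_large): `renamed` maps every original
    name to a name without spaces, and `too_large` lists files that reach
    or exceed the single-file limit.
    """
    renamed = {name: name.replace(" ", "_") for name in files}
    needs_upload_link = (
        len(files) > MAX_FILES_DIRECT
        or any(size > MAX_SIZE_DIRECT for size in files.values())
    )
    too_large = [name for name, size in files.items() if size >= MAX_SIZE_SINGLE]
    return renamed, needs_upload_link, too_large
```

For instance, a single 200 MB file named `my data.nc` would be renamed to `my_data.nc` and would require the "Request upload link" box to be checked.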

Step 5 - Submit

  • Comment: Field for any request/comment for the PANGAEA editors
  • Moratorium: Check this box if you need a moratorium and select the end date. If no end date is selected, the default is 6 months; the maximum is 2 years.
  • Terms of Use: please read and accept our ToU.
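
The moratorium limits above amount to simple date arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the reference date from which the moratorium is counted is left to the caller, because this guide does not state it.

```python
import calendar
from datetime import date


def add_months(start, months):
    """Shift `start` by whole months, clamping the day to the target month's length."""
    index = start.year * 12 + (start.month - 1) + months
    year, month_index = divmod(index, 12)
    last_day = calendar.monthrange(year, month_index + 1)[1]
    return date(year, month_index + 1, min(start.day, last_day))


def moratorium_end(reference, requested_end=None):
    """Return the end date: 6 months by default, at most 2 years after `reference`."""
    latest_end = add_months(reference, 24)
    if requested_end is None:
        return add_months(reference, 6)
    if requested_end > latest_end:
        raise ValueError(f"moratorium may not end after {latest_end.isoformat()}")
    return requested_end
```

With a reference date of 2024-01-15 and no requested end date, the sketch returns 2024-07-15; a requested end date after 2026-01-15 raises an error.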

Changes to submissions via our online form

If you need to change or add metadata after submitting, please use the (blue) "Edit Metadata" button in the submission ticket only. The link will be sent to you automatically after you complete the form. Please note that for technical reasons direct edits in the description field of our ticket system are invalid and cannot be accepted. This is especially true for abstracts. Abstracts submitted as data files will not be considered.

Requirements for the data files and their metadata

PANGAEA publishes data from earth and environmental science research in various formats. Tabular data are the main focus of PANGAEA and should be prepared as TAB-delimited text-files in UTF-8 encoding, or (open) spreadsheet file formats (MS Excel .xlsx, OpenOffice & LibreOffice Calc .ods etc.). Please take a look at our best practice manuals and templates, which outline our requirements for relevant metadata and the structure of submitted data tables.

Metadata about the data

Data tables and data files are provided with metadata about the sampling/measuring stations or equipment, and the parameters/variables measured. The following list of meta-information is required for each data submission to PANGAEA.

  • Campaign: Were the samples or measurements relevant to your data collected during campaigns, expeditions, field trips or cruises? We subsume these under the label "Campaign", which is best described by the meta information listed below. We recommend using our template “Campaign” or the sheet "Campaign" in the Excel file in our templates to provide the following required information, if applicable:
    • Campaign_Label, e.g. the respective cruise number
    • Basis, e.g. the name of the ship, station, aircraft etc. Please leave this field blank, if no basis can be provided.
    • Start date(/time) in ISO format and UTC, i.e. YYYY-MM-DDThh:mm:ss, UTC
    • End date(/time) in ISO format and UTC, i.e. YYYY-MM-DDThh:mm:ss, UTC
    • Responsible scientist(s)
    • For ship expeditions start and end port
    • For expeditions with German research vessels, please refer to the cruise inventory and report information according to this list.
  • Event: An event refers to the sampling or measurement site or position for field observations, or the sampling location of organisms or media such as water used for experiments. Please refer to the event documentation for more details. Please use the text file “Event” or the sheet "Event" in the Excel workbook from our templates whenever possible. Information that should be provided includes:
    • Event_Label - refers to a representative short name or label for the station or locality of a sampling event. For data from expeditions with German research vessels please use the official event labels and station lists provided in the cruise inventory.
    • Latitude and Longitude - mandatory event metadata. Both must be specified in decimal degrees and conform to WGS84 (positive for north, negative for south). For profiles, please provide start and end positions.
    • Elevation - the “3rd geocode”. Please specify start and end elevations for profiles.
    • Date/Time of sampling/measurement provided in ISO format (YYYY-MM-DDThh:mm:ss) and in UTC. An additional column with local date/time may be provided. For profiles and time series, please provide start and end date/time.
    • Device or method used for sampling or the measurement
    • Campaign, see above
    • Any other event related information, e.g. mesh size of nets, core length of sediment and ice cores, International Generic Sample Number (IGSN).
  • Parameters: In PANGAEA, the measurement variables are referred to as parameters. Entries for parameters always require the full parameter name and its unit, if available. You can see the complete list of available parameters here. Please note that parameters may also include the medium (e.g. “Temperature, air” or “Temperature, water”) or other details for disambiguation purposes. Please provide additional information on parameters, e.g. the Principal Investigator and methods as a comment within the data submission, in an additional metadata text document or in the sheet “Parameters” of our Excel template files. Information to be provided:
    • Parameter names of the measured or determined entities given in full, not abbreviated.
    • Unit (SI units are preferred)
    • Include the Principal Investigator (PI) for the measured parameters. The PI is the person responsible for the acquisition and the scientific quality of the data or a data series.
    • If applicable, identify the primary instrument used to measure each specific variable/parameter, preferably in the following standardized format: "Instrument type, Manufacturer, Model name". If you did not use an instrument, please provide the methodology used instead, preferably in the following (also standardized) format: "Method type according to Reference et al. (YYYY)". Further details on how to provide instrument or method information can be found in the respective documentation.
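
The formal rules for event metadata (decimal degrees, date/time as YYYY-MM-DDThh:mm:ss in UTC) lend themselves to a quick self-check before submission. The following is a minimal sketch, not an official validator: the function names are hypothetical, and the longitude range of -180 to 180 is an assumption about the expected convention.

```python
from datetime import datetime, timedelta, timezone

ISO_FORMAT = "%Y-%m-%dT%H:%M:%S"  # YYYY-MM-DDThh:mm:ss


def check_event(label, latitude, longitude, date_time):
    """Return a list of problems for one event; an empty list means none found."""
    problems = []
    if not label.strip():
        problems.append("missing event label")
    if not -90.0 <= latitude <= 90.0:
        problems.append(f"latitude out of range: {latitude}")
    # Assumption: longitudes are given in the range -180 to 180 degrees.
    if not -180.0 <= longitude <= 180.0:
        problems.append(f"longitude out of range: {longitude}")
    try:
        datetime.strptime(date_time, ISO_FORMAT)
    except ValueError:
        problems.append(f"date/time does not match YYYY-MM-DDThh:mm:ss: {date_time}")
    return problems


def to_utc_iso(local_time):
    """Convert a time-zone-aware local datetime to a UTC string in the ISO layout."""
    if local_time.tzinfo is None:
        raise ValueError("time zone information is required to convert to UTC")
    return local_time.astimezone(timezone.utc).strftime(ISO_FORMAT)


# Example: a measurement taken at 14:30 local time in a UTC+2 time zone.
local = datetime(2021, 6, 1, 14, 30, 0, tzinfo=timezone(timedelta(hours=2)))
utc_text = to_utc_iso(local)
```

The check cannot tell whether a well-formed time is actually in UTC; that remains the responsibility of the author, which is why the conversion helper expects a time zone to be attached.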

How to prepare tabular Data:

This section summarizes the formal and structural requirements for data in tabular form. Adhering to these requirements reduces the most time-consuming part of our editorial work and thus supports our efforts to shorten the overall processing time for submissions to PANGAEA. Submissions that deviate substantially from these requirements are therefore likely to be rejected.

Structure of tabular data:

  • Submit your tables as TAB-delimited text files (UTF-8 encoding), or as (open) spreadsheet files (e.g. LibreOffice Calc .ods or MS Excel .xlsx).
  • The first column should always contain the event label, followed by columns with the 3rd geocode (e.g. height/depth) and/or sample IDs and sample information. These are followed by columns containing the measured variables (parameters). Each value in a row should refer to the event and the 3rd geocode specified in columns 1 and 2.
  • The first row is reserved for the column headers, which contain the full parameter names including units in square brackets.
  • Multiple tables with different structures should always be provided as separate data files.
  • Please use our template files whenever possible to report your data in the correct way.
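
The structure described above can be produced with a few lines of code. The sketch below shows one possible way and is not a replacement for the template files: the function names, the column names "Event label" and "Depth, water", the unit strings, and the station values are illustrative assumptions, not official PANGAEA parameter names or real data.

```python
import csv
import io


def header(parameter, unit=None):
    """Full parameter name, followed by the unit in square brackets if there is one."""
    return f"{parameter} [{unit}]" if unit else parameter


def write_table(handle, columns, rows):
    """Write a TAB-delimited table to an open text handle.

    `columns` is a list of (parameter name, unit or None) pairs. The event label
    is expected first and the 3rd geocode second, followed by the parameters.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow([header(name, unit) for name, unit in columns])
    writer.writerows(rows)


# Illustrative column names and values only.
columns = [("Event label", None), ("Depth, water", "m"), ("Temperature, water", "°C")]
rows = [["Station-1", 5, 12.3], ["Station-1", 10, 11.8]]

buffer = io.StringIO()
write_table(buffer, columns, rows)
table_text = buffer.getvalue()

# To write a file instead, open it with UTF-8 encoding:
#   with open("example_table.txt", "w", encoding="utf-8", newline="") as handle:
#       write_table(handle, columns, rows)
```

Because the delimiter is a TAB, parameter names containing commas (such as "Temperature, water") need no quoting.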

How to prepare binary files:

  • One data file: please provide a brief description of what is contained in the file (e.g. in the ‘File description’ field in Step 4 - “Upload” of our submission form).
  • More than one data file: please include a summary table, provided as TAB-delimited text file (UTF-8 encoding) or spreadsheet including file names, a brief description of each file (< 255 characters including spaces) and corresponding Geocodes.
  • Use filenames without spaces
  • Request an upload link if you are submitting more than 20 files or files larger than 100 MB (see Step 4 - “Upload” of our submission form).
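
For submissions with more than one binary file, the summary table can be generated from a simple list of entries. This is a sketch under assumptions: the function name, the dictionary keys, and the column headers are hypothetical, and latitude/longitude are used as example geocodes (others, such as date/time or elevation, could be added in the same way).

```python
import csv
import io

MAX_DESCRIPTION_CHARS = 255  # descriptions must be shorter than this, spaces included


def summary_table(entries):
    """Return a TAB-delimited summary table as text.

    `entries` is a list of dicts with the keys "file_name", "description",
    "latitude" and "longitude" (illustrative keys, not a fixed schema).
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["File name", "Description", "Latitude", "Longitude"])
    for entry in entries:
        if " " in entry["file_name"]:
            raise ValueError(f"file name contains spaces: {entry['file_name']}")
        if len(entry["description"]) >= MAX_DESCRIPTION_CHARS:
            raise ValueError(f"description too long for: {entry['file_name']}")
        writer.writerow([
            entry["file_name"],
            entry["description"],
            entry["latitude"],
            entry["longitude"],
        ])
    return buffer.getvalue()
```

The returned text can be saved as a UTF-8 encoded text file and uploaded together with the binary files.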

Additional information and useful links

Frequently asked questions about data submission are listed in our FAQ.

Do you have questions about the status of your submission? Please contact us via your submission ticket or via our contact form.