Talk:Data submission

'''This page is being created and edited by Stefi. Please refrain from adding, correcting, or supplementing anything here until we release the page for the general round of critique, corrections, and requests.'''

=Authors Guides (to become the page title)=

These guidelines provide essential information for data submitters and authors on how to prepare and submit their data for publication with PANGAEA. We recommend that you read the following information carefully before submitting data to us. These instructions cover the scope of PANGAEA, editorial criteria and processes, and preparation guidelines for metadata and data.

=I. Mission and Scope=

The information system PANGAEA is operated as an Open Access library aimed at archiving, publishing and distributing georeferenced data from earth system research. PANGAEA guarantees long-term availability (greater than 10 years) of its content. PANGAEA is open to any project, institution, or individual scientist to use or to archive and publish data.

PANGAEA focuses on georeferenced observational and experimental data. Citability, comprehensive metadata descriptions, interoperability of data and metadata, a high degree of structural and semantic harmonization of the data inventory as well as the commitment of the hosting institutions ensure the FAIRness of archived data.

Most of the data are freely available and can be used under the terms of the license mentioned on the data set description. A few password protected data sets are under moratorium due to ongoing projects. The description of each data set is always visible and includes the principal investigator (PI) who may be asked for access.

Each dataset can be identified, shared, published and cited by using the data citation, which includes a Digital Object Identifier (DOI). PANGAEA also allows data to be published as supplements to science articles (example) or as citable data collections in combination with data journals like ESSD, Geoscience Data Journal, Nature Scientific Data, and others.

The PANGAEA data editorial team ensures the integrity and authenticity as well as a high usability of your data. Archived data are machine readable and mirrored into our data warehouse, which allows efficient compilations of data. Start a data submission here.

If you find PANGAEA useful for your work please cite:

Felden, Janine; Möller, Lars; Schindler, Uwe; Huber, Robert; Schumacher, Stefanie; Koppe, Roland; Diepenbroek, Michael; Glöckner, Frank Oliver (2023): PANGAEA - Data Publisher for Earth & Environmental Science. Sci Data 10, 347 (2023). https://doi.org/10.1038/s41597-023-02269-x

=II. Editorial Criteria and Processes=

In General
PANGAEA aims to publish high quality datasets, following the FAIR Principles (Wilkinson et al., 2016). In the publication process, data and metadata are checked for completeness and plausibility, and are structurally harmonized. This harmonization and standardization promotes not only machine readability and further processability, but also a high degree of reusability of the data stock, in compliance with the FAIR data principles (Felden et al., 2023). The selection of high quality data sets is based on the standardized PANGAEA editorial process. Data submissions that do not meet the requirements of a high quality data publication will be rejected.

Before submitting data to us, please check if there is a topic-specific repository for your data type. Topic-specific data repositories may be able to better represent and publish your data type, or bring your data into a community-specific context. Please check re3data.

Data types and formats accepted in PANGAEA
PANGAEA publishes primary/validated data from many fields of Earth and Environmental Science. This includes georeferenced observational and experimental data. The focus is on tabulated field observations and experimental data, which are stored in a relational database (PostgreSQL).

The preferred formats for data are TAB-delimited text files (UTF-8) or Excel files. Tables are not accepted as binary objects (e.g., .mat). Example: https://doi.org/10.1594/PANGAEA.937808

Binary objects such as NetCDF files, seismic data files, photos/images and videos are also accepted as long as they are fully described by their metadata. To comply with FAIR data publication and guarantee reusability, all binary files must be usable with open source software. Example: https://doi.org/10.1594/PANGAEA.936185

In addition, documentation on data sets can be archived and published. These can be submitted as PDF/A, ODF, plain text or MS Office documents.

Data types and formats PANGAEA does not accept (anymore)
Raw data without metadata (processing level 0) are not accepted in PANGAEA. Raw data with their metadata (processing level 1) may be accepted and should be accompanied by their primary/validated data.

PANGAEA does not archive molecular sequence data, but will accept related (meta)data and establish links to the European Nucleotide Archive (ENA). For more information, please read: Molecular data in PANGAEA. To archive the sequence data themselves, please check out, e.g., the European Nucleotide Archive (ENA). If your molecular data are accompanied by environmental parameters, you may contact GFBio.

Climate modeling data are not accepted; the German Climate Computing Center (DKRZ) provides a Long Term Archiving Service for large research data sets that are relevant for climate or Earth system research.

Code and any kind of software are not accepted; code can be stored and managed on GitHub and published via Zenodo to obtain a persistent identifier. PANGAEA will link to the Zenodo citation.

Data presented exclusively as figures are not published.

Standalone PDF documents will not be published. This also applies to tables in PDF format. Tables must be provided as tab-delimited text files or as Excel files. Documents accompanying data can be submitted in PDF/A.

The same applies to Word files.

Tables in .mat formats, R-formats or other program-specific formats are not accepted. This also applies to formats output from custom devices.

Timing and publishing options
Depending on the extent and complexity of your data submission, the editorial process and the minting of DOI names for submissions not affiliated with our hosts, project partners and partners (front-offices) might take up to several months. A temporary access key for journal reviewers can be created, but the DOI and the temporary access key can only be provided at the end of the curation workflow. We recommend submitting the data as early as possible so that the DOI can be generated in time for a paper publication. We offer several ways to update data during the paper publishing process and to keep them under moratorium.

Costs
The basic operation is covered by public funding, but in order to ensure a high quality in processing and archiving new data, PANGAEA receives additional funds. If data are submitted as part of a project for which publication funding is available, PANGAEA would appreciate a financial contribution of 500.– € (net) for a data submission (e.g. as part of the costs for Open Access publications at the DFG). Other forms of funded collaborations can be negotiated. Please contact us for further information and invoicing.

Editorial process


The workflow for a data publication from source to publication is similar to the submission > review > editorial > publication flow established in scientific literature. The editorial process follows a two-step review procedure and is coordinated by the editor-in-chief and the data editors. The workflow and communication of each data submission are documented through a ticket system.

The workflow is an interaction between the (corresponding) author and the editorial team. Please note that this process can take several weeks; the DOI and the temporary access key can only be provided at the end of the curation workflow. The workflow consists of the following 8 steps:
 * 1) Data submission - The authors submit their data set and a description of their data set (metadata) via the Submission online tool. They follow the Authors Guides and project or institute data policies.
 * 2) Initial review - The editorial staff consider whether the submission is accepted for further evaluation. The editorial staff consults with expert editors on this decision. In this step we check if the topic is significant for publication in PANGAEA. The further focus at this point is to check the data submissions with respect to completeness of the metadata and with respect to the validity/format of the data. A request will be sent to the author if mandatory information is missing.
 * 3) Acceptance/Rejection - Once the submission is complete and the data set is accepted for publication in PANGAEA, the author is informed. In case the data submission does not meet PANGAEA's requirements, it will be rejected and the author will be informed about it.
 * 4) Editorial Review - The submission is passed to an expert editor. The editor thoroughly checks the metadata and data. The editor will ask the author for more information if the metadata or data is not complete, or if there are questions about the submission. If the data and metadata do not meet our quality standards, the submission may also be rejected in this step.
 * 5) Processing/Data import - Data and metadata are prepared for import into the relational system, or for archiving on the servers. For this purpose, the metadata and data are structurally harmonized and supplied with standardized terminologies. Data may be reformatted by the editor to fit the PANGAEA data model. During this step, if necessary, tables are transposed, combined or divided, columns with metadata are added (e.g. official event labels and 3rd geocode), etc. After import, the editor performs a final check of the data set.
 * 6) Dataset proof - The editor sends the data set link to the author(s), requesting a proofread. The DOI is assigned, but not yet registered ("activated"). At this stage the data set status is "in review" and the data set is password protected (Option No 1 of table above). Metadata are always open access.
 * 7) Corrections - Through an iterative process between author and editor, the data set is edited until the final approval by the author.
 * 8) Publication - The data set status is set to "published"; the DOI will be activated 4 weeks after the final editing and is then part of the official data set citation. Upon request of the author, a password protection may be set for a moratorium period or until the related paper is published. A temporary access link with an expiry date can be granted upon request of the author. Such a link can be used to share the data with individuals or groups, for example for co-authors or anonymous reviewers.


=III. Authors Guides/Formatting Guides=
The authors guides describe how to prepare your metadata and data for submission to PANGAEA. We recommend that you read these guidelines before submitting your data. In addition, we recommend that you familiarize yourself with the PANGAEA publishing style by reading about PANGAEA's scope and by searching for and viewing data sets of your research field.

Furthermore, please be aware that with your registration to PANGAEA and submitting data to PANGAEA you have accepted our Terms of Use (https://www.pangaea.de/about/terms.php).

PANGAEA is an international data publisher and therefore we expect all data and metadata to be written in English.

Each PANGAEA dataset should be understandable on its own, i.e. a potential user of the data should be able to judge its quality and suitability for reuse. Therefore, complete metadata should be available, describing the dataset comprehensively and according to the FAIR principles.

For metadata and data preparation please see below, and our Video Tutorials. We offer community workshops twice a year; if you are interested, please subscribe here.

Preparation Dataset Metadata, how to fill the Submission Form:
All data have to be submitted using our Submission online tool (https://www.pangaea.de/submit/). Any other data transfer will not be processed or passed on. For any request concerning your data submission either use our contact form (https://www.pangaea.de/contact/) or for existing data submissions write your comment in the field provided for this purpose. PLEASE NOTE: Any emails or calls related to data submission will not be answered or processed due to resource limitation. Our system will automatically inform you about the status of your data submission (processing step).

1. Page

 * Title: Give a dataset title, briefly describing what and where. The title must be independent of the manuscript/paper title.
 * Authors: Give all authors of the dataset. Give full names, no initials. Author names are case-sensitive; do not write last names in full uppercase (format: Doe, Jane). Please enter the correct e-mail address for each author, no duplicates. If an author really has no e-mail address, no-reply@pangaea.de can be entered. Fill in the affiliation field (using full names, no abbreviations).
 * Keywords: Give keywords here
 * Abstract: Add a dataset abstract, which is independent of the manuscript/paper abstract. The abstract contains a concise and method-oriented description of the observation or measurement, namely what, when, where, why and how the data were collected. It should consist of meaningful running text, in the same format as a paper abstract. We expect more than two sentences; ideally, the length should not exceed 5000 characters. Avoid interpretation of the data. For further information see: https://wiki.pangaea.de/wiki/Abstract
 * License: Choose the license for your dataset

2. Page

 * References: Add any relevant reference as a full citation, not only as a DOI: the paper/manuscript to which the data belong, additional references mentioned in the data, methods or abstract, and SOPs or AWI-Registry handles/links.

3. Page

 * Projects: Give projects and awards. Please also add the funder's DOI (can be found here: https://doi.crossref.org/funderNames?mode=list) to the Project website field.

4. Page

 * Upload: Upload your data files here. Please see below how to prepare your data files.
 * More than 20 files -> Please tick the "Request upload link" checkbox. You will receive an upload link within one business day. We will reject the submission if you upload more than 20 files here without being asked to do so. Please name uploaded files without spaces.
 * Files larger than 100 MB -> Please tick the "Request upload link" checkbox. Individual files must be smaller than 15 GB; however, several files can be uploaded simultaneously. Please name uploaded files without spaces.
 * File description: You can describe your files here. If you have more than one data table/dataset please ideally provide a title and an abstract for each data table/dataset here.
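
The upload rules above can be checked before starting a submission. The following is only a minimal sketch: the function name `check_upload` is hypothetical, and it assumes that "MB" and "GB" mean decimal units (10^6 and 10^9 bytes), which the guideline does not specify.

```python
# Limits taken from the upload rules above; decimal units are an assumption.
MAX_FILES_DIRECT = 20           # more files than this need an upload link
MAX_SIZE_DIRECT = 100 * 10**6   # 100 MB
MAX_SIZE_SINGLE = 15 * 10**9    # 15 GB


def check_upload(files):
    """Check a list of (file_name, size_in_bytes) tuples.

    Returns a list of human-readable notes; an empty list means that
    none of the rules sketched here is violated.
    """
    notes = []
    if len(files) > MAX_FILES_DIRECT:
        notes.append("more than 20 files: request an upload link")
    for name, size in files:
        if " " in name:
            notes.append(f"{name!r}: file name contains a space")
        if size >= MAX_SIZE_SINGLE:
            notes.append(f"{name!r}: individual files must be smaller than 15 GB")
        elif size > MAX_SIZE_DIRECT:
            notes.append(f"{name!r}: larger than 100 MB, request an upload link")
    return notes


if __name__ == "__main__":
    for note in check_upload([("my data.tab", 200 * 10**6)]):
        print(note)
```

In practice the sizes could be collected with `os.path.getsize` for each file in the folder you intend to upload.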

5. Page

 * Comment: Field for any request/comment for the PANGAEA editors
 * Moratorium: Check this box if you need a moratorium, and choose the end date. If no date is chosen, the default is 6 months.
 * Terms of Use: please read our ToU (https://www.pangaea.de/about/terms.php) and accept them

Note
If you need to change or add the metadata, please do so exclusively via the "Edit Metadata" button. Please note that for technical reasons direct edits in the description field of the JIRA ticket are invalid and cannot be accepted! This is especially relevant for abstracts, please use "Edit Metadata". Abstracts submitted as data files cannot be considered. We always assume that the metadata from the metadata file represents the most recent version.

Data and their Metadata
PANGAEA publishes data from earth system research in diverse formats (https://wiki.pangaea.de/wiki/Format). Tabular data are the main focus of PANGAEA and should be prepared as TAB-delimited text files (UTF-8 encoding) or in Excel format. Please check out our Best practice manuals and Templates (https://wiki.pangaea.de/wiki/Best_practice_manuals_and_templates).

Data-Metadata

 * Campaign: Sampling/measurements are done during campaigns, expeditions, field trips or cruises. In PANGAEA this is called a "Campaign" (https://wiki.pangaea.de/wiki/Campaign). Please use our template Campaign_Event or topic-specific templates. Information that should be provided:
 ** Campaign_Label, e.g. cruise number
 ** Basis, e.g. ship's name, station, airplane etc.; leave empty if no basis can be given
 ** Begin Date(/Time) in ISO format YYYY-MM-DDThh:mm:ss, UTC
 ** End Date(/Time) in ISO format YYYY-MM-DDThh:mm:ss, UTC
 ** Responsible Scientist
 ** For ship expeditions: start and end harbor
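
As an illustration of the requested date/time notation, the following sketch converts a local timestamp to the form YYYY-MM-DDThh:mm:ss in UTC. The helper name `to_iso_utc` and the example timestamp are hypothetical; only the Python standard library is used.

```python
from datetime import datetime, timedelta, timezone


def to_iso_utc(dt):
    """Return a timezone-aware datetime as YYYY-MM-DDThh:mm:ss in UTC."""
    if dt.tzinfo is None:
        # Without a time zone the conversion to UTC would be a guess.
        raise ValueError("datetime must be timezone-aware")
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")


# Example: a time stamp recorded in a zone two hours ahead of UTC.
local = datetime(2023, 6, 1, 14, 30, 0, tzinfo=timezone(timedelta(hours=2)))
print(to_iso_utc(local))  # 2023-06-01T12:30:00
```

Rejecting naive datetimes is a deliberate choice in this sketch: it forces the time zone of the original record to be stated before the value is converted.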


 * Event: The sampling or measurement site/position for field observations, or the sampling position of organisms/water/media used in experiments. For a detailed description see https://wiki.pangaea.de/wiki/Event. Please use our template Campaign_Event or topic-specific templates. Information that should be provided:
 ** Event_Label = station/sample point etc. For data from German research vessels please use the official event labels, which can be checked here: https://www.pangaea.de/expeditions/
 ** Latitude and Longitude are mandatory event metadata, specified in decimal degrees, WGS84 (positive for north, negative for south). Please specify start and end positions for profiles.
 ** Elevation (see https://wiki.pangaea.de/wiki/Geocode). Please specify start and end elevations for profiles.
 ** Date/Time of sampling/measurement in ISO format YYYY-MM-DDThh:mm:ss, UTC. Please specify start and end Date/Time for profiles and time series.
 ** Device or method used for sampling/measurement
 ** Campaign, see above
 ** Any other information, e.g. mesh size of net devices, core length of sediment and ice cores, International Generic Sample Number (IGSN). Please see the Event documentation.
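
Positions recorded in degrees, minutes and seconds have to be converted to decimal degrees first. The sketch below shows one way to do this; the function name `dms_to_decimal` is hypothetical, and the sign convention for longitude (positive east, negative west) is an assumption based on common practice, since the text above only states the convention for latitude.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds plus hemisphere letter to decimal degrees.

    North is positive and south is negative, as stated in the guideline.
    East positive / west negative is an assumption of this sketch.
    """
    hemisphere = hemisphere.upper()
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError("hemisphere must be one of N, S, E, W")
    value = abs(degrees) + minutes / 60 + seconds / 3600
    return -value if hemisphere in ("S", "W") else value


print(dms_to_decimal(53, 30, 0, "N"))  # 53.5
print(dms_to_decimal(8, 15, 0, "W"))   # -8.25
```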


 * Parameter: In PANGAEA, measurement variables are called parameters. A parameter is defined by a full parameter name and its unit. The parameter name in combination with the unit must be unique in PANGAEA.
 ** The parameter name of the measured or determined characteristic needs to be given in full, not as an abbreviation.
 ** Unit: SI units are preferred
 ** Current list of parameters used in PANGAEA

Data Preparation
Structure of tabular data:
 * In PANGAEA data tables, the first column contains the Event label, followed by columns with the 3rd geocode (https://wiki.pangaea.de/wiki/Geocode) and/or sample ID and sample information. These are followed by the columns with the variables/parameters. Each value in a row refers to the event and the 3rd geocode in columns 1 and 2.
 * The first row is the column header, which contains the full parameter name and the unit in square brackets.
 * Several tables with different structures should be provided as different data files/sheets
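
The table structure described above can be produced with a few lines of code. This is a minimal sketch, not an official PANGAEA tool: the helper names, the event label "Example_1" and the two parameter columns are illustrative assumptions.

```python
import csv
import io


def column_header(name, unit=None):
    """Build a header cell: full parameter name plus unit in square brackets."""
    return f"{name} [{unit}]" if unit else name


def write_table(rows, columns, stream):
    """Write rows as TAB-delimited text; missing values become empty cells."""
    writer = csv.writer(stream, delimiter="\t", lineterminator="\n")
    writer.writerow([column_header(name, unit) for name, unit in columns])
    for row in rows:
        writer.writerow(["" if value is None else value for value in row])


# Column 1: event label, column 2: 3rd geocode (here a depth), then parameters.
columns = [("Event", None), ("Depth, water", "m"), ("Temperature, water", "°C")]
rows = [("Example_1", 10, 4.21), ("Example_1", 20, None)]

buffer = io.StringIO()
write_table(rows, columns, buffer)
print(buffer.getvalue())
```

To create an actual file, the same function can be called with a stream opened via `open("table.tab", "w", encoding="utf-8", newline="")`, which yields the UTF-8 encoding requested above.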

Dos:
 * All Parameters/Variables must be written out and provided together with their unit.
 * Please write out species names and do not abbreviate the genus name. Spell-check all taxonomic terms, e.g. by using the taxon match tools of the World Register of Marine Species or an equivalent taxonomy data provider.
 * Use English language only for parameters and any text in the data table
 * Number format in PANGAEA has a dot as decimal separator and no thousands separator
 * Decimal places should be chosen in a scientifically meaningful way. Do not specify an unnecessary and unrealistic number of decimal places. Please be aware that the number of decimal places represents the precision of your measurement
 * For numeric entries, no special characters are allowed, except PANGAEA Quality Flags (https://wiki.pangaea.de/wiki/Quality_flag)
 * Missing measurements are indicated with an empty cell, and NOT filled with '-', 'n/a', 'NaN', -9999 or '*' etc.
 * Measurements below the detection limit are marked with "<" followed by the detection limit
 * Only one (1) parameter/variable per column. Multiple values separated by '-', '±', '' (ranges, values with errors, uncertainties, or alternative values in brackets) within a single cell are not accepted.
 * Abbreviations in the data tables must be explained
 * Remove empty lines and columns; those will not be imported.
 * Please provide the primary instrument used to measure each specific variable/parameter, in the following format: "Instrument type, Manufacturer, Model name". If you did not use any instrument, please provide the method used instead, in the following format: "Method type according to Reference et al. (YYYY)". Further details on how to provide measurement devices or methods can be found at https://wiki.pangaea.de/wiki/Method
 * For file uploads please name the files without a space
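
Some of the rules above lend themselves to an automated pre-check of numeric cells. The following sketch is deliberately incomplete and only an assumption of how such a check could look: the function `check_cell` is hypothetical, it covers placeholders for missing values, decimal commas and several values in one cell, and it does not recognize quality flags or thousands separators.

```python
import re

# Placeholders listed in the guideline, compared case-insensitively.
PLACEHOLDERS = {"-", "n/a", "nan", "-9999", "*"}
DECIMAL_COMMA = re.compile(r"^-?\d+,\d+$")
# Plus/minus sign, a range between two digits, or a value in brackets.
MULTI_VALUE = re.compile(r"±|\d\s*-\s*\d|\(.*\)")


def check_cell(value):
    """Return a list of problems found in one numeric cell given as text."""
    text = value.strip()
    if text == "":
        return []  # an empty cell is the expected form of a missing value
    problems = []
    if text.lower() in PLACEHOLDERS:
        problems.append("placeholder for missing value; leave the cell empty")
    elif DECIMAL_COMMA.match(text):
        problems.append("decimal comma; use a dot as decimal separator")
    elif MULTI_VALUE.search(text):
        problems.append("several values in one cell; use one column per parameter")
    return problems


if __name__ == "__main__":
    for cell in ["4.21", "", "<0.05", "NaN", "12,5", "3.2 ± 0.1"]:
        print(repr(cell), check_cell(cell))
```

The check is meant for numeric columns only; text columns such as dates or event labels legitimately contain hyphens and would be flagged incorrectly.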

Don'ts:
 * Do not use any macros or active formulas
 * Do not use any formatting, color coding, or returns/line breaks in Excel cells
 * Do not use any notes/comment features of Excel
 * Do not include graphs in your Excel sheets
 * Do not fill cells of missing measurements with '-', 'n/a', 'NaN', -9999 or '*' etc.
 * Do not put multiple values in one cell
 * Do not mix several tables in one sheet like: Event 1 Depth 1 Parameter 1 -empty column- Event 2 Depth 2 Parameter 2 -empty column- Event 3 Depth 3 Parameter 3….. (Fig. Donts_mix_tables)

Ticket for website improvement: https://issues.pangaea.de/browse/PMW-2392

Old comments (2023-06-01)

Resources

 * Karl W. Broman & Kara H. Woo (2018) Data Organization in Spreadsheets, The American Statistician, 72:1, 2-10.


 * Elizabeth T. Borer, Eric W. Seabloom, Matthew B. Jones & Mark Schildhauer (2009) Some Simple Guidelines for Effective Data Management, The Bulletin of the Ecological Society of America, 90: 205-214.


 * Wilson G, Bryan J, Cranston K, Kitzes J, Nederbragt L, et al. (2017) Good enough practices in scientific computing. PLOS Computational Biology 13(6): e1005510.

https://nceas.github.io/datateam-training/reference/

https://nceas.github.io/datateam-training/training/

Submission templates
Good examples! http://www.earthchem.org/data/templates

Examples of data publications
For more information on submissions of frequent types of data see best practice manuals and templates.

The examples below may give a first impression, which information is required for specific scientific fields. The export formats may differ slightly. Please keep in mind that the export format is dynamically produced by the relational database behind PANGAEA. It is thus NOT required to provide the data submission in the exact same technical format; the content is the important part of the data submission.
 * Moorings with trap/current meter
 * Vertical oceanographic profile
 * Horizontal profile/ships track
 * Horizontal distribution of irregular distributed samples
 * Vertical profile
 * Bulk sediment parameter
 * Core logging, Physical properties
 * Hole logging
 * Mineralogy
 * Grain size
 * Pollen
 * Geochemistry
 * Porewater
 * XRF
 * Horizontal profile
 * Ships track data in general
 * Geophysical profile
 * Reflection seismic
 * Refraction seismic
 * Magnetic
 * Gravimetry
 * Profile versus relative distance
 * Speleothem
 * Coral
 * Time series
 * Radiation
 * Biological measurements
 * Binary object (data files in various binary formats)
 * photos, images, graphics
 * seismic profiles in sgy-format
 * models
 * Maps
 * Experiments