Talk:Data submission

'''This page is being created and edited by Stefi. Please refrain from adding, correcting or supplementing anything here until we release the page for the general round of critique, corrections and requests.'''

=Authors Guides (to become the page title)=

These guidelines provide essential information for data submitters and authors on how to prepare and submit their data for publication with PANGAEA. We recommend that you read the following information carefully before submitting data to us. These instructions cover the scope of PANGAEA, editorial criteria and processes, and preparation guidelines for metadata and data.

=I. Mission and Scope=

The information system PANGAEA is operated as an Open Access library aimed at archiving, publishing and distributing georeferenced data from earth system research. PANGAEA guarantees long-term availability of its content for at least 10 years (~75% is actually older than that). PANGAEA is open to any project, institution, or individual scientist to use or to archive and publish data.

PANGAEA focuses on georeferenced observational and experimental data. Citability, comprehensive metadata descriptions, interoperability of data and metadata, a high degree of structural and semantic harmonization of the data inventory, and the commitment of the hosting institutions ensure the FAIRness (Wilkinson et al., 2016) of archived data for use by both humans and machines (i.e. tools and scripts, federated infrastructures, data portals and aggregators, etc.).

Most of the data published on PANGAEA are freely available and can be used under the terms of the license mentioned on the dataset description. A few password-protected data sets are under moratorium due to ongoing projects. The metadata for all published datasets is always accessible and includes the Principal Investigator (PI) who can be contacted for individual access.

Each dataset can be identified, shared, published and cited by using the data citation, which includes a Digital Object Identifier (DOI). PANGAEA also allows data to be published as supplements to science articles (example) or as citable data collections in combination with data journals such as ESSD, Geoscience Data Journal, Nature Scientific Data, and others.

The PANGAEA data editorial team ensures the integrity, authenticity and high usability of your data. Archived data are machine-readable and mirrored into our Data Warehouse, which allows efficient compilations of data.

If you find PANGAEA useful for your work please cite:

Felden, Janine; Möller, Lars; Schindler, Uwe; Huber, Robert; Schumacher, Stefanie; Koppe, Roland; Diepenbroek, Michael; Glöckner, Frank Oliver (2023): PANGAEA - Data Publisher for Earth & Environmental Science. Sci Data 10, 347 (2023). https://doi.org/10.1038/s41597-023-02269-x

=II. Editorial Criteria and Processes=

General Information
PANGAEA is committed to publishing high quality datasets in maximum compliance with the FAIR Data Principles (Wilkinson et al., 2016). During the publication process, data and metadata are checked for completeness and plausibility, and are structurally harmonized. This harmonization and standardization promotes a high degree of reusability and interoperability of the data stock and, among other things, supports the optimal readability and further processability of the data by machines and algorithms (Felden et al., 2023). Following standardized procedures, the PANGAEA Editorial Team systematically reviews incoming data submissions and decides, often after some guidance on revisions, whether the submissions are sufficiently mature and of the appropriate quality to be published with PANGAEA. Data submissions that do not meet the scope and/or our quality requirements will be rejected.

Before submitting data to us, please check if there is a community-specific repository for your data type. Community-specific data repositories may be able to better describe, represent and publish your type of data, or bring your data into a discipline-specific context. The repository search platform re3data may be very helpful in this regard.

Data types and file formats accepted by PANGAEA
PANGAEA publishes primary/validated data from many fields of Earth and Environmental Science. This includes georeferenced observational and experimental data. PANGAEA is specialized in field observation and experimental data in two-dimensional tabular format with parameters/variables measured provided in columns.

Preferred formats for data are comma- or TAB-delimited text files in UTF-8 encoding, or (open) spreadsheet file formats (MS Excel .xlsx, OpenOffice & LibreOffice Calc .ods etc.; please see the respective Wiki article for more). Tables are not accepted as proprietary or encapsulated file types (e.g., .mat or .pdf). Example: https://doi.org/10.1594/PANGAEA.937808

Binary objects such as NetCDF files, seismic data files (e.g. segy), photos/images and videos are also accepted as long as they are fully described with metadata. In order to follow the FAIR principles and guarantee reusability for PANGAEA data publications, all such binary files must be usable with open source software. Example: https://doi.org/10.1594/PANGAEA.936185

In addition to numerical (or binary) data, supporting documentation on datasets can be archived and published (e.g. processing reports, instrument calibration protocols, standard operating procedures). These can be submitted as PDF/A, plain text or open document formats like RTF, ODF or MS Office documents (docx, xlsx).

Raw Data
Raw data without metadata (Processing level 0) are not accepted in PANGAEA. Raw data with their metadata (Processing level 1) may be accepted under certain circumstances and should be accompanied by their primary/validated data.

Sequence data
PANGAEA does not archive molecular sequence data, but will accept related (meta)data and create cross-links to, e.g., the European Nucleotide Archive (ENA). For more information, please read: Molecular data in PANGAEA. If your molecular data are accompanied by any environmental parameters, we recommend that you submit your data to GFBio and its free and multidisciplinary publication service.

Model data
Many, but not all, kinds of model and simulation data will not be accepted for publication as of 2023. Please read more about our demarcations, definitions and explanations here: [Link to Model data wiki article].

In short:

Data outputs of models that rely entirely on algorithms and (process) generalizations, i.e. what we would consider simulations, that do not have a concrete (and clearly specifiable) spatial reference to field observational data, or that are driven by other models or simulations (of processes), will not be accepted by PANGAEA. This includes spatially gridded data from (extra- or) interpolations by models, without immediately adjacent experimental data counterparts, provided either with the model results or by reference to openly available published datasets.

If you are about to submit something in the gray area of these definitions and are convinced that it fits PANGAEA, please submit it anyway, even at the risk that we may ultimately decide against accepting it for publication in PANGAEA. In any case, we will try to make and communicate this decision quickly. For climate modeling/simulation data the World Data Center for Climate (WDCC) run by the German Climate Computing Center (DKRZ) provides an established long term archival and publication service worth your consideration.

Software/Code
PANGAEA is not a suitable platform to publish software. We generally recommend storing and managing software products or any kind of scripts and code on specialized platforms such as GitHub. Established workflows with the collaborating general purpose repository Zenodo provide the option to publish specific versions or snapshots of that software and obtain a respective citation and a persistent identifier to this resource. This is the preferred method to combine PANGAEA datasets and relevant versions of code, because they can be easily cross-linked to the Zenodo publication.

Data presented exclusively as plots/figures are not published.

Standalone PDF documents will not be published. This also applies to tables in PDF format. Tables must be provided as TAB-delimited text files or as Excel files. Documents accompanying the data, e.g. standard operating procedures, can be submitted in PDF/A.

The same applies to Word files.

Tables in .mat formats, R-formats or other program-specific formats are not accepted. This also applies to device-specific formats.

Topic/community-specific formats that cannot be reused with open source software are not accepted.

Timing and publishing options
Depending on the extent and complexity of your data submission, the editorial process and the minting of DOI names may take up to several months for submissions not affiliated with our hosts, project partners and partners (front-offices). A temporary access key for journal reviewers can be created, but the DOI and the temporary access key can only be provided at the end of the curation workflow. We recommend submitting the data as early as possible so that the DOI can be generated in time for a paper publication. We offer several ways to update data during the paper publishing process and to keep them under moratorium.

Editorial process


The workflow for a data publication from source to publication is similar to the submission > review > editorial > publication flow established in scientific literature. The editorial process follows a two step review procedure and is coordinated by the editor-in-chief and the data editors. The workflow and communication of each data submission is documented through a Ticket System.

Please note that the editorial workflow can take up to several weeks. A DOI and a temporary access key can only be generated after the data have been fully processed and imported. The workflow is an interaction between the (corresponding) author and the editorial team and consists of 8 steps:
 * 1) Data submission - The authors submit their data set and a description of their data set (metadata) via the Submission online tool. They follow the Authors guides and, when required, project or institute data policies.
 * 2) Initial review - The editorial staff consider whether the submission is accepted for further evaluation. The editorial staff consults with expert editors on this decision. In this step we check if the topic is significant for publication in PANGAEA. The further focus at this point is to check the data submissions with respect to completeness of the metadata and with respect to the validity/format of the data. A request will be sent to the author if mandatory information is missing.
 * 3) Acceptance/Rejection - Once the submission is complete and the data set is accepted for publication in PANGAEA, the author is informed. In case the data submission does not meet PANGAEA's requirements, it will be rejected and the author will be informed about it.
 * 4) Editorial Review - The submission is passed to an expert editor. The editor thoroughly checks the metadata and data. The editor will ask the author for more information if the metadata or data is not complete, or if there are questions about the submission. If the data and metadata do not meet PANGAEA's quality standards, the submission may also be rejected in this step.
 * 5) Processing/Data import - Data and metadata are prepared for import into the relational system, or archiving on the servers. For this purpose, the metadata and data are structurally harmonized and supplied with standardized terminologies. Data may be reformatted by the editor to fit the PANGAEA data model. During this step, if necessary, tables are transposed, combined or divided, columns with metadata are added (e.g. official event labels and 3rd Geocode), etc. After import, the editor performs a final check of the data set.
 * 6) Dataset proof - The editor sends the data set link to the author(s), requesting a proofread. The DOI is assigned, but not yet registered ("activated"). The data set status is "in review" and the data set is password protected (Option No 1 of table above) at this stage. Metadata are always open access.
 * 7) Corrections - Through an iterative process between author and editor, the data set is edited until the final approval by the author.
 * 8) Publication - The data set status is set to "published"; the DOI will be activated 4 weeks after the final editing and is then part of the official data set citation. Upon request of the author, a password protection may be set for a moratorium period or until the related paper is published. A temporary access link with an expiry date can be granted upon request of the author. Such a link can be used to share the data with individuals or groups, for example for co-authors or anonymous reviewers.

Costs
The basic operation is covered by public funding, but in order to ensure a high quality in processing and archiving new data, PANGAEA receives additional funds. If data are submitted as part of a project for which funding is available for publication, PANGAEA would appreciate a financial contribution of 500.– € (net) for a data submission (e.g. as part of the costs for Open Access publications at the DFG). Other forms of funded collaborations can be negotiated. Please contact us for further information and invoicing.

=III. Formatting Guides=

The formatting guides describe how to prepare your metadata and data for submission to PANGAEA. We recommend that you read these guidelines before submitting your data. In addition, we recommend that you familiarize yourself with the PANGAEA publishing style by reading about PANGAEA's scope and by searching for and viewing data sets of your research field.

Furthermore, please be aware that with your registration to PANGAEA and submitting data to PANGAEA you have accepted our Terms of Use (https://www.pangaea.de/about/terms.php).

PANGAEA is an international data publisher and therefore we expect all data and metadata to be written in English.

Each PANGAEA dataset should be understandable on its own, i.e. a potential user of the data should be able to judge its quality and suitability for reuse. Therefore, complete metadata should be available, describing the dataset comprehensively and according to FAIR principles.

For metadata and data preparation please see below, and our Video Tutorials. We offer community workshops twice a year. If you are interested, please subscribe here.

Preparing dataset metadata: how to fill in the Submission Form
All data have to be submitted using our Submission online tool. Any other data transfer will not be processed or passed on. For any request concerning your data submission either use our contact form or for existing data submissions write your comment in the field provided for this purpose. PLEASE NOTE: Any emails or calls related to data submission will not be answered or processed due to resource limitation. Our system will automatically inform you about the status of your data submission (processing step).

Submission: 1. Basic

 * Title: Give a dataset title, briefly describing what and where. Title must be independent of manuscript/paper title
 * Authors: Give all authors of the dataset. Give full names, no initials. Author names are case-sensitive; do not write last names in full uppercase (how to: Roe, Jane). Please enter the correct e-mail address for each author, no duplicates. If there is really no email address, no-reply@pangaea.de can be entered. Fill in the affiliation field (using full names, no abbreviations).
 * Keywords: Give keywords here
 * Abstract: Add a dataset abstract, which is independent of the manuscript/paper abstract. Abstract contains a concise and method-oriented description of the observation or measurement, namely what, when, where, why and how the data was collected. The summary should consist of meaningful running text. The format of the dataset abstract is the same as that of paper abstracts. We expect more than two sentences, the length should be ideally limited to 5000 characters. Avoid interpretation of the data. For further information see the PANGAEA Abstract information.
 * License: Choose the license for your dataset
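
The rules for authors and abstract lend themselves to a quick self-check before you open the submission form. The following is a minimal sketch, not a PANGAEA tool: the function names `check_author_name` and `check_abstract` are hypothetical, the name format "Last name, First name" is taken from the "Roe, Jane" example above, and the sentence count is only a rough heuristic.

```python
import re

MAX_ABSTRACT_CHARS = 5000  # ideal upper limit named in the guide


def check_author_name(name: str) -> list:
    """Return a list of problems for an author name given as 'Last, First'."""
    if "," not in name:
        return ["expected 'Last name, First name', e.g. 'Roe, Jane'"]
    last, first = (part.strip() for part in name.split(",", 1))
    problems = []
    if len(last) > 1 and last.isupper():
        problems.append("last name must not be fully uppercase")
    # Treat single letters (with or without dots) as initials.
    tokens = first.replace(".", " ").split()
    if not tokens:
        problems.append("first name is missing")
    elif all(len(token) == 1 for token in tokens):
        problems.append("give the full first name, no initials")
    return problems


def check_abstract(text: str) -> list:
    """Return a list of problems for a dataset abstract."""
    problems = []
    if len(text) > MAX_ABSTRACT_CHARS:
        problems.append("abstract is longer than 5000 characters")
    # Heuristic: split after '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if len(sentences) <= 2:
        problems.append("abstract should have more than two sentences")
    return problems
```

An empty list means that none of the checked rules was violated; it does not replace the editorial review.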

Submission: 2. References

 * References: Add every relevant reference as a full citation, not just as a DOI: the paper/manuscript to which the data belong, any additional references mentioned in the data, methods or abstract, and SOPs or AWI-Registry handles/links.

Submission: 3. Projects and Grants

 * Projects: Give Projects and awards. Please add the funder’s DOI (can be found here: https://doi.crossref.org/funderNames?mode=list) additionally into the Project website field

Submission: 4. Upload

 * Upload: Upload your data files here. Please see below how to prepare your data files.
 * More than 20 files -> Please tick the "Request upload link" checkbox. You will receive an upload link within one to three days. We will reject the submission if you simply upload more than 20 files here without being asked to do so. For file uploads please name the files without a space.
 * Files larger than 100 MB -> Please tick the "Request upload link" checkbox. Individual files must be less than 15 GB; however, several files can be uploaded simultaneously. For file uploads please name the files without a space.
 * File description: You can describe your files here. If you have more than one data table/dataset please ideally provide a title and an abstract for each data table/dataset here.
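
The upload limits above can be checked locally before submitting. This is a minimal sketch under stated assumptions: `review_upload` and `scan_directory` are hypothetical helper names, and MB and GB are interpreted as decimal units (10^6 and 10^9 bytes), which the guide does not specify.

```python
from pathlib import Path

MAX_FILES_DIRECT = 20          # more files: request an upload link
MAX_SIZE_DIRECT = 100 * 10**6  # 100 MB, assuming decimal megabytes
MAX_SIZE_SINGLE = 15 * 10**9   # 15 GB, assuming decimal gigabytes


def review_upload(files: dict) -> dict:
    """Check a mapping of file name -> size in bytes against the upload rules."""
    return {
        # Tick "Request upload link" for many files or any large file.
        "needs_upload_link": len(files) > MAX_FILES_DIRECT
        or any(size > MAX_SIZE_DIRECT for size in files.values()),
        # Individual files must stay below 15 GB.
        "too_large": sorted(n for n, s in files.items() if s >= MAX_SIZE_SINGLE),
        # File names must not contain spaces.
        "names_with_spaces": sorted(n for n in files if " " in n),
    }


def scan_directory(folder: str) -> dict:
    """Collect name and size of every regular file directly inside a folder."""
    return {p.name: p.stat().st_size for p in Path(folder).iterdir() if p.is_file()}
```

A typical call would be `review_upload(scan_directory("my_submission"))`, where the folder name is a placeholder for your own directory.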

Submission 5. Submit

 * Comment: Field for any request/comment for the PANGAEA editors
 * Moratorium: check, if you need a moratorium. If yes, please choose the date. The default is 6 months, if no date is chosen.
 * Terms of Use: please read our ToU and accept them.

Note
If you need to change or add the metadata after submitting, please do so exclusively via the "Edit Metadata" button. Please note that for technical reasons direct edits in the description field of the JIRA ticket are invalid and cannot be accepted! This is especially relevant for abstracts, please use "Edit Metadata". Abstracts submitted as data files cannot be considered. We always assume that the metadata from the metadata file represents the most recent version.

Data and their Metadata
PANGAEA publishes data from earth system research in diverse formats. Tabular data are the main focus of PANGAEA and should be prepared as TAB-delimited text files (UTF-8 encoding) or in Excel format. Please check out our best practice manuals and templates.

Data-Metadata
Data tables and data files are provided with metadata about the samples/measuring stations and the parameters/variables. This is the metadata listed below.


 * Campaign: Sampling/measurements were done during campaigns, expeditions, field trips or cruises. This is called "Campaign" and includes the following information. Please use the sheet "Campaign" in our templates. Information that should be provided:
 * Campaign_Label e.g., Cruise Number
 * Basis e.g. ship’s name, station, airplane etc., leave empty, when no basis can be given
 * Begin Date(/Time) in ISO-format YYYY-MM-DDThh:mm:ss, UTC
 * End Date(/Time) in ISO-format YYYY-MM-DDThh:mm:ss, UTC
 * Responsible Scientist
 * For ship expeditions start and end harbor
 * Please also compare the cruise inventory for expeditions with German research vessels


 * Event: The sampling or measurement site/position for field observations, or the sampling position of organisms/water/media of experiments. Please see the Event documentation. Use the sheet "Event" in our templates. Information that should be provided:
 * Event_Label = Station/Sample point etc. For data from German research vessels please use the official Event labels; please see the cruise inventory for expeditions and station lists with German research vessels.
 * Latitude and Longitude are mandatory event metadata, specified in decimal degrees, WGS84 (positive for north, negative for south). Please specify start and end positions for profiles.
 * Elevation, please specify start and end elevations for profiles.
 * Date/Time of sampling/measurement provided as ISO-format YYYY-MM-DDThh:mm:ss, UTC. Please specify start and end Date/Time for profiles and time series.
 * Device or method used for sampling/measurement
 * Campaign, see above
 * Any other information e.g., mesh size of net devices, core length of sediment and ice cores, International Generic Sample Number (IGSN). Please see the Event documentation.
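
Positions and times are often recorded in degrees/minutes and in local time, so they need converting to decimal degrees and ISO-formatted UTC. The following is a minimal sketch of both conversions; the function names are hypothetical, the rounding to six decimal places is an arbitrary choice, and treating western longitudes as negative is the usual convention but an assumption here, since the text above only states the sign rule for north and south.

```python
from datetime import datetime, timedelta, timezone


def to_decimal_degrees(degrees: float, minutes: float, seconds: float,
                       hemisphere: str) -> float:
    """Convert degrees/minutes/seconds plus N/S/E/W to signed decimal degrees."""
    value = abs(degrees) + minutes / 60 + seconds / 3600
    if hemisphere.upper() in ("S", "W"):
        value = -value  # south (and, by assumption, west) is negative
    return round(value, 6)


def to_iso_utc(moment: datetime) -> str:
    """Format a time-zone-aware datetime as YYYY-MM-DDThh:mm:ss in UTC."""
    if moment.tzinfo is None:
        raise ValueError("attach a time zone before converting to UTC")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")


# Example: 14:30 local time at UTC+2 is 12:30 UTC.
local = datetime(2023, 6, 1, 14, 30, 0, tzinfo=timezone(timedelta(hours=2)))
print(to_decimal_degrees(53, 30, 0, "N"), to_iso_utc(local))
```

Refusing naive datetimes is a deliberate choice in this sketch: silently assuming a time zone is a common source of wrong event times.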


 * Parameter: In PANGAEA, measurement variables are called parameters. A parameter is defined by a full parameter name and its unit. The parameter name in combination with the unit must be unique in PANGAEA. Use the sheet "Parameter" in our templates. Information that should be provided:
 * Parameter name of the measured or determined characteristic needs to be given in full, not as abbreviation.
 * Unit, SI units are preferred
 * Add the Principal Investigator (PI) for the parameters. The principal investigator (PI) is the person responsible for the scientific quality of a data series.
 * Please give the method of measurement for each parameter. Please provide the primary instrument used to measure each specific variable/parameter, in the following format: "Instrument type, Manufacturer, Model name". If you did not use any instrument, please provide the method used as alternative, in the following format: "Method type according to Reference et al. (YYYY)". Further details on how to provide measurement devices or methods can be found in the instructions for methods.
 * Complete list of parameters used in PANGAEA
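
The two method formats and the uniqueness rule for parameter name plus unit can be sketched as follows. This is an illustration only: the function names are hypothetical, the instrument and reference values in the usage lines are invented placeholders, and comparing parameter names case-insensitively is an assumption.

```python
from collections import Counter


def method_string(instrument_type: str, manufacturer: str, model: str) -> str:
    """Build 'Instrument type, Manufacturer, Model name'."""
    return ", ".join(part.strip() for part in (instrument_type, manufacturer, model))


def method_from_reference(method_type: str, reference: str) -> str:
    """Build 'Method type according to Reference et al. (YYYY)'."""
    return f"{method_type.strip()} according to {reference.strip()}"


def duplicate_parameters(parameters: list) -> list:
    """Return (name, unit) combinations that occur more than once.

    Names are compared case-insensitively (an assumption); units are
    compared exactly, because case matters in units.
    """
    counts = Counter((name.strip().lower(), unit.strip()) for name, unit in parameters)
    return sorted(key for key, count in counts.items() if count > 1)


# Placeholder values, not real instruments or references:
print(method_string("CTD", "ExampleCorp", "Model X1"))
print(method_from_reference("Titration", "Example et al. (2020)"))
```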

Data Preparation - Tabular Data:
Structure of tabular data:
 * Submit your tables as TAB-delimited text files (UTF-8 encoding), or as Excel-format.
 * In PANGAEA data tables, the first column contains the Event label, followed by columns with the 3rd geocode and/or sample ID and sample information. These are followed by the columns with the variables/parameters. Each value in a row refers to the event and the 3rd geocode in columns 1 and 2.
 * The first row is the column header; it contains the full parameter name and the unit in square brackets.
 * Several tables with different structures should be provided as different data files.
 * Please use, or follow, the sheet "Data" in our templates.
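
The table layout described above (Event label, 3rd geocode, then one column per parameter) can be produced with the Python standard library. This is a minimal sketch, not an official template: the function names, the header text "Event" and the example labels and values are assumptions made for illustration.

```python
import csv
import io


def header_cell(name: str, unit: str) -> str:
    """Full parameter name with the unit in square brackets, if there is one."""
    return f"{name} [{unit}]" if unit else name


def table_as_text(geocode: tuple, parameters: list, rows: list) -> str:
    """Build a TAB-delimited table.

    geocode and each entry of parameters are (name, unit) pairs;
    rows are (event_label, geocode_value, [parameter values]) tuples.
    """
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["Event", header_cell(*geocode)]
                    + [header_cell(*p) for p in parameters])
    for event, geocode_value, values in rows:
        # None becomes an empty cell, which marks a missing measurement.
        writer.writerow([event, geocode_value]
                        + ["" if v is None else v for v in values])
    return out.getvalue()


def save_table(path: str, text: str) -> None:
    """Write the table as a UTF-8 encoded text file."""
    with open(path, "w", encoding="utf-8", newline="") as handle:
        handle.write(text)


text = table_as_text(
    ("Depth, water", "m"),
    [("Temperature, water", "°C"), ("Salinity", "")],
    [("ST01-1", 5, [12.3, None]), ("ST01-1", 10, [11.8, 34.9])],
)
```

Calling `save_table("example_table.txt", text)` would then write the file; the file name is a placeholder.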

Dos:
 * All parameters/variables must be written out and provided together with their unit. Units, preferably SI units, are given in square brackets.
 * Please write out species names and do not abbreviate the genus name. Spell-check all taxonomic terms, e.g. by using the taxon match tools by the World Register of Marine Species or equivalent taxonomy data provider.
 * Use English language only for parameters and any text in the data table.
 * Number format in PANGAEA has a dot as decimal separator and no thousands separator.
 * Decimal places should be chosen in a scientifically meaningful way. Do not specify an unnecessary and unrealistic number of decimal places. Please be aware that the number of decimal places represents the precision of your measurement.
 * For numeric entries, no special characters are allowed, except PANGAEA Quality Flags.
 * Missing measurements are indicated with an empty cell, and NOT filled with '-', 'n/a', 'NaN', -9999 or '*', etc.
 * Measurements below the detection limit are marked with "<" followed by the detection limit value.
 * Only one (1) parameter/variable per column. Multiple values separated by '-', '±', '' (ranges, values with errors, uncertainties, or alternative values in brackets) within a single cell are not accepted.
 * Abbreviations in the data tables must be explained.
 * Remove empty lines and columns; those will not be imported.
 * For file uploads please name the files without a space.
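
Several of the rules above for numeric cells can be checked automatically. The following is a minimal sketch with a hypothetical function name; it does not handle PANGAEA Quality Flags or scientific notation, both of which it would report as unexpected characters.

```python
import re
from typing import Optional

PLACEHOLDERS = {"-", "n/a", "nan", "-9999", "*"}
NUMBER = re.compile(r"-?\d+(\.\d+)?")        # dot as decimal separator
BELOW_LIMIT = re.compile(r"<\d+(\.\d+)?")    # "<" plus detection limit value
MULTIPLE = re.compile(r"±|\d\s*-\s*\d|\(")   # errors, ranges, bracketed values


def check_numeric_cell(cell: str) -> Optional[str]:
    """Return a problem description, or None if the cell looks acceptable."""
    value = cell.strip()
    if value == "":
        return None  # an empty cell marks a missing measurement
    if value.lower() in PLACEHOLDERS:
        return "use an empty cell for missing measurements"
    if NUMBER.fullmatch(value) or BELOW_LIMIT.fullmatch(value):
        return None
    if MULTIPLE.search(value):
        return "only one value per cell"
    if "," in value:
        return "use a dot as decimal separator and no thousands separator"
    return "unexpected characters in a numeric cell"
```

The placeholder test runs before the number test on purpose, because a value such as -9999 is syntactically a valid number.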

Don'ts:
 * Do not use any macros or active formulas
 * Do not use any formatting, color coding, or returns/line breaks in Excel cells
 * Do not merge cells in an Excel sheet
 * Do not use any notes/comment features of Excel
 * Do not include graphs in your Excel sheets
 * Do not fill cells of missing measurements with '-', 'n/a', 'NaN', -9999 or '*' etc.
 * Do not put multiple values in one cell
 * Do not mix several tables in one sheet, like: Event 1 Depth 1 Parameter 1 -empty column- Event 2 Depth 2 Parameter 2 -empty column- Event 3 Depth 3 Parameter 3 …
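
A sheet that mixes several tables side by side, separated by empty columns, can be split into individual tables before submission. This is a minimal sketch with a hypothetical function name; it works on rows already read into lists of strings, and it will also split at a parameter column that happens to be completely empty, so check the result by eye.

```python
def split_side_by_side(rows: list) -> list:
    """Split a sheet into separate tables at columns that are empty in every row."""
    if not rows:
        return []
    width = max(len(row) for row in rows)
    # Pad short rows so that every row has the same number of cells.
    padded = [row + [""] * (width - len(row)) for row in rows]
    blocks, current = [], []
    for col in range(width):
        if all(row[col].strip() == "" for row in padded):
            if current:  # an empty column closes the current table
                blocks.append(current)
                current = []
        else:
            current.append(col)
    if current:
        blocks.append(current)
    return [[[row[col] for col in cols] for row in padded] for cols in blocks]


sheet = [
    ["Event 1", "Depth 1", "Parameter 1", "", "Event 2", "Depth 2", "Parameter 2"],
    ["A", "1", "2.0", "", "B", "3", "4.0"],
]
tables = split_side_by_side(sheet)
```

Each resulting table can then be saved as its own data file.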

Data Preparation - Binary files:

 * One data file: Please provide a brief description of what is contained in the data file (Submission: 4. Upload)
 * More than one data file: please add a table, as a TAB-delimited text file (UTF-8 encoding) or in Excel format, with the filename, a short description of each file (< 255 characters including spaces) and the Geocodes.
 * Use filenames without spaces
 * Use the upload-link function when you have more than 20 files, or files larger than 100 MB; see Submission: 4. Upload.
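
The accompanying table for several binary files can be generated from a list of entries. The following is a minimal sketch: the function name, the column names and the choice of latitude, longitude and date/time as geocodes are assumptions for illustration, and the entry shown is invented.

```python
import csv
import io

COLUMNS = ["File name", "Description", "Latitude", "Longitude", "Date/Time"]
MAX_DESCRIPTION = 255  # descriptions must be shorter than this, spaces included


def inventory_as_text(entries: list) -> str:
    """Build a TAB-delimited file inventory from dicts keyed by COLUMNS."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    for entry in entries:
        name = entry["File name"]
        description = entry.get("Description", "")
        if " " in name:
            raise ValueError(f"file name contains a space: {name!r}")
        if len(description) >= MAX_DESCRIPTION:
            raise ValueError(f"description too long for {name!r}")
        # Missing geocodes are written as empty cells.
        writer.writerow([entry.get(column, "") for column in COLUMNS])
    return out.getvalue()


inventory = inventory_as_text([
    {"File name": "profile_001.nc", "Description": "Example profile",
     "Latitude": 53.5, "Longitude": 8.25, "Date/Time": "2023-06-01T12:30:00"},
])
```

Write the returned text to a file opened with `encoding="utf-8"` to obtain the table for upload.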

______________________________________________________________________________________________

Resources

 * Karl W. Broman & Kara H. Woo (2018) Data Organization in Spreadsheets, The American Statistician, 72:1, 2-10.


 * Elizabeth T. Borer, Eric W. Seabloom, Matthew B. Jones & Mark Schildhauer (2009) Some Simple Guidelines for Effective Data Management, The Bulletin of the Ecological Society of America, 90: 205-214.


 * Wilson G, Bryan J, Cranston K, Kitzes J, Nederbragt L, et al. (2017) Good enough practices in scientific computing. PLOS Computational Biology 13(6): e1005510.

https://nceas.github.io/datateam-training/reference/

https://nceas.github.io/datateam-training/training/

Submission templates
Good examples! http://www.earthchem.org/data/templates

Examples of data publications
For more information on submissions of frequent types of data see best practice manuals and templates.

The examples below may give a first impression, which information is required for specific scientific fields. The export formats may differ slightly. Please keep in mind that the export format is dynamically produced by the relational database behind PANGAEA. It is thus NOT required to provide the data submission in the exact same technical format; the content is the important part of the data submission.
 * Moorings with trap/current meter
 * Vertical oceanographic profile
 * Horizontal profile/ships track
 * Horizontal distribution of irregularly distributed samples
 * Vertical profile
 * Bulk sediment parameter
 * Core logging, Physical properties
 * Hole logging
 * Mineralogy
 * Grain size
 * Pollen
 * Geochemistry
 * Porewater
 * XRF
 * Horizontal profile
 * Ships track data in general
 * Geophysical profile
 * Reflection seismic
 * Refraction seismic
 * Magnetic
 * Gravimetry
 * Profile versus relative distance
 * Speleothem
 * Coral
 * Time series
 * Radiation
 * Biological measurements
 * Binary object (data files in various binary formats)
 * photos, images, graphics
 * seismic profiles in sgy-format
 * models
 * Maps
 * Experiments