Talk:Data submission

'''This page is being created and edited by Stefi. Please refrain from adding, correcting or supplementing anything here until we open the page for the general round of critique, corrections and requests.'''

=Authors Guides (to become the page title)=

These guidelines provide essential information for data submitters and authors on how to prepare and submit their data for publication with PANGAEA. We recommend that you read the following information carefully before submitting data to us. These instructions include the scope of PANGAEA, editorial criteria and processes, and preparation guidelines for metadata and data.

=I. Mission and Scope=

The information system PANGAEA is operated as an Open Access library aimed at archiving, publishing and distributing georeferenced data from earth system research. PANGAEA guarantees long-term availability of its content for at least 10 years (~75% is actually older than that). PANGAEA is open to any project, institution, or individual scientist to use or to archive and publish data.

PANGAEA focuses on georeferenced observational and experimental data. Citability, comprehensive metadata descriptions, interoperability of data and metadata, a high degree of structural and semantic harmonization of the data inventory as well as the commitment of the hosting institutions ensure the FAIRness (Wilkinson et al., 2016) of archived data both for use by humans and machines (i.e. tools and scripts, federated infrastructures, data portals and aggregators etc.).

Most of the data published on PANGAEA are freely available and can be used under the terms of the license mentioned on the dataset description. A few password-protected data sets are under moratorium due to ongoing projects. The metadata for all published datasets is always accessible and includes the Principal Investigator (PI) who can be contacted for individual access.

Each dataset can be identified, shared, published and cited by using the data citation, which includes a Digital Object Identifier (DOI). PANGAEA also allows data to be published as supplements to science articles (example) or as citable data collections in combination with data journals such as ESSD, Geoscience Data Journal, Nature Scientific Data, and others.

The PANGAEA data editorial ensures the integrity and authenticity as well as a high usability of your data. Archived data are machine readable and mirrored into our Data Warehouse which allows efficient compilations of data.

If you find PANGAEA useful for your work please cite:

Felden, Janine; Möller, Lars; Schindler, Uwe; Huber, Robert; Schumacher, Stefanie; Koppe, Roland; Diepenbroek, Michael; Glöckner, Frank Oliver (2023): PANGAEA - Data Publisher for Earth & Environmental Science. Sci Data 10, 347 (2023). https://doi.org/10.1038/s41597-023-02269-x

=II. Editorial Criteria and Processes=

General Information
PANGAEA is committed to publishing high quality datasets in maximum compliance with the FAIR Data Principles (Wilkinson et al., 2016). During the publication process, data and metadata are checked for completeness and plausibility, and are structurally harmonized. This harmonization and standardization promotes a high degree of reusability and interoperability of the data stock and, among other things, supports the optimal readability and further processability of the data by machines and algorithms (Felden et al., 2023). Following standardized procedures, the PANGAEA Editorial Team systematically reviews incoming data submissions and decides, often after some guidance on revisions, whether the submissions are sufficiently mature and of the appropriate quality to be published with PANGAEA. Data submissions that do not meet the scope and/or our quality requirements will be rejected.

Before submitting data to us, please check if there is a community-specific repository for your data type. Community-specific data repositories may be able to better describe, represent and publish your type of data, or bring your data into a discipline-specific context. The repository search platform re3data may be very helpful in this regard.

Data types and file formats accepted by PANGAEA
PANGAEA publishes primary/validated data from many fields of Earth and Environmental Science. This includes georeferenced observational and experimental data. PANGAEA is specialized in field observation and experimental data in two-dimensional tabular format with parameters/variables measured provided in columns.

Preferred formats for data are Comma- or TAB-delimited TEXT-files in UTF-8 encoding, or (open) spreadsheet file formats (MS Excel .xlsx, OpenOffice & LibreOffice Calc .ods etc. - please see the respective Wiki article for more). Tables are not accepted as proprietary or encapsulated file types (e.g., .mat or .pdf). Example: https://doi.org/10.1594/PANGAEA.937808

Binary objects such as NetCDF files, seismic data files (e.g. segy), photos/images and videos are also accepted as long as they are fully described with metadata. In order to follow the FAIR principles and guarantee reusability for PANGAEA data publications, all such binary files must be usable with open source software. Example: https://doi.org/10.1594/PANGAEA.936185

In addition to numerical (or binary) data, supporting documentation on datasets can be archived and published (e.g. processing reports, instrument calibration protocols, standard operating procedures). These can be submitted as PDF/A, plain text or open document formats like RTF, ODF or MS Office documents (docx, xlsx).

Data types and formats PANGAEA does not accept (or no longer accepts)

 * 1) Raw Data: Raw data without metadata (Processing level 0) are not accepted in PANGAEA. Raw data with their metadata (Processing level 1) may be accepted under certain circumstances and should be accompanied with their primary/validated data.
 * 2) Sequence data: PANGAEA does not archive molecular sequence data, but will accept related (meta)data and create cross-links to, e.g., the European Nucleotide Archive (ENA). For more information, please read: Molecular data in PANGAEA. If your molecular data are accompanied with any environmental parameters, we recommend that you submit your data to GFBio and its free and multidisciplinary publication service.
 * 3) Model data: Many, but not all, kinds of model and simulation data will not be accepted for publication as of 2023. Please read more about our demarcations, definitions and explanations here: [Link to Model data wiki article]. In short: Data outputs of models that rely entirely on algorithms and (process) generalizations, i.e. what we would consider simulations, and that either have no concrete (and clearly specifiable) spatial reference to field observational data or are driven by other models or simulations (of processes), will not be accepted by PANGAEA. This includes spatially gridded data from (extra- or) interpolations by models, without immediately adjacent experimental data counterparts, provided either with the model results or by reference to openly available published datasets. If you are about to submit something in the gray area of these definitions and are convinced that it fits PANGAEA, please submit it anyway, even at the risk that we may ultimately decide against accepting it for publication in PANGAEA. In any case, we will try to make and communicate this decision quickly. For climate modeling/simulation data the World Data Center for Climate (WDCC) run by the German Climate Computing Center (DKRZ) provides an established long term archival and publication service worth your consideration.
 * 4) Software/Code: PANGAEA is not a suitable platform to publish software. We generally recommend storing and managing software products or any kind of scripts and code on specialized platforms such as GitHub. Established workflows with the collaborating general purpose repository Zenodo provide the option to publish specific versions or snapshots of that software and obtain a respective citation and a persistent identifier to this resource. This is the preferred method to combine PANGAEA datasets and relevant versions of code, because they can be easily cross-linked to the Zenodo publication.
 * 5) Other formats: Data presented exclusively as plots/figures, standalone PDF or MS Word documents will not be published. Tables in device-specific (e.g. CTD sensor output) or proprietary formats such as Matlab .mat files and R-files or other program-specific formats will not be accepted for publication. The same applies to topic/community-specific formats, which cannot be reused with open source software. A transformation into accepted file formats is required for these file types (see Data types and file formats accepted by PANGAEA).

Usual turn-over times, timelines and publishing options
Depending on the extent and complexity of your data submission the editorial process and minting of DOI names for submissions not affiliated to our hosting institutions MARUM and AWI, to project collaboration partners and institutional partners (front-offices) might take up to several months. Temporary access keys for journal reviewers can be provided once both our (initial formal and subsequent in-depth) review stages have been passed and the data has been ingested into PANGAEA systems successfully, usually not earlier than 6-8 weeks after initially accepting the submission for publication. A data citation including the DOI name is created at the very end of the curation workflow. As a consequence, we strongly recommend submitting data as early as possible so that the respective citation and DOI can be generated in time to be included into related manuscript publications. We offer several options for data publications concerning optional associated moratoria and updates to the data during the paper publication process.

Editorial process


The workflow for a data publication from source to publication is similar to the flow (submission > review > editorial > publication) established in scientific literature. The editorial process follows a two step review procedure and is coordinated by the editor-in-chief and the data editors. The workflow and communication of each data submission is documented and tracked through a ticket system.

The workflow is an interaction between the (corresponding) author and the editorial team and consists of the 8 steps listed below. Please note that the editorial workflow can take up to several months. A temporary access key and citation including a DOI can only be generated after the data has been fully processed and imported.
 * 1) Data submission - The authors submit their data set and a contextual description of their data set (metadata) via the Submission online tool. They follow the Authors guides and, if required, project or institute specific data policies.
 * 2) Initial review - The editorial staff consider whether the submission is accepted for further evaluation. Consultations with our expert editors may be part of this decision. The main focus of this review stage is the assessment of the scope and significance of the data submitted for publication with PANGAEA as well as to evaluate the data submissions with respect to completeness of the metadata and with respect to the validity and format of the provided data. A request will be sent to the author if necessary requirements are not met.
 * 3) Acceptance/Rejection - Once the submission is considered complete and the dataset is accepted for publication in PANGAEA, the author is informed about that fact via the ticket system and associated emails. Respective rejection messages will be sent in case the data submission does not meet PANGAEA's requirements.
 * 4) Editorial Review - Once it is due for processing the submission is passed to an expert data editor. The editor thoroughly checks the metadata and data for validity and plausibility. The editor will ask the author for more information if the metadata or data is not conclusive or complete, or if there are open questions about the submission. Please note: If the data and metadata do not meet PANGAEA's quality standards or the submitting author does not react to requests by our editors, the submission may also be rejected at this stage.
 * 5) Processing/Data import - Data and metadata are prepared for import into the relational system, where possible, or for file archiving on our servers. For this purpose, the metadata and data are structurally harmonized and aligned with standardized terminologies. Submitted data may be reformatted by the editor to conform to the PANGAEA Data model. This step may involve transposing, merging or splitting tables, adding metadata columns (e.g. official event labels and 3rd Geocode), etc. After import, the editor performs a final check of the data set.
 * 6) Dataset proof - The editor sends the link to the dataset landing page to the author(s) and requests a proofread. The DOI is assigned, but not yet registered ("activated"). The data set status is set to "in review" and data remains password protected (Option No 1 of table above) at this stage. Respective metadata are always open access (License CC0).
 * 7) Corrections - Through an iterative process between author and editor, the data set is edited until the final approval by the author.
 * 8) Publication - The dataset status is set to "published"; the DOI will be activated 4 weeks after the final editing and is then part of the official dataset citation. At the author's request, password protection may be maintained (or set up) for a period of up to 2 years or until the publication of the corresponding manuscript. A temporary access link with an expiry date can be granted at the request of the author, e.g. to share the data with co-authors or anonymous reviewers.

Costs
The basic operation is covered by public funding, but in order to ensure a high quality in processing and archiving new data, PANGAEA receives additional funds. In case that data are submitted as part of a project for which funding is available for publication, PANGAEA would appreciate a financial contribution of 500.– € (net) for a data submission (e.g. as part of the costs for Open Access publications at the DFG). Other forms of funded collaborations can be negotiated. Please contact us for further information and invoicing.

=III. Data Submission and Formatting Guides=

The formatting guidelines describe how to prepare your metadata and data for submission to PANGAEA. We recommend that you read these guidelines before submitting your data, and that you familiarize yourself with the PANGAEA publication style by reading about the scope of PANGAEA and by searching for and viewing datasets typical for your research field.

Furthermore, please be aware that with your registration to PANGAEA and submitting data to PANGAEA you have accepted our Terms of Use.

PANGAEA is an international data publisher, accordingly we accept data submissions (including all data and metadata as well as any supplementary information) in English only. All resulting publications as well as our communication with data authors are also in English.

PANGAEA datasets should be self-explanatory, i.e. a potential user of the data should be able to judge the quality and suitability for re-use. Therefore, complete metadata should be available, describing the dataset comprehensively and according to the FAIR principles.

For more guidance on the appropriate preparation of metadata and data please see below, and our Video Tutorials. We also offer community workshops twice a year to support our users. The one in winter (usually held in November) is dedicated to topics around data submission; the edition in summer focuses more on data search and (including automated) access for re-use of PANGAEA publications. If you are interested, please subscribe to our training mailing list here.

Prepare your data and metadata for submission - a step by step guide through our submission form
All data must be submitted using our Submission online tool. Any other means of data transmission will not be processed or passed on. For questions or comments concerning your data submission, please either use the corresponding comment field of the online submission form (stage 5) or our contact form. You can also leave a comment in the submission ticket that is created automatically when you have finished the form.

Stage 1 - Basic information

 * Title: Provide a dataset title that briefly describes what was measured and where. The title must be independent of the title of the manuscript/paper.
 * Authors: List all authors of the dataset. Use full names, not initials. Author names are case-sensitive; do not use all capital letters for surnames (example: Roe, Jane). Please provide the correct e-mail address for each author, without duplicates. If an author has no e-mail address, no-reply@pangaea.de can be entered. Fill in the affiliation field (using full names, no abbreviations, ideally compliant with the Research Organization Registry (ROR)).
 * Keywords: Provide suitable keywords here.
 * Abstract: Add a dataset abstract that is independent of the manuscript/paper abstract. The abstract should provide a concise and method-oriented description of the observation or measurement, i.e. what, when, where, why and how the data was collected. The summary should consist of meaningful running text. The format of the dataset abstract is the same as for paper abstracts. We expect more than two sentences, and ideally the length should be limited to 5000 characters. Avoid including interpretations of the data. For further information please refer to the documentation on data abstracts for PANGAEA.
 * License: Choose the appropriate license for your dataset. We recommend the CC-BY 4.0 license option. Please refer to our respective Wiki article to understand why.

Stage 2 - References

 * References: Add any relevant references here as full citations (not limited to a DOI), including the manuscript(s) to which the data belong(s). Include any additional references mentioned in the data, methods or abstract. Add SOPs, processing or calibration reports, AWI Registry handles/links, or any other complementary documentation, if available.

Stage 3 - Projects and Grants

 * Projects: Provide names and references to related projects and awards. Please add the Crossref Funder ID (download the full list here: https://doi.crossref.org/funderNames?mode=list), if available, to the field “Project website”.

Stage 4 - Upload

 * Upload: Upload your data files here. Please see below how to prepare your data files.
 * More than 20 Files? -> Please tick the "Request upload link" checkbox. You will receive an upload link within one to three days. Submissions including more than 20 files will be rejected without further notice. Please replace whitespace in file names before uploading them.
 * Files larger than 100 MB? -> Please tick the "Request upload link" checkbox. Individual files must be less than 15 GB in size; however, several files can be uploaded simultaneously via the uploader. Please replace whitespace in file names before uploading them.
 * File description: You can describe your data files here. If your submission consists of more than one data table or dataset, please provide a title and abstract for each of them.
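
The upload limits listed for Stage 4 can be checked before starting the form. The following is a minimal sketch, not an official PANGAEA tool: the function name `check_upload` is hypothetical, and reading 100 MB and 15 GB as decimal units is an assumption.

```python
MAX_DIRECT_FILES = 20            # more files than this require an upload link
UPLOAD_LINK_BYTES = 100 * 10**6  # files above 100 MB require an upload link (decimal MB assumed)
MAX_FILE_BYTES = 15 * 10**9      # individual files must stay below 15 GB (decimal GB assumed)


def check_upload(files):
    """Check a planned upload; `files` maps file name -> size in bytes.

    Returns (needs_upload_link, problems), where problems is a list of messages.
    """
    problems = []
    needs_link = len(files) > MAX_DIRECT_FILES
    for name, size in files.items():
        if any(ch.isspace() for ch in name):
            suggestion = "_".join(name.split())
            problems.append(f"{name!r}: replace whitespace, e.g. with {suggestion!r}")
        if size >= MAX_FILE_BYTES:
            problems.append(f"{name!r}: file must be smaller than 15 GB")
        elif size > UPLOAD_LINK_BYTES:
            needs_link = True
    return needs_link, problems


# Invented example: one small file and one 200 MB file with a space in its name
needs_link, problems = check_upload(
    {"ctd_profile.txt": 5_000, "my data.txt": 200 * 10**6}
)
```

In this invented example `needs_link` is `True` because of the 200 MB file, and `problems` contains one message about the whitespace in `my data.txt`.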

Stage 5 - Submit

 * Comment: Field for any request/comment for the PANGAEA editors
 * Moratorium: Check this box if you need a moratorium, and choose the end date. If no date is chosen, the default is 6 months.
 * Terms of Use: please read our ToU and accept them.

Changes to submissions after submitting them via our online form
If you need to change or add metadata after submitting, please do so exclusively via the (blue) "Edit Metadata" button in the respective submission ticket that is automatically sent to you after you have completed the form. Please note that for technical reasons direct edits in the description field of our (JIRA) ticket system are invalid and cannot be accepted. This is especially relevant for abstracts. Abstracts submitted as data files cannot be considered.

Requirements concerning data and their metadata
PANGAEA publishes data from earth system research in diverse formats. Tabular data are, however, the main focus of PANGAEA and should be prepared in TAB-delimited text files (UTF-8 encoding) or (open) spreadsheet file formats (e.g. MS Excel .xlsx). Please take a look at our best practice manuals and templates, which demonstrate our requirements in terms of relevant metadata and the structure of submitted data tables.

Data-Metadata
Data tables and data files are provided with metadata about the sampling/measuring stations or equipment, and the parameters/variables measured. The following list of meta-information is required for each data submission to PANGAEA.


 * Campaign: Were the samples or measurements relevant to your data taken during campaigns, expeditions, field trips or cruises? We subsume these under the label "Campaign", which is best described by the meta-information listed below. We recommend using the sheet "Campaign" in our templates to provide the following required information, where applicable:
 * Campaign_Label, e.g. the respective cruise number
 * Basis, e.g. a name of a ship, station, airplane etc. Please leave the field empty, if no basis can be provided.
 * Begin Date(/Time) in ISO-format YYYY-MM-DDThh:mm:ss and in UTC
 * End Date(/Time) in ISO-format YYYY-MM-DDThh:mm:ss, UTC
 * Responsible scientist
 * For ship expeditions start and end harbor
 * For data from expeditions with German research vessels, please refer to the cruise inventory and report information in compliance with this list.
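
The begin and end timestamps of a campaign are often recorded in local time. As a minimal sketch, the standard library can convert such a value to the required `YYYY-MM-DDThh:mm:ss` notation in UTC; the helper name `to_iso_utc` and the example date are invented for illustration.

```python
from datetime import datetime, timedelta, timezone


def to_iso_utc(local_dt):
    """Convert a timezone-aware datetime to 'YYYY-MM-DDThh:mm:ss' in UTC."""
    if local_dt.tzinfo is None:
        # Without a known offset the UTC time cannot be derived reliably
        raise ValueError("datetime must be timezone-aware to convert it to UTC")
    return local_dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")


# Invented example: campaign start recorded in local time at UTC+2
begin_local = datetime(2021, 7, 14, 8, 30, 0, tzinfo=timezone(timedelta(hours=2)))
begin_utc = to_iso_utc(begin_local)
print(begin_utc)  # 2021-07-14T06:30:00
```

The function deliberately rejects datetimes without timezone information instead of guessing an offset.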


 * Event: An event refers to the sampling or measurement site or position for field observations, or the sampling position of organisms or mediums like water used for experiments. Please refer to the Event documentation for more details. Please use the sheet "Event" in our templates whenever possible. Information that should be provided includes:
 * Event_Label - refers to a representative short name or label for the station or locality of a sampling event. For data from expeditions with German research vessels please use the official event labels and station lists provided in the cruise inventory.
 * Latitude and Longitude - considered mandatory event metadata; both must be specified in decimal degrees and compliant with WGS84 (positive for north, negative for south). Please specify start and end positions for profiles.
 * Elevation - the “3rd geocode”. Please specify start and end elevations for profiles.
 * Date/Time of sampling/measurement provided in ISO-format (YYYY-MM-DDThh:mm:ss) and in UTC. A column with local Date/Time may be provided additionally. Please specify start and end Date/Time for profiles and time series.
 * Device or method used for sampling or the measurement
 * Campaign, see above
 * Any other event related information, e.g. mesh size of net devices, core length of sediment and ice cores, International Generic Sample Number (IGSN).
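
Positions noted in degrees, minutes and seconds need to be converted to signed decimal degrees for the event metadata. The sketch below shows one way to do this; the function name `dms_to_decimal` and the example coordinates are invented, and the sign for longitude (positive east, negative west) follows the common convention, which is an assumption here because the text above only states the sign for latitude.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds plus a hemisphere letter to signed decimal degrees."""
    hemisphere = hemisphere.upper()
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError(f"unknown hemisphere: {hemisphere!r}")
    value = degrees + minutes / 60 + seconds / 3600
    # Latitude is limited to 90 degrees, longitude to 180 degrees
    limit = 90 if hemisphere in ("N", "S") else 180
    if not 0 <= value <= limit:
        raise ValueError(f"value {value} out of range for hemisphere {hemisphere}")
    # South and west are negative (west by common convention, assumed here)
    return -value if hemisphere in ("S", "W") else value


# Invented example positions
latitude = dms_to_decimal(70, 30, 0, "S")    # -70.5
longitude = dms_to_decimal(8, 15, 0, "W")    # -8.25
```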


 * Parameter: At PANGAEA, the measurement variables are referred to as parameters. Entries for parameters always require the full parameter name and its unit, if available. You may look them up in the complete list of existing parameters here. Please note that parameters might also include the medium (e.g. “Temperature, air” or “Temperature, water”) or other details for disambiguation purposes. Preferably, use the sheet "Parameter" in our template files to report them. Information that should be provided:
 * Parameter names of the measured or determined entities given in full, not abbreviated.
 * Unit (SI units are preferred)
 * Add the Principal Investigator (PI) for the measured parameters. The principal investigator is the person responsible for the acquisition and the scientific quality of a data series.
 * Where applicable, provide the primary instrument used to measure each specific variable/parameter, preferably in the following standardized format: "Instrument type, Manufacturer, Model name". If you did not use any instrument, please provide the methodology used instead, preferably in the following (also standardized) format: "Method type according to Reference et al. (YYYY)". Further details on how to provide measurement device or method information can be found in the respective documentation.

How to prepare tabular Data:
This section summarizes formal and structural requirements concerning data in tabular format. Following these closely substantially reduces the most time-intensive aspect of our editorial work and, thus, supports our efforts to reduce general processing times for submissions to PANGAEA significantly. Notable deviations from these requirements will thus likely result in the rejection of the submission.

Structure of tabular data:
 * Submit your tables as TAB-delimited text files (UTF-8 encoding), or as (open) spreadsheet file formats (e.g. MS Excel .xlsx).
 * The first column should always include the event label, followed by columns with the 3rd geocode (e.g. height/depth) and/or sample IDs and sample information. These are followed by columns including the variables measured (parameters). Each value of a row should refer to the event and the 3rd geocode specified in columns 1 and 2.
 * The first row is reserved for the column header that should contain the full parameter names and units in square brackets.
 * Several tables with different structures should always be provided as different data files (or spreadsheets).
 * Please refer to the sheet "Data" in our template files or use the most suitable one of them to report your data right away.
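
The structure described in this list can be produced directly from a script. The following is a minimal sketch using only the Python standard library; the event label, parameter names and values are invented for illustration and do not come from a PANGAEA template.

```python
import csv
import os
import tempfile

# Column 1: event label, column 2: 3rd geocode, then parameters with units in brackets
header = ["Event", "Depth, water [m]", "Temperature, water [°C]", "Salinity"]
rows = [
    ["PS000_1-1", 10, 4.25, 34.91],
    ["PS000_1-1", 20, 3.98, 34.95],
    ["PS000_1-1", 30, "", 34.97],  # missing value: empty cell, no placeholder
]


def write_tab_delimited(path, header, rows):
    """Write a table as a TAB-delimited text file in UTF-8 encoding."""
    with open(path, "w", encoding="utf-8", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows)


# Write to a temporary directory and read the file back to inspect it
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example_table.txt")
    write_tab_delimited(path, header, rows)
    with open(path, encoding="utf-8") as fh:
        lines = fh.read().splitlines()

print(lines[0])
```

Passing `encoding="utf-8"` explicitly avoids depending on the platform default, which matters for characters such as the degree sign in the unit.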

Dos:
 * All parameters/variables must be written out and given with their unit. Units, preferably SI compliant, should be given in square brackets.
 * Write out species names in biological studies, and do not abbreviate the genus name. Spell-check all taxonomic terms, e.g. by using the taxon match tools by the World Register of Marine Species or equivalent taxonomy data provider.
 * Use only the English language for parameters and any text in the data table.
 * The number format in PANGAEA requires a dot as a decimal separator and does not accept a thousands separator.
 * Decimal places should be chosen in a scientifically meaningful way. Do not specify an unnecessary and unrealistic number of decimals. Please note that the number of decimal places represents the precision of your measurement.
 * No special characters are allowed for numeric entries, except for PANGAEA Quality Flags.
 * Missing values are indicated with an empty cell, and NOT with placeholder characters such as '-', 'n/a', 'NaN', -9999 or '*' etc.
 * Measurement values below the detection limit are marked with "<detection limit value".
 * Provide only one parameter/variable per column. Multiple values separated by '-', '±', '()' (ranges, values with errors, uncertainties, or alternative values in brackets) within a single cell will not be accepted.
 * Abbreviations in the data tables must be explained, for example in a separate comment column or in the file description.
 * Remove empty lines and columns if present.
 * Remove whitespace from file names before uploading them to PANGAEA.
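
Several of these rules can be pre-checked automatically before submission. The sketch below inspects a single numeric cell; it is an illustration under stated assumptions, not the check the PANGAEA editors apply. The function name `check_numeric_cell` is hypothetical, only plain decimal notation is recognized, and apart from a leading `<` for values below the detection limit, PANGAEA Quality Flags are not handled.

```python
import re

# Placeholders that should be replaced by an empty cell (compared in lower case)
PLACEHOLDERS = {"-", "n/a", "na", "nan", "-9999", "*"}
# Plain number with dot as decimal separator; optional '<' marks a detection limit
NUMBER = re.compile(r"^<?-?\d+(\.\d+)?$")


def check_numeric_cell(cell):
    """Return a problem description for a numeric cell, or None if it looks fine."""
    text = cell.strip()
    if text == "":
        return None  # an empty cell is the accepted way to mark a missing value
    if text.lower() in PLACEHOLDERS:
        return "placeholder for missing value; leave the cell empty"
    if NUMBER.match(text):
        return None
    if re.match(r"^-?\d+,\d+$", text):
        return "comma found; use a dot as decimal separator and no thousands separator"
    if "±" in text or "(" in text or re.search(r"\d\s*-\s*\d", text):
        return "more than one value in a cell; split into separate columns"
    return "not a plain number"


# Invented example cells
for cell in ["3.14", "<0.05", "", "NaN", "3,14", "1.2-3.4", "12 ± 0.5"]:
    print(repr(cell), "->", check_numeric_cell(cell))
```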

Don'ts:
 * Do not use any macros, active formulas or references to other files in spreadsheets
 * Do not use any specific formatting design, color coding, or line breaks in spreadsheet cells
 * Do not merge cells in spreadsheets
 * Do not use any note or comment features in spreadsheet files
 * Do not include graphs in spreadsheet files
 * Do not fill cells of missing values with placeholders such as '-', 'n/a', 'NaN', -9999, '*', etc.
 * Do not give more than one value per cell
 * Do not combine several independent tables in one sheet

Data Preparation - Binary files:

 * One data file: Please provide a brief description of what is contained in the data file (Submission: 4. Upload)
 * More than one data file: Please add a table, as a TAB-delimited text file (UTF-8 encoding) or in Excel format, with the filename, a short description of each file (< 255 characters including spaces) and Geocodes.
 * Use filenames without spaces
 * Use the upload-link function when you have more than 20 files, or files larger than 100 MB, see Submission: 4. Upload.
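
For submissions with several binary files, the accompanying table can be generated from a script. This is a minimal sketch: the function name `build_file_index`, the column names and the example entry are invented, and which geocodes apply (here latitude, longitude and Date/Time) depends on your data.

```python
import csv
import io

MAX_DESCRIPTION = 255  # descriptions must be shorter than this, spaces included


def build_file_index(entries):
    """Build a TAB-delimited index table (as text) for a list of file entries.

    Each entry is a dict with the keys: file, description, latitude,
    longitude and date_time (column layout is an assumption).
    """
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["File name", "Description", "Latitude", "Longitude", "Date/Time"])
    for entry in entries:
        if any(ch.isspace() for ch in entry["file"]):
            raise ValueError(f"file name contains whitespace: {entry['file']!r}")
        if len(entry["description"]) >= MAX_DESCRIPTION:
            raise ValueError(f"description too long for {entry['file']!r}")
        writer.writerow([
            entry["file"],
            entry["description"],
            entry["latitude"],
            entry["longitude"],
            entry["date_time"],
        ])
    return out.getvalue()


# Invented example entry
index = build_file_index([
    {
        "file": "img_0001.jpg",
        "description": "Seafloor photo, station 1",
        "latitude": -70.5,
        "longitude": -8.25,
        "date_time": "2021-02-03T10:15:00",
    },
])
print(index)
```

The returned text can be saved with `open(path, "w", encoding="utf-8", newline="")` to obtain a UTF-8 encoded file.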

______________________________________________________________________________________________

Resources

 * Karl W. Broman & Kara H. Woo (2018) Data Organization in Spreadsheets, The American Statistician, 72:1, 2-10.


 * Elizabeth T. Borer, Eric W. Seabloom, Matthew B. Jones & Mark Schildhauer (2009) Some Simple Guidelines for Effective Data Management, The Bulletin of the Ecological Society of America, 90: 205-214.


 * Wilson G, Bryan J, Cranston K, Kitzes J, Nederbragt L, et al. (2017) Good enough practices in scientific computing. PLOS Computational Biology 13(6): e1005510.

https://nceas.github.io/datateam-training/reference/

https://nceas.github.io/datateam-training/training/

Submission templates
Good examples! http://www.earthchem.org/data/templates

Examples of data publications
For more information on submissions of frequent types of data see best practice manuals and templates.

The examples below may give a first impression of which information is required for specific scientific fields. The export formats may differ slightly. Please keep in mind that the export format is dynamically produced by the relational database behind PANGAEA. It is thus NOT required to provide the data submission in the exact same technical format; the content is the important part of the data submission.
 * Moorings with trap/current meter
 * Vertical oceanographic profile
 * Horizontal profile/ships track
 * Horizontal distribution of irregularly distributed samples
 * Vertical profile
 * Bulk sediment parameter
 * Core logging, Physical properties
 * Hole logging
 * Mineralogy
 * Grain size
 * Pollen
 * Geochemistry
 * Porewater
 * XRF
 * Horizontal profile
 * Ships track data in general
 * Geophysical profile
 * Reflection seismic
 * Refraction seismic
 * Magnetic
 * Gravimetry
 * Profile versus relative distance
 * Speleothem
 * Coral
 * Time series
 * Radiation
 * Biological measurements
 * Binary object (data files in various binary formats)
 * photos, images, graphics
 * seismic profiles in sgy-format
 * models
 * Maps
 * Experiments