Best practice of data citation
PANGAEA publishes data in much the same way as scientific journals publish articles, so published data sets are cited in a similar manner. A data citation should contain:
- the authors (creators)
- the publication year
- the dataset title
- the publisher
- a unique persistent identifier (e.g. a DOI)
The full data citation of each referenced data set should be included in the reference list of any publication citing the data. For the general structure, we follow the DataCite recommendations:
|Creator (PublicationYear): Title. Publisher (PANGAEA). Identifier (DOI)
On the landing page of each data set, the suggested citation is displayed at the top, for example:
|Timofeeva, Anna; Smolyanitsky, Vasily; Bessonov, Vladimir; Petrovskiy, Tomash (2020): Special sea ice observations aboard Akademik Fedorov MOSAiC leg 1, 2019-09-25 to 2019-10-20. PANGAEA, https://doi.org/10.1594/PANGAEA.912021
The citation can be copied or exported in the preferred format using the copy or export buttons below the title. Further buttons enable sharing the reference via social media.
If the data publication is not related to a journal article, the institution can be included as the source of the data. It then appears in the suggested citation:
|Creator (PublicationYear): Title. Institution. Publisher (PANGAEA). Identifier (DOI)
For example:
|Burkhardt, Elke (2020): Whale sightings during Polarstern cruise PS95.1 (ANT-XXXI/1.1). Alfred Wegener Institute, Helmholtz Centre for Polar and Marine Research, Bremerhaven, PANGAEA, https://doi.org/10.1594/PANGAEA.92405
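The citation structure above can be sketched as a small helper. This is a minimal illustration, not PANGAEA's own implementation: the function name `format_data_citation` and its parameters are hypothetical, and the separators (creators joined with "; ", a comma between institution, publisher and identifier) are assumptions taken from the example citations shown above.

```python
from typing import Optional, Sequence


def format_data_citation(
    creators: Sequence[str],
    year: int,
    title: str,
    doi: str,
    institution: Optional[str] = None,
    publisher: str = "PANGAEA",
) -> str:
    """Build 'Creator (PublicationYear): Title. [Institution,] Publisher, Identifier'."""
    if not creators:
        raise ValueError("at least one creator is required")
    # Creators are joined with "; ", as in the example citations above.
    head = f"{'; '.join(creators)} ({year}): {title.rstrip('.')}."
    # The institution is optional and, if given, precedes the publisher.
    source = f"{institution}, {publisher}" if institution else publisher
    # The identifier is written as a resolvable DOI link.
    return f"{head} {source}, https://doi.org/{doi}"


citation = format_data_citation(
    creators=[
        "Timofeeva, Anna",
        "Smolyanitsky, Vasily",
        "Bessonov, Vladimir",
        "Petrovskiy, Tomash",
    ],
    year=2020,
    title=(
        "Special sea ice observations aboard Akademik Fedorov "
        "MOSAiC leg 1, 2019-09-25 to 2019-10-20"
    ),
    doi="10.1594/PANGAEA.912021",
)
print(citation)
```

In practice, copying the suggested citation from the landing page remains the safer route; a helper like this is mainly useful for checking that all five required elements are present.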
Where to refer to the data?
As stated above, the full data citation should be contained in the reference list of any work citing the data (Stall et al., 2023).
But where in the text is the right position to refer to the data? Obviously it depends on the context, for example on whether the data is original or reused. Generally, the data can be cited in the methods or results section, or in the data availability section if offered by the journal.
For the latter, one suggestion is an in-text citation such as:
"Data for this study were published open access (Authors, YYYY)", followed by the corresponding entry (full citation) of the dataset in the list of references.
Why is the correct citation of datasets important?
First of all, citing sources is good scientific practice. It gives credit to your work and the work of others, and increases reproducibility of findings and thus trust in your research (see also DataCite: why to cite data).
Secondly, with the growing importance of open data, citing data sets is a key part of the reward system the community is aiming for. Several platforms, such as DataCite, are starting to provide metrics. However, citations can only be counted if data sets are referenced correctly.
Data sets "in review"
During the archiving and review process, or when a moratorium is set on the data (for example due to the publication status of a connected manuscript), the data is kept in the status "in review". Datasets "in review" are usually accessible only to the contributing authors after logging in to PANGAEA. However, the metadata, including authors, title, references and parameters, are already findable and visible in PANGAEA. At this stage, the data is displayed on the website with a preliminary link instead of a registered, persistent DOI. This preliminary link can be recognized by the following format:
https://doi.pangaea.de/10.1594/PANGAEA.XXXXXX (XXXXXX = DataSetID)
It can only be resolved by the PANGAEA DOI resolver. Once the review process is finished, the DOI will be registered and take the form of
https://doi.org/10.1594/PANGAEA.XXXXXX (XXXXXX = DataSetID)
It is important to note that datasets "in review" might be modified or even deleted during the review process. Only the second form guarantees persistent access and reference to the data. Citation of any data with the status "in review" should be avoided.
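Before adding a data set to a reference list, the two link formats above can be told apart with a simple check. The sketch below is only an illustration: the function name `classify_pangaea_link` is hypothetical, treating the DataSetID as a run of digits is an assumption, and the check looks at the link format alone without querying PANGAEA.

```python
import re

# Patterns follow the two link formats quoted above.
# Assumption: the DataSetID consists of digits only.
_PRELIMINARY = re.compile(r"^https://doi\.pangaea\.de/10\.1594/PANGAEA\.(\d+)$")
_REGISTERED = re.compile(r"^https://doi\.org/10\.1594/PANGAEA\.(\d+)$")


def classify_pangaea_link(url: str) -> str:
    """Return 'registered', 'preliminary' or 'unknown' based on the link format."""
    url = url.strip()
    if _REGISTERED.match(url):
        return "registered"
    if _PRELIMINARY.match(url):
        return "preliminary"
    return "unknown"


for link in (
    "https://doi.org/10.1594/PANGAEA.912021",
    "https://doi.pangaea.de/10.1594/PANGAEA.912021",
):
    print(link, "->", classify_pangaea_link(link))
```

A link classified as "preliminary" is a hint to wait for the registered DOI before citing.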
Publication of data in PANGAEA
After technical review by the editor, import, and approval by the author/PI, a dataset is set to the status "published" and appears as citable on the Internet. Upon publication of a data set, the DOI registration is initiated. This process is finalized after 28 days. During this time, the data set can still be modified. Once the DOI registration is finalized, however, the data set can no longer be changed. Any later change to the dataset would be analogous to an erratum of a journal article.
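The 28-day window can be worked out with simple date arithmetic. The following is a rough sketch under stated assumptions: the function names are hypothetical, the window is counted in calendar days from the publication date, and the finalization day itself is treated as no longer modifiable. PANGAEA's exact cut-off may differ.

```python
from datetime import date, timedelta

# Length of the DOI registration period stated above.
REGISTRATION_PERIOD = timedelta(days=28)


def doi_finalization_date(published_on: date) -> date:
    """Date on which the DOI registration is expected to be finalized."""
    return published_on + REGISTRATION_PERIOD


def is_still_modifiable(published_on: date, today: date) -> bool:
    """True while 'today' falls inside the registration period (assumed half-open)."""
    return published_on <= today < doi_finalization_date(published_on)


published = date(2020, 3, 1)
print(doi_finalization_date(published))  # 2020-03-29
```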
Small adjustments, such as the correction of minor mistakes or typos, can still occur and are displayed as the metadata "Change history", both on the data set landing page (below the parameter overview) and in the downloaded data set. For example:
|Change history: 2020-03-25T13:34:53 – Parameter Ice thickness [m] exchanged with Parameter Thickness of ice accretion [cm], no recalculation of values necessary
For further details, see the Author Preparation section of Stall et al. (2023). It covers dataset and software citation in research articles, how to structure these citations, how to select the best possible scientific repositories for data and software, and what information to put in an Availability Statement.
- Stall, S., Bilder, G., Cannon, M. et al. Journal Production Guidance for Software and Data Citations. Sci Data 10, 656 (2023). doi:10.1038/s41597-023-02491-7