Citation

From PANGAEA Wiki

Why is the correct citation of datasets important?

Citing datasets is fundamental to good scientific practice (Stall et al., 2023). It gives proper credit to your own work as well as to the work of others, while also increasing the reproducibility of results and, in turn, trust in research (see also DataCite: Why cite data?). Proper citation also ensures that laboratory and support staff receive due recognition for their contributions to the measurements, even if they were not directly involved in the manuscripts derived from the data.

With the growing importance of open data, dataset citation has also become a central element of the scientific reward system. Citation metrics, increasingly provided by platforms such as DataCite, only function when datasets are cited correctly. By listing datasets in the reference section, you help ensure they are indexed, discoverable, and counted toward scholarly metrics.

Best practice for citing data

Figure: The citation tools (copy and export citation) are located below the dataset reference.

PANGAEA publishes datasets in much the same way that scientific journals publish articles. For this reason, PANGAEA datasets should be cited formally and consistently, just like other publications such as research articles. Citations should always appear in the reference list of any publication that uses the data.

PANGAEA’s citation practices follow international recommendations and guidelines (Stall et al., 2023), reflecting the growing recognition of data as a fundamental resource for scientific research.

On the landing page of each dataset, the suggested citation is displayed at the top. The citation can be copied or exported in the preferred format using the copy or export buttons below the title. Further buttons enable sharing the reference via social media.

Please note that citation of datasets "in review" should be avoided (see further information below).

In addition to citing individual datasets, it is strongly recommended to acknowledge PANGAEA as the data publisher, for instance in the methods or data availability section. The following publication, authored by the PANGAEA team, describes the repository, its data archiving workflow, and its infrastructure:

Felden, Janine; Möller, Lars; Schindler, Uwe; Huber, Robert; Schumacher, Stefanie; Koppe, Roland; Diepenbroek, Michael; Glöckner, Frank Oliver (2023): PANGAEA - Data Publisher for Earth & Environmental Science. Scientific Data, 10, 347. https://doi.org/10.1038/s41597-023-02269-x

Citing PANGAEA is optional but highly appreciated, as it supports the maintenance and development of the service.

Where and how to cite datasets

1. Full citations in the reference list

The reference list of a publication should include full citations for all datasets used. This enables automated attribution and credit through Crossref’s Event Data.

a) Standalone datasets

Standalone datasets are self-contained data publications that are not part of a larger collection.

Example citations:

Timofeeva, Anna; Smolyanitsky, Vasily M; Bessonov, Vladimir; Petrovskiy, Tomash (2020): Special sea ice observations aboard Akademik Fedorov MOSAiC leg 1, 2019-09-25 to 2019-10-20 [dataset]. PANGAEA, https://doi.org/10.1594/PANGAEA.912021

Bauch, Dorothea; Meyer, Hanno; Damm, Ellen; D'Angelo, Alessandra; Mellat, Moein; Granskog, Mats A; Weiner, Mikaela; Marent, Andreas (2024): Stable water isotopes of sea ice at biogeochemistry sites (BGC) and Main Coring Sites (MCS) during MOSAiC expedition, leg 1 [dataset]. PANGAEA, https://doi.org/10.1594/PANGAEA.971330

b) Bundled publications and publication series (Dependent Collections)

Some datasets are part of Dependent Collections, which distinguish between Parent and Child datasets:

  • A Parent provides the overarching collection and context but contains no data.
  • Child datasets are the individual, data-bearing datasets within a Parent.

Cite each Child dataset individually to improve visibility and ensure correct attribution. A Child citation always includes a reference to its Parent, much as a book chapter is cited within a book, so citing the Parent separately as well is redundant.

Example citations:

Child:

Zabel, Matthias (2022): Pore water analyses of sediment core GeoB16426-1 [dataset]. PANGAEA, https://doi.org/10.1594/PANGAEA.947262, In: Zabel, M (2022): Pore water and solid phase data from deep-sea trench sediments [dataset bundled publication]. PANGAEA, https://doi.org/10.1594/PANGAEA.947269

Parent:

Zabel, Matthias (2022): Pore water and solid phase data from deep-sea trench sediments [dataset bundled publication]. PANGAEA, https://doi.org/10.1594/PANGAEA.947269
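The way a Child citation embeds its Parent can be sketched in a few lines of Python. This is only an illustration of the pattern shown in the examples above, not a PANGAEA tool: the function names `cite` and `cite_child` are hypothetical, and the abbreviated creator name in the "In:" part is simply passed in by the caller, as it appears in the example.

```python
def cite(creators: str, year: int, title: str, kind: str, doi: str,
         publisher: str = "PANGAEA") -> str:
    """Render one citation as: Creator (Year): Title [type]. Publisher, DOI link."""
    return f"{creators} ({year}): {title} [{kind}]. {publisher}, https://doi.org/{doi}"


def cite_child(child: str, parent: str) -> str:
    """Append the Parent citation to the Child citation, like a chapter in a book."""
    return f"{child}, In: {parent}"


# Values copied from the example citations above.
parent = cite("Zabel, M", 2022,
              "Pore water and solid phase data from deep-sea trench sediments",
              "dataset bundled publication", "10.1594/PANGAEA.947269")
child = cite("Zabel, Matthias", 2022,
             "Pore water analyses of sediment core GeoB16426-1",
             "dataset", "10.1594/PANGAEA.947262")

child_citation = cite_child(child, parent)
print(child_citation)
```

In practice the ready-made citation on the dataset landing page should be preferred; the sketch only makes the structure explicit.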

c) Editorial Publications and Bibliographies (Independent Collections)

Independent Collections group datasets that can be used and cited independently. Individual datasets may belong to multiple collections.

Example citations:

Collection:

Eisen, Olaf; Steinhage, Daniel; Franke, Steven; Helm, Veit; Binder, Tobias; Drews, Reinhard; Eagles, Graeme; Humbert, Angelika; Jansen, Daniela; Jokat, Wilfried; Lambrecht, Astrid; Mieth, Matthias; Riedel, Sven; Miller, Heinrich (2024): Collection of datasets from AWI's radio-echo sounding systems on ice sheets and glaciers [dataset bibliography]. PANGAEA, https://doi.org/10.1594/PANGAEA.972094

Independent dataset:

Eagles, Graeme; Ruppel, Antonia; Läufer, Andreas; Steinhage, Daniel; Helm, Veit (2025): ANT 2015/16: AWI airborne Radio-Echo Sounding data western DML over the Maud Belt and Ekström Ice Shelf (GEA-V-FMA project) [dataset]. PANGAEA, https://doi.org/10.1594/PANGAEA.987347

2. In-text citations (Author-year format)

a) Main part of a publication (e.g., methods or results sections)

In the body of a publication, datasets should be cited in the author-year format (Authors, YYYY), just like journal articles and other publications, and must be accompanied by a full entry in the reference list.

Example:

This study makes use of observational data from MOSAiC leg 1 (Timofeeva et al., 2020; Bauch et al., 2024). Both datasets are published via PANGAEA - Data Publisher for Earth & Environmental Science (see Felden et al., 2023, for a description of the repository).

b) Data availability statements

A data availability statement is written for the reader and clearly states where the supporting datasets and any relevant software are located, as well as how they can be accessed. In accordance with the recommendations in Stall et al. (2023) and the AGU Availability and Citation Checklist for Authors, the statement should include an in-text citation in author-year format (Authors, YYYY), along with a full reference list entry and key information about the datasets. Authors should provide a brief description of the data, the repository name, persistent identifiers (DOIs), and licensing or access conditions.

Example:

The data supporting this study include observational sea ice records (Timofeeva et al., 2020; https://doi.org/10.1594/PANGAEA.912021) and stable water isotope measurements of sea ice (Bauch et al., 2024; https://doi.org/10.1594/PANGAEA.971330). Both datasets are openly available through PANGAEA - Data Publisher for Earth & Environmental Science (Felden et al., 2023; https://doi.org/10.1038/s41597-023-02269-x). Data are licensed under CC-BY and accessible without restrictions.

Structure of a dataset citation

Required elements

A complete dataset citation should contain the following elements:

  • Authors (creators)
  • Year of publication
  • Title of dataset
  • Type of dataset publication (e.g., "Dataset", "Publication series", "Bundled publication", "Editorial publication" or "Dataset bibliography")
  • Publisher (e.g., PANGAEA)
  • Persistent identifier (DOI)

General Citation Format

Creator (PublicationYear): Title [type]. Publisher, Identifier

Example:

Timofeeva, Anna; Smolyanitsky, Vasily; Bessonov, Vladimir; Petrovskiy, Tomash (2020): Special sea ice observations aboard Akademik Fedorov MOSAiC leg 1, 2019-09-25 to 2019-10-20 [dataset]. PANGAEA, https://doi.org/10.1594/PANGAEA.912021
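As a minimal sketch, the general format above can be written as a small Python data class that holds the required elements and renders them in order. The class name `DatasetCitation` and its field names are hypothetical and chosen for illustration only; they are not part of any PANGAEA or DataCite interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetCitation:
    """Required elements of a dataset citation (hypothetical container)."""
    creators: tuple[str, ...]  # one "Family, Given" string per creator
    year: int                  # year of publication
    title: str
    kind: str                  # type of publication, e.g. "dataset"
    publisher: str
    doi: str                   # bare DOI without the resolver prefix

    def format(self) -> str:
        """Render as: Creator (PublicationYear): Title [type]. Publisher, Identifier"""
        if not self.creators:
            raise ValueError("at least one creator is required")
        creators = "; ".join(self.creators)
        return (f"{creators} ({self.year}): {self.title} [{self.kind}]. "
                f"{self.publisher}, https://doi.org/{self.doi}")


# Values copied from the example citation above.
citation = DatasetCitation(
    creators=("Timofeeva, Anna", "Smolyanitsky, Vasily",
              "Bessonov, Vladimir", "Petrovskiy, Tomash"),
    year=2020,
    title=("Special sea ice observations aboard Akademik Fedorov "
           "MOSAiC leg 1, 2019-09-25 to 2019-10-20"),
    kind="dataset",
    publisher="PANGAEA",
    doi="10.1594/PANGAEA.912021",
)
print(citation.format())
```

Keeping the DOI as a bare identifier and adding the https://doi.org/ prefix only when rendering means the same record can be reused wherever the identifier alone is needed.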

Institutional Authorship

In some cases, data are submitted on behalf of an institution rather than individual researchers. Here, the institution responsible for the acquisition of the data and the related science is named in the citation, immediately before the publisher:

Creator (PublicationYear): Title [type]. Institution, Publisher, Identifier

Example:

Nicolaus, Marcel; Hoppmann, Mario; Tao, Ran; Katlein, Christian (2023): Spectral radiation fluxes, albedo and transmittance from autonomous measurements from Radiation Station 2020R21, deployed during MOSAiC 2019/20 [dataset bundled publication]. Alfred Wegener Institute, Helmholtz Centre for Polar and Marine Research, Bremerhaven, PANGAEA, https://doi.org/10.1594/PANGAEA.948838

Citation of datasets "in review" should be avoided

Data that are still in the archiving and review process, or that are under a moratorium, are not yet considered published entities and therefore must not be cited. Such datasets are not persistent because they may change or become unavailable. In practice, this means that:

  • they are usually accessible only to contributing authors after logging in to PANGAEA, and
  • they may be modified or even deleted during the review process.

Metadata of datasets under review are already displayed and findable on the PANGAEA website, but they contain a preliminary link ("...doi.pangaea.de...") that can be resolved only by the PANGAEA DOI resolver and is easily confused with the final, persistent DOI ("...doi.org..."). Please do not use the preliminary link for citation.

The preliminary link can be recognized by the following format:

https://doi.pangaea.de/10.1594/PANGAEA.XXXXXX (XXXXXX = DataSetID)

This link is resolved only by the PANGAEA DOI resolver. Once the review process is finished, the link takes its final form, which corresponds to the citable DOI:

https://doi.org/10.1594/PANGAEA.XXXXXX (XXXXXX = DataSetID)
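The two link forms above differ only in the host name, so a reference list can be screened for preliminary links with a simple pattern match. The following Python sketch is a hypothetical helper, not a PANGAEA tool; it assumes that the DataSetID consists of digits only, as in the examples on this page. It inspects the text of the link and does not check whether the DOI is actually registered.

```python
import re

# Assumption: DataSetID is purely numeric, as in the examples on this page.
_PANGAEA_LINK = re.compile(
    r"^https://(?P<host>doi\.pangaea\.de|doi\.org)/10\.1594/PANGAEA\.(?P<id>\d+)$"
)


def classify_link(url: str) -> str:
    """Return "preliminary", "final", or "unknown" based on the link's form alone."""
    match = _PANGAEA_LINK.match(url.strip())
    if match is None:
        return "unknown"
    # doi.pangaea.de marks the preliminary form; doi.org marks the citable form.
    return "preliminary" if match["host"] == "doi.pangaea.de" else "final"


for link in ("https://doi.pangaea.de/10.1594/PANGAEA.912021",
             "https://doi.org/10.1594/PANGAEA.912021"):
    print(link, "->", classify_link(link))
```

A link flagged as "preliminary" should not simply be rewritten to the doi.org form, because the dataset may still be under review; the citation should wait until the final DOI is shown on the dataset landing page.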

Further reading

PANGAEA follows the DataCite recommendations and the author preparation guidelines of Stall et al. (2023). These guidelines explain how to cite datasets and software in research articles, how to structure such citations, how to select the most suitable scientific repositories for data and software, and what information to include in an availability statement.

References: