Citation
Why is the correct citation of datasets important?
How to cite a PANGAEA dataset in a nutshell
- Go to the dataset landing page
- Copy the suggested citation at the top
- Paste it into your reference list
- Cite it in the text as (Author, Year)
- Always include the dataset in the reference list
- Always use the official DOI (https://doi.org/...)
- Do not cite the non-persistent link (https://doi.pangaea.de/...)
Citing datasets is fundamental to good scientific practice (Stall et al. 2023). It gives proper credit to your own work as well as to the work of others, while also increasing the reproducibility of results and, in turn, trust in research (see also DataCite: Why cite data?). Proper citation also ensures that laboratory and support staff receive due recognition for their contributions to the measurements, even if they were not directly involved in the manuscripts derived from the data.
With the growing importance of open data, dataset citation has also become a central element of the scientific reward system. Citation metrics, increasingly provided by platforms such as DataCite, only function when datasets are cited correctly. By listing datasets in the reference section, you help ensure they are indexed, discoverable, and counted toward scholarly metrics.
Best practice for citing data
PANGAEA publishes datasets in much the same way that scientific journals publish articles. For this reason, PANGAEA datasets should be cited formally and consistently, just like other publications. Citations should always appear in the reference list of any publication that uses the data.
PANGAEA's citation practices follow international recommendations and guidelines (Stall et al., 2023), reflecting the growing recognition of data as a fundamental resource for scientific research.
On the landing page of each dataset, the suggested citation is displayed at the top. The citation can be copied or exported in the preferred format using the copy or export buttons below the title. Further buttons enable sharing the reference via social media.
Please note that citation of datasets "in review" should be avoided (see further information below).
In addition to citing individual datasets, it is strongly recommended to acknowledge PANGAEA as the data publisher, for instance in the method or data availability section. The following publication, authored by the PANGAEA team, describes the repository, data archiving workflow, and its infrastructure:
Felden, Janine; Möller, Lars; Schindler, Uwe; Huber, Robert; Schumacher, Stefanie; Koppe, Roland; Diepenbroek, Michael; Glöckner, Frank Oliver (2023): PANGAEA - Data Publisher for Earth & Environmental Science. Sci Data 10, 347 (2023). https://doi.org/10.1038/s41597-023-02269-x
Citing PANGAEA is optional but highly appreciated, as it supports the maintenance and development of the service.
Where and how to cite datasets
Every dataset citation has 2 parts:
- Full citation: reference list
- Short citation: in-text
1. Full citations in the reference list
The reference list of a publication should include full citations for all datasets used. This enables automated attribution and credit through Crossref's Event Data.
a) Standalone datasets
Standalone datasets are self-contained data publications that are not part of a larger collection.
Example citations:
Timofeeva, Anna; Smolyanitsky, Vasily M; Bessonov, Vladimir; Petrovskiy, Tomash (2020): Special sea ice observations aboard Akademik Fedorov MOSAiC leg 1, 2019-09-25 to 2019-10-20 [dataset]. PANGAEA, https://doi.org/10.1594/PANGAEA.912021
Bauch, Dorothea; Meyer, Hanno; Damm, Ellen; D'Angelo, Alessandra; Mellat, Moein; Granskog, Mats A; Weiner, Mikaela; Marent, Andreas (2024): Stable water isotopes of sea ice at biogeochemistry sites (BGC) and Main Coring Sites (MCS) during MOSAiC expedition, leg 1 [dataset]. PANGAEA, https://doi.org/10.1594/PANGAEA.971330.
b) Bundled publications and publication series (Dependent Collections)
Some datasets are part of Dependent Collections, which distinguish between the overall dataset Collection and Member datasets:
- A Collection provides the overarching framework and context but contains no data.
- Member datasets are the individual, data-bearing datasets within a Collection.
For Dependent Collections, either the Collection is cited - which is usually sufficient - or individual member datasets are cited when there is a need to refer explicitly to a specific subset. In the latter case, member datasets are always cited in the context of the Collection they belong to, similar to citing a chapter within a book.
Example citations:
Collection:
Zabel, Matthias (2022): Pore water and solid phase data from deep-sea trench sediments [dataset bundled publication]. PANGAEA, https://doi.org/10.1594/PANGAEA.947269
Member dataset:
Zabel, Matthias (2022): Pore water analyses of sediment core GeoB16426-1 [dataset]. PANGAEA, https://doi.org/10.1594/PANGAEA.947262, In: Zabel, M (2022): Pore water and solid phase data from deep-sea trench sediments [dataset bundled publication]. PANGAEA, https://doi.org/10.1594/PANGAEA.947269
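The "chapter within a book" pattern can be sketched in a few lines of Python. This is only an illustration: the helper name `cite_member` is hypothetical and not part of any PANGAEA tool, and the ", In: " separator is taken from the example above.

```python
def cite_member(member: str, collection: str) -> str:
    """Append the Collection citation to a member citation with ', In: '."""
    # Drop a trailing period so the separator follows the DOI directly.
    return f"{member.rstrip('.')}, In: {collection}"


# Citation strings copied from the example above.
collection = (
    "Zabel, M (2022): Pore water and solid phase data from deep-sea trench "
    "sediments [dataset bundled publication]. PANGAEA, "
    "https://doi.org/10.1594/PANGAEA.947269"
)
member = (
    "Zabel, Matthias (2022): Pore water analyses of sediment core GeoB16426-1 "
    "[dataset]. PANGAEA, https://doi.org/10.1594/PANGAEA.947262"
)

print(cite_member(member, collection))
```

In practice the suggested citation on the landing page of the member dataset already has this form, so the sketch is mainly useful for checking entries that were assembled by hand.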
c) Editorial Publications and Bibliographies (Independent Collections)
Independent Collections group datasets that can be used and cited independently. Individual datasets may belong to multiple collections.
Example citations:
Eisen, Olaf; Steinhage, Daniel; Franke, Steven; Helm, Veit; Binder, Tobias; Drews, Reinhard; Eagles, Graeme; Humbert, Angelika; Jansen, Daniela; Jokat, Wilfried; Lambrecht, Astrid; Mieth, Matthias; Riedel, Sven; Miller, Heinrich (2024): Collection of datasets from AWI's radio-echo sounding systems on ice sheets and glaciers [dataset bibliography]. PANGAEA, https://doi.org/10.1594/PANGAEA.972094
2. In-text citations (Author-year format)
a) Main part of a publication (e.g., methods or results sections)
In the body of a publication, datasets should be cited in the author-year format (Authors, YYYY), just like journal articles and other publications, and must be accompanied by a full entry in the reference list.
Example:
This study makes use of observational data from MOSAiC leg 1 (Timofeeva et al., 2020; Bauch et al., 2024). Both datasets are published via PANGAEA - Data Publisher for Earth & Environmental Science (see Felden et al., 2023, for a description of the repository).
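As a rough sketch, the author-year label can be derived from the author list of the full citation. The function names are hypothetical, and two conventions here are assumptions rather than PANGAEA rules: "et al." is used for three or more authors, and several citations inside one pair of parentheses are separated by semicolons. The target journal's style guide takes precedence.

```python
def in_text(family_names: list[str], year: int) -> str:
    """Build an author-year label such as 'Timofeeva et al., 2020'."""
    if not family_names:
        raise ValueError("at least one author is required")
    if len(family_names) == 1:
        label = family_names[0]
    elif len(family_names) == 2:
        # Assumption: two authors are joined with 'and'.
        label = f"{family_names[0]} and {family_names[1]}"
    else:
        # Assumption: three or more authors are shortened to 'et al.'.
        label = f"{family_names[0]} et al."
    return f"{label}, {year}"


def parenthetical(labels: list[str]) -> str:
    """Combine several author-year labels into one parenthetical citation."""
    return "(" + "; ".join(labels) + ")"


timofeeva = in_text(["Timofeeva", "Smolyanitsky", "Bessonov", "Petrovskiy"], 2020)
bauch = in_text(
    ["Bauch", "Meyer", "Damm", "D'Angelo", "Mellat", "Granskog", "Weiner", "Marent"],
    2024,
)
print(parenthetical([timofeeva, bauch]))
```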
b) Data availability statements
A data availability statement is written for the reader and clearly states where the supporting datasets and any relevant software are located, as well as details about accessibility. In accordance with the recommendations in Stall et al. (2023) and the AGU Availability and Citation Checklist for Authors, the statement should include an in-text citation in author-year format (authors, YYYY), along with a full reference list entry and key information about the datasets. Authors should provide a brief description of the data, the repository name, persistent identifiers (DOIs), and licensing or access conditions.
Example:
The data supporting this study include observational sea ice records (Timofeeva et al., 2020; https://doi.org/10.1594/PANGAEA.912021) and geochemical porewater analyses (Bauch et al., 2024; https://doi.org/10.1594/PANGAEA.956325). Both datasets are openly available through PANGAEA - Data Publisher for Earth & Environmental Science (Felden et al., 2023; https://doi.org/10.1038/s41597-023-02269-x). Data are licensed under CC-BY and accessible without restrictions.
Structure of a dataset citation
Required elements
A complete dataset citation should contain the following elements:
- Authors (creators)
- Year of publication
- Title of dataset
- Type of dataset publication (e.g., "Dataset", "Publication series", "Bundled publication", "Editorial publication" or "Dataset bibliography")
- Publisher (e.g., PANGAEA)
- Persistent identifier (DOI)
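The list above can serve as a simple completeness check before a reference list is submitted. The following sketch assumes that a reference is held as a plain dictionary; the key names are hypothetical and chosen only to mirror the six elements.

```python
# Hypothetical key names, one per required element listed above.
REQUIRED_ELEMENTS = ("authors", "year", "title", "type", "publisher", "doi")


def missing_elements(record: dict) -> list[str]:
    """Return the required elements that are absent or empty in a record."""
    return [key for key in REQUIRED_ELEMENTS if not record.get(key)]


incomplete = {
    "authors": ["Zabel, Matthias"],
    "year": 2022,
    "title": "Pore water analyses of sediment core GeoB16426-1",
}
print(missing_elements(incomplete))
```

An empty list means that all six elements are present; it does not verify that the values themselves are correct.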
General Citation Format
Creator (PublicationYear): Title [type]. Publisher, Identifier
Example:
Timofeeva, Anna; Smolyanitsky, Vasily; Bessonov, Vladimir; Petrovskiy, Tomash (2020): Special sea ice observations aboard Akademik Fedorov MOSAiC leg 1, 2019-09-25 to 2019-10-20 [dataset]. PANGAEA, https://doi.org/10.1594/PANGAEA.912021
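The general format maps directly onto a string template. The sketch below is a minimal illustration that reproduces the example above; the function and parameter names are hypothetical, and the "; " separator between creators is inferred from the example citations on this page.

```python
def format_citation(
    creators: list[str],
    year: int,
    title: str,
    kind: str,
    publisher: str,
    identifier: str,
) -> str:
    """Render 'Creator (PublicationYear): Title [type]. Publisher, Identifier'."""
    return f"{'; '.join(creators)} ({year}): {title} [{kind}]. {publisher}, {identifier}"


citation = format_citation(
    creators=[
        "Timofeeva, Anna",
        "Smolyanitsky, Vasily",
        "Bessonov, Vladimir",
        "Petrovskiy, Tomash",
    ],
    year=2020,
    title=(
        "Special sea ice observations aboard Akademik Fedorov MOSAiC leg 1, "
        "2019-09-25 to 2019-10-20"
    ),
    kind="dataset",
    publisher="PANGAEA",
    identifier="https://doi.org/10.1594/PANGAEA.912021",
)
print(citation)
```

Copying the suggested citation from the landing page remains the more reliable route; a template like this is mostly useful for verifying that a manually typed entry has every element in the expected order.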
Citation of datasets "in review" should be avoided
Data that are still under review are not yet considered published entities and therefore should not be cited. Such datasets are not persistent because they may still undergo changes or become unavailable. Datasets under review are already displayed on the PANGAEA website, but they contain a non-persistent link that can only be resolved by the PANGAEA DOI resolver with the following format:
https://doi.pangaea.de/10.1594/PANGAEA.XXXXXX (XXXXXX = DataSetID)
Please avoid using the non-persistent link for citation! Once the review process is finished, the link in the dataset header will take the final form, corresponding to the citable DOI:
https://doi.org/10.1594/PANGAEA.XXXXXX (XXXXXX = DataSetID)
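The two link formats differ only in their host, so a reference list can be screened for non-persistent links automatically. The sketch below assumes that the DataSetID consists of digits only, as in the examples on this page; the function name and the status labels are hypothetical.

```python
import re

# Assumption: DataSetID is purely numeric, as in the examples on this page.
PERSISTENT = re.compile(r"^https://doi\.org/10\.1594/PANGAEA\.\d+$")
NON_PERSISTENT = re.compile(r"^https://doi\.pangaea\.de/10\.1594/PANGAEA\.\d+$")


def link_status(url: str) -> str:
    """Classify a PANGAEA link as 'citable', 'non-persistent' or 'unknown'."""
    url = url.strip()
    if PERSISTENT.match(url):
        return "citable"
    if NON_PERSISTENT.match(url):
        return "non-persistent"
    return "unknown"


for link in (
    "https://doi.org/10.1594/PANGAEA.912021",
    "https://doi.pangaea.de/10.1594/PANGAEA.912021",
):
    print(link, "->", link_status(link))
```

The check looks at the form of the link only. It deliberately does not rewrite a non-persistent link into a DOI, because a dataset that is still in review may change or become unavailable.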
Further reading
PANGAEA follows the DataCite recommendations and the author preparation guidelines of Stall et al. (2023). These guidelines explain how to cite datasets and software in research articles, how to structure these citations, how to select the most suitable scientific repositories for data and software, and what information to put in an Availability Statement.
References:
- Stall, S., Bilder, G., Cannon, M. et al. (2023): Journal Production Guidance for Software and Data Citations. Scientific Data, 10, 656. https://doi.org/10.1038/s41597-023-02491-7
- Felden, J.; Möller, L.; Schindler, U.; Huber, R.; Schumacher, S.; Koppe, R.; Diepenbroek, M.; Glöckner, F. O. (2023): PANGAEA - Data Publisher for Earth & Environmental Science. Scientific Data, 10, 347. https://doi.org/10.1038/s41597-023-02269-x
- DataCite. Why cite data
- Crossref. Event Data
- AGU Availability and Citation Checklist for Authors