Term catalogue
The PANGAEA term catalogue
The PANGAEA term catalogue is an integral part of the internal relational data management system. It functions as a thesaurus-like "construction kit" that enables the use of controlled vocabularies (including thesauri, taxonomies, terminologies, and ontologies) for enriching and standardizing metadata across datasets. It consists of terms (concepts) and relationships that can be linked to various components of the PANGAEA data model (e.g., parameters, methods/devices).
To support consistent metadata use across disciplines, the PANGAEA term catalogue includes both internally maintained vocabularies such as "Microbiochemistry, PANGAEA" or "Keywords" and externally curated vocabularies such as the World Register of Marine Species (WoRMS) and Chemical Entities of Biological Interest (ChEBI). PANGAEA maintains bidirectional workflows with some external vocabularies: new species names, for instance, can be submitted to WoRMS, while updated WoRMS records are regularly imported into PANGAEA.
The term catalogue in PANGAEA is designed to support structured and semantically rich metadata annotation. Each term in the catalogue can be linked to other terms through qualified relationships. Current PANGAEA relationships include "has broader term", "has attribute", "is synonym of", "is same as", and "is related to", among others. These semantic relations form an open, hierarchical, and context-sensitive structure that enables flexible classification from multiple perspectives and supports the semantic alignment of equivalent concepts across different vocabularies. Wherever possible, terms are linked to persistent identifiers (URIs) to ensure compatibility with semantic web technologies and to minimize redundancy.
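To make the idea of qualified relationships more concrete, the following is a minimal sketch of how terms and relations could be held in memory and how "has broader term" links could be followed upwards. The class name, method names, and the sample terms are illustrative assumptions and do not reflect PANGAEA's internal schema.

```python
from collections import defaultdict


class TermCatalogue:
    """Hypothetical in-memory store of qualified term relationships."""

    def __init__(self):
        # relations[subject][relation] -> set of object terms
        self._relations = defaultdict(lambda: defaultdict(set))

    def relate(self, subject, relation, obj):
        """Record one qualified relationship, e.g. A "has broader term" B."""
        self._relations[subject][relation].add(obj)

    def related(self, subject, relation):
        """Return the terms directly linked to `subject` via `relation`."""
        return set(self._relations.get(subject, {}).get(relation, ()))

    def broader_closure(self, term):
        """Return every term reachable by following "has broader term"."""
        seen = set()
        stack = [term]
        while stack:
            current = stack.pop()
            for parent in self.related(current, "has broader term"):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


# Illustrative entries only; the synonym below is an invented placeholder.
catalogue = TermCatalogue()
catalogue.relate("Calanus finmarchicus", "has broader term", "Calanus")
catalogue.relate("Calanus", "has broader term", "Calanidae")
catalogue.relate("Calanus finmarchicus", "is synonym of", "example synonym")

broader = catalogue.broader_closure("Calanus finmarchicus")
```

Keeping the relation name as part of the key is one way to let the same term take part in several hierarchies at once, which is what allows classification from multiple perspectives.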
Purpose and benefits
By relying on shared, well-documented vocabularies, the term catalogue ensures metadata is described in a standardized, unambiguous, and machine-readable format. This improves semantic consistency within PANGAEA, enables the integration of heterogeneous datasets, and facilitates alignment with external systems.
In practice, the catalogue improves:
- Metadata quality by referring to unambiguous concepts and a unifying terminology, and by preventing duplication.
- Search functionality by supporting hierarchical filtering (faceting), synonym recognition, thematic grouping, and broader-term expansion. For example, temperature parameters such as “temperature, water” and “temperature, ice” are each automatically annotated with two terms: the first, “temperature”, represents the measured quantity (quantity kind in PANGAEA), and the second, given after the comma (“water” or “ice”), defines the measurement environment (feature in PANGAEA; for more information see https://doi.org/10.1016/j.jbiotec.2017.07.016). This allows such parameters to be grouped differently depending on the domain context (e.g., measured in water vs. in ice). For further details, see PANGAEA search.
- Metadata validation - for example, a relation like "is unit of" or "is method of" could be used to ensure that only appropriate units or methods are assigned to a specific parameter.
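The validation idea in the last bullet can be sketched as a simple lookup against "is unit of" relations. This is only an illustration of the principle: the relation table, the function names, and the sample values are assumptions, not an existing PANGAEA check.

```python
# Hypothetical "is unit of" relations as (unit, quantity) pairs.
IS_UNIT_OF = {
    ("kelvin", "temperature"),
    ("degree Celsius", "temperature"),
    ("metre", "length"),
}


def unit_is_valid(unit, quantity):
    """True if the catalogue holds an "is unit of" relation for the pair."""
    return (unit, quantity) in IS_UNIT_OF


def validate_parameter(name, quantity, unit):
    """Return a list of validation messages; an empty list means no issue."""
    if unit_is_valid(unit, quantity):
        return []
    return [f'{name}: unit "{unit}" is not registered for quantity "{quantity}"']


ok = validate_parameter("Temperature, water", "temperature", "kelvin")
problems = validate_parameter("Temperature, water", "temperature", "metre")
```

The same pattern would apply to an "is method of" relation, with methods in place of units.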
How is the metadata annotated with terms from the term catalogue?
Many metadata fields in PANGAEA are initially populated using free-text descriptions. These entries are subsequently annotated with appropriate controlled terms from either externally maintained vocabularies or PANGAEA’s own internal terms. In most cases, particularly for parameters, this annotation process is semi-automated: a script analyzes segments of the input text. It parses the string based on predefined rules, such as character position or specific separators (e.g., commas), to identify distinct segments and match them with corresponding terms in the catalogue. For example, characters in positions 1–10 may be linked to one term, while a comma signals the beginning of a new annotation segment. Each parameter segment can be linked directly to only one controlled term. Vocabularies are ranked by priority during assignment, so it is always clear which vocabulary supplies the term for a segment. After this automatic step, data editors can manually review and approve the suggested annotations to ensure correctness and consistency.
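The separator-based matching with ranked vocabularies described above might look roughly like the sketch below. It is not the actual annotation script: the term lists are invented for illustration, "Hypothetica exemplaris" is a fictional name used only to show a fallback to a lower-ranked vocabulary, and the function names are assumptions.

```python
from typing import List, Optional, Tuple

# Hypothetical term lists, ordered by priority: the first vocabulary that
# contains a segment supplies the suggested term.
VOCABULARIES = [
    ("WoRMS", {"calanus finmarchicus"}),
    ("ITIS", {"calanus finmarchicus", "hypothetica exemplaris"}),
    ("PAN-Quantity", {"temperature", "biomass as dry weight"}),
]


def match_segment(segment: str) -> Optional[Tuple[str, str]]:
    """Return (vocabulary, term) for the highest-ranked match, else None."""
    key = segment.strip().lower()
    for vocabulary, terms in VOCABULARIES:
        if key in terms:
            return (vocabulary, segment.strip())
    return None


def suggest_annotations(
    parameter: str, separator: str = ","
) -> List[Tuple[str, Optional[Tuple[str, str]]]]:
    """Split a parameter name into segments and suggest one term per segment."""
    suggestions = []
    for segment in parameter.split(separator):
        segment = segment.strip()
        if segment:
            suggestions.append((segment, match_segment(segment)))
    return suggestions


suggestions = suggest_annotations(
    "Calanus finmarchicus, female, biomass as dry weight"
)
```

Segments without a match (here "female") come back with `None`, which corresponds to the cases a data editor would resolve during manual review.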
In other metadata fields, such as methods or devices, the annotation is currently performed manually by data editors. In certain fields - especially those involving keywords or geographic locations - metadata is directly described using controlled terms from the catalogue without any free-text input.
Annotated terms are publicly visible and are displayed on the PANGAEA website. When users hover over an annotated element, the linked term and its source are revealed in a tooltip-like overlay. For technical details or contributions, see the parameter-annotator GitHub repository.
Viewing the terms used to annotate metadata
The terms used to enrich PANGAEA datasets can be both inspected by users through the web interface and harvested by machines.

Seeing terms in the user interface
In the PANGAEA web interface, it is not always visible whether metadata entries (e.g., locations or keywords) were taken directly from a controlled vocabulary or entered as free text. However, when free-text metadata have subsequently been annotated with terms, the entry is displayed as a link. Clicking on it opens a popup window showing which controlled terms were used for annotation.
Retrieving terms through dataset harvesting
Terms are exposed in both the schema.org format (see example dataset; more details on the schema are available at schema.org and here) and the PANGAEA metadata schema format (see example dataset; more information on standard PANGAEA metadata interfaces is available here). Terms appear in fields such as parameters, methods, Event, locations, and dataset keywords. Each term is provided with a semantic identifier (semantic URI) that can be harvested by external systems. Neither the terms on their own nor the full relations between terms and the underlying terminologies can currently be obtained from PANGAEA as linked RDF data. For some terminologies, however, term lists are available for download.
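As a rough illustration of harvesting semantic URIs from a schema.org (JSON-LD) record, the sketch below walks a parsed document and collects URI-like values found under selected keys. The sample record, the `example.org` URIs, and the choice of keys are assumptions made for this example; the actual field layout of PANGAEA's JSON-LD output should be checked against a real dataset.

```python
import json

# Keys assumed to carry term identifiers; adjust after inspecting real records.
URI_KEYS = ("@id", "url", "propertyID")


def collect_uris(node, keys=URI_KEYS):
    """Recursively collect http(s) URIs stored under `keys` in a JSON tree."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key in keys:
                candidates = value if isinstance(value, list) else [value]
                found.extend(
                    c for c in candidates
                    if isinstance(c, str) and c.startswith(("http://", "https://"))
                )
            # Descend into nested objects and lists (plain strings yield nothing).
            found.extend(collect_uris(value, keys))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_uris(item, keys))
    return found


# Hypothetical record, not copied from PANGAEA.
SAMPLE = json.loads("""
{
  "@type": "Dataset",
  "variableMeasured": [
    {
      "@type": "PropertyValue",
      "name": "Temperature, water",
      "propertyID": ["http://example.org/term/1", "http://example.org/term/2"]
    }
  ],
  "keywords": ["example keyword"]
}
""")

term_uris = collect_uris(SAMPLE)
```

Note that a generic walker like this also picks up identifiers that are not terms (for instance a dataset-level `@id`), so a real harvester would likely restrict the search to the relevant fields.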
List of implemented terminologies
External vocabularies
Name of taxonomy/dictionary/terminology/ontology | PANGAEA abbreviation | Description | URI | Technical details | Used to annotate which metadata? |
---|---|---|---|---|---|
The World Register of Marine Species, Aphia 1.0 | WoRMS | WoRMS: The World Register of Marine Species (WoRMS) provides an authoritative and comprehensive list of names of marine organisms, including information on synonymy. Aphia: The Aphia platform is an infrastructure designed to capture taxonomic and related data and information, and includes an online editing environment. WoRMS includes information on algae by AlgaeBase, which is redistributed by WoRMS with permission. WoRMS does not include prokaryotes. | https://www.marinespecies.org/ | PANGAEA imports taxonomic terms and relations from WoRMS. These terms are updated monthly via data dumps provided by WoRMS. Data editors may manually create “non-accepted” terms (with status “PANGAEA accepted”), provided these are submitted to WoRMS; such terms are deleted if WoRMS rejects them. Accepted terms are updated automatically. Example of a term entry: https://ws.pangaea.de/es/pangaea-terms/term/1047579 | Parameters (text segments of parameters that are taxonomic features, e.g. "Ammodiscus planus" or "Calanus finmarchicus, female, biomass as dry weight"), and taxonomic contents of data series (if those are associated with term-related taxonomic parameters). WoRMS terms are semi-automatically annotated with first priority. An ITIS term is annotated, where possible, only if no suitable WoRMS term is available. |
Integrated Taxonomic Information System | ITIS | ITIS provides authoritative taxonomic information on plants, animals, fungi, and microbes of North America and the world. | http://www.itis.gov/ | PANGAEA imports taxonomic terms and relations from ITIS. These terms are updated on a monthly basis. Importer available on GitHub: https://github.com/pangaea-data-publisher/pg_itis_importer | Parameters (taxonomic features) and contents of data series (if those are associated with term-related taxonomic parameters), only if no WoRMS term is available. Semi-automatic annotation. |
Environment ontology | ENVO | ENVO is a community ontology for the concise, controlled description of environments. | http://obofoundry.org/ontology/envo.html | One-time import from ENVO to PANGAEA, no importer | Parameters (text segments of parameters that are environmental features, e.g. "Deuterium excess, water vapour"). Semi-automatic annotation. |
Phenotype And Trait Ontology | PATO | PATO provides terms for phenotypic qualities (properties). This ontology can be used in conjunction with other ontologies such as Gene ontology (GO) or anatomical ontologies to refer to phenotypes. Examples of qualities are red, ectopic, high temperature, fused, small, edematous and arrested. | http://obofoundry.org/ontology/pato.html | The availability of new PATO versions is checked on a monthly basis. If a new version is available, it is imported into PANGAEA by processing the OWL file provided on GitHub. During this import, only selected ontology terms and relations (not all relations) are imported. | Parameters (text segments of parameters that are phenotypic features, e.g. "Depth, water"). Semi-automatic annotation. |
Quantities, Units, Dimensions and DataTypes Ontology | qudt | QUDT.org is a public charity nonprofit organization founded to provide semantic specifications for units of measure, quantity kinds, dimensions and data types. QUDT version 1.1 is a structured vocabulary and ontology that provides a standardized representation of units of measurement and their corresponding quantities. It includes definitions for physical quantities (e.g., length, mass, temperature), units (e.g., meter, kilogram, kelvin), and the relationships between them (e.g., conversion factors, symbols, dimensional analysis). | https://www.qudt.org/ | One-time import of QUDT version 1.1 as part of the "Quantities, PANGAEA" terminology. Later versions of QUDT diverged from the PANGAEA requirements: they incorporate broader and more complex structures beyond the scope of units and their associated measurement quantities, but lack some of the necessary mappings PANGAEA relies on. | QUDT forms the core of the custom "Quantities, PANGAEA" terminology, powering the background mapping of units to UCUM and linking parameters to quantities. For details, see the description below. Note: PANGAEA plans to change all units to UCUM. |
Chemical Entities of Biological Interest | ChEBI | ChEBI is a freely available dictionary of molecular entities focused on ‘small’ chemical compounds. | https://www.ebi.ac.uk/chebi/ | ChEBI terms are imported to PANGAEA on a monthly basis by processing the OWL file located on GitHub | Parameters (chemical features). Semi-automatic annotation. Challenge: the ChEBI systematics are more detailed and sophisticated than PANGAEA can make use of, and the assignment does not work particularly well because it is often unclear which term is the right one. Based on InChI keys. |
NERC device categories | NERC-L05-L22 | NERC SeaDataNet device categories (L05) is a terminology providing standardized categories for marine instruments (“device types”). The NERC SeaVoX Device Catalogue (L22) lists specific device models with detailed metadata. | https://vocab.nerc.ac.uk/collection/L05/current/ | Importer available on GitHub | Many existing methods have been manually annotated with terms from the terminology. Newly created methods can be annotated manually by data editors. Automated annotation is planned for the future. Annotation can be done with terms of any granularity level (level 1, level 2, level 3, level 4) depending on how generic the method is and how many details the authors provided. |
PANGAEA-own vocabularies
Name of terminology | PANGAEA abbreviation | Description | URI | Technical details | Used to annotate which metadata instances? |
---|---|---|---|---|---|
Quantities, PANGAEA | PAN-Quantity | The "Quantities, PANGAEA" terminology comprises units and quantities from the harvested QUDT 1.1 terminology as well as other complementary quantities defined by PANGAEA. Later versions of QUDT diverged from the PANGAEA requirements: they incorporate broader and more complex structures beyond the scope of units and their associated measurement quantities, but lack some of the necessary mappings PANGAEA relies on. | Unpublished, only annotated terms are visible to the public | One-time import of quantities from QUDT version 1.1 | Parameters (quantities). Units archived in PANGAEA are automatically processed by a script from the GitLab repository PUCUM, which converts them into the standardized UCUM format (Unified Code for Units of Measure). In addition to the unit itself, this format also specifies the underlying dimension (e.g., {Length}, {Mass}). A JSON file in the repository defines how these UCUM dimensions are mapped to the corresponding quantities in PANGAEA. This ensures that units are interpreted consistently and linked reliably to the relevant quantities in the system. |
Classifying terms, PANGAEA | PAN-CT | The "Classifying terms, PANGAEA" terminology is used to define the thematic faceting displayed under “Topics” on the PANGAEA start page. It contains mappings to other terms, enabling terms used within datasets to be correctly classified into the appropriate topic (for further information, see Topic). | Visible via the faceting in the PANGAEA search (“Topic” facets) | Weekly dump. Generates search index. | |
Microbiochemistry, PANGAEA | PAN-MicroBio | Manually created feature ontology meant to build a framework that embeds and connects other vocabularies in the term catalogue, e.g. ChEBI and Classifying terms | Unpublished, only the annotated terms are visible to the public | Created manually, not completed | Parameters (features) |
Methods and Devices, PANGAEA | PAN-M&D | The PANGAEA Methods and Devices terminology was developed in 2021 with the aim of structuring and harmonizing the methods/devices metadata. The terminology combines PANGAEA's own terms for broad categories of methods and devices with integrated terms from NERC L05 and L22 (see above), which represent more detailed information on device types and device models (for further information, see Intern:Method and Devices Terminology). | Unpublished, only the annotated terms are visible to the public | Created manually. Structure of the terminology: [not shown]. Note: For methods/devices used in events, more generic terms might apply than for methods/devices associated with parameters. In the long term, a process for automated annotation is planned. | |
Locations, PANGAEA | PAN-Loc | PANGAEA-own terminology for geographic locations with terms created and edited by data editors in the course of the curation workflow (for further information, see Intern:Locations). Should be reviewed and cleaned up when time allows (e.g. upper/lower case, removal of duplicates). | Unpublished, only the annotated terms are visible to the public | PANGAEA locations are created as needed and maintained by the PANGAEA data editors. Naming is guided by established standards like ISO-3166, marineregions.org, gebco.net, and geonames.org. For more details, see Intern:Locations | Secondary (optional) Event metadata, supplementing coordinates or geocodes (manual addition by data editors). In addition to the locations manually assigned to Events by curators, further locations are automatically added to datasets in the background to enhance search structuring. Based on latitude and longitude, certain locations (e.g. continents) are derived from the IHO list (available at https://www.marineregions.org/files/S23_1953.pdf). Note: Location information is also added as start/end attributes to “Campaigns”, but so far only as city names with country, e.g. "Bremerhaven, Germany" (added as text string), from the C38 SeaDataNet list of port cities. |
Keywords, PANGAEA | PAN-Key | PANGAEA keyword terminology with terms created and edited by data editors in the course of the curation workflow. Should be reviewed and cleaned up when time allows (e.g. upper/lower case, removal of duplicates). | ?? | PANGAEA keywords are created and added to metadata by PANGAEA data editors as needed (often based on suggestions by the data authors). | Manual tagging of datasets, staff, institutions, Events, parameters, and references; extendable to other tables |
Technical keywords, PANGAEA | PAN-TechKey | A vocabulary used for the manual tagging of metadata, primarily for technical classification and organizational purposes (e.g., creation of selections). The vocabulary comprises part of the keywords from the former, no longer existing “PANGAEA thesaurus”; the other part moved into “Keywords, PANGAEA”. | PANGAEA-own list, partially downloadable (only keywords that are in use) | PANGAEA technical keywords are created and maintained by the technical PANGAEA staff. Formatting is strictly regulated: only letters and numbers, no special characters (ASCII only). | Manual tagging of personnel entries, institutions, events, parameters, records and references (and possibly other tables) |
Data Model Extensions, PANGAEA | PAN-ModExt | PANGAEA terminology collecting all PANGAEA attributes and their relationship to scientific disciplines. | Unpublished, only the annotated terms are visible to the public | For more details, see Intern:Attributes | In selected cases, attributes can be added to Events, campaigns, datasets, and, in the future, data series. |
PANGAEA is considering including more recognized terminologies, such as Uberon and SWEET, in the catalogue over time.
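The unit handling described for "Quantities, PANGAEA" (unit, then UCUM dimension, then quantity) can be pictured with the following minimal sketch. It does not use the PUCUM code: the JSON layout, the unit lookup table, and the function name are hypothetical stand-ins, since the real file format is not described here.

```python
import json
from typing import Optional

# Hypothetical stand-in for the JSON mapping file (dimension -> quantity).
DIMENSION_MAP_JSON = """
{
  "{Length}": "Length",
  "{Mass}": "Mass"
}
"""

# Hypothetical stand-in for the UCUM conversion step:
# archived unit string -> (UCUM code, dimension).
UNIT_TO_UCUM = {
    "m": ("m", "{Length}"),
    "kg": ("kg", "{Mass}"),
}

DIMENSION_TO_QUANTITY = json.loads(DIMENSION_MAP_JSON)


def quantity_for_unit(unit: str) -> Optional[str]:
    """Resolve a unit to a quantity via its dimension; None if unknown."""
    entry = UNIT_TO_UCUM.get(unit)
    if entry is None:
        return None
    _ucum_code, dimension = entry
    return DIMENSION_TO_QUANTITY.get(dimension)


mass_quantity = quantity_for_unit("kg")
```

Routing the lookup through the dimension rather than mapping each unit to a quantity directly means that a newly encountered unit with a known dimension needs no additional mapping entry.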
References
Diepenbroek, M et al. (2017): Terminology supported archiving and publication of environmental science data in PANGAEA. Journal of Biotechnology, 261, 177-186, https://doi.org/10.1016/j.jbiotec.2017.07.016
WoRMS Editorial Board (2023) World Register of Marine Species. Available from https://www.marinespecies.org at VLIZ. https://doi.org/10.14284/170
Integrated Taxonomic Information System (ITIS) (2023) www.itis.gov, CC0, https://doi.org/10.5066/F7KH0KBK
Guiry MD & Guiry GM (2023) AlgaeBase. World-wide electronic publication, National University of Ireland, Galway. https://www.algaebase.org
Buttigieg, P.L., Morrison, N., Smith, B. et al. (2013) The environment ontology: contextualising biological and biomedical entities. Journal of Biomedical Semantics;4:43. https://doi.org/10.1186/2041-1480-4-43
FAIRsharing.org (2023) QUDT; Quantities, Units, Dimensions and Types, https://doi.org/10.25504/FAIRsharing.d3pqw7
Hastings J, Owen G, Dekker A, et al. (2016) ChEBI in 2016: Improved services and an expanding collection of metabolites. Nucleic Acids Research;44(D1):D1214-D1219. https://doi.org/10.1093/nar/gkv1031
British Oceanographic Data Centre (2023) The NERC Vocabulary Server, Natural Environment Research Council. https://vocab.nerc.ac.uk
Kim S, Chen J, Cheng T, et al. (2023) PubChem 2023 update. Nucleic Acids Research;51(D1):D1373–D1380. https://doi.org/10.1093/nar/gkac956