PANGAEA Wiki - User contributions [en]

Event-Campaign-Merge

2026-04-28T13:22:48Z

Uschindler: image size

__NOTOC__
= Modernizing the PANGAEA Event Data Structure =
[[File:Event-campaign-merge-logo.png|thumb|300x300px]]
To better support the evolving needs of the global research community across all disciplines, we are implementing a significant update to our database architecture. The centerpiece of this modernization is the '''"Event-Campaign Merge,"''' which turns the metadata structure for data collection events from a rigid two-level hierarchy into a flexible tree structure with multiple nested levels.

== Why is the structure changing? ==
Until now, PANGAEA was built on a fixed two-level hierarchy (Campaign ⇒ Event). While effective for marine expeditions, this was often too restrictive for other fields of science. The new structure allows for:

* '''Interdisciplinary Support:''' Better representation for Social Sciences (e.g., using "Study" instead of "Campaign") and Lab Experiments.
* '''Greater Flexibility:''' The process of data collection (events) can now be better organized into multiple nested levels, making it easier to represent complex sampling or long-term monitoring.
* '''Reduced Redundancy:''' Streamlining how event metadata is stored and retrieved.

== Key Improvements at a Glance ==

* '''Hierarchical Events:''' Technically, Campaigns are being integrated as a specific type of "Event." This allows for a more natural "parent-child" relationship between different stages of data collection. A campaign event is the parent of several generic events describing the sampling or data collection.
* '''Enhanced Metadata:''' Events can now directly store information regarding the '''"Basis"''' (e.g., a research vessel or a specific instrument) and the '''"Responsible Staff / Principal Investigators"''' (formerly only available as "chief scientists" in campaigns).
* '''Advanced Geolocation:''' We are introducing a '''"Location 2"''' field that complements the existing "Latitude/Longitude 2" fields. This allows us to accurately map trajectories or transect events by defining both start and end points.
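
The parent-child relationship described above can be pictured as a simple tree. The following Python sketch is purely illustrative: the class name <code>Event</code>, its field names, the event type values, and all labels and coordinates are hypothetical and do not reflect PANGAEA's actual database schema or XML element names.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator, Optional, Tuple

Coordinate = Tuple[float, float]  # (latitude, longitude) in decimal degrees


@dataclass(eq=False)
class Event:
    """Hypothetical model of one node in the event tree (not the real schema)."""

    label: str
    event_type: str = "generic"  # assumed values, e.g. "campaign", "study", "generic"
    basis: Optional[str] = None  # e.g. a research vessel or an instrument
    responsible_staff: list = field(default_factory=list)
    start: Optional[Coordinate] = None  # Latitude/Longitude
    end: Optional[Coordinate] = None    # Latitude/Longitude 2
    parent: Optional[Event] = field(default=None, repr=False)
    children: list = field(default_factory=list, repr=False)

    def add_child(self, child: Event) -> Event:
        """Attach a child event and return it, so calls can be chained."""
        child.parent = self
        self.children.append(child)
        return child

    def path(self) -> list:
        """Labels from the root of the tree down to this event."""
        labels = []
        node: Optional[Event] = self
        while node is not None:
            labels.append(node.label)
            node = node.parent
        return labels[::-1]

    def walk(self, depth: int = 0) -> Iterator[Tuple[int, Event]]:
        """Yield (depth, event) pairs in depth-first order."""
        yield depth, self
        for child in self.children:
            yield from child.walk(depth + 1)

    def is_transect(self) -> bool:
        """An event with both a start and an end point describes a trajectory."""
        return self.start is not None and self.end is not None


# A campaign is just an event of a specific type that acts as the parent.
campaign = Event(
    "Example-Cruise",
    event_type="campaign",
    basis="Example research vessel",
    responsible_staff=["A. Example"],
)
transect = campaign.add_child(
    Event("Example-Cruise_Transect-1", start=(54.0, 7.9), end=(54.5, 8.2))
)
station = transect.add_child(Event("Example-Cruise_Station-1", start=(54.2, 8.0)))

for depth, event in campaign.walk():
    print("  " * depth + event.label)
```

Because a campaign is modelled as an ordinary event with a type, the same structure can hold more than two levels, which the former Campaign ⇒ Event layout could not express.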

== What to Expect During the Transition ==
We are entering a transition phase of a few weeks as we migrate our datasets and infrastructure. Depending on how you interact with PANGAEA, this may affect you differently.

=== For General Web Users and Website Visitors ===
The impact on the user interface will be minimal. You may notice some minor inconsistencies in labels or metadata display (e.g., how a "Campaign" is titled or nested). However, the search functionality and data access will remain fully operational.

Please be aware that PANGAEA cannot control how event and campaign metadata is displayed on external websites or third-party platforms, such as the [https://marine-data.de/ Marine Data Portal]. These portals use their own code and logic to interpret our data structures. Until these external providers update their systems to align with our new architecture, there may be discrepancies in how our metadata appears on their sites.

The same applies to PANGAEA’s own [https://www.pangaea.de/expeditions/ Expeditions] web page, which also needs to be adapted but has lower priority. During the transition, newly added campaigns that exist only in the new database structure may not appear there.

=== For Technical Users, API Consumers, and Data Harvesters ===
If you rely on PANGAEA’s XML metadata, OAI-PMH, or other API services, please take note of the following critical technical details:

* '''Schema Validation:''' While we have prioritized backwards compatibility, the transition involves changing XML element names (campaign names get labels). During this migration, XML records will not validate against our XML schema because they will contain a mix of old, redundant elements and new elements. We will separately announce when the XML files exposed by our APIs no longer contain redundant compatibility elements.
* '''Harvesting Recommendations:''' If you are harvesting PANGAEA’s own metadata schema, we recommend '''temporarily pausing the retrieval of updates''' during this transition phase to avoid processing inconsistent records.
* '''Code and Script Adaptation:''' API users should prepare their scripts to interpret the new, more flexible hierarchical structure. Ensure that your parsers are resilient and do not fail when encountering new or unknown XML elements. Please adapt to new element names as soon as possible, as the old, redundant elements will be removed.
* '''OAI-PMH Consumers:''' Please verify that your harvesting pipelines can handle these schema shifts without breaking, particularly during the period when element names are in a state of flux.
* '''pangaeapy / pangaear Users:''' These libraries should continue to work without problems for searching and downloading data. We will later adapt '''pangaeapy''' to better reflect the new event-campaign structure. '''pangaear''' is a third-party product; to our knowledge it is not affected by the event-campaign merge.
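
As an illustration of the advice on resilient parsers, the following Python sketch reads a record with the standard library's <code>xml.etree.ElementTree</code>, prefers a new element name, falls back to the old one, and silently ignores elements it does not know. The element names (<code>parentEvent</code>, <code>campaign</code>, <code>label</code>), the namespace, and the sample record are hypothetical placeholders, not PANGAEA's real schema; please substitute the names from the published schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names, for illustration only (not the real schema).
NEW_PARENT = "parentEvent"   # assumed name used after the merge
OLD_PARENT = "campaign"      # assumed legacy name kept for compatibility


def local_name(tag):
    """Reduce a namespaced tag such as '{uri}name' to 'name'."""
    return tag.rsplit("}", 1)[-1] if isinstance(tag, str) else ""


def first_text(root, names):
    """Return the text of the first matching element, trying names in order."""
    for name in names:
        for element in root.iter():
            if local_name(element.tag) == name and element.text:
                text = element.text.strip()
                if text:
                    return text
    return None


def parse_event(xml_text):
    """Extract only the fields we need; unknown elements are ignored."""
    root = ET.fromstring(xml_text.strip())
    return {
        "label": first_text(root, ["label"]),
        # Prefer the new element, fall back to the old one if it is absent.
        "parent": first_text(root, [NEW_PARENT, OLD_PARENT]),
    }


# A record from the transition phase: old and new elements side by side,
# plus an element this script has never seen.
MIXED = """
<record xmlns="urn:example:metadata">
  <event>
    <label>Example-Station-1</label>
    <campaign>Example-Cruise (old)</campaign>
    <parentEvent>Example-Cruise</parentEvent>
    <somethingNew>ignored</somethingNew>
  </event>
</record>
"""

# A record that still contains only the old element.
LEGACY = """
<record xmlns="urn:example:metadata">
  <event>
    <label>Example-Station-2</label>
    <campaign>Example-Cruise</campaign>
  </event>
</record>
"""

mixed_result = parse_event(MIXED)
legacy_result = parse_event(LEGACY)
print(mixed_result)
print(legacy_result)
```

Matching on local names and returning <code>None</code> for missing fields means the script keeps running when elements are added, renamed, or removed, instead of failing on the first unexpected record.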

These changes represent a major step forward in making PANGAEA more versatile and future-proof.

File:Event-campaign-merge-logo.png

2026-04-28T13:20:15Z

Uschindler: Uschindler uploaded a new version of File:Event-campaign-merge-logo.png

Logo of the Event-Campaign merge

Event-Campaign-Merge

2026-04-28T11:38:28Z

Uschindler: /* For General Web Users and Website Visitors */ add links


Event-Campaign-Merge

2026-04-28T11:35:36Z

Uschindler: move image to right


Event-Campaign-Merge

2026-04-28T11:34:31Z

Uschindler: Add logo


File:Event-campaign-merge-logo.png

2026-04-28T11:33:58Z

Uschindler:

Logo of the Event-Campaign merge

Event-Campaign-Merge

2026-04-28T11:27:32Z

Uschindler: remove TOC


Event-Campaign-Merge

2026-04-28T11:24:59Z

Uschindler: change title


Event-Campaign-Merge

2026-04-28T11:24:11Z

Uschindler: Uschindler moved page Event-Campaign Merge to Event-Campaign-Merge without leaving a redirect

= Modernizing the PANGAEA Data Structure – The Event-Campaign Merge =
To better support the evolving needs of the global research community across all disciplines, we are implementing a significant update to our database architecture. The centerpiece of this modernization is the '''"Event-Campaign Merge,"''' which transitions our metadata structure around data collection events from a rigid hierarchy into a flexible, hierarchical tree structure.

== Why is the structure changing? ==
Until now, PANGAEA was built on a fixed two-level hierarchy (Campaign ⇒ Event). While effective for marine expeditions, this was often too restrictive for other fields of science. The new structure allows for:

* '''Interdisciplinary Support:''' Better representation for Social Sciences (e.g., using "Study" instead of “Campaign”) and Lab Experiments.
* '''Greater Flexibility:''' The process of data collection (events) can now be better organized into multiple nested levels, making it easier to represent complex sampling or long-term monitoring.
* '''Reduced Redundancy:''' Streamlining how event metadata is stored and retrieved.

== Key Improvements at a Glance: ==

* '''Hierarchical Events:''' Technically, Campaigns are being integrated as a specific type of "Event." This allows for a more natural "parent-child" relationship between different stages of data collection. A campaign event is the parent of several generic events describing the sampling or data collection.
* '''Enhanced Metadata:''' Events can now directly store information regarding the '''"Basis"''' (e.g., a research vessel or a specific instrument) and '''"Responsible Staff / Principal Investigators"''' (formerly only available as “chief scientists” in campaigns).
* '''Advanced Geolocation:''' We are introducing a '''"Location 2"''' field complementing the already existing “Latitude/Longitude 2” fields. This allows us to accurately map trajectories or transect events by defining both start and end points.
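
The parent-child idea described in the list above can be sketched as a small tree data structure. This is only an illustration, not the actual database model: the class, its field names (<code>event_type</code>, <code>basis</code>, <code>start</code>, <code>end</code>), and the labels are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Event:
    label: str
    event_type: str = "generic"                   # e.g. "campaign" for a top-level event
    basis: Optional[str] = None                   # e.g. a research vessel or instrument
    start: Optional[Tuple[float, float]] = None   # (latitude, longitude)
    end: Optional[Tuple[float, float]] = None     # second position for transects
    parent: Optional["Event"] = field(default=None, repr=False, compare=False)
    children: List["Event"] = field(default_factory=list, repr=False, compare=False)

    def add_child(self, child):
        """Attach child below this event and return it."""
        child.parent = self
        self.children.append(child)
        return child

    def path(self):
        """Return the labels from the root event down to this event."""
        node, labels = self, []
        while node is not None:
            labels.append(node.label)
            node = node.parent
        return list(reversed(labels))


# A campaign is just an event of a specific type that acts as parent.
campaign = Event("Cruise-A", event_type="campaign", basis="Research vessel")
transect = campaign.add_child(
    Event("Cruise-A_Transect-1", start=(54.0, 7.9), end=(54.5, 8.3))
)
station = transect.add_child(Event("Cruise-A_Transect-1_Station-3"))
print(station.path())
```

Because every node has the same type, the nesting depth is not fixed at two levels as in the former Campaign ⇒ Event hierarchy.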

== What to Expect During the Transition ==
We are entering a transition phase of a few weeks as we migrate our datasets and infrastructure. Depending on how you interact with PANGAEA, this may affect you differently.

=== For General Web Users and Website Visitors ===
The impact on the user interface will be minimal. You may notice some minor inconsistencies in labels or metadata display (e.g., how a "Campaign" is titled or nested). However, the search functionality and data access will remain fully operational.

Please be aware that PANGAEA cannot control how event and campaign metadata is displayed on external websites or third-party platforms, such as the Marine Data Portal. These portals use their own code and logic to interpret our data structures. Until these external providers update their systems to align with our new architecture, there may be discrepancies in how our metadata appears on their sites.

The same applies to PANGAEA’s own Expeditions web page, which also needs to be adapted but has lower priority. During the transition, newly added campaigns that exist only in the new database structure may not appear there.

=== For Technical Users, API Consumers, and Data Harvesters ===
If you rely on PANGAEA’s XML metadata, OAI-PMH, or other API services, please take note of the following critical technical details:

* '''Schema Validation:''' While we have prioritized backwards compatibility, the transition involves changing XML element names (campaign names get labels). During this migration, records will not validate against our XML schema because they will contain a mix of old, redundant elements and new elements. We will announce separately when the XML files exposed by our APIs no longer contain the redundant compatibility elements.
* '''Harvesting Recommendations:''' If you are harvesting PANGAEA’s own metadata schema, we recommend '''temporarily pausing the retrieval of updates''' during this transition phase to avoid processing inconsistent records.
* '''Code and Script Adaptation:''' API users should prepare their scripts to interpret the new, more flexible hierarchical structure. Ensure that your parsers are resilient and do not fail when encountering new or unknown XML elements. Please adapt to new element names as soon as possible, as the old, redundant elements will be removed.
* '''OAI-PMH Consumers:''' Please verify that your harvesting pipelines can handle these schema shifts without breaking, particularly during the period when element names are in a state of flux.
* '''pangaeapy / pangaear Users:''' Both libraries should continue to work for searching and downloading data. We will adapt '''pangaeapy''' later to better reflect the new event-campaign structure. '''pangaear''' is a third-party product; to our knowledge it is not affected by the Event-Campaign Merge.

These changes represent a major step forward in making PANGAEA more versatile and future-proof.

Event-Campaign-Merge

2026-04-28T11:23:28Z

Uschindler: Prepare Event Campaign Merge

Data Access and Reuse

2026-04-22T18:39:41Z

Uschindler: /* The PANGAEA Data Warehouse */

This article describes the different methods available for discovering, accessing, and reusing data published in PANGAEA. It is intended for researchers and data practitioners who want to interact with PANGAEA data beyond the standard web interface, including scripted and automated workflows. The methods range from interactive web-based search to fully programmatic access via HTTP content negotiation and dedicated client libraries.

Hands-on training materials covering many of the topics below are available in the [[PANGAEA Community Workshops|PANGAEA Community Workshop Series]], including Jupyter notebooks, R scripts, and slide decks in the [https://github.com/pangaea-data-publisher/community-workshop-material PANGAEA community workshop GitHub repository].

== Overview ==
Every dataset published in PANGAEA is assigned a globally unique and persistent Digital Object Identifier (DOI). The DOI is the single entry point for all forms of programmatic access: it resolves to a dataset landing page that, in addition to its human-readable representation, exposes all available metadata formats and data download options through standard HTTP mechanisms. There is no separate PANGAEA data API — all data and metadata access is built on top of the DOI and its landing page using standard web protocols. This design makes PANGAEA data directly accessible without any vendor-specific API key or proprietary client, while remaining fully compatible with FAIR principles.

== Web-Based Search and Discovery ==

=== PANGAEA Search Interface ===
The primary discovery tool for PANGAEA data is the PANGAEA search engine at https://www.pangaea.de/, based on Elasticsearch. It supports full-text search and faceted filtering across all published metadata. Faceted navigation allows users to constrain results by topic, device type, geographic region, and temporal coverage. Documentation on search functionality and syntax is available in the Wiki at [[PANGAEA search]].

=== External Portals and Registries ===
PANGAEA metadata is harvested by a large number of disciplinary and generic portals via OAI-PMH, making datasets discoverable beyond the PANGAEA website through services such as Google Dataset Search, OpenAIRE, DataCite Commons, DataONE, GBIF, EMODnet, GFBio, and others. PANGAEA is registered in re3data.org, FAIRsharing.org, RIsources, and the EOSC Marketplace.

When a PANGAEA-specific search interface is not required, the '''DataCite Search API''' (https://support.datacite.org/docs/api) is an alternative: it searches across a large inventory of research data from many repositories. It returns summary metadata including DOI names, and subsequent data access for any PANGAEA result can then proceed via content negotiation as described below.
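
As a rough sketch of this workflow, the following Python snippet builds a DataCite query URL and extracts DOI names from a response. It assumes the public DataCite REST endpoint <code>https://api.datacite.org/dois</code> with its <code>query</code> and <code>page[size]</code> parameters, which are not described on this page; the sample response is a shortened, hypothetical stand-in for a real reply.

```python
import json
from urllib.parse import urlencode

# Assumption: DataCite REST API endpoint for DOI searches.
DATACITE_DOIS = "https://api.datacite.org/dois"


def search_url(query, page_size=10):
    """Build a DataCite search URL for a free-text query."""
    params = {"query": query, "page[size]": page_size}
    return f"{DATACITE_DOIS}?{urlencode(params)}"


def extract_dois(response_text):
    """Collect the DOI names from a JSON response body."""
    payload = json.loads(response_text)
    return [item["id"] for item in payload.get("data", [])]


# Hypothetical, heavily shortened response used only for illustration.
SAMPLE_RESPONSE = '{"data": [{"id": "10.1594/pangaea.841672", "type": "dois"}]}'

print(search_url("sea ice", page_size=5))
print(extract_dois(SAMPLE_RESPONSE))
```

Each returned DOI name can then be passed to the content negotiation examples further down this page.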

== DOI Landing Pages and Link Discovery ==

=== The Landing Page as Access Hub ===
Each PANGAEA dataset is represented by a landing page accessible at its DOI. For example:
https://doi.pangaea.de/10.1594/PANGAEA.841672
The landing page serves a dual function: it presents dataset metadata and a data preview in human-readable form, and it simultaneously exposes all available machine-readable representations of the same resource through standard HTTP link headers and HTML <code><link></code> elements. This implementation follows the '''Signposting standard''' (https://signposting.org/), which allows any HTTP client to discover all alternate representations of a dataset — metadata in various formats, the data file itself, and the ORCID iDs of the authors — without any prior knowledge of PANGAEA-specific URL patterns.

=== Discovering Available Representations ===
A simple HTTP HEAD request to the landing page returns the full set of typed link relations in the response header. The following example illustrates this using <code>curl</code>:
curl -LI https://doi.org/10.1594/PANGAEA.841672
The response <code>Link:</code> header will contain relations of the following types:

* '''<code>cite-as</code>''' — the canonical DOI citation URL
* '''<code>describedby</code>''' — links to metadata in various formats (ISO 19139, DataCite XML, PANGAEA XML, BibTeX, RIS, JSON-LD)
* '''<code>item</code>''' — links to the data itself (tab-delimited text, HTML view)
* '''<code>author</code>''' — ORCID iDs of the dataset authors

This mechanism allows scripts, harvesters, and other machine clients to discover and retrieve any representation of a dataset from its DOI alone.
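
To illustrate how a script might consume these relations, here is a minimal Python sketch that groups the targets of a <code>Link:</code> header by relation type. The sample header is a shortened, hypothetical value assembled from the URL patterns shown on this page, not captured server output, and the parser covers only the simple <code>&lt;url&gt;; rel="..."; type="..."</code> form.

```python
import re
from collections import defaultdict

# One link entry: <target> followed by any number of ;name=value parameters.
LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*(?:"[^"]*"|[^\s;,]+))*)')
PARAM_RE = re.compile(r';\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^\s;,]+))')


def parse_link_header(value):
    """Return a dict mapping each rel type to a list of {'href', 'type'} dicts."""
    links = defaultdict(list)
    for target, params in LINK_RE.findall(value):
        attrs = {}
        for name, quoted, bare in PARAM_RE.findall(params):
            attrs[name.lower()] = quoted or bare
        # A single link may carry several space-separated relation types.
        for rel in attrs.get("rel", "").split():
            links[rel].append({"href": target, "type": attrs.get("type")})
    return dict(links)


# Hypothetical sample header value for illustration only.
SAMPLE = (
    '<https://doi.org/10.1594/PANGAEA.841672>; rel="cite-as", '
    '<https://doi.pangaea.de/10.1594/PANGAEA.841672?format=metadata_jsonld>; '
    'rel="describedby"; type="application/ld+json", '
    '<https://doi.pangaea.de/10.1594/PANGAEA.841672?format=textfile>; '
    'rel="item"; type="text/tab-separated-values"'
)

links = parse_link_header(SAMPLE)
for rel, targets in links.items():
    print(rel, [t["href"] for t in targets])
```

A script can then pick, for example, the <code>item</code> link whose <code>type</code> matches the format it needs.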

== HTTP Content Negotiation ==
HTTP content negotiation allows a client to request a specific representation of a resource by specifying a MIME type in the <code>Accept</code> header. All PANGAEA dataset landing pages support this mechanism, so both data and metadata can be downloaded programmatically by querying the DOI directly with the appropriate content type — no PANGAEA-specific URL construction is required.

=== Downloading the Data File ===
The tabular data file for a dataset can be downloaded as tab-delimited text:
curl -OJLf 'https://doi.pangaea.de/10.1594/PANGAEA.841672?format=textfile'
Equivalently, using content negotiation directly against the DOI:
curl -OJLf -H 'Accept: text/tab-separated-values' https://doi.org/10.1594/PANGAEA.841672
The <code>-OJLf</code> flags instruct curl to save the file under the server-provided (i.e. meaningful) filename (<code>-O</code>, <code>-J</code>), follow redirects (<code>-L</code>, needed to resolve the DOI), and exit with a non-zero status on HTTP errors (<code>-f</code>), which is easier to handle in scripts than an HTML error page.
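
Once downloaded, the file can be read with standard tools. The following Python sketch assumes, and this is an assumption not stated on this page, that the text file may start with a <code>/* ... */</code> description block before the tab-delimited table; the sample content and column names are hypothetical.

```python
import csv
import io


def parse_tab_file(text):
    """Return (header, rows) from tab-delimited text.

    Assumption: an optional leading /* ... */ block holds descriptive
    metadata and is skipped before the table is parsed.
    """
    body = text
    if body.lstrip().startswith("/*"):
        end = body.find("*/")
        if end != -1:
            body = body[end + 2:]
    reader = csv.reader(
        io.StringIO(body.strip("\r\n")),
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # treat quote characters as ordinary text
    )
    rows = [row for row in reader if row]
    return rows[0], rows[1:]


# Hypothetical file content for illustration only.
SAMPLE = (
    "/* DATA DESCRIPTION:\n"
    "Citation:\tExample dataset\n"
    "*/\n"
    "Depth water [m]\tTemperature [degC]\n"
    "1.0\t8.5\n"
    "2.0\t8.1\n"
)

header, rows = parse_tab_file(SAMPLE)
print(header)
print(rows)
```

For analysis work, the client libraries described further down handle this parsing and return ready-to-use data frames.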

=== Retrieving Metadata in a Specific Format ===
The same mechanism applies to metadata. To retrieve a dataset's metadata in ISO 19139/19115 format:
curl -L -H 'Accept: application/vnd.iso19139.metadata+xml' https://doi.org/10.1594/PANGAEA.841672
The following MIME types are currently supported for metadata retrieval via content negotiation:
{| class="wikitable"
!Format
!MIME type
|-
|PANGAEA internal XML
|<code>application/vnd.pangaea.metadata+xml</code>
|-
|DataCite XML (v4)
|<code>application/vnd.datacite.datacite+xml</code>
|-
|ISO 19139 / 19115
|<code>application/vnd.iso19139.metadata+xml</code>
|-
|NASA DIF
|<code>application/vnd.nasa.dif-metadata+xml</code>
|-
|JSON-LD (Schema.org)
|<code>application/ld+json</code>
|-
|BibTeX
|<code>application/x-bibtex</code>
|-
|RIS
|<code>application/x-research-info-systems</code>
|-
|Plain text citation
|<code>text/x-bibliography</code>
|-
|Tab-separated data
|<code>text/tab-separated-values</code>
|}
Alternatively, the same formats can be requested using explicit URL parameters:
https://doi.pangaea.de/10.1594/PANGAEA.841672?format=metadata_datacite4
https://doi.pangaea.de/10.1594/PANGAEA.841672?format=metadata_iso19139
https://doi.pangaea.de/10.1594/PANGAEA.841672?format=metadata_jsonld
https://doi.pangaea.de/10.1594/PANGAEA.841672?format=citation_bibtex
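
The same requests can be issued from a script. The sketch below uses only the Python standard library and the MIME types from the table above; the short format keys (<code>jsonld</code>, <code>bibtex</code>, and so on) and the function names are hypothetical conveniences, not part of any PANGAEA interface.

```python
import urllib.request

# MIME types copied from the table above; the dictionary keys are made up here.
MIME_TYPES = {
    "pangaea": "application/vnd.pangaea.metadata+xml",
    "datacite": "application/vnd.datacite.datacite+xml",
    "iso19139": "application/vnd.iso19139.metadata+xml",
    "dif": "application/vnd.nasa.dif-metadata+xml",
    "jsonld": "application/ld+json",
    "bibtex": "application/x-bibtex",
    "ris": "application/x-research-info-systems",
    "citation": "text/x-bibliography",
    "data": "text/tab-separated-values",
}


def build_request(doi, fmt):
    """Create a request for the DOI that asks for the given representation."""
    return urllib.request.Request(
        f"https://doi.org/{doi}",
        headers={"Accept": MIME_TYPES[fmt]},
    )


def fetch(doi, fmt):
    """Send the request (redirects are followed) and return the raw bytes."""
    with urllib.request.urlopen(build_request(doi, fmt)) as response:
        return response.read()


request = build_request("10.1594/PANGAEA.841672", "jsonld")
print(request.full_url, request.get_header("Accept"))
```

Calling <code>fetch("10.1594/PANGAEA.841672", "bibtex")</code> would then perform the actual download; it is not executed here because it needs network access.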

== Access to Restricted Datasets ==
A small fraction of PANGAEA datasets are under an active moratorium: access to the data is temporarily restricted while the metadata remains publicly visible. Accessing your own datasets under moratorium requires authentication with a '''bearer token''', the standard mechanism for API authorization (RFC 6750). A token does not grant access to protected datasets of other authors.

To obtain your bearer token, log in to PANGAEA. Login is supported both with a PANGAEA username and password, and via ORCID iD. After logging in, the [https://pangaea.de/user/ user profile page] displays the current session token under "Your temporary login token".

'''Please note:''' The bearer token is private and should be treated like a password — it must not be shared with others or included in publicly accessible scripts or repositories. If a token has been accidentally disclosed, the user profile page at https://pangaea.de/user/ provides a "Log out from all devices" option, which immediately invalidates all active tokens. When using content negotiation via the DOI resolver at https://doi.org/, it is safe to include the Authorization header: PANGAEA trusts the DOI Foundation's infrastructure, and the request is redirected to PANGAEA's own servers before any protected content is served.

The token can be passed as an <code>Authorization: Bearer</code> header in any HTTP request:
curl -OJLf -H 'Authorization: Bearer <your-token>' -H 'Accept: text/tab-separated-values' https://doi.org/10.1594/PANGAEA.841672
If a request to a restricted dataset is made without a token, the server responds with HTTP status <code>401</code> and a <code>Bearer</code> authentication challenge, signaling that authentication is required. Recent versions of curl can also supply the token with a dedicated flag (<code>--oauth2-bearer</code>). The command then becomes:
curl -OJLf --oauth2-bearer '<your-token>' -H 'Accept: text/tab-separated-values' https://doi.org/10.1594/PANGAEA.841672
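
In scripts, the token is best kept out of the source code. The following Python sketch reads it from an environment variable and turns a <code>401</code> response into a clear error. The variable name <code>PANGAEA_TOKEN</code> and the function names are hypothetical; only the header format follows the curl examples above.

```python
import os
import urllib.error
import urllib.request


def build_authorized_request(doi, accept="text/tab-separated-values",
                             env_var="PANGAEA_TOKEN"):
    """Create a request that carries the bearer token from the environment."""
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable to your bearer token")
    return urllib.request.Request(
        f"https://doi.org/{doi}",
        headers={"Accept": accept, "Authorization": f"Bearer {token}"},
    )


def download(request):
    """Send the request and return the body; report a 401 as a permission problem."""
    try:
        with urllib.request.urlopen(request) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        if err.code == 401:
            raise PermissionError(
                "Token missing, expired, or not valid for this dataset"
            ) from err
        raise


# Placeholder value for demonstration; a real token comes from the user profile page.
os.environ.setdefault("PANGAEA_TOKEN", "placeholder-token")
request = build_authorized_request("10.1594/PANGAEA.841672")
print(request.get_header("Accept"))
```

Keeping the token in the environment means the script itself can be shared or committed to a repository without exposing the credential.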

== OAI-PMH Metadata Harvesting ==
For bulk metadata harvesting, PANGAEA provides an OAI-PMH endpoint at:
https://ws.pangaea.de/oai/
OAI-PMH allows systematic, incremental harvesting of all PANGAEA metadata in supported formats. The following metadata standards are available via OAI-PMH: Dublin Core, DataCite v3 and v4, ISO 19139, and DIF (Directory Interchange Format). This endpoint is used by the portals and registries listed in the discovery section above, and is equally available to any user who wishes to build their own index or integrate PANGAEA metadata into an institutional system.
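
Incremental harvesting can be sketched in Python as follows. The request parameters (<code>verb</code>, <code>metadataPrefix</code>, <code>from</code>, <code>resumptionToken</code>), the <code>oai_dc</code> prefix for Dublin Core, and the XML namespace are defined by the OAI-PMH protocol itself. Using the URL above directly as the request base is an assumption, as is the sample response; check the endpoint's <code>Identify</code> and <code>ListMetadataFormats</code> responses for the actual base URL and prefixes.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc",
                     from_date=None, resumption_token=None):
    """Build a ListRecords request URL.

    When a resumption token is given, the protocol requires that it is the
    only argument besides the verb.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


def next_token(xml_text):
    """Return the resumption token of a response page, or None on the last page."""
    root = ET.fromstring(xml_text)
    element = root.find(".//oai:resumptionToken", OAI_NS)
    if element is None or not element.text or not element.text.strip():
        return None
    return element.text.strip()


# Hypothetical, heavily shortened response page for illustration only.
SAMPLE_PAGE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <resumptionToken>hypothetical-token-123</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

first_url = list_records_url("https://ws.pangaea.de/oai/", from_date="2026-04-01")
token = next_token(SAMPLE_PAGE)
print(first_url)
print(list_records_url("https://ws.pangaea.de/oai/", resumption_token=token))
```

A harvester repeats the request with each returned token until <code>next_token</code> yields <code>None</code>, then stores the harvest date to use as <code>from</code> in the next run.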

== Programmatic Access with Python: pangaeapy ==
'''pangaeapy''' is the official Python client library for PANGAEA, developed and maintained by the PANGAEA team. It provides a high-level interface for loading and analyzing PANGAEA datasets directly into native Python data structures, without requiring manual HTTP requests or format parsing.

* PyPI package: https://pypi.org/project/pangaeapy/
* Source code: https://github.com/pangaea-data-publisher/pangaeapy
* Introduction slides: https://github.com/pangaea-data-publisher/community-workshop-material/blob/master/Introduction_to_pangaeapy.pdf

=== Installation ===
pip install pangaeapy

=== Loading a Dataset ===
The central object in pangaeapy is <code>PanDataSet</code>, which takes a DOI or PANGAEA dataset ID and retrieves both the data and the associated metadata:
from pangaeapy import PanDataSet

ds = PanDataSet('10.1594/PANGAEA.841672')

# Access the data as a pandas DataFrame
print(ds.data.head())

# Access dataset metadata
print(ds.title)
print(ds.authors)
print(ds.parameters) # list of measured parameters with units and methods

The <code>data</code> attribute is a pandas DataFrame in which each column corresponds to a measured parameter, and each row to a measurement. Parameter metadata — including units, standard names, and method descriptions — is accessible through the <code>parameters</code> attribute.

=== Searching PANGAEA from Python ===
pangaeapy also provides a <code>PanQuery</code> class for querying the PANGAEA search engine programmatically. Note that the underlying search API used by pangaeapy is currently internal and undocumented; an official, publicly documented Search REST API is under development. Until that is available, pangaeapy offers the most straightforward path to programmatic search for Python users:
from pangaeapy import PanQuery

q = PanQuery('temperature salinity Atlantic', limit=10)
for result in q.results:
    print(result['doi'], result['title'])

=== Access to Restricted Datasets ===
A bearer token obtained from the PANGAEA user profile can be passed to <code>PanDataSet</code> to enable access to your own datasets that are under moratorium (not those of other authors):
ds = PanDataSet('10.1594/PANGAEA.841672', token='<your-bearer-token>')

=== Further Training Materials ===
Jupyter notebooks with worked examples for data discovery, loading, and analysis with pangaeapy are available in the PANGAEA Community Workshop GitHub repository at https://github.com/pangaea-data-publisher/community-workshop-material (see the <code>Python/</code> directory).

== Programmatic Access with R: pangaear ==
'''pangaear''' is a community-developed R client for PANGAEA, maintained as part of the rOpenSci ecosystem. It provides equivalent functionality to pangaeapy for R users.

* CRAN: https://cran.r-project.org/package=pangaear
* GitHub: https://github.com/ropensci/pangaear
* Introduction slides: https://github.com/pangaea-data-publisher/community-workshop-material/blob/master/Intro_PangaeaR.pdf

=== Installation ===
install.packages("pangaear")
# or the development version:
# devtools::install_github("ropensci/pangaear")

=== Loading a Dataset ===
library(pangaear)

# Download and load a dataset by DOI
ds <- pg_data(doi = '10.1594/PANGAEA.841672')

# Access the data as a data frame
head(ds[[1]]$data)

# Metadata is accessible from the list object
ds[[1]]$metadata

=== Searching PANGAEA from R ===
res <- pg_search(query = 'temperature salinity Atlantic', count = 10)
print(res$doi)
R scripts and worked examples are available in the <code>R/</code> directory of the PANGAEA Community Workshop GitHub repository.

== The PANGAEA Data Warehouse ==
The PANGAEA data warehouse (based on ClickHouse) provides a complementary access path for users who need to compile and aggregate data across large numbers of datasets, potentially hundreds of thousands of individual publications, without having to download and merge files manually.

The data warehouse supports spatially and chronologically constrained queries at the parameter level, returning data together with the DOI of each contributing dataset to maintain full provenance traceability. It is accessible in two ways:

'''Through the PANGAEA website:''' The data warehouse interface is integrated into the search results page. After performing a search, users can select "Data Warehouse" to configure and download a parameter-level aggregation from all datasets in the result set.

Documentation: https://wiki.pangaea.de/wiki/Data_warehouse

'''Programmatically via SOAP API:''' The data warehouse is also accessible through the PANGAEA web services at https://ws.pangaea.de/. This allows automated, scripted aggregations and is supported by both pangaeapy and pangaear.

Note that data warehouse exports represent compiled data products and do not replace the need to consult individual dataset landing pages to assess the fitness of individual studies for a specific scientific application. Every export includes the DOI name for each data point, ensuring that citation and provenance are preserved.

== Summary of Access Methods ==
{| class="wikitable"
!Use case
!Method
!Entry point
|-
|Interactive discovery
|PANGAEA search
|https://www.pangaea.de/
|-
|Single dataset access
|Browser / landing page
|https://doi.pangaea.de/10.1594/PANGAEA.xxxxx
|-
|Programmatic data download
|HTTP content negotiation
|<code>curl -H 'Accept: text/tab-separated-values' https://doi.org/...</code>
|-
|Programmatic metadata download
|HTTP content negotiation
|<code>curl -H 'Accept: application/ld+json' https://doi.org/...</code>
|-
|Link and format discovery
|HTTP HEAD / Signposting
|<code>curl -I https://doi.pangaea.de/...</code>
|-
|Restricted dataset access (to your own publications)
|Bearer token authentication
|retrievable via https://pangaea.de/user/
|-
|Bulk metadata harvesting
|OAI-PMH
|https://ws.pangaea.de/oai/
|-
|Scripted data access (Python)
|pangaeapy
|https://pypi.org/project/pangaeapy/
|-
|Scripted data access (R)
|pangaear
|https://cran.r-project.org/package=pangaear
|-
|Cross-dataset aggregation
|Data warehouse
|https://wiki.pangaea.de/wiki/Data_warehouse
|-
|Discovery across repositories
|DataCite Search API
|https://support.datacite.org/docs/api
|}
== Further Resources ==

* [[PANGAEA search]] — documentation on search syntax and faceted filters
* [[Data warehouse]] — data warehouse documentation
* [[PANGAEA Community Workshops]] — hands-on training workshops on finding and using PANGAEA data
* [https://github.com/pangaea-data-publisher/community-workshop-material PANGAEA Community Workshop GitHub] — Jupyter notebooks, R scripts, and slide decks
* [https://pypi.org/project/pangaeapy/ pangaeapy on PyPI] — Python client
* [https://cran.r-project.org/package=pangaear pangaear on CRAN] — R client
* [https://ws.pangaea.de/oai/ PANGAEA web services] — REST and OAI-PMH endpoints
* [https://signposting.org/ Signposting standard] — link discovery for FAIR data

Data Access and Reuse

2026-04-22T18:31:20Z

Uschindler: fix code formatting

This article describes the different methods available for discovering, accessing, and reusing data published in PANGAEA. It is intended for researchers and data practitioners who want to interact with PANGAEA data beyond the standard web interface, including scripted and automated workflows. The methods range from interactive web-based search to fully programmatic access via HTTP content negotiation and dedicated client libraries.

Hands-on training materials covering many of the topics below are available in the [[PANGAEA Community Workshops|PANGAEA Community Workshop Series]], including Jupyter notebooks, R scripts, and slide decks in the [https://github.com/pangaea-data-publisher/community-workshop-material PANGAEA community workshop GitHub repository].

== Overview ==
Every dataset published in PANGAEA is assigned a globally unique and persistent Digital Object Identifier (DOI). The DOI is the single entry point for all forms of programmatic access: it resolves to a dataset landing page that, in addition to its human-readable representation, exposes all available metadata formats and data download options through standard HTTP mechanisms. There is no separate PANGAEA data API — all data and metadata access is built on top of the DOI and its landing page using standard web protocols. This design makes PANGAEA data directly accessible without any vendor-specific API key or proprietary client, while remaining fully compatible with FAIR principles.

== Web-Based Search and Discovery ==

=== PANGAEA Search Interface ===
The primary discovery tool for PANGAEA data is the PANGAEA search engine at https://www.pangaea.de/, based on Elasticsearch. It supports full-text search and faceted filtering across all published metadata. Faceted navigation allows users to constrain results by topic, device type, geographic region, and temporal coverage. Documentation on search functionality and syntax is available in the Wiki at [[PANGAEA search]].

=== External Portals and Registries ===
PANGAEA metadata is harvested by a large number of disciplinary and generic portals via OAI-PMH, making datasets discoverable beyond the PANGAEA website through services such as Google Dataset Search, OpenAIRE, DataCite Commons, DataONE, GBIF, EMODnet, GFBio, and others. PANGAEA is registered in re3data.org, FAIRsharing.org, RIsources, and the EOSC Marketplace.

When a PANGAEA-specific search interface is not required, the '''DataCite Search API''' (https://support.datacite.org/docs/api) is an alternative: it searches across a large inventory of research data from many repositories. It returns summary metadata including DOI names, and subsequent data access for any PANGAEA result can then proceed via content negotiation as described below.

== DOI Landing Pages and Link Discovery ==

=== The Landing Page as Access Hub ===
Each PANGAEA dataset is represented by a landing page accessible at its DOI. For example:
https://doi.pangaea.de/10.1594/PANGAEA.841672
The landing page serves a dual function: it presents dataset metadata and a data preview in human-readable form, and it simultaneously exposes all available machine-readable representations of the same resource through standard HTTP link headers and HTML <code><link></code> elements. This implementation follows the '''Signposting standard''' (https://signposting.org/), which allows any HTTP client to discover all alternate representations of a dataset — metadata in various formats, the data file itself, and the ORCID iDs of the authors — without any prior knowledge of PANGAEA-specific URL patterns.

=== Discovering Available Representations ===
A simple HTTP HEAD request to the landing page returns the full set of typed link relations in the response header. The following example illustrates this using <code>curl</code>:
curl -LI https://doi.org/10.1594/PANGAEA.841672
The response <code>Link:</code> header will contain relations of the following types:

* '''<code>cite-as</code>''' — the canonical DOI citation URL
* '''<code>describedby</code>''' — links to metadata in various formats (ISO 19139, DataCite XML, PANGAEA XML, BibTeX, RIS, JSON-LD)
* '''<code>item</code>''' — links to the data itself (tab-delimited text, HTML view)
* '''<code>author</code>''' — ORCID iDs of the dataset authors

This mechanism allows scripts, harvesters, and other machine clients to discover and retrieve any representation of a dataset from its DOI alone.

== HTTP Content Negotiation ==
HTTP content negotiation allows a client to request a specific representation of a resource by specifying a MIME type in the <code>Accept</code> header. All PANGAEA dataset landing pages support this mechanism, so both data and metadata can be downloaded programmatically by querying the DOI directly with the appropriate content type — no PANGAEA-specific URL construction is required.

=== Downloading the Data File ===
The tabular data file for a dataset can be downloaded as tab-delimited text:
curl -OJLf 'https://doi.pangaea.de/10.1594/PANGAEA.841672?format=textfile'
Equivalently, using content negotiation directly against the DOI:
curl -OJLf -H 'Accept: text/tab-separated-values' https://doi.org/10.1594/PANGAEA.841672
The <code>-OJLf</code> flags instruct curl to save the file under the server-provided (i.e. meaningful) filename (<code>-O</code>, <code>-J</code>), follow redirects (<code>-L</code>, needed to resolve the DOI), and exit with a non-zero status on HTTP errors (<code>-f</code>), which is easier to handle in scripts than an HTML error page.

=== Retrieving Metadata in a Specific Format ===
The same mechanism applies to metadata. To retrieve a dataset's metadata in ISO 19139/19115 format:
curl -L -H 'Accept: application/vnd.iso19139.metadata+xml' https://doi.org/10.1594/PANGAEA.841672
The following MIME types are currently supported for metadata retrieval via content negotiation:
{| class="wikitable"
!Format
!MIME type
|-
|PANGAEA internal XML
|<code>application/vnd.pangaea.metadata+xml</code>
|-
|DataCite XML (v4)
|<code>application/vnd.datacite.datacite+xml</code>
|-
|ISO 19139 / 19115
|<code>application/vnd.iso19139.metadata+xml</code>
|-
|NASA DIF
|<code>application/vnd.nasa.dif-metadata+xml</code>
|-
|JSON-LD (Schema.org)
|<code>application/ld+json</code>
|-
|BibTeX
|<code>application/x-bibtex</code>
|-
|RIS
|<code>application/x-research-info-systems</code>
|-
|Plain text citation
|<code>text/x-bibliography</code>
|-
|Tab-separated data
|<code>text/tab-separated-values</code>
|}
Alternatively, the same formats can be requested using explicit URL parameters:
https://doi.pangaea.de/10.1594/PANGAEA.841672?format=metadata_datacite4
https://doi.pangaea.de/10.1594/PANGAEA.841672?format=metadata_iso19139
https://doi.pangaea.de/10.1594/PANGAEA.841672?format=metadata_jsonld
https://doi.pangaea.de/10.1594/PANGAEA.841672?format=citation_bibtex

== Access to Restricted Datasets ==
A small fraction of PANGAEA datasets are under an active moratorium: access to the data is temporarily restricted while the metadata remains publicly visible. Accessing your own datasets under moratorium requires authentication with a '''bearer token''', the standard mechanism for API authorization (RFC 6750). A token does not grant access to protected datasets of other authors.

To obtain your bearer token, log in to PANGAEA. Login is supported both with a PANGAEA username and password, and via ORCID iD. After logging in, the [https://pangaea.de/user/ user profile page] displays the current session token under "Your temporary login token".

'''Please note:''' The bearer token is private and should be treated like a password — it must not be shared with others or included in publicly accessible scripts or repositories. If a token has been accidentally disclosed, the user profile page at https://pangaea.de/user/ provides a "Log out from all devices" option, which immediately invalidates all active tokens. When using content negotiation via the DOI resolver at https://doi.org/, it is safe to include the Authorization header: PANGAEA trusts the DOI Foundation's infrastructure, and the request is redirected to PANGAEA's own servers before any protected content is served.

The token can be passed as an <code>Authorization: Bearer</code> header in any HTTP request:
curl -OJLf -H 'Authorization: Bearer <your-token>' -H 'Accept: text/tab-separated-values' https://doi.org/10.1594/PANGAEA.841672
If a request to a restricted dataset is made without a token, the server responds with HTTP status <code>401</code> and a <code>Bearer</code> authentication challenge, signaling that authentication is required. Recent versions of curl can also supply the token with a dedicated flag (<code>--oauth2-bearer</code>). The command then becomes:
curl -OJLf --oauth2-bearer '<your-token>' -H 'Accept: text/tab-separated-values' https://doi.org/10.1594/PANGAEA.841672

== OAI-PMH Metadata Harvesting ==
For bulk metadata harvesting, PANGAEA provides an OAI-PMH endpoint at:
https://ws.pangaea.de/oai/
OAI-PMH allows systematic, incremental harvesting of all PANGAEA metadata in supported formats. The following metadata standards are available via OAI-PMH: Dublin Core, DataCite v3 and v4, ISO 19139, and DIF (Directory Interchange Format). This endpoint is used by the portals and registries listed in the discovery section above, and is equally available to any user who wishes to build their own index or integrate PANGAEA metadata into an institutional system.

== Programmatic Access with Python: pangaeapy ==
'''pangaeapy''' is the official Python client library for PANGAEA, developed and maintained by the PANGAEA team. It provides a high-level interface for loading and analyzing PANGAEA datasets directly into native Python data structures, without requiring manual HTTP requests or format parsing.

* PyPI package: https://pypi.org/project/pangaeapy/
* Source code: https://github.com/pangaea-data-publisher/pangaeapy
* Introduction slides: https://github.com/pangaea-data-publisher/community-workshop-material/blob/master/Introduction_to_pangaeapy.pdf

=== Installation ===
pip install pangaeapy

=== Loading a Dataset ===
The central object in pangaeapy is <code>PanDataSet</code>, which takes a DOI or PANGAEA dataset ID and retrieves both the data and the associated metadata:
from pangaeapy import PanDataSet

ds = PanDataSet('10.1594/PANGAEA.841672')

# Access the data as a pandas DataFrame
print(ds.data.head())

# Access dataset metadata
print(ds.title)
print(ds.authors)
print(ds.parameters) # list of measured parameters with units and methods

The <code>data</code> attribute is a pandas DataFrame in which each column corresponds to a measured parameter, and each row to a measurement. Parameter metadata — including units, standard names, and method descriptions — is accessible through the <code>parameters</code> attribute.

=== Searching PANGAEA from Python ===
pangaeapy also provides a <code>PanQuery</code> class for querying the PANGAEA search engine programmatically. Note that the underlying search API used by pangaeapy is currently internal and undocumented; an official, publicly documented Search REST API is under development. Until that is available, pangaeapy offers the most straightforward path to programmatic search for Python users:
from pangaeapy import PanQuery

q = PanQuery('temperature salinity Atlantic', limit=10)
for result in q.results:
    print(result['doi'], result['title'])

=== Access to Restricted Datasets ===
A bearer token obtained from the PANGAEA user profile can be passed to <code>PanDataSet</code> to enable access to your own datasets that are under moratorium (not those of other authors):
ds = PanDataSet('10.1594/PANGAEA.841672', token='<your-bearer-token>')

=== Further Training Materials ===
Jupyter notebooks with worked examples for data discovery, loading, and analysis with pangaeapy are available in the PANGAEA Community Workshop GitHub repository at https://github.com/pangaea-data-publisher/community-workshop-material (see the <code>Python/</code> directory).

== Programmatic Access with R: pangaear ==
'''pangaear''' is a community-developed R client for PANGAEA, maintained as part of the rOpenSci ecosystem. It provides equivalent functionality to pangaeapy for R users.

* CRAN: https://cran.r-project.org/package=pangaear
* GitHub: https://github.com/ropensci/pangaear
* Introduction slides: https://github.com/pangaea-data-publisher/community-workshop-material/blob/master/Intro_PangaeaR.pdf

=== Installation ===
install.packages("pangaear")
# or the development version:
# devtools::install_github("ropensci/pangaear")

=== Loading a Dataset ===
library(pangaear)

# Download and load a dataset by DOI
ds <- pg_data(doi = '10.1594/PANGAEA.841672')

# Access the data as a data frame
head(ds[[1]]$data)

# Metadata is accessible from the list object
ds[[1]]$metadata

=== Searching PANGAEA from R ===
res <- pg_search(query = 'temperature salinity Atlantic', count = 10)
print(res$doi)
R scripts and worked examples are available in the <code>R/</code> directory of the PANGAEA Community Workshop GitHub repository.

== The PANGAEA Data Warehouse ==
The PANGAEA data warehouse (based on ClickHouse) provides a complementary access path for users who need to compile and aggregate data across large numbers of datasets, potentially hundreds of thousands of individual publications, without having to download and merge files manually.

The data warehouse supports spatially and chronologically constrained queries at the parameter level, returning data together with the DOI of each contributing dataset to maintain full provenance traceability. It is accessible in two ways:

'''Through the PANGAEA website:''' The data warehouse interface is integrated into the search results page. After performing a search, users can select "Data Warehouse" to configure and download a parameter-level aggregation from all datasets in the result set.

Documentation: https://wiki.pangaea.de/wiki/Data_warehouse

'''Programmatically via REST API:''' The data warehouse is also accessible through the PANGAEA web services at https://ws.pangaea.de/. This allows automated, scripted aggregations and is supported by both pangaeapy and pangaear.

Note that data warehouse exports represent compiled data products and do not replace the need to consult individual dataset landing pages to assess the fitness of individual studies for a specific scientific application. Every export includes the DOI name for each data point, ensuring that citation and provenance are preserved.

== Summary of Access Methods ==
{| class="wikitable"
!Use case
!Method
!Entry point
|-
|Interactive discovery
|PANGAEA search
|https://www.pangaea.de/
|-
|Single dataset access
|Browser / landing page
|https://doi.pangaea.de/10.1594/PANGAEA.xxxxx
|-
|Programmatic data download
|HTTP content negotiation
|<code>curl -H 'Accept: text/tab-separated-values' https://doi.org/...</code>
|-
|Programmatic metadata download
|HTTP content negotiation
|<code>curl -H 'Accept: application/ld+json' https://doi.org/...</code>
|-
|Link and format discovery
|HTTP HEAD / Signposting
|<code>curl -I https://doi.pangaea.de/...</code>
|-
|Restricted dataset access (to your own publications)
|Bearer token authentication
|retrievable via https://pangaea.de/user/
|-
|Bulk metadata harvesting
|OAI-PMH
|https://ws.pangaea.de/oai/
|-
|Scripted data access (Python)
|pangaeapy
|https://pypi.org/project/pangaeapy/
|-
|Scripted data access (R)
|pangaear
|https://cran.r-project.org/package=pangaear
|-
|Cross-dataset aggregation
|Data warehouse
|https://wiki.pangaea.de/wiki/Data_warehouse
|-
|Discovery across repositories
|DataCite Search API
|https://support.datacite.org/docs/api
|}
== Further Resources ==

* [[PANGAEA search]] — documentation on search syntax and faceted filters
* [[Data warehouse]] — data warehouse documentation
* [[PANGAEA Community Workshops]] — hands-on training workshops on finding and using PANGAEA data
* [https://github.com/pangaea-data-publisher/community-workshop-material PANGAEA Community Workshop GitHub] — Jupyter notebooks, R scripts, and slide decks
* [https://pypi.org/project/pangaeapy/ pangaeapy on PyPI] — Python client
* [https://cran.r-project.org/package=pangaear pangaear on CRAN] — R client
* [https://ws.pangaea.de/oai/ PANGAEA web services] — REST and OAI-PMH endpoints
* [https://signposting.org/ Signposting standard] — link discovery for FAIR data

Data Access and Reuse

2026-04-22T18:28:27Z

Uschindler: /* Access to Restricted Datasets */

This article describes the different methods available for discovering, accessing, and reusing data published in PANGAEA. It is intended for researchers and data practitioners who want to interact with PANGAEA data beyond the standard web interface, including scripted and automated workflows. The methods range from interactive web-based search to fully programmatic access via HTTP content negotiation and dedicated client libraries.

Hands-on training materials covering many of the topics below are available in the [[PANGAEA Community Workshops|PANGAEA Community Workshop Series]], including Jupyter notebooks, R scripts, and slide decks in the [https://github.com/pangaea-data-publisher/community-workshop-material PANGAEA community workshop GitHub repository].

== Overview ==
Every dataset published in PANGAEA is assigned a globally unique and persistent Digital Object Identifier (DOI). The DOI is the single entry point for all forms of programmatic access: it resolves to a dataset landing page that, in addition to its human-readable representation, exposes all available metadata formats and data download options through standard HTTP mechanisms. There is no separate PANGAEA data API — all data and metadata access is built on top of the DOI and its landing page using standard web protocols. This design makes PANGAEA data directly accessible without any vendor-specific API key or proprietary client, while remaining fully compatible with FAIR principles.

== Web-Based Search and Discovery ==

=== PANGAEA Search Interface ===
The primary discovery tool for PANGAEA data is the PANGAEA search engine at https://www.pangaea.de/, based on Elasticsearch. It supports full-text search and faceted filtering across all published metadata. Faceted navigation allows users to constrain results by topic, device type, geographic region, and temporal coverage. Documentation on search functionality and syntax is available in the Wiki at [[PANGAEA search]].

=== External Portals and Registries ===
PANGAEA metadata is harvested by a large number of disciplinary and generic portals via OAI-PMH, making datasets discoverable beyond the PANGAEA website through services such as Google Dataset Search, OpenAIRE, DataCite Commons, DataONE, GBIF, EMODnet, GFBio, and others. PANGAEA is registered in re3data.org, FAIRsharing.org, RIsources, and the EOSC Marketplace.

When a PANGAEA-specific search interface is not required, the '''DataCite Search API''' (https://support.datacite.org/docs/api) is an alternative: it searches research data from many repositories and returns summary metadata including DOI names. Data access for any PANGAEA result can then proceed via content negotiation as described below.

== DOI Landing Pages and Link Discovery ==

=== The Landing Page as Access Hub ===
Each PANGAEA dataset is represented by a landing page accessible at its DOI. For example:
https://doi.pangaea.de/10.1594/PANGAEA.841672
The landing page serves a dual function: it presents dataset metadata and a data preview in human-readable form, and it simultaneously exposes all available machine-readable representations of the same resource through standard HTTP link headers and HTML <code><link></code> elements. This implementation follows the '''Signposting standard''' (https://signposting.org/), which allows any HTTP client to discover all alternate representations of a dataset — metadata in various formats, the data file itself, and the ORCID iDs of the authors — without any prior knowledge of PANGAEA-specific URL patterns.

=== Discovering Available Representations ===
A simple HTTP HEAD request to the landing page returns the full set of typed link relations in the response header. The following example illustrates this using <code>curl</code>:
curl -LI https://doi.org/10.1594/PANGAEA.841672
The response <code>Link:</code> header will contain relations of the following types:

* '''<code>cite-as</code>''' — the canonical DOI citation URL
* '''<code>describedby</code>''' — links to metadata in various formats (ISO 19139, DataCite XML, PANGAEA XML, BibTeX, RIS, JSON-LD)
* '''<code>item</code>''' — links to the data itself (tab-delimited text, HTML view)
* '''<code>author</code>''' — ORCID iDs of the dataset authors

This mechanism allows scripts, harvesters, and other machine clients to discover and retrieve any representation of a dataset from its DOI alone.
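As a minimal sketch of how a script might consume these relations, the following Python snippet groups the targets of a <code>Link:</code> header by relation type. The function name <code>parse_link_header</code> and the sample header value are illustrative assumptions (the ORCID iD is a placeholder), not actual PANGAEA output, and the parser only handles the simple <code>&lt;target&gt;; name="value"</code> form.

```python
import re
from collections import defaultdict


def parse_link_header(value):
    """Group the targets of an HTTP Link header by relation type."""
    links = defaultdict(list)
    # Each link-value is "<target>" followed by its ";name=value" parameters.
    for target, params in re.findall(r'<([^>]+)>([^<]*)', value):
        attrs = {}
        for name, quoted, bare in re.findall(
            r';\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^\s;,]+))', params
        ):
            attrs[name.lower()] = quoted or bare
        # A rel attribute may list several space-separated relation types.
        for rel in attrs.get("rel", "").split():
            links[rel].append({"href": target, "type": attrs.get("type")})
    return dict(links)


# Hypothetical header value, shaped like the relation types listed above.
sample = (
    '<https://doi.org/10.1594/PANGAEA.841672>; rel="cite-as", '
    '<https://doi.pangaea.de/10.1594/PANGAEA.841672?format=metadata_jsonld>; '
    'rel="describedby"; type="application/ld+json", '
    '<https://doi.pangaea.de/10.1594/PANGAEA.841672?format=textfile>; '
    'rel="item"; type="text/tab-separated-values", '
    '<https://orcid.org/0000-0000-0000-0000>; rel="author"'
)

links = parse_link_header(sample)
for rel, targets in links.items():
    print(rel, [t["href"] for t in targets])
```

In a real script the header value would come from the response to the HEAD request shown above; production code may prefer an HTTP library that already parses <code>Link</code> headers.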

== HTTP Content Negotiation ==
HTTP content negotiation allows a client to request a specific representation of a resource by specifying a MIME type in the <code>Accept</code> header. All PANGAEA dataset landing pages support this mechanism, so both data and metadata can be downloaded programmatically by querying the DOI directly with the appropriate content type — no PANGAEA-specific URL construction is required.

=== Downloading the Data File ===
The tabular data file for a dataset can be downloaded as tab-delimited text:
curl -OJLf 'https://doi.pangaea.de/10.1594/PANGAEA.841672?format=textfile'
Equivalently, using content negotiation directly against the DOI:
curl -OJLf -H 'Accept: text/tab-separated-values' https://doi.org/10.1594/PANGAEA.841672
The <code>-OJLf</code> flags tell curl to save the file under the server-provided, meaningful filename (<code>-O</code>, <code>-J</code>), to follow redirects so that the DOI resolves (<code>-L</code>), and to exit with a non-zero status on HTTP errors (<code>-f</code>), which is easier to handle in scripts than parsing an HTML error page.

=== Retrieving Metadata in a Specific Format ===
The same mechanism applies to metadata. To retrieve a dataset's metadata in ISO 19139/19115 format:
curl -L -H 'Accept: application/vnd.iso19139.metadata+xml' https://doi.org/10.1594/PANGAEA.841672
The following MIME types are currently supported for metadata retrieval via content negotiation:
{| class="wikitable"
!Format
!MIME type
|-
|PANGAEA internal XML
|<code>application/vnd.pangaea.metadata+xml</code>
|-
|DataCite XML (v4)
|<code>application/vnd.datacite.datacite+xml</code>
|-
|ISO 19139 / 19115
|<code>application/vnd.iso19139.metadata+xml</code>
|-
|NASA DIF
|<code>application/vnd.nasa.dif-metadata+xml</code>
|-
|JSON-LD (Schema.org)
|<code>application/ld+json</code>
|-
|BibTeX
|<code>application/x-bibtex</code>
|-
|RIS
|<code>application/x-research-info-systems</code>
|-
|Plain text citation
|<code>text/x-bibliography</code>
|-
|Tab-separated data
|<code>text/tab-separated-values</code>
|}
Alternatively, the same formats can be requested using explicit URL parameters:
https://doi.pangaea.de/10.1594/PANGAEA.841672?format=metadata_datacite4
https://doi.pangaea.de/10.1594/PANGAEA.841672?format=metadata_iso19139
https://doi.pangaea.de/10.1594/PANGAEA.841672?format=metadata_jsonld
https://doi.pangaea.de/10.1594/PANGAEA.841672?format=citation_bibtex
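The content negotiation requests above can also be built from Python's standard library. The sketch below only constructs the request and leaves sending it to the caller. The short format keys and the helper name <code>negotiation_request</code> are illustrative assumptions; the MIME types are copied from the table above.

```python
import urllib.request

# MIME types copied from the table above; the short keys are illustrative.
MIME_TYPES = {
    "datacite": "application/vnd.datacite.datacite+xml",
    "iso19139": "application/vnd.iso19139.metadata+xml",
    "jsonld": "application/ld+json",
    "bibtex": "application/x-bibtex",
    "data": "text/tab-separated-values",
}


def negotiation_request(doi, fmt):
    """Build (but do not send) a GET request for one representation of a DOI."""
    return urllib.request.Request(
        f"https://doi.org/{doi}",
        headers={"Accept": MIME_TYPES[fmt]},
    )


req = negotiation_request("10.1594/PANGAEA.841672", "jsonld")
print(req.full_url, req.get_header("Accept"))

# Sending is left to the caller; urlopen follows the DOI redirect by default:
# with urllib.request.urlopen(req, timeout=30) as resp:
#     body = resp.read()
```

Keeping request construction separate from the network call makes the mapping easy to check offline before any data is transferred.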

== Access to Restricted Datasets ==
A small fraction of PANGAEA datasets are under an active moratorium — access to the data is temporarily restricted while the metadata remains publicly visible. Access to your personal moratorium datasets requires authentication using a '''bearer token''', which is the standard mechanism for API authorization (RFC 6750). Note that access to protected datasets of other authors is, of course, still not possible.

To obtain your bearer token, log in to PANGAEA. Login is supported both with a PANGAEA username and password, and via ORCID iD. After logging in, the [https://pangaea.de/user/ user profile page] displays the current session token under "Your temporary login token".

'''Please note:''' The bearer token is private and should be treated like a password — it must not be shared with others or included in publicly accessible scripts or repositories. If a token has been accidentally disclosed, the user profile page at https://pangaea.de/user/ provides a "Log out from all devices" option, which immediately invalidates all active tokens. When using content negotiation via the DOI resolver at https://doi.org/, it is safe to include the Authorization header: PANGAEA trusts the DOI Foundation's infrastructure, and the request is redirected to PANGAEA's own servers before any protected content is served.

The token can be passed as an <code>Authorization: Bearer</code> header in any HTTP request:
curl -OJLf -H 'Authorization: Bearer <your-token>' -H 'Accept: text/tab-separated-values' https://doi.org/10.1594/PANGAEA.841672
If a request is made without a token to a restricted dataset, the server responds with <code>401 Bearer Realm</code>, signaling that authentication is required. Recent curl versions can also pass the token with the dedicated <code>--oauth2-bearer</code> flag:
curl -OJLf --oauth2-bearer '<your-token>' -H 'Accept: text/tab-separated-values' https://doi.org/10.1594/PANGAEA.841672
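For scripted use, one way to keep the token out of source code is to read it from the environment at run time. The following is a sketch under assumptions: the helper name <code>authorized_request</code> and the environment variable <code>PANGAEA_TOKEN</code> are hypothetical choices for illustration, not part of any PANGAEA tooling.

```python
import os
import urllib.request


def authorized_request(doi, token, accept="text/tab-separated-values"):
    """Build (but do not send) a request carrying a bearer token."""
    if not token:
        raise ValueError("no bearer token supplied")
    return urllib.request.Request(
        f"https://doi.org/{doi}",
        headers={"Accept": accept, "Authorization": f"Bearer {token}"},
    )


# PANGAEA_TOKEN is a hypothetical variable name; set it in the shell,
# never in a script that is committed to a repository.
token = os.environ.get("PANGAEA_TOKEN", "")
if token:
    req = authorized_request("10.1594/PANGAEA.841672", token)
    # with urllib.request.urlopen(req, timeout=30) as resp:
    #     data = resp.read()
```

Some HTTP clients drop the <code>Authorization</code> header when a redirect changes the host, so it is worth verifying that the header still reaches PANGAEA after the DOI has been resolved.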

== OAI-PMH Metadata Harvesting ==
For bulk metadata harvesting, PANGAEA provides an OAI-PMH endpoint at:
https://ws.pangaea.de/oai/
OAI-PMH allows systematic, incremental harvesting of all PANGAEA metadata in supported formats. The following metadata standards are available via OAI-PMH: Dublin Core, DataCite v3 and v4, ISO 19139, and DIF (Directory Interchange Format). This endpoint is used by the portals and registries listed in the discovery section above, and is equally available to any user who wishes to build their own index or integrate PANGAEA metadata into an institutional system.
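Incremental harvesting follows the generic OAI-PMH pattern: request <code>ListRecords</code> with a metadata prefix and an optional <code>from</code> date, then repeat with the returned resumption token until none is left. The sketch below builds such request URLs and extracts the token from a response. The helper names are illustrative, the base URL is taken from the text above (the exact request path may differ), and <code>oai_dc</code> is the Dublin Core prefix defined by the OAI-PMH specification.

```python
import urllib.parse
import xml.etree.ElementTree as ET

OAI_ENDPOINT = "https://ws.pangaea.de/oai/"  # base URL as given in the text
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(metadata_prefix="oai_dc", from_date=None, resumption_token=None):
    """Build a ListRecords URL for a first or a follow-up request."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is an exclusive argument.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            params["from"] = from_date
    return OAI_ENDPOINT + "?" + urllib.parse.urlencode(params)


def next_token(xml_text):
    """Return the resumption token of a response, or None on the last page."""
    node = ET.fromstring(xml_text).find(".//oai:resumptionToken", OAI_NS)
    return node.text.strip() if node is not None and node.text else None


# Hypothetical response fragment, used only to exercise the parser.
page = (
    '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>'
    '<resumptionToken>abc123</resumptionToken></ListRecords></OAI-PMH>'
)
print(list_records_url(from_date="2026-01-01"))
print(list_records_url(resumption_token=next_token(page)))
```

A harvester would fetch each URL in turn, process the records on the page, and stop once <code>next_token</code> returns <code>None</code>.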

== Programmatic Access with Python: pangaeapy ==
'''pangaeapy''' is the official Python client library for PANGAEA, developed and maintained by the PANGAEA team. It provides a high-level interface for loading and analyzing PANGAEA datasets directly into native Python data structures, without requiring manual HTTP requests or format parsing.

* PyPI package: https://pypi.org/project/pangaeapy/
* Source code: https://github.com/pangaea-data-publisher/pangaeapy
* Introduction slides: https://github.com/pangaea-data-publisher/community-workshop-material/blob/master/Introduction_to_pangaeapy.pdf

=== Installation ===
pip install pangaeapy

=== Loading a Dataset ===
The central object in pangaeapy is <code>PanDataSet</code>, which takes a DOI or PANGAEA dataset ID and retrieves both the data and the associated metadata:
from pangaeapy import PanDataSet

ds = PanDataSet('10.1594/PANGAEA.841672')

# Access the data as a pandas DataFrame
print(ds.data.head())

# Access dataset metadata
print(ds.title)
print(ds.authors)
print(ds.parameters) # list of measured parameters with units and methods

The <code>data</code> attribute is a pandas DataFrame in which each column corresponds to a measured parameter, and each row to a measurement. Parameter metadata — including units, standard names, and method descriptions — is accessible through the <code>parameters</code> attribute.

=== Searching PANGAEA from Python ===
pangaeapy also provides a <code>PanQuery</code> class for querying the PANGAEA search engine programmatically. Note that the underlying search API used by pangaeapy is currently internal and undocumented; an official, publicly documented Search REST API is under development. Until that is available, pangaeapy offers the most straightforward path to programmatic search for Python users:
from pangaeapy import PanQuery

q = PanQuery('temperature salinity Atlantic', limit=10)
for result in q.results:
    print(result['doi'], result['title'])

=== Access to Restricted Datasets ===
A bearer token obtained from the PANGAEA user profile can be passed to <code>PanDataSet</code> to access your own datasets that are under moratorium (datasets of other authors remain inaccessible):
ds = PanDataSet('10.1594/PANGAEA.841672', token='<your-bearer-token>')

=== Further Training Materials ===
Jupyter notebooks with worked examples for data discovery, loading, and analysis with pangaeapy are available in the PANGAEA Community Workshop GitHub repository at https://github.com/pangaea-data-publisher/community-workshop-material (see the <code>Python/</code> directory).

== Programmatic Access with R: pangaear ==
'''pangaear''' is a community-developed R client for PANGAEA, maintained as part of the rOpenSci ecosystem. It provides equivalent functionality to pangaeapy for R users.

* CRAN: https://cran.r-project.org/package=pangaear
* GitHub: https://github.com/ropensci/pangaear
* Introduction slides: https://github.com/pangaea-data-publisher/community-workshop-material/blob/master/Intro_PangaeaR.pdf

=== Installation ===
<code>install.packages("pangaear")
# or the development version:
# devtools::install_github("ropensci/pangaear")</code>

=== Loading a Dataset ===
<code>library(pangaear)

# Download and load a dataset by DOI
ds <- pg_data(doi = '10.1594/PANGAEA.841672')

# Access the data as a data frame
head(ds[[1]]$data)

# Metadata is accessible from the list object
ds[[1]]$metadata</code>

=== Searching PANGAEA from R ===
<code>res <- pg_search(query = 'temperature salinity Atlantic', count = 10)
print(res$doi)</code>
R scripts and worked examples are available in the <code>R/</code> directory of the PANGAEA Community Workshop GitHub repository.

== The PANGAEA Data Warehouse ==
The PANGAEA data warehouse (based on ClickHouse) provides a complementary access path for users who need to compile and aggregate data across large numbers of datasets, potentially hundreds of thousands of individual publications, without having to download and merge files manually.

The data warehouse supports spatially and chronologically constrained queries at the parameter level, returning data together with the DOI of each contributing dataset to maintain full provenance traceability. It is accessible in two ways:

'''Through the PANGAEA website:''' The data warehouse interface is integrated into the search results page. After performing a search, users can select "Data Warehouse" to configure and download a parameter-level aggregation from all datasets in the result set.

Documentation: https://wiki.pangaea.de/wiki/Data_warehouse

'''Programmatically via REST API:''' The data warehouse is also accessible through the PANGAEA web services at https://ws.pangaea.de/. This allows automated, scripted aggregations and is supported by both pangaeapy and pangaear.

Note that data warehouse exports represent compiled data products and do not replace the need to consult individual dataset landing pages to assess the fitness of individual studies for a specific scientific application. Every export includes the DOI name for each data point, ensuring that citation and provenance are preserved.

== Summary of Access Methods ==
{| class="wikitable"
!Use case
!Method
!Entry point
|-
|Interactive discovery
|PANGAEA search
|https://www.pangaea.de/
|-
|Single dataset access
|Browser / landing page
|https://doi.pangaea.de/10.1594/PANGAEA.xxxxx
|-
|Programmatic data download
|HTTP content negotiation
|<code>curl -H 'Accept: text/tab-separated-values' https://doi.org/...</code>
|-
|Programmatic metadata download
|HTTP content negotiation
|<code>curl -H 'Accept: application/ld+json' https://doi.org/...</code>
|-
|Link and format discovery
|HTTP HEAD / Signposting
|<code>curl -I https://doi.pangaea.de/...</code>
|-
|Restricted dataset access (to your own publications)
|Bearer token authentication
|retrievable via https://pangaea.de/user/
|-
|Bulk metadata harvesting
|OAI-PMH
|https://ws.pangaea.de/oai/
|-
|Scripted data access (Python)
|pangaeapy
|https://pypi.org/project/pangaeapy/
|-
|Scripted data access (R)
|pangaear
|https://cran.r-project.org/package=pangaear
|-
|Cross-dataset aggregation
|Data warehouse
|https://wiki.pangaea.de/wiki/Data_warehouse
|-
|Discovery across repositories
|DataCite Search API
|https://support.datacite.org/docs/api
|}
== Further Resources ==

* [[PANGAEA search]] — documentation on search syntax and faceted filters
* [[Data warehouse]] — data warehouse documentation
* [[PANGAEA Community Workshops]] — hands-on training workshops on finding and using PANGAEA data
* [https://github.com/pangaea-data-publisher/community-workshop-material PANGAEA Community Workshop GitHub] — Jupyter notebooks, R scripts, and slide decks
* [https://pypi.org/project/pangaeapy/ pangaeapy on PyPI] — Python client
* [https://cran.r-project.org/package=pangaear pangaear on CRAN] — R client
* [https://ws.pangaea.de/oai/ PANGAEA web services] — REST and OAI-PMH endpoints
* [https://signposting.org/ Signposting standard] — link discovery for FAIR data

Data Access and Reuse

2026-04-22T18:28:16Z

Uschindler: /* Searching PANGAEA from Python */

This article describes the different methods available for discovering, accessing, and reusing data published in PANGAEA. It is intended for researchers and data practitioners who want to interact with PANGAEA data beyond the standard web interface, including scripted and automated workflows. The methods range from interactive web-based search to fully programmatic access via HTTP content negotiation and dedicated client libraries.

Hands-on training materials covering many of the topics below are available in the [[PANGAEA Community Workshops|PANGAEA Community Workshop Series]], including Jupyter notebooks, R scripts, and slide decks in the [https://github.com/pangaea-data-publisher/community-workshop-material PANGAEA community workshop GitHub repository].

== Overview ==
Every dataset published in PANGAEA is assigned a globally unique and persistent Digital Object Identifier (DOI). The DOI is the single entry point for all forms of programmatic access: it resolves to a dataset landing page that, in addition to its human-readable representation, exposes all available metadata formats and data download options through standard HTTP mechanisms. There is no separate PANGAEA data API — all data and metadata access is built on top of the DOI and its landing page using standard web protocols. This design makes PANGAEA data directly accessible without any vendor-specific API key or proprietary client, while remaining fully compatible with FAIR principles.

== Web-Based Search and Discovery ==

=== PANGAEA Search Interface ===
The primary discovery tool for PANGAEA data is the PANGAEA search engine at https://www.pangaea.de/, based on Elasticsearch. It supports full-text search and faceted filtering across all published metadata. Faceted navigation allows users to constrain results by topic, device type, geographic region, and temporal coverage. Documentation on search functionality and syntax is available in the Wiki at [[PANGAEA search]].

=== External Portals and Registries ===
PANGAEA metadata is harvested by a large number of disciplinary and generic portals via OAI-PMH, making datasets discoverable beyond the PANGAEA website through services such as Google Dataset Search, OpenAIRE, DataCite Commons, DataONE, GBIF, EMODnet, GFBio, and others. PANGAEA is registered in re3data.org, FAIRsharing.org, RIsources, and the EOSC Marketplace.

When a PANGAEA-specific search interface is not required, the '''DataCite Search API''' (https://support.datacite.org/docs/api) is an alternative: it searches research data from many repositories and returns summary metadata including DOI names. Data access for any PANGAEA result can then proceed via content negotiation as described below.

== DOI Landing Pages and Link Discovery ==

=== The Landing Page as Access Hub ===
Each PANGAEA dataset is represented by a landing page accessible at its DOI. For example:
https://doi.pangaea.de/10.1594/PANGAEA.841672
The landing page serves a dual function: it presents dataset metadata and a data preview in human-readable form, and it simultaneously exposes all available machine-readable representations of the same resource through standard HTTP link headers and HTML <code><link></code> elements. This implementation follows the '''Signposting standard''' (https://signposting.org/), which allows any HTTP client to discover all alternate representations of a dataset — metadata in various formats, the data file itself, and the ORCID iDs of the authors — without any prior knowledge of PANGAEA-specific URL patterns.

=== Discovering Available Representations ===
A simple HTTP HEAD request to the landing page returns the full set of typed link relations in the response header. The following example illustrates this using <code>curl</code>:
curl -LI https://doi.org/10.1594/PANGAEA.841672
The response <code>Link:</code> header will contain relations of the following types:

* '''<code>cite-as</code>''' — the canonical DOI citation URL
* '''<code>describedby</code>''' — links to metadata in various formats (ISO 19139, DataCite XML, PANGAEA XML, BibTeX, RIS, JSON-LD)
* '''<code>item</code>''' — links to the data itself (tab-delimited text, HTML view)
* '''<code>author</code>''' — ORCID iDs of the dataset authors

This mechanism allows scripts, harvesters, and other machine clients to discover and retrieve any representation of a dataset from its DOI alone.

== HTTP Content Negotiation ==
HTTP content negotiation allows a client to request a specific representation of a resource by specifying a MIME type in the <code>Accept</code> header. All PANGAEA dataset landing pages support this mechanism, so both data and metadata can be downloaded programmatically by querying the DOI directly with the appropriate content type — no PANGAEA-specific URL construction is required.

=== Downloading the Data File ===
The tabular data file for a dataset can be downloaded as tab-delimited text:
curl -OJLf 'https://doi.pangaea.de/10.1594/PANGAEA.841672?format=textfile'
Equivalently, using content negotiation directly against the DOI:
curl -OJLf -H 'Accept: text/tab-separated-values' https://doi.org/10.1594/PANGAEA.841672
The <code>-OJLf</code> flags tell curl to save the file under the server-provided, meaningful filename (<code>-O</code>, <code>-J</code>), to follow redirects so that the DOI resolves (<code>-L</code>), and to exit with a non-zero status on HTTP errors (<code>-f</code>), which is easier to handle in scripts than parsing an HTML error page.

=== Retrieving Metadata in a Specific Format ===
The same mechanism applies to metadata. To retrieve a dataset's metadata in ISO 19139/19115 format:
curl -L -H 'Accept: application/vnd.iso19139.metadata+xml' https://doi.org/10.1594/PANGAEA.841672
The following MIME types are currently supported for metadata retrieval via content negotiation:
{| class="wikitable"
!Format
!MIME type
|-
|PANGAEA internal XML
|<code>application/vnd.pangaea.metadata+xml</code>
|-
|DataCite XML (v4)
|<code>application/vnd.datacite.datacite+xml</code>
|-
|ISO 19139 / 19115
|<code>application/vnd.iso19139.metadata+xml</code>
|-
|NASA DIF
|<code>application/vnd.nasa.dif-metadata+xml</code>
|-
|JSON-LD (Schema.org)
|<code>application/ld+json</code>
|-
|BibTeX
|<code>application/x-bibtex</code>
|-
|RIS
|<code>application/x-research-info-systems</code>
|-
|Plain text citation
|<code>text/x-bibliography</code>
|-
|Tab-separated data
|<code>text/tab-separated-values</code>
|}
Alternatively, the same formats can be requested using explicit URL parameters:
https://doi.pangaea.de/10.1594/PANGAEA.841672?format=metadata_datacite4
https://doi.pangaea.de/10.1594/PANGAEA.841672?format=metadata_iso19139
https://doi.pangaea.de/10.1594/PANGAEA.841672?format=metadata_jsonld
https://doi.pangaea.de/10.1594/PANGAEA.841672?format=citation_bibtex

== Access to Restricted Datasets ==
A small fraction of PANGAEA datasets are under an active moratorium — access to the data is temporarily restricted while the metadata remains publicly visible. Access to your personal moratorium datasets requires authentication using a '''bearer token''', which is the standard mechanism for API authorization (RFC 6750). Note that access to protected datasets of other authors is, of course, still not possible.

To obtain your bearer token, log in to PANGAEA. Login is supported both with a PANGAEA username and password, and via ORCID iD. After logging in, the [https://pangaea.de/user/ user profile page] displays the current session token under "Your temporary login token".

'''Please note:''' The bearer token is private and should be treated like a password — it must not be shared with others or included in publicly accessible scripts or repositories. If a token has been accidentally disclosed, the user profile page at https://pangaea.de/user/ provides a "Log out from all devices" option, which immediately invalidates all active tokens. When using content negotiation via the DOI resolver at https://doi.org/, it is safe to include the Authorization header: PANGAEA trusts the DOI Foundation's infrastructure, and the request is redirected to PANGAEA's own servers before any protected content is served.

The token can be passed as an <code>Authorization: Bearer</code> header in any HTTP request:
curl -OJLf -H 'Authorization: Bearer <your-token>' -H 'Accept: text/tab-separated-values' https://doi.org/10.1594/PANGAEA.841672
If a request is made without a token to a restricted dataset, the server responds with <code>401 Bearer Realm</code>, signaling that authentication is required. Recent curl versions can also pass the token with the dedicated <code>--oauth2-bearer</code> flag:
curl -OJLf --oauth2-bearer '<your-token>' -H 'Accept: text/tab-separated-values' https://doi.org/10.1594/PANGAEA.841672

== OAI-PMH Metadata Harvesting ==
For bulk metadata harvesting, PANGAEA provides an OAI-PMH endpoint at:
https://ws.pangaea.de/oai/
OAI-PMH allows systematic, incremental harvesting of all PANGAEA metadata in supported formats. The following metadata standards are available via OAI-PMH: Dublin Core, DataCite v3 and v4, ISO 19139, and DIF (Directory Interchange Format). This endpoint is used by the portals and registries listed in the discovery section above, and is equally available to any user who wishes to build their own index or integrate PANGAEA metadata into an institutional system.

== Programmatic Access with Python: pangaeapy ==
'''pangaeapy''' is the official Python client library for PANGAEA, developed and maintained by the PANGAEA team. It provides a high-level interface for loading and analyzing PANGAEA datasets directly into native Python data structures, without requiring manual HTTP requests or format parsing.

* PyPI package: https://pypi.org/project/pangaeapy/
* Source code: https://github.com/pangaea-data-publisher/pangaeapy
* Introduction slides: https://github.com/pangaea-data-publisher/community-workshop-material/blob/master/Introduction_to_pangaeapy.pdf

=== Installation ===
pip install pangaeapy

=== Loading a Dataset ===
The central object in pangaeapy is <code>PanDataSet</code>, which takes a DOI or PANGAEA dataset ID and retrieves both the data and the associated metadata:
from pangaeapy import PanDataSet

ds = PanDataSet('10.1594/PANGAEA.841672')

# Access the data as a pandas DataFrame
print(ds.data.head())

# Access dataset metadata
print(ds.title)
print(ds.authors)
print(ds.parameters) # list of measured parameters with units and methods

The <code>data</code> attribute is a pandas DataFrame in which each column corresponds to a measured parameter, and each row to a measurement. Parameter metadata — including units, standard names, and method descriptions — is accessible through the <code>parameters</code> attribute.

=== Searching PANGAEA from Python ===
pangaeapy also provides a <code>PanQuery</code> class for querying the PANGAEA search engine programmatically. Note that the underlying search API used by pangaeapy is currently internal and undocumented; an official, publicly documented Search REST API is under development. Until that is available, pangaeapy offers the most straightforward path to programmatic search for Python users:
<code>from pangaeapy import PanQuery

q = PanQuery('temperature salinity Atlantic', limit=10)
for result in q.results:
    print(result['doi'], result['title'])</code>

=== Access to Restricted Datasets ===
A bearer token obtained from the PANGAEA user profile can be passed to <code>PanDataSet</code> to access your own datasets that are under moratorium. The token does not grant access to restricted datasets of other authors:
<code>ds = PanDataSet('10.1594/PANGAEA.841672', token='<your-bearer-token>')</code>

=== Further Training Materials ===
Jupyter notebooks with worked examples for data discovery, loading, and analysis with pangaeapy are available in the PANGAEA Community Workshop GitHub repository at https://github.com/pangaea-data-publisher/community-workshop-material (see the <code>Python/</code> directory).

== Programmatic Access with R: pangaear ==
'''pangaear''' is a community-developed R client for PANGAEA, maintained as part of the rOpenSci ecosystem. It provides equivalent functionality to pangaeapy for R users.

* CRAN: https://cran.r-project.org/package=pangaear
* GitHub: https://github.com/ropensci/pangaear
* Introduction slides: https://github.com/pangaea-data-publisher/community-workshop-material/blob/master/Intro_PangaeaR.pdf

=== Installation ===
<code>install.packages("pangaear")
# or the development version:
# devtools::install_github("ropensci/pangaear")</code>

=== Loading a Dataset ===
<code>library(pangaear)

# Download and load a dataset by DOI
ds <- pg_data(doi = '10.1594/PANGAEA.841672')

# Access the data as a data frame
head(ds[[1]]$data)

# Metadata is accessible from the list object
ds[[1]]$metadata</code>

=== Searching PANGAEA from R ===
<code>res <- pg_search(query = 'temperature salinity Atlantic', count = 10)
print(res$doi)</code>
R scripts and worked examples are available in the <code>R/</code> directory of the PANGAEA Community Workshop GitHub repository.

== The PANGAEA Data Warehouse ==
The PANGAEA data warehouse (based on ClickHouse) provides a complementary access path for users who need to compile and aggregate data across large numbers of datasets, potentially hundreds of thousands of individual publications, without having to download and merge files manually.

The data warehouse supports spatially and chronologically constrained queries at the parameter level, returning data together with the DOI of each contributing dataset to maintain full provenance traceability. It is accessible in two ways:

'''Through the PANGAEA website:''' The data warehouse interface is integrated into the search results page. After performing a search, users can select "Data Warehouse" to configure and download a parameter-level aggregation from all datasets in the result set.

Documentation: https://wiki.pangaea.de/wiki/Data_warehouse

'''Programmatically via REST API:''' The data warehouse is also accessible through the PANGAEA web services at https://ws.pangaea.de/. This allows automated, scripted aggregations and is supported by both pangaeapy and pangaear.

Note that data warehouse exports represent compiled data products and do not replace the need to consult individual dataset landing pages to assess the fitness of individual studies for a specific scientific application. Every export includes the DOI name for each data point, ensuring that citation and provenance are preserved.

== Summary of Access Methods ==
{| class="wikitable"
!Use case
!Method
!Entry point
|-
|Interactive discovery
|PANGAEA search
|https://www.pangaea.de/
|-
|Single dataset access
|Browser / landing page
|https://doi.pangaea.de/10.1594/PANGAEA.xxxxx
|-
|Programmatic data download
|HTTP content negotiation
|<code>curl -H 'Accept: text/tab-separated-values' https://doi.org/...</code>
|-
|Programmatic metadata download
|HTTP content negotiation
|<code>curl -H 'Accept: application/ld+json' https://doi.org/...</code>
|-
|Link and format discovery
|HTTP HEAD / Signposting
|<code>curl -I https://doi.pangaea.de/...</code>
|-
|Restricted dataset access (to your own publications)
|Bearer token authentication
|retrievable via https://pangaea.de/user/
|-
|Bulk metadata harvesting
|OAI-PMH
|https://ws.pangaea.de/oai/
|-
|Scripted data access (Python)
|pangaeapy
|https://pypi.org/project/pangaeapy/
|-
|Scripted data access (R)
|pangaear
|https://cran.r-project.org/package=pangaear
|-
|Cross-dataset aggregation
|Data warehouse
|https://wiki.pangaea.de/wiki/Data_warehouse
|-
|Discovery across repositories
|DataCite Search API
|https://support.datacite.org/docs/api
|}
== Further Resources ==

* [[PANGAEA search]] — documentation on search syntax and faceted filters
* [[Data warehouse]] — data warehouse documentation
* [[PANGAEA Community Workshops]] — hands-on training workshops on finding and using PANGAEA data
* [https://github.com/pangaea-data-publisher/community-workshop-material PANGAEA Community Workshop GitHub] — Jupyter notebooks, R scripts, and slide decks
* [https://pypi.org/project/pangaeapy/ pangaeapy on PyPI] — Python client
* [https://cran.r-project.org/package=pangaear pangaear on CRAN] — R client
* [https://ws.pangaea.de/oai/ PANGAEA web services] — REST and OAI-PMH endpoints
* [https://signposting.org/ Signposting standard] — link discovery for FAIR data

Data Access and Reuse

2026-04-22T18:28:06Z

Uschindler: /* Loading a Dataset */

This article describes the different methods available for discovering, accessing, and reusing data published in PANGAEA. It is intended for researchers and data practitioners who want to interact with PANGAEA data beyond the standard web interface, including scripted and automated workflows. The methods range from interactive web-based search to fully programmatic access via HTTP content negotiation and dedicated client libraries.

Hands-on training materials covering many of the topics below are available in the [[PANGAEA Community Workshops|PANGAEA Community Workshop Series]], including Jupyter notebooks, R scripts, and slide decks in the [https://github.com/pangaea-data-publisher/community-workshop-material PANGAEA community workshop GitHub repository].

== Overview ==
Every dataset published in PANGAEA is assigned a globally unique and persistent Digital Object Identifier (DOI). The DOI is the single entry point for all forms of programmatic access: it resolves to a dataset landing page that, in addition to its human-readable representation, exposes all available metadata formats and data download options through standard HTTP mechanisms. There is no separate PANGAEA data API — all data and metadata access is built on top of the DOI and its landing page using standard web protocols. This design makes PANGAEA data directly accessible without any vendor-specific API key or proprietary client, while remaining fully compatible with FAIR principles.

== Web-Based Search and Discovery ==

=== PANGAEA Search Interface ===
The primary discovery tool for PANGAEA data is the PANGAEA search engine at https://www.pangaea.de/, based on Elasticsearch. It supports full-text search and faceted filtering across all published metadata. Faceted navigation allows users to constrain results by topic, device type, geographic region, and temporal coverage. Documentation on search functionality and syntax is available in the Wiki at [[PANGAEA search]].

=== External Portals and Registries ===
PANGAEA metadata is harvested by a large number of disciplinary and generic portals via OAI-PMH, making datasets discoverable beyond the PANGAEA website through services such as Google Dataset Search, OpenAIRE, DataCite Commons, DataONE, GBIF, EMODnet, GFBio, and others. PANGAEA is registered in re3data.org, FAIRsharing.org, RIsources, and the EOSC Marketplace.

When a PANGAEA-specific search interface is not required, the '''DataCite Search API''' (https://support.datacite.org/docs/api) is an alternative: it searches across a large inventory of research data from many repositories and returns summary metadata including DOI names. Data access for any PANGAEA result can then proceed via content negotiation as described below.

== DOI Landing Pages and Link Discovery ==

=== The Landing Page as Access Hub ===
Each PANGAEA dataset is represented by a landing page accessible at its DOI. For example:
https://doi.pangaea.de/10.1594/PANGAEA.841672
The landing page serves a dual function: it presents dataset metadata and a data preview in human-readable form, and it simultaneously exposes all available machine-readable representations of the same resource through standard HTTP link headers and HTML <code><link></code> elements. This implementation follows the '''Signposting standard''' (https://signposting.org/), which allows any HTTP client to discover all alternate representations of a dataset — metadata in various formats, the data file itself, and the ORCID iDs of the authors — without any prior knowledge of PANGAEA-specific URL patterns.

=== Discovering Available Representations ===
A simple HTTP HEAD request to the landing page returns the full set of typed link relations in the response header. The following example illustrates this using <code>curl</code>:
<code>curl -LI https://doi.org/10.1594/PANGAEA.841672</code>
The response <code>Link:</code> header will contain relations of the following types:

* '''<code>cite-as</code>''' — the canonical DOI citation URL
* '''<code>describedby</code>''' — links to metadata in various formats (ISO 19139, DataCite XML, PANGAEA XML, BibTeX, RIS, JSON-LD)
* '''<code>item</code>''' — links to the data itself (tab-delimited text, HTML view)
* '''<code>author</code>''' — ORCID iDs of the dataset authors

This mechanism allows scripts, harvesters, and other machine clients to discover and retrieve any representation of a dataset from its DOI alone.
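The link relations listed above can be grouped programmatically. The following is a minimal Python sketch of a <code>Link</code> header parser; the sample header value is hypothetical and shortened (in particular the <code>type</code> attributes are assumptions), and the function name is illustrative. It handles the common <code><URL>; rel="..."; type="..."</code> form, not every edge case of the header syntax.

```python
import re
from collections import defaultdict


def parse_link_header(header):
    """Group Link header targets by relation type (hypothetical helper)."""
    links = defaultdict(list)
    # Each entry looks like: <URL>; rel="..."; type="..."
    for target, raw_params in re.findall(r'<([^>]+)>([^<]*)', header):
        params = {}
        # Parameter values may be quoted or bare tokens.
        for key, quoted, bare in re.findall(
                r';\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^\s;,]+))', raw_params):
            params[key.lower()] = quoted or bare
        # A single link may carry several space-separated relation types.
        for rel in params.get("rel", "").split():
            links[rel].append({"href": target, "type": params.get("type")})
    return dict(links)


# Hypothetical header value, shortened for illustration.
sample = (
    '<https://doi.org/10.1594/PANGAEA.841672>; rel="cite-as", '
    '<https://doi.pangaea.de/10.1594/PANGAEA.841672?format=metadata_jsonld>; '
    'rel="describedby"; type="application/ld+json", '
    '<https://doi.pangaea.de/10.1594/PANGAEA.841672?format=textfile>; '
    'rel="item"; type="text/tab-separated-values"'
)

links = parse_link_header(sample)
for rel, targets in links.items():
    print(rel, [t["href"] for t in targets])
```

A client could then pick, for example, the <code>item</code> entry whose type is <code>text/tab-separated-values</code> to locate the data file.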

== HTTP Content Negotiation ==
HTTP content negotiation allows a client to request a specific representation of a resource by specifying a MIME type in the <code>Accept</code> header. All PANGAEA dataset landing pages support this mechanism, so both data and metadata can be downloaded programmatically by querying the DOI directly with the appropriate content type — no PANGAEA-specific URL construction is required.

=== Downloading the Data File ===
The tabular data file for a dataset can be downloaded as tab-delimited text:
<code>curl -OJLf 'https://doi.pangaea.de/10.1594/PANGAEA.841672?format=textfile'</code>
Equivalently, using content negotiation directly against the DOI:
<code>curl -OJLf -H 'Accept: text/tab-separated-values' https://doi.org/10.1594/PANGAEA.841672</code>
The <code>-OJLf</code> flags tell curl to save the file under the server-provided filename (<code>-O</code>, <code>-J</code>), follow redirects (<code>-L</code>, needed to resolve the DOI), and exit with a non-zero status on HTTP errors (<code>-f</code>), which is easier to handle in scripts than an HTML error page.
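The same request can be prepared from Python with the standard library alone. This is a minimal sketch: the function name is illustrative, and the request is only built here, not sent.

```python
from urllib.request import Request


def negotiation_request(doi, mime_type):
    """Build (but do not send) a GET request for one representation of a DOI."""
    return Request("https://doi.org/" + doi, headers={"Accept": mime_type})


req = negotiation_request("10.1594/PANGAEA.841672", "text/tab-separated-values")
print(req.get_method(), req.full_url)
print("Accept:", req.get_header("Accept"))

# To actually download, pass the request to urllib.request.urlopen(),
# which follows the DOI redirects:
#   with urllib.request.urlopen(req) as response:
#       data = response.read()
```

Swapping the MIME type for one of the metadata types yields the corresponding metadata document instead of the data file.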

=== Retrieving Metadata in a Specific Format ===
The same mechanism applies to metadata. To retrieve a dataset's metadata in ISO 19139/19115 format:
<code>curl -L -H 'Accept: application/vnd.iso19139.metadata+xml' https://doi.org/10.1594/PANGAEA.841672</code>
The following MIME types are currently supported via content negotiation (the last row returns the data file rather than metadata):
{| class="wikitable"
!Format
!MIME type
|-
|PANGAEA internal XML
|<code>application/vnd.pangaea.metadata+xml</code>
|-
|DataCite XML (v4)
|<code>application/vnd.datacite.datacite+xml</code>
|-
|ISO 19139 / 19115
|<code>application/vnd.iso19139.metadata+xml</code>
|-
|NASA DIF
|<code>application/vnd.nasa.dif-metadata+xml</code>
|-
|JSON-LD (Schema.org)
|<code>application/ld+json</code>
|-
|BibTeX
|<code>application/x-bibtex</code>
|-
|RIS
|<code>application/x-research-info-systems</code>
|-
|Plain text citation
|<code>text/x-bibliography</code>
|-
|Tab-separated data
|<code>text/tab-separated-values</code>
|}
Alternatively, the same formats can be requested using explicit URL parameters:
https://doi.pangaea.de/10.1594/PANGAEA.841672?format=metadata_datacite4
https://doi.pangaea.de/10.1594/PANGAEA.841672?format=metadata_iso19139
https://doi.pangaea.de/10.1594/PANGAEA.841672?format=metadata_jsonld
https://doi.pangaea.de/10.1594/PANGAEA.841672?format=citation_bibtex
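
The two request styles can be kept side by side in a small lookup table. The sketch below uses only MIME types and <code>format</code> values that appear above; the short keys (<code>datacite</code>, <code>iso19139</code>, and so on) and the function name are hypothetical labels introduced for illustration.

```python
# Short key -> (MIME type for content negotiation, explicit format parameter).
# Keys are illustrative labels; the values are taken from the text above.
FORMATS = {
    "datacite": ("application/vnd.datacite.datacite+xml", "metadata_datacite4"),
    "iso19139": ("application/vnd.iso19139.metadata+xml", "metadata_iso19139"),
    "jsonld": ("application/ld+json", "metadata_jsonld"),
    "bibtex": ("application/x-bibtex", "citation_bibtex"),
}


def explicit_url(doi, fmt):
    """Return the landing-page URL with an explicit format parameter."""
    _, format_param = FORMATS[fmt]
    return f"https://doi.pangaea.de/{doi}?format={format_param}"


def accept_header(fmt):
    """Return the Accept header value for content negotiation."""
    mime_type, _ = FORMATS[fmt]
    return mime_type


print(explicit_url("10.1594/PANGAEA.841672", "jsonld"))
print(accept_header("jsonld"))
```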

== Access to Restricted Datasets ==
A small fraction of PANGAEA datasets are under an active moratorium: access to the data is temporarily restricted while the metadata remains publicly visible. Access to your own datasets under moratorium requires authentication with a '''bearer token''', a standard mechanism for HTTP API authorization (RFC 6750). The token does not grant access to protected datasets of other authors.

To obtain your bearer token, log in to PANGAEA. Login is supported both with a PANGAEA username and password, and via ORCID iD. After logging in, the [https://pangaea.de/user/ user profile page] displays the current session token under "Your temporary login token".

'''Please note:''' The bearer token is private and should be treated like a password — it must not be shared with others or included in publicly accessible scripts or repositories. If a token has been accidentally disclosed, the user profile page at https://pangaea.de/user/ provides a "Log out from all devices" option, which immediately invalidates all active tokens. When using content negotiation via the DOI resolver at https://doi.org/, it is safe to include the Authorization header: PANGAEA trusts the DOI Foundation's infrastructure, and the request is redirected to PANGAEA's own servers before any protected content is served.

The token can be passed as an <code>Authorization: Bearer</code> header in any HTTP request:
<code>curl -OJLf -H 'Authorization: Bearer <your-token>' -H 'Accept: text/tab-separated-values' https://doi.org/10.1594/PANGAEA.841672</code>
If a request to a restricted dataset is made without a token, the server responds with HTTP status <code>401</code> and a <code>Bearer</code> realm challenge, signaling that authentication is required. Recent versions of curl can also supply the token through a dedicated option, <code>--oauth2-bearer</code>:
<code>curl -OJLf --oauth2-bearer '<your-token>' -H 'Accept: text/tab-separated-values' https://doi.org/10.1594/PANGAEA.841672</code>
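
Because the token must not appear in shared scripts, one option is to read it from the environment at run time. The following Python sketch assumes a hypothetical environment variable named <code>PANGAEA_TOKEN</code>; the function name is likewise illustrative, and the request is only built, not sent.

```python
import os
from urllib.request import Request


def authorized_request(doi, mime_type, token):
    """Build (but do not send) a request carrying a bearer token."""
    return Request(
        "https://doi.org/" + doi,
        headers={
            "Accept": mime_type,
            "Authorization": "Bearer " + token,
        },
    )


# PANGAEA_TOKEN is a hypothetical variable name chosen for this sketch;
# the placeholder is used when the variable is not set.
token = os.environ.get("PANGAEA_TOKEN", "<your-token>")
req = authorized_request("10.1594/PANGAEA.841672",
                         "text/tab-separated-values", token)
print(req.full_url)
print("Authorization header set:", req.has_header("Authorization"))
```

Keeping the token outside the script means the script itself can be committed to a public repository without disclosing the credential.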

== OAI-PMH Metadata Harvesting ==
For bulk metadata harvesting, PANGAEA provides an OAI-PMH endpoint at:
https://ws.pangaea.de/oai/
OAI-PMH allows systematic, incremental harvesting of all PANGAEA metadata in supported formats. The following metadata standards are available via OAI-PMH: Dublin Core, DataCite v3 and v4, ISO 19139, and DIF (Directory Interchange Format). This endpoint is used by the portals and registries listed in the discovery section above, and is equally available to any user who wishes to build their own index or integrate PANGAEA metadata into an institutional system.

== Programmatic Access with Python: pangaeapy ==
'''pangaeapy''' is the official Python client library for PANGAEA, developed and maintained by the PANGAEA team. It provides a high-level interface for loading and analyzing PANGAEA datasets directly into native Python data structures, without requiring manual HTTP requests or format parsing.

* PyPI package: https://pypi.org/project/pangaeapy/
* Source code: https://github.com/pangaea-data-publisher/pangaeapy
* Introduction slides: https://github.com/pangaea-data-publisher/community-workshop-material/blob/master/Introduction_to_pangaeapy.pdf

=== Installation ===
<code>pip install pangaeapy</code>

=== Loading a Dataset ===
The central object in pangaeapy is <code>PanDataSet</code>, which takes a DOI or PANGAEA dataset ID and retrieves both the data and the associated metadata:
<code>from pangaeapy import PanDataSet

ds = PanDataSet('10.1594/PANGAEA.841672')

# Access the data as a pandas DataFrame
print(ds.data.head())

# Access dataset metadata
print(ds.title)
print(ds.authors)
print(ds.parameters)  # list of measured parameters with units and methods</code>

The <code>data</code> attribute is a pandas DataFrame in which each column corresponds to a measured parameter, and each row to a measurement. Parameter metadata — including units, standard names, and method descriptions — is accessible through the <code>parameters</code> attribute.

=== Searching PANGAEA from Python ===
pangaeapy also provides a <code>PanQuery</code> class for querying the PANGAEA search engine programmatically. Note that the underlying search API used by pangaeapy is currently internal and undocumented; an official, publicly documented Search REST API is under development. Until that is available, pangaeapy offers the most straightforward path to programmatic search for Python users:
<code>from pangaeapy import PanQuery

q = PanQuery('temperature salinity Atlantic', limit=10)
for result in q.results:
    print(result['doi'], result['title'])</code>

=== Access to Restricted Datasets ===
A bearer token obtained from the PANGAEA user profile can be passed to <code>PanDataSet</code> to access your own datasets that are under moratorium. The token does not grant access to restricted datasets of other authors:
<code>ds = PanDataSet('10.1594/PANGAEA.841672', token='<your-bearer-token>')</code>

=== Further Training Materials ===
Jupyter notebooks with worked examples for data discovery, loading, and analysis with pangaeapy are available in the PANGAEA Community Workshop GitHub repository at https://github.com/pangaea-data-publisher/community-workshop-material (see the <code>Python/</code> directory).

== Programmatic Access with R: pangaear ==
'''pangaear''' is a community-developed R client for PANGAEA, maintained as part of the rOpenSci ecosystem. It provides equivalent functionality to pangaeapy for R users.

* CRAN: https://cran.r-project.org/package=pangaear
* GitHub: https://github.com/ropensci/pangaear
* Introduction slides: https://github.com/pangaea-data-publisher/community-workshop-material/blob/master/Intro_PangaeaR.pdf

=== Installation ===
<code>install.packages("pangaear")
# or the development version:
# devtools::install_github("ropensci/pangaear")</code>

=== Loading a Dataset ===
<code>library(pangaear)

# Download and load a dataset by DOI
ds <- pg_data(doi = '10.1594/PANGAEA.841672')

# Access the data as a data frame
head(ds[[1]]$data)

# Metadata is accessible from the list object
ds[[1]]$metadata</code>

=== Searching PANGAEA from R ===
<code>res <- pg_search(query = 'temperature salinity Atlantic', count = 10)
print(res$doi)</code>
R scripts and worked examples are available in the <code>R/</code> directory of the PANGAEA Community Workshop GitHub repository.

== The PANGAEA Data Warehouse ==
The PANGAEA data warehouse (based on ClickHouse) provides a complementary access path for users who need to compile and aggregate data across large numbers of datasets, potentially hundreds of thousands of individual publications, without having to download and merge files manually.

The data warehouse supports spatially and chronologically constrained queries at the parameter level, returning data together with the DOI of each contributing dataset to maintain full provenance traceability. It is accessible in two ways:

'''Through the PANGAEA website:''' The data warehouse interface is integrated into the search results page. After performing a search, users can select "Data Warehouse" to configure and download a parameter-level aggregation from all datasets in the result set.

Documentation: https://wiki.pangaea.de/wiki/Data_warehouse

'''Programmatically via REST API:''' The data warehouse is also accessible through the PANGAEA web services at https://ws.pangaea.de/. This allows automated, scripted aggregations and is supported by both pangaeapy and pangaear.

Note that data warehouse exports represent compiled data products and do not replace the need to consult individual dataset landing pages to assess the fitness of individual studies for a specific scientific application. Every export includes the DOI name for each data point, ensuring that citation and provenance are preserved.

== Summary of Access Methods ==
{| class="wikitable"
!Use case
!Method
!Entry point
|-
|Interactive discovery
|PANGAEA search
|https://www.pangaea.de/
|-
|Single dataset access
|Browser / landing page
|https://doi.pangaea.de/10.1594/PANGAEA.xxxxx
|-
|Programmatic data download
|HTTP content negotiation
|<code>curl -H 'Accept: text/tab-separated-values' https://doi.org/...</code>
|-
|Programmatic metadata download
|HTTP content negotiation
|<code>curl -H 'Accept: application/ld+json' https://doi.org/...</code>
|-
|Link and format discovery
|HTTP HEAD / Signposting
|<code>curl -I https://doi.pangaea.de/...</code>
|-
|Restricted dataset access (to your own publications)
|Bearer token authentication
|retrievable via https://pangaea.de/user/
|-
|Bulk metadata harvesting
|OAI-PMH
|https://ws.pangaea.de/oai/
|-
|Scripted data access (Python)
|pangaeapy
|https://pypi.org/project/pangaeapy/
|-
|Scripted data access (R)
|pangaear
|https://cran.r-project.org/package=pangaear
|-
|Cross-dataset aggregation
|Data warehouse
|https://wiki.pangaea.de/wiki/Data_warehouse
|-
|Discovery across repositories
|DataCite Search API
|https://support.datacite.org/docs/api
|}
== Further Resources ==

* [[PANGAEA search]] — documentation on search syntax and faceted filters
* [[Data warehouse]] — data warehouse documentation
* [[PANGAEA Community Workshops]] — hands-on training workshops on finding and using PANGAEA data
* [https://github.com/pangaea-data-publisher/community-workshop-material PANGAEA Community Workshop GitHub] — Jupyter notebooks, R scripts, and slide decks
* [https://pypi.org/project/pangaeapy/ pangaeapy on PyPI] — Python client
* [https://cran.r-project.org/package=pangaear pangaear on CRAN] — R client
* [https://ws.pangaea.de/oai/ PANGAEA web services] — REST and OAI-PMH endpoints
* [https://signposting.org/ Signposting standard] — link discovery for FAIR data

Data Access and Reuse

2026-04-22T18:27:33Z

Uschindler: /* Installation */

This article describes the different methods available for discovering, accessing, and reusing data published in PANGAEA. It is intended for researchers and data practitioners who want to interact with PANGAEA data beyond the standard web interface, including scripted and automated workflows. The methods range from interactive web-based search to fully programmatic access via HTTP content negotiation and dedicated client libraries.

Hands-on training materials covering many of the topics below are available in the [[PANGAEA Community Workshops|PANGAEA Community Workshop Series]], including Jupyter notebooks, R scripts, and slide decks in the [https://github.com/pangaea-data-publisher/community-workshop-material PANGAEA community workshop GitHub repository].

== Overview ==
Every dataset published in PANGAEA is assigned a globally unique and persistent Digital Object Identifier (DOI). The DOI is the single entry point for all forms of programmatic access: it resolves to a dataset landing page that, in addition to its human-readable representation, exposes all available metadata formats and data download options through standard HTTP mechanisms. There is no separate PANGAEA data API — all data and metadata access is built on top of the DOI and its landing page using standard web protocols. This design makes PANGAEA data directly accessible without any vendor-specific API key or proprietary client, while remaining fully compatible with FAIR principles.

== Web-Based Search and Discovery ==

=== PANGAEA Search Interface ===
The primary discovery tool for PANGAEA data is the PANGAEA search engine at https://www.pangaea.de/, based on Elasticsearch. It supports full-text search and faceted filtering across all published metadata. Faceted navigation allows users to constrain results by topic, device type, geographic region, and temporal coverage. Documentation on search functionality and syntax is available in the Wiki at [[PANGAEA search]].

=== External Portals and Registries ===
PANGAEA metadata is harvested by a large number of disciplinary and generic portals via OAI-PMH, making datasets discoverable beyond the PANGAEA website through services such as Google Dataset Search, OpenAIRE, DataCite Commons, DataONE, GBIF, EMODnet, GFBio, and others. PANGAEA is registered in re3data.org, FAIRsharing.org, RIsources, and the EOSC Marketplace.

When a PANGAEA-specific search interface is not required, the '''DataCite Search API''' (https://support.datacite.org/docs/api) is an alternative: it searches across a large inventory of research data from many repositories and returns summary metadata including DOI names. Data access for any PANGAEA result can then proceed via content negotiation as described below.

== DOI Landing Pages and Link Discovery ==

=== The Landing Page as Access Hub ===
Each PANGAEA dataset is represented by a landing page accessible at its DOI. For example:
https://doi.pangaea.de/10.1594/PANGAEA.841672
The landing page serves a dual function: it presents dataset metadata and a data preview in human-readable form, and it simultaneously exposes all available machine-readable representations of the same resource through standard HTTP link headers and HTML <code><link></code> elements. This implementation follows the '''Signposting standard''' (https://signposting.org/), which allows any HTTP client to discover all alternate representations of a dataset — metadata in various formats, the data file itself, and the ORCID iDs of the authors — without any prior knowledge of PANGAEA-specific URL patterns.

=== Discovering Available Representations ===
A simple HTTP HEAD request to the landing page returns the full set of typed link relations in the response header. The following example illustrates this using <code>curl</code>:
<code>curl -LI https://doi.org/10.1594/PANGAEA.841672</code>
The response <code>Link:</code> header will contain relations of the following types:

* '''<code>cite-as</code>''' — the canonical DOI citation URL
* '''<code>describedby</code>''' — links to metadata in various formats (ISO 19139, DataCite XML, PANGAEA XML, BibTeX, RIS, JSON-LD)
* '''<code>item</code>''' — links to the data itself (tab-delimited text, HTML view)
* '''<code>author</code>''' — ORCID iDs of the dataset authors

This mechanism allows scripts, harvesters, and other machine clients to discover and retrieve any representation of a dataset from its DOI alone.

== HTTP Content Negotiation ==
HTTP content negotiation allows a client to request a specific representation of a resource by specifying a MIME type in the <code>Accept</code> header. All PANGAEA dataset landing pages support this mechanism, so both data and metadata can be downloaded programmatically by querying the DOI directly with the appropriate content type — no PANGAEA-specific URL construction is required.

=== Downloading the Data File ===
The tabular data file for a dataset can be downloaded as tab-delimited text:
<code>curl -OJLf 'https://doi.pangaea.de/10.1594/PANGAEA.841672?format=textfile'</code>
Equivalently, using content negotiation directly against the DOI:
<code>curl -OJLf -H 'Accept: text/tab-separated-values' https://doi.org/10.1594/PANGAEA.841672</code>
The <code>-OJLf</code> flags tell curl to save the file under the server-provided filename (<code>-O</code>, <code>-J</code>), follow redirects (<code>-L</code>, needed to resolve the DOI), and exit with a non-zero status on HTTP errors (<code>-f</code>), which is easier to handle in scripts than an HTML error page.
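A script that downloads the file itself needs to do what <code>-O -J</code> does in curl: take the filename from the <code>Content-Disposition</code> response header. The Python sketch below shows one way to do this with the standard library; the header value and filenames are hypothetical examples, and the function name is illustrative.

```python
from email.message import Message


def filename_from_disposition(header_value, fallback):
    """Extract the filename parameter from a Content-Disposition value."""
    msg = Message()
    msg["Content-Disposition"] = header_value
    # get_filename() returns None when no filename parameter is present.
    return msg.get_filename() or fallback


# Hypothetical header values for illustration.
name = filename_from_disposition(
    'attachment; filename="Example_dataset.tab"', "dataset.tab")
default_name = filename_from_disposition("inline", "dataset.tab")
print(name)
print(default_name)
```

When used with a real response, the header value would come from the response headers of the download request.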

=== Retrieving Metadata in a Specific Format ===
The same mechanism applies to metadata. To retrieve a dataset's metadata in ISO 19139/19115 format:
<code>curl -L -H 'Accept: application/vnd.iso19139.metadata+xml' https://doi.org/10.1594/PANGAEA.841672</code>
The following MIME types are currently supported via content negotiation (the last row returns the data file rather than metadata):
{| class="wikitable"
!Format
!MIME type
|-
|PANGAEA internal XML
|<code>application/vnd.pangaea.metadata+xml</code>
|-
|DataCite XML (v4)
|<code>application/vnd.datacite.datacite+xml</code>
|-
|ISO 19139 / 19115
|<code>application/vnd.iso19139.metadata+xml</code>
|-
|NASA DIF
|<code>application/vnd.nasa.dif-metadata+xml</code>
|-
|JSON-LD (Schema.org)
|<code>application/ld+json</code>
|-
|BibTeX
|<code>application/x-bibtex</code>
|-
|RIS
|<code>application/x-research-info-systems</code>
|-
|Plain text citation
|<code>text/x-bibliography</code>
|-
|Tab-separated data
|<code>text/tab-separated-values</code>
|}
Alternatively, the same formats can be requested using explicit URL parameters:
https://doi.pangaea.de/10.1594/PANGAEA.841672?format=metadata_datacite4
https://doi.pangaea.de/10.1594/PANGAEA.841672?format=metadata_iso19139
https://doi.pangaea.de/10.1594/PANGAEA.841672?format=metadata_jsonld
https://doi.pangaea.de/10.1594/PANGAEA.841672?format=citation_bibtex

== Access to Restricted Datasets ==
A small fraction of PANGAEA datasets are under an active moratorium: access to the data is temporarily restricted while the metadata remains publicly visible. Access to your own datasets under moratorium requires authentication with a '''bearer token''', a standard mechanism for HTTP API authorization (RFC 6750). The token does not grant access to protected datasets of other authors.

To obtain your bearer token, log in to PANGAEA. Login is supported both with a PANGAEA username and password, and via ORCID iD. After logging in, the [https://pangaea.de/user/ user profile page] displays the current session token under "Your temporary login token".

'''Please note:''' The bearer token is private and should be treated like a password — it must not be shared with others or included in publicly accessible scripts or repositories. If a token has been accidentally disclosed, the user profile page at https://pangaea.de/user/ provides a "Log out from all devices" option, which immediately invalidates all active tokens. When using content negotiation via the DOI resolver at https://doi.org/, it is safe to include the Authorization header: PANGAEA trusts the DOI Foundation's infrastructure, and the request is redirected to PANGAEA's own servers before any protected content is served.

The token can be passed as an <code>Authorization: Bearer</code> header in any HTTP request:
<code>curl -OJLf -H 'Authorization: Bearer <your-token>' -H 'Accept: text/tab-separated-values' https://doi.org/10.1594/PANGAEA.841672</code>
If a request to a restricted dataset is made without a token, the server responds with HTTP status <code>401</code> and a <code>Bearer</code> realm challenge, signaling that authentication is required. Recent versions of curl can also supply the token through a dedicated option, <code>--oauth2-bearer</code>:
<code>curl -OJLf --oauth2-bearer '<your-token>' -H 'Accept: text/tab-separated-values' https://doi.org/10.1594/PANGAEA.841672</code>

== OAI-PMH Metadata Harvesting ==
For bulk metadata harvesting, PANGAEA provides an OAI-PMH endpoint at:
https://ws.pangaea.de/oai/
OAI-PMH allows systematic, incremental harvesting of all PANGAEA metadata in supported formats. The following metadata standards are available via OAI-PMH: Dublin Core, DataCite v3 and v4, ISO 19139, and DIF (Directory Interchange Format). This endpoint is used by the portals and registries listed in the discovery section above, and is equally available to any user who wishes to build their own index or integrate PANGAEA metadata into an institutional system.
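Harvesting proceeds page by page: each <code>ListRecords</code> response may contain a resumption token that is sent back to request the next page. The Python sketch below extracts record identifiers and the token from a response. The XML namespace and element names come from the OAI-PMH 2.0 protocol; the sample document, its identifiers, and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 protocol.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def extract_page(xml_text):
    """Return (record identifiers, resumption token or None) for one page."""
    root = ET.fromstring(xml_text)
    identifiers = [
        element.text
        for element in root.findall(".//oai:header/oai:identifier", OAI_NS)
    ]
    token_element = root.find(".//oai:resumptionToken", OAI_NS)
    # An absent or empty resumptionToken element marks the last page.
    has_token = token_element is not None and token_element.text
    return identifiers, (token_element.text if has_token else None)


# Hypothetical, heavily shortened response for illustration.
sample = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example:1</identifier></header></record>
    <record><header><identifier>oai:example:2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

ids, token = extract_page(sample)
print(ids, token)
```

A harvesting loop would call such a function on every response and stop once no token is returned.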

== Programmatic Access with Python: pangaeapy ==
'''pangaeapy''' is the official Python client library for PANGAEA, developed and maintained by the PANGAEA team. It provides a high-level interface for loading and analyzing PANGAEA datasets directly into native Python data structures, without requiring manual HTTP requests or format parsing.

* PyPI package: https://pypi.org/project/pangaeapy/
* Source code: https://github.com/pangaea-data-publisher/pangaeapy
* Introduction slides: https://github.com/pangaea-data-publisher/community-workshop-material/blob/master/Introduction_to_pangaeapy.pdf

=== Installation ===
pip install pangaeapy

=== Loading a Dataset ===
The central object in pangaeapy is <code>PanDataSet</code>, which takes a DOI or PANGAEA dataset ID and retrieves both the data and the associated metadata:
<code>from pangaeapy import PanDataSet

ds = PanDataSet('10.1594/PANGAEA.841672')

# Access the data as a pandas DataFrame
print(ds.data.head())

# Access dataset metadata
print(ds.title)
print(ds.authors)
print(ds.parameters) # list of measured parameters with units and methods</code>
The <code>data</code> attribute is a pandas DataFrame in which each column corresponds to a measured parameter, and each row to a measurement. Parameter metadata — including units, standard names, and method descriptions — is accessible through the <code>parameters</code> attribute.

=== Searching PANGAEA from Python ===
pangaeapy also provides a <code>PanQuery</code> class for querying the PANGAEA search engine programmatically. Note that the underlying search API used by pangaeapy is currently internal and undocumented; an official, publicly documented Search REST API is under development. Until that is available, pangaeapy offers the most straightforward path to programmatic search for Python users:
<code>from pangaeapy import PanQuery

q = PanQuery('temperature salinity Atlantic', limit=10)
for result in q.results:
    print(result['doi'], result['title'])</code>

=== Access to Restricted Datasets ===
A bearer token obtained from the PANGAEA user profile can be passed to <code>PanDataSet</code> to access your own datasets that are under moratorium (this does not grant access to protected datasets of other authors):
<code>ds = PanDataSet('10.1594/PANGAEA.841672', token='<your-bearer-token>')</code>

=== Further Training Materials ===
Jupyter notebooks with worked examples for data discovery, loading, and analysis with pangaeapy are available in the PANGAEA Community Workshop GitHub repository at https://github.com/pangaea-data-publisher/community-workshop-material (see the <code>Python/</code> directory).

== Programmatic Access with R: pangaear ==
'''pangaear''' is a community-developed R client for PANGAEA, maintained as part of the rOpenSci ecosystem. It provides equivalent functionality to pangaeapy for R users.

* CRAN: https://cran.r-project.org/package=pangaear
* GitHub: https://github.com/ropensci/pangaear
* Introduction slides: https://github.com/pangaea-data-publisher/community-workshop-material/blob/master/Intro_PangaeaR.pdf

=== Installation ===
<code>install.packages("pangaear")
# or the development version:
# devtools::install_github("ropensci/pangaear")</code>

=== Loading a Dataset ===
<code>library(pangaear)

# Download and load a dataset by DOI
ds <- pg_data(doi = '10.1594/PANGAEA.841672')

# Access the data as a data frame
head(ds[[1]]$data)

# Metadata is accessible from the list object
ds[[1]]$metadata</code>

=== Searching PANGAEA from R ===
<code>res <- pg_search(query = 'temperature salinity Atlantic', count = 10)
print(res$doi)</code>
R scripts and worked examples are available in the <code>R/</code> directory of the PANGAEA Community Workshop GitHub repository.

== The PANGAEA Data Warehouse ==
The PANGAEA data warehouse (based on ClickHouse) provides a powerful complementary access path for users who need to compile and aggregate data across large numbers of datasets, potentially across hundreds of thousands of individual publications, without having to download and merge files manually.

The data warehouse supports spatially and chronologically constrained queries at the parameter level, returning data together with the DOI of each contributing dataset to maintain full provenance traceability. It is accessible in two ways:

'''Through the PANGAEA website:''' The data warehouse interface is integrated into the search results page. After performing a search, users can select "Data Warehouse" to configure and download a parameter-level aggregation from all datasets in the result set.

Documentation: https://wiki.pangaea.de/wiki/Data_warehouse

'''Programmatically via REST API:''' The data warehouse is also accessible through the PANGAEA web services at https://ws.pangaea.de/. This allows automated, scripted aggregations and is supported by both pangaeapy and pangaear.

Note that data warehouse exports represent compiled data products and do not replace the need to consult individual dataset landing pages to assess the fitness of individual studies for a specific scientific application. Every export includes the DOI name for each data point, ensuring that citation and provenance are preserved.

== Summary of Access Methods ==
{| class="wikitable"
!Use case
!Method
!Entry point
|-
|Interactive discovery
|PANGAEA search
|https://www.pangaea.de/
|-
|Single dataset access
|Browser / landing page
|https://doi.pangaea.de/10.1594/PANGAEA.xxxxx
|-
|Programmatic data download
|HTTP content negotiation
|<code>curl -H 'Accept: text/tab-separated-values' https://doi.org/...</code>
|-
|Programmatic metadata download
|HTTP content negotiation
|<code>curl -H 'Accept: application/ld+json' https://doi.org/...</code>
|-
|Link and format discovery
|HTTP HEAD / Signposting
|<code>curl -I https://doi.pangaea.de/...</code>
|-
|Restricted dataset access (to your own publications)
|Bearer token authentication
|retrievable via https://pangaea.de/user/
|-
|Bulk metadata harvesting
|OAI-PMH
|https://ws.pangaea.de/oai/
|-
|Scripted data access (Python)
|pangaeapy
|https://pypi.org/project/pangaeapy/
|-
|Scripted data access (R)
|pangaear
|https://cran.r-project.org/package=pangaear
|-
|Cross-dataset aggregation
|Data warehouse
|https://wiki.pangaea.de/wiki/Data_warehouse
|-
|Discovery across repositories
|DataCite Search API
|https://support.datacite.org/docs/api
|}
== Further Resources ==

* [[PANGAEA search]] — documentation on search syntax and faceted filters
* [[Data warehouse]] — data warehouse documentation
* [[PANGAEA Community Workshops]] — hands-on training workshops on finding and using PANGAEA data
* [https://github.com/pangaea-data-publisher/community-workshop-material PANGAEA Community Workshop GitHub] — Jupyter notebooks, R scripts, and slide decks
* [https://pypi.org/project/pangaeapy/ pangaeapy on PyPI] — Python client
* [https://cran.r-project.org/package=pangaear pangaear on CRAN] — R client
* [https://ws.pangaea.de/oai/ PANGAEA web services] — REST and OAI-PMH endpoints
* [https://signposting.org/ Signposting standard] — link discovery for FAIR data

Data Access and Reuse

2026-04-22T18:27:18Z

Uschindler: /* OAI-PMH Metadata Harvesting */

This article describes the different methods available for discovering, accessing, and reusing data published in PANGAEA. It is intended for researchers and data practitioners who want to interact with PANGAEA data beyond the standard web interface, including scripted and automated workflows. The methods range from interactive web-based search to fully programmatic access via HTTP content negotiation and dedicated client libraries.

Hands-on training materials covering many of the topics below are available in the [[PANGAEA Community Workshops|PANGAEA Community Workshop Series]], including Jupyter notebooks, R scripts, and slide decks in the [https://github.com/pangaea-data-publisher/community-workshop-material PANGAEA community workshop GitHub repository].

== Overview ==
Every dataset published in PANGAEA is assigned a globally unique and persistent Digital Object Identifier (DOI). The DOI is the single entry point for all forms of programmatic access: it resolves to a dataset landing page that, in addition to its human-readable representation, exposes all available metadata formats and data download options through standard HTTP mechanisms. There is no separate PANGAEA data API — all data and metadata access is built on top of the DOI and its landing page using standard web protocols. This design makes PANGAEA data directly accessible without any vendor-specific API key or proprietary client, while remaining fully compatible with FAIR principles.

== Web-Based Search and Discovery ==

=== PANGAEA Search Interface ===
The primary discovery tool for PANGAEA data is the PANGAEA search engine at https://www.pangaea.de/, based on Elasticsearch. It supports full-text search and faceted filtering across all published metadata. Faceted navigation allows users to constrain results by topic, device type, geographic region, and temporal coverage. Documentation on search functionality and syntax is available in the Wiki at [[PANGAEA search]].

=== External Portals and Registries ===
PANGAEA metadata is harvested by a large number of disciplinary and generic portals via OAI-PMH, making datasets discoverable beyond the PANGAEA website through services such as Google Dataset Search, OpenAIRE, DataCite Commons, DataONE, GBIF, EMODnet, GFBio, and others. PANGAEA is registered in re3data.org, FAIRsharing.org, RIsources, and the EOSC Marketplace.

When a PANGAEA-specific search interface is not required, the '''DataCite Search API''' (https://support.datacite.org/docs/api) is an alternative that searches across a large inventory of research data from many repositories. It returns summary metadata including DOI names, and data access for any PANGAEA result can then proceed via content negotiation as described below.

== DOI Landing Pages and Link Discovery ==

=== The Landing Page as Access Hub ===
Each PANGAEA dataset is represented by a landing page accessible at its DOI. For example:
https://doi.pangaea.de/10.1594/PANGAEA.841672
The landing page serves a dual function: it presents dataset metadata and a data preview in human-readable form, and it simultaneously exposes all available machine-readable representations of the same resource through standard HTTP link headers and HTML <code><link></code> elements. This implementation follows the '''Signposting standard''' (https://signposting.org/), which allows any HTTP client to discover all alternate representations of a dataset — metadata in various formats, the data file itself, and the ORCID iDs of the authors — without any prior knowledge of PANGAEA-specific URL patterns.

=== Discovering Available Representations ===
A simple HTTP HEAD request to the landing page returns the full set of typed link relations in the response header. The following example illustrates this using <code>curl</code>:
curl -LI https://doi.org/10.1594/PANGAEA.841672
The response <code>Link:</code> header will contain relations of the following types:

* '''<code>cite-as</code>''' — the canonical DOI citation URL
* '''<code>describedby</code>''' — links to metadata in various formats (ISO 19139, DataCite XML, PANGAEA XML, BibTeX, RIS, JSON-LD)
* '''<code>item</code>''' — links to the data itself (tab-delimited text, HTML view)
* '''<code>author</code>''' — ORCID iDs of the dataset authors

This mechanism allows scripts, harvesters, and other machine clients to discover and retrieve any representation of a dataset from its DOI alone.
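As a rough illustration of how a script might consume these relations, the following Python sketch groups the targets of a <code>Link</code> header by relation type using only the standard library. The function name <code>parse_link_header</code> and the sample header are illustrative assumptions and not actual PANGAEA server output; a real response contains more entries.

```python
import re
from collections import defaultdict

# Hand-written sample for illustration only (not copied from a real response).
SAMPLE_HEADER = (
    '<https://doi.org/10.1594/PANGAEA.841672>; rel="cite-as", '
    '<https://doi.pangaea.de/10.1594/PANGAEA.841672?format=textfile>; '
    'rel="item"; type="text/tab-separated-values"'
)


def parse_link_header(value):
    """Group the targets of an HTTP Link header by their relation type."""
    relations = defaultdict(list)
    # Each link has the form: <target>; rel="..."; type="..."
    for match in re.finditer(r'<([^>]+)>([^<]*)', value):
        target, raw_params = match.group(1), match.group(2)
        params = {}
        for part in raw_params.split(';'):
            if '=' in part:
                key, _, val = part.partition('=')
                # Drop the separating comma and the surrounding quotes.
                cleaned = val.strip().rstrip(',').strip().strip('"')
                params[key.strip().lower()] = cleaned
        # A rel value may hold several space-separated relation types.
        for rel in params.get('rel', '').split():
            relations[rel].append({'url': target, 'type': params.get('type')})
    return dict(relations)


links = parse_link_header(SAMPLE_HEADER)
for rel, targets in links.items():
    print(rel, [t['url'] for t in targets])
```

In practice the header value would come from the response to a HEAD request against the DOI rather than from a constant.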

== HTTP Content Negotiation ==
HTTP content negotiation allows a client to request a specific representation of a resource by specifying a MIME type in the <code>Accept</code> header. All PANGAEA dataset landing pages support this mechanism, so both data and metadata can be downloaded programmatically by querying the DOI directly with the appropriate content type — no PANGAEA-specific URL construction is required.

=== Downloading the Data File ===
The tabular data file for a dataset can be downloaded as tab-delimited text:
curl -OJLf 'https://doi.pangaea.de/10.1594/PANGAEA.841672?format=textfile'
Equivalently, using content negotiation directly against the DOI:
curl -OJLf -H 'Accept: text/tab-separated-values' https://doi.org/10.1594/PANGAEA.841672
The <code>-OJLf</code> flags instruct curl to save the file under the meaningful filename provided by the server (<code>-O</code>, <code>-J</code>), to follow redirects (<code>-L</code>, needed to resolve the DOI), and to fail with a non-zero exit code on HTTP errors (<code>-f</code>), which is easier to handle in scripts than an HTML error page.
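The same content negotiation request can be issued from a script. The following is a minimal Python sketch using only the standard library; the helper names <code>build_request</code> and <code>download</code> and the output filename are assumptions made for illustration. Unlike <code>curl -OJ</code>, the sketch does not use the server-provided filename.

```python
import urllib.request

DOI_RESOLVER = "https://doi.org/"


def build_request(doi, mime_type):
    """Create a GET request asking the DOI resolver for one representation."""
    return urllib.request.Request(DOI_RESOLVER + doi, headers={"Accept": mime_type})


def download(doi, mime_type, target_path):
    """Fetch the requested representation and write it to target_path."""
    request = build_request(doi, mime_type)
    # urlopen follows redirects and raises urllib.error.HTTPError on 4xx/5xx
    # responses, roughly comparable to curl's -L and -f flags.
    with urllib.request.urlopen(request) as response, open(target_path, "wb") as out:
        out.write(response.read())


# Example call (performs a network request; the filename is hypothetical):
# download("10.1594/PANGAEA.841672", "text/tab-separated-values", "dataset.tab")
```

Passing a different MIME type to the same helper would request a metadata representation instead of the data file.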

=== Retrieving Metadata in a Specific Format ===
The same mechanism applies to metadata. To retrieve a dataset's metadata in ISO 19139/19115 format:
curl -L -H 'Accept: application/vnd.iso19139.metadata+xml' https://doi.org/10.1594/PANGAEA.841672
The following MIME types are currently supported for metadata retrieval via content negotiation:
{| class="wikitable"
!Format
!MIME type
|-
|PANGAEA internal XML
|<code>application/vnd.pangaea.metadata+xml</code>
|-
|DataCite XML (v4)
|<code>application/vnd.datacite.datacite+xml</code>
|-
|ISO 19139 / 19115
|<code>application/vnd.iso19139.metadata+xml</code>
|-
|NASA DIF
|<code>application/vnd.nasa.dif-metadata+xml</code>
|-
|JSON-LD (Schema.org)
|<code>application/ld+json</code>
|-
|BibTeX
|<code>application/x-bibtex</code>
|-
|RIS
|<code>application/x-research-info-systems</code>
|-
|Plain text citation
|<code>text/x-bibliography</code>
|-
|Tab-separated data
|<code>text/tab-separated-values</code>
|}
Alternatively, the same formats can be requested using explicit URL parameters:
https://doi.pangaea.de/10.1594/PANGAEA.841672?format=metadata_datacite4
https://doi.pangaea.de/10.1594/PANGAEA.841672?format=metadata_iso19139
https://doi.pangaea.de/10.1594/PANGAEA.841672?format=metadata_jsonld
https://doi.pangaea.de/10.1594/PANGAEA.841672?format=citation_bibtex

== Access to Restricted Datasets ==
A small fraction of PANGAEA datasets are under an active moratorium — access to the data is temporarily restricted while the metadata remains publicly visible. Access to your personal moratorium datasets requires authentication using a '''bearer token''', which is the standard mechanism for API authorization (RFC 6750). Note that access to protected datasets of other authors is, of course, still not possible.

To obtain your bearer token, log in to PANGAEA. Login is supported both with a PANGAEA username and password, and via ORCID iD. After logging in, the [https://pangaea.de/user/ user profile page] displays the current session token under "Your temporary login token".

'''Please note:''' The bearer token is private and should be treated like a password — it must not be shared with others or included in publicly accessible scripts or repositories. If a token has been accidentally disclosed, the user profile page at https://pangaea.de/user/ provides a "Log out from all devices" option, which immediately invalidates all active tokens. When using content negotiation via the DOI resolver at https://doi.org/, it is safe to include the Authorization header: PANGAEA trusts the DOI Foundation's infrastructure, and the request is redirected to PANGAEA's own servers before any protected content is served.

The token can be passed as an <code>Authorization: Bearer</code> header in any HTTP request:
curl -OJLf -H 'Authorization: Bearer <your-token>' -H 'Accept: text/tab-separated-values' https://doi.org/10.1594/PANGAEA.841672
If a request for a restricted dataset is made without a token, the server responds with <code>401 Bearer Realm</code>, signaling that authentication is required. Recent versions of curl can also supply the token through a dedicated option (<code>--oauth2-bearer</code>), in which case the command becomes:
curl -OJLf --oauth2-bearer '<your-token>' -H 'Accept: text/tab-separated-values' https://doi.org/10.1594/PANGAEA.841672
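For scripted use, the token is best kept out of the source code. The following Python sketch reads it from an environment variable and attaches it as an <code>Authorization: Bearer</code> header. The variable name <code>PANGAEA_TOKEN</code> and both helper names are hypothetical, and whether a given HTTP client forwards the header across the DOI redirect should be verified for that client.

```python
import os
import urllib.request


def build_authorized_request(doi, token, mime_type="text/tab-separated-values"):
    """Create a request carrying the bearer token in the Authorization header."""
    headers = {
        "Accept": mime_type,
        "Authorization": "Bearer " + token,
    }
    return urllib.request.Request("https://doi.org/" + doi, headers=headers)


def token_from_environment(variable="PANGAEA_TOKEN"):
    """Read the token from an environment variable (the name is hypothetical)."""
    token = os.environ.get(variable)
    if not token:
        raise RuntimeError("Set " + variable + " to your bearer token first")
    return token


# Example call (performs a network request):
# request = build_authorized_request("10.1594/PANGAEA.841672", token_from_environment())
# urllib.request.urlopen(request)  # raises urllib.error.HTTPError on a 401 response
```

Keeping the token in the environment means the script itself can be shared or committed without disclosing it.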

== OAI-PMH Metadata Harvesting ==
For bulk metadata harvesting, PANGAEA provides an OAI-PMH endpoint at:
https://ws.pangaea.de/oai/
OAI-PMH allows systematic, incremental harvesting of all PANGAEA metadata in supported formats. The following metadata standards are available via OAI-PMH: Dublin Core, DataCite v3 and v4, ISO 19139, and DIF (Directory Interchange Format). This endpoint is used by the portals and registries listed in the discovery section above, and is equally available to any user who wishes to build their own index or integrate PANGAEA metadata into an institutional system.
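Incremental harvesting in OAI-PMH works by requesting records changed since a given date and then following resumption tokens until the list is exhausted. The Python sketch below builds such request URLs and extracts the token from a response. The verbs, arguments, and XML namespace come from the OAI-PMH 2.0 specification; the helper names are illustrative, and using the endpoint above directly as the request base URL is an assumption. Only the <code>oai_dc</code> prefix, which the protocol makes mandatory, is used here; the <code>ListMetadataFormats</code> verb reports the prefixes a server actually offers.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Endpoint as given in the text; treating it as the request base URL is an assumption.
OAI_ENDPOINT = "https://ws.pangaea.de/oai/"
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def oai_url(verb, arguments=None):
    """Build an OAI-PMH request URL for the given verb and arguments."""
    query = {"verb": verb}
    query.update(arguments or {})
    return OAI_ENDPOINT + "?" + urllib.parse.urlencode(query)


def resumption_token(response_xml):
    """Return the resumption token of a list response, or None on the last page."""
    element = ET.fromstring(response_xml).find(".//" + OAI_NS + "resumptionToken")
    if element is None or not (element.text or "").strip():
        return None
    return element.text.strip()


# First page of all Dublin Core records changed since the given date.
first_page = oai_url("ListRecords", {"metadataPrefix": "oai_dc", "from": "2026-01-01"})
# Follow-up pages are requested with the token alone.
next_page = oai_url("ListRecords", {"resumptionToken": "token-from-previous-response"})
print(first_page)
print(next_page)
```

A harvester would fetch each URL, process the records, and repeat until <code>resumption_token</code> returns <code>None</code>.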

== Programmatic Access with Python: pangaeapy ==
'''pangaeapy''' is the official Python client library for PANGAEA, developed and maintained by the PANGAEA team. It provides a high-level interface for loading and analyzing PANGAEA datasets directly into native Python data structures, without requiring manual HTTP requests or format parsing.

* PyPI package: https://pypi.org/project/pangaeapy/
* Source code: https://github.com/pangaea-data-publisher/pangaeapy
* Introduction slides: https://github.com/pangaea-data-publisher/community-workshop-material/blob/master/Introduction_to_pangaeapy.pdf

=== Installation ===
<code>pip install pangaeapy</code>

=== Loading a Dataset ===
The central object in pangaeapy is <code>PanDataSet</code>, which takes a DOI or PANGAEA dataset ID and retrieves both the data and the associated metadata:
<code>from pangaeapy import PanDataSet

ds = PanDataSet('10.1594/PANGAEA.841672')

# Access the data as a pandas DataFrame
print(ds.data.head())

# Access dataset metadata
print(ds.title)
print(ds.authors)
print(ds.parameters) # list of measured parameters with units and methods</code>
The <code>data</code> attribute is a pandas DataFrame in which each column corresponds to a measured parameter, and each row to a measurement. Parameter metadata — including units, standard names, and method descriptions — is accessible through the <code>parameters</code> attribute.

=== Searching PANGAEA from Python ===
pangaeapy also provides a <code>PanQuery</code> class for querying the PANGAEA search engine programmatically. Note that the underlying search API used by pangaeapy is currently internal and undocumented; an official, publicly documented Search REST API is under development. Until that is available, pangaeapy offers the most straightforward path to programmatic search for Python users:
<code>from pangaeapy import PanQuery

q = PanQuery('temperature salinity Atlantic', limit=10)
for result in q.results:
    print(result['doi'], result['title'])</code>

=== Access to Restricted Datasets ===
A bearer token obtained from the PANGAEA user profile can be passed to <code>PanDataSet</code> to access your own datasets that are under moratorium (this does not grant access to protected datasets of other authors):
<code>ds = PanDataSet('10.1594/PANGAEA.841672', token='<your-bearer-token>')</code>

=== Further Training Materials ===
Jupyter notebooks with worked examples for data discovery, loading, and analysis with pangaeapy are available in the PANGAEA Community Workshop GitHub repository at https://github.com/pangaea-data-publisher/community-workshop-material (see the <code>Python/</code> directory).

== Programmatic Access with R: pangaear ==
'''pangaear''' is a community-developed R client for PANGAEA, maintained as part of the rOpenSci ecosystem. It provides equivalent functionality to pangaeapy for R users.

* CRAN: https://cran.r-project.org/package=pangaear
* GitHub: https://github.com/ropensci/pangaear
* Introduction slides: https://github.com/pangaea-data-publisher/community-workshop-material/blob/master/Intro_PangaeaR.pdf

=== Installation ===
<code>install.packages("pangaear")
# or the development version:
# devtools::install_github("ropensci/pangaear")</code>

=== Loading a Dataset ===
<code>library(pangaear)

# Download and load a dataset by DOI
ds <- pg_data(doi = '10.1594/PANGAEA.841672')

# Access the data as a data frame
head(ds[[1]]$data)

# Metadata is accessible from the list object
ds[[1]]$metadata</code>

=== Searching PANGAEA from R ===
<code>res <- pg_search(query = 'temperature salinity Atlantic', count = 10)
print(res$doi)</code>
R scripts and worked examples are available in the <code>R/</code> directory of the PANGAEA Community Workshop GitHub repository.

== The PANGAEA Data Warehouse ==
The PANGAEA data warehouse (based on ClickHouse) provides a powerful complementary access path for users who need to compile and aggregate data across large numbers of datasets, potentially across hundreds of thousands of individual publications, without having to download and merge files manually.

The data warehouse supports spatially and chronologically constrained queries at the parameter level, returning data together with the DOI of each contributing dataset to maintain full provenance traceability. It is accessible in two ways:

'''Through the PANGAEA website:''' The data warehouse interface is integrated into the search results page. After performing a search, users can select "Data Warehouse" to configure and download a parameter-level aggregation from all datasets in the result set.

Documentation: https://wiki.pangaea.de/wiki/Data_warehouse

'''Programmatically via REST API:''' The data warehouse is also accessible through the PANGAEA web services at https://ws.pangaea.de/. This allows automated, scripted aggregations and is supported by both pangaeapy and pangaear.

Note that data warehouse exports represent compiled data products and do not replace the need to consult individual dataset landing pages to assess the fitness of individual studies for a specific scientific application. Every export includes the DOI name for each data point, ensuring that citation and provenance are preserved.

== Summary of Access Methods ==
{| class="wikitable"
!Use case
!Method
!Entry point
|-
|Interactive discovery
|PANGAEA search
|https://www.pangaea.de/
|-
|Single dataset access
|Browser / landing page
|https://doi.pangaea.de/10.1594/PANGAEA.xxxxx
|-
|Programmatic data download
|HTTP content negotiation
|<code>curl -H 'Accept: text/tab-separated-values' https://doi.org/...</code>
|-
|Programmatic metadata download
|HTTP content negotiation
|<code>curl -H 'Accept: application/ld+json' https://doi.org/...</code>
|-
|Link and format discovery
|HTTP HEAD / Signposting
|<code>curl -I https://doi.pangaea.de/...</code>
|-
|Restricted dataset access (to your own publications)
|Bearer token authentication
|retrievable via https://pangaea.de/user/
|-
|Bulk metadata harvesting
|OAI-PMH
|https://ws.pangaea.de/oai/
|-
|Scripted data access (Python)
|pangaeapy
|https://pypi.org/project/pangaeapy/
|-
|Scripted data access (R)
|pangaear
|https://cran.r-project.org/package=pangaear
|-
|Cross-dataset aggregation
|Data warehouse
|https://wiki.pangaea.de/wiki/Data_warehouse
|-
|Discovery across repositories
|DataCite Search API
|https://support.datacite.org/docs/api
|}
== Further Resources ==

* [[PANGAEA search]] — documentation on search syntax and faceted filters
* [[Data warehouse]] — data warehouse documentation
* [[PANGAEA Community Workshops]] — hands-on training workshops on finding and using PANGAEA data
* [https://github.com/pangaea-data-publisher/community-workshop-material PANGAEA Community Workshop GitHub] — Jupyter notebooks, R scripts, and slide decks
* [https://pypi.org/project/pangaeapy/ pangaeapy on PyPI] — Python client
* [https://cran.r-project.org/package=pangaear pangaear on CRAN] — R client
* [https://ws.pangaea.de/oai/ PANGAEA web services] — REST and OAI-PMH endpoints
* [https://signposting.org/ Signposting standard] — link discovery for FAIR data

Data Access and Reuse

2026-04-22T18:27:01Z

Uschindler: /* Access to Restricted Datasets */

This article describes the different methods available for discovering, accessing, and reusing data published in PANGAEA. It is intended for researchers and data practitioners who want to interact with PANGAEA data beyond the standard web interface, including scripted and automated workflows. The methods range from interactive web-based search to fully programmatic access via HTTP content negotiation and dedicated client libraries.

Hands-on training materials covering many of the topics below are available in the [[PANGAEA Community Workshops|PANGAEA Community Workshop Series]], including Jupyter notebooks, R scripts, and slide decks in the [https://github.com/pangaea-data-publisher/community-workshop-material PANGAEA community workshop GitHub repository].

== Overview ==
Every dataset published in PANGAEA is assigned a globally unique and persistent Digital Object Identifier (DOI). The DOI is the single entry point for all forms of programmatic access: it resolves to a dataset landing page that, in addition to its human-readable representation, exposes all available metadata formats and data download options through standard HTTP mechanisms. There is no separate PANGAEA data API — all data and metadata access is built on top of the DOI and its landing page using standard web protocols. This design makes PANGAEA data directly accessible without any vendor-specific API key or proprietary client, while remaining fully compatible with FAIR principles.

== Web-Based Search and Discovery ==

=== PANGAEA Search Interface ===
The primary discovery tool for PANGAEA data is the PANGAEA search engine at https://www.pangaea.de/, based on Elasticsearch. It supports full-text search and faceted filtering across all published metadata. Faceted navigation allows users to constrain results by topic, device type, geographic region, and temporal coverage. Documentation on search functionality and syntax is available in the Wiki at [[PANGAEA search]].

=== External Portals and Registries ===
PANGAEA metadata is harvested by a large number of disciplinary and generic portals via OAI-PMH, making datasets discoverable beyond the PANGAEA website through services such as Google Dataset Search, OpenAIRE, DataCite Commons, DataONE, GBIF, EMODnet, GFBio, and others. PANGAEA is registered in re3data.org, FAIRsharing.org, RIsources, and the EOSC Marketplace.

When a PANGAEA-specific search interface is not required, the '''DataCite Search API''' (https://support.datacite.org/docs/api) is an alternative that searches across a large inventory of research data from many repositories. It returns summary metadata including DOI names, and data access for any PANGAEA result can then proceed via content negotiation as described below.

== DOI Landing Pages and Link Discovery ==

=== The Landing Page as Access Hub ===
Each PANGAEA dataset is represented by a landing page accessible at its DOI. For example:
https://doi.pangaea.de/10.1594/PANGAEA.841672
The landing page serves a dual function: it presents dataset metadata and a data preview in human-readable form, and it simultaneously exposes all available machine-readable representations of the same resource through standard HTTP link headers and HTML <code><link></code> elements. This implementation follows the '''Signposting standard''' (https://signposting.org/), which allows any HTTP client to discover all alternate representations of a dataset — metadata in various formats, the data file itself, and the ORCID iDs of the authors — without any prior knowledge of PANGAEA-specific URL patterns.

=== Discovering Available Representations ===
A simple HTTP HEAD request to the landing page returns the full set of typed link relations in the response header. The following example illustrates this using <code>curl</code>:
curl -LI https://doi.org/10.1594/PANGAEA.841672
The response <code>Link:</code> header will contain relations of the following types:

* '''<code>cite-as</code>''' — the canonical DOI citation URL
* '''<code>describedby</code>''' — links to metadata in various formats (ISO 19139, DataCite XML, PANGAEA XML, BibTeX, RIS, JSON-LD)
* '''<code>item</code>''' — links to the data itself (tab-delimited text, HTML view)
* '''<code>author</code>''' — ORCID iDs of the dataset authors

This mechanism allows scripts, harvesters, and other machine clients to discover and retrieve any representation of a dataset from its DOI alone.

== HTTP Content Negotiation ==
HTTP content negotiation allows a client to request a specific representation of a resource by specifying a MIME type in the <code>Accept</code> header. All PANGAEA dataset landing pages support this mechanism, so both data and metadata can be downloaded programmatically by querying the DOI directly with the appropriate content type — no PANGAEA-specific URL construction is required.

=== Downloading the Data File ===
The tabular data file for a dataset can be downloaded as tab-delimited text:
curl -OJLf 'https://doi.pangaea.de/10.1594/PANGAEA.841672?format=textfile'
Equivalently, using content negotiation directly against the DOI:
curl -OJLf -H 'Accept: text/tab-separated-values' https://doi.org/10.1594/PANGAEA.841672
The <code>-OJLf</code> flags instruct curl to save the file under the meaningful filename provided by the server (<code>-O</code>, <code>-J</code>), to follow redirects (<code>-L</code>, needed to resolve the DOI), and to fail with a non-zero exit code on HTTP errors (<code>-f</code>), which is easier to handle in scripts than an HTML error page.

=== Retrieving Metadata in a Specific Format ===
The same mechanism applies to metadata. To retrieve a dataset's metadata in ISO 19139/19115 format:
curl -L -H 'Accept: application/vnd.iso19139.metadata+xml' https://doi.org/10.1594/PANGAEA.841672
The following MIME types are currently supported for metadata retrieval via content negotiation:
{| class="wikitable"
!Format
!MIME type
|-
|PANGAEA internal XML
|<code>application/vnd.pangaea.metadata+xml</code>
|-
|DataCite XML (v4)
|<code>application/vnd.datacite.datacite+xml</code>
|-
|ISO 19139 / 19115
|<code>application/vnd.iso19139.metadata+xml</code>
|-
|NASA DIF
|<code>application/vnd.nasa.dif-metadata+xml</code>
|-
|JSON-LD (Schema.org)
|<code>application/ld+json</code>
|-
|BibTeX
|<code>application/x-bibtex</code>
|-
|RIS
|<code>application/x-research-info-systems</code>
|-
|Plain text citation
|<code>text/x-bibliography</code>
|-
|Tab-separated data
|<code>text/tab-separated-values</code>
|}
Alternatively, the same formats can be requested using explicit URL parameters:
https://doi.pangaea.de/10.1594/PANGAEA.841672?format=metadata_datacite4
https://doi.pangaea.de/10.1594/PANGAEA.841672?format=metadata_iso19139
https://doi.pangaea.de/10.1594/PANGAEA.841672?format=metadata_jsonld
https://doi.pangaea.de/10.1594/PANGAEA.841672?format=citation_bibtex

== Access to Restricted Datasets ==
A small fraction of PANGAEA datasets are under an active moratorium — access to the data is temporarily restricted while the metadata remains publicly visible. Access to your personal moratorium datasets requires authentication using a '''bearer token''', which is the standard mechanism for API authorization (RFC 6750). Note that access to protected datasets of other authors is, of course, still not possible.

To obtain your bearer token, log in to PANGAEA. Login is supported both with a PANGAEA username and password, and via ORCID iD. After logging in, the [https://pangaea.de/user/ user profile page] displays the current session token under "Your temporary login token".

'''Please note:''' The bearer token is private and should be treated like a password — it must not be shared with others or included in publicly accessible scripts or repositories. If a token has been accidentally disclosed, the user profile page at https://pangaea.de/user/ provides a "Log out from all devices" option, which immediately invalidates all active tokens. When using content negotiation via the DOI resolver at https://doi.org/, it is safe to include the Authorization header: PANGAEA trusts the DOI Foundation's infrastructure, and the request is redirected to PANGAEA's own servers before any protected content is served.

The token can be passed as an <code>Authorization: Bearer</code> header in any HTTP request:
curl -OJLf -H 'Authorization: Bearer <your-token>' -H 'Accept: text/tab-separated-values' https://doi.org/10.1594/PANGAEA.841672
If a request for a restricted dataset is made without a token, the server responds with <code>401 Bearer Realm</code>, signaling that authentication is required. Recent versions of curl can also supply the token through a dedicated option (<code>--oauth2-bearer</code>), in which case the command becomes:
curl -OJLf --oauth2-bearer '<your-token>' -H 'Accept: text/tab-separated-values' https://doi.org/10.1594/PANGAEA.841672
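The same header can be set from a script. The following is a minimal sketch using only the Python standard library, not an official PANGAEA client: the function names <code>build_restricted_request</code> and <code>download</code> are hypothetical, and the code assumes the token has already been copied from the user profile page.

```python
import urllib.request

DOI_RESOLVER = "https://doi.org/"


def build_restricted_request(doi, token, accept="text/tab-separated-values"):
    """Build (but do not send) a GET request that carries a bearer token."""
    if not token:
        raise ValueError("a bearer token is required for restricted datasets")
    return urllib.request.Request(
        DOI_RESOLVER + doi,
        headers={
            # Same header as in the curl examples above.
            "Authorization": f"Bearer {token}",
            "Accept": accept,
        },
    )


def download(doi, token):
    """Send the request and return the raw response body (needs network access)."""
    with urllib.request.urlopen(build_restricted_request(doi, token)) as resp:
        return resp.read()
```

In line with the note above, the token is best read from an environment variable or a local configuration file rather than written into the script. <code>urllib.request.urlopen</code> raises <code>urllib.error.HTTPError</code> for error statuses such as 401, which plays a similar role to curl's <code>-f</code> flag.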

== OAI-PMH Metadata Harvesting ==
For bulk metadata harvesting, PANGAEA provides an OAI-PMH endpoint at:
<code>https://ws.pangaea.de/oai/</code>
OAI-PMH allows systematic, incremental harvesting of all PANGAEA metadata in supported formats. The following metadata standards are available via OAI-PMH: Dublin Core, DataCite v3 and v4, ISO 19139, and DIF (Directory Interchange Format). This endpoint is used by the portals and registries listed in the discovery section above, and is equally available to any user who wishes to build their own index or integrate PANGAEA metadata into an institutional system.
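Incremental harvesting follows the generic OAI-PMH request pattern. The sketch below is an illustration under stated assumptions, not a complete harvester: it builds <code>ListRecords</code> URLs against the endpoint given above and extracts the resumption token from a response page. The function names are hypothetical; <code>oai_dc</code> is the Dublin Core prefix defined by the OAI-PMH specification, while the prefixes for the other formats are not listed here and would have to be looked up with a <code>ListMetadataFormats</code> request.

```python
import urllib.parse
import xml.etree.ElementTree as ET

OAI_ENDPOINT = "https://ws.pangaea.de/oai/"
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(metadata_prefix="oai_dc", from_date=None, resumption_token=None):
    """Build a ListRecords URL; a resumption token replaces all other arguments."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            # Restrict the harvest to records changed since this date.
            params["from"] = from_date
    return OAI_ENDPOINT + "?" + urllib.parse.urlencode(params)


def next_resumption_token(xml_text):
    """Return the resumption token of a response page, or None on the last page."""
    root = ET.fromstring(xml_text)
    token = root.find(".//oai:resumptionToken", OAI_NS)
    if token is None or not (token.text or "").strip():
        return None
    return token.text.strip()
```

A harvester would fetch <code>list_records_url(from_date=...)</code>, process the records, and keep requesting <code>list_records_url(resumption_token=...)</code> until <code>next_resumption_token</code> returns <code>None</code>.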

== Programmatic Access with Python: pangaeapy ==
'''pangaeapy''' is the official Python client library for PANGAEA, developed and maintained by the PANGAEA team. It provides a high-level interface for loading PANGAEA datasets directly into native Python data structures and analyzing them, without requiring manual HTTP requests or format parsing.

* PyPI package: https://pypi.org/project/pangaeapy/
* Source code: https://github.com/pangaea-data-publisher/pangaeapy
* Introduction slides: https://github.com/pangaea-data-publisher/community-workshop-material/blob/master/Introduction_to_pangaeapy.pdf

=== Installation ===
<code>pip install pangaeapy</code>

=== Loading a Dataset ===
The central object in pangaeapy is <code>PanDataSet</code>, which takes a DOI or PANGAEA dataset ID and retrieves both the data and the associated metadata:
<code>from pangaeapy import PanDataSet

ds = PanDataSet('10.1594/PANGAEA.841672')

# Access the data as a pandas DataFrame
print(ds.data.head())

# Access dataset metadata
print(ds.title)
print(ds.authors)
print(ds.parameters) # list of measured parameters with units and methods</code>
The <code>data</code> attribute is a pandas DataFrame in which each column corresponds to a measured parameter, and each row to a measurement. Parameter metadata — including units, standard names, and method descriptions — is accessible through the <code>parameters</code> attribute.

=== Searching PANGAEA from Python ===
pangaeapy also provides a <code>PanQuery</code> class for querying the PANGAEA search engine programmatically. Note that the underlying search API used by pangaeapy is currently internal and undocumented; an official, publicly documented Search REST API is under development. Until that is available, pangaeapy offers the most straightforward path to programmatic search for Python users:
<code>from pangaeapy import PanQuery

q = PanQuery('temperature salinity Atlantic', limit=10)
for result in q.results:
    print(result['doi'], result['title'])</code>

=== Access to Restricted Datasets ===
A bearer token obtained from the PANGAEA user profile can be passed to <code>PanDataSet</code> to access your own datasets that are under moratorium (but not those of other authors):
<code>ds = PanDataSet('10.1594/PANGAEA.841672', token='<your-bearer-token>')</code>

=== Further Training Materials ===
Jupyter notebooks with worked examples for data discovery, loading, and analysis with pangaeapy are available in the PANGAEA Community Workshop GitHub repository at https://github.com/pangaea-data-publisher/community-workshop-material (see the <code>Python/</code> directory).

== Programmatic Access with R: pangaear ==
'''pangaear''' is a community-developed R client for PANGAEA, maintained as part of the rOpenSci ecosystem. It provides equivalent functionality to pangaeapy for R users.

* CRAN: https://cran.r-project.org/package=pangaear
* GitHub: https://github.com/ropensci/pangaear
* Introduction slides: https://github.com/pangaea-data-publisher/community-workshop-material/blob/master/Intro_PangaeaR.pdf

=== Installation ===
<code>install.packages("pangaear")
# or the development version:
# devtools::install_github("ropensci/pangaear")</code>

=== Loading a Dataset ===
<code>library(pangaear)

# Download and load a dataset by DOI
ds <- pg_data(doi = '10.1594/PANGAEA.841672')

# Access the data as a data frame
head(ds[[1]]$data)

# Metadata is accessible from the list object
ds[[1]]$metadata</code>

=== Searching PANGAEA from R ===
<code>res <- pg_search(query = 'temperature salinity Atlantic', count = 10)
print(res$doi)</code>
R scripts and worked examples are available in the <code>R/</code> directory of the PANGAEA Community Workshop GitHub repository.

== The PANGAEA Data Warehouse ==
The PANGAEA data warehouse (based on ClickHouse) provides a complementary access path for users who need to compile and aggregate data across large numbers of datasets, potentially hundreds of thousands of individual publications, without having to download and merge files manually.

The data warehouse supports spatially and chronologically constrained queries at the parameter level, returning data together with the DOI of each contributing dataset to maintain full provenance traceability. It is accessible in two ways:

'''Through the PANGAEA website:''' The data warehouse interface is integrated into the search results page. After performing a search, users can select "Data Warehouse" to configure and download a parameter-level aggregation from all datasets in the result set.

Documentation: https://wiki.pangaea.de/wiki/Data_warehouse

'''Programmatically via REST API:''' The data warehouse is also accessible through the PANGAEA web services at https://ws.pangaea.de/. This allows automated, scripted aggregations and is supported by both pangaeapy and pangaear.

Note that data warehouse exports represent compiled data products and do not replace the need to consult individual dataset landing pages to assess the fitness of individual studies for a specific scientific application. Every export includes the DOI name for each data point, ensuring that citation and provenance are preserved.

== Summary of Access Methods ==
{| class="wikitable"
!Use case
!Method
!Entry point
|-
|Interactive discovery
|PANGAEA search
|https://www.pangaea.de/
|-
|Single dataset access
|Browser / landing page
|https://doi.pangaea.de/10.1594/PANGAEA.xxxxx
|-
|Programmatic data download
|HTTP content negotiation
|<code>curl -H 'Accept: text/tab-separated-values' https://doi.org/...</code>
|-
|Programmatic metadata download
|HTTP content negotiation
|<code>curl -H 'Accept: application/ld+json' https://doi.org/...</code>
|-
|Link and format discovery
|HTTP HEAD / Signposting
|<code>curl -I https://doi.pangaea.de/...</code>
|-
|Restricted dataset access (to your own publications)
|Bearer token authentication
|retrievable via https://pangaea.de/user/
|-
|Bulk metadata harvesting
|OAI-PMH
|https://ws.pangaea.de/oai/
|-
|Scripted data access (Python)
|pangaeapy
|https://pypi.org/project/pangaeapy/
|-
|Scripted data access (R)
|pangaear
|https://cran.r-project.org/package=pangaear
|-
|Cross-dataset aggregation
|Data warehouse
|https://wiki.pangaea.de/wiki/Data_warehouse
|-
|Discovery across repositories
|DataCite Search API
|https://support.datacite.org/docs/api
|}
== Further Resources ==

* [[PANGAEA search]] — documentation on search syntax and faceted filters
* [[Data warehouse]] — data warehouse documentation
* [[PANGAEA Community Workshops]] — hands-on training workshops on finding and using PANGAEA data
* [https://github.com/pangaea-data-publisher/community-workshop-material PANGAEA Community Workshop GitHub] — Jupyter notebooks, R scripts, and slide decks
* [https://pypi.org/project/pangaeapy/ pangaeapy on PyPI] — Python client
* [https://cran.r-project.org/package=pangaear pangaear on CRAN] — R client
* [https://ws.pangaea.de/oai/ PANGAEA web services] — REST and OAI-PMH endpoints
* [https://signposting.org/ Signposting standard] — link discovery for FAIR data

Data Access and Reuse

2026-04-22T18:26:41Z

Uschindler: /* The Landing Page as Access Hub */

This article describes the different methods available for discovering, accessing, and reusing data published in PANGAEA. It is intended for researchers and data practitioners who want to interact with PANGAEA data beyond the standard web interface, including scripted and automated workflows. The methods range from interactive web-based search to fully programmatic access via HTTP content negotiation and dedicated client libraries.

Hands-on training materials covering many of the topics below are available in the [[PANGAEA Community Workshops|PANGAEA Community Workshop Series]], including Jupyter notebooks, R scripts, and slide decks in the [https://github.com/pangaea-data-publisher/community-workshop-material PANGAEA community workshop GitHub repository].

== Overview ==
Every dataset published in PANGAEA is assigned a globally unique and persistent Digital Object Identifier (DOI). The DOI is the single entry point for all forms of programmatic access: it resolves to a dataset landing page that, in addition to its human-readable representation, exposes all available metadata formats and data download options through standard HTTP mechanisms. There is no separate PANGAEA data API — all data and metadata access is built on top of the DOI and its landing page using standard web protocols. This design makes PANGAEA data directly accessible without any vendor-specific API key or proprietary client, while remaining fully compatible with FAIR principles.

== Web-Based Search and Discovery ==

=== PANGAEA Search Interface ===
The primary discovery tool for PANGAEA data is the PANGAEA search engine at https://www.pangaea.de/, based on Elasticsearch. It supports full-text search and faceted filtering across all published metadata. Faceted navigation allows users to constrain results by topic, device type, geographic region, and temporal coverage. Documentation on search functionality and syntax is available in the Wiki at [[PANGAEA search]].

=== External Portals and Registries ===
PANGAEA metadata is harvested by a large number of disciplinary and generic portals via OAI-PMH, making datasets discoverable beyond the PANGAEA website through services such as Google Dataset Search, OpenAIRE, DataCite Commons, DataONE, GBIF, EMODnet, GFBio, and others. PANGAEA is registered in re3data.org, FAIRsharing.org, RIsources, and the EOSC Marketplace.

When a PANGAEA-specific search interface is not required, the '''DataCite Search API''' (https://support.datacite.org/docs/api) is an alternative: it searches across a large inventory of research data from many repositories and returns summary metadata including DOI names. Data access for any PANGAEA result can then proceed via content negotiation as described below.
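As an illustration of this two-step workflow, the following sketch builds a search URL and picks the PANGAEA DOI names out of a response body. It is a minimal example under assumptions rather than a reference client: the base URL <code>https://api.datacite.org/dois</code>, the <code>query</code> and <code>page[size]</code> parameters, and the <code>data</code>/<code>attributes</code>/<code>doi</code> response layout are taken from the DataCite REST API as generally documented and should be checked against the linked documentation; the function names are hypothetical, and the prefix filter is derived from the DOI pattern used throughout this article.

```python
import json
import urllib.parse

DATACITE_DOIS = "https://api.datacite.org/dois"
# DOI pattern of PANGAEA datasets as used in this article, lower-cased.
PANGAEA_PREFIX = "10.1594/pangaea."


def search_url(query, page_size=10):
    """Build a DataCite search URL for a free-text query."""
    params = {"query": query, "page[size]": page_size}
    return DATACITE_DOIS + "?" + urllib.parse.urlencode(params)


def pangaea_dois(response_text):
    """Extract the DOI names with the PANGAEA prefix from a JSON response body."""
    payload = json.loads(response_text)
    dois = [item["attributes"]["doi"] for item in payload.get("data", [])]
    # DOI names are case-insensitive, so compare in lower case.
    return [doi for doi in dois if doi.lower().startswith(PANGAEA_PREFIX)]
```

Each DOI name returned by <code>pangaea_dois</code> can then be resolved via https://doi.org/ with the desired <code>Accept</code> header.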

== DOI Landing Pages and Link Discovery ==

=== The Landing Page as Access Hub ===
Each PANGAEA dataset is represented by a landing page accessible at its DOI. For example:
https://doi.pangaea.de/10.1594/PANGAEA.841672
The landing page serves a dual function: it presents dataset metadata and a data preview in human-readable form, and it simultaneously exposes all available machine-readable representations of the same resource through standard HTTP link headers and HTML <code><link></code> elements. This implementation follows the '''Signposting standard''' (https://signposting.org/), which allows any HTTP client to discover all alternate representations of a dataset — metadata in various formats, the data file itself, and the ORCID iDs of the authors — without any prior knowledge of PANGAEA-specific URL patterns.

=== Discovering Available Representations ===
A simple HTTP HEAD request to the landing page returns the full set of typed link relations in the response header. The following example illustrates this using <code>curl</code>:
curl -LI https://doi.org/10.1594/PANGAEA.841672
The response <code>Link:</code> header will contain relations of the following types:

* '''<code>cite-as</code>''' — the canonical DOI citation URL
* '''<code>describedby</code>''' — links to metadata in various formats (ISO 19139, DataCite XML, PANGAEA XML, BibTeX, RIS, JSON-LD)
* '''<code>item</code>''' — links to the data itself (tab-delimited text, HTML view)
* '''<code>author</code>''' — ORCID iDs of the dataset authors

This mechanism allows scripts, harvesters, and other machine clients to discover and retrieve any representation of a dataset from its DOI alone.
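A script can group the link targets by relation type with a few lines of Python. The following is a minimal sketch using only the standard library: the function name <code>parse_link_header</code> is hypothetical, the parser handles only the common form of a target in angle brackets followed by <code>name="value"</code> parameters, and the sample header is illustrative rather than a verbatim PANGAEA response.

```python
import re
from collections import defaultdict

# One link-value: <target> followed by its ;name=value parameters.
_LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[^;,=\s]+\s*=\s*(?:"[^"]*"|[^;,\s]*))*)')
_PARAM_RE = re.compile(r';\s*([^;,=\s]+)\s*=\s*(?:"([^"]*)"|([^;,\s]*))')

# Illustrative header value, built from URLs that appear in this article.
SAMPLE = (
    '<https://doi.org/10.1594/PANGAEA.841672>; rel="cite-as", '
    '<https://doi.pangaea.de/10.1594/PANGAEA.841672?format=metadata_jsonld>; '
    'rel="describedby"; type="application/ld+json", '
    '<https://doi.pangaea.de/10.1594/PANGAEA.841672?format=textfile>; '
    'rel="item"; type="text/tab-separated-values"'
)


def parse_link_header(value):
    """Group link targets by relation type, keeping any declared media type."""
    links = defaultdict(list)
    for target, raw_params in _LINK_RE.findall(value):
        params = {}
        for name, quoted, bare in _PARAM_RE.findall(raw_params):
            params[name.lower()] = quoted or bare
        # A rel attribute may list several space-separated relation types.
        for rel in params.get("rel", "").split():
            links[rel].append((target, params.get("type")))
    return dict(links)


if __name__ == "__main__":
    for rel, targets in parse_link_header(SAMPLE).items():
        print(rel, targets)
```

With a live response, the value to parse is the <code>Link</code> header of the final (redirected) response to the HEAD request.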

== HTTP Content Negotiation ==
HTTP content negotiation allows a client to request a specific representation of a resource by specifying a MIME type in the <code>Accept</code> header. All PANGAEA dataset landing pages support this mechanism, so both data and metadata can be downloaded programmatically by querying the DOI directly with the appropriate content type — no PANGAEA-specific URL construction is required.

=== Downloading the Data File ===
The tabular data file for a dataset can be downloaded as tab-delimited text:
curl -OJLf 'https://doi.pangaea.de/10.1594/PANGAEA.841672?format=textfile'
Equivalently, using content negotiation directly against the DOI:
curl -OJLf -H 'Accept: text/tab-separated-values' https://doi.org/10.1594/PANGAEA.841672
The <code>-OJLf</code> flags instruct curl to save the file under the server-provided, meaningful filename (<code>-O</code>, <code>-J</code>), to follow redirects (<code>-L</code>, needed to resolve the DOI), and to fail with a non-zero exit code on HTTP errors (<code>-f</code>), which is easier to handle in scripts than an HTML error page.

=== Retrieving Metadata in a Specific Format ===
The same mechanism applies to metadata. To retrieve a dataset's metadata in ISO 19139/19115 format:
curl -L -H 'Accept: application/vnd.iso19139.metadata+xml' https://doi.org/10.1594/PANGAEA.841672
The following MIME types are currently supported via content negotiation; the last entry returns the data itself rather than metadata:
{| class="wikitable"
!Format
!MIME type
|-
|PANGAEA internal XML
|<code>application/vnd.pangaea.metadata+xml</code>
|-
|DataCite XML (v4)
|<code>application/vnd.datacite.datacite+xml</code>
|-
|ISO 19139 / 19115
|<code>application/vnd.iso19139.metadata+xml</code>
|-
|NASA DIF
|<code>application/vnd.nasa.dif-metadata+xml</code>
|-
|JSON-LD (Schema.org)
|<code>application/ld+json</code>
|-
|BibTeX
|<code>application/x-bibtex</code>
|-
|RIS
|<code>application/x-research-info-systems</code>
|-
|Plain text citation
|<code>text/x-bibliography</code>
|-
|Tab-separated data
|<code>text/tab-separated-values</code>
|}
Alternatively, the same formats can be requested using explicit URL parameters:
https://doi.pangaea.de/10.1594/PANGAEA.841672?format=metadata_datacite4
https://doi.pangaea.de/10.1594/PANGAEA.841672?format=metadata_iso19139
https://doi.pangaea.de/10.1594/PANGAEA.841672?format=metadata_jsonld
https://doi.pangaea.de/10.1594/PANGAEA.841672?format=citation_bibtex
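In a script, the MIME types from the table can be kept in a small lookup so that callers only name the format they want. This is a sketch using the Python standard library, not part of any PANGAEA client: the short format keys and the function names <code>negotiate</code> and <code>fetch</code> are hypothetical, while the MIME types are copied from the table above.

```python
import urllib.request

# MIME types copied from the table above; the short keys are hypothetical.
MIME_TYPES = {
    "pangaea": "application/vnd.pangaea.metadata+xml",
    "datacite": "application/vnd.datacite.datacite+xml",
    "iso19139": "application/vnd.iso19139.metadata+xml",
    "dif": "application/vnd.nasa.dif-metadata+xml",
    "jsonld": "application/ld+json",
    "bibtex": "application/x-bibtex",
    "ris": "application/x-research-info-systems",
    "citation": "text/x-bibliography",
    "data": "text/tab-separated-values",
}


def negotiate(doi, fmt):
    """Build a request for one representation of a dataset via its DOI."""
    try:
        mime = MIME_TYPES[fmt]
    except KeyError:
        raise ValueError(f"unknown format {fmt!r}; expected one of {sorted(MIME_TYPES)}") from None
    return urllib.request.Request("https://doi.org/" + doi, headers={"Accept": mime})


def fetch(doi, fmt):
    """Send the request and return the response body as text (needs network access)."""
    with urllib.request.urlopen(negotiate(doi, fmt)) as resp:
        return resp.read().decode(resp.headers.get_content_charset() or "utf-8")
```

For example, <code>fetch('10.1594/PANGAEA.841672', 'bibtex')</code> would request the BibTeX citation of the example dataset.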

== Access to Restricted Datasets ==
A small fraction of PANGAEA datasets are under an active moratorium: access to the data is temporarily restricted while the metadata remains publicly visible. Accessing your own moratorium datasets requires authentication with a '''bearer token''', a standard mechanism for API authorization (RFC 6750). A token does not grant access to protected datasets of other authors.

To obtain your bearer token, log in to PANGAEA. Login is supported both with a PANGAEA username and password, and via ORCID iD. After logging in, the [https://pangaea.de/user/ user profile page] displays the current session token under "Your temporary login token".

'''Please note:''' The bearer token is private and should be treated like a password — it must not be shared with others or included in publicly accessible scripts or repositories. If a token has been accidentally disclosed, the user profile page at https://pangaea.de/user/ provides a "Log out from all devices" option, which immediately invalidates all active tokens. When using content negotiation via the DOI resolver at https://doi.org/, it is safe to include the Authorization header: PANGAEA trusts the DOI Foundation's infrastructure, and the request is redirected to PANGAEA's own servers before any protected content is served.

The token can be passed as an <code>Authorization: Bearer</code> header in any HTTP request:
curl -OJLf -H 'Authorization: Bearer <your-token>' -H 'Accept: text/tab-separated-values' https://doi.org/10.1594/PANGAEA.841672
If a request to a restricted dataset is made without a token, the server responds with <code>401 Bearer Realm</code>, signaling that authentication is required. Recent versions of curl can also supply the token through a dedicated flag (<code>--oauth2-bearer</code>); the equivalent command is:
<code>curl -OJLf --oauth2-bearer '<your-token>' -H 'Accept: text/tab-separated-values' https://doi.org/10.1594/PANGAEA.841672</code>

== OAI-PMH Metadata Harvesting ==
For bulk metadata harvesting, PANGAEA provides an OAI-PMH endpoint at:
<code>https://ws.pangaea.de/oai/</code>
OAI-PMH allows systematic, incremental harvesting of all PANGAEA metadata in supported formats. The following metadata standards are available via OAI-PMH: Dublin Core, DataCite v3 and v4, ISO 19139, and DIF (Directory Interchange Format). This endpoint is used by the portals and registries listed in the discovery section above, and is equally available to any user who wishes to build their own index or integrate PANGAEA metadata into an institutional system.

== Programmatic Access with Python: pangaeapy ==
'''pangaeapy''' is the official Python client library for PANGAEA, developed and maintained by the PANGAEA team. It provides a high-level interface for loading PANGAEA datasets directly into native Python data structures and analyzing them, without requiring manual HTTP requests or format parsing.

* PyPI package: https://pypi.org/project/pangaeapy/
* Source code: https://github.com/pangaea-data-publisher/pangaeapy
* Introduction slides: https://github.com/pangaea-data-publisher/community-workshop-material/blob/master/Introduction_to_pangaeapy.pdf

=== Installation ===
<code>pip install pangaeapy</code>

=== Loading a Dataset ===
The central object in pangaeapy is <code>PanDataSet</code>, which takes a DOI or PANGAEA dataset ID and retrieves both the data and the associated metadata:
<code>from pangaeapy import PanDataSet

ds = PanDataSet('10.1594/PANGAEA.841672')

# Access the data as a pandas DataFrame
print(ds.data.head())

# Access dataset metadata
print(ds.title)
print(ds.authors)
print(ds.parameters) # list of measured parameters with units and methods</code>
The <code>data</code> attribute is a pandas DataFrame in which each column corresponds to a measured parameter, and each row to a measurement. Parameter metadata — including units, standard names, and method descriptions — is accessible through the <code>parameters</code> attribute.

=== Searching PANGAEA from Python ===
pangaeapy also provides a <code>PanQuery</code> class for querying the PANGAEA search engine programmatically. Note that the underlying search API used by pangaeapy is currently internal and undocumented; an official, publicly documented Search REST API is under development. Until that is available, pangaeapy offers the most straightforward path to programmatic search for Python users:
<code>from pangaeapy import PanQuery

q = PanQuery('temperature salinity Atlantic', limit=10)
for result in q.results:
    print(result['doi'], result['title'])</code>

=== Access to Restricted Datasets ===
A bearer token obtained from the PANGAEA user profile can be passed to <code>PanDataSet</code> to access your own datasets that are under moratorium (but not those of other authors):
<code>ds = PanDataSet('10.1594/PANGAEA.841672', token='<your-bearer-token>')</code>

=== Further Training Materials ===
Jupyter notebooks with worked examples for data discovery, loading, and analysis with pangaeapy are available in the PANGAEA Community Workshop GitHub repository at https://github.com/pangaea-data-publisher/community-workshop-material (see the <code>Python/</code> directory).

== Programmatic Access with R: pangaear ==
'''pangaear''' is a community-developed R client for PANGAEA, maintained as part of the rOpenSci ecosystem. It provides equivalent functionality to pangaeapy for R users.

* CRAN: https://cran.r-project.org/package=pangaear
* GitHub: https://github.com/ropensci/pangaear
* Introduction slides: https://github.com/pangaea-data-publisher/community-workshop-material/blob/master/Intro_PangaeaR.pdf

=== Installation ===
<code>install.packages("pangaear")
# or the development version:
# devtools::install_github("ropensci/pangaear")</code>

=== Loading a Dataset ===
<code>library(pangaear)

# Download and load a dataset by DOI
ds <- pg_data(doi = '10.1594/PANGAEA.841672')

# Access the data as a data frame
head(ds[[1]]$data)

# Metadata is accessible from the list object
ds[[1]]$metadata</code>

=== Searching PANGAEA from R ===
<code>res <- pg_search(query = 'temperature salinity Atlantic', count = 10)
print(res$doi)</code>
R scripts and worked examples are available in the <code>R/</code> directory of the PANGAEA Community Workshop GitHub repository.

== The PANGAEA Data Warehouse ==
The PANGAEA data warehouse (based on ClickHouse) provides a complementary access path for users who need to compile and aggregate data across large numbers of datasets, potentially hundreds of thousands of individual publications, without having to download and merge files manually.

The data warehouse supports spatially and chronologically constrained queries at the parameter level, returning data together with the DOI of each contributing dataset to maintain full provenance traceability. It is accessible in two ways:

'''Through the PANGAEA website:''' The data warehouse interface is integrated into the search results page. After performing a search, users can select "Data Warehouse" to configure and download a parameter-level aggregation from all datasets in the result set.

Documentation: https://wiki.pangaea.de/wiki/Data_warehouse

'''Programmatically via REST API:''' The data warehouse is also accessible through the PANGAEA web services at https://ws.pangaea.de/. This allows automated, scripted aggregations and is supported by both pangaeapy and pangaear.

Note that data warehouse exports represent compiled data products and do not replace the need to consult individual dataset landing pages to assess the fitness of individual studies for a specific scientific application. Every export includes the DOI name for each data point, ensuring that citation and provenance are preserved.

== Summary of Access Methods ==
{| class="wikitable"
!Use case
!Method
!Entry point
|-
|Interactive discovery
|PANGAEA search
|https://www.pangaea.de/
|-
|Single dataset access
|Browser / landing page
|https://doi.pangaea.de/10.1594/PANGAEA.xxxxx
|-
|Programmatic data download
|HTTP content negotiation
|<code>curl -H 'Accept: text/tab-separated-values' https://doi.org/...</code>
|-
|Programmatic metadata download
|HTTP content negotiation
|<code>curl -H 'Accept: application/ld+json' https://doi.org/...</code>
|-
|Link and format discovery
|HTTP HEAD / Signposting
|<code>curl -I https://doi.pangaea.de/...</code>
|-
|Restricted dataset access (to your own publications)
|Bearer token authentication
|retrievable via https://pangaea.de/user/
|-
|Bulk metadata harvesting
|OAI-PMH
|https://ws.pangaea.de/oai/
|-
|Scripted data access (Python)
|pangaeapy
|https://pypi.org/project/pangaeapy/
|-
|Scripted data access (R)
|pangaear
|https://cran.r-project.org/package=pangaear
|-
|Cross-dataset aggregation
|Data warehouse
|https://wiki.pangaea.de/wiki/Data_warehouse
|-
|Discovery across repositories
|DataCite Search API
|https://support.datacite.org/docs/api
|}
== Further Resources ==

* [[PANGAEA search]] — documentation on search syntax and faceted filters
* [[Data warehouse]] — data warehouse documentation
* [[PANGAEA Community Workshops]] — hands-on training workshops on finding and using PANGAEA data
* [https://github.com/pangaea-data-publisher/community-workshop-material PANGAEA Community Workshop GitHub] — Jupyter notebooks, R scripts, and slide decks
* [https://pypi.org/project/pangaeapy/ pangaeapy on PyPI] — Python client
* [https://cran.r-project.org/package=pangaear pangaear on CRAN] — R client
* [https://ws.pangaea.de/oai/ PANGAEA web services] — REST and OAI-PMH endpoints
* [https://signposting.org/ Signposting standard] — link discovery for FAIR data

Data Access and Reuse

2026-04-22T18:26:26Z

Uschindler: /* Discovering Available Representations */

This article describes the different methods available for discovering, accessing, and reusing data published in PANGAEA. It is intended for researchers and data practitioners who want to interact with PANGAEA data beyond the standard web interface, including scripted and automated workflows. The methods range from interactive web-based search to fully programmatic access via HTTP content negotiation and dedicated client libraries.

Hands-on training materials covering many of the topics below are available in the [[PANGAEA Community Workshops|PANGAEA Community Workshop Series]], including Jupyter notebooks, R scripts, and slide decks in the [https://github.com/pangaea-data-publisher/community-workshop-material PANGAEA community workshop GitHub repository].

== Overview ==
Every dataset published in PANGAEA is assigned a globally unique and persistent Digital Object Identifier (DOI). The DOI is the single entry point for all forms of programmatic access: it resolves to a dataset landing page that, in addition to its human-readable representation, exposes all available metadata formats and data download options through standard HTTP mechanisms. There is no separate PANGAEA data API — all data and metadata access is built on top of the DOI and its landing page using standard web protocols. This design makes PANGAEA data directly accessible without any vendor-specific API key or proprietary client, while remaining fully compatible with FAIR principles.

== Web-Based Search and Discovery ==

=== PANGAEA Search Interface ===
The primary discovery tool for PANGAEA data is the PANGAEA search engine at https://www.pangaea.de/, based on Elasticsearch. It supports full-text search and faceted filtering across all published metadata. Faceted navigation allows users to constrain results by topic, device type, geographic region, and temporal coverage. Documentation on search functionality and syntax is available in the Wiki at [[PANGAEA search]].

=== External Portals and Registries ===
PANGAEA metadata is harvested by a large number of disciplinary and generic portals via OAI-PMH, making datasets discoverable beyond the PANGAEA website through services such as Google Dataset Search, OpenAIRE, DataCite Commons, DataONE, GBIF, EMODnet, GFBio, and others. PANGAEA is registered in re3data.org, FAIRsharing.org, RIsources, and the EOSC Marketplace.

When a PANGAEA-specific search interface is not required, the '''DataCite Search API''' (https://support.datacite.org/docs/api) is an alternative: it searches across a large inventory of research data from many repositories and returns summary metadata including DOI names. Data access for any PANGAEA result can then proceed via content negotiation as described below.

== DOI Landing Pages and Link Discovery ==

=== The Landing Page as Access Hub ===
Each PANGAEA dataset is represented by a landing page accessible at its DOI. For example:
<code>https://doi.pangaea.de/10.1594/PANGAEA.841672</code>
The landing page serves a dual function: it presents dataset metadata and a data preview in human-readable form, and it simultaneously exposes all available machine-readable representations of the same resource through standard HTTP link headers and HTML <code><link></code> elements. This implementation follows the '''Signposting standard''' (https://signposting.org/), which allows any HTTP client to discover all alternate representations of a dataset — metadata in various formats, the data file itself, and the ORCID iDs of the authors — without any prior knowledge of PANGAEA-specific URL patterns.

=== Discovering Available Representations ===
A simple HTTP HEAD request to the landing page returns the full set of typed link relations in the response header. The following example illustrates this using <code>curl</code>:
curl -LI https://doi.org/10.1594/PANGAEA.841672
The response <code>Link:</code> header will contain relations of the following types:

* '''<code>cite-as</code>''' — the canonical DOI citation URL
* '''<code>describedby</code>''' — links to metadata in various formats (ISO 19139, DataCite XML, PANGAEA XML, BibTeX, RIS, JSON-LD)
* '''<code>item</code>''' — links to the data itself (tab-delimited text, HTML view)
* '''<code>author</code>''' — ORCID iDs of the dataset authors

This mechanism allows scripts, harvesters, and other machine clients to discover and retrieve any representation of a dataset from its DOI alone.

== HTTP Content Negotiation ==
HTTP content negotiation allows a client to request a specific representation of a resource by specifying a MIME type in the <code>Accept</code> header. All PANGAEA dataset landing pages support this mechanism, so both data and metadata can be downloaded programmatically by querying the DOI directly with the appropriate content type — no PANGAEA-specific URL construction is required.

=== Downloading the Data File ===
The tabular data file for a dataset can be downloaded as tab-delimited text:
curl -OJLf 'https://doi.pangaea.de/10.1594/PANGAEA.841672?format=textfile'
Equivalently, using content negotiation directly against the DOI:
curl -OJLf -H 'Accept: text/tab-separated-values' https://doi.org/10.1594/PANGAEA.841672
The <code>-OJLf</code> flags instruct curl to save the file under the server-provided, meaningful filename (<code>-O</code>, <code>-J</code>), to follow redirects (<code>-L</code>, needed to resolve the DOI), and to fail with a non-zero exit code on HTTP errors (<code>-f</code>), which is easier to handle in scripts than an HTML error page.

=== Retrieving Metadata in a Specific Format ===
The same mechanism applies to metadata. To retrieve a dataset's metadata in ISO 19139/19115 format:
curl -L -H 'Accept: application/vnd.iso19139.metadata+xml' https://doi.org/10.1594/PANGAEA.841672
The following MIME types are currently supported via content negotiation; the last entry returns the data itself rather than metadata:
{| class="wikitable"
!Format
!MIME type
|-
|PANGAEA internal XML
|<code>application/vnd.pangaea.metadata+xml</code>
|-
|DataCite XML (v4)
|<code>application/vnd.datacite.datacite+xml</code>
|-
|ISO 19139 / 19115
|<code>application/vnd.iso19139.metadata+xml</code>
|-
|NASA DIF
|<code>application/vnd.nasa.dif-metadata+xml</code>
|-
|JSON-LD (Schema.org)
|<code>application/ld+json</code>
|-
|BibTeX
|<code>application/x-bibtex</code>
|-
|RIS
|<code>application/x-research-info-systems</code>
|-
|Plain text citation
|<code>text/x-bibliography</code>
|-
|Tab-separated data
|<code>text/tab-separated-values</code>
|}
Alternatively, the same formats can be requested using explicit URL parameters:
https://doi.pangaea.de/10.1594/PANGAEA.841672?format=metadata_datacite4
https://doi.pangaea.de/10.1594/PANGAEA.841672?format=metadata_iso19139
https://doi.pangaea.de/10.1594/PANGAEA.841672?format=metadata_jsonld
https://doi.pangaea.de/10.1594/PANGAEA.841672?format=citation_bibtex

== Access to Restricted Datasets ==
A small fraction of PANGAEA datasets are under an active moratorium: access to the data is temporarily restricted while the metadata remains publicly visible. Accessing your own moratorium datasets requires authentication with a '''bearer token''', a standard mechanism for API authorization (RFC 6750). A token does not grant access to protected datasets of other authors.

To obtain your bearer token, log in to PANGAEA. Login is supported both with a PANGAEA username and password, and via ORCID iD. After logging in, the [https://pangaea.de/user/ user profile page] displays the current session token under "Your temporary login token".

'''Please note:''' The bearer token is private and should be treated like a password — it must not be shared with others or included in publicly accessible scripts or repositories. If a token has been accidentally disclosed, the user profile page at https://pangaea.de/user/ provides a "Log out from all devices" option, which immediately invalidates all active tokens. When using content negotiation via the DOI resolver at https://doi.org/, it is safe to include the Authorization header: PANGAEA trusts the DOI Foundation's infrastructure, and the request is redirected to PANGAEA's own servers before any protected content is served.

The token can be passed as an <code>Authorization: Bearer</code> header in any HTTP request:
<code>curl -OJLf -H 'Authorization: Bearer <your-token>' -H 'Accept: text/tab-separated-values' https://doi.org/10.1594/PANGAEA.841672</code>
If a request to a restricted dataset is made without a token, the server responds with <code>401 Bearer Realm</code>, signaling that authentication is required. Recent versions of curl can also pass the token with a dedicated option (<code>--oauth2-bearer</code>). The command then becomes:
<code>curl -OJLf --oauth2-bearer '<your-token>' -H 'Accept: text/tab-separated-values' https://doi.org/10.1594/PANGAEA.841672</code>

== OAI-PMH Metadata Harvesting ==
For bulk metadata harvesting, PANGAEA provides an OAI-PMH endpoint at:
<code>https://ws.pangaea.de/oai/</code>
OAI-PMH allows systematic, incremental harvesting of all PANGAEA metadata in supported formats. The following metadata standards are available via OAI-PMH: Dublin Core, DataCite v3 and v4, ISO 19139, and DIF (Directory Interchange Format). This endpoint is used by the portals and registries listed in the discovery section above, and is equally available to any user who wishes to build their own index or integrate PANGAEA metadata into an institutional system.

== Programmatic Access with Python: pangaeapy ==
'''pangaeapy''' is the official Python client library for PANGAEA, developed and maintained by the PANGAEA team. It provides a high-level interface for loading and analyzing PANGAEA datasets directly into native Python data structures, without requiring manual HTTP requests or format parsing.

* PyPI package: https://pypi.org/project/pangaeapy/
* Source code: https://github.com/pangaea-data-publisher/pangaeapy
* Introduction slides: https://github.com/pangaea-data-publisher/community-workshop-material/blob/master/Introduction_to_pangaeapy.pdf

=== Installation ===
<code>pip install pangaeapy</code>

=== Loading a Dataset ===
The central object in pangaeapy is <code>PanDataSet</code>, which takes a DOI or PANGAEA dataset ID and retrieves both the data and the associated metadata:
<code>from pangaeapy import PanDataSet

ds = PanDataSet('10.1594/PANGAEA.841672')

# Access the data as a pandas DataFrame
print(ds.data.head())

# Access dataset metadata
print(ds.title)
print(ds.authors)
print(ds.parameters) # list of measured parameters with units and methods</code>
The <code>data</code> attribute is a pandas DataFrame in which each column corresponds to a measured parameter, and each row to a measurement. Parameter metadata — including units, standard names, and method descriptions — is accessible through the <code>parameters</code> attribute.

=== Searching PANGAEA from Python ===
pangaeapy also provides a <code>PanQuery</code> class for querying the PANGAEA search engine programmatically. Note that the underlying search API used by pangaeapy is currently internal and undocumented; an official, publicly documented Search REST API is under development. Until that is available, pangaeapy offers the most straightforward path to programmatic search for Python users:
<code>from pangaeapy import PanQuery

q = PanQuery('temperature salinity Atlantic', limit=10)
for result in q.results:
    print(result['doi'], result['title'])</code>

=== Access to Restricted Datasets ===
A bearer token obtained from the PANGAEA user profile can be passed to <code>PanDataSet</code> to access your own datasets that are under moratorium; it does not grant access to protected datasets of other authors:
<code>ds = PanDataSet('10.1594/PANGAEA.841672', token='<your-bearer-token>')</code>

=== Further Training Materials ===
Jupyter notebooks with worked examples for data discovery, loading, and analysis with pangaeapy are available in the PANGAEA Community Workshop GitHub repository at https://github.com/pangaea-data-publisher/community-workshop-material (see the <code>Python/</code> directory).

== Programmatic Access with R: pangaear ==
'''pangaear''' is a community-developed R client for PANGAEA, maintained as part of the rOpenSci ecosystem. It provides equivalent functionality to pangaeapy for R users.

* CRAN: https://cran.r-project.org/package=pangaear
* GitHub: https://github.com/ropensci/pangaear
* Introduction slides: https://github.com/pangaea-data-publisher/community-workshop-material/blob/master/Intro_PangaeaR.pdf

=== Installation ===
<code>install.packages("pangaear")
# or the development version:
# devtools::install_github("ropensci/pangaear")</code>

=== Loading a Dataset ===
<code>library(pangaear)

# Download and load a dataset by DOI
ds <- pg_data(doi = '10.1594/PANGAEA.841672')

# Access the data as a data frame
head(ds[[1]]$data)

# Metadata is accessible from the list object
ds[[1]]$metadata</code>

=== Searching PANGAEA from R ===
<code>res <- pg_search(query = 'temperature salinity Atlantic', count = 10)
print(res$doi)</code>
R scripts and worked examples are available in the <code>R/</code> directory of the PANGAEA Community Workshop GitHub repository.

== The PANGAEA Data Warehouse ==
The PANGAEA data warehouse (based on ClickHouse) provides a complementary access path for users who need to compile and aggregate data across large numbers of datasets, potentially hundreds of thousands of individual publications, without downloading and merging files manually.

The data warehouse supports spatially and chronologically constrained queries at the parameter level, returning data together with the DOI of each contributing dataset to maintain full provenance traceability. It is accessible in two ways:

'''Through the PANGAEA website:''' The data warehouse interface is integrated into the search results page. After performing a search, users can select "Data Warehouse" to configure and download a parameter-level aggregation from all datasets in the result set.

Documentation: https://wiki.pangaea.de/wiki/Data_warehouse

'''Programmatically via REST API:''' The data warehouse is also accessible through the PANGAEA web services at https://ws.pangaea.de/. This allows automated, scripted aggregations and is supported by both pangaeapy and pangaear.

Note that data warehouse exports represent compiled data products and do not replace the need to consult individual dataset landing pages to assess the fitness of individual studies for a specific scientific application. Every export includes the DOI name for each data point, ensuring that citation and provenance are preserved.

== Summary of Access Methods ==
{| class="wikitable"
!Use case
!Method
!Entry point
|-
|Interactive discovery
|PANGAEA search
|https://www.pangaea.de/
|-
|Single dataset access
|Browser / landing page
|https://doi.pangaea.de/10.1594/PANGAEA.xxxxx
|-
|Programmatic data download
|HTTP content negotiation
|<code>curl -H 'Accept: text/tab-separated-values' https://doi.org/...</code>
|-
|Programmatic metadata download
|HTTP content negotiation
|<code>curl -H 'Accept: application/ld+json' https://doi.org/...</code>
|-
|Link and format discovery
|HTTP HEAD / Signposting
|<code>curl -I https://doi.pangaea.de/...</code>
|-
|Restricted dataset access (to your own publications)
|Bearer token authentication
|retrievable via https://pangaea.de/user/
|-
|Bulk metadata harvesting
|OAI-PMH
|https://ws.pangaea.de/oai/
|-
|Scripted data access (Python)
|pangaeapy
|https://pypi.org/project/pangaeapy/
|-
|Scripted data access (R)
|pangaear
|https://cran.r-project.org/package=pangaear
|-
|Cross-dataset aggregation
|Data warehouse
|https://wiki.pangaea.de/wiki/Data_warehouse
|-
|Discovery across repositories
|DataCite Search API
|https://support.datacite.org/docs/api
|}
== Further Resources ==

* [[PANGAEA search]] — documentation on search syntax and faceted filters
* [[Data warehouse]] — data warehouse documentation
* [[PANGAEA Community Workshops]] — hands-on training workshops on finding and using PANGAEA data
* [https://github.com/pangaea-data-publisher/community-workshop-material PANGAEA Community Workshop GitHub] — Jupyter notebooks, R scripts, and slide decks
* [https://pypi.org/project/pangaeapy/ pangaeapy on PyPI] — Python client
* [https://cran.r-project.org/package=pangaear pangaear on CRAN] — R client
* [https://ws.pangaea.de/oai/ PANGAEA web services] — REST and OAI-PMH endpoints
* [https://signposting.org/ Signposting standard] — link discovery for FAIR data

Data Access and Reuse

2026-04-22T18:26:03Z

Uschindler: /* Retrieving Metadata in a Specific Format */

This article describes the different methods available for discovering, accessing, and reusing data published in PANGAEA. It is intended for researchers and data practitioners who want to interact with PANGAEA data beyond the standard web interface, including scripted and automated workflows. The methods range from interactive web-based search to fully programmatic access via HTTP content negotiation and dedicated client libraries.

Hands-on training materials covering many of the topics below are available in the [[PANGAEA Community Workshops|PANGAEA Community Workshop Series]], including Jupyter notebooks, R scripts, and slide decks in the [https://github.com/pangaea-data-publisher/community-workshop-material PANGAEA community workshop GitHub repository].

== Overview ==
Every dataset published in PANGAEA is assigned a globally unique and persistent Digital Object Identifier (DOI). The DOI is the single entry point for all forms of programmatic access: it resolves to a dataset landing page that, in addition to its human-readable representation, exposes all available metadata formats and data download options through standard HTTP mechanisms. There is no separate PANGAEA data API — all data and metadata access is built on top of the DOI and its landing page using standard web protocols. This design makes PANGAEA data directly accessible without any vendor-specific API key or proprietary client, while remaining fully compatible with FAIR principles.

== Web-Based Search and Discovery ==

=== PANGAEA Search Interface ===
The primary discovery tool for PANGAEA data is the PANGAEA search engine at https://www.pangaea.de/, based on Elasticsearch. It supports full-text search and faceted filtering across all published metadata. Faceted navigation allows users to constrain results by topic, device type, geographic region, and temporal coverage. Documentation on search functionality and syntax is available in the Wiki at [[PANGAEA search]].

=== External Portals and Registries ===
PANGAEA metadata is harvested by a large number of disciplinary and generic portals via OAI-PMH, making datasets discoverable beyond the PANGAEA website through services such as Google Dataset Search, OpenAIRE, DataCite Commons, DataONE, GBIF, EMODnet, GFBio, and others. PANGAEA is registered in re3data.org, FAIRsharing.org, RIsources, and the EOSC Marketplace.

An alternative for search when a PANGAEA-specific search interface is not required is the '''DataCite Search API''' (https://support.datacite.org/docs/api), which searches across a large inventory of research data from many repositories. It returns summary metadata including DOI names, and subsequent data access from any PANGAEA result can then proceed via content negotiation as described below.

== DOI Landing Pages and Link Discovery ==

=== The Landing Page as Access Hub ===
Each PANGAEA dataset is represented by a landing page accessible at its DOI. For example:
<code>https://doi.pangaea.de/10.1594/PANGAEA.841672</code>
The landing page serves a dual function: it presents dataset metadata and a data preview in human-readable form, and it simultaneously exposes all available machine-readable representations of the same resource through standard HTTP link headers and HTML <code><link></code> elements. This implementation follows the '''Signposting standard''' (https://signposting.org/), which allows any HTTP client to discover all alternate representations of a dataset — metadata in various formats, the data file itself, and the ORCID iDs of the authors — without any prior knowledge of PANGAEA-specific URL patterns.

=== Discovering Available Representations ===
A simple HTTP HEAD request to the landing page returns the full set of typed link relations in the response header. The following example illustrates this using <code>curl</code>:
<code>curl -LI https://doi.org/10.1594/PANGAEA.841672</code>
The response <code>Link:</code> header will contain relations of the following types:

* '''<code>cite-as</code>''' — the canonical DOI citation URL
* '''<code>describedby</code>''' — links to metadata in various formats (ISO 19139, DataCite XML, PANGAEA XML, BibTeX, RIS, JSON-LD)
* '''<code>item</code>''' — links to the data itself (tab-delimited text, HTML view)
* '''<code>author</code>''' — ORCID iDs of the dataset authors

This mechanism allows scripts, harvesters, and other machine clients to discover and retrieve any representation of a dataset from its DOI alone.
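A script can group the targets of a <code>Link:</code> header by relation type before choosing what to download. The following is a minimal sketch, not a complete RFC 8288 parser; the helper name <code>parse_link_header</code> and the shortened sample header are made up for illustration and do not reproduce an actual PANGAEA response.

```python
import re


def parse_link_header(value):
    """Group the targets of an HTTP Link header by their rel value."""
    links = {}
    # Each entry looks like: <target-url>; rel="relation"; type="mime/type"
    for target, params in re.findall(r'<([^>]*)>([^<]*)', value):
        match = re.search(r'rel="?([^";,]+)"?', params)
        if match:
            # A rel attribute may hold several space-separated relation types
            for rel in match.group(1).split():
                links.setdefault(rel, []).append(target)
    return links


# Shortened, made-up header value for illustration only
sample = (
    '<https://doi.org/10.1594/PANGAEA.841672>; rel="cite-as", '
    '<https://doi.pangaea.de/10.1594/PANGAEA.841672?format=metadata_jsonld>; '
    'rel="describedby"; type="application/ld+json", '
    '<https://doi.pangaea.de/10.1594/PANGAEA.841672?format=textfile>; '
    'rel="item"; type="text/tab-separated-values"'
)

links = parse_link_header(sample)
print(links["item"])
```

For production use, a dedicated Link header parser from an HTTP client library is likely more robust than a regular expression.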

== HTTP Content Negotiation ==
HTTP content negotiation allows a client to request a specific representation of a resource by specifying a MIME type in the <code>Accept</code> header. All PANGAEA dataset landing pages support this mechanism, so both data and metadata can be downloaded programmatically by querying the DOI directly with the appropriate content type — no PANGAEA-specific URL construction is required.

=== Downloading the Data File ===
The tabular data file for a dataset can be downloaded as tab-delimited text:
<code>curl -OJLf 'https://doi.pangaea.de/10.1594/PANGAEA.841672?format=textfile'</code>
Equivalently, using content negotiation directly against the DOI:
<code>curl -OJLf -H 'Accept: text/tab-separated-values' https://doi.org/10.1594/PANGAEA.841672</code>
The <code>-OJLf</code> flags instruct curl to save the file under the server-provided, meaningful filename (<code>-O</code>, <code>-J</code>), to follow redirects when resolving the DOI (<code>-L</code>), and to exit with a non-zero status on HTTP errors (<code>-f</code>), which is easier to handle in scripts than an HTML error page.
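The same content negotiation request can be prepared from Python using only the standard library. This is a sketch under assumptions: the helper name <code>negotiate</code> is hypothetical, and the network call is left commented out so that the example runs offline.

```python
import urllib.request


def negotiate(doi, mime_type):
    """Build a GET request for a DOI that asks for one specific representation."""
    request = urllib.request.Request(f"https://doi.org/{doi}")
    request.add_header("Accept", mime_type)
    return request


request = negotiate("10.1594/PANGAEA.841672", "text/tab-separated-values")
print(request.full_url, request.get_header("Accept"))

# Sending the request needs network access; urlopen follows the DOI redirects.
# Decoding as UTF-8 is an assumption about the response encoding:
# with urllib.request.urlopen(request) as response:
#     text = response.read().decode("utf-8")
```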

=== Retrieving Metadata in a Specific Format ===
The same mechanism applies to metadata. To retrieve a dataset's metadata in ISO 19139/19115 format:
<code>curl -L -H 'Accept: application/vnd.iso19139.metadata+xml' https://doi.org/10.1594/PANGAEA.841672</code>
The following MIME types are currently supported for metadata retrieval via content negotiation:
{| class="wikitable"
!Format
!MIME type
|-
|PANGAEA internal XML
|<code>application/vnd.pangaea.metadata+xml</code>
|-
|DataCite XML (v4)
|<code>application/vnd.datacite.datacite+xml</code>
|-
|ISO 19139 / 19115
|<code>application/vnd.iso19139.metadata+xml</code>
|-
|NASA DIF
|<code>application/vnd.nasa.dif-metadata+xml</code>
|-
|JSON-LD (Schema.org)
|<code>application/ld+json</code>
|-
|BibTeX
|<code>application/x-bibtex</code>
|-
|RIS
|<code>application/x-research-info-systems</code>
|-
|Plain text citation
|<code>text/x-bibliography</code>
|-
|Tab-separated data
|<code>text/tab-separated-values</code>
|}
Alternatively, the same formats can be requested using explicit URL parameters:
* <code>https://doi.pangaea.de/10.1594/PANGAEA.841672?format=metadata_datacite4</code>
* <code>https://doi.pangaea.de/10.1594/PANGAEA.841672?format=metadata_iso19139</code>
* <code>https://doi.pangaea.de/10.1594/PANGAEA.841672?format=metadata_jsonld</code>
* <code>https://doi.pangaea.de/10.1594/PANGAEA.841672?format=citation_bibtex</code>
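Scripts that prefer explicit URLs over <code>Accept</code> headers can assemble them with a small helper. The sketch below is illustrative: the function name <code>format_url</code> is hypothetical, and the set of format names contains only those that appear in the example URLs of this article, so it may be incomplete.

```python
from urllib.parse import urlencode

# Format names taken from the example URLs in this article (possibly incomplete)
KNOWN_FORMATS = {
    "textfile",
    "metadata_datacite4",
    "metadata_iso19139",
    "metadata_jsonld",
    "citation_bibtex",
}


def format_url(doi, fmt):
    """Return the landing page URL for a DOI with an explicit format parameter."""
    if fmt not in KNOWN_FORMATS:
        raise ValueError(f"format not listed in this sketch: {fmt}")
    return f"https://doi.pangaea.de/{doi}?{urlencode({'format': fmt})}"


print(format_url("10.1594/PANGAEA.841672", "metadata_jsonld"))
```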

== Access to Restricted Datasets ==
A small fraction of PANGAEA datasets are under an active moratorium: access to the data is temporarily restricted while the metadata remains publicly visible. Access to your own moratorium datasets requires authentication with a '''bearer token''', a standard mechanism for HTTP API authorization (RFC 6750). The token does not grant access to protected datasets of other authors.

To obtain your bearer token, log in to PANGAEA. Login is supported both with a PANGAEA username and password, and via ORCID iD. After logging in, the [https://pangaea.de/user/ user profile page] displays the current session token under "Your temporary login token".

'''Please note:''' The bearer token is private and should be treated like a password — it must not be shared with others or included in publicly accessible scripts or repositories. If a token has been accidentally disclosed, the user profile page at https://pangaea.de/user/ provides a "Log out from all devices" option, which immediately invalidates all active tokens. When using content negotiation via the DOI resolver at https://doi.org/, it is safe to include the Authorization header: PANGAEA trusts the DOI Foundation's infrastructure, and the request is redirected to PANGAEA's own servers before any protected content is served.

The token can be passed as an <code>Authorization: Bearer</code> header in any HTTP request:
<code>curl -OJLf -H 'Authorization: Bearer <your-token>' -H 'Accept: text/tab-separated-values' https://doi.org/10.1594/PANGAEA.841672</code>
If a request to a restricted dataset is made without a token, the server responds with <code>401 Bearer Realm</code>, signaling that authentication is required. Recent versions of curl can also pass the token with a dedicated option (<code>--oauth2-bearer</code>). The command then becomes:
<code>curl -OJLf --oauth2-bearer '<your-token>' -H 'Accept: text/tab-separated-values' https://doi.org/10.1594/PANGAEA.841672</code>
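Because the token must not appear in shared scripts, one option is to read it from the environment at run time. The following sketch only builds the request and does not send it; the variable name <code>PANGAEA_TOKEN</code> and the helper <code>authorized_request</code> are hypothetical choices for this example, not part of any PANGAEA tooling.

```python
import os
import urllib.request


def authorized_request(doi, mime_type, token):
    """Build a content negotiation request that carries a bearer token."""
    request = urllib.request.Request(f"https://doi.org/{doi}")
    request.add_header("Accept", mime_type)
    request.add_header("Authorization", f"Bearer {token}")
    return request


# PANGAEA_TOKEN is a hypothetical variable name; the token never enters the script
token = os.environ.get("PANGAEA_TOKEN", "")
if token:
    request = authorized_request(
        "10.1594/PANGAEA.841672", "text/tab-separated-values", token
    )
    print("request prepared for", request.full_url)
else:
    print("PANGAEA_TOKEN is not set; no request built")
```

Keeping the token in an environment variable or a local, untracked configuration file means the script itself can be published without exposing credentials.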

== OAI-PMH Metadata Harvesting ==
For bulk metadata harvesting, PANGAEA provides an OAI-PMH endpoint at:
<code>https://ws.pangaea.de/oai/</code>
OAI-PMH allows systematic, incremental harvesting of all PANGAEA metadata in supported formats. The following metadata standards are available via OAI-PMH: Dublin Core, DataCite v3 and v4, ISO 19139, and DIF (Directory Interchange Format). This endpoint is used by the portals and registries listed in the discovery section above, and is equally available to any user who wishes to build their own index or integrate PANGAEA metadata into an institutional system.
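An incremental harvest is a sequence of <code>ListRecords</code> requests, where each follow-up request passes the resumption token returned by the previous response. The sketch below only constructs the request URLs, using the standard OAI-PMH arguments <code>verb</code>, <code>metadataPrefix</code>, <code>from</code>, and <code>resumptionToken</code>. The helper name is hypothetical, and it is an assumption that queries are sent to exactly the URL given above rather than to a sub-path; <code>oai_dc</code> is the Dublin Core prefix defined by the protocol itself.

```python
from urllib.parse import urlencode

OAI_ENDPOINT = "https://ws.pangaea.de/oai/"  # as given above; exact request path is an assumption


def list_records_url(metadata_prefix, from_date=None, resumption_token=None):
    """Build an OAI-PMH ListRecords URL for a first or a follow-up request."""
    if resumption_token:
        # Follow-up requests carry only the verb and the resumption token
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            # Restrict the harvest to records changed since this date
            params["from"] = from_date
    return f"{OAI_ENDPOINT}?{urlencode(params)}"


print(list_records_url("oai_dc", from_date="2026-01-01"))
```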

== Programmatic Access with Python: pangaeapy ==
'''pangaeapy''' is the official Python client library for PANGAEA, developed and maintained by the PANGAEA team. It provides a high-level interface for loading and analyzing PANGAEA datasets directly into native Python data structures, without requiring manual HTTP requests or format parsing.

* PyPI package: https://pypi.org/project/pangaeapy/
* Source code: https://github.com/pangaea-data-publisher/pangaeapy
* Introduction slides: https://github.com/pangaea-data-publisher/community-workshop-material/blob/master/Introduction_to_pangaeapy.pdf

=== Installation ===
<code>pip install pangaeapy</code>

=== Loading a Dataset ===
The central object in pangaeapy is <code>PanDataSet</code>, which takes a DOI or PANGAEA dataset ID and retrieves both the data and the associated metadata:
<code>from pangaeapy import PanDataSet

ds = PanDataSet('10.1594/PANGAEA.841672')

# Access the data as a pandas DataFrame
print(ds.data.head())

# Access dataset metadata
print(ds.title)
print(ds.authors)
print(ds.parameters) # list of measured parameters with units and methods</code>
The <code>data</code> attribute is a pandas DataFrame in which each column corresponds to a measured parameter, and each row to a measurement. Parameter metadata — including units, standard names, and method descriptions — is accessible through the <code>parameters</code> attribute.

=== Searching PANGAEA from Python ===
pangaeapy also provides a <code>PanQuery</code> class for querying the PANGAEA search engine programmatically. Note that the underlying search API used by pangaeapy is currently internal and undocumented; an official, publicly documented Search REST API is under development. Until that is available, pangaeapy offers the most straightforward path to programmatic search for Python users:
<code>from pangaeapy import PanQuery

q = PanQuery('temperature salinity Atlantic', limit=10)
for result in q.results:
    print(result['doi'], result['title'])</code>

=== Access to Restricted Datasets ===
A bearer token obtained from the PANGAEA user profile can be passed to <code>PanDataSet</code> to access your own datasets that are under moratorium; it does not grant access to protected datasets of other authors:
<code>ds = PanDataSet('10.1594/PANGAEA.841672', token='<your-bearer-token>')</code>

=== Further Training Materials ===
Jupyter notebooks with worked examples for data discovery, loading, and analysis with pangaeapy are available in the PANGAEA Community Workshop GitHub repository at https://github.com/pangaea-data-publisher/community-workshop-material (see the <code>Python/</code> directory).

== Programmatic Access with R: pangaear ==
'''pangaear''' is a community-developed R client for PANGAEA, maintained as part of the rOpenSci ecosystem. It provides equivalent functionality to pangaeapy for R users.

* CRAN: https://cran.r-project.org/package=pangaear
* GitHub: https://github.com/ropensci/pangaear
* Introduction slides: https://github.com/pangaea-data-publisher/community-workshop-material/blob/master/Intro_PangaeaR.pdf

=== Installation ===
<code>install.packages("pangaear")
# or the development version:
# devtools::install_github("ropensci/pangaear")</code>

=== Loading a Dataset ===
<code>library(pangaear)

# Download and load a dataset by DOI
ds <- pg_data(doi = '10.1594/PANGAEA.841672')

# Access the data as a data frame
head(ds[[1]]$data)

# Metadata is accessible from the list object
ds[[1]]$metadata</code>

=== Searching PANGAEA from R ===
<code>res <- pg_search(query = 'temperature salinity Atlantic', count = 10)
print(res$doi)</code>
R scripts and worked examples are available in the <code>R/</code> directory of the PANGAEA Community Workshop GitHub repository.

== The PANGAEA Data Warehouse ==
The PANGAEA data warehouse (based on ClickHouse) provides a complementary access path for users who need to compile and aggregate data across large numbers of datasets, potentially hundreds of thousands of individual publications, without downloading and merging files manually.

The data warehouse supports spatially and chronologically constrained queries at the parameter level, returning data together with the DOI of each contributing dataset to maintain full provenance traceability. It is accessible in two ways:

'''Through the PANGAEA website:''' The data warehouse interface is integrated into the search results page. After performing a search, users can select "Data Warehouse" to configure and download a parameter-level aggregation from all datasets in the result set.

Documentation: https://wiki.pangaea.de/wiki/Data_warehouse

'''Programmatically via REST API:''' The data warehouse is also accessible through the PANGAEA web services at https://ws.pangaea.de/. This allows automated, scripted aggregations and is supported by both pangaeapy and pangaear.

Note that data warehouse exports represent compiled data products and do not replace the need to consult individual dataset landing pages to assess the fitness of individual studies for a specific scientific application. Every export includes the DOI name for each data point, ensuring that citation and provenance are preserved.

== Summary of Access Methods ==
{| class="wikitable"
!Use case
!Method
!Entry point
|-
|Interactive discovery
|PANGAEA search
|https://www.pangaea.de/
|-
|Single dataset access
|Browser / landing page
|https://doi.pangaea.de/10.1594/PANGAEA.xxxxx
|-
|Programmatic data download
|HTTP content negotiation
|<code>curl -H 'Accept: text/tab-separated-values' https://doi.org/...</code>
|-
|Programmatic metadata download
|HTTP content negotiation
|<code>curl -H 'Accept: application/ld+json' https://doi.org/...</code>
|-
|Link and format discovery
|HTTP HEAD / Signposting
|<code>curl -I https://doi.pangaea.de/...</code>
|-
|Restricted dataset access (to your own publications)
|Bearer token authentication
|retrievable via https://pangaea.de/user/
|-
|Bulk metadata harvesting
|OAI-PMH
|https://ws.pangaea.de/oai/
|-
|Scripted data access (Python)
|pangaeapy
|https://pypi.org/project/pangaeapy/
|-
|Scripted data access (R)
|pangaear
|https://cran.r-project.org/package=pangaear
|-
|Cross-dataset aggregation
|Data warehouse
|https://wiki.pangaea.de/wiki/Data_warehouse
|-
|Discovery across repositories
|DataCite Search API
|https://support.datacite.org/docs/api
|}
== Further Resources ==

* [[PANGAEA search]] — documentation on search syntax and faceted filters
* [[Data warehouse]] — data warehouse documentation
* [[PANGAEA Community Workshops]] — hands-on training workshops on finding and using PANGAEA data
* [https://github.com/pangaea-data-publisher/community-workshop-material PANGAEA Community Workshop GitHub] — Jupyter notebooks, R scripts, and slide decks
* [https://pypi.org/project/pangaeapy/ pangaeapy on PyPI] — Python client
* [https://cran.r-project.org/package=pangaear pangaear on CRAN] — R client
* [https://ws.pangaea.de/oai/ PANGAEA web services] — REST and OAI-PMH endpoints
* [https://signposting.org/ Signposting standard] — link discovery for FAIR data

Data Access and Reuse

2026-04-22T18:25:46Z

Uschindler: /* Downloading the Data File */

This article describes the different methods available for discovering, accessing, and reusing data published in PANGAEA. It is intended for researchers and data practitioners who want to interact with PANGAEA data beyond the standard web interface, including scripted and automated workflows. The methods range from interactive web-based search to fully programmatic access via HTTP content negotiation and dedicated client libraries.

Hands-on training materials covering many of the topics below are available in the [[PANGAEA Community Workshops|PANGAEA Community Workshop Series]], including Jupyter notebooks, R scripts, and slide decks in the [https://github.com/pangaea-data-publisher/community-workshop-material PANGAEA community workshop GitHub repository].

== Overview ==
Every dataset published in PANGAEA is assigned a globally unique and persistent Digital Object Identifier (DOI). The DOI is the single entry point for all forms of programmatic access: it resolves to a dataset landing page that, in addition to its human-readable representation, exposes all available metadata formats and data download options through standard HTTP mechanisms. There is no separate PANGAEA data API — all data and metadata access is built on top of the DOI and its landing page using standard web protocols. This design makes PANGAEA data directly accessible without any vendor-specific API key or proprietary client, while remaining fully compatible with FAIR principles.

== Web-Based Search and Discovery ==

=== PANGAEA Search Interface ===
The primary discovery tool for PANGAEA data is the PANGAEA search engine at https://www.pangaea.de/, based on Elasticsearch. It supports full-text search and faceted filtering across all published metadata. Faceted navigation allows users to constrain results by topic, device type, geographic region, and temporal coverage. Documentation on search functionality and syntax is available in the Wiki at [[PANGAEA search]].

=== External Portals and Registries ===
PANGAEA metadata is harvested by a large number of disciplinary and generic portals via OAI-PMH, making datasets discoverable beyond the PANGAEA website through services such as Google Dataset Search, OpenAIRE, DataCite Commons, DataONE, GBIF, EMODnet, GFBio, and others. PANGAEA is registered in re3data.org, FAIRsharing.org, RIsources, and the EOSC Marketplace.

An alternative for search when a PANGAEA-specific search interface is not required is the '''DataCite Search API''' (https://support.datacite.org/docs/api), which searches across a large inventory of research data from many repositories. It returns summary metadata including DOI names, and subsequent data access from any PANGAEA result can then proceed via content negotiation as described below.

== DOI Landing Pages and Link Discovery ==

=== The Landing Page as Access Hub ===
Each PANGAEA dataset is represented by a landing page accessible at its DOI. For example:
<code>https://doi.pangaea.de/10.1594/PANGAEA.841672</code>
The landing page serves a dual function: it presents dataset metadata and a data preview in human-readable form, and it simultaneously exposes all available machine-readable representations of the same resource through standard HTTP link headers and HTML <code><link></code> elements. This implementation follows the '''Signposting standard''' (https://signposting.org/), which allows any HTTP client to discover all alternate representations of a dataset — metadata in various formats, the data file itself, and the ORCID iDs of the authors — without any prior knowledge of PANGAEA-specific URL patterns.

=== Discovering Available Representations ===
A simple HTTP HEAD request to the landing page returns the full set of typed link relations in the response header. The following example illustrates this using <code>curl</code>:
<code>curl -LI https://doi.org/10.1594/PANGAEA.841672</code>
The response <code>Link:</code> header will contain relations of the following types:

* '''<code>cite-as</code>''' — the canonical DOI citation URL
* '''<code>describedby</code>''' — links to metadata in various formats (ISO 19139, DataCite XML, PANGAEA XML, BibTeX, RIS, JSON-LD)
* '''<code>item</code>''' — links to the data itself (tab-delimited text, HTML view)
* '''<code>author</code>''' — ORCID iDs of the dataset authors

This mechanism allows scripts, harvesters, and other machine clients to discover and retrieve any representation of a dataset from its DOI alone.

== HTTP Content Negotiation ==
HTTP content negotiation allows a client to request a specific representation of a resource by specifying a MIME type in the <code>Accept</code> header. All PANGAEA dataset landing pages support this mechanism, so both data and metadata can be downloaded programmatically by querying the DOI directly with the appropriate content type — no PANGAEA-specific URL construction is required.

=== Downloading the Data File ===
The tabular data file for a dataset can be downloaded as tab-delimited text:
<code>curl -OJLf 'https://doi.pangaea.de/10.1594/PANGAEA.841672?format=textfile'</code>
Equivalently, using content negotiation directly against the DOI:
<code>curl -OJLf -H 'Accept: text/tab-separated-values' https://doi.org/10.1594/PANGAEA.841672</code>
The <code>-OJLf</code> flags instruct curl to save the file under the server-provided, meaningful filename (<code>-O</code>, <code>-J</code>), to follow redirects when resolving the DOI (<code>-L</code>), and to exit with a non-zero status on HTTP errors (<code>-f</code>), which is easier to handle in scripts than an HTML error page.

=== Retrieving Metadata in a Specific Format ===
The same mechanism applies to metadata. To retrieve a dataset's metadata in ISO 19139/19115 format:
<code>curl -L -H 'Accept: application/vnd.iso19139.metadata+xml' https://doi.org/10.1594/PANGAEA.841672</code>
The following MIME types are currently supported for metadata retrieval via content negotiation:
{| class="wikitable"
!Format
!MIME type
|-
|PANGAEA internal XML
|<code>application/vnd.pangaea.metadata+xml</code>
|-
|DataCite XML (v4)
|<code>application/vnd.datacite.datacite+xml</code>
|-
|ISO 19139 / 19115
|<code>application/vnd.iso19139.metadata+xml</code>
|-
|NASA DIF
|<code>application/vnd.nasa.dif-metadata+xml</code>
|-
|JSON-LD (Schema.org)
|<code>application/ld+json</code>
|-
|BibTeX
|<code>application/x-bibtex</code>
|-
|RIS
|<code>application/x-research-info-systems</code>
|-
|Plain text citation
|<code>text/x-bibliography</code>
|-
|Tab-separated data
|<code>text/tab-separated-values</code>
|}
Alternatively, the same formats can be requested using explicit URL parameters:
https://doi.pangaea.de/10.1594/PANGAEA.841672?format=metadata_datacite4
https://doi.pangaea.de/10.1594/PANGAEA.841672?format=metadata_iso19139
https://doi.pangaea.de/10.1594/PANGAEA.841672?format=metadata_jsonld
https://doi.pangaea.de/10.1594/PANGAEA.841672?format=citation_bibtex

== Access to Restricted Datasets ==
A small fraction of PANGAEA datasets are under an active moratorium — access to the data is temporarily restricted while the metadata remains publicly visible. Access to your personal moratorium datasets requires authentication using a '''bearer token''', which is the standard mechanism for API authorization (RFC 6750). Note that access to protected datasets of other authors is, of course, still not possible.

To obtain your bearer token, log in to PANGAEA. Login is supported both with a PANGAEA username and password, and via ORCID iD. After logging in, the [https://pangaea.de/user/ user profile page] displays the current session token under "Your temporary login token".

'''Please note:''' The bearer token is private and should be treated like a password — it must not be shared with others or included in publicly accessible scripts or repositories. If a token has been accidentally disclosed, the user profile page at https://pangaea.de/user/ provides a "Log out from all devices" option, which immediately invalidates all active tokens. When using content negotiation via the DOI resolver at https://doi.org/, it is safe to include the Authorization header: PANGAEA trusts the DOI Foundation's infrastructure, and the request is redirected to PANGAEA's own servers before any protected content is served.

The token can be passed as an <code>Authorization: Bearer</code> header in any HTTP request:
<code>curl -OJLf -H 'Authorization: Bearer <your-token>' -H 'Accept: text/tab-separated-values' https://doi.org/10.1594/PANGAEA.841672</code>
If a request is made without a token to a restricted dataset, the server responds with HTTP status <code>401</code> and a <code>Bearer</code> authentication challenge, signaling that authentication is required. Recent versions of curl can also supply the token with a dedicated flag (<code>--oauth2-bearer</code>). The equivalent command is:
<code>curl -OJLf --oauth2-bearer '<your-token>' -H 'Accept: text/tab-separated-values' https://doi.org/10.1594/PANGAEA.841672</code>

== OAI-PMH Metadata Harvesting ==
For bulk metadata harvesting, PANGAEA provides an OAI-PMH endpoint at:
<code>https://ws.pangaea.de/oai/</code>
OAI-PMH allows systematic, incremental harvesting of all PANGAEA metadata in supported formats. The following metadata standards are available via OAI-PMH: Dublin Core, DataCite v3 and v4, ISO 19139, and DIF (Directory Interchange Format). This endpoint is used by the portals and registries listed in the discovery section above, and is equally available to any user who wishes to build their own index or integrate PANGAEA metadata into an institutional system.

== Programmatic Access with Python: pangaeapy ==
'''pangaeapy''' is the official Python client library for PANGAEA, developed and maintained by the PANGAEA team. It provides a high-level interface for loading and analyzing PANGAEA datasets directly into native Python data structures, without requiring manual HTTP requests or format parsing.

* PyPI package: https://pypi.org/project/pangaeapy/
* Source code: https://github.com/pangaea-data-publisher/pangaeapy
* Introduction slides: https://github.com/pangaea-data-publisher/community-workshop-material/blob/master/Introduction_to_pangaeapy.pdf

=== Installation ===
<code>pip install pangaeapy</code>

=== Loading a Dataset ===
The central object in pangaeapy is <code>PanDataSet</code>, which takes a DOI or PANGAEA dataset ID and retrieves both the data and the associated metadata:
<code>from pangaeapy import PanDataSet

ds = PanDataSet('10.1594/PANGAEA.841672')

# Access the data as a pandas DataFrame
print(ds.data.head())

# Access dataset metadata
print(ds.title)
print(ds.authors)
print(ds.parameters) # list of measured parameters with units and methods</code>
The <code>data</code> attribute is a pandas DataFrame in which each column corresponds to a measured parameter, and each row to a measurement. Parameter metadata — including units, standard names, and method descriptions — is accessible through the <code>parameters</code> attribute.

=== Searching PANGAEA from Python ===
pangaeapy also provides a <code>PanQuery</code> class for querying the PANGAEA search engine programmatically. Note that the underlying search API used by pangaeapy is currently internal and undocumented; an official, publicly documented Search REST API is under development. Until that is available, pangaeapy offers the most straightforward path to programmatic search for Python users:
<code>from pangaeapy import PanQuery

q = PanQuery('temperature salinity Atlantic', limit=10)
for result in q.results:
    print(result['doi'], result['title'])</code>

=== Access to Restricted Datasets ===
A bearer token obtained from the PANGAEA user profile can be passed to <code>PanDataSet</code> to enable access to your own datasets that are under moratorium (datasets of other authors remain inaccessible):
<code>ds = PanDataSet('10.1594/PANGAEA.841672', auth_token='<your-bearer-token>')</code>

=== Further Training Materials ===
Jupyter notebooks with worked examples for data discovery, loading, and analysis with pangaeapy are available in the PANGAEA Community Workshop GitHub repository at https://github.com/pangaea-data-publisher/community-workshop-material (see the <code>Python/</code> directory).

== Programmatic Access with R: pangaear ==
'''pangaear''' is a community-developed R client for PANGAEA, maintained as part of the rOpenSci ecosystem. It provides equivalent functionality to pangaeapy for R users.

* CRAN: https://cran.r-project.org/package=pangaear
* GitHub: https://github.com/ropensci/pangaear
* Introduction slides: https://github.com/pangaea-data-publisher/community-workshop-material/blob/master/Intro_PangaeaR.pdf

=== Installation ===
<code>install.packages("pangaear")
# or the development version:
# devtools::install_github("ropensci/pangaear")</code>

=== Loading a Dataset ===
<code>library(pangaear)

# Download and load a dataset by DOI
ds <- pg_data(doi = '10.1594/PANGAEA.841672')

# Access the data as a data frame
head(ds[[1]]$data)

# Metadata is accessible from the list object
ds[[1]]$metadata</code>

=== Searching PANGAEA from R ===
<code>res <- pg_search(query = 'temperature salinity Atlantic', count = 10)
print(res$doi)</code>
R scripts and worked examples are available in the <code>R/</code> directory of the PANGAEA Community Workshop GitHub repository.

== The PANGAEA Data Warehouse ==
The PANGAEA data warehouse (based on ClickHouse) provides a complementary access path for users who need to compile and aggregate data across large numbers of datasets, potentially hundreds of thousands of individual publications, without having to download and merge files manually.

The data warehouse supports spatially and chronologically constrained queries at the parameter level, returning data together with the DOI of each contributing dataset to maintain full provenance traceability. It is accessible in two ways:

'''Through the PANGAEA website:''' The data warehouse interface is integrated into the search results page. After performing a search, users can select "Data Warehouse" to configure and download a parameter-level aggregation from all datasets in the result set.

Documentation: https://wiki.pangaea.de/wiki/Data_warehouse

'''Programmatically via REST API:''' The data warehouse is also accessible through the PANGAEA web services at https://ws.pangaea.de/. This allows automated, scripted aggregations and is supported by both pangaeapy and pangaear.

Note that data warehouse exports represent compiled data products and do not replace the need to consult individual dataset landing pages to assess the fitness of individual studies for a specific scientific application. Every export includes the DOI name for each data point, ensuring that citation and provenance are preserved.

== Summary of Access Methods ==
{| class="wikitable"
!Use case
!Method
!Entry point
|-
|Interactive discovery
|PANGAEA search
|https://www.pangaea.de/
|-
|Single dataset access
|Browser / landing page
|https://doi.pangaea.de/10.1594/PANGAEA.xxxxx
|-
|Programmatic data download
|HTTP content negotiation
|<code>curl -H 'Accept: text/tab-separated-values' https://doi.org/...</code>
|-
|Programmatic metadata download
|HTTP content negotiation
|<code>curl -H 'Accept: application/ld+json' https://doi.org/...</code>
|-
|Link and format discovery
|HTTP HEAD / Signposting
|<code>curl -I https://doi.pangaea.de/...</code>
|-
|Restricted dataset access (to your own publications)
|Bearer token authentication
|retrievable via https://pangaea.de/user/
|-
|Bulk metadata harvesting
|OAI-PMH
|https://ws.pangaea.de/oai/
|-
|Scripted data access (Python)
|pangaeapy
|https://pypi.org/project/pangaeapy/
|-
|Scripted data access (R)
|pangaear
|https://cran.r-project.org/package=pangaear
|-
|Cross-dataset aggregation
|Data warehouse
|https://wiki.pangaea.de/wiki/Data_warehouse
|-
|Discovery across repositories
|DataCite Search API
|https://support.datacite.org/docs/api
|}
== Further Resources ==

* [[PANGAEA search]] — documentation on search syntax and faceted filters
* [[Data warehouse]] — data warehouse documentation
* [[PANGAEA Community Workshops]] — hands-on training workshops on finding and using PANGAEA data
* [https://github.com/pangaea-data-publisher/community-workshop-material PANGAEA Community Workshop GitHub] — Jupyter notebooks, R scripts, and slide decks
* [https://pypi.org/project/pangaeapy/ pangaeapy on PyPI] — Python client
* [https://cran.r-project.org/package=pangaear pangaear on CRAN] — R client
* [https://ws.pangaea.de/oai/ PANGAEA web services] — REST and OAI-PMH endpoints
* [https://signposting.org/ Signposting standard] — link discovery for FAIR data

Data Access and Reuse

2026-04-22T18:24:46Z

Uschindler: /* Downloading the Data File */

This article describes the different methods available for discovering, accessing, and reusing data published in PANGAEA. It is intended for researchers and data practitioners who want to interact with PANGAEA data beyond the standard web interface, including scripted and automated workflows. The methods range from interactive web-based search to fully programmatic access via HTTP content negotiation and dedicated client libraries.

Hands-on training materials covering many of the topics below are available in the [[PANGAEA Community Workshops|PANGAEA Community Workshop Series]], including Jupyter notebooks, R scripts, and slide decks in the [https://github.com/pangaea-data-publisher/community-workshop-material PANGAEA community workshop GitHub repository].

== Overview ==
Every dataset published in PANGAEA is assigned a globally unique and persistent Digital Object Identifier (DOI). The DOI is the single entry point for all forms of programmatic access: it resolves to a dataset landing page that, in addition to its human-readable representation, exposes all available metadata formats and data download options through standard HTTP mechanisms. There is no separate PANGAEA data API — all data and metadata access is built on top of the DOI and its landing page using standard web protocols. This design makes PANGAEA data directly accessible without any vendor-specific API key or proprietary client, while remaining fully compatible with FAIR principles.

== Web-Based Search and Discovery ==

=== PANGAEA Search Interface ===
The primary discovery tool for PANGAEA data is the PANGAEA search engine at https://www.pangaea.de/, based on Elasticsearch. It supports full-text search and faceted filtering across all published metadata. Faceted navigation allows users to constrain results by topic, device type, geographic region, and temporal coverage. Documentation on search functionality and syntax is available in the Wiki at [[PANGAEA search]].

=== External Portals and Registries ===
PANGAEA metadata is harvested by a large number of disciplinary and generic portals via OAI-PMH, making datasets discoverable beyond the PANGAEA website through services such as Google Dataset Search, OpenAIRE, DataCite Commons, DataONE, GBIF, EMODnet, GFBio, and others. PANGAEA is registered in re3data.org, FAIRsharing.org, RIsources, and the EOSC Marketplace.

When a PANGAEA-specific search interface is not required, the '''DataCite Search API''' (https://support.datacite.org/docs/api) is an alternative: it searches a large inventory of research data from many repositories and returns summary metadata including DOI names. Data access for any PANGAEA result can then proceed via content negotiation as described below.
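The following Python sketch shows how such a query could be assembled and its result reduced to DOI names. It is an illustration under assumptions, not part of the PANGAEA documentation: the endpoint <code>https://api.datacite.org/dois</code>, the <code>query</code>, <code>prefix</code> and <code>page[size]</code> parameters, and the response layout (<code>data[].id</code>, <code>data[].attributes.titles</code>) should be checked against the DataCite API documentation. The sample payload is invented for the demonstration.

```python
from urllib.parse import urlencode

# Assumed DataCite REST endpoint; verify against the DataCite API documentation.
DATACITE_API = "https://api.datacite.org/dois"


def datacite_query_url(query, prefix="10.1594", page_size=10):
    """Build a search URL restricted to one DOI prefix (10.1594 is the prefix
    used in the PANGAEA DOI examples of this article)."""
    params = {"query": query, "prefix": prefix, "page[size]": page_size}
    return DATACITE_API + "?" + urlencode(params)


def extract_dois(payload):
    """Return (doi, first title) pairs from a decoded JSON response."""
    results = []
    for item in payload.get("data", []):
        titles = item.get("attributes", {}).get("titles") or [{}]
        results.append((item["id"], titles[0].get("title")))
    return results


# Hypothetical response fragment, used only to exercise extract_dois offline.
SAMPLE_PAYLOAD = {
    "data": [
        {
            "id": "10.1594/pangaea.841672",
            "attributes": {"titles": [{"title": "Example title"}]},
        }
    ]
}

url = datacite_query_url("sea ice", page_size=5)
pairs = extract_dois(SAMPLE_PAYLOAD)
```

Each DOI returned this way can be passed unchanged to the content negotiation requests described later in this article.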

== DOI Landing Pages and Link Discovery ==

=== The Landing Page as Access Hub ===
Each PANGAEA dataset is represented by a landing page accessible at its DOI. For example:
<code>https://doi.pangaea.de/10.1594/PANGAEA.841672</code>
The landing page serves a dual function: it presents dataset metadata and a data preview in human-readable form, and it simultaneously exposes all available machine-readable representations of the same resource through standard HTTP link headers and HTML <code><link></code> elements. This implementation follows the '''Signposting standard''' (https://signposting.org/), which allows any HTTP client to discover all alternate representations of a dataset — metadata in various formats, the data file itself, and the ORCID iDs of the authors — without any prior knowledge of PANGAEA-specific URL patterns.

=== Discovering Available Representations ===
A simple HTTP HEAD request to the landing page returns the full set of typed link relations in the response header. The following example illustrates this using <code>curl</code>:
<code>curl -LI https://doi.org/10.1594/PANGAEA.841672</code>
The response <code>Link:</code> header will contain relations of the following types:

* '''<code>cite-as</code>''' — the canonical DOI citation URL
* '''<code>describedby</code>''' — links to metadata in various formats (ISO 19139, DataCite XML, PANGAEA XML, BibTeX, RIS, JSON-LD)
* '''<code>item</code>''' — links to the data itself (tab-delimited text, HTML view)
* '''<code>author</code>''' — ORCID iDs of the dataset authors

This mechanism allows scripts, harvesters, and other machine clients to discover and retrieve any representation of a dataset from its DOI alone.
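As a minimal sketch of how a script might consume these relations, the Python code below groups the targets of a <code>Link</code> header by relation type. The parser is deliberately simple and assumes the common <code><url>; rel="..."; type="..."</code> layout. The sample header is illustrative and not copied from a real PANGAEA response.

```python
import re
from collections import defaultdict

# One link-value: <target> followed by zero or more ;name=value parameters.
_LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*(?:"[^"]*"|[^;,\s]+))*)')
_PARAM_RE = re.compile(r';\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^;,\s]+))')


def parse_link_header(value):
    """Group link targets by relation type (cite-as, describedby, item, ...)."""
    links = defaultdict(list)
    for target, params in _LINK_RE.findall(value):
        attrs = {}
        for match in _PARAM_RE.finditer(params):
            name, quoted, bare = match.groups()
            attrs[name.lower()] = quoted if quoted is not None else bare
        # A rel attribute may list several space-separated relation types.
        for rel in attrs.get("rel", "").split():
            links[rel].append({"href": target, "type": attrs.get("type")})
    return dict(links)


# Illustrative header value (hypothetical, shortened).
SAMPLE_LINK_HEADER = (
    '<https://doi.org/10.1594/PANGAEA.841672>; rel="cite-as", '
    '<https://doi.pangaea.de/10.1594/PANGAEA.841672?format=citation_bibtex>; '
    'rel="describedby"; type="application/x-bibtex", '
    '<https://doi.pangaea.de/10.1594/PANGAEA.841672?format=textfile>; '
    'rel="item"; type="text/tab-separated-values"'
)

links = parse_link_header(SAMPLE_LINK_HEADER)
```

With a live response, the header value would be taken from the HEAD request instead of the sample string.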

== HTTP Content Negotiation ==
HTTP content negotiation allows a client to request a specific representation of a resource by specifying a MIME type in the <code>Accept</code> header. All PANGAEA dataset landing pages support this mechanism, so both data and metadata can be downloaded programmatically by querying the DOI directly with the appropriate content type — no PANGAEA-specific URL construction is required.
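The same idea can be sketched in Python with only the standard library. The helper names <code>negotiated_request</code> and <code>fetch</code> are hypothetical. The sketch relies on <code>urllib</code> following the DOI redirect and raising an exception on HTTP error statuses, which is roughly comparable to the <code>-L</code> and <code>-f</code> flags of curl used in the examples that follow.

```python
from urllib.request import Request, urlopen

DOI_RESOLVER = "https://doi.org/"


def negotiated_request(doi, mime_type):
    """Build a request for the DOI that asks for one specific representation."""
    return Request(DOI_RESOLVER + doi, headers={"Accept": mime_type})


def fetch(doi, mime_type, timeout=30):
    """Resolve the DOI and return the response body as bytes.

    urlopen follows redirects and raises urllib.error.HTTPError on 4xx/5xx.
    """
    with urlopen(negotiated_request(doi, mime_type), timeout=timeout) as response:
        return response.read()


# Building the request does not touch the network; fetch() would.
request = negotiated_request("10.1594/PANGAEA.841672", "application/ld+json")
```

Calling <code>fetch('10.1594/PANGAEA.841672', 'application/ld+json')</code> would then return the JSON-LD metadata, provided the server honours the requested type.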

=== Downloading the Data File ===
The tabular data file for a dataset can be downloaded as tab-delimited text:
<code>curl -OJLf 'https://doi.pangaea.de/10.1594/PANGAEA.841672?format=textfile'</code>
Equivalently, using content negotiation directly against the DOI:
<code>curl -OJLf -H 'Accept: text/tab-separated-values' https://doi.org/10.1594/PANGAEA.841672</code>
The <code>-OJLf</code> flags instruct curl to save the file using the server-provided (i.e. meaningful) filename (<code>-O</code>, <code>-J</code>), follow redirects (<code>-L</code>, to resolve the DOI), and fail with appropriate exit code (<code>-f</code>) on errors (easier to handle in scripts than the respective HTML errors).
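Once downloaded, the file can be read with standard tools. The sketch below assumes that the text file begins with a descriptive header enclosed in <code>/* ... */</code> followed by a tab-delimited table whose first row holds the column names. This layout is an assumption to verify against an actual download. The function name and the sample content are hypothetical.

```python
import csv
import io


def split_pangaea_text(text):
    """Separate an optional leading /* ... */ header from the data table.

    Returns (header_text, column_names, rows); all values stay strings.
    """
    header = ""
    body = text
    if text.lstrip().startswith("/*"):
        end = text.find("*/")
        if end != -1:
            header = text[text.find("/*") + 2:end].strip()
            body = text[end + 2:]
    reader = csv.reader(io.StringIO(body.lstrip("\r\n")), delimiter="\t")
    rows = [row for row in reader if row]  # skip empty lines
    if not rows:
        return header, [], []
    return header, rows[0], rows[1:]


# Invented sample content, used only to show the expected shape.
SAMPLE_TEXT = (
    "/* DATA DESCRIPTION:\n"
    "Citation:\tExample citation\n"
    "*/\n"
    "Depth water [m]\tTemp [deg C]\n"
    "5\t12.3\n"
    "10\t11.8\n"
)

header, columns, rows = split_pangaea_text(SAMPLE_TEXT)
```

Keeping the header text alongside the table preserves the citation and parameter descriptions that accompany the values.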

=== Retrieving Metadata in a Specific Format ===
The same mechanism applies to metadata. To retrieve a dataset's metadata in ISO 19139/19115 format:
<code>curl -L -H 'Accept: application/vnd.iso19139.metadata+xml' https://doi.org/10.1594/PANGAEA.841672</code>
The following MIME types are currently supported for metadata retrieval via content negotiation:
{| class="wikitable"
!Format
!MIME type
|-
|PANGAEA internal XML
|<code>application/vnd.pangaea.metadata+xml</code>
|-
|DataCite XML (v4)
|<code>application/vnd.datacite.datacite+xml</code>
|-
|ISO 19139 / 19115
|<code>application/vnd.iso19139.metadata+xml</code>
|-
|NASA DIF
|<code>application/vnd.nasa.dif-metadata+xml</code>
|-
|JSON-LD (Schema.org)
|<code>application/ld+json</code>
|-
|BibTeX
|<code>application/x-bibtex</code>
|-
|RIS
|<code>application/x-research-info-systems</code>
|-
|Plain text citation
|<code>text/x-bibliography</code>
|-
|Tab-separated data
|<code>text/tab-separated-values</code>
|}
Alternatively, the same formats can be requested using explicit URL parameters:
https://doi.pangaea.de/10.1594/PANGAEA.841672?format=metadata_datacite4
https://doi.pangaea.de/10.1594/PANGAEA.841672?format=metadata_iso19139
https://doi.pangaea.de/10.1594/PANGAEA.841672?format=metadata_jsonld
https://doi.pangaea.de/10.1594/PANGAEA.841672?format=citation_bibtex
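For scripts that prefer explicit URLs, a small helper can assemble them. The sketch below only accepts the <code>format</code> values that appear in this article. Other values may exist but are not assumed here, and the function name is hypothetical.

```python
from urllib.parse import urlencode

LANDING_PAGE_BASE = "https://doi.pangaea.de/"

# Only the format values shown in this article; the list is not exhaustive.
KNOWN_FORMATS = {
    "textfile",
    "metadata_datacite4",
    "metadata_iso19139",
    "metadata_jsonld",
    "citation_bibtex",
}


def format_url(doi, fmt):
    """Return the landing-page URL that requests one explicit format."""
    if fmt not in KNOWN_FORMATS:
        raise ValueError("format not listed in this article: " + fmt)
    return LANDING_PAGE_BASE + doi + "?" + urlencode({"format": fmt})


bibtex_url = format_url("10.1594/PANGAEA.841672", "citation_bibtex")
```

Explicit URLs are convenient for bookmarks and simple links, while content negotiation keeps scripts independent of PANGAEA-specific parameters.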

== Access to Restricted Datasets ==
A small fraction of PANGAEA datasets are under an active moratorium — access to the data is temporarily restricted while the metadata remains publicly visible. Access to your personal moratorium datasets requires authentication using a '''bearer token''', which is the standard mechanism for API authorization (RFC 6750). Note that access to protected datasets of other authors is, of course, still not possible.

To obtain your bearer token, log in to PANGAEA. Login is supported both with a PANGAEA username and password, and via ORCID iD. After logging in, the [https://pangaea.de/user/ user profile page] displays the current session token under "Your temporary login token".

'''Please note:''' The bearer token is private and should be treated like a password — it must not be shared with others or included in publicly accessible scripts or repositories. If a token has been accidentally disclosed, the user profile page at https://pangaea.de/user/ provides a "Log out from all devices" option, which immediately invalidates all active tokens. When using content negotiation via the DOI resolver at https://doi.org/, it is safe to include the Authorization header: PANGAEA trusts the DOI Foundation's infrastructure, and the request is redirected to PANGAEA's own servers before any protected content is served.

The token can be passed as an <code>Authorization: Bearer</code> header in any HTTP request:
<code>curl -OJLf -H 'Authorization: Bearer <your-token>' -H 'Accept: text/tab-separated-values' https://doi.org/10.1594/PANGAEA.841672</code>
If a request is made without a token to a restricted dataset, the server responds with HTTP status <code>401</code> and a <code>Bearer</code> authentication challenge, signaling that authentication is required. Recent versions of curl can also supply the token with a dedicated flag (<code>--oauth2-bearer</code>). The equivalent command is:
<code>curl -OJLf --oauth2-bearer '<your-token>' -H 'Accept: text/tab-separated-values' https://doi.org/10.1594/PANGAEA.841672</code>
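In a Python script, the token is best kept out of the source code. The sketch below reads it from an environment variable (the name <code>PANGAEA_TOKEN</code> is an arbitrary choice for this example) and turns a 401 response into a clearer error. It assumes that <code>urllib</code> carries the <code>Authorization</code> header along when following the DOI redirect. The helper names are hypothetical.

```python
import os
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def authorized_request(doi, mime_type, token):
    """Build a content-negotiation request that carries a bearer token."""
    if not token:
        raise ValueError("no bearer token provided")
    headers = {"Accept": mime_type, "Authorization": "Bearer " + token}
    return Request("https://doi.org/" + doi, headers=headers)


def download_restricted(doi, mime_type="text/tab-separated-values",
                        env_var="PANGAEA_TOKEN"):
    """Download a dataset using the token stored in an environment variable."""
    token = os.environ.get(env_var, "")
    try:
        with urlopen(authorized_request(doi, mime_type, token), timeout=30) as resp:
            return resp.read()
    except HTTPError as err:
        if err.code == 401:
            raise PermissionError("token missing, expired, or not valid") from err
        raise


# Offline check of the request that would be sent (placeholder token).
request = authorized_request(
    "10.1594/PANGAEA.841672", "text/tab-separated-values", "example-token"
)
```

Because the token is a temporary login token, a long-running job should be prepared to obtain a fresh one when a 401 response appears.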

== OAI-PMH Metadata Harvesting ==
For bulk metadata harvesting, PANGAEA provides an OAI-PMH endpoint at:
<code>https://ws.pangaea.de/oai/</code>
OAI-PMH allows systematic, incremental harvesting of all PANGAEA metadata in supported formats. The following metadata standards are available via OAI-PMH: Dublin Core, DataCite v3 and v4, ISO 19139, and DIF (Directory Interchange Format). This endpoint is used by the portals and registries listed in the discovery section above, and is equally available to any user who wishes to build their own index or integrate PANGAEA metadata into an institutional system.
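Incremental harvesting follows the generic OAI-PMH pattern: request <code>ListRecords</code>, then repeat the request with the returned resumption token until none is left. The sketch below shows only the URL construction and token extraction. It uses the endpoint address given above as the base URL, which is an assumption (the actual request URL may include a further path component), and <code>oai_dc</code>, the Dublin Core prefix defined by the OAI-PMH specification. The sample response is invented.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None,
                     resumption_token=None):
    """Build a ListRecords URL for the first page or for a continuation."""
    if resumption_token:
        # A continuation request carries the token as its only other argument.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            params["from"] = from_date
    return base_url + "?" + urlencode(params)


def next_token(response_xml):
    """Return the resumption token of a response, or None on the last page."""
    root = ET.fromstring(response_xml)
    node = root.find(".//oai:resumptionToken", OAI_NS)
    if node is None or not (node.text or "").strip():
        return None
    return node.text.strip()


# Invented response fragment for an offline demonstration.
SAMPLE_RESPONSE = (
    '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">'
    "<ListRecords><resumptionToken>abc123</resumptionToken></ListRecords>"
    "</OAI-PMH>"
)

first_url = list_records_url("https://ws.pangaea.de/oai/", from_date="2026-01-01")
token = next_token(SAMPLE_RESPONSE)
```

A harvester would loop, fetching <code>list_records_url(base, resumption_token=token)</code> until <code>next_token</code> returns <code>None</code>.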

== Programmatic Access with Python: pangaeapy ==
'''pangaeapy''' is the official Python client library for PANGAEA, developed and maintained by the PANGAEA team. It provides a high-level interface for loading and analyzing PANGAEA datasets directly into native Python data structures, without requiring manual HTTP requests or format parsing.

* PyPI package: https://pypi.org/project/pangaeapy/
* Source code: https://github.com/pangaea-data-publisher/pangaeapy
* Introduction slides: https://github.com/pangaea-data-publisher/community-workshop-material/blob/master/Introduction_to_pangaeapy.pdf

=== Installation ===
<code>pip install pangaeapy</code>

=== Loading a Dataset ===
The central object in pangaeapy is <code>PanDataSet</code>, which takes a DOI or PANGAEA dataset ID and retrieves both the data and the associated metadata:
<code>from pangaeapy import PanDataSet

ds = PanDataSet('10.1594/PANGAEA.841672')

# Access the data as a pandas DataFrame
print(ds.data.head())

# Access dataset metadata
print(ds.title)
print(ds.authors)
print(ds.parameters) # list of measured parameters with units and methods</code>
The <code>data</code> attribute is a pandas DataFrame in which each column corresponds to a measured parameter, and each row to a measurement. Parameter metadata — including units, standard names, and method descriptions — is accessible through the <code>parameters</code> attribute.

=== Searching PANGAEA from Python ===
pangaeapy also provides a <code>PanQuery</code> class for querying the PANGAEA search engine programmatically. Note that the underlying search API used by pangaeapy is currently internal and undocumented; an official, publicly documented Search REST API is under development. Until that is available, pangaeapy offers the most straightforward path to programmatic search for Python users:
<code>from pangaeapy import PanQuery

q = PanQuery('temperature salinity Atlantic', limit=10)
for result in q.results:
    print(result['doi'], result['title'])</code>

=== Access to Restricted Datasets ===
A bearer token obtained from the PANGAEA user profile can be passed to <code>PanDataSet</code> to enable access to your own datasets that are under moratorium (datasets of other authors remain inaccessible):
<code>ds = PanDataSet('10.1594/PANGAEA.841672', auth_token='<your-bearer-token>')</code>

=== Further Training Materials ===
Jupyter notebooks with worked examples for data discovery, loading, and analysis with pangaeapy are available in the PANGAEA Community Workshop GitHub repository at https://github.com/pangaea-data-publisher/community-workshop-material (see the <code>Python/</code> directory).

== Programmatic Access with R: pangaear ==
'''pangaear''' is a community-developed R client for PANGAEA, maintained as part of the rOpenSci ecosystem. It provides equivalent functionality to pangaeapy for R users.

* CRAN: https://cran.r-project.org/package=pangaear
* GitHub: https://github.com/ropensci/pangaear
* Introduction slides: https://github.com/pangaea-data-publisher/community-workshop-material/blob/master/Intro_PangaeaR.pdf

=== Installation ===
<code>install.packages("pangaear")
# or the development version:
# devtools::install_github("ropensci/pangaear")</code>

=== Loading a Dataset ===
<code>library(pangaear)

# Download and load a dataset by DOI
ds <- pg_data(doi = '10.1594/PANGAEA.841672')

# Access the data as a data frame
head(ds[[1]]$data)

# Metadata is accessible from the list object
ds[[1]]$metadata</code>

=== Searching PANGAEA from R ===
<code>res <- pg_search(query = 'temperature salinity Atlantic', count = 10)
print(res$doi)</code>
R scripts and worked examples are available in the <code>R/</code> directory of the PANGAEA Community Workshop GitHub repository.

== The PANGAEA Data Warehouse ==
The PANGAEA data warehouse (based on ClickHouse) provides a complementary access path for users who need to compile and aggregate data across large numbers of datasets, potentially hundreds of thousands of individual publications, without having to download and merge files manually.

The data warehouse supports spatially and chronologically constrained queries at the parameter level, returning data together with the DOI of each contributing dataset to maintain full provenance traceability. It is accessible in two ways:

'''Through the PANGAEA website:''' The data warehouse interface is integrated into the search results page. After performing a search, users can select "Data Warehouse" to configure and download a parameter-level aggregation from all datasets in the result set.

Documentation: https://wiki.pangaea.de/wiki/Data_warehouse

'''Programmatically via REST API:''' The data warehouse is also accessible through the PANGAEA web services at https://ws.pangaea.de/. This allows automated, scripted aggregations and is supported by both pangaeapy and pangaear.

Note that data warehouse exports represent compiled data products and do not replace the need to consult individual dataset landing pages to assess the fitness of individual studies for a specific scientific application. Every export includes the DOI name for each data point, ensuring that citation and provenance are preserved.

== Summary of Access Methods ==
{| class="wikitable"
!Use case
!Method
!Entry point
|-
|Interactive discovery
|PANGAEA search
|https://www.pangaea.de/
|-
|Single dataset access
|Browser / landing page
|https://doi.pangaea.de/10.1594/PANGAEA.xxxxx
|-
|Programmatic data download
|HTTP content negotiation
|<code>curl -H 'Accept: text/tab-separated-values' https://doi.org/...</code>
|-
|Programmatic metadata download
|HTTP content negotiation
|<code>curl -H 'Accept: application/ld+json' https://doi.org/...</code>
|-
|Link and format discovery
|HTTP HEAD / Signposting
|<code>curl -I https://doi.pangaea.de/...</code>
|-
|Restricted dataset access (to your own publications)
|Bearer token authentication
|retrievable via https://pangaea.de/user/
|-
|Bulk metadata harvesting
|OAI-PMH
|https://ws.pangaea.de/oai/
|-
|Scripted data access (Python)
|pangaeapy
|https://pypi.org/project/pangaeapy/
|-
|Scripted data access (R)
|pangaear
|https://cran.r-project.org/package=pangaear
|-
|Cross-dataset aggregation
|Data warehouse
|https://wiki.pangaea.de/wiki/Data_warehouse
|-
|Discovery across repositories
|DataCite Search API
|https://support.datacite.org/docs/api
|}
== Further Resources ==

* [[PANGAEA search]] — documentation on search syntax and faceted filters
* [[Data warehouse]] — data warehouse documentation
* [[PANGAEA Community Workshops]] — hands-on training workshops on finding and using PANGAEA data
* [https://github.com/pangaea-data-publisher/community-workshop-material PANGAEA Community Workshop GitHub] — Jupyter notebooks, R scripts, and slide decks
* [https://pypi.org/project/pangaeapy/ pangaeapy on PyPI] — Python client
* [https://cran.r-project.org/package=pangaear pangaear on CRAN] — R client
* [https://ws.pangaea.de/oai/ PANGAEA web services] — REST and OAI-PMH endpoints
* [https://signposting.org/ Signposting standard] — link discovery for FAIR data

Data Access and Reuse

2026-04-22T18:23:37Z

Uschindler: /* Downloading the Data File */

This article describes the different methods available for discovering, accessing, and reusing data published in PANGAEA. It is intended for researchers and data practitioners who want to interact with PANGAEA data beyond the standard web interface, including scripted and automated workflows. The methods range from interactive web-based search to fully programmatic access via HTTP content negotiation and dedicated client libraries.

Hands-on training materials covering many of the topics below are available in the [[PANGAEA Community Workshops|PANGAEA Community Workshop Series]], including Jupyter notebooks, R scripts, and slide decks in the [https://github.com/pangaea-data-publisher/community-workshop-material PANGAEA community workshop GitHub repository].

== Overview ==
Every dataset published in PANGAEA is assigned a globally unique and persistent Digital Object Identifier (DOI). The DOI is the single entry point for all forms of programmatic access: it resolves to a dataset landing page that, in addition to its human-readable representation, exposes all available metadata formats and data download options through standard HTTP mechanisms. There is no separate PANGAEA data API — all data and metadata access is built on top of the DOI and its landing page using standard web protocols. This design makes PANGAEA data directly accessible without any vendor-specific API key or proprietary client, while remaining fully compatible with FAIR principles.

== Web-Based Search and Discovery ==

=== PANGAEA Search Interface ===
The primary discovery tool for PANGAEA data is the PANGAEA search engine at https://www.pangaea.de/, based on Elasticsearch. It supports full-text search and faceted filtering across all published metadata. Faceted navigation allows users to constrain results by topic, device type, geographic region, and temporal coverage. Documentation on search functionality and syntax is available in the Wiki at [[PANGAEA search]].

=== External Portals and Registries ===
PANGAEA metadata is harvested by a large number of disciplinary and generic portals via OAI-PMH, making datasets discoverable beyond the PANGAEA website through services such as Google Dataset Search, OpenAIRE, DataCite Commons, DataONE, GBIF, EMODnet, GFBio, and others. PANGAEA is registered in re3data.org, FAIRsharing.org, RIsources, and the EOSC Marketplace.

When a PANGAEA-specific search interface is not required, the '''DataCite Search API''' (https://support.datacite.org/docs/api) is an alternative: it searches a large inventory of research data from many repositories and returns summary metadata including DOI names. Data access for any PANGAEA result can then proceed via content negotiation as described below.

== DOI Landing Pages and Link Discovery ==

=== The Landing Page as Access Hub ===
Each PANGAEA dataset is represented by a landing page accessible at its DOI. For example:
<code>https://doi.pangaea.de/10.1594/PANGAEA.841672</code>
The landing page serves a dual function: it presents dataset metadata and a data preview in human-readable form, and it simultaneously exposes all available machine-readable representations of the same resource through standard HTTP link headers and HTML <code><link></code> elements. This implementation follows the '''Signposting standard''' (https://signposting.org/), which allows any HTTP client to discover all alternate representations of a dataset — metadata in various formats, the data file itself, and the ORCID iDs of the authors — without any prior knowledge of PANGAEA-specific URL patterns.

=== Discovering Available Representations ===
A simple HTTP HEAD request to the landing page returns the full set of typed link relations in the response header. The following example illustrates this using <code>curl</code>:
<code>curl -LI https://doi.org/10.1594/PANGAEA.841672</code>
The response <code>Link:</code> header will contain relations of the following types:

* '''<code>cite-as</code>''' — the canonical DOI citation URL
* '''<code>describedby</code>''' — links to metadata in various formats (ISO 19139, DataCite XML, PANGAEA XML, BibTeX, RIS, JSON-LD)
* '''<code>item</code>''' — links to the data itself (tab-delimited text, HTML view)
* '''<code>author</code>''' — ORCID iDs of the dataset authors

This mechanism allows scripts, harvesters, and other machine clients to discover and retrieve any representation of a dataset from its DOI alone.

== HTTP Content Negotiation ==
HTTP content negotiation allows a client to request a specific representation of a resource by specifying a MIME type in the <code>Accept</code> header. All PANGAEA dataset landing pages support this mechanism, so both data and metadata can be downloaded programmatically by querying the DOI directly with the appropriate content type — no PANGAEA-specific URL construction is required.

=== Downloading the Data File ===
The tabular data file for a dataset can be downloaded as tab-delimited text:
<code>curl -OJLf 'https://doi.pangaea.de/10.1594/PANGAEA.841672?format=textfile'</code>
Equivalently, using content negotiation directly against the DOI:
<code>curl -OJLf -H 'Accept: text/tab-separated-values' https://doi.org/10.1594/PANGAEA.841672</code>
The <code>-OJLf</code> flags instruct curl to save the file under the server-provided (i.e. meaningful) filename (<code>-O</code>, <code>-J</code>), follow redirects (<code>-L</code>, to resolve the DOI), and fail with a non-zero exit code on HTTP errors (<code>-f</code>), which is easier to handle in scripts than an HTML error page.

=== Retrieving Metadata in a Specific Format ===
The same mechanism applies to metadata. To retrieve a dataset's metadata in ISO 19139/19115 format:
<code>curl -L -H 'Accept: application/vnd.iso19139.metadata+xml' https://doi.org/10.1594/PANGAEA.841672</code>
The following MIME types are currently supported for metadata retrieval via content negotiation:
{| class="wikitable"
!Format
!MIME type
|-
|PANGAEA internal XML
|<code>application/vnd.pangaea.metadata+xml</code>
|-
|DataCite XML (v4)
|<code>application/vnd.datacite.datacite+xml</code>
|-
|ISO 19139 / 19115
|<code>application/vnd.iso19139.metadata+xml</code>
|-
|NASA DIF
|<code>application/vnd.nasa.dif-metadata+xml</code>
|-
|JSON-LD (Schema.org)
|<code>application/ld+json</code>
|-
|BibTeX
|<code>application/x-bibtex</code>
|-
|RIS
|<code>application/x-research-info-systems</code>
|-
|Plain text citation
|<code>text/x-bibliography</code>
|-
|Tab-separated data
|<code>text/tab-separated-values</code>
|}
Alternatively, the same formats can be requested using explicit URL parameters:
https://doi.pangaea.de/10.1594/PANGAEA.841672?format=metadata_datacite4
https://doi.pangaea.de/10.1594/PANGAEA.841672?format=metadata_iso19139
https://doi.pangaea.de/10.1594/PANGAEA.841672?format=metadata_jsonld
https://doi.pangaea.de/10.1594/PANGAEA.841672?format=citation_bibtex
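
The same content negotiation can be sketched with the Python standard library. The MIME types are copied from the table above; the short dictionary keys and the function names are hypothetical names chosen for this example. <code>negotiated_request</code> only builds the request, so it can be inspected without network access, while <code>fetch</code> actually sends it.

```python
import urllib.request

# MIME types as listed in the table above; the short keys are illustrative.
MIME_TYPES = {
    "datacite": "application/vnd.datacite.datacite+xml",
    "iso19139": "application/vnd.iso19139.metadata+xml",
    "jsonld": "application/ld+json",
    "bibtex": "application/x-bibtex",
    "ris": "application/x-research-info-systems",
    "citation": "text/x-bibliography",
    "data": "text/tab-separated-values",
}


def negotiated_request(doi, fmt):
    """Build (but do not send) a GET request for one representation of a DOI."""
    return urllib.request.Request(
        "https://doi.org/" + doi,
        headers={"Accept": MIME_TYPES[fmt]},
    )


def fetch(doi, fmt, timeout=30):
    """Send the request and return the body as bytes.

    urlopen follows the DOI redirect and raises urllib.error.HTTPError
    for 4xx/5xx responses.
    """
    with urllib.request.urlopen(negotiated_request(doi, fmt), timeout=timeout) as resp:
        return resp.read()


req = negotiated_request("10.1594/PANGAEA.841672", "jsonld")
print(req.full_url, req.get_header("Accept"))
```

Calling <code>fetch("10.1594/PANGAEA.841672", "bibtex")</code> would then be the scripted counterpart of the curl examples above.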

== Access to Restricted Datasets ==
A small fraction of PANGAEA datasets are under an active moratorium — access to the data is temporarily restricted while the metadata remains publicly visible. Access to your personal moratorium datasets requires authentication using a '''bearer token''', which is the standard mechanism for API authorization (RFC 6750). Note that access to protected datasets of other authors is, of course, still not possible.

To obtain your bearer token, log in to PANGAEA. Login is supported both with a PANGAEA username and password, and via ORCID iD. After logging in, the [https://pangaea.de/user/ user profile page] displays the current session token under "Your temporary login token".

'''Please note:''' The bearer token is private and should be treated like a password — it must not be shared with others or included in publicly accessible scripts or repositories. If a token has been accidentally disclosed, the user profile page at https://pangaea.de/user/ provides a "Log out from all devices" option, which immediately invalidates all active tokens. When using content negotiation via the DOI resolver at https://doi.org/, it is safe to include the Authorization header: PANGAEA trusts the DOI Foundation's infrastructure, and the request is redirected to PANGAEA's own servers before any protected content is served.

The token can be passed as an <code>Authorization: Bearer</code> header in any HTTP request:
<code>curl -OJLf -H 'Authorization: Bearer <your-token>' -H 'Accept: text/tab-separated-values' https://doi.org/10.1594/PANGAEA.841672</code>
If a request is made without a token to a restricted dataset, the server responds with <code>401 Bearer Realm</code>, signaling that authentication is required. In recent versions, curl can provide the token with an explicit flag (<code>--oauth2-bearer</code>). The respective command example would then be:
<code>curl -OJLf --oauth2-bearer '<your-token>' -H 'Accept: text/tab-separated-values' https://doi.org/10.1594/PANGAEA.841672</code>
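
In a script, the token is best kept out of the source code. The following sketch builds an authenticated request with the Python standard library and reads the token from an environment variable; <code>PANGAEA_TOKEN</code> and <code>authorized_request</code> are hypothetical names chosen for this example, not something PANGAEA defines.

```python
import os
import urllib.request


def authorized_request(doi, token=None, accept="text/tab-separated-values"):
    """Build (but do not send) a request that carries the bearer token."""
    # PANGAEA_TOKEN is a hypothetical environment variable name for this sketch.
    token = token or os.environ.get("PANGAEA_TOKEN")
    if not token:
        raise RuntimeError("no bearer token supplied; set PANGAEA_TOKEN or pass token=")
    return urllib.request.Request(
        "https://doi.org/" + doi,
        headers={"Authorization": "Bearer " + token, "Accept": accept},
    )


# The token value here is a dummy placeholder.
req = authorized_request("10.1594/PANGAEA.841672", token="example-token")
print(sorted(req.headers))
```

The resulting request can be passed to <code>urllib.request.urlopen</code>. Failing early when no token is available avoids sending an unauthenticated request by accident.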

== OAI-PMH Metadata Harvesting ==
For bulk metadata harvesting, PANGAEA provides an OAI-PMH endpoint at:
<code>https://ws.pangaea.de/oai/</code>
OAI-PMH allows systematic, incremental harvesting of all PANGAEA metadata in supported formats. The following metadata standards are available via OAI-PMH: Dublin Core, DataCite v3 and v4, ISO 19139, and DIF (Directory Interchange Format). This endpoint is used by the portals and registries listed in the discovery section above, and is equally available to any user who wishes to build their own index or integrate PANGAEA metadata into an institutional system.
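
Incremental harvesting follows the generic OAI-PMH request pattern, which can be sketched as follows. The verb and argument names (<code>ListRecords</code>, <code>metadataPrefix</code>, <code>from</code>, <code>resumptionToken</code>) and the <code>oai_dc</code> prefix come from the OAI-PMH 2.0 protocol itself. That requests go directly to the base URL given above is an assumption of this sketch; the exact request path and the prefixes for the other formats should be taken from the endpoint's own documentation. The sample response is hypothetical.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Base URL as given above; the exact request path is an assumption.
OAI_ENDPOINT = "https://ws.pangaea.de/oai/"
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(metadata_prefix="oai_dc", from_date=None, resumption_token=None):
    """Build a ListRecords URL for the first page or for a follow-up page."""
    if resumption_token:
        # OAI-PMH: a resumption token is sent as the only argument besides the verb.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            params["from"] = from_date
    return OAI_ENDPOINT + "?" + urllib.parse.urlencode(params)


def next_token(response_xml):
    """Return the resumption token of a response page, or None on the last page."""
    root = ET.fromstring(response_xml)
    node = root.find(".//oai:resumptionToken", OAI_NS)
    if node is None or not (node.text or "").strip():
        return None
    return node.text.strip()


# Hypothetical, heavily shortened response page.
page = (
    '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">'
    '<ListRecords><resumptionToken>abc123</resumptionToken></ListRecords>'
    '</OAI-PMH>'
)
print(list_records_url(from_date="2026-01-01"))
print(list_records_url(resumption_token=next_token(page)))
```

A harvester would repeat the second call until <code>next_token</code> returns <code>None</code>, and store the date of the run to use as <code>from_date</code> next time.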

== Programmatic Access with Python: pangaeapy ==
'''pangaeapy''' is the official Python client library for PANGAEA, developed and maintained by the PANGAEA team. It provides a high-level interface for loading and analyzing PANGAEA datasets directly into native Python data structures, without requiring manual HTTP requests or format parsing.

* PyPI package: https://pypi.org/project/pangaeapy/
* Source code: https://github.com/pangaea-data-publisher/pangaeapy
* Introduction slides: https://github.com/pangaea-data-publisher/community-workshop-material/blob/master/Introduction_to_pangaeapy.pdf

=== Installation ===
<code>pip install pangaeapy</code>

=== Loading a Dataset ===
The central object in pangaeapy is <code>PanDataSet</code>, which takes a DOI or PANGAEA dataset ID and retrieves both the data and the associated metadata:
<code>from pangaeapy import PanDataSet

ds = PanDataSet('10.1594/PANGAEA.841672')

# Access the data as a pandas DataFrame
print(ds.data.head())

# Access dataset metadata
print(ds.title)
print(ds.authors)
print(ds.parameters) # list of measured parameters with units and methods</code>
The <code>data</code> attribute is a pandas DataFrame in which each column corresponds to a measured parameter, and each row to a measurement. Parameter metadata — including units, standard names, and method descriptions — is accessible through the <code>parameters</code> attribute.

=== Searching PANGAEA from Python ===
pangaeapy also provides a <code>PanQuery</code> class for querying the PANGAEA search engine programmatically. Note that the underlying search API used by pangaeapy is currently internal and undocumented; an official, publicly documented Search REST API is under development. Until that is available, pangaeapy offers the most straightforward path to programmatic search for Python users:
<code>from pangaeapy import PanQuery

q = PanQuery('temperature salinity Atlantic', limit=10)
for result in q.results:
    print(result['doi'], result['title'])</code>

=== Access to Restricted Datasets ===
A bearer token obtained from the PANGAEA user profile can be passed to <code>PanDataSet</code> to enable access to your own datasets that are subject to a moratorium (datasets of other authors remain inaccessible):
<code>ds = PanDataSet('10.1594/PANGAEA.841672', token='<your-bearer-token>')</code>

=== Further Training Materials ===
Jupyter notebooks with worked examples for data discovery, loading, and analysis with pangaeapy are available in the PANGAEA Community Workshop GitHub repository at https://github.com/pangaea-data-publisher/community-workshop-material (see the <code>Python/</code> directory).

== Programmatic Access with R: pangaear ==
'''pangaear''' is a community-developed R client for PANGAEA, maintained as part of the rOpenSci ecosystem. It provides equivalent functionality to pangaeapy for R users.

* CRAN: https://cran.r-project.org/package=pangaear
* GitHub: https://github.com/ropensci/pangaear
* Introduction slides: https://github.com/pangaea-data-publisher/community-workshop-material/blob/master/Intro_PangaeaR.pdf

=== Installation ===
<code>install.packages("pangaear")
# or the development version:
# devtools::install_github("ropensci/pangaear")</code>

=== Loading a Dataset ===
<code>library(pangaear)

# Download and load a dataset by DOI
ds <- pg_data(doi = '10.1594/PANGAEA.841672')

# Access the data as a data frame
head(ds[[1]]$data)

# Metadata is accessible from the list object
ds[[1]]$metadata</code>

=== Searching PANGAEA from R ===
<code>res <- pg_search(query = 'temperature salinity Atlantic', count = 10)
print(res$doi)</code>
R scripts and worked examples are available in the <code>R/</code> directory of the PANGAEA Community Workshop GitHub repository.

== The PANGAEA Data Warehouse ==
The PANGAEA data warehouse (based on ClickHouse) provides a powerful complementary access path for users who need to compile and aggregate data across large numbers of datasets, potentially hundreds of thousands of individual publications, without having to download and merge files manually.

The data warehouse supports spatially and chronologically constrained queries at the parameter level, returning data together with the DOI of each contributing dataset to maintain full provenance traceability. It is accessible in two ways:

'''Through the PANGAEA website:''' The data warehouse interface is integrated into the search results page. After performing a search, users can select "Data Warehouse" to configure and download a parameter-level aggregation from all datasets in the result set.

Documentation: https://wiki.pangaea.de/wiki/Data_warehouse

'''Programmatically via REST API:''' The data warehouse is also accessible through the PANGAEA web services at https://ws.pangaea.de/. This allows automated, scripted aggregations and is supported by both pangaeapy and pangaear.

Note that data warehouse exports represent compiled data products and do not replace the need to consult individual dataset landing pages to assess the fitness of individual studies for a specific scientific application. Every export includes the DOI name for each data point, ensuring that citation and provenance are preserved.

== Summary of Access Methods ==
{| class="wikitable"
!Use case
!Method
!Entry point
|-
|Interactive discovery
|PANGAEA search
|https://www.pangaea.de/
|-
|Single dataset access
|Browser / landing page
|https://doi.pangaea.de/10.1594/PANGAEA.xxxxx
|-
|Programmatic data download
|HTTP content negotiation
|<code>curl -H 'Accept: text/tab-separated-values' https://doi.org/...</code>
|-
|Programmatic metadata download
|HTTP content negotiation
|<code>curl -H 'Accept: application/ld+json' https://doi.org/...</code>
|-
|Link and format discovery
|HTTP HEAD / Signposting
|<code>curl -I https://doi.pangaea.de/...</code>
|-
|Restricted dataset access (to your own publications)
|Bearer token authentication
|retrievable via https://pangaea.de/user/
|-
|Bulk metadata harvesting
|OAI-PMH
|https://ws.pangaea.de/oai/
|-
|Scripted data access (Python)
|pangaeapy
|https://pypi.org/project/pangaeapy/
|-
|Scripted data access (R)
|pangaear
|https://cran.r-project.org/package=pangaear
|-
|Cross-dataset aggregation
|Data warehouse
|https://wiki.pangaea.de/wiki/Data_warehouse
|-
|Discovery across repositories
|DataCite Search API
|https://support.datacite.org/docs/api
|}
== Further Resources ==

* [[PANGAEA search]] — documentation on search syntax and faceted filters
* [[Data warehouse]] — data warehouse documentation
* [[PANGAEA Community Workshops]] — hands-on training workshops on finding and using PANGAEA data
* [https://github.com/pangaea-data-publisher/community-workshop-material PANGAEA Community Workshop GitHub] — Jupyter notebooks, R scripts, and slide decks
* [https://pypi.org/project/pangaeapy/ pangaeapy on PyPI] — Python client
* [https://cran.r-project.org/package=pangaear pangaear on CRAN] — R client
* [https://ws.pangaea.de/oai/ PANGAEA web services] — REST and OAI-PMH endpoints
* [https://signposting.org/ Signposting standard] — link discovery for FAIR data

Data Access and Reuse

2026-04-22T18:22:56Z

Uschindler: /* Downloading the Data File */ add missing flag "f" and describe what each flag means.

This article describes the different methods available for discovering, accessing, and reusing data published in PANGAEA. It is intended for researchers and data practitioners who want to interact with PANGAEA data beyond the standard web interface, including scripted and automated workflows. The methods range from interactive web-based search to fully programmatic access via HTTP content negotiation and dedicated client libraries.

Hands-on training materials covering many of the topics below are available in the [[PANGAEA Community Workshops|PANGAEA Community Workshop Series]], including Jupyter notebooks, R scripts, and slide decks in the [https://github.com/pangaea-data-publisher/community-workshop-material PANGAEA community workshop GitHub repository].

== Overview ==
Every dataset published in PANGAEA is assigned a globally unique and persistent Digital Object Identifier (DOI). The DOI is the single entry point for all forms of programmatic access: it resolves to a dataset landing page that, in addition to its human-readable representation, exposes all available metadata formats and data download options through standard HTTP mechanisms. There is no separate PANGAEA data API — all data and metadata access is built on top of the DOI and its landing page using standard web protocols. This design makes PANGAEA data directly accessible without any vendor-specific API key or proprietary client, while remaining fully compatible with FAIR principles.
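
Because every access path starts from the DOI, scripts often need to reduce the different spellings of an identifier to one canonical form first. The following sketch does this with the Python standard library. The function names are hypothetical, and treating a bare number as a PANGAEA dataset ID is an assumption based on the DOI pattern <code>10.1594/PANGAEA.<number></code> used in the examples of this article.

```python
import re

# PANGAEA dataset DOIs as they appear in this article: 10.1594/PANGAEA.<number>
_PANGAEA_DOI = re.compile(r"10\.1594/PANGAEA\.\d+", re.IGNORECASE)


def normalize_doi(identifier):
    """Reduce a DOI, DOI URL, or bare dataset ID to the form 10.1594/PANGAEA.<number>."""
    text = str(identifier).strip()
    if text.isdigit():
        # Assumption: a bare number is a PANGAEA dataset ID.
        return "10.1594/PANGAEA." + text
    match = _PANGAEA_DOI.search(text)
    if match is None:
        raise ValueError(f"not a PANGAEA DOI: {identifier!r}")
    number = match.group(0).rsplit(".", 1)[1]
    return "10.1594/PANGAEA." + number


def entry_points(identifier):
    """Return the resolver URL and the landing page URL for a dataset."""
    doi = normalize_doi(identifier)
    return {
        "resolver": "https://doi.org/" + doi,
        "landing_page": "https://doi.pangaea.de/" + doi,
    }


for raw in ("doi:10.1594/pangaea.841672", 841672,
            "https://doi.pangaea.de/10.1594/PANGAEA.841672?format=textfile"):
    print(normalize_doi(raw))
```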

== Web-Based Search and Discovery ==

=== PANGAEA Search Interface ===
The primary discovery tool for PANGAEA data is the PANGAEA search engine at https://www.pangaea.de/, based on Elasticsearch. It supports full-text search and faceted filtering across all published metadata. Faceted navigation allows users to constrain results by topic, device type, geographic region, and temporal coverage. Documentation on search functionality and syntax is available in the Wiki at [[PANGAEA search]].

=== External Portals and Registries ===
PANGAEA metadata is harvested by a large number of disciplinary and generic portals via OAI-PMH, making datasets discoverable beyond the PANGAEA website through services such as Google Dataset Search, OpenAIRE, DataCite Commons, DataONE, GBIF, EMODnet, GFBio, and others. PANGAEA is registered in re3data.org, FAIRsharing.org, RIsources, and the EOSC Marketplace.

When a PANGAEA-specific search interface is not required, the '''DataCite Search API''' (https://support.datacite.org/docs/api) offers an alternative: it searches across a large inventory of research data from many repositories and returns summary metadata including DOI names. Data access for any PANGAEA result can then proceed via content negotiation as described below.

== DOI Landing Pages and Link Discovery ==

=== The Landing Page as Access Hub ===
Each PANGAEA dataset is represented by a landing page accessible at its DOI. For example:
<code>https://doi.pangaea.de/10.1594/PANGAEA.841672</code>
The landing page serves a dual function: it presents dataset metadata and a data preview in human-readable form, and it simultaneously exposes all available machine-readable representations of the same resource through standard HTTP link headers and HTML <code><link></code> elements. This implementation follows the '''Signposting standard''' (https://signposting.org/), which allows any HTTP client to discover all alternate representations of a dataset — metadata in various formats, the data file itself, and the ORCID iDs of the authors — without any prior knowledge of PANGAEA-specific URL patterns.

=== Discovering Available Representations ===
A simple HTTP HEAD request to the landing page returns the full set of typed link relations in the response header. The following example illustrates this using <code>curl</code>:
<code>curl -LI https://doi.org/10.1594/PANGAEA.841672</code>
The response <code>Link:</code> header will contain relations of the following types:

* '''<code>cite-as</code>''' — the canonical DOI citation URL
* '''<code>describedby</code>''' — links to metadata in various formats (ISO 19139, DataCite XML, PANGAEA XML, BibTeX, RIS, JSON-LD)
* '''<code>item</code>''' — links to the data itself (tab-delimited text, HTML view)
* '''<code>author</code>''' — ORCID iDs of the dataset authors

This mechanism allows scripts, harvesters, and other machine clients to discover and retrieve any representation of a dataset from its DOI alone.

== HTTP Content Negotiation ==
HTTP content negotiation allows a client to request a specific representation of a resource by specifying a MIME type in the <code>Accept</code> header. All PANGAEA dataset landing pages support this mechanism, so both data and metadata can be downloaded programmatically by querying the DOI directly with the appropriate content type — no PANGAEA-specific URL construction is required.

=== Downloading the Data File ===
The tabular data file for a dataset can be downloaded as tab-delimited text:
<code>curl -OJLf 'https://doi.pangaea.de/10.1594/PANGAEA.841672?format=textfile'</code>
Equivalently, using content negotiation directly against the DOI:
<code>curl -OJLf -H 'Accept: text/tab-separated-values' https://doi.org/10.1594/PANGAEA.841672</code>
The <code>-OJLf</code> flags instruct curl to save the file under the server-provided (i.e. meaningful) filename (<code>-O</code>, <code>-J</code>), follow redirects (<code>-L</code>, to resolve the DOI), and fail with a non-zero exit code on HTTP errors (<code>-f</code>), which is easier to handle in scripts than an HTML error page.

=== Retrieving Metadata in a Specific Format ===
The same mechanism applies to metadata. To retrieve a dataset's metadata in ISO 19139/19115 format:
<code>curl -L -H 'Accept: application/vnd.iso19139.metadata+xml' https://doi.org/10.1594/PANGAEA.841672</code>
The following MIME types are currently supported for metadata retrieval via content negotiation:
{| class="wikitable"
!Format
!MIME type
|-
|PANGAEA internal XML
|<code>application/vnd.pangaea.metadata+xml</code>
|-
|DataCite XML (v4)
|<code>application/vnd.datacite.datacite+xml</code>
|-
|ISO 19139 / 19115
|<code>application/vnd.iso19139.metadata+xml</code>
|-
|NASA DIF
|<code>application/vnd.nasa.dif-metadata+xml</code>
|-
|JSON-LD (Schema.org)
|<code>application/ld+json</code>
|-
|BibTeX
|<code>application/x-bibtex</code>
|-
|RIS
|<code>application/x-research-info-systems</code>
|-
|Plain text citation
|<code>text/x-bibliography</code>
|-
|Tab-separated data
|<code>text/tab-separated-values</code>
|}
Alternatively, the same formats can be requested using explicit URL parameters:
https://doi.pangaea.de/10.1594/PANGAEA.841672?format=metadata_datacite4
https://doi.pangaea.de/10.1594/PANGAEA.841672?format=metadata_iso19139
https://doi.pangaea.de/10.1594/PANGAEA.841672?format=metadata_jsonld
https://doi.pangaea.de/10.1594/PANGAEA.841672?format=citation_bibtex

== Access to Restricted Datasets ==
A small fraction of PANGAEA datasets are under an active moratorium — access to the data is temporarily restricted while the metadata remains publicly visible. Access to your personal moratorium datasets requires authentication using a '''bearer token''', which is the standard mechanism for API authorization (RFC 6750). Note that access to protected datasets of other authors is, of course, still not possible.

To obtain your bearer token, log in to PANGAEA. Login is supported both with a PANGAEA username and password, and via ORCID iD. After logging in, the [https://pangaea.de/user/ user profile page] displays the current session token under "Your temporary login token".

'''Please note:''' The bearer token is private and should be treated like a password — it must not be shared with others or included in publicly accessible scripts or repositories. If a token has been accidentally disclosed, the user profile page at https://pangaea.de/user/ provides a "Log out from all devices" option, which immediately invalidates all active tokens. When using content negotiation via the DOI resolver at https://doi.org/, it is safe to include the Authorization header: PANGAEA trusts the DOI Foundation's infrastructure, and the request is redirected to PANGAEA's own servers before any protected content is served.

The token can be passed as an <code>Authorization: Bearer</code> header in any HTTP request:
<code>curl -OJLf -H 'Authorization: Bearer <your-token>' -H 'Accept: text/tab-separated-values' https://doi.org/10.1594/PANGAEA.841672</code>
If a request is made without a token to a restricted dataset, the server responds with <code>401 Bearer Realm</code>, signaling that authentication is required. In recent versions, curl can provide the token with an explicit flag (<code>--oauth2-bearer</code>). The respective command example would then be:
<code>curl -OJLf --oauth2-bearer '<your-token>' -H 'Accept: text/tab-separated-values' https://doi.org/10.1594/PANGAEA.841672</code>

== OAI-PMH Metadata Harvesting ==
For bulk metadata harvesting, PANGAEA provides an OAI-PMH endpoint at:
<code>https://ws.pangaea.de/oai/</code>
OAI-PMH allows systematic, incremental harvesting of all PANGAEA metadata in supported formats. The following metadata standards are available via OAI-PMH: Dublin Core, DataCite v3 and v4, ISO 19139, and DIF (Directory Interchange Format). This endpoint is used by the portals and registries listed in the discovery section above, and is equally available to any user who wishes to build their own index or integrate PANGAEA metadata into an institutional system.

== Programmatic Access with Python: pangaeapy ==
'''pangaeapy''' is the official Python client library for PANGAEA, developed and maintained by the PANGAEA team. It provides a high-level interface for loading and analyzing PANGAEA datasets directly into native Python data structures, without requiring manual HTTP requests or format parsing.

* PyPI package: https://pypi.org/project/pangaeapy/
* Source code: https://github.com/pangaea-data-publisher/pangaeapy
* Introduction slides: https://github.com/pangaea-data-publisher/community-workshop-material/blob/master/Introduction_to_pangaeapy.pdf

=== Installation ===
<code>pip install pangaeapy</code>

=== Loading a Dataset ===
The central object in pangaeapy is <code>PanDataSet</code>, which takes a DOI or PANGAEA dataset ID and retrieves both the data and the associated metadata:
<code>from pangaeapy import PanDataSet

ds = PanDataSet('10.1594/PANGAEA.841672')

# Access the data as a pandas DataFrame
print(ds.data.head())

# Access dataset metadata
print(ds.title)
print(ds.authors)
print(ds.parameters) # list of measured parameters with units and methods</code>
The <code>data</code> attribute is a pandas DataFrame in which each column corresponds to a measured parameter, and each row to a measurement. Parameter metadata — including units, standard names, and method descriptions — is accessible through the <code>parameters</code> attribute.

=== Searching PANGAEA from Python ===
pangaeapy also provides a <code>PanQuery</code> class for querying the PANGAEA search engine programmatically. Note that the underlying search API used by pangaeapy is currently internal and undocumented; an official, publicly documented Search REST API is under development. Until that is available, pangaeapy offers the most straightforward path to programmatic search for Python users:
<code>from pangaeapy import PanQuery

q = PanQuery('temperature salinity Atlantic', limit=10)
for result in q.results:
    print(result['doi'], result['title'])</code>

=== Access to Restricted Datasets ===
A bearer token obtained from the PANGAEA user profile can be passed to <code>PanDataSet</code> to enable access to your own datasets that are subject to a moratorium (datasets of other authors remain inaccessible):
<code>ds = PanDataSet('10.1594/PANGAEA.841672', token='<your-bearer-token>')</code>

=== Further Training Materials ===
Jupyter notebooks with worked examples for data discovery, loading, and analysis with pangaeapy are available in the PANGAEA Community Workshop GitHub repository at https://github.com/pangaea-data-publisher/community-workshop-material (see the <code>Python/</code> directory).

== Programmatic Access with R: pangaear ==
'''pangaear''' is a community-developed R client for PANGAEA, maintained as part of the rOpenSci ecosystem. It provides equivalent functionality to pangaeapy for R users.

* CRAN: https://cran.r-project.org/package=pangaear
* GitHub: https://github.com/ropensci/pangaear
* Introduction slides: https://github.com/pangaea-data-publisher/community-workshop-material/blob/master/Intro_PangaeaR.pdf

=== Installation ===
<code>install.packages("pangaear")
# or the development version:
# devtools::install_github("ropensci/pangaear")</code>

=== Loading a Dataset ===
<code>library(pangaear)

# Download and load a dataset by DOI
ds <- pg_data(doi = '10.1594/PANGAEA.841672')

# Access the data as a data frame
head(ds[[1]]$data)

# Metadata is accessible from the list object
ds[[1]]$metadata</code>

=== Searching PANGAEA from R ===
<code>res <- pg_search(query = 'temperature salinity Atlantic', count = 10)
print(res$doi)</code>
R scripts and worked examples are available in the <code>R/</code> directory of the PANGAEA Community Workshop GitHub repository.

== The PANGAEA Data Warehouse ==
The PANGAEA data warehouse (based on ClickHouse) provides a powerful complementary access path for users who need to compile and aggregate data across large numbers of datasets, potentially hundreds of thousands of individual publications, without having to download and merge files manually.

The data warehouse supports spatially and chronologically constrained queries at the parameter level, returning data together with the DOI of each contributing dataset to maintain full provenance traceability. It is accessible in two ways:

'''Through the PANGAEA website:''' The data warehouse interface is integrated into the search results page. After performing a search, users can select "Data Warehouse" to configure and download a parameter-level aggregation from all datasets in the result set.

Documentation: https://wiki.pangaea.de/wiki/Data_warehouse

'''Programmatically via REST API:''' The data warehouse is also accessible through the PANGAEA web services at https://ws.pangaea.de/. This allows automated, scripted aggregations and is supported by both pangaeapy and pangaear.

Note that data warehouse exports represent compiled data products and do not replace the need to consult individual dataset landing pages to assess the fitness of individual studies for a specific scientific application. Every export includes the DOI name for each data point, ensuring that citation and provenance are preserved.

== Summary of Access Methods ==
{| class="wikitable"
!Use case
!Method
!Entry point
|-
|Interactive discovery
|PANGAEA search
|https://www.pangaea.de/
|-
|Single dataset access
|Browser / landing page
|https://doi.pangaea.de/10.1594/PANGAEA.xxxxx
|-
|Programmatic data download
|HTTP content negotiation
|<code>curl -H 'Accept: text/tab-separated-values' https://doi.org/...</code>
|-
|Programmatic metadata download
|HTTP content negotiation
|<code>curl -H 'Accept: application/ld+json' https://doi.org/...</code>
|-
|Link and format discovery
|HTTP HEAD / Signposting
|<code>curl -I https://doi.pangaea.de/...</code>
|-
|Restricted dataset access (to your own publications)
|Bearer token authentication
|retrievable via https://pangaea.de/user/
|-
|Bulk metadata harvesting
|OAI-PMH
|https://ws.pangaea.de/oai/
|-
|Scripted data access (Python)
|pangaeapy
|https://pypi.org/project/pangaeapy/
|-
|Scripted data access (R)
|pangaear
|https://cran.r-project.org/package=pangaear
|-
|Cross-dataset aggregation
|Data warehouse
|https://wiki.pangaea.de/wiki/Data_warehouse
|-
|Discovery across repositories
|DataCite Search API
|https://support.datacite.org/docs/api
|}
== Further Resources ==

* [[PANGAEA search]] — documentation on search syntax and faceted filters
* [[Data warehouse]] — data warehouse documentation
* [[PANGAEA Community Workshops]] — hands-on training workshops on finding and using PANGAEA data
* [https://github.com/pangaea-data-publisher/community-workshop-material PANGAEA Community Workshop GitHub] — Jupyter notebooks, R scripts, and slide decks
* [https://pypi.org/project/pangaeapy/ pangaeapy on PyPI] — Python client
* [https://cran.r-project.org/package=pangaear pangaear on CRAN] — R client
* [https://ws.pangaea.de/oai/ PANGAEA web services] — REST and OAI-PMH endpoints
* [https://signposting.org/ Signposting standard] — link discovery for FAIR data

Data Access and Reuse

2026-04-22T14:56:41Z

Uschindler: /* Access to Restricted Datasets */ token

This article describes the different methods available for discovering, accessing, and reusing data published in PANGAEA. It is intended for researchers and data practitioners who want to interact with PANGAEA data beyond the standard web interface, including scripted and automated workflows. The methods range from interactive web-based search to fully programmatic access via HTTP content negotiation and dedicated client libraries.

Hands-on training materials covering many of the topics below are available in the [[PANGAEA Community Workshops|PANGAEA Community Workshop Series]], including Jupyter notebooks, R scripts, and slide decks in the [https://github.com/pangaea-data-publisher/community-workshop-material PANGAEA community workshop GitHub repository].

== Overview ==
Every dataset published in PANGAEA is assigned a globally unique and persistent Digital Object Identifier (DOI). The DOI is the single entry point for all forms of programmatic access: it resolves to a dataset landing page that, in addition to its human-readable representation, exposes all available metadata formats and data download options through standard HTTP mechanisms. There is no separate PANGAEA data API — all data and metadata access is built on top of the DOI and its landing page using standard web protocols. This design makes PANGAEA data directly accessible without any vendor-specific API key or proprietary client, while remaining fully compatible with FAIR principles.

== Web-Based Search and Discovery ==

=== PANGAEA Search Interface ===
The primary discovery tool for PANGAEA data is the PANGAEA search engine at https://www.pangaea.de/, based on Elasticsearch. It supports full-text search and faceted filtering across all published metadata. Faceted navigation allows users to constrain results by topic, device type, geographic region, and temporal coverage. Documentation on search functionality and syntax is available in the Wiki at [[PANGAEA search]].

=== External Portals and Registries ===
PANGAEA metadata is harvested by a large number of disciplinary and generic portals via OAI-PMH, making datasets discoverable beyond the PANGAEA website through services such as Google Dataset Search, OpenAIRE, DataCite Commons, DataONE, GBIF, EMODnet, GFBio, and others. PANGAEA is registered in re3data.org, FAIRsharing.org, RIsources, and the EOSC Marketplace.

When a PANGAEA-specific search interface is not required, the '''DataCite Search API''' (https://support.datacite.org/docs/api) offers an alternative: it searches across a large inventory of research data from many repositories and returns summary metadata including DOI names. Data access for any PANGAEA result can then proceed via content negotiation as described below.

== DOI Landing Pages and Link Discovery ==

=== The Landing Page as Access Hub ===
Each PANGAEA dataset is represented by a landing page accessible at its DOI. For example:
<code>https://doi.pangaea.de/10.1594/PANGAEA.841672</code>
The landing page serves a dual function: it presents dataset metadata and a data preview in human-readable form, and it simultaneously exposes all available machine-readable representations of the same resource through standard HTTP link headers and HTML <code><link></code> elements. This implementation follows the '''Signposting standard''' (https://signposting.org/), which allows any HTTP client to discover all alternate representations of a dataset — metadata in various formats, the data file itself, and the ORCID iDs of the authors — without any prior knowledge of PANGAEA-specific URL patterns.

=== Discovering Available Representations ===
A simple HTTP HEAD request to the landing page returns the full set of typed link relations in the response header. The following example illustrates this using <code>curl</code>:
<code>curl -LI https://doi.org/10.1594/PANGAEA.841672</code>
The response <code>Link:</code> header will contain relations of the following types:

* '''<code>cite-as</code>''' — the canonical DOI citation URL
* '''<code>describedby</code>''' — links to metadata in various formats (ISO 19139, DataCite XML, PANGAEA XML, BibTeX, RIS, JSON-LD)
* '''<code>item</code>''' — links to the data itself (tab-delimited text, HTML view)
* '''<code>author</code>''' — ORCID iDs of the dataset authors

This mechanism allows scripts, harvesters, and other machine clients to discover and retrieve any representation of a dataset from its DOI alone.
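
The following Python sketch shows one way a client might group the targets of a <code>Link:</code> header by relation type. It is a simplified parser written for illustration, not a complete implementation of the header grammar, and the function name is hypothetical. It assumes each entry has the form <code><URL>; rel="..."</code>, optionally followed by further parameters.

```python
import re
from collections import defaultdict


def parse_link_header(header):
    """Group the target URLs of an HTTP Link header by their rel value."""
    links = defaultdict(list)
    # Entries are separated by commas; split only where a new "<URL>" starts.
    for part in re.split(r",\s*(?=<)", header.strip()):
        target = re.match(r"\s*<([^>]*)>", part)
        rel = re.search(r';\s*rel\s*=\s*"?([^";]+)"?', part)
        if not target or not rel:
            continue  # skip entries without a target or a rel parameter
        # A rel parameter may carry several space-separated relation types.
        for name in rel.group(1).split():
            links[name].append(target.group(1))
    return dict(links)
```

Given the header value of a landing page response, <code>parse_link_header(value).get("describedby", [])</code> would list the metadata representations, and <code>"item"</code> the data links.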

== HTTP Content Negotiation ==
HTTP content negotiation allows a client to request a specific representation of a resource by specifying a MIME type in the <code>Accept</code> header. All PANGAEA dataset landing pages support this mechanism, so both data and metadata can be downloaded programmatically by querying the DOI directly with the appropriate content type — no PANGAEA-specific URL construction is required.

=== Downloading the Data File ===
The tabular data file for a dataset can be downloaded as tab-delimited text:
<code>curl -OJLf 'https://doi.pangaea.de/10.1594/PANGAEA.841672?format=textfile'</code>
Equivalently, using content negotiation directly against the DOI:
<code>curl -OJLf -H 'Accept: text/tab-separated-values' https://doi.org/10.1594/PANGAEA.841672</code>
The <code>-OJLf</code> flags instruct curl to save the response to a file (<code>-O</code>) under the server-provided, meaningful filename (<code>-J</code>), to follow redirects so that the DOI is resolved (<code>-L</code>), and to fail with a non-zero exit code on HTTP errors (<code>-f</code>), which is easier to handle in scripts than an HTML error page.
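
Once downloaded, the tab-delimited file can be read with standard tooling. The sketch below assumes that the file may begin with a descriptive comment block delimited by <code>/*</code> and <code>*/</code>, followed by a header row and tab-separated values; this layout is an assumption to verify against an actual download, and the function name is hypothetical.

```python
import csv
import io


def parse_tab_export(text):
    """Split a tab-delimited export into (description, rows).

    Assumes an optional leading /* ... */ comment block, then a header
    row, then one tab-separated record per line.
    """
    description = ""
    body = text
    if text.lstrip().startswith("/*") and "*/" in text:
        head, body = text.split("*/", 1)
        description = head.lstrip()[2:].strip()  # drop the leading "/*"
    reader = csv.DictReader(io.StringIO(body.lstrip("\r\n")), delimiter="\t")
    return description, list(reader)
```

All values are returned as strings; converting columns to numeric types is left to the caller, or to a library such as pandas.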

=== Retrieving Metadata in a Specific Format ===
The same mechanism applies to metadata. To retrieve a dataset's metadata in ISO 19139/19115 format:
<code>curl -L -H 'Accept: application/vnd.iso19139.metadata+xml' https://doi.org/10.1594/PANGAEA.841672</code>
The following MIME types are currently supported for metadata retrieval via content negotiation:
{| class="wikitable"
!Format
!MIME type
|-
|PANGAEA internal XML
|<code>application/vnd.pangaea.metadata+xml</code>
|-
|DataCite XML (v4)
|<code>application/vnd.datacite.datacite+xml</code>
|-
|ISO 19139 / 19115
|<code>application/vnd.iso19139.metadata+xml</code>
|-
|NASA DIF
|<code>application/vnd.nasa.dif-metadata+xml</code>
|-
|JSON-LD (Schema.org)
|<code>application/ld+json</code>
|-
|BibTeX
|<code>application/x-bibtex</code>
|-
|RIS
|<code>application/x-research-info-systems</code>
|-
|Plain text citation
|<code>text/x-bibliography</code>
|-
|Tab-separated data
|<code>text/tab-separated-values</code>
|}
Alternatively, the same formats can be requested using explicit URL parameters:
* <code>https://doi.pangaea.de/10.1594/PANGAEA.841672?format=metadata_datacite4</code>
* <code>https://doi.pangaea.de/10.1594/PANGAEA.841672?format=metadata_iso19139</code>
* <code>https://doi.pangaea.de/10.1594/PANGAEA.841672?format=metadata_jsonld</code>
* <code>https://doi.pangaea.de/10.1594/PANGAEA.841672?format=citation_bibtex</code>
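
For scripted use, the MIME types from the table above can be kept in a small lookup table. The following Python sketch uses only the standard library; the short format keys and function names are invented for this example, while the MIME types are those listed above. <code>urllib</code> follows the DOI redirect automatically and raises an error on HTTP failures.

```python
import urllib.request

# Short keys are hypothetical; MIME types are taken from the table above.
METADATA_MIME_TYPES = {
    "pangaea": "application/vnd.pangaea.metadata+xml",
    "datacite": "application/vnd.datacite.datacite+xml",
    "iso19139": "application/vnd.iso19139.metadata+xml",
    "dif": "application/vnd.nasa.dif-metadata+xml",
    "jsonld": "application/ld+json",
    "bibtex": "application/x-bibtex",
    "ris": "application/x-research-info-systems",
    "citation": "text/x-bibliography",
}


def metadata_request(doi, fmt):
    """Build a request for the DOI with the Accept header of the chosen format."""
    if fmt not in METADATA_MIME_TYPES:
        raise ValueError(f"unknown format {fmt!r}; choose from {sorted(METADATA_MIME_TYPES)}")
    return urllib.request.Request(
        f"https://doi.org/{doi}",
        headers={"Accept": METADATA_MIME_TYPES[fmt]},
    )


def fetch_metadata(doi, fmt, timeout=30):
    """Send the request over the network and return the response body as text."""
    with urllib.request.urlopen(metadata_request(doi, fmt), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, <code>fetch_metadata('10.1594/PANGAEA.841672', 'bibtex')</code> would be expected to return the BibTeX citation of the dataset.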

== Access to Restricted Datasets ==
A small fraction of PANGAEA datasets are under an active moratorium — access to the data is temporarily restricted while the metadata remains publicly visible. Access to your personal moratorium datasets requires authentication using a '''bearer token''', which is the standard mechanism for API authorization (RFC 6750). Note that access to protected datasets of other authors is, of course, still not possible.

To obtain your bearer token, log in to PANGAEA. Login is supported both with a PANGAEA username and password, and via ORCID iD. After logging in, the [https://pangaea.de/user/ user profile page] displays the current session token under "Your temporary login token".

'''Please note:''' The bearer token is private and should be treated like a password — it must not be shared with others or included in publicly accessible scripts or repositories. If a token has been accidentally disclosed, the user profile page at https://pangaea.de/user/ provides a "Log out from all devices" option, which immediately invalidates all active tokens. When using content negotiation via the DOI resolver at https://doi.org/, it is safe to include the Authorization header: PANGAEA trusts the DOI Foundation's infrastructure, and the request is redirected to PANGAEA's own servers before any protected content is served.

The token can be passed as an <code>Authorization: Bearer</code> header in any HTTP request:
<code>curl -OJLf -H 'Authorization: Bearer <your-token>' -H 'Accept: text/tab-separated-values' https://doi.org/10.1594/PANGAEA.841672</code>
If a request is made without a token to a restricted dataset, the server responds with <code>401 Bearer Realm</code>, signaling that authentication is required. In recent versions, curl can provide the token with an explicit flag (<code>--oauth2-bearer</code>). The respective command example would then be:
<code>curl -OJLf --oauth2-bearer '<your-token>' -H 'Accept: text/tab-separated-values' https://doi.org/10.1594/PANGAEA.841672</code>
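
In scripts, the token is best kept out of the source code. The following Python sketch reads it from an environment variable and attaches it as an <code>Authorization: Bearer</code> header; the variable name <code>PANGAEA_TOKEN</code> and the function name are assumptions made for this example. Some HTTP client libraries drop the <code>Authorization</code> header when a redirect leads to a different host, so the behavior of the chosen client across the DOI redirect should be verified.

```python
import os
import urllib.request


def authorized_request(doi, mime_type="text/tab-separated-values", env_var="PANGAEA_TOKEN"):
    """Build a content negotiation request that carries the bearer token.

    The token is read from an environment variable (hypothetical name) so
    that it never appears in scripts or version control.
    """
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(f"set the {env_var} environment variable to your bearer token")
    return urllib.request.Request(
        f"https://doi.org/{doi}",
        headers={"Authorization": f"Bearer {token}", "Accept": mime_type},
    )
```

The resulting request object can be passed to <code>urllib.request.urlopen</code> to download the data.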

== OAI-PMH Metadata Harvesting ==
For bulk metadata harvesting, PANGAEA provides an OAI-PMH endpoint at:
<code>https://ws.pangaea.de/oai/</code>
OAI-PMH allows systematic, incremental harvesting of all PANGAEA metadata in supported formats. The following metadata standards are available via OAI-PMH: Dublin Core, DataCite v3 and v4, ISO 19139, and DIF (Directory Interchange Format). This endpoint is used by the portals and registries listed in the discovery section above, and is equally available to any user who wishes to build their own index or integrate PANGAEA metadata into an institutional system.
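
Incremental harvesting with OAI-PMH typically consists of a <code>ListRecords</code> request followed by further requests that pass on the <code>resumptionToken</code> of each response. The Python sketch below builds such request URLs and reads identifiers and the token from one response page. The verb, parameter names, and XML namespace come from the OAI-PMH 2.0 protocol; the base URL is the one given above and may need adjusting to the exact request path, and the function names are hypothetical.

```python
import urllib.parse
import xml.etree.ElementTree as ET

OAI_ENDPOINT = "https://ws.pangaea.de/oai/"  # as given above; verify the request path
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(metadata_prefix="oai_dc", from_date=None,
                     resumption_token=None, endpoint=OAI_ENDPOINT):
    """Build a ListRecords URL, either for a first request or a continuation."""
    if resumption_token:
        # Per the protocol, a continuation carries only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            params["from"] = from_date  # incremental harvest since this date
    return endpoint + "?" + urllib.parse.urlencode(params)


def parse_page(xml_text):
    """Return (record identifiers, resumption token or None) for one page."""
    root = ET.fromstring(xml_text)
    identifiers = [el.text for el in root.iterfind(".//oai:header/oai:identifier", OAI_NS)]
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=OAI_NS)
    return identifiers, (token or None)
```

A harvester would loop: fetch the URL, call <code>parse_page</code>, and continue with the returned token until it is <code>None</code>.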

== Programmatic Access with Python: pangaeapy ==
'''pangaeapy''' is the official Python client library for PANGAEA, developed and maintained by the PANGAEA team. It provides a high-level interface for loading and analyzing PANGAEA datasets directly into native Python data structures, without requiring manual HTTP requests or format parsing.

* PyPI package: https://pypi.org/project/pangaeapy/
* Source code: https://github.com/pangaea-data-publisher/pangaeapy
* Introduction slides: https://github.com/pangaea-data-publisher/community-workshop-material/blob/master/Introduction_to_pangaeapy.pdf

=== Installation ===
<code>pip install pangaeapy</code>

=== Loading a Dataset ===
The central object in pangaeapy is <code>PanDataSet</code>, which takes a DOI or PANGAEA dataset ID and retrieves both the data and the associated metadata:
<code>from pangaeapy import PanDataSet

ds = PanDataSet('10.1594/PANGAEA.841672')

# Access the data as a pandas DataFrame
print(ds.data.head())

# Access dataset metadata
print(ds.title)
print(ds.authors)
print(ds.parameters) # list of measured parameters with units and methods</code>
The <code>data</code> attribute is a pandas DataFrame in which each column corresponds to a measured parameter, and each row to a measurement. Parameter metadata — including units, standard names, and method descriptions — is accessible through the <code>parameters</code> attribute.

=== Searching PANGAEA from Python ===
pangaeapy also provides a <code>PanQuery</code> class for querying the PANGAEA search engine programmatically. Note that the underlying search API used by pangaeapy is currently internal and undocumented; an official, publicly documented Search REST API is under development. Until that is available, pangaeapy offers the most straightforward path to programmatic search for Python users:
<code>from pangaeapy import PanQuery

q = PanQuery('temperature salinity Atlantic', limit=10)
for result in q.results:
    print(result['doi'], result['title'])</code>

=== Access to Restricted Datasets ===
A bearer token obtained from the PANGAEA user profile can be passed to <code>PanDataSet</code> to enable access to your own datasets that are under a moratorium; protected datasets of other authors remain inaccessible:
<code>ds = PanDataSet('10.1594/PANGAEA.841672', token='<your-bearer-token>')</code>

=== Further Training Materials ===
Jupyter notebooks with worked examples for data discovery, loading, and analysis with pangaeapy are available in the PANGAEA Community Workshop GitHub repository at https://github.com/pangaea-data-publisher/community-workshop-material (see the <code>Python/</code> directory).

== Programmatic Access with R: pangaear ==
'''pangaear''' is a community-developed R client for PANGAEA, maintained as part of the rOpenSci ecosystem. It provides equivalent functionality to pangaeapy for R users.

* CRAN: https://cran.r-project.org/package=pangaear
* GitHub: https://github.com/ropensci/pangaear
* Introduction slides: https://github.com/pangaea-data-publisher/community-workshop-material/blob/master/Intro_PangaeaR.pdf

=== Installation ===
<code>install.packages("pangaear")
# or the development version:
# devtools::install_github("ropensci/pangaear")</code>

=== Loading a Dataset ===
<code>library(pangaear)

# Download and load a dataset by DOI
ds <- pg_data(doi = '10.1594/PANGAEA.841672')

# Access the data as a data frame
head(ds[[1]]$data)

# Metadata is accessible from the list object
ds[[1]]$metadata</code>

=== Searching PANGAEA from R ===
<code>res <- pg_search(query = 'temperature salinity Atlantic', count = 10)
print(res$doi)</code>
R scripts and worked examples are available in the <code>R/</code> directory of the PANGAEA Community Workshop GitHub repository.

== The PANGAEA Data Warehouse ==
The PANGAEA data warehouse (based on ClickHouse) provides a powerful complementary access path for users who need to compile and aggregate data across large numbers of datasets, potentially hundreds of thousands of individual publications, without having to download and merge files manually.

The data warehouse supports spatially and chronologically constrained queries at the parameter level, returning data together with the DOI of each contributing dataset to maintain full provenance traceability. It is accessible in two ways:

'''Through the PANGAEA website:''' The data warehouse interface is integrated into the search results page. After performing a search, users can select "Data Warehouse" to configure and download a parameter-level aggregation from all datasets in the result set.

Documentation: https://wiki.pangaea.de/wiki/Data_warehouse

'''Programmatically via REST API:''' The data warehouse is also accessible through the PANGAEA web services at https://ws.pangaea.de/. This allows automated, scripted aggregations and is supported by both pangaeapy and pangaear.

Note that data warehouse exports represent compiled data products and do not replace the need to consult individual dataset landing pages to assess the fitness of individual studies for a specific scientific application. Every export includes the DOI name for each data point, ensuring that citation and provenance are preserved.

== Summary of Access Methods ==
{| class="wikitable"
!Use case
!Method
!Entry point
|-
|Interactive discovery
|PANGAEA search
|https://www.pangaea.de/
|-
|Single dataset access
|Browser / landing page
|https://doi.pangaea.de/10.1594/PANGAEA.xxxxx
|-
|Programmatic data download
|HTTP content negotiation
|<code>curl -H 'Accept: text/tab-separated-values' https://doi.org/...</code>
|-
|Programmatic metadata download
|HTTP content negotiation
|<code>curl -H 'Accept: application/ld+json' https://doi.org/...</code>
|-
|Link and format discovery
|HTTP HEAD / Signposting
|<code>curl -I https://doi.pangaea.de/...</code>
|-
|Restricted dataset access (to your own publications)
|Bearer token authentication
|retrievable via https://pangaea.de/user/
|-
|Bulk metadata harvesting
|OAI-PMH
|https://ws.pangaea.de/oai/
|-
|Scripted data access (Python)
|pangaeapy
|https://pypi.org/project/pangaeapy/
|-
|Scripted data access (R)
|pangaear
|https://cran.r-project.org/package=pangaear
|-
|Cross-dataset aggregation
|Data warehouse
|https://wiki.pangaea.de/wiki/Data_warehouse
|-
|Discovery across repositories
|DataCite Search API
|https://support.datacite.org/docs/api
|}
== Further Resources ==

* [[PANGAEA search]] — documentation on search syntax and faceted filters
* [[Data warehouse]] — data warehouse documentation
* [[PANGAEA Community Workshops]] — hands-on training workshops on finding and using PANGAEA data
* [https://github.com/pangaea-data-publisher/community-workshop-material PANGAEA Community Workshop GitHub] — Jupyter notebooks, R scripts, and slide decks
* [https://pypi.org/project/pangaeapy/ pangaeapy on PyPI] — Python client
* [https://cran.r-project.org/package=pangaear pangaear on CRAN] — R client
* [https://ws.pangaea.de/oai/ PANGAEA web services] — REST and OAI-PMH endpoints
* [https://signposting.org/ Signposting standard] — link discovery for FAIR data

Preservation Plan

2026-03-31T14:06:15Z

Uschindler: /* Metadata Preservation */ add ROR for crossref funder id

PANGAEA is committed to the long-term preservation of all data and metadata entrusted to it by the research community. This commitment extends beyond bit-level storage integrity to encompass active management of format usability, semantic consistency over time, and formally documented procedures for all stages of the archival lifecycle. The following article describes the technical and organizational measures that together constitute PANGAEA's preservation strategy. It supplements the information provided in Felden et al. (2023) and is one of the reference documents for PANGAEA's CoreTrustSeal certification.
== Principles ==
PANGAEA's preservation approach is grounded in three guiding principles. First, data are not merely stored as files but are ingested into a structured, normalized relational database that preserves the full semantic context of each measurement — ensuring that data remain interpretable independently of any external documentation. Second, preservation is an active process: PANGAEA monitors the long-term usability of archived formats and takes preventive action against obsolescence, including the creation of format-migrated copies when required. Third, institutional commitment is formally secured: the AWI/MARUM cooperation agreement (AMAR) guarantees that all archived data and metadata will remain accessible for a minimum of ten years following any formal decommissioning of PANGAEA, and that the host institutions will maintain the necessary infrastructure and expertise to honor this commitment.

PANGAEA's ingest and archiving workflow is compliant with the Open Archival Information System (OAIS) standard (ISO 14721).
== Metadata Preservation ==
PANGAEA treats metadata as essential for the long-term reusability of data. Metadata are stored in a highly normalized PostgreSQL relational database, whose schema is modeled to be compatible with international standards including schema.org and ISO 19115. This normalized structure allows dataset representations to be compiled dynamically and serialized into a wide range of output formats on demand, without modifying the underlying archived records.

The following metadata categories are collected and preserved for every published dataset:

'''Citation metadata:''' author and contributor names with ORCID iDs; institutional affiliations with ROR identifiers; dataset title; publication year; publisher; DOI name; resource type according to Stall et al. (2023).

'''Funding information:''' project names, grant numbers, and funder identifiers (Crossref Funder IDs or ROR identifiers).

'''Event information:''' detailed spatial and temporal coverage of sampling or measurement events, including methods, devices, and campaign context.

'''Related documentation:''' links (using DOIs or other persistent identifiers) to related scientific articles, reports, and supplementary materials. Where related documentation is not held in an external repository with a persistent identifier, PANGAEA stores a local copy in PDF/A format. PDF/A is preferred for its long-term stability; copies will be migrated to successor standards if PDF/A itself becomes obsolete.

PANGAEA's database schema is continuously adapted to accommodate new and evolving metadata standards. When schema extensions are introduced, the metadata of existing datasets are reviewed and updated accordingly. All such changes are managed carefully to avoid incompatible modifications to existing records.
== Data Object Preservation ==

=== Tabular Data ===
Except for binary objects, all submitted data values are imported into the PANGAEA relational database as structured data series. Each data entry carries metadata about its type (numeric, date/time, string), the responsible scientist (PI), the methodology applied, and, for numerical values, format information including significant digits. This structured representation decouples the data from any particular file format, ensuring long-term interpretability regardless of changes in software environments.

At the time of archival, a copy of each dataset (with checksum and timestamp) is additionally marshaled to disk as a tab-delimited text file. These copies serve as reference files for integrity verification, and the tab-delimited format ensures readability without specialized software.
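
Integrity verification against such a reference copy can be illustrated with a short Python sketch. The source does not state which checksum algorithm PANGAEA uses; SHA-256 is assumed here purely for illustration, and the function names are hypothetical.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Compute the hex digest of a file, reading it in chunks."""
    digest = hashlib.new(algorithm)  # algorithm choice is an assumption
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copy(path, expected_hex, algorithm="sha256"):
    """Return True if the file still matches the checksum recorded at archival."""
    return file_checksum(path, algorithm) == expected_hex.lower()
```

A mismatch indicates that the stored copy has changed since the checksum was recorded and should be restored from a redundant copy.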

=== Binary Data ===
Not all data held in PANGAEA is available in tabular form. Some datasets are archived in compact, community-specific binary formats — including NetCDF files, images, video recordings, and geophysical data products. For these, long-term usability is an active responsibility: the PANGAEA team monitors software dependencies, version changes, and backward compatibility issues for all archived binary formats. Where continued readability requires it, new format-migrated copies are created and archived; the original submitted file is always retained alongside any migrated copy.

PANGAEA applies format rules before accepting binary data for archival. Where possible, uncompressed or widely supported open formats are preferred. Currently accepted formats are:

* '''Images:''' JPEG, PNG, TIFF
* '''Documents:''' PDF/A (preferred), ODF, OOXML
* '''Media containers:''' MP4, MPG, OGG, Matroska; audio and video content within containers must comply with the following codecs:
** ''Video:'' uncompressed, MPEG-1, MPEG-4 Part 2, AVC/H.264, H.265
** ''Audio:'' uncompressed, MPEG Layer III (MP3), MPEG-4 Part 3/AAC
* '''Scientific data:''' NetCDF, preferably using the Climate and Forecast (CF) Metadata Conventions; detailed documentation is required in all other cases

This list is not exhaustive and is updated as community standards evolve. If any accepted format becomes deprecated or is superseded, PANGAEA will migrate archived copies to modern equivalents while retaining the originals.

Raw data — defined as level-0 data without any accompanying metadata — are not accepted for archival. Data at processing level 1 (raw data with a minimum set of metadata) may be accepted if adequate contextual information is provided. No guarantees are given for the long-term usability of level-0 or level-1 datasets. Processing levels are documented here: [[Processing levels]].
== Storage and Physical Infrastructure ==
PANGAEA's storage infrastructure is operated by the computing center of the Alfred Wegener Institute (AWI) in Bremerhaven, in accordance with the AMAR cooperation agreement. A comprehensive set of technical and organizational measures (TOM) is in place:

'''Redundant storage:''' all data are stored using erasure coding across disk and tape, with write caches battery-backed to ensure integrity at the point of write. Data on disk is replicated to tape nightly and saved to snapshots retained for six months. Tape copies are replicated to a physically separate building within two hours of creation; decommissioned tapes are retained for one year before reuse. Virtual machine working data is captured in nightly machine snapshots.

'''Tape archive:''' the central archival storage system consists of two SpectraLogic TFinity ExaScale robotic tape libraries, housed in separate buildings at AWI, with a combined capacity of up to 60 PB and using high-capacity LTO-tape drives.

'''Database integrity:''' PostgreSQL streaming replication to a dedicated backup system enables point-in-time recovery to any moment prior to a failure event.

'''Facility measures:''' fire and smoke detection systems; server room monitoring of temperature and humidity; server room air-conditioning; uninterruptible power supplies (UPS) capable of sustaining all PANGAEA-relevant hardware for up to 60 minutes, backed by a diesel-powered emergency generator providing a further 23 hours of operation; RAID and hard disk mirroring in the virtualization environment; user permission management; network firewall and intrusion detection systems; anti-virus email filtering.

'''Documentation:''' all systems are documented in an internal Confluence Wiki kept operationally isolated from the main PANGAEA infrastructure for disaster-safety purposes. A ticket system is used to track and manage incidents.

Hardware is typically renewed every three to four years through the AWI computing center's lifecycle management program, implemented transparently via virtualization.
== Off-Site Replica at MARUM ==
Since 2025, PANGAEA has operated an off-site replica of the relational database and web frontend at MARUM/University of Bremen, hosted in the Green IT Housing Center (Rechenzentrum) of Bremen University. This facility provides geographic and institutional separation from the primary AWI infrastructure and is a key component of PANGAEA's resilience strategy against both technical failure and cyberattack scenarios.

The off-site installation currently covers:

* All dataset metadata, including individual DOI landing pages and all metadata serializations available via harvesting endpoints (OAI-PMH, schema.org, DataCite, Dublin Core, DIF, ISO 19139)
* Full representations of tabular data publications

Extension of the replica to include binary data files is planned as part of the ongoing development of this facility. The replica enables recovery of data delivery services within 24 hours following a catastrophic failure at AWI.

The MARUM Green IT Housing Center maintains the following physical and technical safeguards: two separate fire sections with servers distributed across both for site redundancy; 24/7 on-site monitoring by Bremen University staff; automated fire alarms with a fire brigade station less than 1 km away; redundant power supply with battery backup; and multilevel physical access control. The off-site installation operates in a fully isolated network environment (Layer 2 separation), accessible only via a firewalled VPN gateway at AWI. The number of access tokens and keys is strictly limited to the corresponding gateway host and PANGAEA DevOps staff. Replication is unidirectional from AWI to MARUM, using snapshot-based transfers executed multiple times per day.
== Versioning and Persistent Identifiers ==
Every published dataset is assigned a universally unique Digital Object Identifier (DOI) minted at DataCite. DOI resolution is actively maintained: PANGAEA keeps its authoritative metadata records at DataCite synchronized with dataset landing pages, and all external links in metadata records are checked automatically on a weekly basis for broken (HTTP 404) or permanently redirected (HTTP 301) responses.
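
The kind of check described above can be sketched in Python as follows. This is not PANGAEA's actual link checker; it is a minimal illustration that sends a HEAD request without following redirects and classifies the two status codes named above. The function names and category labels are invented for this example.

```python
import http.client
from urllib.parse import urlsplit


def head_status(url, timeout=10):
    """Return the HTTP status of a HEAD request without following redirects."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        return conn.getresponse().status
    finally:
        conn.close()


def classify_link_status(status):
    """Map a status code to a review category (labels are illustrative)."""
    if status == 404:
        return "broken"
    if status == 301:
        return "moved"
    if 200 <= status < 300:
        return "ok"
    return "other"
```

Links classified as <code>broken</code> or <code>moved</code> would then be flagged for correction in the metadata record.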

New DOI names are created under clearly defined conditions:

* A new identifier is issued upon the initial publication of each dataset.
* A new identifier is issued when a published dataset undergoes a substantive revision of data or metadata that would affect reproducibility or scientific interpretation. The prior version remains accessible and is cross-referenced in the metadata record of the new version.
* Minor editorial corrections that do not affect scientific content are applied without creating a new identifier. Instead, the corrections are transparently documented in the “Change history” section of the dataset landing pages, including the date and a short summary of the changes applied.

All versions are linked in the metadata record, ensuring full traceability of the publication history for data users and citing authors.
== Deletion and Tombstone Records ==
PANGAEA does not routinely delete or remove published datasets. In exceptional cases where retraction is required — for example, due to demonstrated scientific error, misconduct, copyright infringement, or data privacy obligations under applicable law (e.g., GDPR Art. 17) — the following procedure applies: the data itself is made inaccessible, but the DOI and the dataset landing page are retained as a tombstone record. The tombstone record clearly indicates the dataset's status and the documented reason for its withdrawal, in accordance with DataCite's tombstone policy. All such actions are logged in the editorial system's change history.
== Custody Transfer and Decommissioning ==
Should PANGAEA cease operations, the host institutions guarantee that all data and metadata will remain accessible for a minimum of ten years following any formal decommissioning. In such a scenario, only the submission and editorial system would be terminated; the database and data delivery services would remain operational. A full transition of custody to another repository could be supported by the off-site replica at MARUM as a concrete technical starting point. As a further fallback, PANGAEA can be reduced to a file-based repository: a complete file-based copy of all datasets, including binary objects, can be assembled and made available independently by AWI and/or the University of Bremen. The legal and institutional basis for these guarantees is the AWI/MARUM cooperation agreement (AMAR); a summary of its key commitments is available in the [[Continuity Plan]].
== Community-Specific Preservation Documentation ==
In coordination with scientific communities, PANGAEA has developed detailed documentation on the harmonization and preservation of specific data types. These include guidance on CTD data, Thermosalinograph (TSG) underway data, and bathymetric data, among others. Where applicable, these documents contain information on format choices and long-term preservation handling specific to the relevant data type. See [[Best practice manuals and templates]] for the full collection.
== References ==

* Felden, J., Möller, L., Schindler, U., Huber, R., Schumacher, S., Koppe, R., Diepenbroek, M. & Glöckner, F.O. (2023). PANGAEA — Data Publisher for Earth & Environmental Science. ''Scientific Data'', 10, 347. https://doi.org/10.1038/s41597-023-02269-x
* Stall, S., Bilder, G., Cannon, M. ''et al.'' (2023). Journal Production Guidance for Software and Data Citations. ''Scientific Data'', 10, 656. https://doi.org/10.1038/s41597-023-02491-7
* Consultative Committee for Space Data Systems (2012). Reference Model for an Open Archival Information System (OAIS). Recommended Practice CCSDS 650.0-M-2. https://public.ccsds.org/Pubs/650x0m2.pdf

== See Also ==

* [[Technology]]
* [[Continuity Plan]] — https://www.pangaea.de/about/continuity.php
* [[Authors Guides]]
* [[Format]]
* [[Processing levels]]
* [[Curation levels]]
* [[Best practice manuals and templates]]
* [[PANGAEA XML schema]]

Preservation Plan

2026-03-31T14:02:35Z

Uschindler: /* References */ add links

PANGAEA is committed to the long-term preservation of all data and metadata entrusted to it by the research community. This commitment extends beyond bit-level storage integrity to encompass active management of format usability, semantic consistency over time, and formally documented procedures for all stages of the archival lifecycle. The following article describes the technical and organizational measures that together constitute PANGAEA's preservation strategy. It supplements the information provided in Felden et al. (2023) and is one of the reference documents for PANGAEA's CoreTrustSeal certification.
== Principles ==
PANGAEA's preservation approach is grounded in three guiding principles. First, data are not merely stored as files but are ingested into a structured, normalized relational database that preserves the full semantic context of each measurement — ensuring that data remain interpretable independently of any external documentation. Second, preservation is an active process: PANGAEA monitors the long-term usability of archived formats and takes preventive action against obsolescence, including the creation of format-migrated copies when required. Third, institutional commitment is formally secured: the AWI/MARUM cooperation agreement (AMAR) guarantees that all archived data and metadata will remain accessible for a minimum of ten years following any formal decommissioning of PANGAEA, and that the host institutions will maintain the necessary infrastructure and expertise to honor this commitment.

PANGAEA's ingest and archiving workflow is compliant with the Open Archival Information System (OAIS) standard (ISO 14721).
== Metadata Preservation ==
PANGAEA treats metadata as essential for the long-term reusability of data. Metadata are stored in a highly normalized PostgreSQL relational database, whose schema is modeled to be compatible with international standards including schema.org and ISO 19115. This normalized structure allows dataset representations to be compiled dynamically and serialized into a wide range of output formats on demand, without modifying the underlying archived records.

The following metadata categories are collected and preserved for every published dataset:

'''Citation metadata:''' author and contributor names with ORCID iDs; institutional affiliations with ROR identifiers; dataset title; publication year; publisher; DOI name; resource type according to Stall et al. (2023).

'''Funding information:''' project names, grant numbers, and funder identifiers (Crossref Funder IDs).

'''Event information:''' detailed spatial and temporal coverage of sampling or measurement events, including methods, devices, and campaign context.

'''Related documentation:''' links (using DOIs or other persistent identifiers) to related scientific articles, reports, and supplementary materials. Where related documentation is not held in an external repository with a persistent identifier, PANGAEA stores a local copy in PDF/A format. PDF/A is preferred for its long-term stability; copies will be migrated to successor standards if PDF/A itself becomes obsolete.

PANGAEA's database schema is continuously adapted to accommodate new and evolving metadata standards. When schema extensions are introduced, the metadata of existing datasets are reviewed and updated accordingly. All such changes are managed carefully to avoid incompatible modifications to existing records.
== Data Object Preservation ==

=== Tabular Data ===
Except for binary objects, all submitted data values are imported into the PANGAEA relational database as structured data series. Each data entry carries metadata about its type (numeric, date/time, string), the responsible scientist (PI), the methodology applied, and, for numerical values, format information including significant digits. This structured representation decouples the data from any particular file format, ensuring long-term interpretability regardless of changes in software environments.

At the time of archival, a copy of each dataset (with checksum and timestamp) is additionally marshaled to disk as a tab-delimited text file. These copies serve as reference files for integrity verification, and the tab-delimited format ensures readability without specialized software.

=== Binary Data ===
Not all data held in PANGAEA is available in tabular form. Some datasets are archived in compact, community-specific binary formats — including NetCDF files, images, video recordings, and geophysical data products. For these, long-term usability is an active responsibility: the PANGAEA team monitors software dependencies, version changes, and backward compatibility issues for all archived binary formats. Where continued readability requires it, new format-migrated copies are created and archived; the original submitted file is always retained alongside any migrated copy.

PANGAEA applies format rules before accepting binary data for archival. Where possible, uncompressed or widely supported open formats are preferred. Currently accepted formats are:

* '''Images:''' JPEG, PNG, TIFF
* '''Documents:''' PDF/A (preferred), ODF, OOXML
* '''Media containers:''' MP4, MPG, OGG, Matroska; audio and video content within containers must comply with the following codecs:
** ''Video:'' uncompressed, MPEG-1, MPEG-4 Part 2, AVC/H.264, H.265
** ''Audio:'' uncompressed, MPEG Layer III (MP3), MPEG-4 Part 3/AAC
* '''Scientific data:''' NetCDF, preferably using the Climate and Forecast (CF) Metadata Conventions; detailed documentation is required in all other cases

This list is not exhaustive and is updated as community standards evolve. If any accepted format becomes deprecated or is superseded, PANGAEA will migrate archived copies to modern equivalents while retaining the originals.

Raw data — defined as level-0 data without any accompanying metadata — are not accepted for archival. Data at processing level 1 (raw data with a minimum set of metadata) may be accepted if adequate contextual information is provided. No guarantees are given for the long-term usability of level-0 or level-1 datasets. Processing levels are documented here: [[Processing levels]].
== Storage and Physical Infrastructure ==
PANGAEA's storage infrastructure is operated by the computing center of the Alfred Wegener Institute (AWI) in Bremerhaven, in accordance with the AMAR cooperation agreement. A comprehensive set of technical and organizational measures (TOM) is in place:

'''Redundant storage:''' all data are stored using erasure coding across disk and tape, with write caches battery-backed to ensure integrity at the point of write. Data on disk is replicated to tape nightly and saved to snapshots retained for six months. Tape copies are replicated to a physically separate building within two hours of creation; decommissioned tapes are retained for one year before reuse. Virtual machine working data is captured in nightly machine snapshots.

'''Tape archive:''' the central archival storage system consists of two SpectraLogic TFinity ExaScale robotic tape libraries, housed in separate buildings at AWI, with a combined capacity of up to 60 PB and using high-capacity LTO-tape drives.

'''Database integrity:''' PostgreSQL streaming replication to a dedicated backup system enables point-in-time recovery to any moment prior to a failure event.

'''Facility measures:''' fire and smoke detection systems; server room monitoring of temperature and humidity; server room air-conditioning; uninterruptible power supplies (UPS) capable of sustaining all PANGAEA-relevant hardware for up to 60 minutes, backed by a diesel-powered emergency generator providing a further 23 hours of operation; RAID and hard disk mirroring in the virtualization environment; user permission management; network firewall and intrusion detection systems; anti-virus email filtering.

'''Documentation:''' all systems are documented in an internal Confluence Wiki kept operationally isolated from the main PANGAEA infrastructure for disaster-safety purposes. A ticket system is used to track and manage incidents.

Hardware is typically renewed every three to four years through the AWI computing center's lifecycle management program, implemented transparently via virtualization.
== Off-Site Replica at MARUM ==
Since 2025, PANGAEA has operated an off-site replica of the relational database and web frontend at MARUM/University of Bremen, hosted in the Green IT Housing Center (Rechenzentrum) of the University of Bremen. This facility provides geographic and institutional separation from the primary AWI infrastructure and is a key component of PANGAEA's resilience strategy against both technical failure and cyberattack scenarios.

The off-site installation currently covers:

* All dataset metadata, including individual DOI landing pages and all metadata serializations available via harvesting endpoints (OAI-PMH, schema.org, DataCite, Dublin Core, DIF, ISO 19139)
* Full representations of tabular data publications

Extension of the replica to include binary data files is planned as part of the ongoing development of this facility. The replica enables recovery of data delivery services within 24 hours following a catastrophic failure at AWI.

The MARUM Green IT Housing Center maintains the following physical and technical safeguards: two separate fire sections with servers distributed across both for site redundancy; 24/7 on-site monitoring by Bremen University staff; automated fire alarms with a fire brigade station less than 1 km away; redundant power supply with battery backup; and multilevel physical access control. The off-site installation operates in a fully isolated network environment (Layer 2 separation), accessible only via a firewalled VPN gateway at AWI. The number of access tokens and keys is strictly limited to the corresponding gateway host and PANGAEA DevOps staff. Replication is unidirectional from AWI to MARUM, using snapshot-based transfers executed multiple times per day.
== Versioning and Persistent Identifiers ==
Every published dataset is assigned a universally unique Digital Object Identifier (DOI) minted at DataCite. DOI resolution is actively maintained: PANGAEA keeps its authoritative metadata records at DataCite synchronized with dataset landing pages, and all external links in metadata records are checked automatically on a weekly basis for broken (HTTP 404) or permanently redirected (HTTP 301) responses.
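As a rough illustration of the weekly link check, the following sketch classifies an HTTP status code into the two flagged cases named above. The function name <code>classify_link</code> and the labels are hypothetical; how PANGAEA actually fetches links and treats other status codes is not described here.

```python
def classify_link(status_code: int) -> str:
    """Map an HTTP status code to the outcome of a link check."""
    if status_code == 404:
        return "broken"
    if status_code == 301:
        return "permanently redirected"
    # Other codes are not mentioned in the text; this sketch leaves them unflagged.
    return "not flagged"


for code in (200, 301, 404):
    print(code, classify_link(code))
```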

New DOI names are created under clearly defined conditions:

* A new identifier is issued upon the initial publication of each dataset.
* A new identifier is issued when a published dataset undergoes a substantive revision of data or metadata that would affect reproducibility or scientific interpretation. The prior version remains accessible and is cross-referenced in the metadata record of the new version.
* Minor editorial corrections that do not affect scientific content are applied without creating a new identifier. Instead, the corrections are transparently documented in the “Change history” section of the dataset landing pages, including the date and a short summary of the changes applied.

All versions are linked in the metadata record, ensuring full traceability of the publication history for data users and citing authors.
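The three conditions above amount to a simple decision rule, sketched here with hypothetical names (<code>Change</code>, <code>needs_new_doi</code>). Deciding whether a revision is substantive remains an editorial judgment that this sketch does not model.

```python
from enum import Enum, auto


class Change(Enum):
    INITIAL_PUBLICATION = auto()
    SUBSTANTIVE_REVISION = auto()   # affects reproducibility or interpretation
    MINOR_CORRECTION = auto()       # editorial only, logged in the change history


def needs_new_doi(change: Change) -> bool:
    """Return True when the change leads to a new identifier."""
    return change in {Change.INITIAL_PUBLICATION, Change.SUBSTANTIVE_REVISION}


for change in Change:
    print(change.name, needs_new_doi(change))
```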
== Deletion and Tombstone Records ==
PANGAEA does not routinely delete or remove published datasets. In exceptional cases where retraction is required — for example, due to demonstrated scientific error, misconduct, copyright infringement, or data privacy obligations under applicable law (e.g., GDPR Art. 17) — the following procedure applies: the data itself is made inaccessible, but the DOI and the dataset landing page are retained as a tombstone record. The tombstone record clearly indicates the dataset's status and the documented reason for its withdrawal, in accordance with DataCite's tombstone policy. All such actions are logged in the editorial system's change history.
== Custody Transfer and Decommissioning ==
Should PANGAEA cease operations, the host institutions guarantee that all data and metadata will remain accessible for a minimum of ten years following any formal decommissioning. In such a scenario, only the submission and editorial system would be terminated; the database and data delivery services would remain operational. A full transition of custody to another repository could be supported by the off-site replica at MARUM as a concrete technical starting point. As a further fallback, PANGAEA can be reduced to a file-based repository: a complete file-based copy of all datasets, including binary objects, can be assembled and made available independently by AWI and/or the University of Bremen. The legal and institutional basis for these guarantees is the AWI/MARUM cooperation agreement (AMAR); a summary of its key commitments is available in the [[Continuity Plan]].
== Community-Specific Preservation Documentation ==
In coordination with scientific communities, PANGAEA has developed detailed documentation on the harmonization and preservation of specific data types. These include guidance on CTD data, Thermosalinograph (TSG) underway data, and bathymetric data, among others. Where applicable, these documents contain information on format choices and long-term preservation handling specific to the relevant data type. See [[Best practice manuals and templates]] for the full collection.
== References ==

* Felden, J., Möller, L., Schindler, U., Huber, R., Schumacher, S., Koppe, R., Diepenbroek, M. & Glöckner, F.O. (2023). PANGAEA — Data Publisher for Earth & Environmental Science. ''Scientific Data'', 10, 347. https://doi.org/10.1038/s41597-023-02269-x
* Stall, S., Bilder, G., Cannon, M. ''et al.'' (2023). Journal Production Guidance for Software and Data Citations. ''Scientific Data'', 10, 656. https://doi.org/10.1038/s41597-023-02491-7
* Consultative Committee for Space Data Systems (2012). Reference Model for an Open Archival Information System (OAIS). Recommended Practice CCSDS 650.0-M-2. https://public.ccsds.org/Pubs/650x0m2.pdf

== See Also ==

* [[Technology]]
* [[Continuity Plan]] — https://www.pangaea.de/about/continuity.php
* [[Authors Guides]]
* [[Format]]
* [[Processing levels]]
* [[Curation levels]]
* [[Best practice manuals and templates]]
* [[PANGAEA XML schema]]

Data Rescue

2025-05-21T15:35:45Z

Uschindler: formatting

== PANGAEA Data Rescue Initiative 2025 ==
Starting in 2025, the United States is facing unprecedented budget cuts to federal science agencies such as the National Science Foundation (NSF), the National Oceanic and Atmospheric Administration (NOAA), and the National Aeronautics and Space Administration (NASA). These cuts specifically target climate research, environmental monitoring, and public health data programs, with plans to significantly reduce funding for data services and websites.

In response, members of the scientific community—both within and outside the U.S.—approached '''PANGAEA''' to help preserve critical data products that were at immediate risk of being decommissioned. This includes the potential loss of data availability and the shutdown of data portals, which would make it much harder to locate and access existing datasets.

Consequently, PANGAEA has started data rescue efforts in agreement with the respective data providers by following the FAIR data principles. The following two approaches have been applied:

=== 1. Stabilising Links to NOAA Scientific Documents and Images: ===
[https://www.pangaea.de/?q=%40USDataRescue2025 For existing PANGAEA datasets that reference NOAA-hosted documents or images], we downloaded the documents/images and replaced unstable direct NOAA links with permanent links provided through PANGAEA’s infrastructure. This ensures continued accessibility even if NOAA pages are taken offline.

=== 2. Permanent Archiving of NOAA Data Products: ===
We have directly archived various NOAA datasets that were scheduled to be decommissioned, ensuring they remain publicly accessible and [https://www.pangaea.de/?q=project%3AUSDataRescue2025 easy to find].

'''This initiative is ongoing. PANGAEA welcomes contributions from scientists who hold Earth and Environmental Science datasets at risk of going offline. If you have such data or are aware of endangered data, please [https://www.pangaea.de/contact/ contact us].'''

Data Rescue

2025-05-21T15:34:20Z

Uschindler: update formatting

== PANGAEA Data Rescue Initiative 2025 ==
Starting in 2025, the United States is facing unprecedented budget cuts to federal science agencies such as the National Science Foundation (NSF), the National Oceanic and Atmospheric Administration (NOAA), and the National Aeronautics and Space Administration (NASA). These cuts specifically target climate research, environmental monitoring, and public health data programs, with plans to significantly reduce funding for data services and websites.

In response, members of the scientific community—both within and outside the U.S.—approached '''PANGAEA''' to help preserve critical data products that were at immediate risk of being decommissioned. This includes the potential loss of data availability and the shutdown of data portals, which would make it much harder to locate and access existing datasets.

Consequently, PANGAEA has started data rescue efforts in agreement with the respective data providers by following the FAIR data principles. The following two approaches have been applied:

=== 1. Stabilising Links to NOAA Scientific Documents and Images: ===
[https://www.pangaea.de/?q=%40USDataRescue2025 For existing PANGAEA datasets that reference NOAA-hosted documents or images], we downloaded the documents/images and replaced unstable direct NOAA links with permanent links provided through PANGAEA’s infrastructure. This ensures continued accessibility even if NOAA pages are taken offline.

=== 2. Permanent Archiving of NOAA Data Products: ===
We have directly archived various NOAA datasets that were scheduled to be decommissioned, ensuring they remain publicly accessible and [https://www.pangaea.de/?q=project%3AUSDataRescue2025 easy to find].

This initiative is ongoing. PANGAEA welcomes contributions from scientists who hold Earth and Environmental Science datasets at risk of going offline. If you have such data or are aware of endangered data, please [https://www.pangaea.de/contact/ contact us].

Data Rescue

2025-05-21T10:12:21Z

Uschindler: missing dot

== PANGAEA Data Rescue Initiative 2025 ==
Starting in 2025, the United States is facing unprecedented budget cuts to federal science agencies such as the National Science Foundation (NSF), the National Oceanic and Atmospheric Administration (NOAA), and the National Aeronautics and Space Administration (NASA). These cuts specifically target climate research, environmental monitoring, and public health data programs, with plans to significantly reduce funding for data services and websites.

In response, members of the scientific community—both within and outside the U.S.—approached '''PANGAEA''' to help preserve critical data products that were at immediate risk of being decommissioned. This includes the potential loss of data availability and the shutdown of data portals, which would make it much harder to locate and access existing datasets.

Consequently, PANGAEA has started data rescue efforts in agreement with the respective data providers by following the FAIR data principles. The following two approaches have been applied:

'''1. Stabilising Links to NOAA Scientific Documents:'''

* [https://www.pangaea.de/?q=%40USDataRescue2025 For existing PANGAEA datasets that reference NOAA-hosted documents], we downloaded the documents and replaced unstable direct NOAA links with permanent links provided through PANGAEA’s infrastructure. This ensures continued accessibility even if NOAA pages are taken offline.

'''2.''' '''Permanent Archiving of NOAA Data Products:'''

* We have directly archived various NOAA datasets that were scheduled to be decommissioned, ensuring they remain publicly accessible and [https://www.pangaea.de/?q=project%3AUSDataRescue2025 easy to find].

This initiative is ongoing. PANGAEA welcomes contributions from scientists who hold Earth and Environmental Science datasets at risk of going offline. If you have such data or are aware of endangered data, please [https://www.pangaea.de/contact/ contact us].

Data Rescue

2025-05-21T10:09:44Z

Uschindler: formatting

== PANGAEA Data Rescue Initiative 2025 ==
Starting in 2025, the United States is facing unprecedented budget cuts to federal science agencies such as the National Science Foundation (NSF), the National Oceanic and Atmospheric Administration (NOAA), and the National Aeronautics and Space Administration (NASA). These cuts specifically target climate research, environmental monitoring, and public health data programs, with plans to significantly reduce funding for data services and websites.

In response, members of the scientific community—both within and outside the U.S.—approached '''PANGAEA''' to help preserve critical data products that were at immediate risk of being decommissioned. This includes the potential loss of data availability and the shutdown of data portals, which would make it much harder to locate and access existing datasets.

Consequently, PANGAEA has started data rescue efforts in agreement with the respective data providers by following the FAIR data principles. The following two approaches have been applied:

'''1. Stabilising Links to NOAA Scientific Documents:'''

* [https://www.pangaea.de/?q=%40USDataRescue2025 For existing PANGAEA datasets that reference NOAA-hosted documents], we downloaded the documents and replaced unstable direct NOAA links with permanent links provided through PANGAEA’s infrastructure. This ensures continued accessibility even if NOAA pages are taken offline.

'''2.''' '''Permanent Archiving of NOAA Data Products:'''

* We have directly archived various NOAA datasets that were scheduled to be decommissioned, ensuring they remain publicly accessible and [https://www.pangaea.de/?q=project%3AUSDataRescue2025 easy to find].

This initiative is ongoing. PANGAEA welcomes contributions from scientists who hold Earth and Environmental Science datasets at risk of going offline. If you have such data or are aware of endangered data, please [https://www.pangaea.de/contact/ contact us].

Data Rescue

2025-05-21T10:09:12Z

Uschindler: Remove image

== PANGAEA Data Rescue Initiative 2025 ==
Starting in 2025, the United States is facing unprecedented budget cuts to federal science agencies such as the National Science Foundation (NSF), the National Oceanic and Atmospheric Administration (NOAA), and the National Aeronautics and Space Administration (NASA). These cuts specifically target climate research, environmental monitoring, and public health data programs, with plans to significantly reduce funding for data services and websites.

In response, members of the scientific community—both within and outside the U.S.—approached '''PANGAEA''' to help preserve critical data products that were at immediate risk of being decommissioned. This includes the potential loss of data availability and the shutdown of data portals, which would make it much harder to locate and access existing datasets.

Consequently, PANGAEA has started data rescue efforts in agreement with the respective data providers by following the FAIR data principles. The following two approaches have been applied:

'''1. Stabilising Links to NOAA Scientific Documents:'''

* [https://www.pangaea.de/?q=%40USDataRescue2025 For existing PANGAEA datasets that reference NOAA-hosted documents], we downloaded the documents and replaced unstable direct NOAA links with permanent links provided through PANGAEA’s infrastructure. This ensures continued accessibility even if NOAA pages are taken offline.

'''2.''' '''Permanent Archiving of NOAA Data Products:'''

* We have directly archived various NOAA datasets that were scheduled to be decommissioned, ensuring they remain publicly accessible and [https://www.pangaea.de/?q=project%3AUSDataRescue2025 easy to find].

This initiative is ongoing. PANGAEA welcomes contributions from scientists who hold Earth and Environmental Science datasets at risk of going offline. If you have such data or are aware of endangered data, please [https://www.pangaea.de/contact/ contact us].

Data Rescue

2025-05-21T10:08:50Z

Uschindler: /* PANGAEA Data Rescue Initiative 2025 */ Update links

== PANGAEA Data Rescue Initiative 2025 ==
[[File:Z2.jpg|thumb|''Image (without logo) created with FLUX.1 on fal.ai'']]
Starting in 2025, the United States is facing unprecedented budget cuts to federal science agencies such as the National Science Foundation (NSF), the National Oceanic and Atmospheric Administration (NOAA), and the National Aeronautics and Space Administration (NASA). These cuts specifically target climate research, environmental monitoring, and public health data programs, with plans to significantly reduce funding for data services and websites.

In response, members of the scientific community—both within and outside the U.S.—approached '''PANGAEA''' to help preserve critical data products that were at immediate risk of being decommissioned. This includes the potential loss of data availability and the shutdown of data portals, which would make it much harder to locate and access existing datasets.

Consequently, PANGAEA has started data rescue efforts in agreement with the respective data providers by following the FAIR data principles. The following two approaches have been applied:

'''1. Stabilising Links to NOAA Scientific Documents:'''

* [https://www.pangaea.de/?q=%40USDataRescue2025 For existing PANGAEA datasets that reference NOAA-hosted documents], we downloaded the documents and replaced unstable direct NOAA links with permanent links provided through PANGAEA’s infrastructure. This ensures continued accessibility even if NOAA pages are taken offline.

'''2.''' '''Permanent Archiving of NOAA Data Products:'''

* We have directly archived various NOAA datasets that were scheduled to be decommissioned, ensuring they remain publicly accessible and [https://www.pangaea.de/?q=project%3AUSDataRescue2025 easy to find].

This initiative is ongoing. PANGAEA welcomes contributions from scientists who hold Earth and Environmental Science datasets at risk of going offline. If you have such data or are aware of endangered data, please [https://www.pangaea.de/contact/ contact us].

Coverage

2025-05-15T09:53:07Z

Uschindler: /* What is shown in the map? */ fixup

The coverage describes the spatial and/or temporal distribution of the data set. It is calculated automatically using the [[geocode|geocodes]] from the data matrix or the [[event]] information.

Because coverage is a mandatory discovery property of metadata standards like ISO 19115/19139, Schema.org for Datasets, or DataCite, PANGAEA displays the calculated coverage on the landing page of datasets to make it clear that this information is part of the distributed metadata. Third-party systems harvesting PANGAEA metadata may use this metadata for discovery and may display the coverage information on their own landing pages.

The purpose of this document is to describe the details of the coverage calculation depending on the use case, especially when relevant information is provided with the data matrix and [[event]].

=== Calculation of the spatial coverage or geolocation ===

The algorithm will only calculate the geolocation from either the coordinates provided in the data matrix or the [[event]], never from (a mixture of) both. The reason for this is that there is no clear and universal way to map multiple instances of coordinates to the required [[geocode|GEOCODES]] latitude and longitude. The same is true for the 3rd [[geocode|GEOCODE]], elevation. Multiple columns reflecting height or depth information, e.g. “depth, sediment” & “depth, mbsf.” or elevation information in the [[event]] and data matrix cannot be mapped to a single vertical [[geocode|GEOCODE]].

Geolocation information is often more accurate in data matrices and therefore takes priority: if coordinates are present in both the data matrix and the [[event]], the geolocation is calculated from the coordinates in the data matrix.
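A minimal sketch of this priority rule, assuming coordinates are available as simple (latitude, longitude) pairs. The names <code>select_source</code> and <code>bounds</code> are hypothetical, and date-line crossing is ignored for brevity.

```python
from typing import List, Optional, Tuple

Coord = Tuple[float, float]  # (latitude, longitude)


def select_source(matrix: List[Coord], event: List[Coord]) -> List[Coord]:
    """Use data matrix coordinates when present, otherwise the event's."""
    return matrix if matrix else event


def bounds(coords: List[Coord]) -> Optional[dict]:
    """North/south/west/east limits of the chosen source (no date-line handling)."""
    if not coords:
        return None
    lats = [lat for lat, _ in coords]
    lons = [lon for _, lon in coords]
    return {"north": max(lats), "south": min(lats),
            "west": min(lons), "east": max(lons)}


matrix_coords = [(54.1, 7.9), (54.3, 8.2)]
event_coords = [(54.0, 7.5), (54.5, 8.5)]
print(bounds(select_source(matrix_coords, event_coords)))  # matrix wins
print(bounds(select_source([], event_coords)))             # falls back to event
```

Note that the two sources are never merged: the event coordinates are used only when the matrix provides none.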

However, it’s possible that the geolocation in data matrices sometimes has gaps because, for example, a certain method was not applied at a certain position. If these gaps were on the “boundary” of the area of interest, the resulting collection of positions wouldn’t correctly reflect the sampled area that may (or may not) be specified in the [[event]] information.

For the bounding box given in the web interface, we only provide the boundaries of the latitude (north, south) and longitude (west, east) values. Internally, however, we store the exact locations of each of the four bounding box corners. Please note: a dataset that crosses the date line will have a west-bound longitude that is larger than the east-bound one. PANGAEA therefore avoids speaking of min/max values.
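The following sketch shows why west/east bounds are more robust than min/max values once a box crosses the date line. The function names are hypothetical and longitudes are assumed to be in the range -180 to 180 degrees.

```python
def crosses_date_line(west: float, east: float) -> bool:
    """A box crosses the date line when its west bound exceeds its east bound."""
    return west > east


def lon_in_box(lon: float, west: float, east: float) -> bool:
    """Test whether a longitude lies between the west and east bounds."""
    if not crosses_date_line(west, east):
        return west <= lon <= east
    # The box wraps around +/-180 degrees: accept either side of the date line.
    return lon >= west or lon <= east


# Hypothetical box from 170°E eastwards to 170°W.
print(crosses_date_line(170.0, -170.0))
print(lon_in_box(179.0, 170.0, -170.0))
print(lon_in_box(0.0, 170.0, -170.0))
```

Treating the same bounds as min/max would describe the complementary band from -170 to 170 degrees, which is almost the entire globe.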

=== The calculation of mean (or median) geolocation values ===

The center of the sample area is determined by calculating the average of all individual sample coordinate pairs. Thus, the mean or median is not the arithmetic mean (or centroid) of the area covered by the constituents, but the arithmetic mean calculated from the entirety of the individual coordinate pairs provided. The mean was originally introduced to place the map bounding box where most of the data points are located. However, unlike the calculations for geolocation, no priority is given to information from data matrices over [[Event|events]]. As a result, if coordinates are provided in both, the values reported for the mean geolocation may be incorrect.

The value for median found on dataset landing pages does not refer to the strict mathematical term, but seemed to be the more appropriate term to describe the mean in the context of geo-referenced data.
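To illustrate the difference between the mean of the coordinate pairs and the center of the covered area, consider this sketch with made-up points. It uses a plain arithmetic mean and does not handle date-line crossing; the function names are hypothetical.

```python
from typing import List, Tuple

Coord = Tuple[float, float]  # (latitude, longitude)


def mean_position(coords: List[Coord]) -> Coord:
    """Arithmetic mean of all individual coordinate pairs."""
    n = len(coords)
    return (sum(lat for lat, _ in coords) / n,
            sum(lon for _, lon in coords) / n)


def box_center(coords: List[Coord]) -> Coord:
    """Center of the bounding box, shown only for comparison."""
    lats = [lat for lat, _ in coords]
    lons = [lon for _, lon in coords]
    return ((min(lats) + max(lats)) / 2, (min(lons) + max(lons)) / 2)


# Three clustered points and one distant outlier.
points = [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (50.0, 50.0)]
print(mean_position(points))  # (20.25, 20.25), pulled towards the cluster
print(box_center(points))     # (30.0, 30.0), the geometric middle of the box
```

The mean stays closer to where most of the data points are, which matches the stated purpose of placing the map where the bulk of the data lies.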

=== Temporal coverage - Date/Time values ===

Similar to the spatial coverage calculation, the algorithm will only use either the information provided in the data matrices or the [[event]], never both. Temporal coverage provided in data matrices takes precedence over [[Event|events]]: only if no temporal reference is provided with the matrices is the information from the [[event]] used and displayed instead.
In cases where the date line is crossed, a special calculation routine ensures that the correct temporal coverage is displayed.

=== Modifications to calculated values ===

PANGAEA editors cannot change the information displayed in the coverage, as those values are automatically calculated. There is no way to modify the values, unless the coordinates given for associated [[Event|events]] and/or [[geocode|geocodes]] in the data matrix are modified, which is not recommended as this may change the interpretation of data in the matrix.

The coverage values are not part of the curated metadata and may change over time, e.g., when PANGAEA's algorithms change or are optimized to improve the discoverability of datasets (e.g., how dataset [[PANGAEA publication types|collections]] are handled by the system). The coverage is shown on the dataset landing pages to give users a quick overview of the geolocation in addition to the displayed map, especially after a geospatial search.

=== Collection datasets ===
For [[Collection|dataset collections]], coordinates are not taken from events. The coverage of a dataset collection is calculated with the previously calculated coverage of the contained datasets. It is basically the same calculation as described before, but instead of events or data points it is using the boundaries and mean (or median) values of the contained datasets.
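The aggregation for collections can be sketched as below. The <code>Coverage</code> record and the unweighted averaging of member means are assumptions made for illustration; the exact weighting and any date-line handling used by PANGAEA are not specified here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Coverage:
    north: float
    south: float
    west: float
    east: float
    mean_lat: float
    mean_lon: float


def collection_coverage(members: List[Coverage]) -> Coverage:
    """Combine the precomputed coverages of the contained datasets."""
    n = len(members)
    return Coverage(
        north=max(m.north for m in members),
        south=min(m.south for m in members),
        west=min(m.west for m in members),
        east=max(m.east for m in members),
        # Assumption: each contained dataset contributes equally to the mean.
        mean_lat=sum(m.mean_lat for m in members) / n,
        mean_lon=sum(m.mean_lon for m in members) / n,
    )


a = Coverage(north=10.0, south=0.0, west=0.0, east=10.0, mean_lat=5.0, mean_lon=5.0)
b = Coverage(north=30.0, south=20.0, west=20.0, east=40.0, mean_lat=25.0, mean_lon=35.0)
print(collection_coverage([a, b]))
```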

=== Visualization in the overview map ===
[[File:PANGAEA google map with tracks.png|alt=A map at a dataset landing page showing markers for Events (1: start and 2: end), as well as a yellow line as a track.|thumb|A map at a dataset landing page showing markers for Events (1: start and 2: end), as well as a yellow line as a track.]]
The map on the dataset landing pages is a by-product of the coverage calculation, provided for a fast and convenient preview of the geographical coverage. It is not part of the official PANGAEA metadata. The implementation/appearance and limits - and whether it appears at all - can change at any time.

==== What is shown in the map? ====
* Markers (red): these represent individual [[Event|Events]]. When an event has a start and an end (not just a single position), a pair of markers per event is shown. They are numbered 1 and 2 for start and end. Limitation: the markers are not shown when the number of events exceeds XXX.
* Lines (yellow): these can provide tracks for datasets where all 3 [[Geocode|geocode]] data series (Latitude, Longitude and Date/time) are present in the data table. Limitation: tracks are only shown for datasets under an open access license. The lines are also not shown when the number of Lat/Lon/Date/time entries exceeds 10000. If a dataset contains all 3 geocodes but the track visualisation does not make sense in the context of the data, the editor can disable it.
* [[Collection|Dataset collections]] (e.g., publication series or bundled publications) display not events but markers of the contained datasets. This is the same display as the [[PANGAEA search|PANGAEA search engine]] shows when you click on "show map". A "point marker" is shown for each dataset in the collection that has a single event and no geographical extent. A "polygon marker" is shown for datasets with a non-zero geographical extent (having a bounding box); the marker is placed at the mean latitude/longitude as described above. If multiple datasets of the collection are located at the same location, a "group marker" is displayed.
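The conditions for drawing the yellow track line can be summarized in a small predicate. The function and parameter names are hypothetical; only the listed conditions and the limit of 10000 entries come from the description above.

```python
def show_track(has_latitude: bool, has_longitude: bool, has_datetime: bool,
               open_access: bool, n_entries: int,
               disabled_by_editor: bool = False,
               max_entries: int = 10_000) -> bool:
    """Return True when a track line would be drawn for a dataset."""
    has_all_geocodes = has_latitude and has_longitude and has_datetime
    return (has_all_geocodes
            and open_access
            and n_entries <= max_entries   # hidden only when the limit is exceeded
            and not disabled_by_editor)


print(show_track(True, True, True, open_access=True, n_entries=2_500))
print(show_track(True, True, True, open_access=True, n_entries=10_001))
print(show_track(True, True, False, open_access=True, n_entries=2_500))
```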

==== What is not shown in the map? ====
* If no event is present in the dataset, no map is shown.
* A map doesn't show the position of each individual sample or measurement unless these represent individual events. For example, if samples were taken at different locations within a single event, only the position of the event is shown, even though the location changes. An exception to this is the visualization of the "tracks" as a yellow line (see above).
* If events are located in polar areas, they might not be shown in the map. The preview map doesn't allow for changing projections.

Coverage

2025-05-15T09:48:59Z

Uschindler: add info for collections; describe map symbols

The coverage describes the spatial and/or temporal distribution of the data set. It is calculated automatically using the [[geocode|geocodes]] from the data matrix or the [[event]] information.

Because coverage is a mandatory discovery property of metadata standards like ISO 19115/19139, Schema.org for Datasets, or DataCite, PANGAEA displays the calculated coverage on the landing page of datasets to make it clear that this information is part of the distributed metadata. Third-party systems harvesting PANGAEA metadata may use this metadata for discovery and may display the coverage information on their own landing pages.

The purpose of this document is to describe the details of the coverage calculation depending on the use case, especially when relevant information is provided with the data matrix and [[event]].

=== Calculation of the spatial coverage or geolocation ===

The algorithm will only calculate the geolocation from either the coordinates provided in the data matrix or the [[event]], never from (a mixture of) both. The reason for this is that there is no clear and universal way to map multiple instances of coordinates to the required [[geocode|GEOCODES]] latitude and longitude. The same is true for the 3rd [[geocode|GEOCODE]], elevation. Multiple columns reflecting height or depth information, e.g. “depth, sediment” & “depth, mbsf.” or elevation information in the [[event]] and data matrix cannot be mapped to a single vertical [[geocode|GEOCODE]].

Geolocation information is often more accurate in data matrices and therefore takes priority: if coordinates are present in both the data matrix and the [[event]], the geolocation is calculated from the coordinates in the data matrix.

However, it’s possible that the geolocation in data matrices sometimes has gaps because, for example, a certain method was not applied at a certain position. If these gaps were on the “boundary” of the area of interest, the resulting collection of positions wouldn’t correctly reflect the sampled area that may (or may not) be specified in the [[event]] information.

For the bounding box given in the web interface, we only provide the boundaries of the latitude (north, south) and longitude (west, east) values. Internally, however, we store the exact locations of each of the four bounding box corners. Please note: a dataset that crosses the date line will have a west-bound longitude that is larger than the east-bound longitude. PANGAEA therefore avoids talking about min/max values.
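
The west/east convention can be illustrated with a minimal Python sketch. It is not PANGAEA's actual algorithm: the function name <code>bounding_box</code> and the "smallest enclosing longitude arc" heuristic are assumptions made for illustration only.

```python
def bounding_box(points):
    """Return north/south/west/east bounds for (lat, lon) pairs in degrees.

    Longitudes are expected in the range [-180, 180]. Hypothetical helper,
    not part of any PANGAEA API.
    """
    points = list(points)
    lats = [lat for lat, _ in points]
    lons = sorted(lon for _, lon in points)
    n = len(lons)

    # Find the widest empty arc between neighbouring longitudes, including
    # the wrap-around arc from the last longitude back to the first.
    # The box then starts right after that arc (west) and ends right
    # before it (east).
    widest, west, east = -1.0, lons[0], lons[-1]
    for i in range(n):
        nxt = lons[(i + 1) % n]
        gap = (nxt - lons[i]) % 360.0
        if gap > widest:
            widest, west, east = gap, nxt, lons[i]

    return {"north": max(lats), "south": min(lats), "west": west, "east": east}


if __name__ == "__main__":
    # Two positions on either side of the date line.
    print(bounding_box([(-10.0, 170.0), (-12.0, -175.0)]))
```

For the two positions in the usage example, the west bound (170) is larger than the east bound (-175), which is why "min/max" would be a misleading description.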

=== The calculation of mean (or median) geolocation values ===

The center of the sample area is determined by calculating the average of all individual sample coordinate pairs. Thus, the mean or median is not the arithmetic mean (or centroid) of the area covered by the constituents, but the arithmetic mean calculated from the entirety of the individual coordinate pairs provided. The mean was originally introduced to place the map bounding box where most of the data points were located. However, unlike the calculation of the geolocation, no priority is given to information from data matrices over [[Event|events]]. As a result, if coordinates are provided in both, the mean geolocation may be incorrect.

The value for median found on dataset landing pages does not refer to the strict mathematical term, but seemed to be the more appropriate term to describe the mean in the context of geo-referenced data.
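
The difference between the mean of the coordinate pairs and the center of the covered area can be shown with a small Python sketch. The function names are hypothetical, and the sketch ignores the date line; it is only meant to illustrate the arithmetic described above.

```python
from statistics import fmean


def mean_position(points):
    """Arithmetic mean of all (lat, lon) pairs, as described in the text."""
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return fmean(lats), fmean(lons)


def box_center(points):
    """Center of the bounding box, shown only for comparison."""
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return (min(lats) + max(lats)) / 2, (min(lons) + max(lons)) / 2


if __name__ == "__main__":
    # Three samples at one site and a single sample far away.
    samples = [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0), (10.0, 40.0)]
    print(mean_position(samples))  # pulled towards the cluster of samples
    print(box_center(samples))     # geometric center of the covered box
```

Because three of the four samples share one position, the mean lies much closer to that position than the center of the box does.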

=== Temporal coverage - Date/Time values ===

Similar to the spatial coverage calculation, the algorithm will only use either the information provided in the data matrices or the [[event]], never both. Temporal coverage provided in data matrices takes precedence over [[Event|events]]: if no temporal reference is provided with the matrices, the information from the [[event]] is used and displayed instead.
In cases where the date line is crossed, a special calculation routine ensures that the correct temporal coverage is displayed.
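
The precedence rule can be sketched in Python as follows. The function <code>temporal_coverage</code> and its arguments are hypothetical; the special handling for the date line mentioned above is not covered.

```python
from datetime import datetime


def temporal_coverage(matrix_times, event_times):
    """Return (start, end) from exactly one source, never a mixture.

    Date/Time values from the data matrix take precedence; the event
    values are only used when the matrix provides none.
    """
    source = matrix_times if matrix_times else event_times
    if not source:
        return None  # no temporal reference at all
    return min(source), max(source)


if __name__ == "__main__":
    event = [datetime(2019, 9, 25), datetime(2019, 10, 20)]
    matrix = [datetime(2019, 10, 1, 12), datetime(2019, 10, 3, 6)]
    print(temporal_coverage(matrix, event))  # matrix values win
    print(temporal_coverage([], event))      # falls back to the event
```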

=== Modifications to calculated values ===

PANGAEA editors cannot change the information displayed in the coverage, as those values are automatically calculated. There is no way to modify the values, unless the coordinates given for associated [[Event|events]] and/or [[geocode|geocodes]] in the data matrix are modified, which is not recommended as this may change the interpretation of data in the matrix.

The coverage values are not part of the curated metadata and may also change over time, e.g., when PANGAEA's algorithms change or are optimized to improve discoverability of datasets (e.g., the way dataset [[PANGAEA publication types|collections]] are handled by the system). The coverage is shown on the dataset landing pages to give users a quick overview of the geolocation in addition to the displayed map, especially after a geospatial search.

=== Collection datasets ===
For [[Collection|dataset collections]], coordinates are not taken from events. The coverage of a dataset collection is calculated with the previously calculated coverage of the contained datasets. It is basically the same calculation as described before, but instead of events or data points it is using the boundaries and mean (or median) values of the contained datasets.

=== Visualization in the overview map ===
[[File:PANGAEA google map with tracks.png|alt=A map at a dataset landing page showing markers for Events (1: start and 2: end), as well as yellow line as a track.|thumb|A map at a dataset landing page showing markers for Events (1: start and 2: end), as well as yellow line as a track.]]
The map in the Dataset landing pages is a by-product of the coverage calculation, provided for a fast and convenient preview of the geographical coverage. It is not part of the official PANGAEA metadata. The implementation/appearance and limits - and whether it appears at all - can change at any time.

==== What is shown in the map? ====
* Markers (red): these represent individual [[Event|Events]]. When an event has start and end (not just a single position), a pair of markers per event is shown. They are numbered 1 and 2 for start and end. Limitation: the markers are not shown when the number of events exceeds XXX.
* Lines (yellow): these can provide tracks for datasets where all 3 [[Geocode|geocodes]] data series (Latitude, Longitude and Date/time) are present in the data table. Limitation: tracks are only shown for datasets under open access license. The lines are also not shown when the number of Lat/Lon/Date/time entries exceeds 10000. If a dataset contains 3 geocodes, but the track visualisation does not make sense in the context of the data, it can be configured out by the editor.
* [[Collection|Dataset collections]] (e.g., publication series or bundled publications) display not events but markers of the contained datasets. This is the same display as the [[PANGAEA search|PANGAEA search engine]] shows when you click on "show map". A "point marker" is shown for each dataset in the collection that has a single event and has no geographical extent; a "polygon marker" is shown for datasets with some geographical extent. If multiple datasets of the collection are located at the same place a "group marker" is displayed.

==== What is not shown in the map? ====
* If no event is present in the dataset, no map is shown.
* A map doesn't show the position of each individual sample or measurement unless these represent individual events. For example, if samples were taken at different locations within a single event, only the position of the event is shown, even though the location changes. An exception to this is the visualization of the "tracks" as yellow line (see above).
* If events are located in polar areas, they might not be shown in the map. The preview map doesn't allow for changing projections.

PANGAEA XML schema

2025-01-10T08:17:33Z

Uschindler: update new search schema

Field names as given in the '''PANGAEA Extensible Markup Language (XML) schema''' can be used for specific queries using the [[PANGAEA search|PANGAEA search engine]]. Queries are not case sensitive and can be combined by boolean operators. A blank is equivalent to ''AND'', use the ''minus sign (-)'' to exclude specifications. See also the ''Help'' function of PangaVista.

Not all fields are searchable this way (e.g., numeric fields aren't). For some fields a different syntax is required. "attribute" and "reference" need to be searched using the actual names of attributes or the relation type, e.g. "relatedto:".

Search examples:
* data sets with a specific parameter could be found by using e.g.
** [http://www.pangaea.de/?q=parameter:obsidian ''parameter:obsidian''] or
** [http://www.pangaea.de/?q=parameter:name:%22Carbon,%20organic,%20dissolved%22 ''parameter:name:"Carbon, organic, dissolved"'']
* data sets with a specific event [http://www.pangaea.de/?q=event:64-474 ''event:64-474'']
* data sets of a certain author in the citation [http://www.pangaea.de/?q=citation:author:schumacher ''citation:author:schumacher'']
* all data sets of project EPOCA [http://www.pangaea.de/?q=project:epoca ''project:epoca'']
* data sets following two queries combined by boolean operators
**keyword ''radiocarbon'' in related to linked reference AND published via PANGAEA in the year 2000 [http://www.pangaea.de/?q=relatedto:radiocarbon+citation:year:2000 ''relatedto:radiocarbon citation:year:2000'']
**keyword ''radiocarbon'' in related to linked reference OR published via PANGAEA in the year 2000 [http://www.pangaea.de/?q=relatedto:radiocarbon+or+citation:year:2000 ''relatedto:radiocarbon or citation:year:2000'']
**keyword ''radiocarbon'' in related to linked reference and NOT published via PANGAEA in the year 2000 [http://www.pangaea.de/?q=relatedto:radiocarbon+-citation:year:2000 ''relatedto:radiocarbon -citation:year:2000'']
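
Search links like the ones above can also be generated programmatically. The following Python sketch only assumes the <code>?q=</code> URL pattern visible in the examples; the helper name <code>search_url</code> is hypothetical, and the sketch percent-encodes characters such as ":" and quotation marks.

```python
from urllib.parse import urlencode

# Base URL taken from the search links on this page.
BASE_URL = "https://www.pangaea.de/"


def search_url(query):
    """Build a search link for a query string such as 'project:epoca'."""
    return BASE_URL + "?" + urlencode({"q": query})


if __name__ == "__main__":
    print(search_url("project:epoca"))
    print(search_url('parameter:name:"Carbon, organic, dissolved"'))
    print(search_url("relatedto:radiocarbon -citation:year:2000"))
```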

== Schema ==

http://ws.pangaea.de/schemas/pangaea/MetaData.png

[http://ws.pangaea.de/schemas/pangaea/MetaData.xsd XML variant for writing transformations]

PANGAEA search

2025-01-10T08:11:16Z

Uschindler: update new search schema

=Basic search=
[[Image:PANGAEA search 2018-07.PNG|thumb|200px|Search field on PANGAEA home]]

The most convenient and fastest way to find data is using the search engine on [http://www.pangaea.de/ PANGAEA home].
Each predefined dataset, in the granularity defined by the PI, can be found by keywords and any expressions matching the data set description. Search is supported by an autocomplete functionality. Keywords can be combined to create [https://en.wikipedia.org/wiki/Boolean_expression Boolean expressions] using a syntax identical to that used by search engines.

As a result of a query the titles of datasets are listed, linking to the full meta-description.

By prefixing keywords (using the format "prefix:keyword") with a tag name from the [[PANGAEA XML schema]] the search can be performed inside specific parts of the schema. Exceptions from that are schema parts like "attribute" and "reference" which are not searchable this way.

=Filtering of search results=

[[Image:PANGAEA search 2 2018-07.PNG|thumb|200px|Filtering of search results]]
The results of search can be filtered using facets in the left panel:
* [[Staff|Dataset Author]]
* Dataset Publication Year
* Topic
* [[Project]]
* [[Basis]]
* [[Method|Method/Device]]
* [[Campaign]] and
* Location

[[Image:PANGAEA geographical filter by 2018-07.PNG|thumb|200px|Filtering of search results]]
Additionally, the search results can be filtered by:
* Geographical coordinates and
* Date

=Advanced search=

'''Choosing search terms'''

When choosing search terms keep in mind:
* Try the obvious first. If you're looking for information on the grain size of sediment, enter "grain size" rather than "sediments"
* Use words likely to appear on a site with the information you want. "Holocene ice Lazarev" gets better results than "Holocene ice extension from the Lazarev Sea shelf".

'''Capitalization'''

PANGAEA searches are NOT case sensitive. All letters, regardless of how you type them, will be understood as lower case. For example, searches for "marine geology", "Marine Geology", and "mArInE gEoLoGy" will all return the same results.

'''Using query operators'''

PANGAEA Search uses per default the "AND" logic to combine the search terms. This means that all entered terms must be in the searched documents. To find documents that contain either one or another term (or both) concatenate by "OR". For example, enter "falconensis OR bulloides" to get all datasets that contain one of the terms.

The use of "AND" between keywords is optional. If you want to combine "AND" and "OR", use brackets - for example: "Globigerina AND (falconensis OR bulloides)".

'''Excluding searches by using "-"'''

To exclude certain keywords add a minus sign ("-") immediately before the search term you want to avoid (be sure to include a space before the minus sign).

'''Approximate searches'''

If you do not exactly know the spelling of a word, you may want to search not only for a particular keyword, but also for variants in spelling. Indicate a search for all by placing the tilde sign ("~") immediately in front of the keyword.

'''Wildcards'''

Wildcards allow a substitution of unknown characters in the item used for searching. The following table describes the wildcard characters and their attributes:

{| class="wikitable"
|-
! Wildcard !! Function !! Syntax !! Locates
|-
| ? || Specifies one alphanumeric character.|| m?ller || "müller", "miller", and "muller"
|-
| * || Specifies zero or more of any alphanumeric character. You should not use the asterisk to specify the first character of a wildcard-character string (slow search).|| corp*|| "corporate", "corporation", "corporal", and "corpulent"
|}

'''Phrase searches'''

Search for complete phrases by enclosing them in quotation marks. Words enclosed in double quotes ("like this") will appear together in all results exactly as you have entered them. Phrase searches are especially useful when searching for phrases or full names.

'''Searches in specific fields'''

[[PANGAEA XML schema]] can be used for specific queries using the PANGAEA search engine.
Search for keywords in specific fields by putting the field name followed by a ':' immediately in front of the term you want to match. Exceptions are schema parts like "attribute" and "reference", which are not searchable this way; instead, references can be searched using the reference relation type (e.g., "supplementto" or "relatedto"). Attributes can also be searched by using their name as the field in front of the term. The most used field names are:

{| class="wikitable"
|-
! Field name !! Function
|-
| project:|| Search for keywords in [[Project|projects]]
|-
| project:label:|| Matches a project label
|-
| author:|| Search for authors of datasets or assigned references
|-
| citation:author:|| Search for authors of datasets only in the citation
|-
| pi:|| Search for datasets with Principal Investigator (PI)
|-
| citation:|| Search for keywords in the [[citation]]
|-
| relatedto:|| Search for keywords in assigned "Related to" [[Reference|references]].
|-
| supplementto:|| Search for keywords in assigned "Supplement to" [[Reference|references]].
|-
| year:|| Search for datasets or assigned references published in a specific year
|-
|citation:year:
|Search for datasets only published in a specific year
|-
| parameter:|| Search for keywords in [[parameter]] names
|-
| method:|| Search for keywords in [[method]] names
|-
| event:label:|| Search for [[event]] labels
|-
|basis:
|Search for [[basis]], e.g. ship or research station
|-
|campaign:
|Search for research [[Campaign|campaigns]]
|-
|[https://www.pangaea.de/?q=O2ARegistryURI%3A* O2ARegistryURI:*]
|Search for datasets with a O2A Registry URI (link to registry.awi.de). This is an example of a special case for attributes and uses plain wildcard to find all datasets where the specific field is actually used.
|}

'''Query examples'''

{| class="wikitable"
|-
| marine|| Finds datasets that contain "marine".
|-
| marine geology || Finds datasets that contain both "marine" and "geology"
|-
| "marine geology"|| Placing quotation marks around any series of words turns them into a phrase and tells PANGAEA Search that you are only interested in data sets that have the words in this specific order.
|-
| marine geology -organic|| Finds datasets that contain both "marine" and "geology" but not "organic"
|-
| Globigerina AND (falconensis OR bulloides)|| Finds datasets that contain "Globigerina" and either "falconensis" or "bulloides"
|-
| ~Neogloboqadrina|| Finds datasets with "Neogloboquadrina" regardless of your spelling mistake
|-
| project:label:IMAGES|| Finds datasets that belong to project "IMAGES"
|-
| citation:author:Mackensen|| Finds datasets of author "Mackensen"
|-
| m?ller|| Finds "Müller", "Muller" or "Miller". Use this to specify characters you cannot type in with your keyboard
|}
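
The syntax rules above (field prefixes, phrase quoting, exclusion with "-", and grouping with brackets) can be combined in small helper functions. This is a minimal Python sketch; the helper names <code>term</code>, <code>any_of</code> and <code>all_of</code> are hypothetical and only assemble query strings, they do not contact PANGAEA.

```python
def term(value, field=None, exclude=False):
    """Format one search term, optionally with a field prefix or exclusion."""
    # Multi-word values are quoted so they are treated as a phrase.
    text = f'"{value}"' if " " in value else value
    if field:
        text = f"{field}:{text}"
    if exclude:
        text = "-" + text
    return text


def any_of(*terms):
    """Combine terms with OR, grouped in brackets."""
    return "(" + " OR ".join(terms) + ")"


def all_of(*terms):
    """Combine terms with the default AND logic (a blank between terms)."""
    return " ".join(terms)


if __name__ == "__main__":
    print(all_of(term("Globigerina"), any_of(term("falconensis"), term("bulloides"))))
    print(all_of(term("marine"), term("geology"), term("organic", exclude=True)))
    print(term("Carbon, organic, dissolved", field="parameter:name"))
```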

'''PANGAEA Search Results'''

The results page shows a list of abbreviated dataset descriptions (thumbs) including the links to the full dataset description and links to download the dataset in either html or text format. The score gives an estimate on the relevance of the search result: a higher score means that the entered words can be found more often and closer together.

Datasets are listed with ordinal numbers and are shown in ten hits per page. Above and below the listing, one may click the page number or the NEXT (or PREV) link to see more results.



=Data warehouse=

[[Image:PANGAEA go to warehouse 2018-07.PNG|thumb|200px|Entering data warehouse]]
The '''[http://en.wikipedia.org/wiki/Data_warehouse data warehouse]''' is a tool to combine data from different PANGAEA datasets in one file. After logging in, the < Data warehouse > button is visible after submitting a query. The button links to a page where geocodes and parameters for an export table can be configured. Parameters are listed in order of a score that depends on the query.

''' Example:'''
[[Image:Bulloides_1.png|thumb|200px|Data set of a planktonic foraminifera extracted with the data warehouse. Map plotted with [[ODV]], interpolated with the ''diva'' algorithm.]]

The following example will produce a distribution map of a plankton shell in the world ocean sediments.

* go to http://www.pangaea.de
* login (or sign up for an account)
* search for ''bulloides'' (species name of a planktonic foraminifera)
* click on < Data warehouse > (a button on the upper right of the page)
* choose:
** Latitude
** Longitude
** Depth, sediment [m]
** Globigerina bulloides [%]
* < Start Data Warehouse Query >
* find a file ''bulloides.zip'' on your desktop and extract it. The file contains a list of all dataset citations in various formats (text only, RIS/Endnote, BIBTEX) and the datafile ''bulloides.tab''.
* start [[Pan2Applic]] (needs to be installed first)
* drag'n drop ''bulloides.tab'' to the empty window
* choose Convert/Ocean Data View ([[ODV]] needs to be installed first)
'''Important:''' '''''It is required to cite all datasets which are referenced in the data file and the citation list!'''''

=External services=
* '''[https://pypi.org/project/pangaeapy/ pangaeapy]''': a Python module to download and analyse metadata as well as data from tabular PANGAEA datasets
* '''[https://ropensci.github.io/pangaear/ pangaear]''': an R client to interact with PANGAEA

Citation

2024-10-28T09:44:48Z

Uschindler: change image

==Best practice of data citation==
[[File:Citation_tools.png|400px|thumbnail|right|Citation tools (copy and export citation) located below the data set reference.]]

PANGAEA publishes data in a similar way as scientific journals do. And as such, published data sets are cited in a similar manner. A data citation should contain:
* the authors (creators)
* the publication year
* the dataset title
* the [[PANGAEA publication types|type]] "dataset" or the corresponding collection type for series or bundled publications
* the publisher
* a unique persistent identifier (e.g. a DOI)

The full data citation of each referenced data set should be included in the reference list of any publication citing the data. For the general structure, we follow the DataCite recommendations:

{| class="wikitable"
|-
| ''Creator (PublicationYear): Title [dataset]. Publisher (PANGAEA). Identifier (DOI)''
|}

On the landing page of each data set, the suggested citation of the data set is displayed at the top, e.g. see [https://doi.pangaea.de/10.1594/PANGAEA.912021 here]:
{| class="wikitable"
|-
| ''Timofeeva, Anna; Smolyanitsky, Vasily; Bessonov, Vladimir; Petrovskiy, Tomash (2020): Special sea ice observations aboard Akademik Fedorov MOSAiC leg 1, 2019-09-25 to 2019-10-20 [dataset]. PANGAEA, https://doi.org/10.1594/PANGAEA.912021''
|}

The citation can be copied or exported in the preferred format using the copy or export buttons below the title. Further buttons enable sharing the reference via social media.

In a few cases, authors were appointed to submit data on behalf of their institution. In these cases also the institution, not individual researchers, is responsible for the acquisition of data and related science. It will appear in the suggested citation:

{| class="wikitable"
|-
| ''Creator (PublicationYear): Title [dataset]. '''Institution'''. Publisher (PANGAEA). Identifier (DOI)''
|}

As an example, see [https://doi.pangaea.de/10.1594/PANGAEA.948838 here]:
{| class="wikitable"
|-
| ''Nicolaus, Marcel; Hoppmann, Mario; Tao, Ran; Katlein, Christian (2023): Spectral radiation fluxes, albedo and transmittance from autonomous measurements from Radiation Station 2020R21, deployed during MOSAiC 2019/20 [dataset bundled publication]. Alfred Wegener Institute, Helmholtz Centre for Polar and Marine Research, Bremerhaven, PANGAEA, https://doi.org/10.1594/PANGAEA.948838''
|}
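
The citation patterns shown above can be reproduced with a short Python sketch. The function <code>format_citation</code> and its parameters are hypothetical; the punctuation is modelled on the two example citations on this page, and the suggested citation on the dataset landing page remains the authoritative form.

```python
def format_citation(creators, year, title, doi,
                    pub_type="dataset", institution=None, publisher="PANGAEA"):
    """Assemble a data citation following the pattern of the examples above.

    creators: list of "Family, Given" strings
    doi: bare DOI such as "10.1594/PANGAEA.912021"
    institution: only given when the data were submitted on its behalf
    """
    authors = "; ".join(creators)
    # The institution, if present, is placed in front of the publisher.
    source = [institution, publisher] if institution else [publisher]
    return (f"{authors} ({year}): {title} [{pub_type}]. "
            f"{', '.join(source)}, https://doi.org/{doi}")


if __name__ == "__main__":
    print(format_citation(
        ["Timofeeva, Anna", "Smolyanitsky, Vasily",
         "Bessonov, Vladimir", "Petrovskiy, Tomash"],
        2020,
        "Special sea ice observations aboard Akademik Fedorov MOSAiC leg 1, "
        "2019-09-25 to 2019-10-20",
        "10.1594/PANGAEA.912021",
    ))
```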

===Where to refer to the data?===
As stated above, the full data citation should be contained in the reference list of any work citing the data [https://doi.org/10.1038/s41597-023-02491-7 (Stall et al., 2023)].

But where in the text is the right position to refer to the data?
Obviously it depends on the context, for example on whether the data is original or reused.
Generally, the data can be cited in the methods or results section, or in the data availability section if offered by the journal.

For the latter, a suggestion to refer to the data would be an in-text citation, such as:

"Data for this study were published open access (Authors, YYYY).", followed by the corresponding entry (full citation) of the dataset in the list of references.

==Why is the correct citation of datasets important?==

First of all, citing sources is good scientific practice. It gives credit to your work and the work of others, and increases reproducibility of findings and thus trust in your research (see also [https://datacite.org/cite-your-data.html DataCite: why to cite data]).

Secondly, with the increasing importance of open data, citing data sets is an important part of the envisaged reward system. Metrics are now starting to be provided by several platforms, such as [https://search.datacite.org DataCite]. However, citations can only be counted if data sets are referred to in the correct manner.

==Data sets ''"in review"''==
During the archiving and review process or when a [[status|moratorium]] is set on the data, for example due to the [[status|publication status]] of a connected manuscript, the data is kept in the [[status]] ''"in review"''. Datasets "in review" are usually only accessible for the contributing authors after logging in to PANGAEA. However, the metadata, including authors, title, references and parameters are already findable and visible in PANGAEA. At this stage, the data will be displayed on the website with a preliminary link instead of a registered, persistent DOI. This preliminary link can be recognized by the following format:

https://doi.pangaea.de/10.1594/PANGAEA.XXXXXX (''XXXXXX = DataSetID'')

It can only be resolved by the PANGAEA DOI resolver. Once the review process is finished, the DOI will be registered and take the form of

https://doi.org/10.1594/PANGAEA.XXXXXX (''XXXXXX = DataSetID'')

It is important to note that datasets "in review" might be modified or even deleted during the review process. Only the second form guarantees persistent access and reference to the data. Citation of any data with the [[status]] "in review" should be avoided.
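
The two link forms can be told apart by their resolver host. The following Python sketch uses a hypothetical helper <code>parse_link</code>; note that it only identifies the form of the link and the DataSetID, it cannot tell whether the DOI has actually been registered.

```python
import re

# Matches both link forms described above and captures the DataSetID.
_LINK = re.compile(r"^https://(doi\.pangaea\.de|doi\.org)/10\.1594/PANGAEA\.(\d+)$")


def parse_link(url):
    """Return (form, dataset_id) for a PANGAEA dataset link, else None.

    form is "pangaea_resolver" for https://doi.pangaea.de/... links and
    "doi_org" for https://doi.org/... links.
    """
    match = _LINK.match(url)
    if match is None:
        return None
    host, dataset_id = match.groups()
    form = "doi_org" if host == "doi.org" else "pangaea_resolver"
    return form, int(dataset_id)


if __name__ == "__main__":
    print(parse_link("https://doi.pangaea.de/10.1594/PANGAEA.912021"))
    print(parse_link("https://doi.org/10.1594/PANGAEA.912021"))
```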

==Publication of data in PANGAEA==
After technical review by the editor, import and approval of the author/PI, a dataset is set to [[status]] ''published'' and appears as ''citable'' on the Internet. Upon publication of a data set, the DOI registration is initiated. This process is finalized after 28 days. During this time, the data set can still be modified. However, after finalizing the DOI registration, the data set is published and cannot be changed anymore. Any changes to the dataset would be analogous to an erratum of a journal article.

Small adjustments, such as the correction of small mistakes or typos, can still occur and are displayed as metadata "Change history", both on the data set landing page (below the parameter overview) and in the downloaded data set. As an example see [https://doi.pangaea.de/10.1594/PANGAEA.882624 here]:

{| class="wikitable"
|-
| ''Change history'': 2020-03-25T13:34:53 – Parameter Ice thickness [m] exchanged with Parameter Thickness of ice accretion [cm], no recalculation of values necessary
|}

==Further reading==
For further details see the '''Author Preparation''' section of Stall et al., 2023. This includes information on datasets and software citation in research articles, how to structure these citations and provide information on selecting the best possible scientific repositories to use for data and software, and what information to put in an Availability Statement.
* Stall, S., Bilder, G., Cannon, M. et al. Journal Production Guidance for Software and Data Citations. Sci Data 10, 656 (2023). {{doi|10.1038/s41597-023-02491-7}}

File:Citation tools.png

2024-10-28T09:43:57Z

Uschindler:

Citation

2024-10-28T09:22:29Z

Uschindler: Update citation format and copy some text from new source page


User:Uschindler

2024-09-05T09:20:49Z

Uschindler: /* Uwe 👮 Schindler */

== Uwe 👮 Schindler ==
* MARUM - University of Bremen, Mary-Somerville-Straße 2-4, D-28359 Bremen
* phone +49 421 218 65595
* mobile +49 179 1224881
* mailto:uschindler@pangaea.de

== Responsibilities ==

Web services, [[web server]], system management, [[domains]], [[middleware]], [[PangaeaWiki:About|{{SITENAME}}]]

== Bio (first posted on [http://markmail.org/message/klznohkgwxtkq5j3 Lucene Mailing list]) ==

I am 36 years old, born in southern Germany (Bamberg). I studied physics at
University Erlangen-Nürnberg and graduated in 2004. Since 1996 I was working
in parallel for [[PANGAEA]] (Publishing Network for Geological and Environmental
Data, http://www.pangaea.de/, it is a library for data
publications) as software designer. In 1996 I was one of the early persons
started to program Java applets with JDK 1.0.2 (one of them is still
available on the PANGAEA homepage, [[ART]]) and since then I used Java as my primary
programming language.

Since the end of 2004, I have been working full-time for PANGAEA, employed at the
Center for Marine Environmental Sciences belonging to the University of
Bremen, where I now live (while trying to get a PhD in parallel). My
primary business is now the design and development of the fundamental middleware
components for PANGAEA. In 2004 we started to use full-text search engines
for metadata search. We noticed early that geographical information systems
can take advantage of full-text engines, but that searching with geographical
constraints is also important. Our first FTS was embedded into our backend
database (at that time PostgreSQL), so it was possible to do joins between the FTS and
relational tables. Looking for a better solution, I came to Lucene. In
2005-2006 I started a new open-source project called [[panFMP]] (PANGAEA
Framework for Metadata Portals, http://www.panfmp.org/),
which uses Lucene for full-text indexing and enriched Lucene with the
TrieRangeQuery. panFMP is a little bit like Solr, but at the time when the
project started, Solr was not well known to the community. [[PanFMP]] uses a
slightly different metadata approach and relies on harvesting
(OAI-PMH), but is comparable.

Since 2003 I have also been working in the PHP (http://www.php.net/) development crew, maintaining the
web server plug-in for Sun Java System Web Servers (my favourite web
server). I also have a small one-man software company called "Schindlers
Software" (http://www.schindlers-software.de). My (IT) interests are Lucene, XML techniques, data
warehousing, sensor networks, metadata dissemination using global standards,
globally unique identifiers like DOIs, ...

== Lucene Presentation ==
* http://www.gossamer-threads.com/lists/lucene/java-user/90174
* http://www.heise.de/ix/meldung/Suchmaschine-Lucene-3-raeumt-auf-Update-869645.html
* http://www.golem.de/0911/71491.html
* http://freshmeat.net/projects/lucene

Coverage

2024-08-26T11:10:53Z

Uschindler: /* Modifications to calculated values */

The coverage describes the spatial and/or temporal distribution of the data set. It is calculated automatically using the geocodes from the data matrix and the event information.

The purpose of this document is to describe the details of the coverage calculation depending on the use case, especially when relevant information is provided with the data matrix and event.

=== Calculation of the spatial coverage or geolocation ===

The algorithm will only '''calculate the geolocation from either the coordinates provided in the data matrix or the event, never from (a mixture of) both'''. The reason for this is that there is no clear and universal way to map multiple instances of coordinates to the required GEOCODES latitude and longitude. '''The same is true for the 3rd GEOCODE, elevation. Multiple columns reflecting height or depth information''', e.g. “depth, sediment” & “depth, mbsf.” or elevation information in the event and data matrix '''cannot be mapped to a single vertical GEOCODE'''.

Because the geolocation information in data matrices is often more accurate, it takes priority during calculation: '''the geolocation will be calculated from the coordinates provided in the data matrix if both are present'''.

The geolocation in a data matrix can, however, have gaps, for example because a certain method was not applied at a certain position. If these gaps lie on the “boundary” of the area of interest, the resulting collection of positions does not correctly reflect the sampled area that may (or may not) be specified in the event information.

For the bounding box given in the web interface, we only provide the minimum/maximum of the latitude and longitude values. Internally, however, we store the exact locations of each of the four bounding box corners.
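
The selection rule and the displayed bounding box can be sketched as follows. This is a minimal illustration in Python, not PANGAEA's actual implementation: the names <code>select_coordinates</code>, <code>bounding_box</code> and <code>BoundingBox</code> are hypothetical, coordinates are assumed to be (latitude, longitude) pairs in decimal degrees, and boxes crossing the 180° meridian are not handled.

```python
from typing import List, NamedTuple, Optional, Sequence, Tuple

Coordinate = Tuple[float, float]  # (latitude, longitude), decimal degrees (assumed)


class BoundingBox(NamedTuple):
    """Hypothetical container for the min/max values shown in the web interface."""
    south: float
    west: float
    north: float
    east: float


def select_coordinates(matrix_coords: Sequence[Coordinate],
                       event_coords: Sequence[Coordinate]) -> List[Coordinate]:
    """Use the data-matrix coordinates if any exist, otherwise the event coordinates.

    The two sources are never mixed.
    """
    return list(matrix_coords) if matrix_coords else list(event_coords)


def bounding_box(coords: Sequence[Coordinate]) -> Optional[BoundingBox]:
    """Return the min/max latitude and longitude, or None if there are no coordinates."""
    if not coords:
        return None
    lats = [lat for lat, _ in coords]
    lons = [lon for _, lon in coords]
    return BoundingBox(south=min(lats), west=min(lons), north=max(lats), east=max(lons))


# Illustrative values only.
matrix = [(54.0, 7.5), (54.5, 8.0)]
event = [(53.0, 6.0), (55.0, 9.0)]

box_from_matrix = bounding_box(select_coordinates(matrix, event))
box_from_event = bounding_box(select_coordinates([], event))
```

In this example the box derived from the data matrix is smaller than the one the event alone would yield, which is the situation that arises when the matrix has gaps at the boundary of the sampled area.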

=== The calculation of mean (or median) geolocation values ===

'''The center of the sample area is determined by calculating the average of all individual sample coordinate pairs. Thus, the mean or median is not the geometric center (or centroid)''' of the area covered by the constituents, but the arithmetic mean calculated from the entirety of the individual coordinate pairs provided. The mean was originally introduced to place the map bounding box where most of the data points were located.
Unlike the calculation of the geolocation, however, '''no priority is given to either data matrices or events'''. As a result, if coordinates are provided in both, the mean geolocation may be incorrect.
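
The difference between the mean of the coordinate pairs and the center of the covered area can be illustrated with a short Python sketch. The function names <code>mean_position</code> and <code>bounding_box_center</code> are hypothetical, the sample values are invented, and the plain averaging of longitudes assumes the points do not straddle the 180° meridian.

```python
from statistics import fmean
from typing import Sequence, Tuple

Coordinate = Tuple[float, float]  # (latitude, longitude), decimal degrees (assumed)


def mean_position(coords: Sequence[Coordinate]) -> Coordinate:
    """Arithmetic mean over all individual coordinate pairs."""
    return (fmean(lat for lat, _ in coords), fmean(lon for _, lon in coords))


def bounding_box_center(coords: Sequence[Coordinate]) -> Coordinate:
    """Center of the min/max box, shown only for comparison."""
    lats = [lat for lat, _ in coords]
    lons = [lon for _, lon in coords]
    return ((min(lats) + max(lats)) / 2, (min(lons) + max(lons)) / 2)


# Three samples at one station and a single sample further away (invented values).
samples = [(10.0, 20.0), (10.0, 20.0), (10.0, 20.0), (14.0, 28.0)]

mean = mean_position(samples)          # pulled towards the cluster of samples
center = bounding_box_center(samples)  # midpoint of the covered box
```

Here the mean lies closer to the station with three samples than the box center does, which matches the stated purpose of placing the map where most data points are located.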

The value labelled "median" on dataset landing pages is not a median in the strict mathematical sense; the term seemed more appropriate for describing the mean in the context of geo-referenced data. [community consensus or rather arbitrary?]

=== Temporal coverage - Date/Time values ===

Similar to the spatial coverage calculation, '''the algorithm will only use either the information provided in the data matrices or the event, never both'''. Temporal coverage provided in data matrices takes precedence over events: '''if no temporal reference is provided with the matrices, the information from the event will be used and displayed instead'''.
In cases where the date line is crossed, a special calculation routine ensures that the correct temporal coverage is displayed.
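
The precedence rule for temporal coverage can be sketched in the same way. This is an assumption-laden illustration rather than the actual routine: <code>temporal_coverage</code> is a hypothetical name, all timestamps are assumed to share one time zone, and the special date-line handling is left out.

```python
from datetime import datetime
from typing import Optional, Sequence, Tuple


def temporal_coverage(matrix_times: Sequence[datetime],
                      event_times: Sequence[datetime]
                      ) -> Optional[Tuple[datetime, datetime]]:
    """Return (start, end) from the data matrix if it has date/time values,
    otherwise from the event. The two sources are never combined."""
    source = matrix_times if matrix_times else event_times
    if not source:
        return None
    return (min(source), max(source))


# Invented values: the matrix covers a shorter period than the event.
matrix_times = [datetime(2020, 3, 2, 12, 0), datetime(2020, 3, 1, 8, 0)]
event_times = [datetime(2020, 2, 28), datetime(2020, 3, 5)]

from_matrix = temporal_coverage(matrix_times, event_times)
from_event = temporal_coverage([], event_times)
```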

=== Modifications to calculated values ===

PANGAEA editors cannot change the information displayed in the coverage, as those values are calculated automatically. The only way to change them is to modify the coordinates given for the associated events and/or the geocodes in the data matrix, which is not recommended because it may change the interpretation of the data in the matrix.

The coverage values are not part of the curated metadata and may change over time, e.g., when PANGAEA's algorithms change or are optimized to improve the discoverability of datasets (e.g., the way dataset collections are handled by the system). The coverage is shown on the dataset landing pages to give users a quick overview of the geolocation in addition to the displayed map, especially after a geospatial search.

Because coverage is a mandatory discovery property of metadata standards like ISO 19115/19139, Schema.org for Datasets, or DataCite, PANGAEA displays the calculated coverage on the landing page of datasets to make it clear that this information is part of the distributed metadata. Third-party systems harvesting PANGAEA metadata may use this metadata for discovery and may display the coverage information on their own landing pages.

Coverage

2024-08-26T11:06:12Z

Uschindler: some additions regarding why the values are shown and remove "import" from text. The coverage may be recalculated after import

The coverage describes the spatial and/or temporal distribution of the data set. It is calculated automatically using the geocodes from the data matrix and the event information.

The purpose of this document is to describe the details of the coverage calculation depending on the use case, especially when relevant information is provided with the data matrix and event.

=== Calculation of the spatial coverage or geolocation ===

The algorithm will only '''calculate the geolocation from either the coordinates provided in the data matrix or the event, never from (a mixture of) both'''. The reason for this is that there is no clear and universal way to map multiple instances of coordinates to the required GEOCODES latitude and longitude. '''The same is true for the 3rd GEOCODE, elevation. Multiple columns reflecting height or depth information''', e.g. “depth, sediment” & “depth, mbsf.” or elevation information in the event and data matrix '''cannot be mapped to a single vertical GEOCODE'''.

Because the geolocation information in data matrices is often more accurate, it takes priority during calculation: '''the geolocation will be calculated from the coordinates provided in the data matrix if both are present'''.

The geolocation in a data matrix can, however, have gaps, for example because a certain method was not applied at a certain position. If these gaps lie on the “boundary” of the area of interest, the resulting collection of positions does not correctly reflect the sampled area that may (or may not) be specified in the event information.

For the bounding box given in the web interface, we only provide the minimum/maximum of the latitude and longitude values. Internally, however, we store the exact locations of each of the four bounding box corners.

=== The calculation of mean (or median) geolocation values ===

'''The center of the sample area is determined by calculating the average of all individual sample coordinate pairs. Thus, the mean or median is not the geometric center (or centroid)''' of the area covered by the constituents, but the arithmetic mean calculated from the entirety of the individual coordinate pairs provided. The mean was originally introduced to place the map bounding box where most of the data points were located.
Unlike the calculation of the geolocation, however, '''no priority is given to either data matrices or events'''. As a result, if coordinates are provided in both, the mean geolocation may be incorrect.

The value labelled "median" on dataset landing pages is not a median in the strict mathematical sense; the term seemed more appropriate for describing the mean in the context of geo-referenced data. [community consensus or rather arbitrary?]

=== Temporal coverage - Date/Time values ===

Similar to the spatial coverage calculation, '''the algorithm will only use either the information provided in the data matrices or the event, never both'''. Temporal coverage provided in data matrices takes precedence over events: '''if no temporal reference is provided with the matrices, the information from the event will be used and displayed instead'''.
In cases where the date line is crossed, a special calculation routine ensures that the correct temporal coverage is displayed.

=== Modifications to calculated values ===

PANGAEA editors cannot change the information displayed in the coverage, as those values are calculated automatically. The only way to change them is to modify the coordinates given for the associated events and/or the geocodes in the data matrix, which is not recommended because it may change the interpretation of the data in the matrix.

The coverage values are not part of the curated metadata and may change over time, e.g., when PANGAEA's algorithms change or are optimized to improve the discoverability of datasets (e.g., the way dataset collections are handled by the system). The coverage is shown on the dataset landing pages to give users a quick overview of the geolocation in addition to the displayed map, especially after a geospatial search.

Because coverage is a mandatory discovery property of metadata standards like ISO 19115/19139 or DataCite, PANGAEA displays the calculated coverage on the landing page of datasets to make it clear that this information is part of the distributed metadata. Third-party systems harvesting PANGAEA metadata may use this metadata for discovery and may display the coverage information on their own landing pages.