Coverage

From PANGAEA Wiki
Revision authors: Abuecker, Pittauer

Revision as of 2025-05-15T11:55:25

The coverage describes the spatial and/or temporal distribution of the dataset. It is calculated automatically from the geocodes in the data matrix or from the event information.

Because coverage is a mandatory discovery property in metadata standards such as ISO 19115/19139, Schema.org for Datasets, or DataCite, PANGAEA displays the calculated coverage on dataset landing pages to make clear that this information is part of the distributed metadata. Third-party systems harvesting PANGAEA metadata may use it for discovery and may display the coverage information on their own landing pages.

The purpose of this document is to describe the details of the coverage calculation for different use cases, especially when relevant information is provided in both the data matrix and the event.

Calculation of the spatial coverage or geolocation

The algorithm calculates the geolocation from either the coordinates provided in the data matrix or those of the event, never from a mixture of both. The reason is that there is no clear and universal way to map multiple instances of coordinates to the required latitude and longitude GEOCODES. The same is true for the third GEOCODE, elevation: multiple columns reflecting height or depth information (e.g. “depth, sediment” and “depth, mbsf”), or elevation information given in both the event and the data matrix, cannot be mapped to a single vertical GEOCODE.

Because the geolocation information in data matrices is often more accurate, it takes priority during the calculation: if coordinates are present in both the data matrix and the event, the geolocation is calculated from the coordinates in the data matrix.
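
The selection rule described in the two paragraphs above can be sketched as follows. This is a minimal illustration, not PANGAEA's implementation; the function name `select_geolocation_source` and the representation of coordinates as (latitude, longitude) tuples are assumptions made for the example.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]  # assumed representation: (latitude, longitude)


def select_geolocation_source(
    matrix_points: Optional[Sequence[Point]],
    event_points: Optional[Sequence[Point]],
) -> Tuple[str, List[Point]]:
    """Return the single source used for the geolocation (hypothetical helper).

    Data-matrix coordinates take priority; event coordinates are used only
    when the data matrix provides none. The two sources are never merged.
    """
    if matrix_points:
        return "data matrix", list(matrix_points)
    if event_points:
        return "event", list(event_points)
    return "none", []


print(select_geolocation_source([(54.1, 7.9), (54.3, 8.2)], [(54.0, 8.0)]))
# ('data matrix', [(54.1, 7.9), (54.3, 8.2)])
```

Returning the whole list from one source, rather than concatenating both, mirrors the rule that coordinates from the data matrix and the event are never mixed.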

Note, however, that the geolocation in a data matrix may have gaps, for example because a certain method was not applied at a certain position. If such gaps lie on the “boundary” of the area of interest, the resulting set of positions does not correctly reflect the sampled area, which may (or may not) be specified in the event information.

For the bounding box shown in the web interface, we only provide the boundaries of the latitude (north, south) and longitude (west, east) values. Internally, however, we store the exact location of each of the four bounding box corners. Please note: a dataset that crosses the date line has a west-bound longitude that is larger than its east-bound longitude. PANGAEA therefore avoids talking about min/max values.
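
This page does not say how PANGAEA determines the west and east bounds. One common approach, assumed here purely for illustration, is to choose the narrowest longitude band that contains all points by leaving out the largest gap between neighbouring longitudes. In this sketch the band crosses the date line exactly when the returned west bound is larger than the east bound, which matches the note above. The names `longitude_bounds` and `bounding_box` are hypothetical.

```python
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]  # assumed representation: (latitude, longitude)


def longitude_bounds(lons: Sequence[float]) -> Tuple[float, float]:
    """Return (west, east) of the narrowest band covering all longitudes.

    Assumed "largest gap" method; if the band crosses the date line, west > east.
    """
    if not lons:
        raise ValueError("no longitudes given")
    # Normalize to [-180, 180) and sort eastward.
    xs = sorted(((x + 180.0) % 360.0) - 180.0 for x in lons)
    # Start with the gap that wraps around the date line: band is [min, max].
    best_gap = xs[0] + 360.0 - xs[-1]
    west, east = xs[0], xs[-1]
    for i in range(len(xs) - 1):
        gap = xs[i + 1] - xs[i]
        if gap > best_gap:
            # Leaving out this gap makes the band start east of it and
            # wrap across the date line to end west of it.
            best_gap = gap
            west, east = xs[i + 1], xs[i]
    return west, east


def bounding_box(points: Sequence[Point]) -> Dict[str, float]:
    lats = [p[0] for p in points]
    west, east = longitude_bounds([p[1] for p in points])
    return {"north": max(lats), "south": min(lats), "west": west, "east": east}


print(bounding_box([(60.0, 170.0), (62.0, -175.0), (61.0, 178.0)]))
# {'north': 62.0, 'south': 60.0, 'west': 170.0, 'east': -175.0}
```

Here the west bound (170.0) is larger than the east bound (-175.0), so "min" and "max" longitude would describe the wrong, almost global band.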

The calculation of mean (or median) geolocation values

The center of the sample area is determined by averaging all individual sample coordinate pairs. Thus, the mean (or median) is not the centroid of the area covered by the constituents, but the arithmetic mean of all individual coordinate pairs provided. The reason is that the mean was originally introduced to place the map bounding box where most of the data points are located. Unlike the geolocation calculation, however, the mean calculation gives no priority to data-matrix coordinates over event coordinates. If coordinates are provided in both, the mean geolocation may therefore be incorrect.
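
A small sketch of the difference between the mean of coordinate pairs and the center of the covered extent, assuming coordinates as (latitude, longitude) tuples. `mean_geolocation` and `box_center` are hypothetical names, and the sketch averages longitudes naively, so it does not handle date-line crossings. The last line shows how adding event coordinates to data-matrix coordinates shifts the mean, since neither source is prioritized.

```python
from statistics import fmean
from typing import Sequence, Tuple

Point = Tuple[float, float]  # assumed representation: (latitude, longitude)


def mean_geolocation(points: Sequence[Point]) -> Point:
    """Arithmetic mean of all coordinate pairs (not the centroid of the area).

    Simplification: longitudes are averaged naively, so point sets that
    cross the date line are not handled correctly here.
    """
    return fmean(p[0] for p in points), fmean(p[1] for p in points)


def box_center(points: Sequence[Point]) -> Point:
    """Center of the latitude/longitude extent, shown for comparison."""
    lats = [p[0] for p in points]
    lons = [p[1] for p in points]
    return (min(lats) + max(lats)) / 2, (min(lons) + max(lons)) / 2


matrix = [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0), (0.0, 10.0)]
event = [(10.0, 40.0)]

print(mean_geolocation(matrix))          # (0.0, 2.5): pulled towards the cluster
print(box_center(matrix))                # (0.0, 5.0)
print(mean_geolocation(matrix + event))  # (2.0, 10.0): event coordinates shift the mean
```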

The value labelled “median” on dataset landing pages does not refer to the median in the strict mathematical sense; the term was chosen because it seemed more appropriate for describing the mean in the context of geo-referenced data.

Temporal coverage - Date/Time values

As with the spatial coverage, the algorithm uses either the information provided in the data matrix or that of the event, never both. Temporal coverage from the data matrix takes precedence: only if the data matrix provides no temporal reference is the information from the event used and displayed instead. When the date line is crossed, a special calculation routine ensures that the correct temporal coverage is displayed.
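
A minimal sketch of this precedence rule, assuming Date/time values are available as timezone-aware `datetime` objects. The function name `temporal_coverage` is hypothetical, and the special date-line routine mentioned above is not modeled.

```python
from datetime import datetime, timezone
from typing import Optional, Sequence, Tuple


def temporal_coverage(
    matrix_times: Optional[Sequence[datetime]],
    event_times: Optional[Sequence[datetime]],
) -> Optional[Tuple[datetime, datetime]]:
    """Return (start, end) from a single source (hypothetical helper).

    Data-matrix Date/time values take precedence; event times are only a
    fallback when the data matrix has no temporal reference.
    """
    times = matrix_times if matrix_times else event_times
    if not times:
        return None
    return min(times), max(times)


utc = timezone.utc
matrix = [datetime(2021, 6, 3, 12, tzinfo=utc), datetime(2021, 6, 1, 8, tzinfo=utc)]
event = [datetime(2021, 5, 30, tzinfo=utc), datetime(2021, 6, 10, tzinfo=utc)]

start, end = temporal_coverage(matrix, event)
print(start.isoformat(), end.isoformat())
# 2021-06-01T08:00:00+00:00 2021-06-03T12:00:00+00:00 (event times are ignored)
```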

Modifications to calculated values

PANGAEA editors cannot change the coverage values, because they are calculated automatically. The only way to change them is to modify the coordinates of the associated events and/or the geocodes in the data matrix, which is not recommended, as this may change the interpretation of the data in the matrix.

The coverage values are not part of the curated metadata and may change over time, e.g., when PANGAEA's algorithms are changed or optimized to improve the discoverability of datasets (such as the way dataset collections are handled by the system). The coverage is shown on the dataset landing pages to give users a quick overview of the geolocation in addition to the displayed map, especially after a geospatial search.

Collection datasets

For dataset collections, coordinates are not taken from events. Instead, the coverage of a dataset collection is calculated from the previously calculated coverage of the contained datasets. The calculation is essentially the same as described above, but instead of events or data points it uses the boundaries and mean (or median) values of the contained datasets.
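
The aggregation over contained datasets might look roughly like the sketch below. The page does not say whether the datasets' means are weighted (for example by number of data points) or how date-line crossings are combined; this sketch assumes an unweighted mean and no date-line crossing, so plain min/max is used for west/east. The record layout and the name `collection_coverage` are hypothetical.

```python
from statistics import fmean
from typing import Dict, List

# Hypothetical per-dataset coverage record: bounds plus mean position.
Coverage = Dict[str, float]


def collection_coverage(children: List[Coverage]) -> Coverage:
    """Combine the coverage of contained datasets into one collection coverage.

    Assumptions: no child crosses the date line (so min/max works for
    west/east), and the collection mean is the unweighted mean of the
    children's means.
    """
    if not children:
        raise ValueError("collection contains no datasets with coverage")
    return {
        "north": max(c["north"] for c in children),
        "south": min(c["south"] for c in children),
        "west": min(c["west"] for c in children),
        "east": max(c["east"] for c in children),
        "mean_lat": fmean(c["mean_lat"] for c in children),
        "mean_lon": fmean(c["mean_lon"] for c in children),
    }


children = [
    {"north": 10.0, "south": 0.0, "west": 0.0, "east": 10.0, "mean_lat": 5.0, "mean_lon": 5.0},
    {"north": 20.0, "south": 15.0, "west": 30.0, "east": 40.0, "mean_lat": 17.0, "mean_lon": 35.0},
]
print(collection_coverage(children))
# {'north': 20.0, 'south': 0.0, 'west': 0.0, 'east': 40.0, 'mean_lat': 11.0, 'mean_lon': 20.0}
```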

Visualization in the overview map

A map at a dataset landing page showing markers for Events (1: start and 2: end), as well as a yellow line marking the track.

The map on the dataset landing pages is a by-product of the coverage calculation, provided as a fast and convenient preview of the geographical coverage. It is not part of the official PANGAEA metadata. Its implementation, appearance and limits, and whether it appears at all, can change at any time.

What is shown in the map?

  • Markers (red): these represent individual Events. When an event has a start and an end position (not just a single position), a pair of markers is shown for that event, numbered 1 (start) and 2 (end). Limitation: the markers are not shown when the number of events exceeds 10000.
  • Lines (yellow): these show tracks for datasets in which all three geocode data series (Latitude, Longitude and Date/time) are present in the data table. Limitation: tracks are only shown for datasets under an open access license. The lines are also not shown when the number of Lat/Lon/Date/time entries exceeds 10000. If a dataset contains all three geocodes but a track visualisation does not make sense in the context of the data, the editor can configure the track out.
  • Dataset collections (e.g., publication series or bundled publications) do not display events but markers of the contained datasets. This is the same display the PANGAEA search engine shows when you click on "show map". A "point marker" is shown for each dataset in the collection that has a single event and no geographical extent. A "polygon marker" is shown for datasets with a non-zero geographical extent (i.e., a bounding box); the marker is placed at the mean latitude/longitude as described above. If multiple datasets of the collection are located at the same location, a "group marker" is displayed.
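
The display rules listed above can be condensed into a few predicates. This is only a sketch: the class `MapInput`, the flag `track_disabled`, and the rule that datasets share a location when their mean coordinates are exactly equal are hypothetical. As noted above, the actual implementation and limits may change at any time.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple

MAX_ITEMS = 10000  # limit stated above for event markers and track points

Location = Tuple[float, float]  # (mean latitude, mean longitude)


@dataclass
class MapInput:
    n_events: int
    n_track_points: int           # number of Lat/Lon/Date/time entries
    has_all_three_geocodes: bool  # Latitude, Longitude and Date/time in the data table
    open_access: bool
    track_disabled: bool = False  # hypothetical flag: editor configured the track out


def show_event_markers(m: MapInput) -> bool:
    # Red markers are hidden once the number of events exceeds the limit.
    return 0 < m.n_events <= MAX_ITEMS


def show_track(m: MapInput) -> bool:
    # Yellow track: all three geocodes, open access, not disabled, within the limit.
    return (
        m.has_all_three_geocodes
        and m.open_access
        and not m.track_disabled
        and m.n_track_points <= MAX_ITEMS
    )


def collection_markers(datasets: List[Tuple[Location, bool]]) -> List[Tuple[str, Location]]:
    """datasets: one (location, has_extent) pair per contained dataset.

    Datasets sharing a location get one "group" marker; the others get a
    "polygon" marker if they have a bounding box, else a "point" marker.
    """
    counts = Counter(loc for loc, _ in datasets)
    markers: List[Tuple[str, Location]] = []
    grouped = set()
    for loc, has_extent in datasets:
        if counts[loc] > 1:
            if loc not in grouped:
                markers.append(("group", loc))
                grouped.add(loc)
        else:
            markers.append(("polygon" if has_extent else "point", loc))
    return markers


cruise = MapInput(n_events=12000, n_track_points=800, has_all_three_geocodes=True, open_access=True)
print(show_event_markers(cruise), show_track(cruise))  # False True
```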

What is not shown in the map?

  • If no event is present in the dataset, no map is shown.
  • The map doesn't show the position of each individual sample or measurement unless these represent individual events. For example, if samples were taken at different locations within a single event, only the position of the event is shown, even though the location changes. An exception is the visualization of tracks as a yellow line (see above).
  • Events located in polar areas might not be shown in the map, because the preview map doesn't allow changing the projection.