Data Usage Statistics

From PANGAEA Wiki

On the webpages of published datasets, PANGAEA offers its users an aggregated statistical evaluation of past interactions with related content. The aggregations are generated with rigorous attention to data protection, in accordance with the EU General Data Protection Regulation (GDPR). No personal data is included at any stage, which strictly preserves the anonymity of our users. See our privacy policy for more details.

The usage statistics of a dataset are displayed as a small widget at the top of its landing page, just below the dataset's citation and next to the options for downloading the citation in various formats and the (optional) “Altmetrics Usage” widget.

These statistical evaluations are inspired by the COUNTER Code of Practice (https://www.projectcounter.org/), and PANGAEA tries to follow its guidelines as closely as possible. However, they do not offer the full range of COUNTER functionality and are therefore not directly comparable with COUNTER-compliant reports.

The provided statistical surveys include the following parameters (where applicable):

  1. “Views” represent visits by individual users to the webpages of datasets
  2. “Data Views” count user accesses to the browser visualization of tabular data that has been ingested into the PANGAEA database system
  3. “Downloads” count downloads of these tabular datasets from the PANGAEA system

Below we provide a brief summary of the most important aspects of how these usage statistics work and where their limits lie. Each point is described in more detail in the subsequent sections.

Summary:

  • no usage records exist earlier than April 2012
  • for technical reasons, only manual, browser-based interactions are included in the statistical analyses of data usage
  • any kind of machine-to-machine interaction with PANGAEA content (e.g. script-based access) is thus explicitly excluded from the analysis
  • only access to tabular data ingested into the PANGAEA database system is counted
  • any interactions with binary file formats deposited on PANGAEA or linked resources (e.g. file storage systems) are not part of the aggregations due to technical restrictions
  • statistical aggregations are calculated once per day
  • the statistical evaluations are fully anonymized and obey European data protection regulations
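To make the inclusion rules above more concrete, here is a minimal Python sketch of a filter that decides whether a single interaction would be counted. The `UsageEvent` record, its fields and the `is_countable` function are hypothetical illustrations, not PANGAEA's actual implementation; only the rules themselves (the April 2012 cutoff, browser-only access, and tabular data only for “Data Views” and “Downloads”) come from this page.

```python
from dataclasses import dataclass
from datetime import datetime

# Start of usage records, per this page (no records before April 2012).
RECORDS_START = datetime(2012, 4, 1)


@dataclass(frozen=True)
class UsageEvent:
    """One interaction with a dataset (hypothetical record layout)."""
    timestamp: datetime
    kind: str          # "view", "data_view" or "download"
    via_browser: bool  # False for script-based, machine-to-machine access
    tabular: bool      # True if the dataset is tabular data in the PANGAEA database


def is_countable(event: UsageEvent) -> bool:
    """Apply the inclusion rules from the summary list."""
    if event.timestamp < RECORDS_START:
        return False  # nothing before April 2012 is counted
    if not event.via_browser:
        return False  # machine-to-machine access is excluded
    if event.kind in ("data_view", "download"):
        # binary files and linked resources are not counted
        return event.tabular
    return event.kind == "view"
```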

What is displayed in the “usage statistics”?

Example of usage statistics (https://doi.org/10.1594/PANGAEA.926582)

The data usage statistics provided for individual datasets published on PANGAEA comprise “Views”, “Data Views” and “Downloads”.

“Views” correspond to the number of (human) visits to the dataset's landing page, on which its metadata is presented. Visits are recorded per individual browser identity, so repeated views and downloads from the same browser identity (page refreshes, repeated access) are not counted again.
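The de-duplication described above can be sketched as counting distinct browser identities per dataset and statistic. This is a minimal Python illustration under assumed inputs: the `(dataset_id, kind, browser_id)` tuples and the `count_unique` function are hypothetical, and `browser_id` stands for an anonymous identifier rather than personal data.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable


def count_unique(events: Iterable[tuple[str, str, str]]) -> dict[tuple[str, str], int]:
    """Count each (dataset, statistic) pair once per browser identity.

    `events` holds hypothetical (dataset_id, kind, browser_id) tuples, where
    kind is "view", "data_view" or "download" and browser_id is an anonymous
    browser identity.
    """
    seen: dict[tuple[str, str], set[str]] = defaultdict(set)
    for dataset_id, kind, browser_id in events:
        # Adding an identity that is already in the set changes nothing,
        # so page refreshes and repeated access are not counted again.
        seen[(dataset_id, kind)].add(browser_id)
    return {key: len(ids) for key, ids in seen.items()}
```

For example, two page refreshes by the same browser plus one visit from another browser yield a “Views” count of 2.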

“Data Views” are the aggregated counts of users who displayed the data matrix in their browser (i.e., made use of the “View dataset as HTML” feature), provided this view exists for the given dataset (see remarks on limitations below).

“Downloads” refer to the number of browser-based download attempts of a dataset by users since its creation. This is confined to data that was ingested into PANGAEA’s relational database system (“Download dataset as tab-delimited text”, or “Download ZIP file containing all datasets as tab-delimited text” for bundled publications and publication series).

Graphical representation:

Clicking on the widget opens a bar chart of cumulative usage over the lifetime of the dataset, from the first detected click (April 2012 or later) to the end of the previous month. The time span covered by each bar also depends on this lifetime and is set dynamically to 1, 2, 3, 4, 6 or 12 months.
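This page does not state exactly how the bar width is derived from the dataset's lifetime. One plausible rule, sketched below in Python purely as an assumption, picks the narrowest of the listed widths that keeps the chart within a maximum number of bars (`MAX_BARS = 24` is a hypothetical value), and then accumulates the monthly counts into cumulative bars. `bar_width` and `cumulative_bars` are illustrative names only.

```python
from __future__ import annotations

from itertools import accumulate

# Bar widths in months, as listed on this page.
BAR_WIDTHS = (1, 2, 3, 4, 6, 12)
# Hypothetical limit on the number of bars; not documented by PANGAEA.
MAX_BARS = 24


def bar_width(lifetime_months: int, max_bars: int = MAX_BARS) -> int:
    """Return the narrowest listed width that keeps the chart within max_bars bars."""
    for width in BAR_WIDTHS:
        n_bars = -(-lifetime_months // width)  # ceiling division
        if n_bars <= max_bars:
            return width
    return BAR_WIDTHS[-1]  # very long lifetimes fall back to yearly bars


def cumulative_bars(monthly_counts: list[int], width: int) -> list[int]:
    """Group monthly counts into bars of `width` months and accumulate them."""
    per_bar = [
        sum(monthly_counts[i:i + width])
        for i in range(0, len(monthly_counts), width)
    ]
    return list(accumulate(per_bar))
```

Under this assumed rule, a dataset with 30 months of usage would be shown in 2-month bars, while one with 200 months would use 12-month bars.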

If no data, or not enough data, is available for the aggregations yet (for example, because the dataset was released only a few days ago), the widget and chart may not be displayed at all or may not contain all statistical parameters. The displayed information is updated once per day.

What are its limitations?

Because usage records from before April 2012 are incomplete and therefore of very limited use, we have decided not to provide usage statistics for the period before this date.

With regard to “Data Views” and “Downloads”, there is also the technical restriction that data usage can only be captured statistically when a user clicks on the download link. Therefore, access to other file deposits (e.g. NetCDF, media files and any other form of binary data), whether on PANGAEA file systems or on linked resources, cannot be included in the statistics. In addition, users who visit the PANGAEA website with scripting disabled or privacy add-ons enabled, or who have opted out of Matomo Analytics, cannot be counted. Finally, successful and unsuccessful downloads (failed download, failed user login, etc.) cannot be distinguished.

Disclaimer

The usage statistics presented on dataset landing pages are a service of PANGAEA to its users. This service is provided “as is”, and although the statistics are prepared according to scientific principles and with the greatest possible care, PANGAEA assumes no liability for their completeness and accuracy.

Please understand that we do not answer inquiries about the statistics of individual datasets and cannot enter into discussions about the validity or meaningfulness of particular values. We therefore kindly ask you to refrain from such inquiries.