Data policy

Data policy for using the information system PANGAEA as an Open Access archive, data library and publishing system, operated by the Alfred Wegener Institute for Polar and Marine Research (AWI), Bremerhaven, and the Center for Marine Environmental Sciences (MARUM), Bremen, Germany

The aim of this policy is to facilitate the operation of the information system PANGAEA - Publishing Network for Geoscientific & Environmental Data and its use by the research community. This policy recognises the benefits of providing free and open access to good-quality data from the earth and environmental sciences for future use in global change studies, research projects, and operational services such as portals, search engines and library catalogs. The operating institutes encourage the widest possible use of the PANGAEA library in order to realise its full potential value.

Principles

The guiding principle of PANGAEA - Publishing Network for Geoscientific & Environmental Data is free and open access to its content for the research and education communities in non-commercial activities. This is in line with the data policies of the IOC, the WDC system and the OECD. For any data provided by PANGAEA, the format and content of a data set must ensure its widest and easiest use by the scientific community. PANGAEA is open to any scientist for data archiving. Other projects and investigators are encouraged to add any relevant data to the PANGAEA library. Users of data from PANGAEA are urged to cite the data set properly and/or quote the related reference.

Data provision

Data archiving includes:
 * 1) Metadata(*) of expeditions, stations, samples and activities;
 * 2) Scientific primary data from (a) archives/existing collections, (b) expeditions/monitoring, (c) publications;
 * 3) Metadata related to the primary data of item 2 (authors, PI, reference, method, comment, ...);
 * 4) Products resulting from compilations and interpretations of primary data.

Chief scientists are requested to send cruise reports, including a station list, to the project management office. Station labels as published in the station list of the cruise report must remain unchanged whenever they are used in data submissions or publications. The data librarian maintains a dictionary of parameter definitions with units, which serves as the agreed standard for all project data. Parameters are grouped into categories according to their scientific field. Data submissions must use the parameters and units as defined in the dictionary. New parameters are defined by the data librarian on request.
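To illustrate the requirement that submissions use only dictionary parameters and units, here is a minimal Python sketch of a pre-submission check. It assumes, hypothetically, that the dictionary is available as a simple mapping from parameter name to unit and that submitted column headers follow a `Name [unit]` pattern; the dictionary entries, the header convention and the function name `check_headers` are illustrative only and are not defined by this policy.

```python
import re

# Hypothetical excerpt of the parameter dictionary (name -> agreed unit).
# The real dictionary is maintained by the data librarian; these entries
# are placeholders for illustration only.
PARAMETER_DICTIONARY = {
    "Depth, water": "m",
    "Pressure, water": "dbar",
    "Salinity": "",  # empty string: parameter defined without a unit
}

# Assumed header convention: "Parameter name [unit]", with the unit optional.
HEADER_PATTERN = re.compile(r"^(?P<name>.+?)\s*(?:\[(?P<unit>[^\]]*)\])?$")


def check_headers(headers):
    """Return a list of problems found in submitted column headers."""
    problems = []
    for header in headers:
        match = HEADER_PATTERN.match(header.strip())
        if match is None:  # e.g. an empty header
            problems.append(f"unparseable column header {header!r}")
            continue
        name = match.group("name")
        unit = match.group("unit") or ""
        if name not in PARAMETER_DICTIONARY:
            problems.append(f"unknown parameter {name!r}: request a definition")
        elif unit != PARAMETER_DICTIONARY[name]:
            expected = PARAMETER_DICTIONARY[name]
            problems.append(f"unit mismatch for {name!r}: got {unit!r}, expected {expected!r}")
    return problems


if __name__ == "__main__":
    submitted = ["Depth, water [m]", "Pressure, water [bar]", "Chlorophyll a [mg/m**3]"]
    for problem in check_headers(submitted):
        print(problem)
```

Unknown parameters are reported rather than silently accepted, mirroring the rule that new parameters are defined by the data librarian on request.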

Data are archived in a relational database, georeferenced in space and time. If a data set is very large or must be kept in a proprietary format, it is archived as a binary object in a file system, with only a metadata description linked to the file. As soon as data become available and are validated, providers are urged to submit them in the import format. Any type of data must always be accompanied by a description (metadata) that allows future users to understand and process the data at any time. The granularity and format of data sets are to be agreed between the principal investigator (PI) and the data librarian. The standard export format is tab-delimited ASCII, headed by metadata fields according to the ISO 19115, GCMD DIF and Dublin Core standards.
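An export of this kind (a metadata header followed by a tab-delimited table) can be read with the Python standard library. The sketch below assumes, for illustration only, that the metadata header is enclosed between a line starting with `/*` and a line consisting of `*/`, and that the first row of the table holds the column headers; the policy does not fix these delimiters, so check an actual export before relying on them. The function name `read_tab_export` is hypothetical.

```python
import csv
import io


def read_tab_export(text):
    """Split an export into (metadata_lines, column_headers, data_rows).

    Assumes (for illustration) that the metadata header is wrapped in a
    line starting with '/*' and a line consisting of '*/', followed by a
    tab-delimited table whose first row holds the column headers.
    """
    lines = text.splitlines()
    metadata, start = [], 0
    if lines and lines[0].startswith("/*"):
        for i, line in enumerate(lines[1:], start=1):
            if line.strip() == "*/":
                metadata, start = lines[1:i], i + 1
                break
        else:
            raise ValueError("metadata header is not closed with '*/'")
    # QUOTE_NONE: treat every field as plain text separated by tabs.
    reader = csv.reader(io.StringIO("\n".join(lines[start:])),
                        delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = [row for row in reader if row]  # skip blank lines
    if not rows:
        return metadata, [], []
    return metadata, rows[0], rows[1:]


if __name__ == "__main__":
    # Hypothetical sample, not a real PANGAEA data set.
    sample = (
        "/* DATA DESCRIPTION:\n"
        "Citation:\t(hypothetical example)\n"
        "*/\n"
        "Depth, water [m]\tSalinity\n"
        "10\t35.1\n"
        "20\t35.2\n"
    )
    metadata, headers, rows = read_tab_export(sample)
    print(headers)  # ['Depth, water [m]', 'Salinity']
    print(len(rows))  # 2
```

Keeping the metadata lines separate from the table reflects the policy's requirement that data always travel together with their description.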

Quality assurance

Data submitted for archiving must be properly documented; the documentation is archived together with each data set. The scientific quality always remains the responsibility of the PI or the authors. Fields for documenting it, such as quality flags for single values, adjustable precision and method descriptions, are available in the PANGAEA data model. Technical quality control, i.e. completeness of metadata, consistency of formats and correctness of downloads, is the responsibility of the data managers. After import, the PI or authors are requested to proofread the data sets online and submit any corrections to the data manager.

Access and publication

The project data management provides an up-to-date list of publications with links to the related data. Any scientific primary data(*) related to a publication shall be submitted to the data management at the same time as the manuscript is submitted to the editor. In return, authors will receive a persistent identifier (DOI, Digital Object Identifier) for each data set, which can be cited in the publication. Likewise, one or more data sets can be made citable through a reference added to a public library catalog, and will receive a DOI. Such data publications may also be added to personal or project publication lists.
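Because each data set receives a DOI, a citation can link to it through the public DOI resolver at https://doi.org/. The following minimal Python sketch normalises a DOI written in common forms (bare, `doi:`-prefixed, or already as a resolver URL) into a resolver link. The function name `doi_to_url` is hypothetical, and the DOI in the example is a placeholder, not a real data set.

```python
# Common written forms that may precede the DOI name itself.
_DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def doi_to_url(doi):
    """Return an https://doi.org/ resolver link for a DOI in a common written form."""
    value = doi.strip()
    for prefix in _DOI_PREFIXES:
        if value.lower().startswith(prefix):
            value = value[len(prefix):].strip()
            break
    # Every DOI name starts with the directory indicator "10.".
    if not value.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    return "https://doi.org/" + value


if __name__ == "__main__":
    # Hypothetical DOI used only for illustration.
    print(doi_to_url("doi:10.1234/example.5678"))  # https://doi.org/10.1234/example.5678
```

Linking through the resolver rather than a repository-specific address keeps the citation valid even if the landing page of the data set moves.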

Higher-level data products(*) in electronic form can also be archived in PANGAEA and will receive a persistent identifier and citation on request. Partner institutes and data providers agree that data archived in PANGAEA are made publicly available on the Internet through appropriate technical means (e.g. portals, search engines, library catalogs, GIS) without further notification. Unpublished data are password-protected by default; published data are password-protected for a moratorium period on request. Providers may withdraw data from the archive as long as they have not been published. Metadata are always freely accessible. In accordance with EU data policy, all data collected during the lifetime of the project are made public two years after the project ends; other arrangements may be agreed between the coordinator, partners and funding organization. Following recommendations of the EU (Colour of Ocean Data Symposium, Brussels 2002), metadata are archived only in relation to available factual data. The metadata alone may be mirrored to other systems, such as the Global Change Master Directory (GCMD).

Operation

Long-term availability of the data is ensured by AWI and MARUM, which are responsible for the technical operation and the consistency of the content. Backup of the data inventory is the responsibility of the AWI computer center, with daily incremental and weekly full backups to two mirrored tape archives located in different buildings. Data are passed from the work packages via the project's data managers to the archiving facility; this data flow is monitored by the project management and supervised by the data librarian. Persistent identification, data publication and widespread distribution are provided by the networking functionality and web services of PANGAEA.

''(*) Depending on the level of processing, scientific (or factual) data can be divided into raw data, primary data and secondary data. Raw data are provided by a measuring system and are unprocessed; scientific primary data result from the processing of raw data and are the basis for scientific interpretations and publications. Primary data have the highest priority for archiving; the related raw data files may be added if appropriate. Secondary data are higher-level products resulting from compilations and interpretations of primary data, e.g. maps, profiles, statistics, graphics, models or any material produced for education and outreach. All information describing any of these three data types is metadata.''

This text may be used as a draft for project-specific data policies. A policy for using PANGAEA in marine research projects may be found at.