Data policy

The aim of this data policy is to facilitate operation and use of the information system PANGAEA - Publishing Network for Geoscientific & Environmental Data by the research community as an Open Access archive, library and publishing system for data from earth system research. This policy recognises the benefits of providing free and open access to good-quality data from earth and environmental sciences for future use in global change studies, research projects, and operational services such as portals, search engines and library catalogs. PANGAEA is operated by the Alfred Wegener Institute for Polar and Marine Research (AWI), Bremerhaven, and the Center for Marine Environmental Sciences (MARUM), Bremen, Germany, for the benefit of the scientific community. The operating institutes encourage the widest possible use of PANGAEA as a data library, in order to best realise its potential value.

Principles
The guiding principle of PANGAEA is free and open access to its content by research and education communities in non-commercial activities. This is in line with the data policies of the IOC, the WDC System of ICSU and the OECD. For any data provided by PANGAEA, the format and description of a data set must ensure its most widespread and easiest use by the scientific community.

Operation
Long-term availability (>10 years) of data in PANGAEA is ensured through a commitment of the institutions AWI and MARUM, which are responsible for the technical quality, operation and consistency of the content. Persistent identification, data publication and widespread distribution are performed through networking functionality and web services on the Internet using international standards. Daily incremental backups and weekly full backups in two mirrored tape drive archives (capacity >1 PB), located in different buildings 1 km apart, ensure data safety and integrity.
 * Projects, institutes as well as individual scientists are encouraged to upload any relevant data for long-term archiving.
 * Users who have downloaded data are urged to properly use the data set citation and/or quote the related reference.

The data flow is organized from the PI via the data curator of the project or institute to the archive for upload. Data availability should be monitored by the project or institute management and reviewed by the PI. Data curators are supervised by the data librarian. Individual scientists may send data directly to the librarian for archiving.

Data provision for upload
Data archiving includes:
 * 1) Metadata(*) of expeditions, stations, samples and activities;
 * 2) Scientific primary data from (a) archives/existing collections, (b) expeditions/monitoring, (c) publications;
 * 3) Metadata related to the primary data in 2) (authors, PI, reference, method, comment...);
 * 4) Products resulting from compilations and interpretations of primary data.

Chief scientists are requested to send cruise reports including a station list to the project management office. Station labels as published in the station list of the cruise report must remain unchanged whenever they are used in data submissions or publications. The data librarian maintains a dictionary of parameter definitions with units, to be used as the agreed standard for all project data. Parameters are grouped into categories according to their related scientific field. Data submissions are required to use parameters and units as defined in the dictionary. New parameters are defined by the data librarian on request.
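
The dictionary rule above can be illustrated with a minimal Python sketch that checks the parameters and units of a submission against an agreed dictionary. The dictionary entries, the category names and the function `check_columns` are hypothetical and serve only as an illustration; they are not taken from the PANGAEA parameter dictionary.

```python
# Hypothetical excerpt of a parameter dictionary: name -> (unit, category).
# The entries are illustrative only and are not taken from PANGAEA.
PARAMETER_DICTIONARY = {
    "Temperature, water": ("deg C", "Oceanography"),
    "Salinity": ("", "Oceanography"),
    "Depth, sediment": ("m", "Geology"),
}


def check_columns(columns):
    """Return a list of problems for (parameter, unit) pairs in a submission."""
    problems = []
    for name, unit in columns:
        if name not in PARAMETER_DICTIONARY:
            problems.append(f"{name}: not in dictionary, request a new definition")
        elif PARAMETER_DICTIONARY[name][0] != unit:
            expected = PARAMETER_DICTIONARY[name][0]
            problems.append(f"{name}: unit {unit!r} differs from agreed unit {expected!r}")
    return problems


# Example submission with one wrong unit and one unknown parameter.
submission = [("Temperature, water", "K"), ("Salinity", ""), ("Chlorophyll a", "ug/l")]
for problem in check_columns(submission):
    print(problem)
```

In this sketch an unknown parameter is reported rather than silently accepted, which mirrors the rule that new parameters are defined by the data librarian on request.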

Data are archived in a relational database, georeferenced in space and time; if a data set is very large or, for format reasons, must remain in a proprietary format, it is archived as a binary object in a file system, and only a metadescription linked to the file is stored in the database. As soon as data become available and are validated, the providers are urged to submit the data in agreement with the import format. Any type of data must always be accompanied by a description (metadata) allowing future users to understand and process the data at any time. The granularity and format of data sets have to be defined in agreement between the principal investigator (PI) and the data librarian. The export format in principle is tab-delimited ASCII, headed by metadata fields according to ISO19115, GCMD-DIF and DublinCore standards.
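
As an illustration of the export format described above, the following Python sketch reads a tab-delimited ASCII file that is headed by a block of metadata fields. The `/* ... */` header delimiters, the field names and the sample values are assumptions made for this example; the policy text itself only states that metadata fields precede the tab-delimited data.

```python
import csv
import io

# Hypothetical export: the "/* ... */" header delimiters and the field names
# are assumptions for illustration, not a specification of the real format.
SAMPLE_EXPORT = """/* DATA DESCRIPTION:
Citation:\tDoe, J (2005): Example data set
Parameter(s):\tDEPTH, water [m]; Temperature, water [deg C]
*/
Depth water [m]\tTemp [deg C]
10\t4.2
20\t3.9
"""


def parse_export(text):
    """Split an export into a metadata dict and a list of data rows (dicts)."""
    metadata = {}
    body = text
    if text.startswith("/*"):
        # Everything before the closing "*/" is the metadata header.
        header, _, body = text.partition("*/")
        for line in header.splitlines()[1:]:
            key, sep, value = line.partition("\t")
            if sep:
                metadata[key.rstrip(":")] = value.strip()
    # The remainder is a tab-delimited table whose first line names the columns.
    reader = csv.DictReader(io.StringIO(body.lstrip("\n")), delimiter="\t")
    return metadata, list(reader)


metadata, rows = parse_export(SAMPLE_EXPORT)
print(metadata["Citation"])
print([float(row["Temp [deg C]"]) for row in rows])
```

Keeping the metadata and the data table in one file means a downloaded data set stays self-describing, which is the intent of the requirement that data always be accompanied by a description.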

Quality assurance
Data submitted for archiving have to be documented properly; documentation is archived together with each data set. The scientific quality is always the responsibility of the PI or the authors. Fields for documenting it, such as quality flags for single values, adjustable precision and documentation of methods, are available in the PANGAEA data model. Technical quality control, i.e. completeness of metadata, consistency of formats and correctness of download, is the responsibility of the data managers. After import, the PI or authors are requested to proofread the data sets on the Internet and submit corrections to the data manager.

Access and Publication
The project data management provides an up-to-date list of publications with links to the related data. Any scientific primary data(*) related to publications shall be submitted to the data management at the same time as the manuscript is submitted to the editor. In return, authors will receive a persistent identifier (DOI, Digital Object Identifier) for each data set, which can be cited in the publication. Likewise, one or more data sets can be made citable with a reference added to a public library catalog and will receive a DOI. Those data publications may also be added to personal or project publication lists.

Higher-level data products(*) in electronic form can also be archived through PANGAEA and will receive a persistent identifier and citation on request. Partner institutes and data providers agree that data archived in PANGAEA are made publicly available through appropriate technical setups on the Internet (e.g. portals, search engines, library catalogs, GIS) without further notification. Unpublished data are password protected by default; password protection for published data is set on request for a moratorium period. Providers may decide to withdraw data from the archive as long as they are not published. Metadata are always freely accessible. According to EU data policy, all data collected during the lifetime of the project are made public two years after the termination of the project; regulations may differ in agreement between coordinator, partners and funding organization. Following recommendations of the EU (Colour of Ocean Data Symposium, Brussels 2002), metadata are archived only in relation to available factual data. Metadata alone may be mirrored to other systems like the Global Change Master Directory (GCMD).
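
The access rules above can be summarised in a short Python sketch. The class `DataSet`, its fields and the functions below are hypothetical; in particular, treating the two-year rule as overriding any other protection is an assumption, since the policy allows differing regulations by agreement.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DataSet:
    published: bool
    moratorium_until: Optional[date] = None  # set on request for published data
    project_end: Optional[date] = None


def add_years(day, years):
    """Shift a date by whole years, mapping 29 February to 28 February if needed."""
    try:
        return day.replace(year=day.year + years)
    except ValueError:
        return day.replace(year=day.year + years, day=28)


def data_is_open(ds, today):
    """Return True if the data (not just the metadata) are freely accessible."""
    # Assumption: the two-year rule overrides other protection once it applies.
    if ds.project_end is not None and today >= add_years(ds.project_end, 2):
        return True
    if not ds.published:
        return False  # unpublished data are password protected by default
    if ds.moratorium_until is not None and today < ds.moratorium_until:
        return False  # published, but under a requested moratorium
    return True


def metadata_is_open(ds):
    """Metadata are always freely accessible."""
    return True


example = DataSet(published=True, moratorium_until=date(2011, 1, 1))
print(data_is_open(example, date(2010, 6, 1)), metadata_is_open(example))
```

A project with a different agreement between coordinator, partners and funding organization would need to replace the two-year check with its own rule.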

''(*) Depending on the level of processing, scientific primary (or factual) data can be differentiated into raw data, primary data and secondary data. Raw data are provided by a measuring system and are unprocessed; scientific primary data result from the processing of raw data and are the basis for scientific interpretations and publications. Primary data have the highest priority for archiving; the related raw data files may be added if appropriate. Secondary data are higher-level products resulting from compilations and interpretations of primary data, e.g. maps, profiles, statistics, graphics, models or any material produced for education and outreach. All information describing any of these three data types is metadata.''

This text may be used as a draft for project-specific data policies. A policy for using PANGAEA in marine research projects may be found at.