Intern:Project data management

A curator, responsible for the data management in a research project, should keep the partners informed about the work package, commitments and responsible persons. The following hints may help:


 * Keep project members informed about the system, the data flow and the availability of the data.
 * Explain possible access right definitions and password protection.
 * Explain how proper citation and persistent identification of data is assured.
 * Avoid high-end project web pages.
 * Concentrate on data collection, technical quality, complete metainformation, import and availability.
 * Define a data policy in agreement with the partners.
 * Define the goals of the project data management: availability, long-term archiving, publication, compilations ...
 * Define a dynamic, project-specific list of parameters with units on the project web pages.
 * Collect any metainformation related to the project (addresses of institutions, scientists with email and phone number) and provide dynamic lists through the project web pages.
 * Keep track of
    * data providers, parameters, amount and date of delivery; keep in contact and ensure proper data flow and collection; personal meetings might be better than email or phone,
    * activities, e.g. expeditions and cruises; import sample or station lists when the expedition report is available; keep the list of campaigns and events up to date,
    * publications; provide a dynamic list of project-related publications with links to data,
    * existing data useful to the project, and import them.
 * Ensure consistent usage of event labels throughout the project!
 * List possible groups/PIs with related data contributions on a web page and provide the status of availability.
 * Give feedback to data contributors about the availability of their data; always ask them to proofread.
 * The definition of the data flow should include
    * identification of existing and new data,
    * quality control and proofreading,
    * formats for import,
    * information about citation and DOI.
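
The tracking of deliveries and the check for consistent event labels can be supported by a small script. The following is only a minimal sketch in Python, not part of PANGAEA: the record fields, the function names and the example labels are all hypothetical, and it assumes that two labels count as "the same" when they differ only in upper/lower case or whitespace.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class Delivery:
    """One data delivery from a project partner (field names are hypothetical)."""
    provider: str
    parameter: str
    event_label: str
    n_values: int
    delivered: date


def normalize_label(label: str) -> str:
    """Comparison key for an event label: ignore case and whitespace (an assumption)."""
    return "".join(label.split()).upper()


def inconsistent_labels(deliveries):
    """Return groups of event labels that are spelled differently for the same key."""
    variants = defaultdict(set)
    for d in deliveries:
        variants[normalize_label(d.event_label)].add(d.event_label)
    return {key: sorted(v) for key, v in variants.items() if len(v) > 1}


def values_per_provider(deliveries):
    """Sum the number of delivered values for each data provider."""
    totals = defaultdict(int)
    for d in deliveries:
        totals[d.provider] += d.n_values
    return dict(totals)


# Made-up example log; providers and labels are placeholders.
log = [
    Delivery("Partner A", "Temperature", "ST01-1", 120, date(2006, 5, 2)),
    Delivery("Partner B", "Salinity", "st01-1", 80, date(2006, 6, 1)),
    Delivery("Partner A", "Salinity", "ST02-1", 40, date(2006, 6, 3)),
]
print(inconsistent_labels(log))
print(values_per_provider(log))
```

Any label group reported by `inconsistent_labels` should be resolved with the data providers before import, so that all data of one event end up under a single label.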

Proposal draft text
Either of the following texts may be used in project proposals to describe the participation of Pangaea and its role in project data management:

 * The data of this project will be archived, published and distributed through the World Data Center for Marine Environmental Sciences (WDC-MARE) using the information system PANGAEA (http://www.pangaea.de).
 * The information system PANGAEA (http://www.pangaea.de) will be used for the management of the project data. The system is aimed at archiving, distributing and publishing georeferenced scientific primary data related to the marine environment, to climatic variability and to the solid earth. Pangaea is a long-term operated archive comparable to a library. It can be used as a scientific tool to support the interpretation of comprehensive data collections. Technical operation is ensured by the Alfred Wegener Institute for Polar and Marine Research (AWI) and the Center for Marine Environmental Sciences (MARUM) through a long-term commitment.

Data are stored in a consistent format with related meta-information in a relational database. The data exchange between project partners and the availability of products and published data to the scientific community will be established through a client/server system and web services on the Internet providing data in ISO and XML standard formats.

The system operates a dictionary of 50,000+ parameters from earth system research; new parameters required for the project can be defined at any time. A full georeference in time and space allows the extraction of individually configured subsets of data from the inventory for further processing. To exchange unpublished data during the runtime of the project, data sets may be password protected. Long-term publication and distribution of data is defined in the data policy.

A dedicated data curator will operate at ???, considering the proximity of data sources, participants of the project and users in the wider scientific community. After the termination of the project and a final moratorium of ... (definition of the project) months, data will become public information, accessible through the project's web pages, web services and access clients of Pangaea. For marine research projects the official host is the World Data Center for Marine Environmental Sciences (http://www.wdc-mare.org).

The project's web pages will be operated by the data curator and will provide a service to all project participants, containing details of the established procedures for sampling and calibration to ensure that the merged data sets are both internally and externally compatible (quality assurance). The web pages will be the basis for the internal workflow and will include the gateway to the data.

Project data page
(draft text for use on project web pages to inform about data management)

Motivation One of the major sustainable outcomes of any research project, besides publications, is its data. The availability of publications is well established through libraries, but the availability of data can only be achieved if (1) data are provided by the principal investigators to an established data archive/center and (2) the data center ensures easy access, widespread distribution to search engines and portals, and a full citation (in the bibliographic sense) including a persistent identifier. In particular, as a basis for any future scientific work, a research project is urged to add its scientific primary data to an archive, including a description (metadata) following technical and scientific standards.

Following the work program and deliverables as outlined in the contract, (1) existing scientific primary data of relevance to the project and (2) new data produced through the project's research should be archived. This includes data resulting from expeditions (on-board as well as post-cruise measurements), documentation, and data related to publications.

Access This project uses the information system PANGAEA as an archive and publisher for any georeferenced data from the project study areas. PANGAEA is operated as an electronic library that meets the technical and scientific requirements described above, and it is the technical system of the data center WDC-MARE, the project's partner. As an access client for queries, the search engine PangaVista is recommended. The total number of data sets of the project archived so far is listed with the dynamic query http://www.pangaea.de/PangaVista?query=projectlabel:hermes

Data provision Following the data model of the system, the metadata for cruises and stations have to be entered first. Before the factual data are archived, the metadata related to the data have to be defined. In a final step, the factual data are imported and the relations between metadata fields and data are set. In principle, data submission follows this order:

 1. cruise report with station list
 2. shipborne data
 3. post-cruise data
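
For curators adapting this text, the order of the three steps can be illustrated with a toy example. The following Python sketch is not the PANGAEA data model or its import interface; the class, its methods and the example names are hypothetical and only show why station metadata must exist before data set metadata, and data set metadata before the factual data.

```python
from dataclasses import dataclass, field


@dataclass
class Archive:
    """Toy in-memory stand-in for an archive (hypothetical, for illustration only)."""
    events: set = field(default_factory=set)       # step 1: cruise/station metadata
    datasets: dict = field(default_factory=dict)   # step 2: metadata of each data set
    values: dict = field(default_factory=dict)     # step 3: factual data per data set

    def add_event(self, label):
        """Step 1: register a station (event) from the station list."""
        self.events.add(label)

    def define_dataset(self, name, event_label, parameters):
        """Step 2: define data set metadata; the event must already be known."""
        if event_label not in self.events:
            raise ValueError(f"unknown event {event_label!r}: enter station metadata first")
        self.datasets[name] = {"event": event_label, "parameters": list(parameters)}

    def import_data(self, name, rows):
        """Step 3: import rows (dicts keyed by parameter) for a defined data set."""
        if name not in self.datasets:
            raise ValueError(f"unknown data set {name!r}: define its metadata first")
        expected = set(self.datasets[name]["parameters"])
        for row in rows:
            if set(row) != expected:
                raise ValueError(f"row {row!r} does not match parameters {sorted(expected)}")
        self.values[name] = list(rows)


# Made-up example following the order of the list above.
archive = Archive()
archive.add_event("ST01-1")
archive.define_dataset("ctd_profile", "ST01-1", ["Depth", "Temperature"])
archive.import_data("ctd_profile", [{"Depth": 10.0, "Temperature": 4.2}])
```

Calling `define_dataset` or `import_data` out of order raises a `ValueError`, which mirrors the reason why the cruise report with its station list is requested first.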

Examples Data are stored georeferenced in a relational database together with their metainformation. Each data set exported from the system is technically formatted and thus has a consistent format, provided as an HTML web page or for download as a text file. To give the user an impression of the content, format and access to data in PANGAEA, example links to data sets from the OMARC cluster (the predecessor of HERMES) are provided.
 * Parameter dictionary

Contact The responsible data curators and contact for data submission are
 * Hannes Grobe, data librarian of PANGAEA at AWI/WDC-MARE - mailto:hgrobe@pangaea.de
 * Veith Huehnerbach, geologist at NOCS - mailto:vhh@noc.soton.ac.uk
 * Ingo Schewe, biologist at AWI - mailto:ischewe@awi-bremerhaven.de