Intern:Project data management

A curator, responsible for the data management in a research project, should keep the partners informed about the content of the work package, data flow, commitments and responsible persons. The following hints may help:


 * Keep project members informed about the system, the data flow and the availability of the data.
 * Explain possible access right definitions and password protection.
 * Explain how proper citation and persistent identification of data are ensured.
 * Avoid high-end project web pages.
 * Concentrate on data collection, technical quality, complete metainformation, import and availability.
 * Define a data policy in agreement with the partners.
 * Define the goals of the project data management: availability, long-term archiving, publication, compilations ...
 * Define a dynamic, project-specific list of parameters with units on the project web pages.
 * Collect any metainformation related to the project (addresses of institutions, scientists with email and phone number) and provide dynamic lists through the project web pages.
 * Keep track of
   * data provider, parameter, amount and date of delivery; keep in contact and ensure proper data flow and collection; personal meetings might be better than email or phone,
   * activities, e.g. expeditions, cruises; import sample or station lists when the expedition report is available; keep the list of campaigns and events up to date,
   * publications; provide a dynamic list of project-related publications with links to data,
   * existing data useful to the project; import them.
 * Ensure consistent use of event labels throughout the project!
 * List possible groups/PIs with their related data contributions on a web page and provide the status of availability.
 * Give feedback to data contributors about the availability of their data; always ask for proofreading.
 * The definition of the data flow should include
   * identification of existing and new data,
   * quality control and proofreading,
   * formats for import,
   * information about citation and DOI.
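
The tracking hints above (data provider, parameter, amount, date of delivery, event labels, proofreading) can be sketched as a small record structure. This is only an illustration under stated assumptions: the class `Delivery`, its field names and the two helper functions are hypothetical and are not part of PANGAEA.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class Delivery:
    """One data delivery as tracked by the curator (hypothetical layout)."""
    provider: str             # data provider (group or PI)
    parameter: str            # parameter name with unit
    n_values: int             # amount of data delivered
    delivered: date           # date of delivery
    event_label: str          # label of the event/station the data belong to
    proofread: bool = False   # has the contributor proofread the import?


def pending_proofread(deliveries):
    """Group deliveries that still await proofreading by provider."""
    pending = defaultdict(list)
    for d in deliveries:
        if not d.proofread:
            pending[d.provider].append(d)
    return dict(pending)


def unknown_event_labels(deliveries, known_labels):
    """Return event labels used in deliveries but missing from the event list."""
    return sorted({d.event_label for d in deliveries} - set(known_labels))
```

A curator could run `unknown_event_labels` against the current station list to spot inconsistent event labels before import, and use `pending_proofread` to decide whom to contact for feedback.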

Proposal draft text
The following text may be used in project proposals to describe the participation of Pangaea and its role in project data management:

 * The data of this project will be archived, published and distributed through the World Data Center for Marine Environmental Sciences (WDC-MARE) using the information system PANGAEA (http://www.pangaea.de).
or
 * The information system PANGAEA (http://www.pangaea.de) will be used for the management of the project data. The system is aimed at archiving, distributing and publishing georeferenced scientific primary data related to the marine environment, to climatic variability and to the solid earth. PANGAEA is operated as a long-term archive, comparable to a library. It can be used as a scientific tool to support the interpretation of comprehensive data collections. Technical operation is ensured by the Alfred Wegener Institute for Polar and Marine Research (AWI) and the Center for Marine Environmental Sciences (MARUM) through a long-term commitment.

Data are stored in a consistent format with related meta-information in a relational database. The data exchange between project partners and the availability of products and published data to the scientific community will be established through a client/server system and web services on the Internet providing data in ISO and XML standard formats.

The system operates a dictionary of 50,000+ parameters from earth system research; new parameters required for the project can be defined at any time. A full georeference in time and space allows the extraction of individually configured subsets of data from the inventory for further processing. To exchange unpublished data during the runtime of the project, data sets may be password protected. Long-term publication and distribution of data are defined in the data policy.

A dedicated data curator will operate at ???, considering the proximity of data sources, participants of the project and users in the wider scientific community. After the termination of the project and a final moratorium of ... (definition of the project) months, data will become public information, accessible through the project's web pages, web services and access clients of Pangaea. For marine research projects the official host is the World Data Center for Marine Environmental Sciences (http://www.wdc-mare.org).

The project's web pages will be operated by the data curator and will provide a service to all project participants, containing details of the established procedures for sampling and calibration to ensure that the merged data sets are both internally and externally compatible (quality assurance). The web pages will be the basis for the internal workflow and will include the gateway to the data.

Project data page draft text
Motivation Besides publications, data are one of the major outcomes of any research project and must be archived in a sustainable way. The availability of publications is well established through libraries, but the reliable availability of data can only be achieved if
 * 1) the data are provided by the principal investigators to an established data archive/center and
 * 2) the data center ensures long-term archiving, easy access, widespread distribution and a bibliographic citation.

As a basis for any future scientific work, a research project is also urged to add its scientific primary data to an archive, including a description (metadata) following technical and scientific standards.

Following the work program and deliverables as outlined in the contract,
 * 1) existing scientific primary data of relevance to the project and
 * 2) new data produced through the project's research should be archived.

This includes data resulting from expeditions, on-board as well as post-cruise measurements, documentation and data related to publications.

Data provision Following the Pangaea data model:
 * 1) The metadata for cruises and stations have to be entered into the system.
 * 2) Cruise report with station list is submitted to the PO and processed by the data management.
 * 3) Before archiving the factual data, the metadata related to the data have to be defined.
 * 4) The factual data are imported and the relations between metadata and data are set.
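
The ordering in the list above (event metadata first, then the metadata describing the data, then the factual data) can be illustrated with a toy in-memory model. This is a sketch under assumptions, not the Pangaea data model or its import interface: the class `ArchiveSketch` and all of its method and field names are hypothetical.

```python
class ArchiveSketch:
    """Toy in-memory stand-in; NOT the Pangaea data model or API."""

    def __init__(self):
        self.events = {}       # event label -> station metadata
        self.parameters = {}   # parameter name -> unit
        self.data = []         # (event label, parameter name, value)

    def add_event(self, label, latitude, longitude, date_time):
        # Cruise and station metadata are entered first.
        self.events[label] = {"lat": latitude, "lon": longitude, "time": date_time}

    def define_parameter(self, name, unit):
        # Metadata describing the data are defined before any import.
        self.parameters[name] = unit

    def import_data(self, label, name, values):
        # The import is refused unless the related metadata already exist,
        # so every value can be linked to an event and a parameter.
        if label not in self.events:
            raise KeyError(f"unknown event label: {label}")
        if name not in self.parameters:
            raise KeyError(f"undefined parameter: {name}")
        values = list(values)
        self.data.extend((label, name, v) for v in values)
        return len(values)
```

Rejecting an import whose event or parameter is unknown is one simple way to enforce that the relations between metadata and data can always be set.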

Data submission should follow this procedure in principle:
 * 1) Cruise report with station list is submitted to the PO and processed by the data management.
 * 2) Shipborne data are quality controlled and submitted to the data management.
 * 3) Post-cruise data ditto.
 * 4) Data related to a publication should be archived at the latest when the paper is submitted.

Data sets can be password protected on request.

Access This project uses the information system PANGAEA as an archive, publisher and library, providing the technical potential and meeting the scientific requirements described above. PANGAEA is the technical system of the ICSU World Data Center WDC-MARE, hosted by AWI and MARUM as project partners. The search engine PangaVista is recommended as an access client for individual queries on predefined data sets. The total number of HERMES data sets archived so far is listed by the query
 * http://www.pangaea.de/PangaVista?query=projectlabel:hermes
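
The query link above follows a simple pattern that can be generated for a project web page. The sketch below only rebuilds the URL string shown above; the function name `project_query_url` is hypothetical, and whether the endpoint accepts other project labels in the same way is an assumption.

```python
from urllib.parse import urlencode

# Base address taken from the query link above.
PANGAVISTA_URL = "http://www.pangaea.de/PangaVista"


def project_query_url(project_label):
    """Build a PangaVista link listing the data sets of one project.

    The 'projectlabel:' prefix is copied from the HERMES example; other
    labels are assumed, not confirmed, to work the same way.
    """
    # safe=":" keeps the colon readable instead of encoding it as %3A.
    query = urlencode({"query": f"projectlabel:{project_label}"}, safe=":")
    return f"{PANGAVISTA_URL}?{query}"


if __name__ == "__main__":
    print(project_query_url("hermes"))
```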

Examples Data are stored georeferenced, together with metainformation, in a relational database. Each data set is formatted by the system and thus has a consistent and well-defined format. Data sets are provided as an HTML web page and for download as a text file. To give an impression of the content, format and access to data in PANGAEA, examples from some OMARC cluster projects (predecessors of HERMES) are provided. The data come from various sources and research disciplines and have been processed as part of the HERMES deliverable Banking of existing data (D26). The file format is described in chapter


 * ACES - d15N isotope ratios measured on deep water corals
 * ASSEMBLAGE - description of a sediment core as photo and graphic in PDF format
 * ECOMOUND - reflection seismic profile archived as a georeferenced cruise track with links to sgy-files
 * EURODELTA - annual mean river discharge from 1980-2000
 * EUROSTRATAFORM - oceanographic profile of a CTD station
 * GEOMOUND - terrain model of the Porcupine Seabight in different formats
 * METROL - geochemistry of a sediment profile
 * PROMESS - downhole logging from a drill site (password protected)
 * STRATAGEM - linking an atlas

Contact The responsible data curators and contact for data submission are
 * Hannes Grobe, data librarian of PANGAEA at AWI/WDC-MARE - mailto:hgrobe@pangaea.de
 * Veith Huehnerbach, geologist at NOCS - mailto:vhh@noc.soton.ac.uk
 * Ingo Schewe, biologist at AWI - mailto:ischewe@awi-bremerhaven.de