Intern:Project data management

A curator, responsible for the data management in a research project, should keep the partners informed about the work package, commitments and responsible persons. The following hints may help:


 * Keep project members informed about the system, the data flow and the availability of the data.
 * Explain possible access right definitions and password protection.
 * Explain how proper citation and persistent identification of data is assured.
 * Avoid high-end project web pages.
 * Concentrate on data collection, technical quality, complete metainformation, import and availability.
 * Define a data policy in agreement with the partners.
 * Define the goals of the project data management: availability, long-term archiving, publication, compilations ...
 * Define a dynamic, project-specific list of parameters with units on the project web pages.
 * Collect all metainformation related to the project (addresses of institutions; scientists with email and phone numbers) and provide dynamic lists through the project web pages.
 * Keep track of
    * data providers, parameters, amount and date of delivery; keep in contact and ensure proper data flow and collection; personal meetings might be better than email or phone,
    * activities, e.g. expeditions and cruises; import sample or station lists when the expedition report is available; keep the list of campaigns and events up to date,
    * publications; provide a dynamic list of project-related publications with links to the data,
    * existing data useful to the project, and import them.
 * Ensure consistent use of event labels throughout the project!
 * List possible groups/PIs with related data contributions on a web page and provide the status of availability.
 * Give feedback to data contributors about the availability of their data; always ask them to proofread.
 * The definition of the data flow should include
    * identification of existing and new data,
    * quality control and proofreading,
    * formats for import,
    * information about citation and DOI.
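The tracking and data-flow items in this list can be kept in a simple delivery log. The following Python sketch is hypothetical: the `Delivery` record, the `Stage` names and their ordering (taken from the order of the data-flow checklist) are assumptions for illustration, not a PANGAEA interface.

```python
from dataclasses import dataclass
from datetime import date
from enum import IntEnum


class Stage(IntEnum):
    """Hypothetical data-flow stages; the ordering follows the checklist (assumption)."""
    IDENTIFIED = 1  # existing or new data identified
    CHECKED = 2     # quality control and proofreading done
    IMPORTED = 3    # imported in an agreed format
    CITABLE = 4     # citation and DOI available


@dataclass
class Delivery:
    """One delivery from a data provider (hypothetical record layout)."""
    provider: str
    parameter: str
    unit: str
    amount: int      # e.g. number of data values delivered
    delivered: date  # date of delivery
    stage: Stage = Stage.IDENTIFIED


def status_by_provider(deliveries):
    """Map each provider to the lowest stage reached by any of their deliveries."""
    status = {}
    for d in deliveries:
        current = status.get(d.provider)
        status[d.provider] = d.stage if current is None else min(current, d.stage)
    return status


def providers_pending(deliveries):
    """Sorted providers with at least one delivery that is not yet citable."""
    return sorted({d.provider for d in deliveries if d.stage < Stage.CITABLE})
```

Taking the lowest stage per provider gives a conservative status for the availability page, and `providers_pending` lists the contributors who should still receive feedback.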

The following text is a draft and may be used in project proposals to describe the participation of Pangaea and its role in project data management:

 * The data of this project will be archived, published and distributed through the World Data Center for Marine Environmental Sciences (WDC-MARE) using the information system PANGAEA (http://www.pangaea.de).
or
 * The information system PANGAEA (http://www.pangaea.de) will be used for the management of the project data. The system is designed for archiving, distributing and publishing georeferenced primary scientific data related to the marine environment, climate variability and the solid earth. Pangaea is operated as a long-term archive, comparable to a library. It can be used as a scientific tool to support the interpretation of comprehensive data collections. Technical operation is ensured by the Alfred Wegener Institute for Polar and Marine Research (AWI) and the Center for Marine Environmental Sciences (MARUM) through a long-term commitment.

Data are stored in a consistent format, together with the related meta-information, in a relational database. Data exchange between project partners, and the availability of products and published data to the scientific community, will be provided through a client/server system and web services on the Internet that deliver data in ISO and XML standard formats.

The system maintains a dictionary of more than 50,000 parameters from earth system research; new parameters required by the project can be defined at any time. A full georeference in time and space allows individually configured subsets of data to be extracted from the inventory for further processing. To exchange unpublished data during the runtime of the project, data sets may be password protected. Long-term publication and distribution of the data are governed by the data policy.
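A full georeference in time and space is what makes such subsetting possible. As a minimal in-memory sketch (not the PANGAEA query interface; the `Observation` record and the `subset` function are hypothetical), selecting values by a latitude/longitude box, a time window and an optional set of parameters could look like this:

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Observation:
    """Hypothetical georeferenced data value (not PANGAEA's internal schema)."""
    latitude: float   # decimal degrees, -90..90
    longitude: float  # decimal degrees, -180..180
    time: datetime
    parameter: str
    value: float


def subset(observations, lat_range, lon_range, time_range, parameters=None):
    """Select observations inside a lat/lon box and a time window (inclusive bounds).

    lon_range may cross the antimeridian, e.g. (170.0, -170.0).
    parameters=None keeps every parameter.
    """
    lat_min, lat_max = lat_range
    lon_min, lon_max = lon_range
    start, end = time_range

    def in_lon(lon):
        if lon_min <= lon_max:
            return lon_min <= lon <= lon_max
        return lon >= lon_min or lon <= lon_max  # box wraps across 180 degrees

    return [
        o for o in observations
        if lat_min <= o.latitude <= lat_max
        and in_lon(o.longitude)
        and start <= o.time <= end
        and (parameters is None or o.parameter in parameters)
    ]
```

Handling the antimeridian explicitly matters for polar and Pacific data, where a naive `lon_min <= lon <= lon_max` test would return nothing for a box such as (170, -170).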

A dedicated data curator will be based at ???, considering the proximity of data sources, project participants and users in the wider scientific community. After the end of the project and a final moratorium of ... (to be defined by the project) months, the data will become public, accessible through the project's web pages, web services and the access clients of Pangaea. For marine research projects, the official host is the World Data Center for Marine Environmental Sciences (http://www.wdc-mare.org).
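The release rule can be expressed as a date calculation. The sketch below assumes that the moratorium is counted in whole calendar months from the project end date and that data become public on the resulting day; the function names are hypothetical, and the moratorium length stays a parameter because the text leaves it to the project.

```python
from calendar import monthrange
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, clamping to the last day of the target month."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    day = min(d.day, monthrange(year, month0 + 1)[1])
    return date(year, month0 + 1, day)


def release_date(project_end: date, moratorium_months: int) -> date:
    """Day on which the data become public (assumed rule: end date + moratorium)."""
    if moratorium_months < 0:
        raise ValueError("moratorium must not be negative")
    return add_months(project_end, moratorium_months)


def is_public(today: date, project_end: date, moratorium_months: int) -> bool:
    """True once the moratorium after the project end has elapsed."""
    return today >= release_date(project_end, moratorium_months)
```

Clamping the day keeps the calculation defined for end dates such as 31 January, where the target month is shorter.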

The project's web pages will be operated by the data curator and will serve all project participants. They will document the established procedures for sampling and calibration, to ensure that the merged data sets are both internally and externally compatible (quality assurance). The web pages will be the basis for the internal workflow and will include the gateway to the data.