Intern:Project data management/Proposal

The following text may be used in project proposals as part of the required data management concept when Pangaea is used. Funding through the data management work package should cover the effort for data curation; in larger projects this will be between half a scientist position and two scientist positions. It also includes the project-specific support and operating expenses of the Pangaea core group. The data management chapter of the proposal should include an estimate of the expected amount of data and its different variables/parameters, a description of the workflow, an agreement with the archives that will be used for data storage, and a commitment that all data of the project will be archived and made available in a sustainable way (i.e. through an established library catalog).

Proposal draft information
The data of this project will be archived, published and distributed through the data library PANGAEA - Publishing Network for Geoscientific and Environmental Data (http://www.pangaea.de).

Data provision is the responsibility of the project partners; data ingest will be performed by a data curator in cooperation with the Pangaea core group.


 * 1) The basic operation of the data library PANGAEA is committed long-term and funded by the host institutes (AWI & MARUM). This includes hardware/software, Internet connection, maintenance, web services, and backup.
 * 2) The data curator will be funded and employed through the project.
 * 3) The project-specific operating expense of the Pangaea core group should be calculated as 1 % of the project's budget. Because it depends on the amount and complexity of the project's data output, the details are a matter of negotiation. (A project-specific expense is not required for proposals of the host institutes without external partners.) Costs do not include the maintenance of web pages. Support includes:
 * training of the curator on the editorial system, data workflow and ingest,
 * definition of new project-specific parameters,
 * support during the lifetime of the project in any questions related to data curation,
 * support in the administration of passwords,
 * provision of data citations with DOI for each data set (for more than 5000 data sets, additional DOI costs should be taken into account),
 * distribution of the project's data through library catalogs, portals and search engines via web services,
 * publication of a data report on request.

System description
The system is aimed at archiving, distributing and publishing georeferenced data from earth system sciences. Pangaea is an archive operated for the long term, comparable to a library. It can be used as a scientific tool to support the interpretation of comprehensive data collections. Technical operation is ensured through a long-term commitment by the host institutes, the Alfred Wegener Institute for Polar and Marine Research (AWI) and the Center for Marine Environmental Sciences (MARUM).

Data are stored in a consistent format with related meta-information in a relational database. The data exchange between project partners and the availability of products and published data to the scientific community will be established through a client/server system and web services on the Internet. Data are provided in various standard formats for harvesting by portals, catalogs and search engines.

The system operates a dictionary of scientific parameter definitions. New parameters can be defined at any time. A georeference in time and space allows the extraction of individually configured subsets of data from the inventory for further processing. To exchange unpublished data during the runtime of the project, data sets may be password protected. Moratorium, publication and distribution of data are defined in the project's data policy.
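The idea of extracting a subset by its georeference in time and space can be illustrated with a minimal Python sketch. It is not Pangaea's implementation or API: the `Record` layout, the `extract_subset` function, the parameter names and all sample values are hypothetical and serve only to show the principle of filtering by parameter, bounding box and time window.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Record:
    """One georeferenced measurement (hypothetical layout, not Pangaea's schema)."""
    parameter: str
    value: float
    latitude: float    # decimal degrees, north positive
    longitude: float   # decimal degrees, east positive
    timestamp: datetime


def extract_subset(records, parameter, lat_range, lon_range, time_range):
    """Return records of one parameter inside a lat/lon box and a time window.

    All ranges are inclusive (min, max) pairs. The longitude test assumes the
    box does not cross the 180 degree meridian.
    """
    lat_min, lat_max = lat_range
    lon_min, lon_max = lon_range
    t_min, t_max = time_range
    return [
        r for r in records
        if r.parameter == parameter
        and lat_min <= r.latitude <= lat_max
        and lon_min <= r.longitude <= lon_max
        and t_min <= r.timestamp <= t_max
    ]


# Made-up example inventory; the values carry no scientific meaning.
inventory = [
    Record("Temperature, water", 1.8, 78.9, 4.5, datetime(2005, 7, 14)),
    Record("Temperature, water", 12.3, 54.2, 7.9, datetime(2005, 7, 20)),
    Record("Salinity", 34.9, 79.1, 5.0, datetime(2005, 7, 15)),
    Record("Temperature, water", 1.2, 79.0, 4.8, datetime(2006, 8, 2)),
]

subset = extract_subset(
    inventory,
    parameter="Temperature, water",
    lat_range=(75.0, 82.0),
    lon_range=(0.0, 10.0),
    time_range=(datetime(2005, 1, 1), datetime(2005, 12, 31)),
)
for record in subset:
    print(record.timestamp.date(), record.latitude, record.longitude, record.value)
```

In this sketch only the first record passes all three filters. A real inventory would be queried in the database rather than in memory, but the selection criteria are the same.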

A dedicated data curator, as a member of the project, will operate at the data center or at the office of the coordinator, depending on the proximity to the data producers. After the termination of the project and a final moratorium of x months (defined by the project), data will become public information, accessible via library catalogs, search engines and portals.

The project's web pages will be operated by the data curator and will provide a service to all project participants. They will contain details of the established procedures for sampling and calibration to ensure that the merged data sets are both internally consistent and externally compatible (quality assurance). The web pages will be the basis for the internal workflow. As the first gateway to the data, they will provide dynamic links that always serve the community with the most recent result set archived so far.