Intern:Project data management/Proposal

''The following text may be used in project proposals as part of a data management concept. Funding through the data management work package should include the effort for data curation; in larger projects this should amount to 0.5 to 2 scientist positions. It also includes the project-specific support and operating expenses of the PANGAEA core group. The data management chapter of the proposal may include
 * an estimate of the expected amount of data,
 * a list of expected variables/parameters with units,
 * a description of the internal workflow from data generation to publication,
 * the project's commitment that project data will be archived through an established library catalog (in our case PANGAEA).''

Proposal draft information
The data of this project will be archived, published and distributed through the data library PANGAEA - Data Publisher for Earth and Environmental Science (http://www.pangaea.de). PANGAEA is a member of the World Data System (WDS).

Data provision is the responsibility of the partner institutes. Data ingest and publication will be performed by a data curator with support from the PANGAEA data librarian and editorial group.


 * 1) Basic operation of PANGAEA is committed and funded for the long term by the host institutes (AWI & MARUM). Technical operation includes hardware/software, Internet connection, maintenance, web services, and backup.
 * 2) The data curator will be funded and employed through this project. Additional project-specific operating expenses depend on the requirements of the project and can be added to the budget. Costs do not include the maintenance of web pages.
 * 3) Data management by PANGAEA includes:
 * training of the project's data curator on the PANGAEA editorial system,
 * definition of all project-specific parameters,
 * support during the lifetime of the project in questions related to data curation, conversion, and publication,
 * administration of password-protected access,
 * provision of data citations including a DOI for each data set,
 * provision of a supplement DOI for datasets related to publications,
 * distribution of the project's data through library catalogs, portals and search engines via web services,
 * publication of a data report on request.

System description
PANGAEA is aimed at archiving, publishing and distributing georeferenced data from earth system science. Operated as a long-term archive, it can also be compared to a library, providing the infrastructure and bibliographic citation for sustainable access to the scientific results of projects. The infrastructure includes mechanisms to generate individual data compilations from the inventory, giving data in PANGAEA an added value.

Host institutes of PANGAEA are the Alfred Wegener Institute (AWI) and the Center for Marine Environmental Sciences (MARUM).

A dedicated data curator will operate as a member of the project to ensure proximity to the data producers.

Data are stored in a consistent format with related meta-information in a relational database. The data exchange between project partners and the availability of products and published data to the scientific community will be established through a client/server system and web services on the Internet. Data are provided on the Internet for harvesting by portals, catalogs and search engines.

The system operates a dictionary of scientific parameter definitions. New parameters can be defined at any time. A georeference in time and space allows the extraction of individually configured subsets of data from the inventory for further processing. To exchange unpublished data during the runtime of the project, data sets may be password protected. Moratorium, publication and distribution of data are defined in the project's data policy. After the termination of the project and a final moratorium period of x months (to be defined by the project), data will become public information and will be made freely available to the scientific community in Open Access.
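
For readers who prefer a concrete picture, the subset extraction and the moratorium rule described above could be sketched as follows. This is only an illustration under stated assumptions: the `Record` structure and the functions `extract_subset`, `add_months` and `is_public` are hypothetical names invented for this example and do not reflect the PANGAEA database schema or any PANGAEA interface. The moratorium length remains a parameter, since it is to be defined by the project.

```python
import calendar
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Record:
    """One georeferenced measurement (hypothetical structure, not the PANGAEA schema)."""
    parameter: str
    value: float
    latitude: float
    longitude: float
    sampled_on: date


def extract_subset(records, parameter, lat_range, lon_range, date_range):
    """Return the records of one parameter that fall inside a space/time window."""
    lat_min, lat_max = lat_range
    lon_min, lon_max = lon_range
    first_day, last_day = date_range
    return [
        r for r in records
        if r.parameter == parameter
        and lat_min <= r.latitude <= lat_max
        and lon_min <= r.longitude <= lon_max
        and first_day <= r.sampled_on <= last_day
    ]


def add_months(start, months):
    """Shift a date by whole months, clamping the day to the end of the target month."""
    index = start.month - 1 + months
    year = start.year + index // 12
    month = index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def is_public(project_end, moratorium_months, today):
    """True once the moratorium after the end of the project has elapsed."""
    return today >= add_months(project_end, moratorium_months)
```

A caller would pass the project's own end date and the moratorium length fixed in its data policy, for example `is_public(project_end, moratorium_months, date.today())`.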

The project's web pages will be operated by the data curator and will provide a service to all project participants, containing details of the established procedures for sampling and calibration to ensure that the merged data sets are both internally consistent and externally compatible (quality assurance). The web pages will be the basis for the internal workflow. As the first gateway to the data, they will provide dynamic links that always serve the community with the most recent result set archived so far.