Intern:NKGCF

Text draft NKGCF brochure Global Change Research in Germany 2011 (chapter Data Centre and Data Availability)

Introduction
Observations and measurements are the lifeblood of Global Change research; the resulting data sets are the basis for scientific publications. If the underlying primary data are accessible, findings can readily be verified by reviewers and readers. Availability of the content is one prerequisite for its proper future use; its form is the other. In particular, questions focusing on change in the Earth system require a global view and consequently global data sets. Many findings are fragmented across the individual research of scientists and projects. Immense added value could be given to these distributed data if they were not only available but also provided in standard formats.

Besides curiosity about how the Earth works, a driving force in science that should not be neglected is the credit a scientist gains for his or her publications. It would therefore be wise to include the archiving of data in the well-established workflow for scientific publications and to make data citable. The infrastructure for the search and distribution of publications is also established, namely the global system of libraries. Adding data citations to library catalogs and search engines would complete the integration of data into the publication workflow and infrastructure. As an outcome of a nationally funded DFG project, a reliable system for the sustainable archiving and citation of data was established and the data registration agency DataCite was founded.

Libraries have existed for about 5000 years, data centers for about 50. Since the creation of the World Data Center system during the International Geophysical Year (1957/58), data archiving has suffered from constantly changing technology, in particular storage media and formats. Many valuable data have been lost forever on degrading tapes or on broken discs without backup. In recent years this childhood of electronic data handling has been succeeded by technology that avoids data loss through migration and provides capacities in the petabyte range. Computer technology will continue to evolve, but science now has the opportunity to assure long-term access to its results from the perspective of a librarian. The Internet as a connecting infrastructure allows archives distributed around the globe to be interlinked, so that harmonized and even science-specific data views can be provided through common portals.

A problem often stressed in discussions about data is the increasing amount of data. For example, a satellite like the new CryoSat, launched to observe the behaviour of the polar ice, will produce 50 GB per day. The solution to this problem is essentially linear: a large amount of data requires a large amount of storage space. A problem rarely mentioned in this context is the variety of measurements in all parts of the geosphere (atmo-, hydro-, cryo-, bio-, lithosphere) and thus the complexity of the resulting data. Even leaving out anything related to biodiversity (e.g. species distribution), some tens of thousands of variables remain, as measured by all disciplines of the natural sciences. Smart data models are required to handle at least parts of this fine-grained and highly diverse collection of data.
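To put the quoted data rate into perspective, the following minimal sketch extrapolates 50 GB per day to one year. The function name, the 365-day year and the decimal unit convention (1 TB = 1000 GB) are assumptions made for illustration only; they are not taken from mission documentation.

```python
# Daily data rate quoted in the text for CryoSat.
GB_PER_DAY = 50


def yearly_volume_tb(gb_per_day: float, days: int = 365) -> float:
    """Return the data volume accumulated over `days` in terabytes.

    Assumes decimal units (1 TB = 1000 GB) and a constant daily rate.
    """
    return gb_per_day * days / 1000


if __name__ == "__main__":
    print(yearly_volume_tb(GB_PER_DAY))  # 18.25 (TB per year)
```

At this rate a single instrument fills a petabyte archive only after several decades, which illustrates why sheer volume is the more tractable of the two problems.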

Through Internet technology it is now possible to store anything anywhere at any time - which is far from being a sustainable infrastructure in the sense of a librarian. Commercial publishers recognized the error 404 (file not found) problem only a few years after the opening of the Internet, and the system of persistent identifiers was invented in response.

Data centres

 * WDCC


 * WDC-RSAT


 * GFZ/ICDP


 * PANGAEA

The content of German data centers also contributes to international portals, e.g. for carbon (www.carboocean.org), biodiversity (GBIF, OBIS), IODP (sedis), or to the major portal of the WDS, which is still in its initial phase.

Availability
The availability of data from Global Change research needs urgent improvement. To support the flow of data into archives, the most important factors are (1) reliable access and (2) assured credit for the data producer/provider. A national DFG-funded project has contributed substantially to both points. The project focused on making data sets citable and established reliable access through persistent identification. In 2010 DataCite, the international registration agency for data DOIs, registered 100,000 data sets as an initial contribution and as a test of workflow, technical implementation and distribution mechanisms. Several data centers contributed their content under the leadership of the National Library for Science and Technology (TIB).

In Germany, data DOIs are distributed through the TIB (National Library for Science and Technology) as a partner of DataCite, which is the international registration agency and a member of the IDF (International DOI Foundation). Data centers have a cooperation contract with the TIB, which also assures the distribution of data citations through library catalogs.
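A persistent identifier is only useful if it can be resolved to the current location of the data set. The minimal sketch below turns a DOI into a link via the public resolver at https://doi.org/. The function name and the example DOI are hypothetical, and the validity check is deliberately simple rather than a complete implementation of the DOI syntax.

```python
from urllib.parse import quote

# Public DOI resolver operated on behalf of the International DOI Foundation.
DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi: str) -> str:
    """Build a resolver URL for a DOI such as '10.1234/example.dataset'.

    Accepts an optional 'doi:' prefix. Only checks the basic shape
    '10.<registrant>/<suffix>'; this is an illustrative check, not full validation.
    """
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:]
    prefix, sep, suffix = doi.partition("/")
    if not (prefix.startswith("10.") and sep and suffix):
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode characters that are unsafe in a URL path, keeping '/'.
    return DOI_RESOLVER + quote(doi, safe="/")


if __name__ == "__main__":
    # Hypothetical DOI, used for illustration only.
    print(doi_to_url("doi:10.1234/example.dataset"))
```

Because a citation carries only the identifier, a data center can move the data set to new storage without breaking existing references; only the target registered with the agency has to be updated.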

Data citations are distributed through library catalogs, search engines as well as generic and science specific portals.

ESSD


Part of the problem of insufficient data availability is the missing credit for the data provider. As one solution, the first data publication journal, Earth System Science Data (ESSD), was established in 2009 through a German initiative by the Open Access publisher Copernicus (http://www.earth-system-science-data.net). ESSD improves the availability of data by integrating it into the established scientific publication process. Papers describing data, methods and quality are peer-reviewed and will be listed in the Science Citation Index; the provision of a persistent identifier (e.g. DOI) as a pointer to the full data set is mandatory. The first publication made available an eight-year time series of ozone measurements from the Antarctic station of the former GDR (German Democratic Republic).

Publication workflow
Centers in Germany archiving data related to Global Change provide their content as citable entities through an established workflow. An author who wants to archive a supplement related to a publication, or to make results available to the scientific community, e.g. within a research project, is integrated into an editorial workflow similar to that of a scientific journal. Documented through an issue tracking system, the data publication process consists of the following steps: (1) submission (author), (2) check for consistency and completeness of metadata, archiving (editor), (3) proofreading (author), (4) corrections (editor), (5) peer review for supplements to a publication (if applicable), (6) publication as a citable entity with a DOI as persistent identifier. Data supplements can be linked automatically to the publication's splash page at the journal/publisher through a web service (e.g. ).
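The six steps above can be read as a small state machine in which peer review is skipped unless the data set is a supplement to a publication. The following sketch illustrates that reading; all names are hypothetical and it does not represent the issue tracking system actually used by any data center.

```python
from enum import IntEnum


class Step(IntEnum):
    """The six workflow steps, numbered as in the text."""
    SUBMISSION = 1       # author
    METADATA_CHECK = 2   # editor: consistency, completeness, archiving
    PROOFREAD = 3        # author
    CORRECTIONS = 4      # editor
    PEER_REVIEW = 5      # only for supplements to a publication
    PUBLICATION = 6      # citable entity with DOI


def next_step(current: Step, is_supplement: bool) -> Step:
    """Return the step that follows `current`."""
    if current is Step.PUBLICATION:
        raise ValueError("publication is the final step")
    following = Step(current + 1)
    # Peer review applies only to supplements (assumption based on step 5).
    if following is Step.PEER_REVIEW and not is_supplement:
        following = Step.PUBLICATION
    return following


def workflow(is_supplement: bool) -> list[Step]:
    """List all steps a submission passes through, in order."""
    steps = [Step.SUBMISSION]
    while steps[-1] is not Step.PUBLICATION:
        steps.append(next_step(steps[-1], is_supplement))
    return steps


if __name__ == "__main__":
    for step in workflow(is_supplement=False):
        print(step.value, step.name)
```

Modelling the steps explicitly makes it easy to record in a ticket which step a submission has reached and who has to act next.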

box ICSU World Data System
As a consequence of the International Geophysical Year of 1957-1958 (IGY), the International Council for Science (ICSU) established the World Data Center system to serve data from the IGY scientific disciplines. Until the end of the WDC system in 2008, more than 50 centers distributed around the globe were established, covering most disciplines of Global Change research. With the advent of the Internet, many WDCs made their data accessible online. Since 2001 Germany has contributed three centers: World Data Center for Marine Environmental Sciences (WDC-MARE), World Data Center for Climate (WDCC), and World Data Center for Remote Sensing of the Atmosphere (WDC-RSAT).

In 2009 a new World Data System (WDS) was created through a decision of the 29th General Assembly of the International Council for Science (ICSU); the WDS will replace the WDC system. Over 100 WDCs, data centers and institutes have expressed interest in becoming part of the new system. The WDS concept aims at a transition from existing stand-alone WDCs and individual services to a common, globally interoperable distributed data system that incorporates emerging technologies and new scientific data activities. A prototype data portal is being considered as a proof of concept for one element of the new system. The WDS will enjoy a broader disciplinary and geographic base than previous ICSU bodies and will strive to become a worldwide 'community of excellence' for scientific data. Any organization producing or holding data is encouraged to join the new WDS (http://icsu-wds.org).

Links

 * examples: wdc-paleo, pangaea, wdcc
 * http://visibleearth.nasa.gov/
 * http://earthobservatory.nasa.gov
 * http://science.gsfc.nasa.gov
 * WMO
 * GEOSS
 * Global Change Research in Germany 2008