Contribution to NKGCF brochure Global Change Research in Germany 2011

Text draft of chapter:

Data Centre and Availability
Observations, measurements and models are the lifeblood of Global Change research; the resulting data sets are the basis for any scientific publication. The importance of data availability and archiving was recently stressed in special issues of leading scientific journals (among others Science; Nature). Only if the primary data of publications are accessible can the findings be verified by reviewers and readers. While data availability is one point, the form of the data is the other important prerequisite for proper future use of the content. In particular, questions focusing on changes of the earth require global data sets. At present, most findings are still fragmented across the individual research of scientists and projects. Immense added value could be given to these distributed data if they were available not only in Open Access but also in harmonized, machine-readable form, allowing exchange, compilation and computation.

Preservation of and access to printed publications is the task of libraries, which assure the legally required long-term archiving. The invention of the Internet has added new capabilities through publisher catalogs, portals and search engines with direct access to publications in digital form, independent of their location. But the ability to store and distribute any digital object anywhere at any time is far from being a sustainable infrastructure in an archivist's view. The problem of error 404 ("file not found"), a result of a dynamic network structure and ever-changing technology, was recognized by commercial publishers only a few years after the Internet was established. The system of persistent identifiers was invented to provide stable linking mechanisms. In 2011 more than 45 million Digital Object Identifiers (DOI) are registered for journal articles through a system operated by the International DOI Foundation (IDF), and there is an urgent need to include data in this system. Through a successful project of the German Research Foundation (DFG), a system for sustainable data archiving, citation and persistent identification using the DOI was established. Several data centers contributed their content under the leadership of the National Library for Science and Technology (TIB, Hannover). In 2009 the international registration agency DataCite was founded.
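
As a minimal sketch of how a persistent identifier decouples a citation from the current storage location, the following Python snippet turns a DOI string into a link to the public DOI resolver at https://doi.org/. The function names, the simplified syntax check and the example DOI are assumptions made for illustration only; they are not part of any system described in this chapter.

```python
import re
from urllib.parse import quote

# Simplified DOI syntax check (assumption): "10.", a numeric registrant
# prefix, a slash, and a non-empty suffix without whitespace.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Public resolver that redirects a DOI to its currently registered location.
RESOLVER = "https://doi.org/"


def normalize_doi(raw: str) -> str:
    """Strip common prefixes and validate the remaining DOI string."""
    doi = raw.strip()
    for prefix in ("https://doi.org/", "http://dx.doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a valid DOI: {raw!r}")
    return doi


def doi_to_url(raw: str) -> str:
    """Build a stable link that stays valid even if the data set moves."""
    return RESOLVER + quote(normalize_doi(raw), safe="/")


# Hypothetical DOI used purely as an example.
print(doi_to_url("doi:10.1234/example.dataset"))
```

The citation only ever contains the DOI; when an archive moves a data set, the registered target is updated at the resolver and the cited link keeps working.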

box: DataCite is an international association that aims to support researchers by enabling them to locate, identify, and cite research datasets with confidence. DataCite was launched on December 1st 2009 in London. In 2011, DataCite has 15 members from 10 countries, including the TIB as initiator; 1 million data sets are already registered with a DOI. DataCite takes a leading role worldwide in promoting the use of persistent identifiers for datasets as an integral part of data citation. Through its members, it establishes and promotes common methods, best practices, and guidance. The member organisations work independently with data centres and other holders of research data sets in their own domains. Just as science operates globally through individual researchers working and publishing, DataCite operates globally through local partners offering services. Further organisations are encouraged to join the association.
 * http://www.datacite.org

Libraries have existed for 5000 years, data centers for 50 years. During those decades data archiving has suffered from constantly changing storage media and formats. Much valuable data was lost on degrading tapes or on broken discs without backup; in many disciplines data archiving was neglected anyway. In recent years this infancy of electronic data handling has been superseded by technology that avoids data loss through migration and provides capacities in the petabyte range. Science now has a real opportunity to assure the long-term, archival availability of its results. The interlinking of archives distributed around the globe now even allows combined views of the data to be provided through science-specific portals.

During the International Geophysical Year 1957/58 (IGY) the International Council for Science (ICSU) established the World Data Center (WDC) system to serve data from the geosciences. By the time of its major revision in 2008, more than 50 WDCs distributed around the globe had been established, covering important disciplines of Global Change research. By means of the Internet, many WDCs made their data accessible online. Since 2001 Germany has contributed three centers: the World Data Center for Climate (WDCC), the World Data Center for Remote Sensing of the Atmosphere (WDC-RSAT), and the World Data Center for Marine Environmental Sciences (WDC-MARE), together forming the German WDC-Cluster.

Following a decision of the 29th General Assembly of ICSU, the WDCs will be transferred to a new World Data System (WDS) by 2012. The WDS concept aims at a transition to a common, globally interoperable data system and will strive to become a worldwide "community of excellence" for scientific data. A prototype data portal, which is being considered as a common entry point for the various data sources, is operated in Germany. Any organization holding data is encouraged to join the new WDS. Germany will contribute the German WDC-cluster and the data publisher PANGAEA.
 * http://icsu-wds.org

box: The German WDC-cluster for Global Change data

The World Data Center for Climate (WDCC) offers data management consulting for climate models over the lifetime of the data. A variety of metadata standards can be handled with catalogues, an indispensable precondition for distributed and federated archives. Quality management is part of the archiving workflow. By 2010 the amount of model data accessible online had grown to more than 400 TB. As a Data Collection or Production Centre (DCPC) the WDCC is part of the World Meteorological Organization's information system. For the IPCC Assessment Reports, WDCC is one of three data nodes for climate model data collection, in cooperation with the Program for Climate Model Diagnosis and Intercomparison (PCMDI, USA) and the British Atmospheric Data Centre (BADC, UK).
 * http://www.wdc-climate.de



The German Research Center for Geosciences (GFZ) investigates "System Earth" on a global scale, with the geological, physical, chemical and biological processes occurring at its surface and in its interior. The resulting data are unique in their scientific profile as they encompass the entire planet, from global field models down to individual samples from scientific drilling operations. Most data held at GFZ are accessible through a central portal and through discipline-specific portals, such as the World Stress Map (WSM), Satellite Data Centre (ISDC), Scientific Drilling Database (SDDB), and the GEOFON Seismological Network. Datasets are citable and carry a DOI. The portal and the publication of data are offered as a joint service of the GFZ Centre for Geoinformation Technology (CeGIT) and the GFZ Library Albert Einstein.
 * http://www.gfz-potsdam.de:80/portal/gfz/Services/Forschungsdaten

The World Data Center for Marine Environmental Sciences (WDC-MARE) aims to collect, scrutinize, and disseminate data related to earth system research in all fields of marine sciences. The center provides a data publication series called WDC-MARE Reports. Projects which have used WDC-MARE for data curation focus on environmental, geological, biological, physical and chemical oceanography.
 * http://www.wdc-mare.org

Since 2003 the German Aerospace Center (DLR) has hosted the World Data Center for Remote Sensing of the Atmosphere (WDC-RSAT). The center offers a growing collection of atmosphere-related, satellite-based data sets, information products and services. The focus is on atmospheric trace gases, aerosols, dynamics, radiation, and cloud physical parameters, with complementary information. This is achieved either by giving Open Access to data stored at the center or by acting as a portal linking to external sources. WDC-RSAT is a member of the WMO-WDC group and serves as a management platform for the Network for the Detection of Mesopause Change (NDMC).
 * http://wdc.dlr.de

Data volume, with its ever-growing demand for storage capacity, is the obvious factor stressed in any discussion of data. (A satellite like CryoSat, recently launched to observe the behaviour of the polar ice, will produce 50 GB of raw data per day.) Another point, an even greater challenge but rarely mentioned in this context, is the variety of measurements in all parts of the geosphere. Repositories are faced with a complex collection of data sets comprising some tens of thousands of variables, as measured by all disciplines of Global Change related sciences. Smart data models are required to handle these fine-grained and highly diverse data: a strictly normalized and abstracted model, as established by the data library PANGAEA.

box: PANGAEA® - Data Publisher for Earth & Life Science is a unique repository to store and publish data from earth system research. Data publications or supplements related to articles are entered through an editorial system, stored in a relational database and distributed by web services. Data can be georeferenced in time and space; part of the meta-description is a bibliographic citation with DOI. Data are available in Open Access under the terms of Creative Commons licenses. The host institutes AWI and MARUM provide PANGAEA as an infrastructure to the international scientific community, open to any institute, project or scientist; for example, PANGAEA is the central archive of the Baseline Surface Radiation Network (BSRN).
 * http://www.pangaea.de



Centers in Germany with data related to Global Change archive their content through an established workflow in citable entities. An author who wants to archive a publication's data supplement or to make results available to the scientific community, e.g. within a research project, becomes part of an editorial workflow similar to that of a scientific journal: (1) submission, (2) consistency and completeness check, (3) archiving, (4) proof-read, (5) corrections, (6) peer-review (i.a.), (7) publication as citable entity with DOI. Efforts are made to link supplements to the journal's catalog (e.g. ).
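
The editorial workflow described above can be pictured as a strictly ordered sequence of stages in which a data set becomes citable only at the end. The short Python sketch below is an illustration under assumptions: the stage names follow the text, but the class and function names are hypothetical, and every stage is treated as mandatory, which may not hold for peer review in every center.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Workflow stages in the order given in the text (names are illustrative)."""
    SUBMISSION = 1
    CONSISTENCY_CHECK = 2
    ARCHIVING = 3
    PROOF_READ = 4
    CORRECTIONS = 5
    PEER_REVIEW = 6
    PUBLICATION = 7


def advance(stage: Stage) -> Stage:
    """Return the stage that follows `stage`; publication is the final stage."""
    if stage is Stage.PUBLICATION:
        raise ValueError("data set is already published")
    return Stage(stage + 1)


def is_citable(stage: Stage) -> bool:
    """A data set is a citable entity with DOI only once it is published."""
    return stage is Stage.PUBLICATION


# Walk one submission through the complete workflow.
stage = Stage.SUBMISSION
while not is_citable(stage):
    stage = advance(stage)
print(stage.name)
```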

Besides the curiosity to know how the earth works, a driving force of science that should not be neglected is the credit scientists gain for their work. One reason for the insufficient availability of data is that data providers still do not receive this credit. Consequently, data archiving needs to become an integral part of the established publication environment for scientific papers; most importantly, data must be properly citable and reliably accessible.

To improve the situation, scientists from Germany and the UK initiated the journal Earth System Science Data (ESSD) as the first journal aimed solely at the publication of data. ESSD has been online since 2009, published by the Open Access publisher Copernicus, and will improve the availability of data by integrating it into the publication process. Papers describing data, methods and quality are peer-reviewed and will be listed in the Science Citation Index. The provision of a persistent identifier (e.g. DOI) as data link is mandatory. The first publication made available an eight-year time series of ozone profiles from the Antarctic station of the former GDR (German Democratic Republic).
 * http://www.earth-system-science-data.net

just for info

 * Global Change Research in Germany 2008