Intern:NKGCF

Text draft NKGCF brochure Global Change Research in Germany 2011 (chapter Data Centre and Data Availability)

Observations, measurements and models are the lifeblood of Global Change research; the resulting data sets are the basis for any scientific publication. The importance of data availability and archiving was recently stressed in special issues of leading scientific journals (Science ; Nature ). Only if the primary data of publications are accessible can the findings be verified by reviewers and readers. While data availability is one point, the form of the data is the other important prerequisite for proper future use of the content. In particular, questions focusing on changes of the earth require global data sets. In principle, most findings are fragmented across the individual research of scientists and projects. Immense added value could be given to such distributed data if they were not only available but also provided in a harmonized, machine-readable form that allows exchange, compilation and computation.

Preservation of and access to printed publications is the task of libraries, which assure the legally mandated long-term archiving. The invention of the Internet has added new capabilities through publisher catalogs, portals and search engines with direct access to publications in digital form, independent of their location. But the storage and distribution of any digital object anywhere at any time is far from being a sustainable infrastructure, neither in a librarian's nor in an archivist's view. The error 404 (file not found), a result of a highly dynamic network structure and ever-changing technology, was recognized as a problem by commercial publishers only a few years after the Internet was established. The system of persistent identifiers was invented to establish stable linking mechanisms. In 2011 more than 45 million Digital Object Identifiers (DOI) are registered for journal articles through a system operated by the International DOI Foundation (IDF), and there is an urgent need to include data in this system. Through a successful project of the German Research Foundation (DFG), a system for sustainable data archiving, citation and persistent identification using DOI was established. Several data centers contributed their content under the leadership of the National Library for Science and Technology (TIB, Hannover). In 2009 the registration agency DataCite was founded as an international organisation.
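The idea behind a persistent identifier can be illustrated with a minimal Python sketch: the citation carries only the DOI name, and a resolver maps it to wherever the object currently lives. The sketch assumes today's public resolver address https://doi.org/; the function name doi_to_url and the example DOI are hypothetical and serve illustration only.

```python
from urllib.parse import quote

# Assumption: the public DOI resolver as reachable today.
DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi: str) -> str:
    """Turn a DOI name into a resolver URL (hypothetical helper)."""
    doi = doi.strip()
    # Accept the "doi:" prefix that is common in citations.
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    prefix, sep, suffix = doi.partition("/")
    # A DOI name has the form "<prefix>/<suffix>"; the prefix starts with "10.".
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI name: {doi!r}")
    # Percent-encode characters that are unsafe in a URL path, keep "/" as is.
    return DOI_RESOLVER + prefix + "/" + quote(suffix, safe="/")


# Hypothetical DOI name, not a registered data set.
EXAMPLE_DOI = "doi:10.1234/example.dataset.5678"
print(doi_to_url(EXAMPLE_DOI))
```

The link in a publication then never points at a server path that may change; only the resolver entry has to be updated when an archive moves its content.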

box: DataCite is an international association that aims to support researchers by enabling them to locate, identify, and cite research datasets with confidence. DataCite was launched on December 1st 2009 in London. In 2011, DataCite has 15 members from 10 countries, including the TIB as initiator, and 100 000 data sets are already registered with a DOI. DataCite has taken a global leadership role in promoting the use of persistent identifiers for datasets as an integral part of data citation. Through its members, it establishes and promotes common methods, best practices, and guidance. The member organisations work independently with data centres and other holders of research data sets in their own domains. Just as science operates globally with individual researchers working and publishing locally, DataCite is global with local partners offering services and advice where required. Further organisations are encouraged to join the association.
 * http://www.datacite.org

Libraries have existed for 5000 years, data centers for 50 years. During those decades data archiving has suffered from constantly changing storage media and formats. Many valuable data were lost on degrading tapes or on broken discs without backup; in many disciplines data archiving was neglected anyway. In recent years this childhood of electronic data handling has been superseded by technology that avoids data loss through migration and provides capacities in the petabyte range. Now science has a real opportunity to assure the availability of results in a librarian's long-term perspective. In addition, the Internet provides the network to interlink archives distributed around the globe and thus even allows the provision of combined data views through science-specific portals.

During the International Geophysical Year 1957/58 (IGY) the International Council for Science (ICSU) established the World Data Center (WDC) system to serve data from the geosciences. Until its major revision in 2008, more than 50 WDCs with a global distribution were established, covering important disciplines of Global Change research. With the invention of the Internet, many WDCs made their data accessible online. Since 2001 Germany has contributed three centers: the World Data Center for Climate (WDCC), the World Data Center for Remote Sensing of the Atmosphere (WDC-RSAT), and the World Data Center for Marine Environmental Sciences (WDC-MARE), together forming the German WDC-Cluster.

Through a decision of the 29th General Assembly of ICSU, the WDC system will be transferred to a new World Data System (WDS) between 2009 and 2011. The WDS concept aims at a transition from individual WDCs to a common, globally interoperable distributed data system and will strive to become a worldwide 'community of excellence' for scientific data. A prototype data portal, which is being considered as a common entry point for the various data sources, is operated in Germany. Any organization holding data is encouraged to join the new WDS. Germany will contribute the German WDC-cluster and the data publisher PANGAEA.
 * http://icsu-wds.org

box: The German WDC-cluster for Global Change data

The World Data Centre for Climate (WDCC) offers data management consulting for climate models over the whole lifetime of the data. With several catalogues a variety of metadata standards can be handled, an indispensable precondition for distributed and federated archives. Part of the archiving workflow is a detailed quality management. In 2010 the amount of model data accessible online has grown to more than 400 TB. As a Data Collection and Production Centre, the WDCC is also part of the World Meteorological Organization's information system. In the context of the IPCC Assessment Reports, WDCC is one of three data nodes for climate model data collection, in cooperation with the Program for Climate Model Diagnosis and Intercomparison (PCMDI, USA) and the British Atmospheric Data Centre (BADC, UK).
 * http://www.wdc-climate.de



The German Research Centre for Geosciences (GFZ) investigates "System Earth" at locations all over the world, studying the geological, physical, chemical and biological processes occurring at its surface and in its interior. The resulting data are unique in their scientific profile as they encompass the entire planet, from global field models down to individual samples from scientific drilling operations. Most data held at GFZ are accessible through a central portal and through discipline-specific portals, such as the World Stress Map (WSM), Satellite Data Centre (ISDC), Scientific Drilling Database (SDDB), and the GEOFON Seismological Network. Datasets are citable and carry a DOI. The portal and the publication of data are offered as a joint service of the GFZ Centre for Geoinformation Technology (CeGIT) and the GFZ Library Albert Einstein.
 * http://www.gfz-potsdam.de:80/portal/gfz/Services/Forschungsdaten

The World Data Center for Marine Environmental Sciences (WDC-MARE) aims at collecting, scrutinizing, and disseminating data related to Global Change and earth system research in all fields of marine sciences. Projects which have used WDC-MARE for data curation have a focus on environmental, geological, biological, physical and chemical oceanography. WDC-MARE makes use of PANGAEA as its data archive.
 * http://www.wdc-mare.de

Since 2003 the German Aerospace Center (DLR) has hosted the World Data Center for Remote Sensing of the Atmosphere (WDC-RSAT). The center offers a continuously growing collection of atmosphere-related satellite-based data sets, information products and services. The focus is on atmospheric trace gases, aerosols, dynamics, radiation, and cloud physical parameters with complementary information. This is achieved either by giving Open Access to data stored at the center or by acting as a portal linking to other sources. WDC-RSAT is a member of the WMO-WDC group and serves as a management platform for the Network for the Detection of Mesopause Change (NDMC).
 * http://wdc.dlr.de

The data volume, with its constantly increasing storage requirements, is the factor stressed in any data discussion; e.g. a satellite like CryoSat, recently launched to observe the behaviour of the polar ice, will produce 50 GB of raw data per day. Another point, rarely mentioned in this context, is the variety of measurements in all parts of the geosphere (atmo-, hydro-, cryo-, bio-, lithosphere). The complexity of small to medium-sized data sets resulting from hundreds of projects and thousands of publications each year is an even greater challenge. Even leaving out anything related to biodiversity (i.e. species distribution), some tens of thousands of variables remain, as measured by all disciplines of Global Change related sciences. Smart data models are required to handle this fine-grained and highly diverse mass of data. This calls for a strictly normalized and abstracted model, as established by the PANGAEA data library.

box: PANGAEA® - Data Publisher for Earth & Life Sciences is a unique digital archive for data from earth system research. It functions not only as a library but also as a publication system for discrete data publications and for supplements related to articles, in cooperation with publishers. Data are entered through an editorial system, stored in a relational database and distributed by web services. Data can be georeferenced in time and space and are accompanied by a description (metadata); part of the latter is a bibliographic citation and a DOI as persistent link. Data are available in Open Access under the terms of Creative Commons licenses. The operating institutes AWI and MARUM provide PANGAEA as an infrastructure for the scientific community, open to any institute, project or scientist.
 * http://www.pangaea.de



Centers in Germany archiving data related to Global Change provide their content through an established workflow in citable entities. An author who wants to archive a supplement related to a publication or to make results available to the scientific community, e.g. within a research project, is integrated into an editorial workflow similar to that of a scientific journal: (1) submission (author), (2) check for consistency and completeness of metadata, archiving (editor), (3) proof-read (author), (4) corrections (editor), (5) peer-review (if publication related), (6) publication as citable entity with DOI. Data supplements can be linked to the journal's catalog through an automated web service (e.g. ).
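The six steps listed above can be read as a small state machine. The following Python sketch is only an illustration under stated assumptions: the names Step, ROLE, Submission and run_to_publication are hypothetical and do not describe the software of any of the centers. Roles are given only for steps 1 to 4, as in the text, and peer review is skipped when the data are not related to a publication.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Step(IntEnum):
    """The six workflow steps, numbered as in the text (names are hypothetical)."""
    SUBMISSION = 1
    METADATA_CHECK = 2   # consistency and completeness of metadata, archiving
    PROOF_READ = 3
    CORRECTIONS = 4
    PEER_REVIEW = 5      # only if the data set is related to a publication
    PUBLICATION = 6      # citable entity with DOI


# Who acts in each step; the text names roles for steps 1 to 4 only.
ROLE = {
    Step.SUBMISSION: "author",
    Step.METADATA_CHECK: "editor",
    Step.PROOF_READ: "author",
    Step.CORRECTIONS: "editor",
}


@dataclass
class Submission:
    title: str
    publication_related: bool
    step: Step = Step.SUBMISSION
    history: list = field(default_factory=list)

    def advance(self) -> Step:
        """Move to the next step, skipping peer review where it does not apply."""
        if self.step is Step.PUBLICATION:
            raise ValueError("data set is already published")
        self.history.append(self.step)
        nxt = Step(self.step + 1)
        if nxt is Step.PEER_REVIEW and not self.publication_related:
            nxt = Step.PUBLICATION
        self.step = nxt
        return nxt


def run_to_publication(sub: Submission) -> list:
    """Advance a submission until it is published; return all steps visited."""
    while sub.step is not Step.PUBLICATION:
        sub.advance()
    return sub.history + [sub.step]


# Hypothetical example: a project data set without an accompanying article.
steps = run_to_publication(Submission("project data set", publication_related=False))
print([s.name for s in steps])
```

Modelling the steps as an ordered enumeration keeps the sequence explicit, so a data set cannot reach publication without passing the metadata check and the proof-read.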

Besides the curiosity to know how the earth works, a driving force in science that should not be neglected is the credit scientists gain for their work. One reason for the still insufficient availability of data is the missing credit for the data provider. Consequently, data archiving needs to become an integral part of the already established workflow and storage for scientific publications. Data must be appropriately citable and reliably accessible.

As a contribution to solving this problem, scientists from Germany and the UK initiated the journal Earth System Science Data (ESSD) as the first journal solely aimed at the publication of data. ESSD has been online since 2009, published by the Open Access publisher Copernicus, and will improve the availability of data by integrating it into the established scientific publication process. Papers describing data, methods and quality are peer-reviewed and will be listed in the Science Citation Index. The provision of a persistent identifier (e.g. DOI) as address of the data set is mandatory. The first publication made available an eight-year time series of ozone profiles from the Antarctic station of the former GDR (German Democratic Republic).
 * http://www.earth-system-science-data.net

just for info

 * Global Change Research in Germany 2008