Intern:NKGCF

Text draft for the NKGCF brochure "Global Change Research in Germany 2011" (chapter: Data Centre and Data Availability)

Observations, measurements and models are the lifeblood of Global Change research; the resulting data sets are the basis of all scientific publications. The importance of data availability and archiving was recently stressed in special issues of leading scientific journals (Science ; Nature ). Only if the primary data underlying a publication are accessible can reviewers and readers verify its findings. Availability is one prerequisite for the proper future use of data; their form is the other. Research questions that focus on changes in the Earth system take a global perspective and consequently require global data sets. In practice, however, most findings are fragmented across the individual research of scientists and projects. Such distributed data would gain immense added value if they were not only available but also provided in harmonized standard formats that allow exchange and compilation.

Besides curiosity about how the Earth works, one driving force in science that should not be neglected is the credit scientists gain for their work. Data archiving therefore needs to become an integral part of the established workflow for scientific publications: data must be properly citable and reliably accessible. The infrastructure for searching and distributing publications is already in place, e.g. through the global system of libraries, including legally mandated long-term archiving. The Internet, by making publications available everywhere in digital form, has added new search capabilities through publisher catalogues, science-specific portals and search engines. Once data citations become available through this growing infrastructure, the integration of data into the publication workflow and infrastructure will be complete. As the outcome of a nationally funded project of the German Research Foundation (DFG), a reliable system was established for the sustainable archiving of data, their citation and their persistent identification using DOIs (Digital Object Identifiers). Several data centres contributed their content under the leadership of the German National Library of Science and Technology (TIB, Hannover), and in 2009 the international data registration agency DataCite was founded.

box: DataCite is an international association that aims to support researchers by enabling them to locate, identify and cite research datasets with confidence. DataCite was launched on 1 December 2009 in London. As of December 2010, DataCite has 15 members from 10 countries, including the TIB as initiator, and 100,000 data sets are registered with a DOI. DataCite takes a global lead in promoting the use of persistent identifiers for datasets as an integral part of data citation. Through its members, it establishes and promotes common methods, best practices and guidance. The member organisations work independently with data centres and other holders of research data sets in their own domains. Just as science operates globally while individual researchers work and publish locally, DataCite is global, with local partners offering services and advice where required. Further organisations are encouraged to join the association.
 * http://www.datacite.org

Libraries have existed for 5000 years, data centers for 50. During those decades, data archiving has suffered from constantly changing storage media and formats. Many valuable data were lost on degrading tapes or on broken disks without backup; in many disciplines data archiving was neglected altogether. In recent years this infancy of electronic data handling has given way to technology that avoids data loss through migration and provides capacities in the petabyte range. Science now has the opportunity to ensure the availability of results from a true librarian's long-term perspective. In addition, the Internet provides the network to interlink archives distributed around the globe and thus even allows harmonized views of data through science-specific portals.

During the International Geophysical Year 1957/58 (IGY), the International Council for Science (ICSU) established the World Data Center (WDC) system to serve geoscience data. By the time of its major revision in 2008, more than 50 WDCs distributed around the world had been established, covering important disciplines of Global Change research. With the advent of the Internet, many WDCs made their data accessible online. Since 2001 Germany has contributed three centers: the World Data Center for Climate (WDCC), the World Data Center for Remote Sensing of the Atmosphere (WDC-RSAT) and the World Data Center for Marine Environmental Sciences (WDC-MARE).

By a decision of the 29th General Assembly of ICSU, the WDC system is being transferred to a new World Data System (WDS) between 2009 and 2011. The WDS concept aims at a transition from individual WDCs to a common, globally interoperable, distributed data system and will strive to become a worldwide 'community of excellence' for scientific data. A prototype data portal, intended as a common entry point to the various data sources, is operational. Any organization producing or holding data is encouraged to join the new WDS. Germany will contribute the data centers WDCC and WDC-RSAT as well as the data publisher and library PANGAEA.
 * http://icsu-wds.org

box: The German WDC-cluster for Global Change data

The World Data Centre for Climate (WDCC) offers consulting on the management of climate model data over the entire lifetime of the data. Several catalogues allow a variety of metadata standards to be handled, an indispensable precondition for distributed and federated archives. Detailed quality management is part of the archiving workflow. In 2010 the amount of online-accessible model data grew to more than 400 TB. As a Data Collection and Production Centre, the WDCC is also part of the World Meteorological Organization's information system. For the IPCC Assessment Reports, the WDCC is one of three data nodes for collecting climate model data, in cooperation with the Program for Climate Model Diagnosis and Intercomparison (PCMDI, USA) and the British Atmospheric Data Centre.
 * http://www.mad.zmaw.de/wdc-for-climate

The Helmholtz Centre Potsdam, German Research Centre for Geosciences (GFZ, Potsdam), investigates "System Earth" at locations all over the world, including all geological, physical, chemical and biological processes occurring at its surface and in its interior. The resulting data have a unique scientific profile, as they encompass the entire planet, from global field models down to individual samples from scientific drilling operations. A large proportion of the data held at GFZ is accessible through web portals, and many datasets are identified, and thus citable, by Digital Object Identifiers (DOIs). Data curation services are offered to geoscience research projects as a joint service of the GFZ Centre for Geoinformation Technology (CeGIT) and the GFZ Library Albert Einstein, through a common data portal, project portals and virtual research environments. This service also includes consulting on the management, curation, publication and long-term preservation of digital research data.
 * http://www.gfz-potsdam.de:80/portal/gfz/Services/Forschungsdaten

The World Data Center for Marine Environmental Sciences (WDC-MARE) collects, scrutinizes and disseminates data related to Global Change and Earth system research in all fields of marine science. Projects that have used WDC-MARE for data curation focus on environmental, geological, biological, physical and chemical oceanography. WDC-MARE uses PANGAEA as its data archive.
 * http://www.wdc-mare.de

Since 2003 the German Aerospace Center (DLR) has hosted the World Data Center for Remote Sensing of the Atmosphere (WDC-RSAT). The center offers a continuously growing collection of satellite-based atmospheric data sets, information products and services. Its focus is on atmospheric trace gases, aerosols, dynamics, radiation and cloud physical parameters, together with complementary information. The center provides these either by giving Open Access to data stored on site or by acting as a portal linking to other sources. WDC-RSAT is a member of the WMO-WDC group and serves as the management platform for the Network for the Detection of Mesopause Change (NDMC).
 * http://wdc.dlr.de

Data volumes and constantly increasing storage capacities are the points most often stressed in discussions about data; for example, a satellite such as CryoSat, recently launched to observe the behaviour of the polar ice, will produce 50 GB of data per day. Another point, rarely mentioned in this context, is the variety of measurements in all parts of the geosphere (atmosphere, hydrosphere, cryosphere, biosphere, lithosphere). The complexity of the small to medium-sized, fine-grained data sets resulting from hundreds of projects is an even greater challenge than the required storage capacity. Even leaving out everything related to biodiversity (e.g. species distribution), some tens of thousands of variables remain that are measured across all disciplines of Global Change-related science. Smart data models are required to handle this fine-grained and highly diverse body of data; they call for a strictly normalized and abstracted model, as established in the PANGAEA archive.
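Why a normalized, abstracted model copes with tens of thousands of variables can be illustrated with a small sketch. The schema below is hypothetical and strongly simplified; it is not the actual PANGAEA data model. Each variable is stored as a row in a `parameter` table rather than as a column, so adding a new variable never changes the database schema. The table names, the example DOI `10.1234/example-dataset` and all values are made up for illustration.

```python
import sqlite3

# Hypothetical, strongly simplified schema; NOT the actual PANGAEA data model.
SCHEMA = """
CREATE TABLE parameter (          -- one row per measured variable
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    unit TEXT NOT NULL,
    UNIQUE (name, unit)
);
CREATE TABLE event (              -- where and when a measurement was taken
    id        INTEGER PRIMARY KEY,
    label     TEXT NOT NULL,
    latitude  REAL NOT NULL,
    longitude REAL NOT NULL,
    datetime  TEXT NOT NULL
);
CREATE TABLE dataset (            -- citable unit, identified by a DOI
    id  INTEGER PRIMARY KEY,
    doi TEXT NOT NULL UNIQUE
);
CREATE TABLE observation (        -- one value = dataset x event x parameter
    dataset_id   INTEGER NOT NULL REFERENCES dataset(id),
    event_id     INTEGER NOT NULL REFERENCES event(id),
    parameter_id INTEGER NOT NULL REFERENCES parameter(id),
    value        REAL NOT NULL
);
"""


def get_or_add_parameter(db: sqlite3.Connection, name: str, unit: str) -> int:
    """A new variable is just a new row; the schema itself never changes."""
    db.execute("INSERT OR IGNORE INTO parameter (name, unit) VALUES (?, ?)", (name, unit))
    row = db.execute("SELECT id FROM parameter WHERE name = ? AND unit = ?",
                     (name, unit)).fetchone()
    return row[0]


def values_for(db: sqlite3.Connection, doi: str) -> list:
    """Return (parameter, unit, value) rows of one data set, sorted by parameter name."""
    return db.execute("""
        SELECT p.name, p.unit, o.value
        FROM observation o
        JOIN dataset d   ON d.id = o.dataset_id
        JOIN parameter p ON p.id = o.parameter_id
        WHERE d.doi = ?
        ORDER BY p.name""", (doi,)).fetchall()


def build_demo() -> sqlite3.Connection:
    """Load one illustrative data set (made-up values) into an in-memory database."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    ds = db.execute("INSERT INTO dataset (doi) VALUES (?)",
                    ("10.1234/example-dataset",)).lastrowid
    ev = db.execute(
        "INSERT INTO event (label, latitude, longitude, datetime) VALUES (?, ?, ?, ?)",
        ("STATION-A", -70.0, -8.0, "2010-01-15T12:00")).lastrowid
    for name, unit, value in [("Temperature, water", "deg C", -1.8),
                              ("Salinity", "psu", 34.2)]:
        pid = get_or_add_parameter(db, name, unit)
        db.execute("INSERT INTO observation VALUES (?, ?, ?, ?)", (ds, ev, pid, value))
    return db
```

With this layout, a project measuring a previously unknown variable only adds one row to `parameter`; queries such as `values_for` keep working unchanged.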

box: PANGAEA® - Publisher for Data from the Earth System is a unique digital archive for any kind of data from Earth system research. It is operated as a data library and publication system for supplements to articles and for stand-alone data publications. Data are published via an editorial system, stored in a relational database and distributed via web services. Data can be georeferenced in time and space and are accompanied by a description (metadata) that includes a bibliographic citation and a Digital Object Identifier (DOI) as persistent identifier of the publication. Data are available in Open Access under the terms of Creative Commons licenses. The operating institutes AWI and MARUM provide PANGAEA as an infrastructure to the international scientific community, open to any institute, project or scientist.
 * http://www.pangaea.de
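How a metadata record yields a bibliographic citation with a DOI can be sketched briefly. The record type below is hypothetical, and the output follows, in simplified form, the "Creator (PublicationYear): Title. Publisher. Identifier" pattern recommended by DataCite; the names, title and DOI in the usage example are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical minimal metadata record for a published data set."""
    creators: tuple      # "Family, Given" strings, in author order
    year: int            # publication year
    title: str
    publisher: str
    doi: str             # e.g. "10.1234/example-dataset"

    def citation(self) -> str:
        """Format as: Creator (PublicationYear): Title. Publisher. Identifier."""
        authors = "; ".join(self.creators)
        title = self.title.rstrip(".")  # avoid a doubled full stop
        return (f"{authors} ({self.year}): {title}. "
                f"{self.publisher}. https://doi.org/{self.doi}")
```

For example, `DatasetRecord(("Doe, J", "Roe, R"), 2010, "Hypothetical ozone profiles", "PANGAEA", "10.1234/example-dataset").citation()` returns `Doe, J; Roe, R (2010): Hypothetical ozone profiles. PANGAEA. https://doi.org/10.1234/example-dataset`.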

Through the Internet it is possible to store anything (any digital object) anywhere at any time, which is far from a sustainable infrastructure from a librarian's point of view. Commercial publishers recognized the resulting "error 404" (file not found) problem within a few years of the Internet's establishment, and the system of persistent identifiers was invented. In 2011 about 40 million Digital Object Identifiers (DOIs) are registered for articles in scientific journals through a system operated by the International DOI Foundation (IDF).
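The persistent identifier avoids "error 404" because a DOI is resolved through a central proxy rather than pointing to a fixed web address. The following sketch, assuming the standard DOI syntax `10.<registrant>/<suffix>` and the public resolver at `https://doi.org/`, splits a DOI into prefix and suffix and builds a resolvable link; the DOI used in the tests is hypothetical.

```python
from urllib.parse import quote

# Public resolver (proxy) of the DOI system.
DOI_RESOLVER = "https://doi.org/"


def split_doi(doi: str) -> tuple:
    """Split a DOI into its prefix (registrant) and suffix (object) parts.

    A DOI has the form '10.<registrant>/<suffix>'; the prefix ends at the
    first slash, and the suffix may itself contain further slashes.
    """
    prefix, sep, suffix = doi.partition("/")
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a valid DOI: {doi!r}")
    return prefix, suffix


def resolver_url(doi: str) -> str:
    """Return a resolvable HTTPS link for a DOI.

    The DOI itself never changes; if the landing page moves, only the
    target registered behind the resolver has to be updated.
    """
    split_doi(doi)  # validate before building the link
    # Percent-encode characters such as '#' or '?' that would break a URL.
    return DOI_RESOLVER + quote(doi, safe="/")
```

Citing `https://doi.org/...` rather than the current address of an archive's web page is what keeps a data citation valid when the archive reorganizes its website.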

Centers in Germany that archive data related to Global Change provide their content as citable entities through an established workflow. An author who wants to archive a supplement to a publication, or to make results available to the scientific community (e.g. within a research project), enters an editorial workflow similar to that of a scientific journal. Documented through an issue tracking system, the data publication process consists of six steps: (1) submission (author), (2) check for consistency and completeness of the metadata, and archiving (editor), (3) proof-reading (author), (4) corrections (editor), (5) peer review of supplements to a publication (if applicable), (6) publication as a citable entity with a DOI as persistent identifier. Data supplements can be linked automatically to the publication's splash page at the journal/publisher through a web service (e.g. ).
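The six-step data publication process can be modelled as a simple ordered workflow attached to an issue-tracker ticket. This is a hypothetical sketch, not the software any data center actually uses: the `Ticket` class and its `needs_review` flag are assumptions, and roles are recorded only where the text names them (author or editor); for steps 5 and 6 the role is left open.

```python
from dataclasses import dataclass, field
from enum import Enum


class Step(Enum):
    """The six steps of the data publication workflow: (number, acting role)."""
    SUBMISSION = (1, "author")
    CHECK_AND_ARCHIVE = (2, "editor")
    PROOF_READ = (3, "author")
    CORRECTIONS = (4, "editor")
    PEER_REVIEW = (5, None)   # only if applicable; role not specified
    PUBLICATION = (6, None)   # citable entity with DOI; role not specified

    @property
    def number(self) -> int:
        return self.value[0]


@dataclass
class Ticket:
    """Hypothetical issue-tracker ticket documenting one data submission."""
    title: str
    needs_review: bool  # assumption: True for supplements to a reviewed article
    history: list = field(default_factory=list)

    def planned_steps(self) -> list:
        # Steps in definition order; step 5 is skipped if review does not apply.
        return [s for s in Step if s is not Step.PEER_REVIEW or self.needs_review]

    def advance(self) -> Step:
        """Record the next pending step and return it."""
        remaining = self.planned_steps()[len(self.history):]
        if not remaining:
            raise RuntimeError("workflow already finished")
        step = remaining[0]
        self.history.append(step)
        return step

    @property
    def published(self) -> bool:
        return bool(self.history) and self.history[-1] is Step.PUBLICATION
```

A supplement with `needs_review=True` passes through all six steps; a project data set without review passes through five, ending in both cases with publication under a DOI.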



One reason for the insufficient availability of data is the lack of credit for data providers. As a contribution to solving this problem, scientists from Germany and the UK initiated the journal Earth System Science Data (ESSD), the first journal dedicated solely to the publication of data. ESSD has been published online since 2009 by the Open Access publisher Copernicus. It improves the availability of data by integrating them into the established scientific publication process. Papers describing data, methods and quality are peer-reviewed and will be listed in the Science Citation Index; every paper must provide a persistent identifier (e.g. a DOI) pointing to the full data set. The first publication made available an eight-year time series of ozone profiles from the Antarctic station of the former GDR (German Democratic Republic).
 * http://www.earth-system-science-data.net

just for info

 * Global Change Research in Germany 2008