Intern talk:ARCOD

Draft proposal of Georgy Cherkashev for the meeting in St. Petersburg 2009-11-26

ARCOD – database on sediments in the Arctic Ocean – composition and characteristics of bottom sediments from the Arctic Ocean and its marginal seas
The project was first presented at the 9th Meeting (Helgoland, December 2003). It has no expedition component and therefore does not require much funding. At present only 10-20% of all Russian data on bottom sediments, which carry information about recent and ancient natural processes, are published in international journals. Most of the information is either published only in Russian journals or has not yet been published and is kept by researchers and in archives.

The objectives of the project are to assess, systematize and integrate all information on bottom sediments available in Russia. The final products are to be an open data system, an Atlas and a monograph of bottom sediments of the Arctic Ocean. The project plans to import the whole data set into the open information system PANGAEA and to prepare maps of the distribution of sedimentological, mineralogical, geochemical, microbiological and other components in bottom sediments.

Russian partners in the project are VNIIO, IO RAS, MMBI and GEOKHI. German partners are AWI, IFM-GEOMAR and the University of Bremen. The project is coordinated by AWI (Germany) and VNIIOkeangeologiya (Russia).

In 2004–2006 the Russian and German sides successfully carried out the pilot phase of the project and obtained important scientific results that can serve as a basis for the subsequent preparation of the Atlas of bottom sediments of the Arctic Ocean. In 2007–2008 the database was extended: it contains data files on bottom sediments and other objects of the oceans. The Arctic region remains of primary importance. It was demonstrated that sharing Russian and German data is very effective and makes it possible to identify the main sedimentation factors for the recent historical period and for paleoceanographic reconstructions.

Resources of the PANGAEA information system, the leading European information system for oceanographic and geological data, developed in Germany (University of Bremen and AWI), were used. Up to now data from 2620 data sources have been imported into PANGAEA, but the lack of funding hinders full implementation of the project.

The sides note that this project does not need much funding but is extremely important from a scientific point of view. The sides have agreed to transfer the project to the category of ongoing projects of this Agreement, with a subsequent decision on its funding.

Proposal/report 2010-01-30
(report by Evgeny Gurvich) For many decades the USSR and then Russia have studied the World Ocean and many seas. A considerable part of the results of these studies has been published in books and journals. Many of these publications contain data of observations, measurements, analyses, etc. at stations and along sections with concrete geographical coordinates, i.e. data suitable for import to PANGAEA. More than 90% of these data have been published in Russian. Some journals are translated into English and are available in the libraries of some German universities. As to the books, the overwhelming majority can be found only in scientific libraries of Russia or other FSU countries.

In October 2009 Evgeny Gurvich looked through some hundred books devoted to ocean studies in the Moscow libraries listed below. Many books with interesting information gave no station coordinates and are therefore useless for georeferenced archiving. About 1000 tables containing data of observations, measurements, analyses, etc., as well as important information from 64 books, were brought to Germany (Table 1). Most of the material was copied in the libraries; some was brought as originals. From some hundred issues of various journals (some not available in Germany) about 200 tables were copied (Table 2).
 * Library of P.P. Shirshov Institute of Oceanology
 * Library of the Geography Department of M.V. Lomonosov Moscow State University
 * Library on Natural Sciences of the Russian Academy of Sciences
 * Russian State Library

Since January 2010 about 360 tables from books by Russian scientists, containing more than 140000 data points on marine geology (lithology, geochemistry, mineralogy), have been archived. Preparing the obtained data for import included the following steps:
 * 1) Scanning tables and text for getting image files with resolution ≥300 dpi.
 * 2) Editing image files.
 * 3) Character recognition (OCR) of numbers and text from the tables and descriptions, saved as Excel (tables) or Word (text) files.
 * 4) Checking of the recognized numbers and text.
 * 5) Translation of needed text (text in tables, comments, abstracts) from Russian into English.
 * 6) Preparation of event files.
 * 7) Import of the event files to PANGAEA.
 * 8) Editing of saved Excel files and preparation of tabulated text data files.
 * 9) Creating metadata files with the Split2Events program and editing them.
 * 10) Preparation of import files with the Split2Events program.
 * 11) Import of files to PANGAEA.
 * 12) Checking of the imported files on www.pangaea.de.
 * 13) Correction of mistakes (if needed).
 * 14) Composing parent files (if needed).
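
Step 4 (checking the recognized numbers) can be partly automated before the manual comparison with the scan. The following is only a minimal sketch, not part of PANGAEA or Split2Events: the function names, the list of look-alike characters and the three status labels are assumptions made for illustration. It flags every cell that does not parse as a number and, where a typical OCR confusion (letter O for zero, decimal comma) explains the failure, proposes a corrected value that still has to be confirmed against the original.

```python
import re

# Hypothetical helper, not part of PANGAEA or Split2Events.
# Matches plain decimal numbers such as "12", "-3.5" or "0.07".
NUMBER = re.compile(r"^-?\d+(\.\d+)?$")

# Characters that OCR often confuses with digits (assumed, incomplete list).
LOOKALIKES = str.maketrans(
    {"O": "0", "o": "0", "l": "1", "I": "1", "S": "5", "B": "8"}
)


def check_cell(cell):
    """Return (status, value) for one recognized table cell.

    "ok"      - the cell already parses as a number
    "suggest" - replacing look-alikes or a decimal comma yields a number;
                value holds the suggestion, to be confirmed against the scan
    "manual"  - the cell has to be checked by hand
    """
    text = cell.strip()
    if NUMBER.match(text):
        return "ok", text
    candidate = text.replace(",", ".").translate(LOOKALIKES)
    if NUMBER.match(candidate):
        return "suggest", candidate
    return "manual", text


def check_table(rows):
    """Yield (row, column, status, value) for every cell that is not "ok"."""
    for r, row in enumerate(rows):
        for c, cell in enumerate(row):
            status, value = check_cell(cell)
            if status != "ok":
                yield r, c, status, value


if __name__ == "__main__":
    table = [["1.0", "l2"], ["abc", "3"]]
    for problem in check_table(table):
        print(problem)
```

Text cells (station names, comments) will always end up as "manual" here, so in practice the check would be run only on columns known to be numeric.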

The speed of preparing data for import depends on the quality of the original, on the size and type of the table, and on the number of parameters. The better the print quality of the original tables and texts, the faster table and text files are obtained. With a good original, the recognized text and tables need little or no editing; with a poor one, much editing or retyping is required. A table with much text needs translation and editorial work. The time needed also depends on the number of parameters. In some palaeontological and biostratigraphic tables and figures (which can be treated as tables) the occurrence of species is shown by different line styles (dotted, dashed, thin, thick, medium, etc.). Such tables contain valuable scientific information but require more effort.

For each table from published books and papers, the authors' names and the reference are provided. If a publication contains many tables, they are grouped under a parent, which is then related to the reference. The proposed work includes preparing the scientific information listed in Tables 1 and 2 for import. After the data related to the literature listed in Tables 1 and 2 have been imported, data acquisition will continue in the libraries of Moscow.
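
The grouping rule described above (several tables from one reference form the children of a parent, a lone table stays a single file) can be sketched as follows. This is an illustrative assumption about how the bookkeeping might be done, not the procedure actually used; the function name and the sample references are hypothetical.

```python
from collections import defaultdict


def group_by_reference(tables):
    """Group (reference, table_name) pairs into parents and single files.

    Returns (parents, singles):
      parents maps a reference to the list of its child tables
              (references with more than one table),
      singles maps a reference to its only table.
    """
    by_ref = defaultdict(list)
    for reference, table in tables:
        by_ref[reference].append(table)

    parents = {ref: names for ref, names in by_ref.items() if len(names) > 1}
    singles = {ref: names[0] for ref, names in by_ref.items() if len(names) == 1}
    return parents, singles


if __name__ == "__main__":
    # Hypothetical references and table names, for illustration only.
    tables = [
        ("Author A (1975)", "Table 3"),
        ("Author A (1975)", "Table 7"),
        ("Author B (1982)", "Table 1"),
    ]
    parents, singles = group_by_reference(tables)
    print(parents)
    print(singles)
```

With this rule the number of references always equals the number of parents plus the number of single files, which gives a simple consistency check on reported totals.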

Data preparation for import (journal Oceanology, reported 2011-02-08)
 * 1) finding and copying publications in the library;
 * 2) recognizing and checking tables and text (bad print quality);
 * 3) editing tables;
 * 4) defining and checking new events;
 * 5) taking event positions from maps;
 * 6) recalculating parameters to PANGAEA internal units.
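
Steps 5 and 6 are mostly arithmetic. The sketch below assumes that positions read from maps in degrees and minutes are to be stored as signed decimal degrees, and it uses percent as an example target unit for mass fractions. Both function names and the conversion table are hypothetical, and the units actually required for a given parameter have to be taken from PANGAEA itself.

```python
def dms_to_decimal(degrees, minutes=0.0, seconds=0.0, hemisphere="N"):
    """Convert degrees/minutes/seconds to signed decimal degrees.

    South and West are returned as negative values.
    """
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError(f"unknown hemisphere: {hemisphere!r}")
    value = abs(degrees) + minutes / 60.0 + seconds / 3600.0
    return -value if hemisphere in ("S", "W") else value


# Illustrative factors to an assumed target unit (percent by mass).
# The unit PANGAEA expects for a parameter must be looked up there.
TO_PERCENT = {"%": 1.0, "ppm": 1e-4, "g/kg": 0.1}


def to_percent(value, unit):
    """Recalculate a mass fraction given in `unit` to percent."""
    return value * TO_PERCENT[unit]


if __name__ == "__main__":
    print(dms_to_decimal(77, 30, hemisphere="N"))
    print(dms_to_decimal(10, 15, hemisphere="W"))
    print(to_percent(2500, "ppm"))
```

Keeping the factors in one table makes it easy to review them against the original publication, where the unit is often given only in a table header or footnote.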

Report 2011-07-27
In 2011 the journal of the USSR/Russian Academy of Sciences, Okeanologiya, or its English translation Oceanology, was the main source of information for importing to PANGAEA. From January to the end of July, 143535 data points from 321 papers published in Okeanologiya in 1966-2009 were imported to PANGAEA. These data were included in 684 child files of 182 parent files (for papers containing more than one table for import) and in 139 single files (for papers containing only one table), i.e. 182 + 139 = 321 papers.

In addition, 113535 data points from tables and figures published in 8 Russian books brought from Moscow were imported to PANGAEA by Dr. Stefanie Schumacher. These data were included in 37 child files of 8 parent files and in 1 single file.

Report to PTJ:

Project ARCOD – data archiving in 2010/2011

The archiving of data from Russian archives and publications was continued. Data were captured within the established publication workflow of PANGAEA, which provides, for each supplement to a publication, a separate citable data set with persistent identification via a DOI (Digital Object Identifier). The data citations are made searchable through portals, search engines and the library catalogue of the TIB (Technische Informationsbibliothek Hannover).

The primary data of the DSDP (Deep Sea Drilling Project) from 8 books of the Russian literature were captured in cooperation with a micropaleontologist, since they consisted essentially of microfossil data. In 2011 the capture of data supplements to publications concentrated on the Russian journal Okeanologiya (320 publications). Okeanologiya is published by the Russian publisher Nauka, which, in cooperation with Springer, translates the articles from about 50 annual volumes into English (Oceanology) and makes them available via the "SpringerLink" catalogue. Through the cooperation between Springer and PANGAEA, the data supplements in PANGAEA automatically become visible together with the publication on SpringerLink. Work on data tables from Okeanologiya/Oceanology will continue until the data from all publications have been captured.