Intern:Project data management/IODP

Workflow for publishing DSDP/ODP/IODP data
Following a request from IODP to archive post-cruise data published by DSDP and ODP investigators during the last decades, we scan all relevant journals for ODP publications using the GEOREF portal. If a PDF contains data tables, each table is converted to a machine-readable file, imported into Pangaea, and published as a new electronic supplement to the publication.

1) Reference search on GEOREF
 * go to http://odp.georef.org/dbtw-wpd/qbeodp.htm
 * enter the journal name in the "Source" field
 * select the option "Brief record"

2) Download publication
 * if available, open/download the paper via the URL given in the "Brief record" (in a new window)
 * otherwise search for the article via the journal homepage (in a new window)
 * older references may have a DOI that does NOT appear in the PDF version but only in the online version; the same applies to the issue. Therefore extract all information from the online version into an Excel sheet:
   * Example from Marine Micropaleontology
   * Reference No of Georef, Author/Year, Volume and Issue, No of tables with georeferenced data, No of supplements with georeferenced data, state of work, DOI
 * download the PDF file and the supplements of the reference.
 * Rename the PDF file using the Reference No of Georef and the author, for example: 001Agnini.pdf.
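
The renaming step can be scripted when many files are downloaded at once. The following is only a minimal sketch: the function names are hypothetical, and the three-digit zero padding and the removal of non-letter characters from the author name are assumptions inferred from the example 001Agnini.pdf.

```python
from pathlib import Path


def georef_pdf_name(ref_no: int, first_author: str) -> str:
    """Build a file name such as 001Agnini.pdf."""
    # Assumption: keep letters only, so blanks and punctuation in names are dropped.
    author = "".join(ch for ch in first_author if ch.isalpha())
    # Assumption: the reference number is padded to three digits.
    return f"{ref_no:03d}{author}.pdf"


def rename_pdf(path: Path, ref_no: int, first_author: str) -> Path:
    """Rename a downloaded PDF in place and return the new path."""
    target = path.with_name(georef_pdf_name(ref_no, first_author))
    if target.exists():
        # Never overwrite a file that was renamed earlier.
        raise FileExistsError(target)
    return path.rename(target)
```

A call such as rename_pdf(Path("download.pdf"), 1, "Agnini") would rename the file to 001Agnini.pdf in the same folder.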

3) Preparation of data
 * After all references from one defined subset of papers in Georef are checked/downloaded, start with conversion of tables and import.
 * Start with the most recent reference. Modern pdfs are in a better state than older ones.
 * Check for existing references/datasets in Pangaea
 * Supplementary data may already be in Excel format; otherwise convert them to Excel
 * There are three options for table-to-Excel conversion:
   * 1) Save the PDF as an MS Word document. Open the *.doc file, copy the table, and paste it into an Excel sheet.
   * 2) Select the table in the PDF file with the cursor. Copy and paste it into a text editor. Replace the blanks with tabs. Copy and paste the result into an Excel sheet. Sometimes columns or rows are not properly aligned; correct these carefully. Check for invalid numbers/names.
   * 3) Ask a data typist to transcribe the table.
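
For option 2, replacing the blanks and spotting misaligned rows can be done with a small script instead of by hand in the text editor. This is a sketch under assumptions, not a fixed procedure: the function names and the min_blanks parameter are hypothetical. Setting min_blanks to 2 only helps if the copied text separates columns with at least two blanks, so that single blanks inside a cell (e.g. a species name) survive.

```python
import re


def blanks_to_tabs(text: str, min_blanks: int = 1) -> str:
    """Replace each run of at least min_blanks blanks with one tab."""
    gap = re.compile(r"[ \t]{%d,}" % min_blanks)
    # Strip each line first so leading blanks do not create an empty column.
    return "\n".join(gap.sub("\t", line.strip()) for line in text.splitlines())


def ragged_rows(tsv: str) -> list[int]:
    """Return 1-based line numbers whose column count differs from the first row."""
    rows = [
        (number, line.split("\t"))
        for number, line in enumerate(tsv.splitlines(), start=1)
        if line
    ]
    if not rows:
        return []
    width = len(rows[0][1])
    return [number for number, cells in rows if len(cells) != width]


# Made-up sample text as it might look after copying a table from a PDF.
copied = "Depth  Age  Species\n1.5  0.2  G. ruber\n3.0  0.4"
table = blanks_to_tabs(copied, min_blanks=2)
print(ragged_rows(table))  # rows that still need manual correction
```

The rows reported by ragged_rows still have to be corrected manually, and the check for invalid numbers/names remains a manual step.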

4) Import of data
 * Once all tables from one reference are prepared, start the import into Pangaea. As the dataset title, use the Table/Supplement/Appendix number in brackets followed by the table caption, e.g. (Table 1) Age model of ODP Hole 113-698B.
 * Change publication year in Pangaea to the publication year of the reference
 * Once all datasets are imported, create a parent that includes all (child) datasets and give it a short name describing the content of the data supplement
 * change the dataset status to supplementary data (editors only!)
 * final example: 

5) Statistics
 * keep track of
   * the number of ODP publications per journal
   * the number of publications with data
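
If the tracking sheet from step 2 is exported as CSV, both numbers can be counted automatically. The sketch below assumes hypothetical column names "Journal", "Tables" and "Supplements"; the real sheet may be organised differently (e.g. one sheet per journal). The sample rows are made up for illustration only.

```python
import csv
import io
from collections import Counter


def journal_statistics(rows):
    """Return {journal: (publications, publications with data)}."""
    total = Counter()
    with_data = Counter()
    for row in rows:
        journal = row["Journal"]
        total[journal] += 1
        # Empty cells count as zero tables/supplements.
        n_tables = int(row["Tables"] or 0)
        n_supplements = int(row["Supplements"] or 0)
        if n_tables + n_supplements > 0:
            with_data[journal] += 1
    return {journal: (total[journal], with_data[journal]) for journal in total}


# Made-up sample export; replace with open("tracking.csv", newline="") for a real file.
sample = io.StringIO(
    "Journal,Tables,Supplements\n"
    "Marine Micropaleontology,2,0\n"
    "Marine Micropaleontology,0,0\n"
    "Paleoceanography,0,1\n"
)
stats = journal_statistics(csv.DictReader(sample))
for journal, (n_pubs, n_with_data) in stats.items():
    print(f"{journal}: {n_pubs} publications, {n_with_data} with data")
```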