Qualiservice Data Model

From PANGAEA Wiki

This page describes the Qualiservice metadata model.

Because it was created as an extension of the PANGAEA data model, all structures and names are taken from that model. The two models have similar applications but differ in the type of data described and in how it is accessed: whereas PANGAEA presents the data and its metadata openly in its portal (https://www.pangaea.de/), Qualiservice uses the data model only as a representation/description of the data (metadata), not to exchange or display the data itself.

We therefore speak here of a metadata model for Qualiservice.

The PANGAEA data model runs on a relational database (PostgreSQL) and is expressed more technically as an XML Schema at https://ws.pangaea.de/schemas/pangaea/MetaData.xsd.

Main tables

Note: The names used by Qualiservice are shown here in addition to the original table and module names, in the form Original name|Qualiservice name.

The metadata model consists of four main modules (Project, Campaign|Study, Event, Dataset|Collection of Data) and supporting tables with supplemental information. The data object metadata (metadata about single interviews or cases) are organized in Data Series|Micrometadata (this micrometadata was referred to as Interview-Metadata in the first phase of the Qualiservice project; see Betancort & Haake, 2014[1]).

Because the PANGAEA data model is generic, Qualiservice could reuse it, which increases the interoperability and findability of the data collections shared and archived by Qualiservice.

The hierarchy of the four main tables follows the steps taken in science to gather analytical data: within a PROJECT, different CAMPAIGNs are executed to obtain samples for investigation or to make measurements at distinct locations (EVENT). The results of the investigations are analytical data, organized in Data Series and grouped in DATASETs.

In the case of Qualiservice, this last point about Data Series and Datasets refers only to MICROMETADATA grouped in COLLECTIONS OF DATA.
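The hierarchy described above can be sketched as a set of nested records. This is only an illustration under stated assumptions: the class names, field names and cardinalities are hypothetical and are not taken from the PANGAEA schema, and all values are invented.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Micrometadata:
    """Data Series|Micrometadata: metadata about a single interview or case."""
    label: str


@dataclass
class Event:
    """Event: a data collection or processing step."""
    label: str


@dataclass
class Collection:
    """Dataset|Collection of Data: groups micrometadata from one or more events."""
    title: str
    events: List[Event] = field(default_factory=list)
    micrometadata: List[Micrometadata] = field(default_factory=list)


@dataclass
class Study:
    """Campaign|Study: executed within a project."""
    acronym: str
    events: List[Event] = field(default_factory=list)
    collections: List[Collection] = field(default_factory=list)


@dataclass
class Project:
    """Project: uppermost level of the model."""
    name: str
    studies: List[Study] = field(default_factory=list)


def describe(project: Project) -> List[str]:
    """Return one indented line per node, walking the hierarchy top-down."""
    lines = [f"PROJECT {project.name}"]
    for study in project.studies:
        lines.append(f"  STUDY {study.acronym}")
        for event in study.events:
            lines.append(f"    EVENT {event.label}")
        for coll in study.collections:
            count = len(coll.micrometadata)
            lines.append(f"    COLLECTION {coll.title} ({count} micrometadata records)")
    return lines


# Hypothetical example values.
interviews = Event("Interviews 2020")
project = Project(
    name="Example Project",
    studies=[
        Study(
            acronym="EXSTUDY",
            events=[interviews],
            collections=[
                Collection(
                    title="Interview transcripts",
                    events=[interviews],
                    micrometadata=[Micrometadata("Interview 01"), Micrometadata("Interview 02")],
                )
            ],
        )
    ],
)
print("\n".join(describe(project)))
```

The sketch keeps the events on the study and lets a collection reference them, which mirrors the statement that a collection of data may stem from one or several events.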

Qualiservice Data Model - PANGAEA and DDI

Project

The PROJECT table is the uppermost level in the data model, used to define big research projects like Collaborative Research Centres or Clusters of Excellence.

Details of the project framework and its funding are included.

Mappings

Table | DDI 3.2 | da-ra 4.0
Project | Series Statement (Optional and repeatable) | Collective titles
Award | Funding Information (Optional and repeatable) | Funding references

Fields

Required fields for the project definition are (mandatory in bold):

Field | PANGAEA MetaData | Description | DDI 3.2 Element
Acronym | md:project/md:label | project acronym | Series Abbreviation (Optional and repeatable)
Name | md:project/md:name | full project title | Series Name (Optional and repeatable)
Type | md:project/@type | Institute, DFG, BMBF, National institution, EU, National, International | Series Description (Optional and non-repeatable)
Coordinator | | project responsible or coordinator. Value from Staff table | Series Description (Optional and non-repeatable)
Institute | md:project/md:institution | place of coordination or project office. Value from Institution table | Series Description (Optional and non-repeatable)
URI | md:project/md:URI | link to homepage of the project | Series Description (Optional and non-repeatable)
URI for data | | link to the data repository of the project | Series Repository Location (Optional and repeatable)
Comment | | more information about the project: comments (de, en), other names (de, en), other project type | Series Description (Optional and non-repeatable)
Awards | md:award | Funder, Award number, (Sub)Title. Value from Award table | Funding Information (Optional and repeatable)
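As a rough illustration of how the PANGAEA MetaData paths in the table fit together, the following sketch builds an md:project element with Python's standard library. The namespace URI bound to the md: prefix, the element order, and all values are assumptions made for this example; the authoritative definition is the MetaData.xsd schema.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the md: prefix; check MetaData.xsd before relying on it.
MD_NS = "http://www.pangaea.de/MetaData"
ET.register_namespace("md", MD_NS)


def md(tag: str) -> str:
    """Return the namespace-qualified tag name for ElementTree."""
    return f"{{{MD_NS}}}{tag}"


def build_project(acronym, name, project_type, institution=None, uri=None):
    """Build an md:project element from the fields listed in the table."""
    project = ET.Element(md("project"), {"type": project_type})  # md:project/@type
    ET.SubElement(project, md("label")).text = acronym           # md:project/md:label
    ET.SubElement(project, md("name")).text = name               # md:project/md:name
    if institution is not None:
        ET.SubElement(project, md("institution")).text = institution
    if uri is not None:
        ET.SubElement(project, md("URI")).text = uri
    return project


# Hypothetical values for illustration only.
project = build_project(
    acronym="EXPRO",
    name="Example Research Project",
    project_type="DFG",
    uri="https://example.org/expro",
)
print(ET.tostring(project, encoding="unicode"))
```

Fields without a PANGAEA MetaData path in the table (Coordinator, URI for data, Comment) are left out because the table does not say where they are serialized.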

Campaign|Study

This module includes the study metadata.

Mappings

Table | DDI 3.2 | da-ra 4.0
Campaign | Group (Optional and repeatable) | collective titles, resource language, classifications, keywords, descriptions, geographic coverages, temporal coverages, universes, time dimensions, notes

Fields

Required fields for the campaign definition are (mandatory in bold):

Field | PANGAEA MetaData | Description | DDI 3.2 Element
Acronym | md:event/md:campaign/md:name | official study acronym | Alternate Title (Optional and repeatable)
Title | md:event/md:campaign/md:optionalName | full authoritative study title | Title (Optional and non-repeatable)
Begin | md:event/md:campaign/md:start | study funding start date | Start Date (Mandatory and non-repeatable) in Lifecycle Event (and, if applicable, Temporal Coverage)
End | md:event/md:campaign/md:end | study funding end date | End Date (Optional and non-repeatable) in Lifecycle Event (and, if applicable, Temporal Coverage)
Study responsible(s) | md:event/md:campaign/md:chiefScientist | name of the person(s) responsible for the study | Creator (Optional and repeatable) with Creator Name
URI | md:event/md:campaign/md:URI | link to an (official) study web page | Note (Optional and repeatable)
Study report | md:reference[contains(@comment, 'Studienreport')] | citation of the study report | Other Material (Optional and repeatable)
Comment | | more information about the study: comments (de, en), other titles (de, en), funding organization/number (for studies without associated project) | Note (Optional and repeatable). For titles, use the element Title above; for alternate titles, use the element Alternate Title; for funding, see Funding Information above (under Project)
Study abstract | md:event/md:campaign/md:attribute[@name='Study abstract'] | abstract of the study unit describing its nature and scope | Abstract (Optional and non-repeatable)
Keyword | md:event/md:campaign/md:attribute[@name='Keyword'] | keywords describing the topics covered by the study. Values from ELSST and TheSoz (GESIS) and/or MeSH | Keyword (Optional and repeatable)
Subject | md:event/md:campaign/md:attribute[@name='Subject'] | subject or discipline of the study. Values from CESSDA Topic Classification and/or DFG-Fachsystematik | Subject (Optional and repeatable)
Type of data | md:event/md:campaign/md:attribute[@name='Kind of data type'] | general type of data (quantitative, qualitative, mixed) | Kind Of Data Type (Optional and non-repeatable)
Language | md:event/md:campaign/md:attribute[@name='Language'] | study language as ISO 639-3 code (deu, eng, spa...) | Language (Optional and repeatable)
Time method | md:event/md:campaign/md:attribute[@name='Time method'] | values from DDI Controlled Vocabulary Time Method | TimeMethod (Optional and repeatable)
Universe | md:event/md:campaign/md:attribute[@name='Universe'] | description of the researched population of the study | Universe (Optional and repeatable)
Location | md:event/md:campaign/md:attribute[@name='Location'] | geographic coverage of the study as ISO 3166 code | Spatial Coverage (Optional and non-repeatable) with Description and/or CountryCode
Coverage start date | md:event/md:campaign/md:attribute[@name='Start date'] | start date of the temporal coverage of the study | Start Date (Mandatory and non-repeatable) in Temporal Coverage (Optional and non-repeatable)
Coverage end date | md:event/md:campaign/md:attribute[@name='End date'] | end date of the temporal coverage of the study | End Date (Optional and non-repeatable) in Temporal Coverage (Optional and non-repeatable)
Period subject | md:event/md:campaign/md:attribute[@name='Period subject'] | time period covered by the study | Subject (Optional and repeatable) in Temporal Coverage (Optional and non-repeatable)
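Most study fields are addressed through md:attribute elements selected by their name attribute. The sketch below shows one way to read them with Python's standard library. The sample document is invented: the namespace URI, the md:MetaData root element, and the choice to store each value as element text are assumptions, not facts taken from the schema. ElementTree supports only a subset of XPath, so the contains() test used for the study report is done in Python instead.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the md: prefix.
NS = {"md": "http://www.pangaea.de/MetaData"}

# Invented sample record; structure and values are for illustration only.
SAMPLE = """
<md:MetaData xmlns:md="http://www.pangaea.de/MetaData">
  <md:reference comment="Studienreport">Example study report citation</md:reference>
  <md:reference comment="Publication">Example article citation</md:reference>
  <md:event>
    <md:campaign>
      <md:name>EXSTUDY</md:name>
      <md:attribute name="Keyword">migration</md:attribute>
      <md:attribute name="Keyword">family</md:attribute>
      <md:attribute name="Language">deu</md:attribute>
    </md:campaign>
  </md:event>
</md:MetaData>
""".strip()


def campaign_attributes(root, name):
    """Return all values of md:event/md:campaign/md:attribute[@name=...]."""
    path = f"md:event/md:campaign/md:attribute[@name='{name}']"
    return [el.text for el in root.findall(path, NS)]


def study_reports(root):
    """Emulate md:reference[contains(@comment, 'Studienreport')] in plain Python."""
    return [
        ref.text
        for ref in root.findall("md:reference", NS)
        if "Studienreport" in ref.get("comment", "")
    ]


root = ET.fromstring(SAMPLE)
print(campaign_attributes(root, "Keyword"))
print(campaign_attributes(root, "Language"))
print(study_reports(root))
```

Repeatable fields such as Keyword come back as a list with one entry per md:attribute element, which matches the "repeatable" marking in the DDI column.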

Event

This table includes information about the events through which the data was collected, transformed (transcription, anonymization) or analyzed.

Mappings

Table | DDI 3.2 | da-ra 4.0
Event | Lifecycle Event (Optional and repeatable); Data Collection (Optional and repeatable); Processing Event (Optional and repeatable) | descriptions, geographic coverages, temporal coverages, samplings, collection modes, notes

Fields

Required fields for the event definition are (mandatory in bold):

Field | PANGAEA MetaData | Description | DDI 3.2 Element
Label | md:event/md:label | event acronym or label | Label (Optional and repeatable) in Lifecycle Event and/or
Optional label | md:event/md:optionalLabel | optional event label | Label (Optional and repeatable) in Lifecycle Event and/or
Method | md:event/md:method/md:name | method which was applied in the event. Values from controlled vocabulary | Description (Optional and non-repeatable) in Lifecycle Event and/or
Event location | md:event/md:location/md:name | geographical location of the event (normally of the data collection) |
Event start date | md:event/md:dateTime | ISO format: YYYY-MM-DD |
Event end date | md:event/md:dateTime2 | ISO format: YYYY-MM-DD |
Comment | md:event/md:comment | more information about the event (de, en) | Description (Optional and non-repeatable) in Lifecycle Event and/or Data Collection, or User Attribute Pair (Optional and repeatable) in Processing Event
URI | md:event/md:URI | link to a more detailed description of an event, e.g. on an external web page or in a document | Description (Optional and non-repeatable) in Lifecycle Event and/or Data Collection, or User Attribute Pair (Optional and repeatable) in Processing Event
Event type | md:event/md:attribute[@name='Event type'] | DataCollection, DataProcessing, DataProcessing.InterviewTranscriptions or DataProcessing.DisclosureLimitation | Event Type (Optional and non-repeatable) in Lifecycle Event
Instrument | md:event/md:attribute[@name='Instrument'] | Guidelines.Flex, Guidelines.Flex.Narrative, Guidelines.Flex.General, Guidelines.Tight, Guidelines.Tight.Narrative or Guidelines.Tight.General | Instrument (Optional and repeatable) with Instrument Name and/or Type Of Instrument in Data Collection and/or Description (Optional and non-repeatable) in Lifecycle Event, or User Attribute Pair (Optional and repeatable) in Processing Event
Methodology description | md:event/md:attribute[@name='Methodology'] | description or more information about the methodology (de, en) |
Responsible | md:event/md:attribute[@name='Responsible'] | event responsible |
Collection situation | md:event/md:attribute[@name='Collection situation'] | situation in which the data collection event takes place |
Mode of collection | md:event/md:attribute[@name='Mode of collection'] | mode of collection with controlled values: FaceToFace, Telephone, WebBased or Email |
Sample size | md:event/md:attribute[@name='Sample size'] | note that it can differ from the size of the archived dataset |
Sampling procedure | md:event/md:attribute[@name='Sampling procedure'] | description of the sampling procedure (de, en) |
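Several event fields take values from a fixed list or follow the YYYY-MM-DD date format, so an event record can be checked before it is stored. The following is a minimal sketch of such a check, not part of the Qualiservice tooling: the dictionary representation of an event, the function names, and the rule that the end date must not precede the start date are assumptions. The controlled values are copied from the table.

```python
from datetime import datetime

EVENT_TYPES = {
    "DataCollection",
    "DataProcessing",
    "DataProcessing.InterviewTranscriptions",
    "DataProcessing.DisclosureLimitation",
}
MODES_OF_COLLECTION = {"FaceToFace", "Telephone", "WebBased", "Email"}
INSTRUMENTS = {
    "Guidelines.Flex",
    "Guidelines.Flex.Narrative",
    "Guidelines.Flex.General",
    "Guidelines.Tight",
    "Guidelines.Tight.Narrative",
    "Guidelines.Tight.General",
}
CONTROLLED = (
    ("Event type", EVENT_TYPES),
    ("Mode of collection", MODES_OF_COLLECTION),
    ("Instrument", INSTRUMENTS),
)


def parse_iso_date(value):
    """Parse a YYYY-MM-DD string; raises ValueError for other formats."""
    return datetime.strptime(value, "%Y-%m-%d").date()


def validate_event(event):
    """Return a list of problems found in an event record (empty if none)."""
    problems = []
    for key, allowed in CONTROLLED:
        value = event.get(key)
        if value is not None and value not in allowed:
            problems.append(f"{key}: {value!r} is not a controlled value")
    dates = {}
    for key in ("Event start date", "Event end date"):
        if key in event:
            try:
                dates[key] = parse_iso_date(event[key])
            except ValueError:
                problems.append(f"{key}: {event[key]!r} is not YYYY-MM-DD")
    # Assumed consistency rule, not stated in the table.
    if len(dates) == 2 and dates["Event start date"] > dates["Event end date"]:
        problems.append("Event end date precedes Event start date")
    return problems


# Hypothetical records for illustration.
good = {
    "Event type": "DataCollection",
    "Mode of collection": "FaceToFace",
    "Instrument": "Guidelines.Flex.Narrative",
    "Event start date": "2020-03-01",
    "Event end date": "2020-06-30",
}
bad = {"Event type": "Survey", "Event start date": "01.03.2020"}
print(validate_event(good))
print(validate_event(bad))
```

Missing fields are not reported, because the flattened table no longer shows which fields are mandatory.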

Dataset

A dataset describes a collection of data (from one or several data collection events) whose metadata are organized in a data frame (matrix) and which is mostly put together in a scientific context. The collections of data follow the archival principles of provenance and integrity as far as possible, grouping the data according to the entity that created them or the activity from which they result.

If the data collector or data provider submits the data to Qualiservice in their own logical order, this structure should be maintained to reflect the context and structure in which they were created, used or transferred. Thus, some data collections will be arranged in geographical groups, others in temporal groups, some in methodological groups or in administrative groups (e.g. data from the whole study or from a single round).

The dataset is the central entity of the model and is therefore associated with a persistent identifier (Digital Object Identifier or DOI) for unique identification, citation, and long-term location of the data.

A parent set bundles two or more related datasets for a specific reason, e.g. to make them citable through a single citation. This is the case when the supplement to a publication consists of more than one dataset (for example, for a dissertation), or when the PI defines a number of datasets as one citable entity. The children may be independent (for example, in a time series) or dependent (if the usability / comprehensiveness of the individual datasets is ensured only by supplying all datasets as a package).

Mappings

Table | DDI 3.2 | da-ra 4.0
Data sets | Study Unit (Optional and repeatable); SubGroup (Optional and repeatable) | resource type (free), resource identifier, titles, creators, data URLs, doi proposal, publication date, availability, rights, free keywords, descriptions, universes, temporal coverages, contributors, funding references, notes, relations, publications

Mapping note: the dataset is mapped to the DDI 3.2 element Study Unit instead of Archive, because the latter offers no way to link the archived dataset with the corresponding metadata about its data collection and processing event(s). Nor could the analysis unit and the universe of a given collection of data be associated with its specific dataset. Nevertheless, the archival characteristics (access information, status or classification of the curation and reuse level, completeness...) of a collection of data or dataset published by Qualiservice can be expressed partly via the Collection element in DDI 3.2.

A parent set is expressed as a SubGroup (collection of datasets) to bundle all datasets of a study/round/version together.
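The parent set can be pictured as a small container that enforces the "two or more datasets" rule. This is a hypothetical sketch: the class names, the fields, and the dependent flag are illustrative and do not come from the PANGAEA schema or from DDI.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Dataset:
    """A single collection of data (mapped to a DDI Study Unit)."""
    title: str


@dataclass
class ParentSet:
    """Bundle of related datasets (mapped to a DDI SubGroup)."""
    title: str
    children: List[Dataset]
    # True if the children are only usable when supplied together as a package.
    dependent: bool = False

    def __post_init__(self):
        if len(self.children) < 2:
            raise ValueError("a parent set bundles two or more datasets")


# Hypothetical example: two rounds of one study bundled under one citation.
rounds = ParentSet(
    title="Example study, all rounds",
    children=[Dataset("Round 1 interviews"), Dataset("Round 2 interviews")],
)
print(len(rounds.children))
```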

Fields

Required fields for the dataset definition are (mandatory in bold):

Field | PANGAEA MetaData | Description | DDI 3.2 Element
Author(s) | md:citation/md:author/md:firstName, md:lastName | author(s) of the dataset. More information (ORCID, current affiliation, and contact information) is provided in the Staff table. | Creator (Optional and repeatable) with Creator Name
Affiliations | md:citation/md:author/md:affiliation[1] | affiliation of the author(s) to an organization on whose behalf the author(s) created the dataset | Creator Name attribute Affiliation (Optional and non-repeatable)
Year | md:citation/md:year | year of publication of the dataset | Publication Date (Optional and non-repeatable); optional: Simple Date (Mandatory and non-repeatable) in Lifecycle Event with Event type "DisseminationPackageRelease"
Title | md:citation/md:title | full authoritative dataset title including subtitles | Title (Optional and non-repeatable)
Alternate title | md:citation/md:alternativeTitle | an alternative title, commonly a title in another language | Alternate Title (Optional and repeatable)
Source | md:citation/md:source | hosting institution or archive of the data | Archive Organization Reference (Optional and non-repeatable)
Status | md:technicalInfo/md:entry[@key='status'] | controlled vocabulary for the status of the dataset: questionable, in review, validated, published (DOIs are registered only for "published") | Availability Status (Optional and non-repeatable)
Protection | md:technicalInfo/md:entry[@key='loginOption'] | controlled vocabulary for the default access conditions of the micrometadata: unrestricted, signup required, access rights needed | Access Conditions (Optional and non-repeatable)
End of embargo | md:technicalInfo/md:entry[@key='moratoriumUntil'] | specifies the date and time when the moratorium on datasets with access rights needed is lifted | End Date (Mandatory and non-repeatable) in Embargo (Optional and repeatable) Date
License | md:license/md:label, md:name | license applied to the dataset | dc:rights (Optional and repeatable)
Keywords | md:keywords/md:techKeyword[@type='fromDatabase'] | at the moment only for internal technical use |
Created | md:citation/md:dateTime | date of creation of the metadata record necessary for the dataset publication | Simple Date (Mandatory and non-repeatable) in Lifecycle Event with Event type "DisseminationPackageRelease" or "DisseminationPackageProduction"
Updated | md:technicalInfo/md:entry[@key='lastModified'] | date of update of the metadata record | Simple Date (Mandatory and non-repeatable) in Lifecycle Event with Event type "MetadataEditing"
References | md:reference | references to the study report and other context materials, to related archived (quantitative) data and to publications in which the data was used | Other Material (Optional and repeatable); Item (Optional and repeatable) if part of the archived dataset; and Data Source (Optional and repeatable) to reference the original/source data
Temporal extent (start date) | md:extent/md:temporal/md:minDateTime | start date of data creation/generation. For the researched period, see the temporal coverage of the study or of the single data objects | Start Date (Mandatory and non-repeatable) in Temporal Coverage (Optional and non-repeatable)
Temporal extent (end date) | md:extent/md:temporal/md:maxDateTime | end date of data creation/generation. It does not normally include the data curation and publication preparation by the archive | End Date (Optional and non-repeatable) in Temporal Coverage (Optional and non-repeatable)
Data file quantity | | approximate number of data files in the dataset | Data File Quantity (Optional and non-repeatable)
Comment | md:comment | more information about the dataset (for example about the transcription method) or notes on study or context materials | Note (Optional and repeatable). For study or context materials notes, in Archive (Group or StudyUnit)
Awards | md:award | funder, award number, (sub)title. Value from Award table | Funding Information (Optional and repeatable)
Abstract | md:abstract | short description of the data, normally in German and English or in the original language of the study | Abstract (Optional and non-repeatable)
Analysis unit | md:attribute[@name='Analysis unit'] | values from DDI Controlled Vocabulary Analysis Unit |
Call number | md:attribute[@name='Call number'] | name, code, or number used to uniquely identify the collection within the archive (for physical materials) | Call Number (Optional and non-repeatable)
Kind of data | md:attribute[@name='Kind of data'] | controlled vocabulary: VerbalData, VerbalData.Transcripts, Observations, Documents | Kind Of Data (Optional and repeatable)
Location in archive | md:attribute[@name='Location in archive'] | location of the collection within the archive (physical store) | Location In Archive (Optional and repeatable)
Collection completeness | md:attribute[@name='Collection completeness'] | completeness of the archived collection: note gaps between the original and the archived/published collection of data or more general coverage gaps and/or strengths | Collection Completeness (Optional and non-repeatable)
Study class | md:attribute[@name='Study class'] | classification of the collection of data regarding its reusability. Controlled vocabulary: SecondaryUse.Teaching, SecondaryUse.Research, SecondaryUse.Research&Teaching, SecondaryUse.Saferoom, Archiving.SecondaryUse, Archiving.GRP | Study Class (Optional and non-repeatable)
Sub-universe | md:attribute[@name='Sub-universe'] | description of the researched population in the collection of data, if different from the study universe. It is a subgroup within a more general universe/population | SubUniverse Class (Optional and repeatable)
Access type | md:attribute[@name='Access type'] | controlled vocabulary from da-ra 4.0: Download, Delivery, On-site, Not available, Unknown (if no information is provided) | Access Type Name (Optional and repeatable)
(Access) Restrictions | md:attribute[@name='Restrictions'] | description of access restrictions to the data | Restrictions (Optional and non-repeatable)
Access permission Statement | md:attribute[@name='Statement'] | a statement about data access permissions and usage agreement, most commonly "Upon request by Qualiservice" | Statement (Optional and non-repeatable)
Access permission URI | md:attribute[@name='URI'] | link to the usage agreement form for research or teaching purposes | URI (Optional and non-repeatable) in Access Permission (Optional and repeatable)
Embargo rationale | md:attribute[@name='Rationale'] | embargo rationale | Rationale (Optional and non-repeatable)
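The Status, Protection and End of embargo fields interact: DOIs are registered only for published datasets, and the moratorium applies to datasets that need access rights until the stated date and time. The sketch below expresses these two rules in Python. The function names and the handling of a missing embargo date are assumptions; the controlled values are taken from the table.

```python
from datetime import datetime
from typing import Optional

# The third value is spelled "validatet" in the original table.
STATUSES = ("questionable", "in review", "validated", "published")
PROTECTIONS = ("unrestricted", "signup required", "access rights needed")


def doi_may_be_registered(status: str) -> bool:
    """DOIs are registered only for datasets with status 'published'."""
    if status not in STATUSES:
        raise ValueError(f"unknown status: {status!r}")
    return status == "published"


def moratorium_active(
    protection: str, moratorium_until: Optional[datetime], now: datetime
) -> bool:
    """Return True while a moratorium restricts access to the micrometadata."""
    if protection not in PROTECTIONS:
        raise ValueError(f"unknown protection: {protection!r}")
    if protection != "access rights needed":
        return False
    if moratorium_until is None:
        # Assumption: without an end date the restriction stays in force.
        return True
    return now < moratorium_until


# Hypothetical dates for illustration.
end_of_embargo = datetime(2026, 1, 1, 0, 0)
print(doi_may_be_registered("in review"))
print(moratorium_active("access rights needed", end_of_embargo, datetime(2025, 6, 1)))
print(moratorium_active("access rights needed", end_of_embargo, datetime(2026, 6, 1)))
```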

References

  1. Betancort Cabrera, Noemi and Haake, Elmar (2014): Das Qualiservice Metadatenschema, Version 1.1. Qualiservice Technical Reports, 2014/01. https://nbn-resolving.de/urn:nbn:de:gbv:46-00103643-13