Qualiservice Data Model
This page describes the Qualiservice metadata model.
It was created as an extension of the PANGAEA data model, so all structures and names are taken from PANGAEA. The two models have similar applications but differ in the type of data described and in how the data is accessed: whereas PANGAEA presents the data and its metadata openly in its portal (https://www.pangaea.de/), Qualiservice uses the data model only to represent and describe the data (metadata), not to exchange or display the data itself.
For this reason, this page speaks of a metadata model for Qualiservice.
The PANGAEA Data Model runs on a relational database (PostgreSQL) and is expressed more technically as an XML Schema at https://ws.pangaea.de/schemas/pangaea/MetaData.xsd
Main tables
Note: Where Qualiservice uses a different name for a table or module, it is shown after the original name in the form Original name|Qualiservice name.
The metadata model consists of four main modules (Project, Campaign|Study, Event, Dataset|Collection of Data) and supporting tables with supplemental information. The data object metadata (metadata about single interviews or cases) are organized in Data Series|Micrometadata (this micrometadata was called Interview-Metadata in the first phase of the Qualiservice project, see Betancort & Haake, 2014[1]).
Because the PANGAEA data model is generic, Qualiservice could reuse it, which increases the interoperability and findability of the data collections shared and archived by Qualiservice:
The hierarchy of the four main tables follows the steps in science for gathering analytical data: within a PROJECT, different CAMPAIGNs are executed to get samples for investigations or to make measurements at distinct locations (EVENT). The results of the investigations are analytical data, organized in Data Series, grouped in DATASETS.
In the case of Qualiservice, this last point about Data Series and Datasets refers only to MICROMETADATA grouped in COLLECTIONS OF DATA.
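The hierarchy described above can be pictured with a few Python dataclasses. This is only a minimal sketch and not the PANGAEA or Qualiservice implementation: the class and attribute names are hypothetical, and attaching collections to the study while linking them to events by label is an assumption made here because one collection may draw on several events.

```python
from dataclasses import dataclass, field


@dataclass
class CollectionOfData:
    """PANGAEA Dataset; groups Micrometadata (PANGAEA Data Series)."""
    title: str
    event_labels: list[str] = field(default_factory=list)   # events the data came from
    micrometadata: list[dict] = field(default_factory=list)  # one dict per interview or case


@dataclass
class Event:
    """A data collection or processing step within a study."""
    label: str


@dataclass
class Study:
    """PANGAEA Campaign."""
    acronym: str
    events: list[Event] = field(default_factory=list)
    collections: list[CollectionOfData] = field(default_factory=list)


@dataclass
class Project:
    """Uppermost level of the model."""
    acronym: str
    studies: list[Study] = field(default_factory=list)


# Hypothetical example values, invented for illustration.
study = Study(acronym="DEMO-1")
study.events.append(Event(label="interviews-2020"))
study.collections.append(
    CollectionOfData(
        title="Demo interviews",
        event_labels=["interviews-2020"],
        micrometadata=[{"case": "I01"}, {"case": "I02"}],
    )
)
project = Project(acronym="DEMO", studies=[study])
```

Walking from `project` down to `micrometadata` follows the same path as the four main tables: Project, Campaign|Study, Event, Dataset|Collection of Data.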
Project
The PROJECT table is the uppermost level in the data model, used to define big research projects like Collaborative Research Centres or Clusters of Excellence.
Details of the project framework and its funding are included.
Mappings
Table | DDI 3.2 | da-ra 4.0 |
---|---|---|
Project | Series Statement (Optional and repeatable) | Collective titles |
Award | Funding Information (Optional and repeatable) | Funding references |
Fields
Required fields for the project definition are (mandatory in bold):
Field | PANGAEA MetaData | Description | DDI 3.2 Element |
---|---|---|---|
Acronym | md:project/md:label | project acronym | Series Abbreviation (Optional and repeatable) |
Name | md:project/md:name | full project title | Series Name (Optional and repeatable) |
Type | md:project/@type | Institute, DFG, BMBF, National institution, EU, National, International | Series Description (Optional and non-repeatable) |
Coordinator | | project responsible or coordinator. Value from Staff table | Series Description (Optional and non-repeatable) |
Institute | md:project/md:institution | place of coordination or project office. Value from Institution table | Series Description (Optional and non-repeatable) |
URI | md:project/md:URI | link to homepage of the project | Series Description (Optional and non-repeatable) |
URI for data | | link to the data repository of the project | Series Repository Location (Optional and repeatable) |
Comment | | more information about the project: comments (de, en), other names (de, en), other project type | Series Description (Optional and non-repeatable) |
Awards | md:award | Funder, Award number, (Sub)Title. Value from Award table | Funding Information (Optional and repeatable) |
Campaign|Study
This module includes the study metadata.
Mappings
Table | DDI 3.2 | da-ra 4.0 |
---|---|---|
Campaign | Group (optional and repeatable) | collective titles, resource language, classifications, keywords, descriptions, geographic coverages, temporal coverages, universes, time dimensions, notes |
Fields
Required fields for the campaign definition are (mandatory in bold):
Field | PANGAEA MetaData | Description | DDI 3.2 Element |
---|---|---|---|
Acronym | md:event/md:campaign/md:name | official study acronym | Alternate Title (Optional and repeatable) |
Title | md:event/md:campaign/md:optionalName | full authoritative study title | Title (Optional and non-repeatable) |
Begin | md:event/md:campaign/md:start | study funding start date | Start Date (Mandatory and non-repeatable) in Lifecycle Event (and if applicable Temporal Coverage) |
End | md:event/md:campaign/md:end | study funding end date | End Date (Optional and non-repeatable) in Lifecycle Event (and if applicable Temporal Coverage) |
Study responsible(s) | md:event/md:campaign/md:chiefScientist | name of the person(s) being responsible for the study | Creator (Optional and repeatable) with Creator Name |
URI | md:event/md:campaign/md:URI | link to an (official) study web page | Note (Optional and repeatable) |
Study report | md:reference[contains(@comment, 'Studienreport')] | citation of the study report | Other Material (Optional and repeatable) |
Comment | | more information about the study: comments (de, en), other titles (de, en), funding organization/number (for studies without associated project) | Note (Optional and repeatable). For titles use above element Title. For alternate titles use the element Alternate Title. For funding, see above Funding Information (under Project). |
Study abstract | md:event/md:campaign/md:attribute[@name='Study abstract'] | abstract of the study unit describing the nature and scope of it | Abstract (Optional and non-repeatable) |
Keyword | md:event/md:campaign/md:attribute[@name='Keyword'] | keywords describing the topics covered by the study. | Keyword (Optional and repeatable) |
Subject | md:event/md:campaign/md:attribute[@name='Subject'] | subject or discipline of the study. Values from CESSDA Topic Classification and/or DFG-Fachsystematik | Subject (Optional and repeatable) |
Type of data | md:event/md:campaign/md:attribute[@name='Kind of data type'] | general type of data (quantitative, qualitative, mixed) | Kind Of Data Type (Optional and non-repeatable) |
Language | md:event/md:campaign/md:attribute[@name='Language'] | study language as ISO 639-3 code (deu, eng, spa...) | Language (Optional and repeatable) |
Time method | md:event/md:campaign/md:attribute[@name='Time method'] | values from DDI Controlled Vocabulary Time Method | TimeMethod (Optional and repeatable) |
Universe | md:event/md:campaign/md:attribute[@name='Universe'] | description of the researched population of the study | Universe (Optional and repeatable) |
Location | md:event/md:campaign/md:attribute[@name='Location'] | geographic coverage of the study as ISO 3166 code | Spatial Coverage (Optional and non-repeatable) with Description and/or CountryCode |
Coverage start date | md:event/md:campaign/md:attribute[@name='Start date'] | start date of the temporal coverage of the study | Start Date (Mandatory and non-repeatable) in Temporal Coverage (Optional and non-repeatable) |
Coverage end date | md:event/md:campaign/md:attribute[@name='End date'] | end date of the temporal coverage of the study | End Date (Optional and non-repeatable) in Temporal Coverage (Optional and non-repeatable) |
Period subject | md:event/md:campaign/md:attribute[@name='Period subject'] | time period covered by the study | Subject (Optional and repeatable) in Temporal Coverage (Optional and non-repeatable) |
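The paths in the PANGAEA MetaData column can be read as XPath-like expressions. The following sketch shows how such paths might be evaluated with Python's standard `xml.etree.ElementTree`. The sample record is invented for illustration, the namespace URI bound to `md:` is an assumption that should be checked against MetaData.xsd, and storing the value as element text is likewise an assumption.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the md: prefix; verify against MetaData.xsd.
NS = {"md": "http://www.pangaea.de/MetaData"}

# Hypothetical record, invented for illustration only.
SAMPLE = """\
<md:MetaData xmlns:md="http://www.pangaea.de/MetaData">
  <md:event>
    <md:campaign>
      <md:name>DEMO-1</md:name>
      <md:optionalName>Demo study on working life</md:optionalName>
      <md:attribute name="Keyword">work</md:attribute>
      <md:attribute name="Keyword">family</md:attribute>
      <md:attribute name="Language">deu</md:attribute>
    </md:campaign>
  </md:event>
</md:MetaData>
"""


def campaign_attribute(root, name):
    """Return the text of every campaign attribute with the given name."""
    path = f"md:event/md:campaign/md:attribute[@name='{name}']"
    return [element.text for element in root.findall(path, NS)]


root = ET.fromstring(SAMPLE)
acronym = root.findtext("md:event/md:campaign/md:name", namespaces=NS)
keywords = campaign_attribute(root, "Keyword")    # repeatable field
languages = campaign_attribute(root, "Language")  # ISO 639-3 codes
```

Repeatable fields such as Keyword come back as a list with one entry per `md:attribute` element, which matches the "repeatable" cardinality given for the DDI 3.2 element.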
Event
This table includes information about the events in which the data was collected, transformed (transcription, anonymization) or analyzed.
Mappings
Table | DDI 3.2 | da-ra 4.0 |
---|---|---|
Event | Lifecycle Event (Optional and repeatable), Data Collection (Optional and repeatable), Processing Event (Optional and repeatable) | descriptions, geographic coverages, temporal coverages, samplings, collection modes, notes |
Fields
Required fields for the event definition are (mandatory in bold):
Field | PANGAEA MetaData | Description | DDI 3.2 Element |
---|---|---|---|
Label | md:event/md:label | event acronym or label | |
Optional label | md:event/md:optionalLabel | optional event label | |
Method | md:event/md:method/md:name | method which was applied in the event. Values from controlled vocabulary | |
Event location | md:event/md:location/md:name | geographical location of the event (normally of the data collection) | |
Event start date | md:event/md:dateTime | ISO format: YYYY-MM-DD | |
Event end date | md:event/md:dateTime2 | ISO format: YYYY-MM-DD | |
Comment | md:event/md:comment | more information about the event (de, en) | |
URI | md:event/md:URI | link to a more detailed description of an event, e.g. on an external web page or in a document | |
Event type | md:event/md:attribute[@name='Event type'] | DataCollection, DataProcessing, DataProcessing.InterviewTranscriptions or DataProcessing.DisclosureLimitation | |
Instrument | md:event/md:attribute[@name='Instrument'] | Guidelines.Flex, Guidelines.Flex.Narrative, Guidelines.Flex.General, Guidelines.Tight, Guidelines.Tight.Narrative or Guidelines.Tight.General | |
Methodology description | md:event/md:attribute[@name='Methodology'] | description or more information about the methodology (de, en) | |
Responsible | md:event/md:attribute[@name='Responsible'] | event responsible | |
Collection situation | md:event/md:attribute[@name='Collection situation'] | situation in which the data collection event takes place | |
Mode of collection | md:event/md:attribute[@name='Mode of collection'] | mode of collection with controlled values: FaceToFace, Telephone, WebBased or Email | |
Sample size | md:event/md:attribute[@name='Sample size'] | note that it can differ from the size of the archived dataset | |
Sampling procedure | md:event/md:attribute[@name='Sampling procedure'] | description of the sampling procedure (de, en) | |
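Several event fields are constrained by a controlled vocabulary or a date format. A minimal sketch of how such constraints might be checked is given below. The function names and the dict-based record are hypothetical, every field is treated as optional because this page does not show which ones are mandatory, and the rule that the start date must not be later than the end date is an assumption of this sketch.

```python
import re
from datetime import date

# Controlled values as listed in the Fields table.
EVENT_TYPES = {
    "DataCollection",
    "DataProcessing",
    "DataProcessing.InterviewTranscriptions",
    "DataProcessing.DisclosureLimitation",
}
MODES_OF_COLLECTION = {"FaceToFace", "Telephone", "WebBased", "Email"}
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def parse_iso_date(value):
    """Parse a strict YYYY-MM-DD string; raise ValueError otherwise."""
    if not ISO_DATE.fullmatch(value):
        raise ValueError(f"not in YYYY-MM-DD form: {value!r}")
    return date.fromisoformat(value)


def check_event(event):
    """Return a list of problems found in a dict keyed by field name."""
    problems = []
    event_type = event.get("Event type")
    if event_type is not None and event_type not in EVENT_TYPES:
        problems.append(f"unknown Event type: {event_type!r}")
    mode = event.get("Mode of collection")
    if mode is not None and mode not in MODES_OF_COLLECTION:
        problems.append(f"unknown Mode of collection: {mode!r}")
    dates = {}
    for key in ("Event start date", "Event end date"):
        raw = event.get(key)
        if raw is None:
            continue
        try:
            dates[key] = parse_iso_date(raw)
        except ValueError:
            problems.append(f"{key} is not a valid YYYY-MM-DD date: {raw!r}")
    # Assumption: an event cannot end before it starts.
    if len(dates) == 2 and dates["Event start date"] > dates["Event end date"]:
        problems.append("Event start date is after Event end date")
    return problems
```

An empty list means that no problem was found; for example, `check_event({"Event type": "DataCollection", "Event start date": "2020-03-01"})` returns an empty list.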
Dataset
A dataset describes a collection of data (from one or several data collection events) whose metadata are organized in a data frame (matrix) and which is mostly put together in a scientific context. The collections of data follow the archival principles of provenance and integrity as far as possible: data are grouped according to the entity that created them or the activity from which they result.
If the data collector or data provider submits the data to Qualiservice in their own logical order, this structure should be maintained to reflect the context and structure in which they were created, used or transferred. Thus, some data collections will be arranged in geographical groups, others in temporal groups, some in methodological groups or in administrative groups (e.g. data from the whole study or from a single round).
The dataset is the central entity of the model and is therefore associated with a persistent identifier (Digital Object Identifier, DOI) for unique identification, citation, and long-term location of the data.
A parent set bundles two or more related datasets for a specific reason, e.g. to make them citable through a single citation. This applies when the supplement to a publication consists of more than one set (for example in the case of a dissertation), or when the PI defines a number of datasets as a citable entity. Children can be independent (for example in the case of time series) or dependent (if the usability or comprehensiveness of the individual datasets is only ensured by supplying all datasets as a package).
Mappings
Table | DDI 3.2 | da-ra 4.0 |
---|---|---|
Data sets | Study Unit (Optional and repeatable), SubGroup (Optional and repeatable) | resource type (free), resource identifier, titles, creators, data URLs, doi proposal, publication date, availability, rights, free keywords, descriptions, universes, temporal coverages, contributors, funding references, notes, relations, publications |
Mapping note: the dataset is mapped to the Study Unit element of DDI 3.2 instead of Archive, because Archive does not allow linking the archived dataset with its corresponding metadata about data collection and processing event(s). In addition, the analysis unit and the universe of a particular collection of data could not be associated with its specific dataset. Nevertheless, the archival characteristics (access information, status or classification of the curation and reuse level, completeness...) of a collection of data or dataset published by Qualiservice can be expressed partly via the Collection element in DDI 3.2.
A parent set is expressed as a SubGroup (collection of datasets) to bundle all datasets of a study/round/version together.
Fields
Required fields for the dataset definition are (mandatory in bold):
Field | PANGAEA MetaData | Description | DDI 3.2 Element |
---|---|---|---|
Author(s) | md:citation/md:author/md:firstName, md:lastName | author(s) of the dataset. More information (ORCID, current affiliation, and contact information) is provided in the Staff table. | Creator (Optional and repeatable) with Creator Name |
Affiliations | md:citation/md:author/md:affiliation[1] | affiliation of the author(s) to an organization, on whose behalf the author(s) created the dataset | Creator Name attribute Affiliation (Optional and non-repeatable) |
Year | md:citation/md:year | year of publication of the dataset | Publication Date (Optional and non-repeatable), Simple Date (Mandatory and non-repeatable) in Lifecycle Event with Event type "DisseminationPackageRelease" |
Title | md:citation/md:title | full authoritative dataset title including subtitles | Title (Optional and non-repeatable) |
Alternate title | md:citation/md:alternativeTitle | an alternative title, commonly a title in other language | Alternate Title (Optional and repeatable) |
Source | md:citation/md:source | hosting institution or archive of the data | Archive Organization Reference (Optional and non-repeatable) |
Status | md:technicalInfo/md:entry[@key='status'] | controlled vocabulary for the status of the dataset: questionable, in review, validated, published (DOIs are registered only for "published") | Availability Status (Optional and non-repeatable) |
Protection | md:technicalInfo/md:entry[@key='loginOption'] | controlled vocabulary for the default access conditions of the micrometadata: unrestricted, signup required, access rights needed | Access Conditions (Optional and non-repeatable) |
End of embargo | md:technicalInfo/md:entry[@key='moratoriumUntil'] | specifies date and time when the moratorium on data sets with access rights needed is lifted | End Date (Mandatory and non-repeatable) in Embargo (Optional and repeatable) Date |
License | md:license/md:label, md:name | license applied to the dataset | dc:rights (Optional and repeatable) |
Keywords | md:keywords/md:techKeyword[@type='fromDatabase'] | at the moment only for internal technical use | |
Created | md:citation/md:dateTime | date of creation of the metadata record necessary for the dataset publication | Simple Date (Mandatory and non-repeatable) in Lifecycle Event with Event type "DisseminationPackageRelease" or "DisseminationPackageProduction" |
Updated | md:technicalInfo/md:entry[@key='lastModified'] | date of update of the metadata record | Simple Date (Mandatory and non-repeatable) in Lifecycle Event with Event type "MetadataEditing" |
References | md:reference | references to the study report and other context materials, to related archived (quantitative) data and to publications where the data was used | Other Material (Optional and repeatable), Item (Optional and repeatable) if part of the archived dataset and Data Source (Optional and repeatable) to reference the original/source data |
Temporal extent (start date) | md:extent/md:temporal/md:minDateTime | start date of data creation/generation | Start Date (Mandatory and non-repeatable) in Temporal Coverage (Optional and non-repeatable) |
Temporal extent (end date) | md:extent/md:temporal/md:maxDateTime | end date of data creation/generation | End Date (Optional and non-repeatable) in Temporal Coverage (Optional and non-repeatable) |
Data file quantity | | approximate number of data files in the dataset | Data File Quantity (Optional and non-repeatable) |
Comment | md:comment | more information about the dataset (for example about transcription method) or study or context materials notes | Note (Optional and repeatable) |
Awards | md:award | funder, award number, (sub)title. Value from Award table | Funding Information (Optional and repeatable) |
Abstract | md:abstract | short description of the data, normally in German and English or in the original language of the study | Abstract (Optional and non-repeatable) |
Analysis unit | md:attribute[@name='Analysis unit'] | values from DDI Controlled Vocabulary Analysis Unit | |
Call number | md:attribute[@name='Call number'] | name, code, or number used to uniquely identify the collection within the archive (for physical materials) | Call Number (Optional and non-repeatable) |
Kind of data | md:attribute[@name='Kind of data'] | controlled vocabulary: VerbalData, VerbalData.Transcripts, Observations, Documents | Kind Of Data (Optional and repeatable) |
Location in archive | md:attribute[@name='Location in archive'] | location of the collection within the archive (physical store) | Location In Archive (Optional and repeatable) |
Collection completeness | md:attribute[@name='Collection completeness'] | completeness of the archived collection: note gaps between the original and the archived/published collection of data or more general coverage gaps and/or strengths | Collection Completeness (Optional and non-repeatable) |
Study class | md:attribute[@name='Study class'] | classification of the collection of data regarding its reusability. Controlled vocabulary: SecondaryUse.Teaching, SecondaryUse.Research, SecondaryUse.Research&Teaching, SecondaryUse.Saferoom, Archiving.SecondaryUse, Archiving.GRP | Study Class (Optional and non-repeatable) |
Sub-universe | md:attribute[@name='Sub-universe'] | description of the researched population in the collection of data, if different from the study universe. It is a subgroup within a more general universe/population | SubUniverse Class (Optional and repeatable) |
Access type | md:attribute[@name='Access type'] | controlled vocabulary from da-ra 4.0: Download, Delivery, On-site, Not available, Unknown (if no information is provided) | Access Type Name (Optional and repeatable) |
(Access) Restrictions | md:attribute[@name='Restrictions'] | description of access restrictions to the data | Restrictions (Optional and non-repeatable) |
Access permission Statement | md:attribute[@name='Statement'] | a statement about data access permissions and usage agreement, most commonly "Upon request by Qualiservice" | Statement (Optional and non-repeatable) |
Access permission URI | md:attribute[@name='URI'] | link to the usage agreement form for research or teaching purposes | URI (Optional and non-repeatable) in Access Permission (Optional and repeatable) |
Embargo rationale | md:attribute[@name='Rationale'] | embargo rationale | Rationale (Optional and non-repeatable) |
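The Status, Protection and End of embargo fields interact: DOIs are registered only for published datasets, and the moratorium applies to datasets whose protection is "access rights needed". The sketch below expresses these two rules as small Python functions. The function names are hypothetical, and treating a dataset without an embargo date as not embargoed is an assumption of this sketch rather than documented Qualiservice behaviour.

```python
from datetime import datetime, timezone

# Controlled values as listed in the Fields table.
STATUSES = ("questionable", "in review", "validated", "published")
PROTECTIONS = ("unrestricted", "signup required", "access rights needed")


def may_register_doi(status):
    """DOIs are registered only for datasets with status 'published'."""
    if status not in STATUSES:
        raise ValueError(f"unknown status: {status!r}")
    return status == "published"


def is_under_embargo(protection, moratorium_until, now):
    """Return True while a protected dataset is still before its embargo end.

    moratorium_until and now are datetime objects (or None for no embargo).
    """
    if protection not in PROTECTIONS:
        raise ValueError(f"unknown protection: {protection!r}")
    if protection != "access rights needed" or moratorium_until is None:
        return False  # assumption: no embargo without both conditions
    return now < moratorium_until


# Hypothetical example values, invented for illustration.
embargo_end = datetime(2030, 1, 1, tzinfo=timezone.utc)
check_time = datetime(2025, 6, 1, tzinfo=timezone.utc)
embargoed = is_under_embargo("access rights needed", embargo_end, check_time)
```

Passing the current time as an argument, instead of reading the clock inside the function, keeps the check easy to test.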
References
- Betancort Cabrera, Noemi and Haake, Elmar (2014): Das Qualiservice Metadatenschema, Version 1.1. Qualiservice Technical Reports, 2014/01. https://nbn-resolving.de/urn:nbn:de:gbv:46-00103643-13