Data Seal of Approval

This page is a copy of the guide for the Data Seal of Approval (DSA) assessment. Pangaea-related answers are added in green after each query.

''This paragraph is designed to help data managers who want to prepare an assessment of their repository to apply for the DSA. It lists each of the DSA’s guidelines (in blue) with suggestions of topics for inclusion and discussion. It is neither prescriptive nor exhaustive. Wherever possible each guideline should be addressed in this assessment by a link to a publicly available statement which relates to the issues noted below each guideline.''

<font color="#0000FF"> 1. The data producer deposits the research data in a data repository qualified according to the DSA guidelines.</font> This guideline simply refers to the DSA status of the repository. The repository will be one of: “not assessed”, “pending assessment”, “assessed” or “assessed, pending re-assessment”. <font color="#1B6C02">2009-04-24: not assessed, form processed</font>

<font color="#0000FF"> 2. The data producer provides the research data in formats recommended by the data repository.</font> This guideline relates to the level of guidance which the repository gives to the data producer before, and at the time of, submission to the repository. The response should concentrate on the contribution of the repository to make this guideline possible for the data producer:
 * Does the repository publish a list of preferred formats? <font color="#1B6C02">The format for numeric and text data is ASCII; for digital objects, ISO standards are preferred as far as available (see format and data submission).</font>
 * Are tools used to check the compliance to official specifications of the formats? <font color="#1B6C02">n.a.</font>
 * What is the repository’s approach towards data that is deposited in non-preferred formats? <font color="#1B6C02">A document describing the format in sufficient detail for future use has to be linked (via a handle) to the metadata.</font>
 * Are Quality Control checks in place at the repository to ensure that data producers adhere to the preferred formats? <font color="#1B6C02">n.a. for ASCII</font>
 * Does the repository ask the depositors to provide detailed information about their file formats and the tools and methods by which the files were created? <font color="#1B6C02">In the case of proprietary formats, documentation of the format is required and is added to the related metadescription.</font>

<font color="#0000FF"> 3. The data producer provides the research data together with the metadata requested by the data repository.</font> This guideline relates to the level of guidance which the repository gives to the data producer before, and at the time of, submission to the repository. The response should concentrate on the contribution of the repository to make this possible for the data producer:
 * Are deposit forms which hold resource discovery metadata used? <font color="#1B6C02">No forms; see Metadata</font>
 * Are there other user friendly ways for users to provide metadata? <font color="#1B6C02">As free text</font>
 * What kind of Quality Control is in place at the repository to check that the data producer adheres to the request for metadata? <font color="#1B6C02">This is part of the workflow project/author > data editor > archive.</font>
 * Are there tools to create metadata at the level of files? <font color="#1B6C02">Metadata are administered together with the data in a relational database, using the proprietary publishing/editing system 4D. Metadata are related to data by the editor during the import procedure.</font>
 * Are metadata elements derived from established metadata standards, registries or conventions? If so list them, and show the level of adherence to those standards. <font color="#1B6C02">The ISO standard 19115 (Geographic Information - Metadata) is used. Metadata are also provided in Dublin Core, Darwin Core and DIF; see metadata.</font>
 * Are these metadata items relevant for the data consumers? <font color="#1B6C02">Yes</font>
 * What is the repository’s approach if the metadata provided is insufficient for long term archiving? <font color="#1B6C02">This is a matter of defining “insufficient”. As long as the data can be used in some way, they will be archived with a note on which metadata are missing and why. If, for example, the georeference is missing for data from a geological sample, the data are useless and will not be archived.</font>

<font color="#0000FF"> 4. The data repository has an explicit mission in the area of digital archiving and promulgates it.</font> This guideline relates to the level of authority which the repository has.
 * Does the repository have a Mission Statement? Does it clearly reference a commissioning authority? <font color="#1B6C02">Pangaea mission statement</font>
 * Does the repository have a document which outlines the way in which the mission statement is implemented? <font color="#1B6C02">see below</font>
 * Does the repository carry out promotional activities? <font color="#1B6C02">Yes, since 1994 at international level (list of presentations and papers)</font>
 * What level of succession planning has taken place in the event of the repository ceasing to exist? <font color="#1B6C02">Transfer of metadata/data to another certified data repository or to a library (TIB or DNB)</font>

<font color="#0000FF"> 5. The data repository uses due diligence to ensure compliance with legal regulations and contracts.</font> This guideline relates to the legal regulations which impact on the repository.
 * What is the legal position of the repository? <font color="#1B6C02">The repository is operated by a research center of the Helmholtz Association. The operating institute, AWI, is a public-law foundation.</font>
 * Does the repository use model contract(s) with data producers? <font color="#1B6C02">Data archiving is based on a contract if the data producer is a funded project.</font>
 * Does the repository use model contract(s) with data consumers? <font color="#1B6C02">No; Pangaea is an Open Access repository</font>
 * Are the repository’s conditions of use published? <font color="#1B6C02">No</font>
 * Are there measures in place if the conditions are not complied with? <font color="#1B6C02">No</font>
 * How does the repository ensure knowledge of and compliance with national and international laws? <font color="#1B6C02">n.a.</font>

<font color="#0000FF"> 6. The data repository applies documented processes and procedures for managing data storage.</font> This guideline relates to the ability of the repository to manage data.
 * Does the repository have a preservation policy? <font color="#1B6C02">No</font>
 * What is the repository’s strategy towards backup / multiple copies? <font color="#1B6C02">Daily incremental backup and weekly full backup in two tape drive archives, mirrored in two different buildings 2 km apart.</font>
 * What form of data recovery provisions are in place? <font color="#1B6C02">Recovery from backup</font>
 * Are Risk Management techniques used to inform the strategy? <font color="#1B6C02">?</font>
 * What levels of security are acceptable for the repository? <font color="#1B6C02">?</font>
 * Are there checks on the consistency of the archive? <font color="#1B6C02">Yes, once a year</font>
 * How is deterioration of storage media handled and monitored? <font color="#1B6C02">By continuous migration to new storage media.</font>

<font color="#0000FF"> 7. The data repository has a plan for long-term preservation of its digital assets.</font> This guideline relates to the ability of the repository to provide continued access to data.
 * What provisions are in place to take into account the future obsolescence of file formats? <font color="#1B6C02">see answers about formats above</font>
 * What provisions are in place to ensure long term data usability? <font color="#1B6C02">see answers about formats above</font>

<font color="#0000FF"> 8. Archiving takes place according to explicit workflows across the data life cycle.</font> This guideline relates to the levels of procedural documentation for the repository.
 * Does the repository have procedural documentation for archiving data?
 * If so, provide references to:
 * Workflows: <font color="#1B6C02">see this Wiki</font>
 * Decision-making process for archival data transformations: <font color="#1B6C02">?</font>
 * Skills of employees: <font color="#1B6C02">The technology is operated by an experienced computer center at AWI; data archiving is carried out by editors</font>
 * Types of data within the repository: <font color="#1B6C02">Observational, experimental and model/simulation data from basic research on the earth system; see [parameter dictionary]</font>
 * Selection process: <font color="#1B6C02">? no selection required</font>
 * Approach towards data that does not fall within the mission: <font color="#1B6C02">? contact TIB to find an appropriate repository</font>
 * Guarding privacy of subjects, etc.: <font color="#1B6C02">? n.a.</font>
 * Clarity to data producers about handling of the data: <font color="#1B6C02">defined through the license</font>

<font color="#0000FF"> 9. The data repository assumes responsibility from the data producers for access and availability of the digital objects.</font> This guideline relates to the levels of responsibility which the repository takes for its data.
 * What licences / contractual agreements does the repository have with data producers? <font color="#1B6C02">see http://wiki.pangaea.de/wiki/PANGAEA#License</font>
 * How does the repository enforce licences with the data producer? <font color="#1B6C02">see above</font>
 * Please describe your crisis management. <font color="#1B6C02">?</font>

<font color="#0000FF"> 10. The data repository enables the users to utilize the research data and refer to them.</font> This guideline relates to the formats in which the repository provides its data.
 * In what form are data provided to end users? (E.g., are data provided in formats used by the research community?)
 * How do potential users find data? What search facilities are offered?
 * Is OAI harvesting permissible? <font color="#1B6C02">Yes; e.g. search OAIster in datasets for the word “water”.</font>
 * Is deep searching possible? <font color="#1B6C02">Yes, via the search engine/data warehouse (login required) and by using the PANGAEA XML schema.</font>
 * Does the repository offer Persistent Identifiers? <font color="#1B6C02">DOI for data sets; DOI, urn and sref for referencing external publications; handles for linking to external documents, which are used to extend the metadescription of data sets. Documents are stored in a publication repository, see ePIC.</font>
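
As an illustration of the OAI harvesting mentioned in the answers to guideline 10, the sketch below builds an OAI-PMH `ListRecords` request URL. The base URL `https://repository.example.org/oai` and the helper name `build_list_records_url` are assumptions made for this example and do not describe the actual Pangaea endpoint; `verb`, `metadataPrefix`, `set` and the `oai_dc` prefix are standard OAI-PMH request elements.

```python
from urllib.parse import urlencode


def build_list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Return an OAI-PMH ListRecords request URL for a harvester.

    base_url is a hypothetical endpoint; replace it with the real one.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        # Optional selective harvesting by set.
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Placeholder endpoint, used only to show the shape of the request.
print(build_list_records_url("https://repository.example.org/oai"))
```

A harvester would fetch this URL and follow any `resumptionToken` returned in the response to page through the remaining records.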

<font color="#0000FF"> 11. The data repository ensures the integrity of the digital objects and the metadata.</font> This guideline relates to the information contained in the digital objects and metadata and whether it is complete, whether all changes are logged and whether intermediate versions are present in the archive.
 * Does the repository utilise checksums? What type? How are they monitored? <font color="#1B6C02">planned</font>
 * How is the availability of data monitored? <font color="#1B6C02">RSS feed and links with dynamic queries on individual parts of the inventory.</font>
 * Does the repository deal with multiple versions of the data? If so, how? <font color="#1B6C02">Defining what constitutes a version is the responsibility of the scientific field. As part of the metadata, a link to older/newer versions can be given so that the user can clearly identify the version history.</font>

<font color="#0000FF"> 12. The data repository ensures the authenticity of the digital objects and the metadata.</font> This guideline refers to the relationship between the original data and that disseminated, and whether or not existing relationships between datasets and/or metadata are maintained.
 * What is the repository’s strategy for changes? Are data producers made aware of this strategy? <font color="#1B6C02">In principle, no changes are allowed once data are published.</font>
 * How is versioning handled? <font color="#1B6C02">see versioning</font>
 * Does the repository maintain provenance data and related audit trails? <font color="#1B6C02">Provenance: if applicable (as source); audit trails: informal during the archiving procedure</font>
 * Does the repository maintain links to metadata and other datasets, and if so how? <font color="#1B6C02">Any metadata field can be linked to an external source for further information. When linking to external documents, persistent identifiers are preferred. Other related datasets are linked as other versions (if applicable).</font>
 * How are the essential properties of different versions of the same file compared? <font color="#1B6C02">What counts as a new version is decided by the provider, not the repository.</font>
 * Does the repository check the identities of depositors? <font color="#1B6C02">Yes, see authenticity</font>

<font color="#0000FF"> 13. The technical infrastructure explicitly supports the tasks and functions described in internationally accepted archival standards like OAIS.</font> This guideline refers to the level of conformance with accepted standards.
 * What standards does the repository use for reference? <font color="#1B6C02">?</font>
 * How is the standard implemented, and if there are significant deviations from the standard why is that the case? <font color="#1B6C02">?</font>
 * Does the repository have a plan for infrastructural development? <font color="#1B6C02">?</font>

<font color="#0000FF"> 14. The data consumer complies with access regulations set by the data repository.</font> This guideline refers to the contribution of the repository in creating legal access agreements which relate to relevant national (and international) legislation and the levels to which the repository informs the data consumer about the access conditions of the repository.
 * Does the repository use End User Licence(s) with data consumers? <font color="#1B6C02">No</font>
 * Are there any particular special requirements which the repository’s holdings require? <font color="#1B6C02">No</font>
 * Are contracts provided to grant access to restricted-use (confidential) data? <font color="#1B6C02">No, the repository does not hold confidential data</font>
 * Does the repository make use of special licences, e.g., Creative Commons? <font color="#1B6C02">Yes, see license</font>
 * Are there measures in place if the conditions are not complied with? <font color="#1B6C02">?</font>

<font color="#0000FF"> 15. The data consumer conforms to and agrees with any codes of conduct that are generally accepted in higher education and scientific research for the exchange and proper use of knowledge and information.</font> This guideline refers to the contribution of the repository to inform data users about any relevant codes of conduct.
 * Does the repository need to deal with any relevant codes of conduct? <font color="#1B6C02">?</font>
 * What are the terms of use to which data consumers agree? <font color="#1B6C02">?</font>
 * Are institutional bodies involved? <font color="#1B6C02">AWI & MARUM</font>
 * Are there measures in place if these codes are not complied with? <font color="#1B6C02">?</font>

<font color="#0000FF"> 16. The data consumer respects the applicable licences of the data repository regarding the use of the research data.</font> This guideline refers to the contribution of the repository to inform data users regarding the applicable licences.
 * Are there relevant licences in place? <font color="#1B6C02">see license</font>
 * Are there measures in place if these codes are not complied with? <font color="#1B6C02">?</font>

Link

 * Data Seal of Approval (DSA)