Data Seal of Approval

From PangaWiki

This page is a copy of the guide for the Data Seal of Approval (DSA) assessment. Pangaea-related answers are added after each question in green.

This paragraph is designed to help data managers who want to prepare an assessment of their repository to apply for the DSA. It lists each of the DSA’s guidelines (in blue) with suggestions of topics for inclusion and discussion. It is neither prescriptive nor exhaustive. Wherever possible, each guideline should be addressed in this assessment by a link to a publicly available statement that relates to the issues noted below each guideline.



1. The data producer deposits the research data in a data repository qualified according to the DSA guidelines
This guideline simply refers to the DSA Status of the repository. The repository will either be: “not assessed”, “pending assessment”, “assessed” or “assessed, pending re-assessment”.
-> 2009-04-24 not assessed, form processed


2. The data producer provides the research data in formats recommended by the data repository
This guideline relates to the level of guidance which the repository gives to the data producer before, and at the time of submission to the repository. The response should concentrate on the contribution of the repository to make this guideline possible for the data producer:

  • Does the repository publish a list of preferred formats?

The format for numeric and text data is ASCII; for digital objects, ISO standards are preferred where available (see format and data submission).

  • Are tools used to check the compliance to official specifications of the formats?

n.a.

  • What is the repository’s approach towards data that is deposited in non-preferred formats?

A document describing the format in sufficient detail for future use has to be linked to the metadata (via a handle).

  • Are Quality Control checks in place at the repository to ensure that data producers adhere to the preferred formats?

n.a. for ASCII

  • Does the repository ask the depositors to provide detailed information about their file formats and the tools and methods by which the files were created?

For proprietary formats, documentation of the format is required and is added to the related metadescription.


3. The data producer provides the research data together with the metadata requested by the data repository
This guideline relates to the level of guidance which the repository gives to the data producer before, and at the time of submission to the repository. The response should concentrate on the contribution of the repository to make this possible for the data producer:

  • Are deposit forms which hold resource discovery metadata used?

No forms; see Metadata

  • Are there other user friendly ways for users to provide metadata?

as free text

  • What kind of Quality Control is in place at the repository to check that the data producer adheres to the request for metadata?

This is part of the workflow project/author > data curator > archive.

  • Are there tools to create metadata at the level of files?

Metadata are administered together with the data in a relational database with a proprietary publishing/editing system (4D). The curator relates metadata to data during the import procedure.

  • Are metadata elements derived from established metadata standards, registries or conventions? If so list them, and show the level of adherence to those standards.

The ISO standard 19115 (Geographic Information - Metadata) is used. Metadata are also provided in Dublin Core, Darwin Core and DIF, see metadata.

  • Are these metadata items relevant for the data consumers?

Yes

  • What is the repository’s approach if the metadata provided is insufficient for long term archiving?

This depends on how "insufficient" is defined. As long as the data can be used in some way, they will be archived with a note stating which metadata are missing and why. If the data cannot be used without the missing metadata, e.g. data from a geological sample without a georeference, they are useless and will not be archived.
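
The rule in the last answer can be read as a small triage step. The following is only a minimal sketch of that reading, not the curators' actual procedure: the `Submission` class, its fields, and the `triage` function are hypothetical names introduced for illustration, and the only rule taken from the text is that sample data without a georeference are rejected while other gaps are archived with a note.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Submission:
    """Hypothetical, simplified view of a submitted dataset's metadata."""
    title: str
    latitude: Optional[float] = None
    longitude: Optional[float] = None
    method: Optional[str] = None
    needs_georeference: bool = True  # assumption: true for e.g. geological samples


def triage(sub: Submission) -> Tuple[str, List[str]]:
    """Return a decision ("archive" or "reject") and notes on missing metadata."""
    georeferenced = sub.latitude is not None and sub.longitude is not None
    if sub.needs_georeference and not georeferenced:
        # Without a position the data cannot be used, so they are not archived.
        return "reject", ["georeference missing; data cannot be used"]
    notes = []
    if sub.method is None:
        # Usable but incomplete: archive and record what is missing.
        notes.append("method not provided by the author")
    return "archive", notes
```

For example, `triage(Submission("Core A"))` yields a rejection, while a submission with coordinates but no method is archived with one note.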


4. The data repository has an explicit mission in the area of digital archiving and promulgates it
This guideline relates to the level of authority which the repository has.

  • Does the repository have a Mission Statement? Does it clearly reference a commissioning authority?

Pangaea mission statement

  • Does the repository have a document which outlines the way in which the mission statement is implemented?

see below

  • Does the repository carry out promotional activities?

Yes, since 1994 at the international level (list of presentations and papers)

  • What level of succession planning has taken place in the event of the repository ceasing to exist?

Transfer of metadata/data to another certified data repository or to a library (TIB or DNB)


5. The data repository uses due diligence to ensure compliance with legal regulations and contracts
This guideline relates to the legal regulations which impact on the repository.

  • What is the legal position of the repository?

The repository is operated by a research center of the Helmholtz Association. The operating institute, AWI, is a public-law foundation.

  • Does the repository use model contract(s) with data producers?

Data archiving is based on a contract if the data producer is a funded project.

  • Does the repository use model contract(s) with data consumers?

no - Pangaea is an Open Access repository

  • Are the repository’s conditions of use published?

no

  • Are there measures in place if the conditions are not complied with?

no

  • How does the repository ensure knowledge of and compliance with national and international laws?

n.a.


6. The data repository applies documented processes and procedures for managing data storage
This guideline relates to the ability of the repository to manage data.

  • Does the repository have a preservation policy?

no

  • What is the repository’s strategy towards backup / multiple copies?

Daily incremental backup, weekly full backup in two tape drive archives, both mirrored in two different buildings 2 km apart.

  • What form of data recovery provisions are in place?

Recovery from backup

  • Are Risk Management techniques used to inform the strategy?

?

  • What levels of security are acceptable for the repository?

?

  • Are there checks on the consistency of the archive?

Yes, once a year

  • How is deterioration of storage media handled and monitored?

By continuous migration to new storage media.


7. The data repository has a plan for long-term preservation of its digital assets
This guideline relates to the ability of the repository to provide continued access to data.

  • What provisions are in place to take into account the future obsolescence of file formats?

see answers about formats above

  • What provisions are in place to ensure long term data usability?

see answers about formats above


8. Archiving takes place according to explicit workflows across the data life cycle
This guideline relates to the levels of procedural documentation for the repository.

  • Does the repository have procedural documentation for archiving data?
  • If so, provide references to:
    • Workflows

see this Wiki

    • Decision-making process for archival data transformations

?

    • Skills of employees

The technology is operated by an experienced computer center at AWI; data archiving is carried out by curators

    • Types of data within the repository

Observational, experimental and model/simulation data from basic research on the earth system; see parameter dictionary

    • Selection process

? no selection required

    • Approach towards data that does not fall within the mission

? contact TIB to find appropriate repository

    • Guarding privacy of subjects, etc.

? n.a.

    • Clarity to data producers about handling of the data

defined through the license


9. The data repository assumes responsibility from the data producers for access and availability of the digital objects
This guideline relates to the levels of responsibility which the repository takes for its data.

  • What licences / contractual agreements does the repository have with data producers?

see http://wiki.pangaea.de/wiki/PANGAEA#License

  • How does the repository enforce licences with the data producer?

see above

  • Please describe your crisis management.

?


10. The data repository enables the users to utilize the research data and refer to them
This guideline relates to the formats in which the repository provides its data.

  • In what form are data provided to end users? (E.g., are data provided in formats used by the research community?)
  • How do potential users find data? What search facilities are offered?
  • Is OAI harvesting permissible?

Yes; e.g. search the datasets in OAIster for the word "water".

  • Is deep searching possible?

Yes, via the search engine/data warehouse (login required) and by using the PANGAEA XML schema.

  • Does the repository offer Persistent Identifiers?

DOI for data sets; DOI, URN and sref for referencing external publications; handles for linking to external documents, used to extend the metadescription of data sets. Documents are stored in a publication repository, see ePIC.
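
Since OAI harvesting is permitted, a harvester would typically start with an OAI-PMH `ListRecords` request. The sketch below only builds such a request URL; the base URL is a placeholder and not the repository's real endpoint, and `list_records_url` is a hypothetical helper. The `verb`, `metadataPrefix`, `from` and `set` arguments are standard OAI-PMH parameters, and `oai_dc` is the Dublin Core prefix every OAI-PMH provider must support.

```python
from typing import Optional
from urllib.parse import urlencode

# Placeholder endpoint (assumption); replace with the repository's real OAI-PMH base URL.
BASE_URL = "https://repository.example.org/oai"


def list_records_url(base_url: str,
                     metadata_prefix: str = "oai_dc",
                     from_date: Optional[str] = None,
                     set_spec: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # selective harvesting: records changed since this date
    if set_spec:
        params["set"] = set_spec    # restrict the harvest to one set, if sets are offered
    return base_url + "?" + urlencode(params)


print(list_records_url(BASE_URL, from_date="2009-04-24"))
```

The response is an XML document; large result lists are paged with a `resumptionToken`, which this sketch does not handle.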


11. The data repository ensures the integrity of the digital objects and the metadata
This guideline relates to the information contained in the digital objects and metadata and whether it is complete, whether all changes are logged and whether intermediate versions are present in the archive.

  • Does the repository utilise checksums? What type? How are they monitored?

planned

  • How is the availability of data monitored?

RSS-feed and links with dynamic queries on individual parts of the inventory.

  • Does the repository deal with multiple versions of the data? If so, how?

Defining what constitutes a version is the responsibility of the scientific field. As part of the metadata, a link to older/newer versions can be given so that the user can clearly identify the version history.
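
Checksums are listed above as planned. As an illustration only, and not a description of what the repository will implement, a fixity check could store a SHA-256 digest per file at ingest and recompute it during the yearly consistency check. The function names and the manifest layout (a mapping from file path to expected digest) are assumptions.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(manifest: Dict[str, str]) -> List[str]:
    """Return the paths that are missing or whose digest differs from the manifest."""
    failed = []
    for path, expected in manifest.items():
        if not Path(path).is_file() or sha256_of(path) != expected:
            failed.append(path)
    return failed
```

An empty list from `verify` means every file still matches the digest recorded at ingest; any returned path would need to be restored from backup.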


12. The data repository ensures the authenticity of the digital objects and the metadata
This guideline refers to the relationship between the original data and that disseminated, and whether or not existing relationships between datasets and/or metadata are maintained.

  • What is the repository’s strategy for changes? Are data producers made aware of this strategy?

In principle, no changes are allowed once data are published.

  • How is versioning handled?

see versioning

  • Does the repository maintain provenance data and related audit trails?

Provenance: if applicable (as source); audit trails: informal during the archiving procedure

  • Does the repository maintain links to metadata and other datasets, and if so how?

Any metadata field can be linked to an external source for further information. When linking to external documents, persistent identifiers are preferred. Other related datasets are linked as other versions (if applicable).

  • How are the essential properties of different versions of the same file compared?

What constitutes a new version is decided by the provider, not the repository.

  • Does the repository check the identities of depositors?

Yes, see authenticity


13. The technical infrastructure explicitly supports the tasks and functions described in internationally accepted archival standards like OAIS
This guideline refers to the level of conformance with accepted standards.

  • What standards does the repository use for reference?

?

  • How is the standard implemented, and if there are significant deviations from the standard why is that the case?

?

  • Does the repository have a plan for infrastructural development?

?


14. The data consumer complies with access regulations set by the data repository
This guideline refers to the contribution of the repository in creating legal access agreements which relate to relevant national (and international) legislation and the levels to which the repository informs the data consumer about the access conditions of the repository.

  • Does the repository use End User Licence(s) with data consumers?

No

  • Are there any particular special requirements which the repository’s holdings require?

No

  • Are contracts provided to grant access to restricted-use (confidential) data?

No, the repository does not hold confidential data

  • Does the repository make use of special licences, e.g., Creative Commons?

Yes, see license

  • Are there measures in place if the conditions are not complied with?

?


15. The data consumer conforms to and agrees with any codes of conduct that are generally accepted in higher education and scientific research for the exchange and proper use of knowledge and information
This guideline refers to the contribution of the repository to inform data users about any relevant codes of conduct.

  • Does the repository need to deal with any relevant codes of conduct?

?

  • What are the terms of use to which data consumers agree?

?

  • Are institutional bodies involved?

AWI & MARUM

  • Are there measures in place if these codes are not complied with?

?


16. The data consumer respects the applicable licences of the data repository regarding the use of the research data
This guideline refers to the contribution of the repository to inform data users about the applicable licences.

  • Are there relevant licences in place?

see license

  • Are there measures in place if these codes are not complied with?

?
