Data set

A data set is a collection of data (often from one event) in a scientific context, organized in one matrix. Data in Pangaea are organized in predefined data sets, which correspond to the original files uploaded. The same data sets appear in the result list of a Pangaea query (or of other clients or web services) and are offered for download at that granularity. The granularity of a data set depends on the type of data and the number of data points, and is primarily decided by the data provider. A data set may contain one to many data series. One to many data sets may be grouped into a parent set. Access rights can be defined for a complete set only. Each data set consists of the data accompanied by metadata that follow ISO standard fields for describing geodata.
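The relations described above (a data set holds one to many data series, several data sets may be grouped under a parent set, access rights apply to the whole set) can be pictured with a minimal sketch. The class and field names are hypothetical and do not reflect the actual Pangaea schema; counting child sets into the size of a parent is likewise an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataSeries:
    parameter: str        # short name of the parameter (hypothetical field)
    values: List[float]   # the data points of this series


@dataclass
class DataSet:
    title: str
    series: List[DataSeries] = field(default_factory=list)    # one to many data series
    children: List["DataSet"] = field(default_factory=list)   # filled only for a parent set
    access: str = "public"   # access rights are defined for the complete set only

    def size(self) -> int:
        """Number of data points in this set, including any child sets (assumption)."""
        own = sum(len(s.values) for s in self.series)
        return own + sum(child.size() for child in self.children)


a = DataSet("A", [DataSeries("Temp", [1.0, 2.0])])
b = DataSet("B", [DataSeries("Sal", [35.0])])
parent = DataSet("Parent", children=[a, b])
print(parent.size())
```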

Opening a data set in 4D will show four tabs named Config, Basics, Details and Web with metadata fields as described below:

Config tab

 * Parameter window shows parameters with unit and short name, used in the data set. The buttons Add, Clear and PreSelect are used to compose new data sets.
 * Geocodes window lists all geocodes available; those used in the data set are highlighted and can be added to the Configuration.
 * Related metainformation window contains fields from the event table which can be added to the Configuration.
 * Configuration window lists geocodes, related metainformation and all parameters used in the order of the available data set. The Load and Save buttons can be used to save a configuration and apply it to other, similar data sets.
 * Format shows the number of digits before and after the decimal point of a numeric parameter when that parameter is selected in the Configuration window by a mouse click. Different formats can be selected from the pop-up menu or changed by hand. If the geocode Date/Time is selected, different types of ISO formats can be selected, depending on the required precision.
 * Split by event should not be used anymore.
 * Split by versions must be checked if a parameter occurs more than once.
 * Aggregate function may be used to calculate statistical values internally; do not use.
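The effect of the Format setting can be illustrated outside 4D with a small sketch. It assumes that a numeric format reduces to a fixed number of digits after the decimal point and that the Date/Time choices correspond to ISO 8601 strings truncated to the required precision; the function names and precision keys are hypothetical and not part of the system.

```python
from datetime import datetime

# Hypothetical precision names mapped to ISO 8601 style patterns.
ISO_FORMATS = {
    "year": "%Y",
    "month": "%Y-%m",
    "day": "%Y-%m-%d",
    "minute": "%Y-%m-%dT%H:%M",
    "second": "%Y-%m-%dT%H:%M:%S",
}


def format_number(value: float, digits_after: int) -> str:
    """Render a numeric value with a fixed number of digits after the decimal point."""
    return f"{value:.{digits_after}f}"


def format_datetime(moment: datetime, precision: str) -> str:
    """Render a Date/Time geocode at the requested precision."""
    return moment.strftime(ISO_FORMATS[precision])


print(format_number(3.14159, 2))
print(format_datetime(datetime(2005, 3, 7, 14, 30, 5), "minute"))
```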

Basics tab

 * Author(s) of the data set, may be added by a multiple choice list related to the staff table
 * Title of the data set as free text; equivalent to the title of a publication
 * Source may contain the institution of the data origin; use if data are not related to a reference; relational to the table institution
 * Status with
    * status of the data set as pop-up menu with choices: questionable, not validated, validated, published
    * Access rights button to set individual access to data sets not having the status published
    * citable to make the data set a parent set with the citation added to a library catalog; to be used by the data librarian or editors only!
 * Registry gives information about the registration process:
    * not to be registered: the status is not published
    * registration: the data set is in the lead time, which lasts four weeks after the status was set to published
    * registered: final status, set automatically four weeks after the status was changed to published
 * login required may be checked for sets with status published but still under moratorium
 * Reference(s) opens a multiple choice list related to the reference table to select one to many papers relevant to the data set
 * PI(s) window lists all investigators related to the data series
 * Project(s) allows adding one to many projects via a multiple choice list, as provided by the Project table
 * Data series window lists all data series contained in this data set
 * Geocode(s) as used in the data set (cannot be changed)
 * Event(s) as used in the data set (cannot be changed)
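The Registry states listed above follow from the status and the date on which the status was set to published. A minimal sketch of that rule, assuming the lead time is counted as exactly 28 days; the function and argument names are hypothetical and this is not the system's own code:

```python
from datetime import date, timedelta
from typing import Optional

LEAD_TIME = timedelta(weeks=4)  # four weeks after the status was set to published


def registry_state(status: str, published_on: Optional[date], today: date) -> str:
    """Derive the Registry state from the data set status and the publication date."""
    if status != "published" or published_on is None:
        return "not to be registered"
    if today - published_on < LEAD_TIME:
        return "registration"   # still within the lead time
    return "registered"         # final status


print(registry_state("published", date(2020, 1, 1), date(2020, 1, 15)))
```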

Details tab

 * Citation as assembled from the fields on the Basics tab.
 * Comment to add individual comments as plain text; field size up to 32 kbyte. URIs might be included and will be resolvable in the metaheader (example ).
 * Keywords related to the Thesaurus; may be used to group sets by a keyword
 * Spatial coverage: fields showing min/max of the three spatial dimensions of the data set
 * Temporal coverage: min/max of Date/Time or Age
 * Topologic type is used to define the extension of a data set
 * Created: date/time of import; Updated: date/time of last change
 * Size of the data set
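The spatial coverage fields in the list above hold the minimum and maximum of each spatial dimension. As a rough sketch of how such values could be derived from the data points, assuming each point is a (latitude, longitude, depth) tuple; this layout and the function name are hypothetical, and the naive longitude range ignores data sets that cross the 180° meridian:

```python
from typing import Dict, Iterable, Tuple

Point = Tuple[float, float, float]  # (latitude, longitude, depth); layout is an assumption


def spatial_coverage(points: Iterable[Point]) -> Dict[str, Tuple[float, float]]:
    """Return (min, max) for each of the three spatial dimensions.

    Raises ValueError if no points are given.
    """
    lats, lons, depths = zip(*points)
    return {
        "latitude": (min(lats), max(lats)),
        "longitude": (min(lons), max(lons)),
        "depth": (min(depths), max(depths)),
    }


print(spatial_coverage([(54.1, 7.9, 0.5), (53.8, 8.2, 12.0)]))
```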

Web tab

 * Citation as assembled from the fields on the Basics tab.
 * URL as defined by the system for event-related data sets or as defined by the user for static links to files.
 * URL Data details to link files containing an extended description of the data set; the linked files should be *.txt for simple text or *.pdf if a text layout is needed. Field must contain a valid URI only and appears in the metaheader of a data set only if filled. (see URL comment on discussion page)
 * Export filename contains the name given to the file when the user downloads it as a text file to the local system. The extension *.tab is added automatically.
 * URL other version is a URI field to link to data sets of newer/older versions or any original source or other formats on the Internet. Field must contain a valid URI only and appears in the metaheader of a data set only if filled.
 * If a data set is deleted and substituted by a new version, this field in the deleted set should contain the DOI of the new version. In deleted sets only title and DOI will remain, and the user is informed that the data set was substituted by another version given by its new DOI: (this is only a test)
 * If a data set is requested which was deleted prior to registration or which has never existed, see e.g.
 * If a data set is requested which was deleted and has no new version, see e.g.

Description of the data set format as presented on the web
Data Description
 * Citation: consists of author(s), year, title, source and DOI
 * Reference(s): given if the data were published in a publication
 * Project(s): title and acronym of projects, related to the production of the data
 * Coverage: spatial and/or temporal distribution of the data set
 * Event(s): detailed metainformation about the sampling/measuring location
 * Parameter(s): list of parameters with unit and short name, related to the method, PI and an optional comment, as used in the data set
 * Size: number of data points
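The citation named in the list above is built from author(s), year, title, source and DOI. A minimal sketch of such an assembly follows; the punctuation, the function name and the example DOI are assumptions made for illustration, and the exact layout produced by the system may differ.

```python
from typing import List


def assemble_citation(authors: List[str], year: int, title: str, source: str, doi: str) -> str:
    """Join the citation fields into one line (layout is an assumption)."""
    author_part = "; ".join(authors)
    return f"{author_part} ({year}): {title}. {source}, doi:{doi}"


# Hypothetical example values, not a real data set.
print(assemble_citation(["Doe, J", "Roe, R"], 2005, "Example title", "PANGAEA", "10.1234/example"))
```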

Download data links for
 * Download dataset as tab-delimited text (with a choice for a proper character encoding which is dependent on the local system)
 * View dataset as HTML

Data tables mostly consist of lines (samples) and columns (parameters), each column headed by the short name of the parameter as described in the parameter table (see above).
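A downloaded tab-delimited table of this shape can be read with standard tools. The sketch below uses Python's csv module and assumes that the text passed in contains only the table itself (one header line of short names followed by one line per sample), with any metaheader already removed; the function name and the sample values are hypothetical.

```python
import csv
import io
from typing import Dict, List, Tuple


def read_tab(text: str) -> Tuple[List[str], List[Dict[str, str]]]:
    """Split tab-delimited text into the header (short names) and one dict per sample line."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)                              # first line: parameter short names
    rows = [dict(zip(header, row)) for row in reader]  # remaining lines: samples
    return header, rows


sample = "Depth [m]\tTemp [deg C]\n0.5\t12.3\n1.0\t11.9\n"
header, rows = read_tab(sample)
print(header)
print(rows[0])
```

Values are returned as strings; converting them to numbers is left to the caller, since a column may contain empty cells or text.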