# Data set

A **data set** is a collection of data (from one or several events) that is organized in a matrix and is usually compiled in a scientific context. Data in PANGAEA are organized in predefined **data sets**, which closely resemble the original files uploaded by the author (e.g. one table in one Excel sheet).

The granularity of a data set depends on the type of data and the number of data points, and is primarily the decision of the data author. In principle, a PANGAEA data set can have an unlimited number of columns and lines (for comparison, Excel 2003 is limited to *65,536 x 256* and Excel 2008 to *>1 million x 16,384*). Examples:

- 17 columns doi:10.1594/PANGAEA.821166
- 2,000,000+ lines doi:10.1594/PANGAEA.701279
- 22,600,000+ lines (in ASCII: 551 MB; in ASE +index: 2.2 GB; export from IQ +DOI: 1.44 GB) (Fig. 2) doi:10.1594/PANGAEA.758918
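The examples above are cited in the `doi:` form. As a minimal sketch, such a citation can be turned into a resolvable link by prefixing the DOI with the public resolver `https://doi.org/`. The helper name `doi_to_url` is hypothetical and is not part of any PANGAEA tool.

```python
def doi_to_url(citation: str) -> str:
    """Convert a citation such as 'doi:10.1594/PANGAEA.821166' into a resolver URL.

    Hypothetical helper for illustration only.
    """
    doi = citation.strip()
    # Drop the optional "doi:" prefix used in the list above.
    if doi.lower().startswith("doi:"):
        doi = doi[4:]
    # Every DOI starts with the directory indicator "10.".
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {citation!r}")
    return f"https://doi.org/{doi}"


if __name__ == "__main__":
    for citation in (
        "doi:10.1594/PANGAEA.821166",
        "doi:10.1594/PANGAEA.701279",
        "doi:10.1594/PANGAEA.758918",
    ):
        print(doi_to_url(citation))
```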

A data set may contain one or more data series (parameters). Two or more data sets may be grouped into a collection. Access restrictions can be defined only for a complete data set. Each data set consists of the data and the metadata, with fields according to the ISO 19115 standard. On the Internet, a data set is displayed with a metaheader that contains the information described below.
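The relationships in this paragraph can be sketched as a small data model. This is an illustration only, not PANGAEA's internal schema: the class names, the field names, and the plain dictionary standing in for the ISO 19115 metadata are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class DataSeries:
    """One column of the matrix: a parameter and its values (hypothetical model)."""
    parameter: str
    values: List[float]


@dataclass
class DataSet:
    """Data plus metadata; access is restricted for the whole data set or not at all."""
    title: str
    series: List[DataSeries]
    metadata: Dict[str, str] = field(default_factory=dict)  # stand-in for ISO 19115 fields
    restricted: bool = False  # no per-series restriction exists in this model

    def __post_init__(self) -> None:
        if not self.series:
            raise ValueError("a data set needs at least one data series")

    def shape(self) -> Tuple[int, int]:
        """Return (lines, columns) of the data matrix."""
        lines = max(len(s.values) for s in self.series)
        return lines, len(self.series)


@dataclass
class Collection:
    """A group of two or more data sets."""
    title: str
    members: List[DataSet]

    def __post_init__(self) -> None:
        if len(self.members) < 2:
            raise ValueError("a collection groups two or more data sets")
```

Keeping `restricted` on `DataSet` rather than on `DataSeries` mirrors the rule that access restrictions apply to a complete data set only.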

## Deleting/Updating PANGAEA data

Thirty days after a data set is published, its DOI is registered, e.g. doi:10.1594/PANGAEA.919538 (this does not apply to data sets with the status "in review"). Before DOI registration, a data set can be deleted without problems. After registration, it can no longer be deleted. However, a new version of the data set, carrying a new DOI, can be uploaded, and the old version can be "hidden" from the search (it can still be found via its DOI link).
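The deletion rule can be expressed as a short date check. The sketch below is not PANGAEA code, and the function names are hypothetical. It assumes that registration happens exactly 30 days after the publication date, and that a data set "in review" stays deletable because no DOI is registered for it.

```python
from datetime import date, timedelta

# Delay between publication and DOI registration, as stated in the text.
DOI_REGISTRATION_DELAY = timedelta(days=30)


def doi_registration_date(published: date) -> date:
    """Date on which the DOI is registered (assumed: exactly 30 days after publication)."""
    return published + DOI_REGISTRATION_DELAY


def can_delete(published: date, today: date, in_review: bool = False) -> bool:
    """Return True while the data set can still be deleted.

    Assumption: data sets "in review" get no DOI registration and so remain deletable.
    """
    if in_review:
        return True
    return today < doi_registration_date(published)


if __name__ == "__main__":
    published = date(2020, 1, 1)
    print(doi_registration_date(published))          # 2020-01-31
    print(can_delete(published, date(2020, 1, 15)))  # True
    print(can_delete(published, date(2020, 2, 15)))  # False
```

Once `can_delete` returns `False`, the only remaining route is the one described above: upload a new version under a new DOI and hide the old one from the search.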