Molecular data in PANGAEA

From PANGAEA Wiki

This page describes the types of molecular data available in PANGAEA and gives guidelines for integrating new molecular data into PANGAEA datasets.

Definition of molecular data

The term "molecular data" covers all kinds of data associated with molecular biology or its central dogma: mainly genetic data, as well as protein and enzyme data and metadata.

Which kinds of molecular data are accepted by PANGAEA?

In the context of molecular data, PANGAEA serves as a repository for environmental data and metadata that are not stored in specialized databases.

Sequencing data (nucleotides)

PANGAEA does not accept nucleotide sequences and directly related data. To archive such data, please submit them to one of the INSDC (International Nucleotide Sequence Database Collaboration) databases.

However, metadata related to nucleotide sequences and "omics" projects can be submitted to PANGAEA.

By default, PANGAEA links data to the European Nucleotide Archive (ENA) using INSDC accession numbers. The following kinds of sequencing data can be linked to PANGAEA:

  • Nucleotide sequences
    • Gene sequences (enzymes, 16S/18S rRNA)
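As an illustration of linking by accession number, the following minimal Python sketch builds an ENA browser link from an INSDC accession. The base URL pattern, the function name `ena_link`, and the example accession are assumptions made for this example; they do not describe how PANGAEA itself creates its links.

```python
from urllib.parse import quote

# Assumed URL pattern of the ENA browser; check the ENA documentation before relying on it.
ENA_BROWSER_BASE = "https://www.ebi.ac.uk/ena/browser/view/"


def ena_link(accession: str) -> str:
    """Return a browser link for an INSDC accession number (hypothetical helper)."""
    cleaned = accession.strip()
    if not cleaned:
        raise ValueError("accession must not be empty")
    # Percent-encode the accession so that stray characters cannot break the URL.
    return ENA_BROWSER_BASE + quote(cleaned, safe="")


# "PRJEB00000" is a made-up placeholder, not a real project accession.
print(ena_link("PRJEB00000"))
```

A helper like this keeps the accession number as the only stored value, so the link can be regenerated if the target URL scheme changes.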

Protein/Enzyme data

PANGAEA does not accept data that characterize proteins and enzymes. However, references to such data can be added to a dataset.

In the future, EC numbers will be used to crosslink PANGAEA and BRENDA (BRaunschweig ENzyme DAtabase).
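An EC number consists of four dot-separated levels, for example 1.1.1.1, where deeper levels may be left open with a hyphen. The Python sketch below shows one way such a number could be checked before it is used for crosslinking. The function name `parse_ec_number` and the accepted spellings are assumptions made for this example; preliminary numbers with an "n" prefix in the last field are deliberately not handled.

```python
import re
from typing import Optional, Tuple

# Four dot-separated levels, with an optional "EC" prefix; deeper levels may be "-".
EC_PATTERN = re.compile(r"^(?:EC\s*)?(\d+)\.(\d+|-)\.(\d+|-)\.(\d+|-)$")


def parse_ec_number(text: str) -> Optional[Tuple[str, ...]]:
    """Return the four EC levels as strings, or None if the text is not well formed."""
    match = EC_PATTERN.match(text.strip())
    if match is None:
        return None
    fields = match.groups()
    # Once a level is left open ("-"), every deeper level must be open as well.
    seen_open = False
    for field in fields:
        if field == "-":
            seen_open = True
        elif seen_open:
            return None
    return fields
```

For instance, `parse_ec_number("EC 1.1.1.1")` yields the four levels as a tuple, while a malformed value such as `"3.-.1.1"` is rejected with `None`.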

Experimental data

Experimental data refers to measurements and counts made in the context of molecular biology. PANGAEA contains many different measured quantities (simple and complex parameters). Some examples are:

  • Gene expression data
  • Protein content data
  • Protein production data
  • Enzyme activity data
  • FISH counts
  • etc.

Submission guideline for molecular data

Depending on the dataset, please consider the following rules and recommendations.

Gene names and symbols

  • Preferably use approved (official) gene symbols in accordance with the applicable nomenclature
  • Please also provide the full gene product name (functional RNA, protein, enzyme), because gene symbols can be ambiguous

Protein names and symbols

Database accession numbers

An accession number is a unique identifier, usually in an alphanumeric format. In the biosciences, the term is mainly used for genetic and protein sequence IDs, each of which refers to a specific entry within a database.
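To make the alphanumeric format more concrete, the Python sketch below tells nucleotide and protein sequence accessions apart by their shape. The patterns are a simplified assumption based on the commonly documented INSDC prefix formats (one or two letters followed by digits for nucleotide records, three letters followed by digits for protein records). They do not cover every accession type, such as whole-genome shotgun, project, or sample accessions, and the function name `classify_accession` is hypothetical.

```python
import re

# Simplified, assumed patterns; an optional ".<version>" suffix is allowed.
NUCLEOTIDE = re.compile(r"^(?:[A-Z]\d{5}|[A-Z]{2}\d{6}|[A-Z]{2}\d{8})(?:\.\d+)?$")
PROTEIN = re.compile(r"^(?:[A-Z]{3}\d{5}|[A-Z]{3}\d{7})(?:\.\d+)?$")


def classify_accession(accession: str) -> str:
    """Return "nucleotide", "protein", or "unknown" based on the accession's format."""
    value = accession.strip().upper()
    if NUCLEOTIDE.match(value):
        return "nucleotide"
    if PROTEIN.match(value):
        return "protein"
    return "unknown"
```

A format check like this only confirms that a string looks like an accession number; whether the entry actually exists can only be confirmed by the database itself.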

Genetic accession numbers

Protein accession numbers