Molecular data in PANGAEA

This page contains information about the types of molecular data available in PANGAEA and describes guidelines for integrating new molecular data in PANGAEA datasets.

Definition of molecular data

The term "molecular data" covers all kinds of data associated with molecular biology or its central dogma: mainly genetic data, as well as protein and enzyme data and metadata.

Which kind of molecular data is accepted by PANGAEA?

In the context of molecular data, PANGAEA serves as a repository for environmental data and metadata that are not stored in specialized databases.

Sequencing data (nucleotides)

PANGAEA does not accept nucleotide sequences or directly related data. To archive such data, please submit them to one of the INSDC (International Nucleotide Sequence Database Collaboration) databases.

However, metadata related to nucleotide sequences and "omics" projects can be submitted to PANGAEA.

By default, PANGAEA links data to the European Nucleotide Archive (ENA) using INSDC accession numbers. The following kinds of sequencing data can be linked to PANGAEA:

Nucleotide sequences
- Gene sequences (enzymes, 16S/18S rRNA)

Sequence Read Archive "SRA" data ((meta)genomics/transcriptomics)
- Sequencing projects (e.g. BioProject)
- Samples (e.g. BioSample)
- Sequencing runs
- Whole genome shotgun sequencing project (WGS)
etc.
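
As a rough illustration of how an INSDC accession number can point to an ENA record, the sketch below builds an ENA Browser link. It assumes the public ENA Browser URL form `https://www.ebi.ac.uk/ena/browser/view/<accession>`. The helper name `ena_link` and the example accession are placeholders chosen for illustration, and the sketch does not describe how PANGAEA implements its links internally.

```python
from urllib.parse import quote

# Assumption: ENA Browser records can be opened by appending the accession here.
ENA_VIEW_URL = "https://www.ebi.ac.uk/ena/browser/view/"


def ena_link(accession: str) -> str:
    """Return an ENA Browser link for an INSDC accession (hypothetical helper)."""
    cleaned = accession.strip()
    if not cleaned:
        raise ValueError("empty accession number")
    # Percent-encode anything that is not a plain URL character.
    return ENA_VIEW_URL + quote(cleaned, safe="")


# Placeholder accession in the style of a sequencing project.
print(ena_link("PRJEB1234"))
```

The same call works for any of the kinds listed above (sequences, projects, samples, runs), since the link only depends on the accession string.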

Protein/Enzyme data

PANGAEA does not accept data that characterize proteins and enzymes. However, references can be added to a dataset, e.g.:

Protein accession numbers (INSDC-databases, UniProt)
ORF names / gene names / gene symbols
EC numbers of the IUBMB (International Union of Biochemistry and Molecular Biology)
etc.

In the future, EC numbers will be used for crosslinking PANGAEA and BRENDA (BRaunschweig ENzyme DAtabase).
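
Before EC numbers are added to a dataset, a simple format check can catch typing errors. The sketch below assumes the usual written form of an EC number: four dot-separated fields, a first field from 1 to 7, "-" for fields that are not yet specified (only at the end), and an "n" prefix for preliminary serial numbers. The function name `looks_like_ec_number` is hypothetical, and the check only tests the format, not whether the enzyme entry actually exists.

```python
import re

# Assumed format: class 1-7, then three fields that are numbers or "-";
# the last field may carry an "n" prefix (preliminary EC number).
EC_PATTERN = re.compile(r"^[1-7]\.(\d+|-)\.(\d+|-)\.(n?\d+|-)$")


def looks_like_ec_number(value: str) -> bool:
    """Return True if value is formatted like an EC number (hypothetical helper)."""
    text = value.strip()
    # Accept an optional "EC" or "EC:" prefix.
    if text.upper().startswith("EC"):
        text = text[2:].lstrip(" :")
    if EC_PATTERN.match(text) is None:
        return False
    # Once a field is "-", every following field must be "-" as well.
    fields = text.split(".")
    first_dash = fields.index("-") if "-" in fields else len(fields)
    return all(field == "-" for field in fields[first_dash:])


for candidate in ["1.1.1.1", "EC 3.4.21.-", "3.-.1.1", "1.1.1"]:
    print(candidate, looks_like_ec_number(candidate))
```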

Experimental data

Experimental data refers to measurements and counts in the context of molecular biology. PANGAEA contains many different measurement quantities (simple and complex parameters). Listed are some examples:

Gene expression data
Protein content data
Protein production data
Enzyme activity data
FISH counts
etc.

Submission guideline for molecular data

Depending on the dataset, please consider the following rules and recommendations.

Gene names and symbols

Preferably use approved / official gene symbols in accordance with the applicable nomenclatures.
Providing the full gene product name (functional RNA, protein, enzyme) is appreciated, because gene symbols can be ambiguous.

Protein names and symbols

Gene and protein nomenclature are intertwined. As with gene symbols, follow the applicable nomenclatures.
Enzyme names: Use of the names accepted by the Nomenclature Committee of the IUBMB (International Union of Biochemistry and Molecular Biology) is strongly recommended.

Database accession numbers

An accession number is a unique identifier, usually in an alphanumeric format. In the biosciences, the term is mainly used for genetic and protein sequence IDs, each of which refers to a specific entry in a database.

Genetic accession numbers

Only provide INSDC accession numbers (PANGAEA resolves all INSDC accession numbers via ENA).
Do not use other kinds of accession numbers (for example, GOLD Study IDs from the Genomes Online Database "GOLD").
The NCBI GenInfo Identifier (GI) is not an INSDC accession number and cannot be resolved by PANGAEA.
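
The rules above can be sketched as a small pre-submission check that tells INSDC accession numbers apart from GI numbers and GOLD Study IDs. All patterns are simplified assumptions based on commonly documented accession formats (for example `PRJ…` for projects, `SAM…` for samples, `ERR`/`SRR`/`DRR` for runs, purely numeric GI numbers, and a `Gs` prefix for GOLD Study IDs). The function name `classify_genetic_id` and the example identifiers are hypothetical, and a match says nothing about whether the record really exists.

```python
import re
from typing import Optional, Tuple

# Simplified, assumed INSDC formats; checked in order, fixed prefixes first.
INSDC_PATTERNS = [
    ("project", r"PRJ[EDN][A-Z]\d+"),
    ("sample", r"SAM[EDN][A-Z]?\d+|[EDS]RS\d{6,}"),
    ("study", r"[EDS]RP\d{6,}"),
    ("experiment", r"[EDS]RX\d{6,}"),
    ("run", r"[EDS]RR\d{6,}"),
    ("nucleotide sequence", r"([A-Z]\d{5}|[A-Z]{2}\d{6}|[A-Z]{2}\d{8})(\.\d+)?"),
    ("WGS", r"([A-Z]{4}|[A-Z]{6})\d{2}S?\d{6,9}(\.\d+)?"),
]

# Identifier styles that are not INSDC accession numbers (assumed formats).
NON_INSDC_PATTERNS = [
    ("NCBI GI number", r"\d+"),
    ("GOLD Study ID", r"Gs\d+"),
]


def classify_genetic_id(identifier: str) -> Tuple[Optional[str], bool]:
    """Return (kind, is_insdc); kind is None if nothing matches."""
    text = identifier.strip()
    for label, pattern in INSDC_PATTERNS:
        if re.fullmatch(pattern, text):
            return label, True
    for label, pattern in NON_INSDC_PATTERNS:
        if re.fullmatch(pattern, text):
            return label, False
    return None, False


# Placeholder identifiers, one per style.
for example in ["AB123456.1", "PRJEB1234", "ERR000001", "12345678", "Gs0000001"]:
    print(example, classify_genetic_id(example))
```

Checking the fixed-prefix patterns first avoids misreading a sample accession such as `SAMN…` as a WGS accession, which has a similar letters-then-digits shape.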

Protein accession numbers

INSDC accession numbers for proteins can be resolved by PANGAEA in the same way as those for genetic sequences and are therefore preferred.
UniProt accession numbers are also allowed, but cannot be resolved.
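
To decide which of the two cases applies, a protein identifier can be matched against both formats. The sketch below assumes that INSDC protein accessions consist of three letters followed by five or seven digits with an optional version suffix, and that UniProt accessions follow the pattern published in the UniProt documentation. The function name `classify_protein_id` and the example identifiers are hypothetical.

```python
import re
from typing import Optional

# Assumed INSDC protein format: 3 letters + 5 or 7 digits, optional ".version".
INSDC_PROTEIN = re.compile(r"[A-Z]{3}(\d{5}|\d{7})(\.\d+)?")

# UniProt accession pattern (6 or 10 characters), as given in the UniProt docs.
UNIPROT = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def classify_protein_id(identifier: str) -> Optional[str]:
    """Return 'INSDC', 'UniProt', or None for an unrecognized identifier."""
    text = identifier.strip()
    if INSDC_PROTEIN.fullmatch(text):
        return "INSDC"    # preferred: resolvable by PANGAEA
    if UNIPROT.fullmatch(text):
        return "UniProt"  # allowed, but not resolvable
    return None


# Placeholder identifiers, one per style.
for example in ["CAA12345.1", "P12345", "A0A023GPI8", "12345"]:
    print(example, classify_protein_id(example))
```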
