Chemical data in PANGAEA

This wiki article guides scientists through the submission of chemical data. Please read at least the short summary of rules before submitting your data. Doing so will facilitate communication with our curators and increase the quality of your data.

Definition of chemical data

The term "chemical data" covers all kinds of data associated with organic and inorganic chemistry, mainly biomarker and ion concentrations.

Short summary of rules

  • Avoid abbreviations in measurement parameter names
  • Preferably, provide up-to-date systematic IUPAC names
  • Additional database cross-references or chemical identifiers are appreciated, but optional
  • Describe the sample medium: Where was it measured? (e.g. sediment, water, air, ...)
  • Always provide measurement parameters with SI units
  • If possible, state what has been measured: please provide the quantity (e.g. mass, concentration, ...) and what it refers to
  • Think about it: Are my data comprehensible to others?


Which kind of chemical data can be found in PANGAEA?

The list below is not complete; it only gives a glimpse of the chemical data in PANGAEA.

  • Organic chemistry data
    • Biomarker concentrations:
      • membrane lipids (e.g. fatty acids, triacylglycerides, glycerol dialkyl glycerol tetraethers, sterol derivatives, etc.)
      • alkanes and alkenes
      • (...)
    • Amino acid concentrations
    • Pigment concentrations
      • Chlorophylls
      • Carotenoids
      • (...)
    • Organic pollutants
      • Polycyclic aromatic hydrocarbons (PAHs)
      • Polychlorinated and polybrominated biphenyls, diphenyl ethers and naphthalenes
    • (...)
  • Inorganic chemistry data
    • Ion concentrations
    • (...)
  • Isotopic data
    • Isotope ratios
    • (...)


Submission guideline for chemical data

Please follow the rules in the following sections if your data or the parameters you measured contain chemical terms.

Chemical molecules

Molecule abbreviations and codes

As far as possible, avoid abbreviations, code names and molecular formulas in the names of your measured parameters. Please use the full-length names instead. This will help our curators and fellow scientists to understand your data.
Our curators will choose suitable abbreviations of table headers for display on our website, consistent with existing parameters in our database.
In exceptional cases, i.e. when compound names cannot be systematically assigned (e.g. due to analytical limitations) or when the name would be too long to read comfortably, abbreviations can be kept. Please discuss this with your curator during the submission process.

Molecule names

Full-length molecule names should be used for measurement parameters whenever possible. Preferably, systematic IUPAC names should be given. If the systematic name is too long, trivial or semisystematic names may be used.
The following types of molecule names exist:

  • Trivial names
    • Common names
    • Retained names
    • Proprietary names
  • Semisystematic names
  • Systematic names


Definitions:
Trivial names: Names that do not follow a systematic nomenclature.
Common names: A kind of trivial name. Commonly used names, especially in spoken language, which can be very informal (e.g. "baking soda").
Retained names: A kind of trivial name that is accepted by IUPAC for use in systematic names (e.g. furan).
Proprietary names: Brand names for chemical products (e.g. Freon).
Semisystematic names: A mixture of retained and systematic names, used for simplification, especially for natural products (e.g. steroids).
Systematic names: Names that follow strict nomenclature rules.

Recommended nomenclatures

PANGAEA recommends using the most recent nomenclature published by the International Union of Pure and Applied Chemistry (IUPAC):

  • Organic Chemistry
    • Nomenclature of Organic Chemistry: IUPAC Recommendations and Preferred Names 2013 (ISBN: 9781849733069, 1849733066)
  • Inorganic Chemistry
    • Nomenclature of Inorganic Chemistry: IUPAC Recommendations 2005 (PDF, ISBN:978-0-85404-438-2)
  • Specific nomenclatures (e.g. for natural products)


Identifiers, Database-Crosslinks and Ontologies

Providing chemical identifiers or cross-links to chemical database entries and ontologies helps to avoid ambiguities and makes your data easier for other scientists to understand.
Adding identifiers or URIs (Uniform Resource Identifiers, e.g. a hyperlink) is optional, but it can considerably facilitate communication with your curator.
Furthermore, PANGAEA plans to use these links in the future to interconnect data from different repositories, which increases the reach of your data.
PANGAEA especially encourages you to provide one of the following references; however, you can also refer to other resources (ChemSpider, LIPID MAPS, Wikipedia, etc.).

ChEBI

Chemical Entities of Biological Interest (ChEBI) is both a database and one of the most commonly used chemical ontologies. ChEBI is manually curated and machine-readable. PANGAEA links chemical compound names to ChEBI entries where available.
If you decide to reference ChEBI, please include the ChEBI ID.

PubChem

PANGAEA uses PubChem as an alternative reference database if no entry is available in ChEBI. If you want to provide a hyperlink to PubChem, please cite the PubChem compound entry and the corresponding CID (compound identifier). Compound entries in PubChem summarize all data for a unique chemical structure, but please be aware that the entries are not manually curated.

InChI-keys

The International Chemical Identifier (InChI) is calculated from the chemical structure of a compound by an algorithm and is thus a direct representation of the structure. The InChI-key is derived from the InChI and is a fixed-length, unique identifier that is well suited for database searches and cross-linking. In contrast to CAS numbers, InChI-keys are a non-proprietary international standard.
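
As a minimal sketch of what "fixed-length" means in practice, the following Python snippet checks whether a string has the layout of a standard InChI-key: 27 characters in three hyphen-separated blocks of 14, 10 and 1 uppercase letters. The function name `looks_like_inchikey` is hypothetical and not part of any PANGAEA tool. The check only tests the format; it cannot tell whether the key belongs to an existing compound.

```python
import re

# Layout of an InChI-key: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_PATTERN = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def looks_like_inchikey(candidate: str) -> bool:
    """Return True if the string has the 27-character InChI-key layout."""
    return INCHIKEY_PATTERN.fullmatch(candidate.strip()) is not None


# InChI-key of water, used here only to exercise the format check.
print(looks_like_inchikey("XLYOFNOQVPJJNP-UHFFFAOYSA-N"))  # True
# The CAS number of water has a different layout and is rejected.
print(looks_like_inchikey("7732-18-5"))  # False
```

A check like this can catch truncated or mistyped keys before submission, but a curator still needs to confirm that the key matches the intended compound.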

Sample medium

Please make sure to state the sample medium in which the chemical compound was measured.
Examples are: in sediment, water, air, ice, or an organism (please give the species name).
Sample media should be given not only for relative measurements (e.g. chemical compound per unit sediment mass [µg/g]), but also for absolute measurements (e.g. heavy metal content in porewater [g]) and fractions (e.g. hexadecanoic acid of total fatty acid in sediment core [%]).
Our curators will decide whether this information should be placed in the parameter name, the parameter comment or the dataset description.

Chemical quantities

A measurement consists of a quantity (what has been measured, e.g. mass, concentration, etc.) and a unit. Please always provide a unit for each measurement parameter. Stating the quantity when submitting data is strongly encouraged, although it is often omitted and only indirectly inferred from the measurement unit.

Measurement units

Units should use symbols according to the International System of Units (SI). Please avoid non-SI units unless they are indispensable.
Example: For radioactivity measurements, please use the SI derived unit becquerel (Bq) instead of the non-SI unit curie (Ci).
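
Converting legacy values takes a single multiplication, because 1 Ci is defined as exactly 3.7 × 10^10 Bq. The Python sketch below illustrates this; the function name `curie_to_becquerel` is an illustrative assumption and not part of any PANGAEA tool.

```python
# 1 curie is defined as exactly 3.7e10 becquerel (decays per second).
BECQUEREL_PER_CURIE = 3.7e10


def curie_to_becquerel(activity_ci: float) -> float:
    """Convert an activity from curie (non-SI) to becquerel (SI)."""
    if activity_ci < 0:
        raise ValueError("activity must not be negative")
    return activity_ci * BECQUEREL_PER_CURIE


# Example: an activity of 2 microcurie expressed in becquerel.
print(curie_to_becquerel(2e-6))
```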

Measurement quantities

Providing measurement quantities increases the comprehensibility and thus the reusability of your data. Although quantities can often be inferred from the measurement unit, there are ambiguous cases, especially for dimensionless units.
Example: Percentages (%) can represent, for example, mass fractions, volume fractions, mole fractions or even fractions of chromatographic signal strength. Measurements are only comparable if the quantity is known.
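
The ambiguity can be made concrete with a small calculation. The Python sketch below assumes a hypothetical two-component sample with equal masses of hexadecanoic acid and octadecanoic acid, and approximate molar masses of 256.43 g/mol and 284.48 g/mol. The sample and the helper names are illustrative only.

```python
# Approximate molar masses in g/mol (assumed values for illustration).
MOLAR_MASS = {"hexadecanoic acid": 256.43, "octadecanoic acid": 284.48}

# Hypothetical sample: mass of each compound in micrograms.
sample_mass_ug = {"hexadecanoic acid": 10.0, "octadecanoic acid": 10.0}


def mass_fraction_percent(masses: dict, compound: str) -> float:
    """Mass of one compound relative to the total mass, in percent."""
    return 100.0 * masses[compound] / sum(masses.values())


def mole_fraction_percent(masses: dict, compound: str) -> float:
    """Amount of one compound relative to the total amount, in percent."""
    # The mass unit cancels in the ratio, so micrograms can be used directly.
    amounts = {name: mass / MOLAR_MASS[name] for name, mass in masses.items()}
    return 100.0 * amounts[compound] / sum(amounts.values())


name = "hexadecanoic acid"
print(f"mass fraction: {mass_fraction_percent(sample_mass_ug, name):.1f} %")
print(f"mole fraction: {mole_fraction_percent(sample_mass_ug, name):.1f} %")
```

Under these assumptions the same sample gives 50.0 % as a mass fraction but about 52.6 % as a mole fraction, so a bare "%" does not tell a reader which of the two values was reported.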

Examples of good data submissions

The following example illustrates how you can improve your data submission:

Bad example: C16:0 [ppm]
Good example: hexadecanoic acid in sediment [μg/g]
Explanation:
  • Use systematic IUPAC names
  • Provide information about the sample medium
  • Use SI-conformant units
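
The elements of the good example (compound name, sample medium, unit and, ideally, the quantity) can be thought of as separate fields of one parameter description. The following Python sketch shows one possible way to keep them apart and assemble a label from them. The class `ChemicalParameter`, its field names and the label layout are hypothetical and do not reflect PANGAEA's internal data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChemicalParameter:
    """Hypothetical description of one measured chemical parameter."""

    compound: str  # full name, preferably a systematic IUPAC name
    medium: str    # where it was measured, e.g. "sediment"
    quantity: str  # what was measured, e.g. "mass fraction"
    unit: str      # unit symbol, e.g. "µg/g"

    def label(self) -> str:
        """Assemble a readable column header from the fields."""
        return f"{self.compound} in {self.medium}, {self.quantity} [{self.unit}]"


parameter = ChemicalParameter(
    compound="hexadecanoic acid",
    medium="sediment",
    quantity="mass fraction",
    unit="µg/g",
)
print(parameter.label())
```

Keeping the fields separate makes it easy to see at a glance whether the medium, quantity or unit is still missing before the data are submitted.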