Intern:Lipids in PANGAEA

=Definition=
"Lipids constitute one of the four major classes of compounds found in living tissues (the others being carbohydrates, proteins, and nucleic acids) and they include: (1) fatty acids; (2) neutral fats (i.e. triacylglycerols), other fatty acid esters, and soaps; (3) long-chain (or fatty) alcohols and waxes; (4) sphingoids and other long-chain bases; (5) glycolipids, phospholipids, and sphingolipids; and (6) carotenes, polyprenols, sterols (and related compounds), terpenes, and other isoprenoids. (...)" (Oxford Dictionary of Biochemistry and Molecular Biology (2000), p.379)

=PANGAEA-rules for lipid data and features=
 * Naming of parameters and features shall be consistent with the IUPAC systematic and semisystematic nomenclatures.
 * The standard reference database for lipids is LipidMaps (preferably used for the web description field).
 * The terminology catalogue (TC) adopts many terms of the LipidMaps ontology (not yet complete; with the addition of some special stand-alone classes: glycolipids, ladderane lipids, betaine lipids).
 * Further levels of molecular detail are structured comparably to the scheme of Liebisch et al.
 * Unless specified otherwise, a specific compound should be described by three synonymous features in the TC: trivial/retained name, semisystematic name, and systematic IUPAC name.
 * If applicable, each lipid compound term in the TC should have an InChI-key.
 * If no InChI-key but a SMILES is available (e.g. for generic compound classes), the SMILES should be added to the feature.
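The rules above can be turned into a simple completeness check. The following is only a minimal sketch: the `LipidTerm` record and the `check_term` function are hypothetical names made up for illustration and do not reflect how the terminology catalogue is actually implemented. The only external fact used is the general shape of an InChI-key (14 letters, hyphen, 10 letters, hyphen, 1 letter).

```python
import re
from dataclasses import dataclass
from typing import Optional

# General shape of an InChI-key: 14 + 10 + 1 uppercase letters, hyphen-separated.
INCHIKEY_RE = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


@dataclass
class LipidTerm:
    """Hypothetical record for one lipid compound term (illustration only)."""
    trivial_name: Optional[str] = None
    semisystematic_name: Optional[str] = None
    systematic_name: Optional[str] = None
    inchikey: Optional[str] = None
    smiles: Optional[str] = None


def check_term(term: LipidTerm) -> list[str]:
    """Return a list of rule violations; an empty list means the term is complete."""
    problems = []
    # Rule: three synonymous names per compound.
    for field in ("trivial_name", "semisystematic_name", "systematic_name"):
        if not getattr(term, field):
            problems.append(f"missing {field}")
    # Rule: InChI-key if applicable, otherwise fall back to SMILES.
    if term.inchikey:
        if not INCHIKEY_RE.match(term.inchikey):
            problems.append("malformed InChI-key")
    elif not term.smiles:
        problems.append("neither InChI-key nor SMILES given")
    return problems


palmitic = LipidTerm(
    trivial_name="palmitic acid",
    systematic_name="hexadecanoic acid",
    inchikey="IPCSVZSSVZVIGE-UHFFFAOYSA-N",
)
print(check_term(palmitic))
```

In this example the semisystematic name has deliberately been left out, so the check reports it as missing.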

=Challenges within lipid terminology=


 * lack of standardized vocabulary
 * differing classification schemes
 * many synonyms
 * old and incomplete IUPAC nomenclature

Use of abbreviations and retained names
Many lipids are big and complex molecules. To facilitate the handling of names, most scientists use retained names, develop their own specialised nomenclature and/or encode structural information in abbreviations. Moreover, lipids are divided into several building blocks (especially for glycerolipids/triglycerides) and lipid families.

The following disadvantages occur for systematic names:

 * Systematic IUPAC names can become very complex and unwieldy. Moreover, common lipid building blocks are less recognizable in systematic names (e.g. the glycerol backbone becomes propane-1,2,3-triol or propane-1,2,3-triyl, respectively).
 * Many generic lipid structures cannot be properly represented by classical systematic names. Due to IUPAC seniority rules and the more complicated rules for stereochemistry, small changes to the molecular structure can change the whole name massively. Assigning generic names to generic structures is not always possible with substitutive nomenclature (the nomenclature preferred by IUPAC).
 * Because the rules are difficult, scientists generate pseudo-IUPAC names.
 * The nomenclature has not been updated since 1999 (or even longer for specific groups).
 * For many lipid groups, no consistent nomenclature exists yet.

For an overview of common encoding abbreviations, see table 1 of this publication. Please note that the usage of abbreviations can differ between lipid groups, so always check the corresponding PANGAEA guidelines.

Also keep in mind that some naming schemes have been developed for a narrow variety of lipid compounds and that many naming alternatives and synonyms may exist.

Resolution of lipid identification
For general information on analysis techniques, read the Wikipedia article on Lipidomics.

tbc: mass spectrometry and output (possibly a comparison to other techniques that are not high-throughput). Up to which level can lipids be identified?

For glycerolipids and glycerophospholipids:

MS resolution levels (more info in material and methods, section 1):

Level 1 (MS1): Only the lipid type (e.g. PC) and the total number of carbons and double bonds are known (e.g. PC 36:2); no structural information.

Level 2 (MS2): The carbons and double bonds of each fatty acid are known, but not the sn-position or structure of the fatty acids (PC 16:0_18:1).

Level 3 (MS2 with intensity interpretation): The fatty acid positions are known (PC 18:1/16:0).

Level 4: The double bond positions of the fatty acids are known (PC 18:1 [9]/16:0).

Level 4 can rarely be achieved, and only by specialized laboratories. Unfortunately, many scientists deduce detailed structural information from m/z values alone, although m/z values support level 1 information only.
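The four levels can be told apart by the separators used in the example names above. The following is a minimal sketch, assuming names follow exactly the patterns shown in the examples; `resolution_level` is a hypothetical helper, not an existing PANGAEA tool, and it reports the highest level the notation itself supports, which is not proof that the measurement really reached that level.

```python
import re

# A chain description starts with carbons:double bonds, optionally prefixed by O- or P-.
_CHAIN_START = re.compile(r"(?:[OP]-)?\d+:\d+")


def resolution_level(name: str) -> int:
    """Return the MS resolution level (1-4) implied by a shorthand name."""
    # Some texts write the ratio character (U+2236) instead of a plain colon.
    name = name.replace("\u2236", ":").strip()
    lipid_class, _, chains = name.partition(" ")
    if not lipid_class or not _CHAIN_START.match(chains):
        raise ValueError(f"not a recognised shorthand name: {name!r}")
    if "[" in chains or "(" in chains:
        return 4  # double bond positions are given
    if "/" in chains:
        return 3  # sn-positions are known
    if "_" in chains:
        return 2  # individual fatty acids known, positions unknown
    return 1      # only sum composition known


for example in ("PC 36:2", "PC 16:0_18:1", "PC 18:1/16:0", "PC 18:1 [9]/16:0"):
    print(example, "-> level", resolution_level(example))
```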

Workflow of HPLC-MS-MS / ESI-MS analyses: Lipid extraction --> HPLC (separates the different lipid classes) --> ESI --> Mass Spectrometry 1 (separation by m/z values) --> Mass Spectrometry 2 (fragmentation of the molecule and detection of fragments: identification of groups)

(see also: link1)

Some analysis methods omit the LC step and transfer a crude extract directly to the ESI.

A general problem of mass spectrometry peak annotation is that the annotation is often based on assumptions (ester bonds instead of ether bonds, even-chain instead of odd-chain) (read more here).

=Lipid systematics and nomenclature=

IUPAC nomenclature

 * Carotenoid nomenclature (1974) (PDF version)
 * IUPAC nomenclature on lipids (1976) (fatty acids and alcohols, glycerolipids, glycerophospholipids and old glycolipid rules)
 * Update: Derivatives of phosphatidic acid (1980)
 * Nomenclatures of "fat-soluble" vitamins (1973-1981) (A, D, E, K)
 * Prenol nomenclature (1986) (PDF version)
 * Steroid nomenclature (1989) (PDF version)
 * Updated glycolipid nomenclature (1997) (PDF version)
 * Natural products and related compounds (1999) (including steroids, terpenes, carotenes)
 * Chapter P-107 Lipids in IUPAC 2013 (p.1431 ff.) - a summary of past nomenclatures on lipids (excluding steroids), no added value.
 * Chapter P-101 Nomenclature of natural products in IUPAC 2013 (p.1294 ff.) - a summary of rules for several natural product groups (including steroids, terpenes, carotenes)

Not covered by IUPAC nomenclature:
 * ladderane lipids
 * glycerol dialkyl glycerol tetraethers
 * saccharolipids

Lipid Maps ontology and nomenclature
Lipids can be subdivided into several classes. The Lipid Maps database has developed the first internationally accepted Lipid Classification System with 8 categories (containing classes and subclasses):
 * Fatty Acyls [FA]
 * Glycerolipids [GL]
 * Glycerophospholipids [GP]
 * Sphingolipids [SP]
 * Sterol lipids [ST]
 * Prenol lipids [PR]
 * Saccharolipids [SL]
 * Polyketides [PK]

Statement of Lipid Maps for the construction of categories: "Lipids may be categorized based on their chemically functional backbone as polyketides, acylglycerols, sphingolipids, prenols, or saccharolipids. However, for historical and bioinformatics advantages, we chose to separate fatty acyls from other polyketides, the glycerophospholipids from the other glycerolipids, and sterol lipids from other prenols, resulting in a total of eight primary categories." (source)

The maximum depth of the lipid hierarchy is four (in most cases three). The subclasses divide lipids by further characteristics besides the backbone (e.g. headgroup type and sidechain composition for glycerolipids and glycerophospholipids).
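The hierarchy can be read directly from a classification code such as FA0106. The sketch below rests on an assumption that should be checked against LipidMaps: each level below the category is taken to append two digits to the code of its parent. The function name `hierarchy` is made up for illustration.

```python
import re

# The eight LipidMaps categories listed above.
CATEGORIES = {
    "FA": "Fatty Acyls",
    "GL": "Glycerolipids",
    "GP": "Glycerophospholipids",
    "SP": "Sphingolipids",
    "ST": "Sterol lipids",
    "PR": "Prenol lipids",
    "SL": "Saccharolipids",
    "PK": "Polyketides",
}

# Assumption: two letters for the category, then up to three two-digit levels
# (category + three levels = maximum depth of four).
_CODE_RE = re.compile(r"^([A-Z]{2})((?:\d{2}){0,3})$")


def hierarchy(code: str) -> list[str]:
    """Split a classification code into its chain of ancestors, category first."""
    match = _CODE_RE.match(code)
    if match is None or match.group(1) not in CATEGORIES:
        raise ValueError(f"not a recognised classification code: {code!r}")
    category, digits = match.groups()
    return [category + digits[:i] for i in range(0, len(digits) + 1, 2)]


print(hierarchy("FA0106"))        # category, class, subclass
print(CATEGORIES[hierarchy("GP01")[0]])
```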

The LipidMaps nomenclature has extended the IUPAC nomenclature on lipids, which is already very old (1976: fatty acids and glycerolipids; later added: glycolipids, prenols and steroids) and does not cover all recently discovered lipid classes. However, glycolipids (containing sugar residues in their head groups) have not received a separate category within the Lipid Maps Classification System. Because the system focuses on the lipid backbone, the glycolipids are dispersed across the 8 categories.

Publications on the Lipid Maps Classification System: Initial, Update

Basic information on the lipid classes can be found in the Lipid Maps tutorial. An overview of lipid classification and nomenclature can be found here.

Drawbacks of LipidMaps ontology:
 * no clear definitions and characteristics (criteria) for classification
 * no fine granularity classes
 * not machine-readable

LipidMaps Ontology adaptation by PANGAEA
PANGAEA has adopted some LipidMaps ontology categories and classes within the terminology catalogue. However, the complete ontology is too detailed for PANGAEA's purposes.

Moreover, some special lipid categories have been added for a better overview and because of their relevance for searches. The individual lipids of these additional categories are dispersed in the original LipidMaps Ontology, but are gathered in the TC for easier retrieval. Additional categories:
 * glycerol dialkyl glycerol tetraether
 * phospholipids
 * glycolipids

Furthermore, PANGAEA deviates from the strict 1:1 relations of LipidMaps and allows multiple assignments within the terminology catalogue. Example: A fatty acid with double bonds and an oxo functional group can relate to both oxo fatty acids [FA0106] and unsaturated fatty acids [FA0103].
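Multiple assignment simply means that the relation between features and classes is many-to-many. A minimal sketch of that idea follows; the feature name and the helper functions are hypothetical, and only the two class codes are taken from the example above.

```python
from collections import defaultdict

CLASS_LABELS = {
    "FA0103": "unsaturated fatty acids",
    "FA0106": "oxo fatty acids",
}

# feature name -> set of class codes (one feature may belong to several classes)
assignments: dict[str, set[str]] = defaultdict(set)


def assign(feature: str, class_code: str) -> None:
    """Relate a feature to a class; repeated calls add further classes."""
    assignments[feature].add(class_code)


def features_in(class_code: str) -> list[str]:
    """All features related to the given class, sorted by name."""
    return sorted(f for f, codes in assignments.items() if class_code in codes)


# Hypothetical feature: a fatty acid with double bonds and an oxo group.
assign("example oxo-unsaturated fatty acid", "FA0103")
assign("example oxo-unsaturated fatty acid", "FA0106")

for code, label in CLASS_LABELS.items():
    print(label, "->", features_in(code))
```

With this structure a search for either class retrieves the same feature, which is the retrieval behaviour the multiple assignment is meant to give.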

Shorthand notation by Liebisch et al.
Publication

The shorthand notation was developed by Liebisch et al. specifically for mass spectrometry results. Because the resolution of mass spectrometry analyses can differ (see here) and the molecular structure is rarely known in as much detail as in LipidMaps entries (functional groups, bond types, locants and stereochemistry known), a more generic notation was developed that clearly indicates the known level of detail.

The notation has been developed for fatty acyls, glycerolipids, glycerophospholipids, sphingolipids and sterol lipids. However, please note that the shorthand notation is better established for some groups than for others (good adoption for glycerolipids) and that not all databases use the notation (it is used by LipidHome and SwissLipids).

A good overview of the lipid hierarchy for glycero(phospho)lipids based on structural information content can be found here.

Notation for fatty acyls

 * Lipid class level:
   * formula: FA(m)
   * m = mass
   * Only the class is known
 * Lipid species level:
   * formula: FA x:y
   * x = total number of carbons, y = number of chain double bonds
   * Only the class and the overall number of carbons and double bonds are known
 * Lipid acyl level:
   * formula: FA x:y_z
   * x = number of carbons in chain, y = number of chain double bonds, z = kind and number of substituents (e.g. Me3 = three methyl groups)
   * The exact number of carbons and double bonds within the chain is known. Additionally, functional groups are known without position
 * Fatty acyl structure level (= LipidMaps level):
   * formula: FA x:y(z1,z2,z3,...)
   * x = number of carbons in chain, y = number of chain double bonds, z = position (and, if known, configuration) of double bonds, position of substituents (e.g. 3Me = methyl at position C3)
   * Order of functional groups: double bonds - OH - O - Me

FA = Fatty Acid

OH = hydroxyl group

O = keto group

Please note: This notation should not be confused with the omega notation for fatty acids. Example: unless stated otherwise, C16:2 stands for a straight-chain fatty acid, whereas FA 16:2 may also include methyl-branched fatty acids.
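The four formulas for fatty acyls can be distinguished with a few regular expressions. This is only a sketch based on the formulas as written in this section, not a validated parser for the full notation of Liebisch et al.; `parse_fa` and `FattyAcyl` are names invented for this example.

```python
import re
from typing import NamedTuple, Optional


class FattyAcyl(NamedTuple):
    level: str                  # "class", "species", "acyl" or "structure"
    carbons: Optional[int]
    double_bonds: Optional[int]
    detail: Optional[str]       # mass, substituents or locants, depending on level


# Patterns follow the formulas above: FA(m), FA x:y, FA x:y_z, FA x:y(z1,z2,...)
_PATTERNS = [
    ("class", re.compile(r"^FA\((?P<detail>[\d.]+)\)$")),
    ("structure", re.compile(r"^FA (?P<c>\d+):(?P<db>\d+)\((?P<detail>[^()]+)\)$")),
    ("acyl", re.compile(r"^FA (?P<c>\d+):(?P<db>\d+)_(?P<detail>\w+)$")),
    ("species", re.compile(r"^FA (?P<c>\d+):(?P<db>\d+)$")),
]


def parse_fa(name: str) -> FattyAcyl:
    """Classify a fatty acyl shorthand name by the level its notation implies."""
    for level, pattern in _PATTERNS:
        match = pattern.match(name.strip())
        if match:
            groups = match.groupdict()
            carbons = int(groups["c"]) if groups.get("c") else None
            double_bonds = int(groups["db"]) if groups.get("db") else None
            return FattyAcyl(level, carbons, double_bonds, groups.get("detail"))
    raise ValueError(f"not a recognised fatty acyl shorthand: {name!r}")


for example in ("FA(254.2)", "FA 16:2", "FA 16:0_Me3", "FA 18:1(9Z)"):
    print(example, "->", parse_fa(example))
```

A plain name such as FA 16:2 is reported as species level here because the notation alone does not reveal more; the actual level of detail still has to be supplied alongside the name.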

Drawbacks of shorthand notation
The notations can be the same for different levels of detail (within the groups fatty acyls, glycerolipids and glycerophospholipids).

The amount of structural detail can thus not always be inferred directly from the notation. The level of detail must be stated explicitly so that it is clear how generic the notation is. Example:

=Glycerolipids=
Glycerolipids are often loosely called triglycerides, although triacylglycerols are only one group of glycerolipids. The common feature of all glycerolipids is the glycerol backbone.

Building blocks
Glycerol / Glycerine (prefix: glycero- or less commonly glyceryl-)

Diacyl

Dialkyl

DG / DAG = diacylglycerol

PE = phosphatidylethanolamine (sometimes wrongly used for phosphoethanolamine)

PC = phosphatidylcholine (sometimes wrongly used for phosphocholine)

Naming ambiguities and precautions

 * Prefix "phosphatidyl" can stand for glycerophospholipids with radyl chains (acyl, alkyl or 1-alkenyl) or only with acyl-chains. According to IUPAC, phosphatidyl only stands for acyl-substituted glycerophospholipids. (see also here)

Bonding types
tbc: 'O-' for alkyl and 'P-' for plasmalogen-type bonds; acyl, alkyl and 1Z-alkenyl sidechains and the resulting bonding types
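Until this section is completed, the prefix convention can be sketched as follows. The sketch assumes that a chain without a prefix is acyl (ester-bonded), which is the usual default assumption in peak annotation rather than something the notation proves; `bond_type` and `side_chains` are hypothetical helper names.

```python
import re


def bond_type(chain: str) -> str:
    """Map the prefix of one sidechain description to its bonding type."""
    chain = chain.strip()
    if chain.startswith("O-"):
        return "alkyl"
    if chain.startswith("P-"):
        return "1Z-alkenyl"
    return "acyl"  # assumption: no prefix means an ester-bonded acyl chain


def side_chains(name: str) -> list[tuple[str, str]]:
    """Split a shorthand name into (chain, bonding type) pairs."""
    _, _, chains = name.strip().partition(" ")
    # Chains are separated by "/" (positions known) or "_" (positions unknown).
    return [(c, bond_type(c)) for c in re.split(r"[/_]", chains) if c]


print(side_chains("PE P-18:0/20:4"))
print(side_chains("PC O-16:0_18:1"))
```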

Stereospecific numbering (sn-) nomenclature
Information: https://de.wikipedia.org/wiki/Sn-Nomenklatur (German only), https://en.wikipedia.org/wiki/Glycerophospholipid#Nomenclature_and_stereochemistry

Families of glycerolipids
=Information resources and databases=

Educational material
Lipid maps tutorials

Lipid Web

Lipid Library webpage

Cyberlipid (Search for specific lipids)

LipidomicNet Wiki

PowerPoint presentation on Intact Polar Lipids (concept, structure, classification, basics of mass spectrometry analyses)

LipidMaps lipid analysis methods, workshop lectures and overview picture

Lipid Maps Structure Database
The Lipid Maps Structure Database (LMSD) is regarded as the standard reference database for lipidomics. The LipidMaps database:
 * created the first internationally accepted lipid ontology (classification)
 * is manually curated
 * has extended the IUPAC lipid nomenclatures
 * offers educational materials and many tools
 * does not allow generic entries (e.g. with undefined double bond positions or stereochemistry)

Publication

SwissLipids
The SwissLipids database is developed by the Swiss-Prot group, which is also responsible for the non-redundant UniProtKB/Swiss-Prot database. The SwissLipids database:
 * is manually curated
 * contains lipids supported by evidence as well as possible, in silico generated lipid structures
 * contains generic lipid entries
 * classifies lipids based on the Lipid Maps Ontology
 * generates abbreviations and offers additional ontological information down to the isomeric subspecies level using the notation scheme of Liebisch et al.
 * maps all hierarchy levels to ChEBI
 * is best searched using the shorthand notation of Liebisch et al. or chemical identifiers, e.g. generic SMILES

Publication

Some lipids cannot be found in SwissLipids (or only with restrictions):
 * SQDG (sulfoquinovosyldiacylglycerol)
 * no generic MGDGs (monogalactosyldiacylglycerol); only MHDG (monohexosyldiacylglycerol, unspecified carbohydrate)

LipidHome
The LipidHome database is a database of in silico generated, theoretically possible lipid structures. The LipidHome database:
 * covers only glycerolipids and glycerophospholipids
 * does not contain systematic names
 * focusses strongly on mass spectrometry data
 * uses the LipidomicNet standard lipid nomenclature of Liebisch et al.
 * uses the structure representation rules of ChEBI

Publication

LipidBank
Lipid Bank (Wiki)

Other databases
Discontinued:

Lipidat

Lipid Ontology LiPro
The LiPro ontology has been developed for the purpose of computer-automated classification of lipids based on the presence of characteristic functional groups (which can be automatically detected from a SMILES string).

However, the LiPro ontology was deprecated in 2016. The ontology available on BioPortal is incomplete (completely flat) and therefore of no practical use.

Nevertheless, the literature resources offer useful information on existing lipid ontologies and nomenclature, including their drawbacks: Publication 1, Publication 2, Publication 3

Lipid drawing tools
Lipid Maps offers drawing tools for all lipid classes, which dramatically speed up the structure drawing process.

The corresponding backbone molecules are predefined and only the substituent information (number of carbons, double bonds, specific functional groups, stereochemistry etc.) must be added.

Identifiers such as InChI-keys cannot be obtained directly. For this purpose, download either the MDL MOL file or the SDF file and open it in a drawing tool such as BIOVIA:


 * MDL MOL file
   * Opens directly in BIOVIA
   * Use the Text-to-Structure function to generate InChI strings/keys and IUPAC names
   * Enantiomers included:
     * The text box "AND Enantiomers" encodes racemic mixtures (included in the IUPAC name). Delete the text box if only the depicted molecule is of interest.
     * Please note that InChI cannot encode mixtures. Only the depicted molecule is translated.
   * The molecule can be modified in BIOVIA
 * SDF file
   * Opens the SD Viewer box
   * Additional information is provided automatically: IUPAC name, InChI-key, systematic name (lipid nomenclature name)
   * Please note that InChI cannot encode mixtures. Only the depicted molecule is translated.
   * No modifications possible!
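The additional information in an SDF file is stored as tagged data items after the structure block, so it can also be read without a drawing tool. The following is a minimal sketch using only the Python standard library. It assumes the usual SDF layout (a `> <TAG>` header line, value lines, a blank line, records ending with `$$$$`); the tag names `NAME` and `INCHI_KEY` and the sample record are placeholders, so check a downloaded file for the tags it really contains.

```python
def read_sdf_fields(text: str) -> list[dict[str, str]]:
    """Collect the tagged data items of every record in an SDF text."""
    records = []
    for block in text.split("$$$$"):
        fields: dict[str, str] = {}
        key = None
        in_data = False  # data items follow the "M  END" line of the structure block
        for line in block.splitlines():
            if line.startswith("M  END"):
                in_data = True
            elif in_data and line.startswith(">") and "<" in line:
                key = line[line.index("<") + 1 : line.rindex(">")]
                fields[key] = ""
            elif key is not None:
                if line.strip() == "":
                    key = None  # a blank line closes the current data item
                else:
                    fields[key] = (fields[key] + "\n" + line).strip()
        if fields:
            records.append(fields)
    return records


# Hypothetical, heavily shortened record (the atom and bond blocks are left out).
sample = """\
example record

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
> <NAME>
Palmitic acid

> <INCHI_KEY>
IPCSVZSSVZVIGE-UHFFFAOYSA-N

$$$$
"""

for record in read_sdf_fields(sample):
    print(record["NAME"], record["INCHI_KEY"])
```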