Translating research data into Gene Ontology annotations

Pascale Gaudet, SIB – Swiss Institute of Bioinformatics
GO Consortium


Gene Ontology Consortium: What we provide

Ontology + Annotations = Model of biology

Ontology: a structured representation of biology, composed of:
• Classes
• Relations
• Definitions

Annotations: statements about the functions of specific gene products, in 3 aspects:
• Molecular function
• Biological process
• Cellular component

Examples:
• IGHA1 (immunoglobulin heavy constant alpha 1)
- Antigen binding
- Adaptive immune response
- Extracellular
• QARS (Gln tRNA synthetase)
- Glutamine-tRNA ligase activity
- Translation
- Cytoplasm

Model of biology: a representation of current knowledge in a manner that is:
• Human understandable
• Machine computable

GO “annotations”

§ An annotation is a statement linking a gene to some aspect of its function (a GO ontology term)
§ Each annotation is based on some evidence, recorded as part of the annotation
- Evidence code (type of evidence)
- Reference (published journal article)

Examples:
- Annotation 1: INSR + ‘receptor activity’
- Annotation 2: INSR + ‘plasma membrane’
- Annotation 3: INSR + ‘insulin receptor signaling pathway’
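The parts of an annotation listed above (gene, GO term, evidence code, reference) can be pictured as a small record. The following is only a minimal sketch: the `Annotation` class and its field names are hypothetical, the evidence code `IDA` is chosen purely for illustration, and the reference is a placeholder rather than a real article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One GO annotation: a gene linked to a GO term, plus its evidence."""
    gene: str
    go_term: str        # GO term label
    aspect: str         # one of the three GO aspects
    evidence_code: str  # type of evidence
    reference: str      # published article supporting the statement


# Placeholder only: the slides do not give references for these examples.
PLACEHOLDER_REF = "PMID:placeholder"

# The three INSR examples; the evidence code "IDA" is an assumption.
annotations = [
    Annotation("INSR", "receptor activity", "molecular function",
               "IDA", PLACEHOLDER_REF),
    Annotation("INSR", "plasma membrane", "cellular component",
               "IDA", PLACEHOLDER_REF),
    Annotation("INSR", "insulin receptor signaling pathway",
               "biological process", "IDA", PLACEHOLDER_REF),
]


def terms_by_aspect(records):
    """Group GO term labels by aspect, preserving input order."""
    grouped = {}
    for record in records:
        grouped.setdefault(record.aspect, []).append(record.go_term)
    return grouped
```

Calling `terms_by_aspect(annotations)` returns one list of term labels per aspect, which mirrors how the three INSR statements each describe a different aspect of the same gene.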

Semantics of a GO annotation

The association of a GO class with a gene product is a statement that means:

§ molecular function: molecular activities of gene products
§ cellular component: where gene products are active
§ biological process: pathways and larger processes made up of the activities of multiple gene products
§ In other words, annotations represent the normal, in vivo biological role of gene products

How are annotations generated?

Manual - Literature-based (>500,000 annotations)
An expert reviews the literature and assigns functions, processes and cellular components to gene products.

Manual - Sequence-based (>3M annotations)
An expert analyses a sequence and makes a prediction concerning the gene function based on known functions of related sequences. The predictions can be based on the known function of evolutionarily related sequences (phylogenetic relationships).

Algorithmic (unreviewed) (>65M annotations)
A computer program analyses sequences and makes a prediction based on some decision criteria, for example:
- protein domain (InterPro2GO)
- sequence similarity (BLAST2GO)

Evidence types

Chibucos MC, Siegele DA, Hu JC, Giglio M. (2017) Evidence and conclusion ontology. PMID: 27812948

Manual - Literature-based
- EXP: experimental evidence
- IDA: inferred from direct assay
- IPI: inferred from physical interaction
- IMP: inferred from mutant phenotype

Manual - Sequence-based
- ISS: inferred from sequence similarity
- ISO: inferred from sequence orthology
- IBA: inferred from biological aspect of ancestor

Algorithmic (unreviewed)
- IEA: inferred from electronic annotation
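The grouping of evidence codes by how the annotation was produced can be written down as a lookup table. This is a minimal sketch covering only the eight codes shown here; the names `EVIDENCE_CODES`, `evidence_category` and `is_manual` are hypothetical helpers, not part of any GO tool.

```python
# Evidence code -> (description, how the annotation was generated).
# Only the codes listed on this slide are included.
EVIDENCE_CODES = {
    "EXP": ("experimental evidence", "manual, literature-based"),
    "IDA": ("inferred from direct assay", "manual, literature-based"),
    "IPI": ("inferred from physical interaction", "manual, literature-based"),
    "IMP": ("inferred from mutant phenotype", "manual, literature-based"),
    "ISS": ("inferred from sequence similarity", "manual, sequence-based"),
    "ISO": ("inferred from sequence orthology", "manual, sequence-based"),
    "IBA": ("inferred from biological aspect of ancestor",
            "manual, sequence-based"),
    "IEA": ("inferred from electronic annotation", "algorithmic (unreviewed)"),
}


def evidence_category(code):
    """Return the generation category for an evidence code (case-insensitive)."""
    return EVIDENCE_CODES[code.upper()][1]


def is_manual(code):
    """True when an expert made or reviewed the annotation."""
    return evidence_category(code).startswith("manual")
```

A filter such as `[c for c in codes if is_manual(c)]` would then keep the expert-made annotations and drop the unreviewed algorithmic ones.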

Who produces GO annotations?

• Model organism databases (SGD, FlyBase, WormBase, MGI, etc.)
• Generalist databases, e.g. UniProtKB, IntAct
• Domain-specific projects: Cardiovascular project (UCL), synapse project (VU), etc.
• Anyone who wishes to contribute their expertise and data to the project

Best practices for generating literature-based GO annotations

§ Ensure consistency of usage across a broad consortium of contributors

§ Improve inferencing capabilities

Focus on the research hypothesis

§ Use prior knowledge to understand the hypothesis being tested and its relation to the experimental observation

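The same experimental result can lead to different GO conclusions, depending on the protein's known roles and the hypothesis being tested:

DFFB (O76075)
- Known roles: DNase
- Hypothesis: The nuclease activity of DFFB is required for nuclear DNA fragmentation during apoptosis
- Assay and result: Apoptotic DNA fragmentation increased in the presence of DFFB
- Conclusion for GO: DFFB mediates nuclear DNA fragmentation during apoptosis = apoptotic DNA fragmentation (GO:0006309)

FOXL2 (P58012)
- Known roles: Transcription factor
- Hypothesis: Mutations in FOXL2 are known to cause premature ovarian failure, which may be due to increased apoptosis
- Assay and result: Apoptotic DNA fragmentation increased in the presence of FOXL2
- Conclusion for GO: FOXL2 increases the rate of apoptosis = positive regulation of apoptotic process (GO:0043065)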

Annotate the conclusion, not the assay

1) Rubidium is often used to assay potassium transport, because the radioactive form is more readily available
- the physiologically relevant substrate is potassium

2) Protein kinases are often tested with non-physiologically relevant substrates, such as histone
- if the authors do not discuss the physiological relevance, one cannot annotate the substrate

On the in vivo relevance of phenotypes

• Phenotypes can help understand the function of proteins
• Phenotypes can provide insights into mechanisms leading to disease
• The scope of the GO, though, is to capture the normal function of proteins

Indirect effects of a mutation:
- RNA polymerase affects essentially all cellular processes (cell proliferation, development, etc.) but does not mediate these processes

Lack of hypothesis for a role of a protein in a process:
- Knockdown of Tmem234 in zebrafish results in defects in pronephric glomerulus formation. Annotation by IMP to glomerulus formation is not supported by any cellular/molecular data

Get the wider perspective

• Favor a gene-by-gene or pathway-by-pathway approach for curation rather than paper-by-paper
• Read recent publications
• Remove incorrect annotations based on invalidated hypotheses

Guidelines for high quality annotations

• Annotate the conclusion of the experiment
• Use the biological context to interpret the experiments
• Carefully select publications. Read recent publications
• Ensure consistency with existing annotations
• Keep annotations up to date: remove obsolete annotations

Other approaches for quality control

• Annotation consistency exercises
• Taxonomic constraints
• Co-occurrence of annotations
• Phylogenetic annotations
• User feedback
- from GO website
- from PubMed
- from databases
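As one illustration of the quality-control approaches listed above, a taxonomic constraint check flags annotations that use a term outside the taxa where it applies. This is a toy sketch under stated assumptions: the rule table, the lineages, the gene names (`geneA`, `geneB`) and the function `taxon_violations` are all hypothetical and greatly simplified, and do not reproduce the GO Consortium's actual constraint data or tooling.

```python
# Illustrative rule table (assumption): term label -> taxa where it may be used.
ONLY_IN_TAXON = {
    "lactation": {"Mammalia"},
}

# Simplified, hand-written lineages (assumption): species -> enclosing taxa.
LINEAGE = {
    "Mus musculus": {"Eukaryota", "Metazoa", "Mammalia"},
    "Drosophila melanogaster": {"Eukaryota", "Metazoa", "Arthropoda"},
}


def taxon_violations(annotations, only_in_taxon, lineage):
    """Return the (gene, species, term) triples that break a taxon rule.

    Terms without a rule are never flagged. A species with no known
    lineage is flagged whenever its term carries a rule.
    """
    flagged = []
    for gene, species, term in annotations:
        allowed = only_in_taxon.get(term)
        if allowed is None:
            continue  # no constraint recorded for this term
        if not (allowed & lineage.get(species, set())):
            flagged.append((gene, species, term))
    return flagged


# Hypothetical annotations used only to exercise the check.
example_annotations = [
    ("geneA", "Mus musculus", "lactation"),
    ("geneB", "Drosophila melanogaster", "lactation"),
    ("geneB", "Drosophila melanogaster", "translation"),
]

suspect = taxon_violations(example_annotations, ONLY_IN_TAXON, LINEAGE)
```

Here `suspect` holds only the fly annotation to ‘lactation’, the kind of entry a curator would then review by hand.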

GO annotations in PubMed

Annotations for a paper

This talk was based upon

Acknowledgments

• GO PIs
- Judy Blake
- Mike Cherry
- Suzanna Lewis
- Paul Sternberg
- Paul Thomas

• GO Handbook contributors
- Christophe Dessimoz
- Jim Hu
- Nives Skunca
- Sylvain Poux

• Funding
- NIH HG002273 (GO)