Methods for Creating GO Annotations

Emily Dimmer, European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, UK

The core information needed for a GO annotation

1. Database object (protein), e.g. Q9ARH1

2. GO term ID, e.g. GO:0004674

3. Reference ID, e.g. PubMed ID: 12374299 or GOA:InterPro

4. Evidence code, e.g. TAS
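
The four core fields can be pictured as a small record type. The following is only a sketch: the class name `GoAnnotation`, its field names, and the `PMID:` reference prefix are assumptions made for illustration, not part of any GO or GOA tool.

```python
import re
from dataclasses import dataclass

# GO term IDs are written as "GO:" followed by seven digits.
GO_ID_PATTERN = re.compile(r"GO:\d{7}")


@dataclass(frozen=True)
class GoAnnotation:
    """Hypothetical record holding the four core fields of a GO annotation."""

    db_object_id: str   # database object, e.g. a protein accession
    go_id: str          # GO term ID
    reference: str      # literature or method reference
    evidence_code: str  # e.g. TAS

    def __post_init__(self):
        # Reject values that do not look like a GO term ID.
        if not GO_ID_PATTERN.fullmatch(self.go_id):
            raise ValueError(f"not a GO term ID: {self.go_id!r}")


# The example values are the ones shown on the slide.
example = GoAnnotation(
    db_object_id="Q9ARH1",
    go_id="GO:0004674",
    reference="PMID:12374299",
    evidence_code="TAS",
)
print(example)
```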

GO Evidence Codes

Code   Definition
IEA    Inferred from Electronic Annotation
IDA    Inferred from Direct Assay
IEP    Inferred from Expression Pattern
IGI    Inferred from Genetic Interaction
IMP    Inferred from Mutant Phenotype
IPI    Inferred from Physical Interaction
ISS    Inferred from Sequence Similarity
TAS    Traceable Author Statement
NAS    Non-traceable Author Statement
RCA    Inferred from Reviewed Computational Analysis
IC     Inferred by Curator
ND     No biological Data available

Manually annotated: all codes except IEA.

• Every GO annotation includes an Evidence Code that gives information about the evidence from which the annotation has been made.
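
As a rough illustration, the evidence codes listed above can be kept in a lookup table so that a code found in an annotation is checked and expanded to its definition. The names `EVIDENCE_CODES` and `describe` are hypothetical, and the table covers only the codes on this slide.

```python
# Evidence codes from the slide, mapped to their definitions.
EVIDENCE_CODES = {
    "IEA": "Inferred from Electronic Annotation",
    "IDA": "Inferred from Direct Assay",
    "IEP": "Inferred from Expression Pattern",
    "IGI": "Inferred from Genetic Interaction",
    "IMP": "Inferred from Mutant Phenotype",
    "IPI": "Inferred from Physical Interaction",
    "ISS": "Inferred from Sequence Similarity",
    "TAS": "Traceable Author Statement",
    "NAS": "Non-traceable Author Statement",
    "RCA": "Inferred from Reviewed Computational Analysis",
    "IC": "Inferred by Curator",
    "ND": "No biological Data available",
}


def describe(code):
    """Return the definition of an evidence code, or raise ValueError."""
    try:
        return EVIDENCE_CODES[code.upper()]
    except KeyError:
        raise ValueError(f"unknown evidence code: {code!r}") from None


print(describe("TAS"))
```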

Additional fields can be used to further clarify an annotation

• Qualifiers (NOT, contributes_to, colocalizes_with)

• ‘With’ data, which give users more information on the method or experiment applied

Annotations using the ‘NOT’ qualifier

hSNF2H ATPase activity GO:0016887 IDA

Rsf-1 NOT ATPase activity GO:0016887 IDA

Loyola et al. Mol Cell Biol. 2003 Oct;23(19):6759-68.

Annotations using the ‘contributes_to’ qualifier

A protein that is part of a complex can be annotated to Molecular Function terms that describe:

1. its individual action

2. the action of the whole complex

To differentiate between these two types of annotation, the contributes_to qualifier is added when the protein does not possess the activity itself.

Cao et al. Mol Cell. 2005 Dec 22;20(6):845-54.

Bmi-1 ubiquitin-protein ligase activity IDA contributes_to

Ring1A ubiquitin-protein ligase activity IDA contributes_to

Pc3 ubiquitin-protein ligase activity IDA contributes_to

Ring1B ubiquitin-protein ligase activity IDA

Annotations using the ‘colocalizes_with’ qualifier

• Used with cellular component terms

• To describe proteins that are transiently or peripherally associated with an organelle or complex

Meyer et al. J Cell Biol. 1997 Feb 24;136(4):775-88.

CENP-E condensed chromosome kinetochore IDA colocalizes_with
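
The three qualifiers change how an annotation should be read, so code that consumes annotations usually has to look at them. The sketch below reuses the example rows from the slides above; the `Annotation` tuple and the function name are hypothetical, and treating only unqualified rows as "direct" is one possible reading, not a GO Consortium rule.

```python
from typing import NamedTuple, Optional


class Annotation(NamedTuple):
    protein: str
    go_term: str
    evidence: str
    qualifier: Optional[str] = None  # NOT, contributes_to, colocalizes_with


# Rows taken from the qualifier examples on the slides.
ANNOTATIONS = [
    Annotation("hSNF2H", "ATPase activity", "IDA"),
    Annotation("Rsf-1", "ATPase activity", "IDA", "NOT"),
    Annotation("Bmi-1", "ubiquitin-protein ligase activity", "IDA", "contributes_to"),
    Annotation("Ring1A", "ubiquitin-protein ligase activity", "IDA", "contributes_to"),
    Annotation("Pc3", "ubiquitin-protein ligase activity", "IDA", "contributes_to"),
    Annotation("Ring1B", "ubiquitin-protein ligase activity", "IDA"),
    Annotation("CENP-E", "condensed chromosome kinetochore", "IDA", "colocalizes_with"),
]


def proteins_with_unqualified_annotation(annotations, go_term):
    """Proteins annotated to go_term with no qualifier attached."""
    return [
        a.protein
        for a in annotations
        if a.go_term == go_term and a.qualifier is None
    ]


print(proteins_with_unqualified_annotation(ANNOTATIONS, "ATPase activity"))
print(proteins_with_unqualified_annotation(ANNOTATIONS, "ubiquitin-protein ligase activity"))
```

With these rows, only hSNF2H is returned for ATPase activity and only Ring1B for ubiquitin-protein ligase activity, which matches the distinction the slides draw.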

Annotations using additional identifiers in the ‘with’ column

• Provides further information to support the evidence code used in an annotation

For protein binding annotations…

Protein GO term Evidence Reference With

When transferring annotations based on sequence similarity…

Protein GO term Evidence Reference With
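
A simple consistency check can flag annotations that would be expected to carry a ‘with’ identifier but have none. This is a sketch under a stated assumption: that physical-interaction (IPI) and sequence-similarity (ISS) annotations are the ones expected to fill the ‘with’ column. The row layout, the function name, and all accessions and references in the sample rows are placeholders, not real annotations.

```python
# Assumption: these evidence codes are expected to come with a 'with' value.
CODES_EXPECTING_WITH = {"IPI", "ISS"}


def missing_with(rows):
    """Return rows whose evidence code expects a 'with' value but lack one."""
    return [
        row
        for row in rows
        if row["evidence"] in CODES_EXPECTING_WITH and not row.get("with")
    ]


# Placeholder rows for illustration only (not real annotations).
rows = [
    {"protein": "PROTEIN_A", "go_id": "GO:0005515", "evidence": "IPI",
     "reference": "REF:1", "with": "PROTEIN_B"},
    {"protein": "PROTEIN_C", "go_id": "GO:0005515", "evidence": "IPI",
     "reference": "REF:2", "with": ""},
    {"protein": "PROTEIN_D", "go_id": "GO:0004674", "evidence": "TAS",
     "reference": "REF:3", "with": ""},
]

for row in missing_with(rows):
    print("missing 'with' value:", row["protein"], row["evidence"])
```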

There are two main types of GO annotation:

Electronic Annotation

Manual Annotation

Both methods have their advantages.

They can be easily distinguished by the ‘evidence code’ used.
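
Because electronic annotations carry the IEA code, a set of annotations can be split by method with a single test on the evidence code. The sketch below assumes annotations are plain `(protein, go_id, evidence_code)` tuples; the function name is hypothetical and the second sample row is illustrative rather than a real annotation.

```python
def split_by_method(annotations):
    """Split (protein, go_id, evidence_code) tuples into electronic and manual."""
    electronic, manual = [], []
    for annotation in annotations:
        evidence_code = annotation[2]
        if evidence_code == "IEA":
            electronic.append(annotation)
        else:
            manual.append(annotation)
    return electronic, manual


annotations = [
    ("Q9ARH1", "GO:0004674", "TAS"),
    ("Q9ARH1", "GO:0004674", "IEA"),  # illustrative row, not a real annotation
]

electronic, manual = split_by_method(annotations)
print(len(electronic), "electronic,", len(manual), "manual")
```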

Electronic Annotation

Fatty acid biosynthesis (Swiss-Prot keyword) → GO: fatty acid biosynthesis (GO:0006633)

EC:6.4.1.2 (EC number) → GO: acetyl-CoA carboxylase activity (GO:0003989)

IPR000438: Acetyl-CoA carboxylase carboxyl transferase beta subunit (InterPro entry) → GO: acetyl-CoA carboxylase activity (GO:0003989)

MF_00527: Putative 3-methyladenine DNA glycosylase (HAMAP) → GO: DNA repair (GO:0006281)

• Very high quality

• However, these annotations often use high-level GO terms and provide little detail.
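
The keyword, EC, InterPro and HAMAP examples above suggest how electronic annotation works in principle: each external identifier attached to a protein is looked up in a mapping table and, if a GO term is mapped, an IEA annotation is produced. The following is a minimal sketch of that idea, not the GOA pipeline; the key prefixes (`SP_KW:`, `HAMAP:` and so on), the function name, and the protein identifier in the usage line are assumptions.

```python
# Mapping table built from the four examples on the slide.
# Key prefixes are assumed for illustration.
EXTERNAL_TO_GO = {
    "SP_KW:Fatty acid biosynthesis": "GO:0006633",
    "EC:6.4.1.2": "GO:0003989",
    "InterPro:IPR000438": "GO:0003989",
    "HAMAP:MF_00527": "GO:0006281",
}


def electronic_annotations(protein_id, cross_references):
    """Return (protein, go_id, 'IEA', source) tuples for every mapped reference."""
    annotations = []
    for source in cross_references:
        go_id = EXTERNAL_TO_GO.get(source)
        if go_id is not None:  # unmapped identifiers are skipped
            annotations.append((protein_id, go_id, "IEA", source))
    return annotations


# "PROTEIN_X" is a placeholder identifier.
result = electronic_annotations(
    "PROTEIN_X", ["EC:6.4.1.2", "InterPro:IPR000438", "UNMAPPED:1"]
)
for row in result:
    print(row)
```

Here two sources lead to the same GO term; the sketch keeps both rows so the source of each annotation stays visible.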

Camon et al. BMC Bioinformatics. 2005; 6 Suppl 1:S17

Mappings of external concepts to GO

http://www.geneontology.org/GO.indices.shtml
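
Mapping files of this kind are plain text and easy to read programmatically. The sketch below assumes one mapping per line in the form `external_id optional name > GO:term name ; GO:id`, with comment lines starting with `!`; check the actual files for their exact layout before relying on it. The function name is hypothetical.

```python
def parse_mapping_line(line):
    """Parse one mapping line into (external_id, go_name, go_id), or None.

    Assumed layout: 'EXT:id optional name > GO:term name ; GO:0000000'.
    """
    line = line.strip()
    if not line or line.startswith("!"):
        return None  # blank line or comment
    external, separator, go_part = line.partition(" > ")
    if not separator:
        return None  # no mapping on this line
    go_name, _, go_id = go_part.rpartition(" ; ")
    if go_name.startswith("GO:"):
        go_name = go_name[3:]
    external_id = external.split(" ", 1)[0]
    return external_id, go_name, go_id


sample = "EC:6.4.1.2 > GO:acetyl-CoA carboxylase activity ; GO:0003989"
print(parse_mapping_line(sample))
```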

InterProScan

http://www.ebi.ac.uk/InterProScan

Output from InterProScan…

Manual Annotation

• High-quality, specific annotations made using:

• peer-reviewed papers

• a range of evidence codes to categorize the types of evidence found in a paper

• Very time-consuming and requires trained biologists

Finding GO terms for chicken TaxREB107 protein (Q8UWG7)

“cytoplasmic” → Component: cytoplasm GO:0005737

“nucleoli” → Component: nucleolus GO:0005730

“increased troponin I reporter gene activity” → Process: positive regulation of transcription GO:0045941

“positive modulator of skeletal muscle gene expression” → Process: positive regulation of skeletal muscle development GO:0048643

http://www.geneontology.org/GO.annotation.shtml

Aids for GO manual annotation

Many are on the GO Consortium tools page:

http://www.geneontology.org/GO.tools.shtml

GoPubMed

http://gopubmed.org

GoPubMed gives an overview of literature abstracts taken from PubMed and categorizes them with Gene Ontology terms.

Whatizit

http://www.ebi.ac.uk/Rebholz-srv/whatizit

GO terms and UniProt accessions (ACs)

http://www.ebi.ac.uk/ego

http://www.godatabase.org

…and more varieties of browsers available on the GO Tools page:

http://www.geneontology.org/GO.tools.html

Searching for GO terms

http://www.ebi.ac.uk/ego

Exact match

GO annotation editors

• enhanced spreadsheets (e.g. Excel)

• Protein2GO (GOA)

• The GO Consortium is aware there is a need for a light-weight, generic GO annotation tool.

Enhanced Spreadsheets

• quick and cheap to start with

• However, a reasonably sized set of annotations is difficult to maintain and update

Protein2GO

How users can view GO annotations

Download and parse an entire gene association file…

http://www.geneontology.org/GO.current.annotations.shtml

http://www.ebi.ac.uk/goa

…or look at annotations for a protein using one of the GO browsers or a database that integrates GO annotations.

QuickGO: http://www.ebi.ac.uk/ego
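
Parsing a downloaded gene association file, as mentioned above, can be sketched in a few lines. The example assumes the tab-delimited layout of the GO annotation file format, with comment lines starting with `!` and the first fifteen columns in the order given in `GAF_COLUMNS`; the column names are informal labels chosen here. In the sample line only the accession, GO ID, reference and evidence code come from this presentation; every other value is a placeholder.

```python
import io

# Informal labels for the first fifteen tab-separated columns (assumed order).
GAF_COLUMNS = [
    "db", "db_object_id", "db_object_symbol", "qualifier", "go_id",
    "reference", "evidence_code", "with_from", "aspect", "db_object_name",
    "synonym", "db_object_type", "taxon", "date", "assigned_by",
]


def parse_gaf(handle):
    """Yield one dict per annotation line, skipping comments and blank lines."""
    for line in handle:
        if line.startswith("!") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        yield dict(zip(GAF_COLUMNS, fields))


# Sample content: placeholder values except for the four core fields.
SAMPLE = "! comment line\n" + "\t".join([
    "UniProt", "Q9ARH1", "SYMBOL", "", "GO:0004674",
    "PMID:12374299", "TAS", "", "F", "placeholder name",
    "", "protein", "taxon:0", "20000101", "UniProt",
]) + "\n"

records = list(parse_gaf(io.StringIO(SAMPLE)))
for record in records:
    print(record["db_object_id"], record["go_id"], record["evidence_code"])
```

For a real file, pass an open file handle instead of the `io.StringIO` wrapper.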

Acknowledgements

Nicky Mulder, Head of InterPro
Evelyn Camon, GOA Coordinator
Daniel Barrell, GOA Programmer
Rachael Huntley, GOA Curator
David Binns & John Maslen, QuickGO and Protein2GO tools
Achuthanunni C. Balakrishnan, Text-2-GO
Jorge Duarte, IPI sets
Midori Harris, GO Editor
Jane Lomax, GO Curator
Amelia Ireland, GO Curator
Jennifer Clarke, GO Curator
Rolf Apweiler, Head of Sequence Database Group

The Gene Ontology Consortium and 1.5 members of GOA are currently supported by a P41 grant from the National Human Genome Research Institute (NHGRI) [grant HG002273]. GOA is also supported by core EMBL funding.