
I529: Lab5 02/20/2009

AI : Kwangmin Choi

Today’s topics

• Gene Ontology prediction/mapping
– AmiGO: http://amigo.geneontology.org/cgi-bin/amigo/go.cgi
– PFP: http://dragon.bio.purdue.edu/pfp/
– GOtcha: http://www.compbio.dundee.ac.uk/gotcha/

• Pathway prediction/mapping
– KAAS: http://www.genome.jp/kegg/kaas

Gene Ontology

• The GO project has developed three structured controlled vocabularies (ontologies) that describe gene products in terms of their associated biological processes, molecular functions, and cellular components, in a species-independent manner.

GO: biological process

• A biological process is a series of events accomplished by one or more ordered assemblies of molecular functions.
– E.g. cellular physiological process or signal transduction.
– E.g. pyrimidine metabolic process or alpha-glucoside transport.

• It can be difficult to distinguish between a biological process and a molecular function, but the general rule is that a process must have more than one distinct step.

• A biological process is not equivalent to a pathway; at present, GO does not try to represent the dynamics or dependencies that would be required to fully describe a pathway.

GO: molecular functions

• Molecular function describes activities, such as catalytic or binding activities, that occur at the molecular level.

• GO molecular function terms represent activities rather than the entities (molecules or complexes) that perform the actions.

• GO molecular function terms do not specify where or when, or in what context, the action takes place.
– E.g. (general) catalytic activity, transporter activity, binding, etc.
– E.g. (specific) adenylate cyclase activity, Toll receptor binding

GO: cellular components

• A cellular component is just that, a component of a cell, but with the proviso that it is part of some larger object.

• Less informative

• The larger object may be an anatomical structure
– e.g. rough endoplasmic reticulum or nucleus

• or a gene product group
– e.g. ribosome, proteasome or a protein dimer

AmiGO

• URL: http://amigo.geneontology.org/cgi-bin/amigo/go.cgi

• AmiGO is the official tool for searching and browsing the Gene Ontology database

• A simple BLAST search is provided (not useful)

• The Gene Ontology database behind AmiGO consists of a controlled vocabulary of terms covering biological concepts, and a large number of genes or gene products whose attributes have been annotated using GO terms.

PFP (Automated Protein Function Prediction Server)

• Hawkins, T., Luban, S. and Kihara, D. 2006. Enhanced Automated Function Prediction Using Distantly Related Sequences and Contextual Association by PFP. Protein Science 15: 1550-6.

• The PFP algorithm has been shown to increase coverage of sequence-based function annotation more than fivefold by extending a PSI-BLAST search to extract and score GO terms individually.

• It applies the Function Association Matrix (FAM) to score significantly associating pairs of annotations.

PFP method

• PFP uses a scoring scheme to rank GO annotations assigned to all of the most similar sequences according to
– (1) their frequency of occurrence in those sequences
– (2) the degree of similarity of the originating sequence to the query.

• This is similar to the scoring basis for the R-value used by the GOtcha method to score annotations from pairwise alignment matches (Martin et al. 2004).

PFP method

• Score of a GO term, fa
– s(fa) is the final score assigned to the GO term fa
– N is the number of similar sequences retrieved by PSI-BLAST
– E_value(i) is the E-value given to sequence i
– b = 2 (i.e. log10[100]), which allows the use of sequence matches up to an E-value of 100

• Function Association Matrix (FAM)
– fj is a GO term assigned to sequence i
– P(fa | fj) is the conditional probability that fa is associated with fj
– c(fa, fj) is the number of times fa and fj are assigned simultaneously to each sequence in UniProt
– c(fj) is the total number of times fj appeared in UniProt
– μ is the size of one dimension of the FAM (i.e., the total number of unique GO terms)
– ε is the pseudo-count
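The definitions above can be turned into a small worked sketch. It assumes the scoring form given in Hawkins et al. 2006, which the listed symbols appear to describe: s(fa) sums, over every retrieved sequence i and every GO term fj annotated to it, the weight (-log10(E_value(i)) + b) multiplied by P(fa | fj), with P(fa | fj) = (c(fa, fj) + ε) / (c(fj) + μ·ε). The function names, the toy counts, the default ε, and the E-value floor are all illustrative assumptions, not part of the PFP distribution.

```python
import math
from collections import defaultdict


def fam_probability(c_pair, c_fj, mu, eps):
    """P(fa | fj) with a pseudo-count: (c(fa, fj) + eps) / (c(fj) + mu * eps)."""
    return (c_pair + eps) / (c_fj + mu * eps)


def pfp_scores(hits, pair_counts, term_counts, eps=1.0, b=2.0):
    """Score every known GO term from a list of PSI-BLAST hits.

    hits        : list of (e_value, [GO terms annotated to that hit])
    pair_counts : {(fa, fj): number of sequences annotated with both terms}
    term_counts : {fj: number of sequences annotated with fj}
    """
    mu = len(term_counts)  # number of unique GO terms (one FAM dimension)
    scores = defaultdict(float)
    for e_value, terms in hits:
        # Assumption: floor the E-value so that a reported 0.0 does not break log10.
        weight = -math.log10(max(e_value, 1e-300)) + b
        for fj in terms:
            for fa in term_counts:
                # A term always co-occurs with itself, so c(fa, fa) = c(fa).
                c_pair = term_counts[fj] if fa == fj else pair_counts.get((fa, fj), 0)
                scores[fa] += weight * fam_probability(c_pair, term_counts[fj], mu, eps)
    return dict(scores)


# Toy data (made up for illustration only).
term_counts = {"GO:A": 10, "GO:B": 5}
pair_counts = {("GO:B", "GO:A"): 4}
hits = [(1e-3, ["GO:A"]), (100.0, ["GO:B"])]

toy_scores = pfp_scores(hits, pair_counts, term_counts)
for term, score in sorted(toy_scores.items(), key=lambda kv: -kv[1]):
    print(term, round(score, 3))
```

In the toy data the second hit has an E-value of 100, so its weight is -2 + 2 = 0 and it adds nothing, which is the role of b described above. GO:B still receives a score from the first hit through its association with GO:A.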

• Web server http://dragon.bio.purdue.edu/pfp/queue/1168_kw.f.result.html

• Local installation
– http://dragon.bio.purdue.edu/pfp/dist
– Installed in /home/kwchoi/public_html/PFP
– You need to specify the path to blastpgp
– You also need BLOSUM62

PFP (Automated Protein Function Prediction Server)

• PFP output
– /home/kwchoi/public_html/I529-09-lab/Lab5/Data/pfp_data

• Columns
– 1: predicted GO term
– 2: GO category (f/p/c)
– 3: raw term score
– 4: term p-value
– 5: rank (by p-value)
– 6: confidence to be exact match
– 7: rank (by column 6)
– 8: confidence within 2 edges on the GO DAG
– 9: rank (by column 8)
– 10: confidence within 4 edges on the GO DAG
– 11: rank (by column 10)
– 12: GO term short definition
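A minimal sketch for reading one line of PFP output into named fields, following the twelve columns listed above. It assumes the columns are separated by whitespace and that only the last column (the short definition) may contain spaces; check this against the actual files in pfp_data. The field names and the sample line are made up for illustration.

```python
from typing import NamedTuple


class PfpPrediction(NamedTuple):
    go_term: str
    category: str            # f, p or c
    raw_score: float
    p_value: float
    rank_by_p_value: int
    conf_exact: float
    rank_by_conf_exact: int
    conf_within_2: float
    rank_by_conf_within_2: int
    conf_within_4: float
    rank_by_conf_within_4: int
    definition: str


def parse_pfp_line(line):
    """Split one output line into the 12 documented columns."""
    # maxsplit=11 keeps spaces inside the final definition column.
    fields = line.strip().split(None, 11)
    if len(fields) != 12:
        raise ValueError("expected 12 columns, got %d" % len(fields))
    return PfpPrediction(
        fields[0], fields[1],
        float(fields[2]), float(fields[3]), int(fields[4]),
        float(fields[5]), int(fields[6]),
        float(fields[7]), int(fields[8]),
        float(fields[9]), int(fields[10]),
        fields[11],
    )


# Hypothetical line, not taken from a real PFP run.
sample = "GO:0005524 f 1234.5 0.001 1 0.85 1 0.90 1 0.95 1 ATP binding"
prediction = parse_pfp_line(sample)
print(prediction.go_term, prediction.category, prediction.definition)
```

With predictions parsed this way, filtering by category or by a confidence cutoff in column 6, 8 or 10 becomes a one-line list comprehension.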

GOtcha

• The GOtcha method – Martin et al. BMC Bioinformatics (2004) 5:178.

• GOtcha assigns functional terms transitively based upon sequence similarity.

• These terms are ranked by probability and displayed graphically on a subtree of Gene Ontology.

• GOtcha performs a BLAST search of the query sequence against individual well annotated genomes.

• Annotations are transitively assigned from all hits, with a score corresponding to the E-value, individual GO-terms receiving cumulative scores from multiple sequence similarity matches.

• Cumulative scores are normalized and, for each term, two scores are obtained
– the I-score, which is normalized to the root node,
– the C-score, which is the cumulative score at the root node.

• For each GO-term a precomputed scoring table is used to establish the assignment likelihood for that term given that I-score and that C-score. This is represented as a probability.
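The cumulative scoring described above can be sketched on a toy GO-like graph. This is an illustration of the idea only, under these assumptions: each hit contributes max(-log10(E-value), 0), the contribution is added once per hit to each annotated term and all of its ancestors, the I-score is the term score divided by the root score, and the C-score is the root score itself. The precomputed probability table is not modelled. The graph, term names, and function names are hypothetical.

```python
import math
from collections import defaultdict


def ancestors(term, parents):
    """Return the term together with all of its ancestors in the DAG."""
    seen = {term}
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def gotcha_like_scores(hits, parents, root):
    """hits: list of (e_value, [annotated terms]); returns (I-scores, C-score)."""
    cumulative = defaultdict(float)
    for e_value, terms in hits:
        contribution = max(-math.log10(e_value), 0.0)
        touched = set()
        for term in terms:
            touched |= ancestors(term, parents)
        # Each hit counts at most once per term, even if reached by several paths.
        for term in touched:
            cumulative[term] += contribution
    c_score = cumulative[root]
    if c_score <= 0:
        return {}, 0.0
    i_scores = {term: score / c_score for term, score in cumulative.items()}
    return i_scores, c_score


# Toy DAG: child -> list of parents.
parents = {
    "binding": ["root"],
    "ATP binding": ["binding"],
    "catalysis": ["root"],
}
hits = [(1e-10, ["ATP binding"]), (1e-5, ["catalysis"])]

i_scores, c_score = gotcha_like_scores(hits, parents, "root")
for term, value in sorted(i_scores.items(), key=lambda kv: -kv[1]):
    print(term, round(value, 3))
print("C-score:", round(c_score, 3))
```

Because scores accumulate upward, general terms near the root always score at least as high as their descendants, which is why the method ranks terms on a subtree of the ontology instead of as a flat list.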

GOtcha method

Pathway mapping

• E.g. E. coli K-12 pathway (00300)

KAAS

• KAAS (KEGG Automatic Annotation Server) provides functional annotation of genes in a genome by BLAST comparisons against a manually curated set of ortholog groups in KEGG GENES.

• The result contains KO (KEGG Orthology) assignments and automatically generated KEGG pathways.

• Moriya, Y., Itoh, M., Okuda, S., Yoshizawa, A. and Kanehisa, M. 2007. KAAS: an automatic genome annotation and pathway reconstruction server. Nucleic Acids Res. 35: W182-W185.

KAAS

• Web server: http://www.genome.jp/kegg/kaas/

• KAAS works best when a complete set of genes in a genome is known. Prepare query amino acid sequences and use the BBH (bi-directional best hit) method to assign orthologs.

• KAAS can also be used for a limited number of genes. Prepare query amino acid sequences and use the SBH (single-directional best hit) method to assign orthologs.

• When ESTs are comprehensive enough, a set of consensus contigs can be generated by the EGassembler server and used as a gene set for KAAS with the BBH method. Otherwise, use ESTs as they are with the SBH method.
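The difference between the SBH and BBH options can be shown with a small sketch. It assumes BLAST bit scores are already available in both directions (query against reference, and reference against query) as dictionaries. The real KAAS pipeline applies further score thresholds and KO assignment rules that are not modelled here, and all identifiers below are made up.

```python
def best_hits(scores):
    """scores: {(query, subject): bit_score}; returns {query: best-scoring subject}."""
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def sbh(forward):
    """Single-directional best hit: only the query -> reference search is used."""
    return best_hits(forward)


def bbh(forward, reverse):
    """Bi-directional best hit: keep a pair only if each is the other's best hit."""
    fwd = best_hits(forward)
    rev = best_hits(reverse)
    return {query: subject for query, subject in fwd.items() if rev.get(subject) == query}


# Toy bit scores (hypothetical gene names).
forward = {("q1", "refA"): 200.0, ("q1", "refX"): 50.0, ("q2", "refB"): 90.0}
reverse = {("refA", "q1"): 195.0, ("refB", "q3"): 120.0, ("refB", "q2"): 80.0}

print("SBH:", sbh(forward))
print("BBH:", bbh(forward, reverse))
```

Here q2 gets an SBH assignment but no BBH assignment, because refB's best hit in the query set is q3. This is why BBH needs the complete gene set: the reverse search is only meaningful when every candidate gene of the genome is present.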

KAAS workflow

Pathway mapping

• KAAS returns
– KO list
– KEGG Atlas Metabolism map [Create atlas]
– Pathway maps [Create all maps]
– Hierarchy files

• You can highlight KEGG maps using KEGG API
– http://www.genome.jp/kegg/soap/doc/keggapi_manual.html
– See: color_pathway_by_objects
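A small sketch of preparing the inputs for color_pathway_by_objects. According to the KEGG API manual, the method takes a pathway ID plus three parallel lists (object IDs, foreground colors, background colors) and returns the URL of the colored map. The helper below only builds those arguments and makes no network call; the SOAP service may no longer be reachable, so treat the final call as something to verify against the current KEGG documentation. The helper name, the default colors, and the example gene IDs are assumptions for illustration.

```python
def build_color_request(pathway_id, gene_ids, fg_color="black", bg_color="yellow"):
    """Build the four arguments expected by color_pathway_by_objects."""
    # Drop duplicate IDs while keeping their original order.
    unique_ids = list(dict.fromkeys(gene_ids))
    return {
        "pathway_id": pathway_id,
        "object_id_list": unique_ids,
        "fg_color_list": [fg_color] * len(unique_ids),
        "bg_color_list": [bg_color] * len(unique_ids),
    }


# Illustrative IDs in KEGG "organism:gene" form; not taken from a KAAS result.
request = build_color_request("path:eco00300", ["eco:b0002", "eco:b4024", "eco:b0002"])
print(request)
```

The three lists must have the same length, since each object ID is paired with the colors at the same position.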
