Lecture 3
Functional Annotation Strategies
Rationale
Blast2GO is a flexible framework for functional annotation.
Many parameters are involved in the B2G annotation rule:
Blast vs. InterPro approach
Annotation Score threshold
Abstraction
Evidence Code Weights
...
Different annotation strategies can be envisaged. How do they behave? Which is the “best”?
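How these parameters interact can be sketched in a few lines. This is a minimal illustration, assuming the annotation rule as described in the Blast2GO publications: a direct term (the best hit similarity weighted by the evidence code weight, ECw) plus an abstraction term ((#GO - 1) x GOw), compared against the annotation score threshold. The function names, the data layout and any example values are hypothetical and are not taken from these slides.

```python
def annotation_score(hits, n_unified_go, go_weight):
    """Score one candidate GO term for one query sequence.

    hits: list of (similarity_percent, ec_weight) pairs, one per Blast hit
          carrying this GO term (hypothetical data layout).
    n_unified_go: number of GO terms unified at this node (#GO).
    go_weight: abstraction weight (GOw).
    """
    # Direct term: best hit similarity weighted by its evidence code weight.
    direct = max(sim * ecw for sim, ecw in hits)
    # Abstraction term: rewards nodes that unify several more specific terms.
    abstraction = (n_unified_go - 1) * go_weight
    return direct + abstraction


def is_annotated(hits, n_unified_go, go_weight, threshold):
    """A GO term is assigned only if its score reaches the threshold."""
    return annotation_score(hits, n_unified_go, go_weight) >= threshold
```

Under this sketch, a stricter strategy corresponds to a higher threshold or lower evidence code weights, and a more permissive one to the opposite.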
Evaluation Strategy
32 annotation strategies
8 datasets
EVALUATE
Annotation Intensity
Annotation Accuracy
cis-annotation
Impact on Functional Genomics
Annotation Guidelines
Different Annotation Strategies
strict
permissive
DataSets evaluated
Different EST and Protein DataSets
Results I: Blast result
The key factor for annotation success is a successful Blast result
Results II: Length effect
Annotation success depends on sequence length: best results above 400 nt (this is correlated with the chance of obtaining a positive Blast result)
Result III: Number of annotations
The number of annotated sequences increases with more permissive annotation styles
But when adding InterPro, the differences decrease
Annex does not increase the number of annotated sequences
NO INTERPRO
Result III: Number of annotations
The # GO/sequence also increases from strict to permissive annotations. Annex increases #GO/seqs
Result III: Number of annotations
Once electronic annotations are enabled (ECw for IEA >0.7), annotation styles stabilize
ECw(IEA) = 0 ECw(IEA) > 0.7
Result III: Number of annotations
Mean GO level stays practically the same throughout all styles
The abstraction term GOw has a small but significant effect on GO level.
Results IV: InterPro & Annex
The more restrictive the Blast strategy, the stronger the augmentation by InterPro
Augmentation by Annex is practically constant
Result V: Manual Curation
- Default parameters give the best accuracy
- Strict annotation is less informative (fewer GO terms)
- Generous and all-mapping annotation is more informative but also more error-prone
- InterPro alone annotates fewer sequences and with fewer GO terms
Results VI: Functional Genomics
Enrichment analysis for 3 datasets
The more GO terms are annotated, the more enriched GO terms are found. Semantically, however, there were few differences, and the number of branches within each dataset was similar across annotation styles
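The slides do not say which statistical test was used for the enrichment analysis. As an illustration only, a common choice for GO term enrichment is a one-sided Fisher's exact test, which is the upper tail of a hypergeometric distribution. The sketch below assumes the test set is a subset of the reference set; the function and parameter names are hypothetical.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """Probability of seeing at least k annotated sequences by chance.

    N: sequences in the reference set
    K: reference sequences annotated with the GO term
    n: sequences in the test set (a subset of the reference set)
    k: test-set sequences annotated with the GO term
    """
    upper = min(n, K)
    total = comb(N, n)
    # Sum the hypergeometric probabilities for k, k+1, ..., upper.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total
```

With more GO terms per sequence there are more terms to test, which is consistent with more terms passing a given significance cut-off.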
Results V: cis-annotation
By varying the %hit filter at the annotation step, one can control possible cis-annotation errors
The major effect of setting a high %hit filter when annotating is a dramatic reduction in the number of annotated sequences, while the annotations of the sequences that are still successfully annotated change little
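The filtering step can be sketched as follows, assuming that the %hit filter means the percentage of the hit sequence covered by the aligned region (HSP). That interpretation, the field names and the function names are assumptions made for illustration, not Blast2GO's own implementation.

```python
def passes_hit_filter(hsp_length, hit_length, min_percent):
    """True if the alignment covers at least min_percent of the hit."""
    coverage = 100.0 * hsp_length / hit_length
    return coverage >= min_percent


def filter_hits(hits, min_percent):
    """Keep only the hits whose coverage reaches the cut-off.

    hits: list of dicts with 'hsp_length' and 'hit_length' keys
          (hypothetical layout).
    """
    return [
        h for h in hits
        if passes_hit_filter(h["hsp_length"], h["hit_length"], min_percent)
    ]
```

A sequence whose hits are all removed by the filter cannot be annotated, which matches the reduction in annotated sequences described above.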
# seqs in dataset: 1556, 33951
Take-home messages
A positive Blast result and considering electronic evidence are the key factors for successful annotation.
Sequence length and quality are very important!
InterPro and Annex can increase annotation by 10-15%.
B2G default parameters are in general good and equivalent to what one would annotate by a reviewed computational annotation procedure.
The effect of annotation stringency on functional genomics tests is difficult to predict, but the more GO terms you have, the more enriched terms you can find. A core functional message was always found.
Do not worry too much about erroneous cis-annotation!
Our recommended annotation strategy
Use first B2G default settings
If many sequences remain green: annotate these with an annotation threshold of 45
Add InterPro and Annex (in this order)
Check some protein families you might be interested in by keyword searching.
Improve these sequences manually (you can use the merge .annot function for this)
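For readers who want to combine manual and automatic annotations outside the application, the idea can be sketched as below. It assumes the .annot format is tab-separated text with a sequence ID, a GO ID and an optional description per line. This is not Blast2GO's merge function; all names here are hypothetical.

```python
def parse_annot(lines):
    """Yield (sequence_id, term_id, description) from .annot-style lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # skip malformed lines
        description = fields[2] if len(fields) > 2 else ""
        yield fields[0], fields[1], description


def merge_annot(*sources):
    """Merge several annotation sources, dropping duplicate pairs."""
    merged = {}
    for lines in sources:
        for seq_id, term_id, description in parse_annot(lines):
            key = (seq_id, term_id)
            # Keep the first occurrence, but fill in a missing description.
            if key not in merged or (not merged[key] and description):
                merged[key] = description
    return ["\t".join(filter(None, (s, t, d))) for (s, t), d in merged.items()]
```

Each source can be any iterable of lines, for example an open file handle, so two .annot files can be merged by passing both handles to merge_annot.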