APOLLO + i5K: Collaborative Curation and Interactive Analysis of Genomes
Monica Munoz-Torres, PhD | @monimunozto
Nathan Dunn, Monica Poelchau, Ian Holmes, Colin Diesh, Deepak Unni, Christine Elsik, and Suzanna Lewis.
Berkeley Bioinformatics Open-Source Projects (BBOP), Genomics Division, Lawrence Berkeley National Laboratory
XXIII Plant and Animal Genome Conference. San Diego, CA. January 14, 2015
OUTLINE
• CURATING GENOMES: steps involved
• MANUAL ANNOTATION: is necessary, but does not always scale
• WEB APOLLO: empowering curators
• i5K: pursuing common goals
CURATING GENOMES: steps involved
1 Creation of gene models: calling ORFs, one or more rounds of gene prediction, etc.
2 Annotation of gene models: describing function, expression patterns, and metabolic network memberships.
3 Manual annotation
AUTOMATED ANNOTATION: remains an imperfect art
Unlike the more highly polished genomes of earlier projects, today:
a. lower coverage.
b. more frequent assembly errors and annotation of genes across multiple scaffolds.
c. automated genome annotations must be curated to resolve discrepancies, providing clarity and validation.
Image: www.BroadInstitute.org
ACCURACY OF ANNOTATION … it depends
EXAMPLE
• Eight methods for differential alternative splicing detection in plants, using RNAseq.
• Conclusion: NO single method performs the best in all situations.
“The accuracy of annotation has a major impact on which method should be chosen for analysis.”
Liu et al. BMC Bioinformatics 2014, 15:364
MANUAL ANNOTATION: objectives
1 Identifies elements that best represent the underlying biology (including missing genes) and eliminates elements that reflect systemic errors of automated analyses.
2 Assigns function through comparative analysis of similar genome elements from closely related species using literature, databases, and researchers’ lab data.
http://GeneOntology.org
BUT, MANUAL CURATION: does not always scale
1 Museum: A small group of highly trained experts; e.g. GO.
2 Jamboree: A few very good biologists and a few very good bioinformaticians camp together, during intense but short periods of time.
3 Cottage: Researchers work by themselves, then may or may not publicize results; may be a dead end with very few people ever aware of these results.
Elsik et al. 2006. Genome Res. 16(11):1329-33.
Too many sequences and not enough hands to approach curation.
POWER TO THE CURATORS: augment existing tools
Fill in the gap for all the things that won’t be easy to cover with these approaches and allow researchers to better contribute their efforts.
Give more people the power to curate!
Big data are not a substitute for, but a supplement to traditional data collection and analysis.
The Parable of Google Flu. Lazer et al. 2014. Science 343 (6176): 1203-1205.
• Enable more curators to work
• Enable better scientific publishing
• Credit curators for their work
GENOME ANNOTATION: an inherently collaborative task
Researchers often turn to colleagues for second opinions and insight from those with expertise in particular areas (e.g., domains, families). To facilitate and encourage this, we continue to improve Apollo.
The new JavaScript-based Apollo:
• Web based for easy access.
• Concurrent access supports real time collaboration.
• Built-in support for standards (transparently compliant).
• Automatic generation of ready-made computable data.
• Client-side application relieves server bottleneck and supports privacy.
• Supports annotation of genes, pseudogenes, tRNAs, snRNAs, snoRNAs, ncRNAs, miRNAs, TEs, and repeats.
COLLABORATIONS: also crowdsourcing development
• New avenues for landing on Apollo and customization of additional applications.
• Web services for alignment and functional annotation tools.
• RNAseq datasets being used to re-annotate the bovine genome, finding genes that neither RefSeq nor Ensembl predicted. Also creating a track of disagreement between sets.
• Bovine genome consortium making previous iterations of manual annotation efforts (from 3 assemblies ago) available for integration of curated models.
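One way to picture a track of disagreement between two gene sets is sketched below in Python. It is an illustration only, not the method used for the bovine re-annotation: the `Interval` tuple, the `overlaps` and `disagreement` functions, and the toy coordinates are all assumptions, and a gene is flagged simply when no gene in the other set overlaps it on the same scaffold.

```python
from typing import List, Tuple

# Hypothetical representation: (scaffold, start, end), half-open coordinates.
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True when both intervals share a scaffold and at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def disagreement(set_a: List[Interval], set_b: List[Interval]) -> List[Interval]:
    """Return genes from either set that have no overlapping gene in the other."""
    only_a = [g for g in set_a if not any(overlaps(g, h) for h in set_b)]
    only_b = [g for g in set_b if not any(overlaps(g, h) for h in set_a)]
    return sorted(only_a + only_b)


# Toy data standing in for two automated gene sets (assumed values).
set_one = [("chr1", 100, 200), ("chr1", 500, 600)]
set_two = [("chr1", 150, 250), ("chr2", 500, 600)]

print(disagreement(set_one, set_two))
# [('chr1', 500, 600), ('chr2', 500, 600)]
```

A real comparison would likely work at the exon or transcript level and read models from GFF3 files; the all-against-all overlap test here is quadratic and only suitable for small examples.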
UNIVERSITY of MISSOURI
National Agricultural Library
i5K: 5,000 insects and related Arthropod species
• Species are selected in an effort to better understand arthropod evolution and phylogeny through:
  • worldwide agriculture
  • food safety
  • medicine
  • energy production
  • models in biology
  • those species most abundant in world ecosystems
  • every branch of the insect phylogeny
• Each new genome requires visualization and curation!
http://arthropodgenomes.org/wiki/i5K
i5K: who can join?
• All Arthropods are welcome!
• Pilot project: 39 species
  • 3 with completed manual annotation
  • 25 undergoing manual annotation
• We offer a platform for collaborative genome analysis.
• We do not offer funding for sequencing projects.
Wasmannia auropunctata, Phlebotomus papatasi
i5K: current workflow, pilot project
Sequencing, assembly, & annotation
Research Plan
Select genes of interest
Calling all collaborators
Manual Annotation
Merge automated & manual annotations
• Set time frame • Training • Q&A
Update gene set for computational analysis
• Gatekeeping • More curation
Collaborative
Computational
Publication
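The "merge automated & manual annotations" step can be pictured with a short Python sketch. Everything in it is an assumption made for illustration: the `merge_gene_sets` function, the dictionary layout keyed by gene ID, and the rule that a manually curated model replaces the automated model with the same ID. The slides do not describe how the i5K pilot actually performs the merge.

```python
from typing import Dict

# Hypothetical gene model: a small dict of attributes keyed by gene ID.
GeneModel = Dict[str, object]


def merge_gene_sets(
    automated: Dict[str, GeneModel], manual: Dict[str, GeneModel]
) -> Dict[str, GeneModel]:
    """Combine two gene sets, letting manual models override automated ones.

    Each merged model gains a "source" field recording where it came from.
    """
    merged = {
        gene_id: {**model, "source": "automated"}
        for gene_id, model in automated.items()
    }
    for gene_id, model in manual.items():
        # Assumed rule: a curated model wins; curator-only genes are added.
        merged[gene_id] = {**model, "source": "manual"}
    return merged


# Toy data (assumed values, not from any i5K genome).
automated_set = {
    "g1": {"start": 100, "end": 900},
    "g2": {"start": 1500, "end": 2100},
}
manual_set = {
    "g2": {"start": 1450, "end": 2100},  # curator corrected the start
    "g3": {"start": 3000, "end": 3600},  # gene missed by the predictor
}

official = merge_gene_sets(automated_set, manual_set)
for gene_id in sorted(official):
    print(gene_id, official[gene_id])
```

Keeping the source of each model in the merged set makes the later gatekeeping step easier, since reviewers can see which models still rest on automated prediction alone.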
i5K: tools at Workspace@NAL
• Web Apollo
  • Registration module
  • Differential user permissions
• Django BLAST
  • Queries multiple species at once
  • Links directly to Apollo
• Species pages & Gene pages
  • project details, metrics, statistics
• Widget to track all WA annotations
Tripal, Chado, JBrowse, Apollo
i5K: what we have learned
• Enabling collaboration has been very useful to communities
• Data hosting and administration at NAL facilitates process for many groups
• You must enforce strict rules and formats
• Metadata capture is a must; standards must be generated and enforced
• Users prefer small bits of help info at a time, instead of lengthy manuals
• The ideal assembly is of high quality and remains stable
• Investing time and effort on a high quality set of automated gene predictions will pay off
• Quality of manually annotated set will depend on the coordinator’s “whip”
i5K: how to join
• Visit http://arthropodgenomes.org/wiki/i5K to sign up
• Contact us! Please tell us about your research interests and comment on the status and quality of sequencing / assembly / automated annotation for your genome of interest. @monimunozto | mcmunozt @ lbl.gov
• Check out the i5K Workspace@NAL at https://i5k.nal.usda.gov/
FUTURE PLANS: educational tools
We are working with educators to make Web Apollo part of their curricula.
Lecture Series.
In the classroom. At the lab.
Classroom exercises: from genome sequence to hypothesis.
Curation group dedicated to producing education materials for non-model organism communities.
Our team provides online documentation, hands-on training, and rapid response to users.
ALL ARE WELCOME: call or email to join the Apollo community
Open Call for Developers on the First Thursday of each month at 9:00AM (Pacific Time).
Message @monimunozto for details.
Join the conversation by submitting your email at https://lists.lbl.gov/sympa/subscribe/apollo
http://GenomeArchitect.org
http://ArthropodGenomes.org/wiki/i5K
• Berkeley Bioinformatics Open-source Projects (BBOP), Berkeley Lab: Web Apollo and Gene Ontology teams. Suzanna E. Lewis (PI).
• § Christine G. Elsik (PI). University of Missouri.
• * Ian Holmes (PI). University of California Berkeley.
• Arthropod genomics community: i5K Steering Committee (esp. Sue Brown (Kansas State)), Alexie Papanicolaou (CSIRO), Monica Poelchau, Christopher Childers (USDA/NAL), fringy Richards, Dan Hughes, Kim Worley (HGSC-BCM), BGI, Oliver Niehuis (1KITE http://www.1kite.org/), and the Honey Bee Genome Sequencing Consortium.
• Web Apollo is supported by NIH grants 5R01GM080203 from NIGMS, and 5R01HG004483 from NHGRI, and by the Director, Office of Science, Office of Basic Energy Sciences, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.
• Insect images used with permission: http://AlexanderWild.com
• For your attention, thank you!
Web Apollo
Nathan Dunn
Colin Diesh §
Deepak Unni §
Gene Ontology
Chris Mungall
Seth Carbon
Heiko Dietze
BBOP
Web Apollo: http://GenomeArchitect.org
i5K: http://arthropodgenomes.org/wiki/i5K
GO: http://GeneOntology.org
Thanks!
NAL at USDA
Monica Poelchau
Christopher Childers
NAL team
HGSC at BCM
fringy Richards
Dan Hughes
Kim Worley
Web Apollo
Q-‐ratore