APOLLO + i5K Collaborative Curation and Interactive Analysis of Genomes Monica Munoz-Torres, PhD | @monimunozto Nathan Dunn, Monica Poelchau, Ian Holmes, Colin Diesh, Deepak Unni, Christine Elsik, and Suzanna Lewis. Berkeley Bioinformatics Open-Source Projects (BBOP) Genomics Division, Lawrence Berkeley National Laboratory XXIII Plant and Animal Genome Conference. San Diego, CA. January 14, 2015

Apollo and i5K: Collaborative Curation and Interactive Analysis of Genomes


Page 1: Apollo and i5K: Collaborative Curation and Interactive Analysis of Genomes


Page 2: Apollo and i5K: Collaborative Curation and Interactive Analysis of Genomes

OUTLINE

• CURATING GENOMES: steps involved
• MANUAL ANNOTATION: is necessary, but does not always scale
• WEB APOLLO: empowering curators
• i5K: pursuing common goals

Page 3: Apollo and i5K: Collaborative Curation and Interactive Analysis of Genomes

CURATING GENOMES: steps involved

1 Creation of gene models: calling ORFs, one or more rounds of gene prediction, etc.

2 Annotation of gene models: describing function, expression patterns, and metabolic network memberships.

3 Manual annotation

Page 4: Apollo and i5K: Collaborative Curation and Interactive Analysis of Genomes

AUTOMATED ANNOTATION: remains an imperfect art

Unlike the more highly polished genomes of earlier projects, today's genomes have:
a. lower coverage.
b. more frequent assembly errors and annotation of genes across multiple scaffolds.
c. automated genome annotations that must be curated to resolve discrepancies, providing clarity and validation.

Image: www.BroadInstitute.org

Page 5: Apollo and i5K: Collaborative Curation and Interactive Analysis of Genomes

ACCURACY OF ANNOTATION … it depends

EXAMPLE
• Eight methods for differential alternative splicing detection in plants, using RNAseq.
• Conclusion: NO single method performs the best in all situations.

“The accuracy of annotation has a major impact on which method should be chosen for analysis.”

Liu et al. BMC Bioinformatics 2014, 15:364

Page 6: Apollo and i5K: Collaborative Curation and Interactive Analysis of Genomes

MANUAL ANNOTATION: objectives

1 Identifies elements that best represent the underlying biology (including missing genes) and eliminates elements that reflect systemic errors of automated analyses.

2 Assigns function through comparative analysis of similar genome elements from closely related species using literature, databases, and researchers’ lab data.

http://GeneOntology.org

Page 7: Apollo and i5K: Collaborative Curation and Interactive Analysis of Genomes

BUT, MANUAL CURATION: does not always scale

1 Museum: a small group of highly trained experts; e.g. GO.

2 Jamboree: a few very good biologists and a few very good bioinformaticians camp together during intense but short periods of time.

3 Cottage: researchers work by themselves, then may or may not publicize results; may be a dead end with very few people ever aware of these results.

Elsik et al. 2006. Genome Res. 16(11):1329-33.

Too many sequences and not enough hands to approach curation.

Page 8: Apollo and i5K: Collaborative Curation and Interactive Analysis of Genomes

POWER TO THE CURATORS: augment existing tools

Fill in the gap for all the things that won’t be easy to cover with these approaches and allow researchers to better contribute their efforts.

Give more people the power to curate!

Big data are not a substitute for, but a supplement to traditional data collection and analysis.
The Parable of Google Flu. Lazer et al. 2014. Science 343 (6176): 1203-1205.

• Enable more curators to work
• Enable better scientific publishing
• Credit curators for their work

Page 9: Apollo and i5K: Collaborative Curation and Interactive Analysis of Genomes

GENOME ANNOTATION: an inherently collaborative task

Researchers often turn to colleagues for second opinions and insight from those with expertise in particular areas (e.g., domains, families). To facilitate and encourage this, we continue to improve Apollo.

The new JavaScript-based Apollo:

• Web based for easy access.
• Concurrent access supports real-time collaboration.
• Built-in support for standards (transparently compliant).
• Automatic generation of ready-made computable data.
• Client-side application relieves server bottleneck and supports privacy.
• Supports annotation of genes, pseudogenes, tRNAs, snRNAs, snoRNAs, ncRNAs, miRNAs, TEs, and repeats.

Page 10: Apollo and i5K: Collaborative Curation and Interactive Analysis of Genomes

COLLABORATIONS: also crowdsourcing development

• New avenues for landing on Apollo and customization of additional applications.
• Web services for alignment and functional annotation tools.
• RNAseq datasets being used to re-annotate the bovine genome, finding genes that neither RefSeq nor Ensembl predicted. Also creating a track of disagreement between sets.
• Bovine genome consortium making previous iterations of manual annotation efforts (from 3 assemblies ago) available for integration of curated models.

UNIVERSITY of MISSOURI

National Agricultural Library

Page 11: Apollo and i5K: Collaborative Curation and Interactive Analysis of Genomes

i5K: 5,000 insects and related Arthropod species

• Species are selected in an effort to better understand arthropod evolution and phylogeny through:
  • worldwide agriculture
  • food safety
  • medicine
  • energy production
  • models in biology
  • those species most abundant in world ecosystems
  • every branch of the insect phylogeny
• Each new genome requires visualization and curation!

http://arthropodgenomes.org/wiki/i5K

Page 12: Apollo and i5K: Collaborative Curation and Interactive Analysis of Genomes

i5K: who can join?

• All Arthropods are welcome!
• Pilot project: 39 species
  • 3 with completed manual annotation
  • 25 undergoing manual annotation
• We offer a platform for collaborative genome analysis.
• We do not offer funding for sequencing projects.

Wasmannia auropunctata, Phlebotomus papatasi

Page 13: Apollo and i5K: Collaborative Curation and Interactive Analysis of Genomes

i5K: current workflow (pilot project)

[Workflow figure. Steps: Sequencing, assembly, & annotation; Research plan; Select genes of interest; Calling all collaborators; Manual annotation; Merge automated & manual annotations; Update gene set for computational analysis; Publication. Side notes: Set time frame, Training, Q&A; Gatekeeping, More curation. Legend: Collaborative, Computational.]
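The "Merge automated & manual annotations" step in the workflow above could be sketched as follows. This is only an illustration under stated assumptions, not the i5K pipeline: the `GeneModel` record, the `overlaps` and `merge_gene_sets` helpers, and the rule that a manual model replaces any automated model it overlaps on the same scaffold and strand are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneModel:
    """Hypothetical minimal gene model record (not an Apollo data type)."""
    gene_id: str
    scaffold: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"
    source: str  # "automated" or "manual"


def overlaps(a: GeneModel, b: GeneModel) -> bool:
    """True if both models share scaffold and strand and their spans intersect."""
    return (
        a.scaffold == b.scaffold
        and a.strand == b.strand
        and a.start <= b.end
        and b.start <= a.end
    )


def merge_gene_sets(automated, manual):
    """Assumed rule: keep every manual model, and keep an automated model
    only if no manual model overlaps it."""
    kept = [g for g in automated if not any(overlaps(g, m) for m in manual)]
    return sorted(kept + list(manual), key=lambda g: (g.scaffold, g.start, g.end))


automated = [
    GeneModel("auto1", "scaffold1", 100, 500, "+", "automated"),
    GeneModel("auto2", "scaffold1", 900, 1500, "+", "automated"),
    GeneModel("auto3", "scaffold2", 50, 400, "-", "automated"),
]
manual = [GeneModel("man1", "scaffold1", 850, 1600, "+", "manual")]

merged = merge_gene_sets(automated, manual)
merged_ids = [g.gene_id for g in merged]
print(merged_ids)
```

Here the manually curated `man1` replaces the overlapping `auto2`, while automated models in uncurated regions are retained. A real merge would also need the gatekeeping and extra curation noted in the figure.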

Page 14: Apollo and i5K: Collaborative Curation and Interactive Analysis of Genomes

i5K: tools at workspace@NAL

• Web Apollo
  • Registration module
  • Differential user permissions
• Django BLAST
  • Queries multiple species at once
  • Links directly to Apollo
• Species pages & Gene pages
  • project details, metrics, statistics
• Widget to track all WA annotations

Tripal, Chado, JBrowse, Apollo

Page 15: Apollo and i5K: Collaborative Curation and Interactive Analysis of Genomes

i5K: what we have learned

• Enabling collaboration has been very useful to communities
• Data hosting and administration at NAL facilitates the process for many groups
• You must enforce strict rules and formats
• Metadata capture is a must; standards must be generated and enforced
• Users prefer small bits of help info at a time, instead of lengthy manuals
• The ideal assembly is of high quality and remains stable
• Investing time and effort in a high-quality set of automated gene predictions will pay off
• Quality of the manually annotated set will depend on the coordinator’s “whip”

Page 16: Apollo and i5K: Collaborative Curation and Interactive Analysis of Genomes

i5K: how to join

• Visit http://arthropodgenomes.org/wiki/i5K to sign up
• Contact us! Please tell us about your research interests and comment on the status and quality of sequencing / assembly / automated annotation for your genome of interest. @monimunozto | mcmunozt @ lbl.gov
• Check out the i5K Workspace@NAL at https://i5k.nal.usda.gov/

Page 17: Apollo and i5K: Collaborative Curation and Interactive Analysis of Genomes

FUTURE PLANS: educational tools

We are working with educators to make Web Apollo part of their curricula.

Lecture Series.

In the classroom. At the lab.

Classroom exercises: from genome sequence to hypothesis.

Curation group dedicated to producing education materials for non-model organism communities.

Our team provides online documentation, hands-on training, and rapid response to users.

Page 18: Apollo and i5K: Collaborative Curation and Interactive Analysis of Genomes

ALL ARE WELCOME: call or email to join the Apollo community

Open Call for Developers on the First Thursday of each month at 9:00AM (Pacific Time).

Message @monimunozto for details.

Join the conversation by submitting your email at https://lists.lbl.gov/sympa/subscribe/apollo

http://GenomeArchitect.org
http://ArthropodGenomes.org/wiki/i5K

Page 19: Apollo and i5K: Collaborative Curation and Interactive Analysis of Genomes

• Berkeley Bioinformatics Open-source Projects (BBOP), Berkeley Lab: Web Apollo and Gene Ontology teams. Suzanna E. Lewis (PI).

• § Christine G. Elsik (PI). University of Missouri.

• * Ian Holmes (PI). University of California Berkeley.

• Arthropod genomics community: i5K Steering Committee (esp. Sue Brown (Kansas State)), Alexie Papanicolaou (CSIRO), Monica Poelchau, Christopher Childers (USDA/NAL), fringy Richards, Dan Hughes, Kim Worley (HGSC-BCM), BGI, Oliver Niehuis (1KITE http://www.1kite.org/), and the Honey Bee Genome Sequencing Consortium.

• Web Apollo is supported by NIH grants 5R01GM080203 from NIGMS, and 5R01HG004483 from NHGRI, and by the Director, Office of Science, Office of Basic Energy Sciences, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.

• Insect images used with permission: http://AlexanderWild.com

• For your attention, thank you!

BBOP
Web Apollo: Nathan Dunn, Colin Diesh §, Deepak Unni §
Gene Ontology: Chris Mungall, Seth Carbon, Heiko Dietze

NAL at USDA: Monica Poelchau, Christopher Childers, NAL team

HGSC at BCM: fringy Richards, Dan Hughes, Kim Worley

Web Apollo: http://GenomeArchitect.org
i5K: http://arthropodgenomes.org/wiki/i5K
GO: http://GeneOntology.org

Thanks!

Page 20: Apollo and i5K: Collaborative Curation and Interactive Analysis of Genomes

Web  Apollo  

Q-ratore