The Genome Analysis Centre: Building Excellence in Genomics and Computational Bioscience

Aequatus browser: Visualising complex similarity relationships among species


DESCRIPTION

The Aequatus Browser is a web-based tool with novel rendering approaches for visualising homologous, orthologous and paralogous gene structures among different species or subtypes of a common species. P.S. The tool was previously named Synteny Browser, so these slides refer to it as Synteny Browser.




Anil Thanki, Scientific Programmer

Core Bioinformatics Team

[email protected]

Synteny Browser: Visualising relationships among species

14-May-2014


Synteny Browser Concept

• Evolution
• Synteny
• Homology
• Paralogous
• Orthologous
• Multiple sequence alignment
• CIGAR line

• Ensembl Compara
• Database
• Synteny Browser Pipeline

• Examples in Browser
• Future work


Evolution


Synteny

- Synteny: the occurrence of two or more genes on the same chromosome within one species.

- Conserved synteny: the occurrence of synteny of orthologous genes in two different organisms.

[Figure: conserved synteny between human chr7 and mouse chr5]

Source: mblab.wustl.edu/svn/mblab/trunk/web_internal/presentations/MJB/Synteny.ppt


Homology

… existence of shared ancestry between a pair of structures, or genes, in different species.

- Wikipedia

[Figure: the principle of homology]


Orthologs

… if homologous sequences are inferred to be descended from the same ancestral sequence separated by a speciation event.

• e.g. Phenylalanine hydroxylase (PheOH), an enzyme that catalyzes the hydroxylation of the aromatic side chain of phenylalanine to generate tyrosine.

Source: http://www.bio.davidson.edu/courses/molbio/molstudents/spring2010/piper/orthologs.html

• Orthologs often, but not always, have the same function.


Paralogs

… if homologous sequences were created by a duplication event within the genome.

• Paralogs often belong to the same species, but this is not necessary: the haemoglobin gene of humans and the myoglobin gene of chimpanzees are paralogs.

Source: MATTHEW2262'S BLOG


*.ology

• Ohnology: paralogous genes that have originated by a process of whole-genome duplication (WGD).
• Xenology: homologs resulting from horizontal gene transfer between two organisms are termed xenologs.
  • They typically have similar functions in both organisms.
  • They can have different functions if the new environment is vastly different for the horizontally moving gene.
• Gametology: denotes the relationship between homologous genes on non-recombining, opposite sex chromosomes.

- Wikipedia


Sequence Alignment

Multiple Sequence Alignment
• Sequence alignment of three or more sequences.
• Used to infer homology.



CIGAR Line

• CIGAR: Compact Idiosyncratic Gapped Alignment Report.
• A CIGAR string mixes numbers and letters to represent an alignment in a shorter, more efficient way.
• The letters indicate which bases align with the reference (either a match or a mismatch), which are deleted from the reference, and which are insertions that are not in the reference.
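A minimal Python sketch of reading such a string, assuming the operation set of the SAM specification (M, I, D, N, S, H, P, =, X). It splits a CIGAR string such as 3M1I3M1D5M into (length, operation) pairs and totals how many read bases and reference bases the alignment consumes. The function names are illustrative, not taken from any library.

```python
import re

# Per the SAM specification: operations that consume read (query) bases and
# operations that consume reference bases. H and P consume neither.
CONSUMES_READ = frozenset("MIS=X")
CONSUMES_REF = frozenset("MDN=X")
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar: str) -> list[tuple[int, str]]:
    """Split a SAM CIGAR string such as '3M1I3M1D5M' into (length, op) pairs."""
    if not re.fullmatch(r"(?:\d+[MIDNSHP=X])+", cigar):
        raise ValueError(f"not a valid SAM CIGAR string: {cigar!r}")
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]


def aligned_lengths(cigar: str) -> tuple[int, int]:
    """Return (read bases consumed, reference bases spanned)."""
    ops = parse_cigar(cigar)
    read_len = sum(n for n, op in ops if op in CONSUMES_READ)
    ref_len = sum(n for n, op in ops if op in CONSUMES_REF)
    return read_len, ref_len


print(parse_cigar("3M1I3M1D5M"))      # [(3, 'M'), (1, 'I'), (3, 'M'), (1, 'D'), (5, 'M')]
print(aligned_lengths("3M1I3M1D5M"))  # (12, 12)
```

The insertion (I) counts towards the read but not the reference, and the deletion (D) the other way round, which is why both totals come out at 12 here.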



CIGAR Line

Example:
• Ref: CCATACTGAACTGACTAAC
• Read: ACTAGAATGGCT

Ref:  C C A T A C T - G A A C T G A C T A A C
Read:         A C T A G A A - T G G C T

• CIGAR: 3M1I3M1D5M, with the read starting at reference position 5 (3 aligned bases, 1 inserted base, 3 aligned bases, 1 deleted base, 5 aligned bases; "aligned" includes mismatches such as G against A)

Source: http://genome.sph.umich.edu/wiki/SAM
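The same CIGAR can be derived mechanically from the gapped pairwise alignment. In this sketch, an illustrative helper rather than any tool's API, '-' stands for the blank columns of the slide's alignment: a column where both rows hold a base is M, a base only in the read is I, a base only in the reference is D, and columns outside the read's first and last base are left out of the CIGAR.

```python
def cigar_from_pairwise(ref_row: str, read_row: str) -> tuple[int, str]:
    """Given two equal-length gapped rows ('-' = gap), return the 1-based
    reference position of the read's first aligned column and its SAM CIGAR.
    Columns outside the read's first..last base are not part of the CIGAR."""
    if len(ref_row) != len(read_row):
        raise ValueError("rows must have equal length")
    read_cols = [i for i, c in enumerate(read_row) if c != "-"]
    if not read_cols:
        raise ValueError("read row contains no bases")
    first, last = read_cols[0], read_cols[-1]
    # Reference bases to the left of the read give its 1-based start position.
    pos = sum(c != "-" for c in ref_row[:first]) + 1
    runs: list[list] = []  # [[length, op], ...]
    for r, q in zip(ref_row[first:last + 1], read_row[first:last + 1]):
        if r != "-" and q != "-":
            op = "M"  # aligned base: match or mismatch
        elif q != "-":
            op = "I"  # base present in the read only
        elif r != "-":
            op = "D"  # base present in the reference only
        else:
            continue  # gap in both rows: padding column, skipped here
        if runs and runs[-1][1] == op:
            runs[-1][0] += 1
        else:
            runs.append([1, op])
    return pos, "".join(f"{n}{op}" for n, op in runs)


print(cigar_from_pairwise("CCATACT-GAACTGACTAAC", "----ACTAGAA-TGGCT---"))
# (5, '3M1I3M1D5M')
```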



CIGAR Line in Ensembl Compara

• Read1: CCATACTGAACTGACTAAC
• Read2: ACTAGAATGGCT
• Read3: CCAACTAGAACTGACTAAC

Read1:  C C A T A C T - G A A C T G A C T A A C
Read2:  - - - - A C T A G A A - T G G C T - - -
Read3:  C C A - A C T A G A A C T G A C T A A C

• CIGAR1: 7MD12M
• CIGAR2: 4D7MD5M3D
• CIGAR3: 3MD16M

Unlike a SAM CIGAR, each Compara CIGAR line describes one row of the multiple alignment: M marks an aligned residue, D a gap in that row, and a count of 1 is omitted (7MD12M means 7M, 1D, 12M).
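The Compara convention can be reproduced from the gapped rows above. A minimal Python sketch, assuming '-' marks a gap and that the CIGAR lines contain only M and D as in these examples; the function names are illustrative. `compara_cigar` encodes a gapped row, and `apply_compara_cigar` re-inserts the gaps into the ungapped sequence.

```python
import re
from itertools import groupby


def compara_cigar(aligned_row: str) -> str:
    """Encode one gapped alignment row as a Compara-style CIGAR line:
    M for a residue, D for a gap, with the count omitted when it is 1."""
    parts = []
    for is_gap, run in groupby(aligned_row, key=lambda c: c == "-"):
        n = len(list(run))
        parts.append(("" if n == 1 else str(n)) + ("D" if is_gap else "M"))
    return "".join(parts)


def apply_compara_cigar(sequence: str, cigar: str) -> str:
    """Inverse operation: rebuild the gapped row from the ungapped sequence."""
    out, pos = [], 0
    for count, op in re.findall(r"(\d*)([MD])", cigar):
        n = int(count) if count else 1
        if op == "M":
            out.append(sequence[pos:pos + n])
            pos += n
        else:
            out.append("-" * n)
    if pos != len(sequence):
        raise ValueError("CIGAR does not account for the whole sequence")
    return "".join(out)


print(compara_cigar("----ACTAGAA-TGGCT---"))               # 4D7MD5M3D
print(apply_compara_cigar("CCAACTAGAACTGACTAAC", "3MD16M"))  # CCA-ACTAGAACTGACTAAC
```

Because every row is encoded against the same alignment columns, CIGAR lines from different members can be compared column by column without any further coordinate conversion.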


Ensembl Compara

• A single database that contains precalculated comparative genomics data:
  • Raw genomic sequence
  • Whole-genome alignments (tBLAT, BlastZ-net, PECAN)
  • Syntenic regions (based on BlastZ-net)
  • Protein sequences
  • Raw protein alignments
  • Protein family clusters
  • Protein trees
  • Gene orthology/paralogy predictions

Stephen Fitzgerald


Ensembl Compara

• Compara is divided into three main parts:
  1. General tables (12)
  2. Genomic alignment tables (8)
  3. Gene tree and homology tables (24)
• Links to Ensembl Core tables for core genomic information, e.g. gene structure and chromosomes.


Pipeline

1. Get Species
2. Get Chromosomes
3. Get Member Genes
4. Get Gene Tree with Alignment
5. Get Gene Structure
6. Draw Genes
7. Draw CIGARs


[Figure: the pipeline steps, with data drawn from Ensembl Compara (e! COMPARA)]

The following query, run against Ensembl Compara, fetches every member of the gene tree alignment that contains a given gene (here LOC_Os01g01010), together with each member's CIGAR line:

```sql
select m1.stable_id, m2.stable_id, m3.*, gam.cigar_line
from member m1
join member m2 on (m1.canonical_member_id = m2.member_id)
join gene_tree_node gtn1 on (m2.member_id = gtn1.member_id)
join gene_tree_root gtr on (gtr.root_id = gtn1.root_id)
join gene_align_member gam using (gene_align_id)
join member m3 on (gam.member_id = m3.member_id)
where gtr.clusterset_id = 'default'
  and m1.source_name = 'ENSEMBLGENE'
  and m1.stable_id = 'LOC_Os01g01010';
```
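To see how the joins chain together, here is a toy run of the same query against an in-memory SQLite database. It is only an illustration under stated assumptions: the four tables are cut down to the columns the query touches (loosely following the 2014-era Compara schema, where, as the join implies, a gene member's canonical_member_id points to the member stored in the gene tree), m3.* is narrowed to m3.stable_id, the stable ID is passed as a parameter, and every row except the gene ID LOC_Os01g01010 is invented (PEP_REF and PEP_HOMOLOG are hypothetical peptide IDs).

```python
import sqlite3

# Hypothetical, heavily reduced schema: only the columns the query uses.
SCHEMA = """
CREATE TABLE member (member_id INTEGER PRIMARY KEY, stable_id TEXT,
                     source_name TEXT, canonical_member_id INTEGER);
CREATE TABLE gene_tree_root (root_id INTEGER PRIMARY KEY, clusterset_id TEXT,
                             gene_align_id INTEGER);
CREATE TABLE gene_tree_node (node_id INTEGER PRIMARY KEY, root_id INTEGER,
                             member_id INTEGER);
CREATE TABLE gene_align_member (gene_align_id INTEGER, member_id INTEGER,
                                cigar_line TEXT);
"""

QUERY = """
SELECT m1.stable_id, m2.stable_id, m3.stable_id, gam.cigar_line
FROM member m1
JOIN member m2 ON m1.canonical_member_id = m2.member_id
JOIN gene_tree_node gtn1 ON m2.member_id = gtn1.member_id
JOIN gene_tree_root gtr ON gtr.root_id = gtn1.root_id
JOIN gene_align_member gam USING (gene_align_id)
JOIN member m3 ON gam.member_id = m3.member_id
WHERE gtr.clusterset_id = 'default'
  AND m1.source_name = 'ENSEMBLGENE'
  AND m1.stable_id = ?
ORDER BY m3.stable_id
"""


def homolog_cigars(conn: sqlite3.Connection, gene_stable_id: str) -> list[tuple]:
    """Return (gene, canonical member, tree member, CIGAR line) rows."""
    return conn.execute(QUERY, (gene_stable_id,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Invented rows: gene member 1, its canonical peptide member 2 and one
# homologous peptide member 3, both in gene tree 10 with alignment 100.
conn.executemany("INSERT INTO member VALUES (?, ?, ?, ?)", [
    (1, "LOC_Os01g01010", "ENSEMBLGENE", 2),
    (2, "PEP_REF", "ENSEMBLPEP", None),
    (3, "PEP_HOMOLOG", "ENSEMBLPEP", None),
])
conn.execute("INSERT INTO gene_tree_root VALUES (10, 'default', 100)")
conn.executemany("INSERT INTO gene_tree_node VALUES (?, ?, ?)",
                 [(1000, 10, 2), (1001, 10, 3)])
conn.executemany("INSERT INTO gene_align_member VALUES (?, ?, ?)",
                 [(100, 2, "7MD12M"), (100, 3, "MD9MD5M3D")])

rows = homolog_cigars(conn, "LOC_Os01g01010")
for row in rows:
    print(row)
```

The gene never appears in the tree itself; the canonical-member hop is what connects it to the tree, and the USING join then fans out to every member sharing that tree's alignment.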


Ensembl databases: O. sativa, Brachy and A. tauschii (core), plus Compara (e! COMPARA)

The gene structure (transcripts, exons and translation) is fetched from the species' Ensembl core database, given the gene's stable ID:

```sql
select g.gene_id,
       g.seq_region_start  as gene_start,
       g.seq_region_end    as gene_end,
       g.seq_region_strand as gene_strand,
       g.description       as gene_name,
       t.transcript_id,
       t.seq_region_start  as transcript_start,
       t.seq_region_end    as transcript_end,
       t.description       as transcript_name,
       e.exon_id,
       e.seq_region_start  as exon_start,
       e.seq_region_end    as exon_end,
       tl.translation_id,
       tl.seq_start        as translation_start,
       tl.seq_end          as translation_end,
       tl.start_exon_id,
       tl.end_exon_id
from gene g
join transcript t on t.gene_id = g.gene_id
join exon_transcript et on et.transcript_id = t.transcript_id
join exon e on e.exon_id = et.exon_id
-- translation stays a left join so non-coding transcripts are kept
left join translation tl on tl.transcript_id = t.transcript_id
where g.stable_id = ?;
```
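The query returns one flat row per transcript/exon pair. A minimal sketch of folding those rows into the nested gene → transcripts → exons structure that drawing needs, assuming each row arrives as a dict keyed by the query's column aliases (for example from a dictionary-style cursor); the function name and toy values are hypothetical. In the Ensembl core schema, translation seq_start and seq_end are offsets within start_exon_id and end_exon_id, which is why those exon IDs are kept with the translation.

```python
def nest_gene_structure(rows: list[dict]) -> dict:
    """Fold flat rows (one per transcript/exon pair, keyed by the column
    aliases of the gene-structure query) into gene -> transcripts -> exons."""
    if not rows:
        raise ValueError("no rows returned for this gene")
    first = rows[0]
    gene = {key: first[key] for key in
            ("gene_id", "gene_start", "gene_end", "gene_strand", "gene_name")}
    transcripts: dict = {}
    for row in rows:
        transcript = transcripts.setdefault(row["transcript_id"], {
            "start": row["transcript_start"],
            "end": row["transcript_end"],
            "name": row["transcript_name"],
            # All None for non-coding transcripts, because of the LEFT JOIN.
            "translation": {key: row[key] for key in (
                "translation_id", "translation_start", "translation_end",
                "start_exon_id", "end_exon_id")},
            "exons": [],
        })
        transcript["exons"].append(
            (row["exon_id"], row["exon_start"], row["exon_end"]))
    for transcript in transcripts.values():
        transcript["exons"].sort(key=lambda exon: exon[1])  # genomic order
    gene["transcripts"] = transcripts
    return gene


# Toy rows: one transcript with two exons, returned out of order.
base = {"gene_id": 1, "gene_start": 100, "gene_end": 900, "gene_strand": 1,
        "gene_name": None, "transcript_id": 7, "transcript_start": 100,
        "transcript_end": 900, "transcript_name": None, "translation_id": 3,
        "translation_start": 10, "translation_end": 50,
        "start_exon_id": 20, "end_exon_id": 21}
rows = [dict(base, exon_id=21, exon_start=500, exon_end=900),
        dict(base, exon_id=20, exon_start=100, exon_end=300)]
gene = nest_gene_structure(rows)
print(gene["transcripts"][7]["exons"])  # [(20, 100, 300), (21, 500, 900)]
```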



For the reference gene:
1. Expand the CIGAR line
   • 7MD12M → MMMMMMMDMMMMMMMMMMMM
2. Multiply by 3, since each aligned amino acid corresponds to a three-nucleotide codon
   • MMMMMMMMMMMMMMMMMMMMMDDDMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
3. Split on exons
   • MMMMMMMMMMMM MMMMMMMM MDDDMMMMMM MMMMMMMMMMMMMMMMM MMMMMM MMMMMMM
4. Draw
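The reference steps above can be sketched in Python as follows. This is a minimal illustration, not the browser's own code: it assumes Compara-style CIGAR lines containing only M and D, treats exon lengths as coding-sequence lengths in nucleotides (UTRs and codon phase are ignored), and uses the exon lengths 12, 8, 7, 17, 6 and 7 read off the example split on the slide. The function names are hypothetical.

```python
import re


def expand_compara_cigar(cigar: str) -> str:
    """Expand a Compara-style CIGAR line into one character per alignment
    column; a missing count means 1, so '7MD12M' -> 7 x M, 1 x D, 12 x M."""
    return "".join(op * (int(n) if n else 1)
                   for n, op in re.findall(r"(\d*)([MD])", cigar))


def to_codon_columns(expanded: str) -> str:
    """Protein-level columns to nucleotide-level columns: every residue or
    gap is repeated three times (one codon)."""
    return "".join(c * 3 for c in expanded)


def split_on_exons(columns: str, exon_lengths: list[int]) -> list[str]:
    """Cut nucleotide columns into one chunk per exon (lengths must be > 0).
    Each chunk holds exactly that exon's number of 'M' columns; a gap column
    goes to the exon being filled when it occurs, and gap columns after the
    last exon are appended to the final chunk."""
    chunks: list[str] = []
    current, filled = "", 0
    remaining = iter(exon_lengths)
    target = next(remaining, None)
    for c in columns:
        current += c
        if c == "M" and target is not None:
            filled += 1
            if filled == target:
                chunks.append(current)
                current, filled = "", 0
                target = next(remaining, None)
    if target is not None:
        raise ValueError("exon lengths exceed the aligned residues")
    if "M" in current:
        raise ValueError("aligned residues left over after the last exon")
    if current and chunks:
        chunks[-1] += current
    return chunks


expanded = expand_compara_cigar("7MD12M")               # step 1
columns = to_codon_columns(expanded)                    # step 2
chunks = split_on_exons(columns, [12, 8, 7, 17, 6, 7])  # step 3
print(" ".join(chunks))
# MMMMMMMMMMMM MMMMMMMM MDDDMMMMMM MMMMMMMMMMMMMMMMM MMMMMM MMMMMMM
```

Counting only M columns against each exon length is what lets the three gap columns fall inside the third exon, as in the slide's split.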

Page 62: Aequatous browser:  Visualising complex similarity relationships among species

Pipeline

Get Species → Get Chromosomes → Get Member Genes → Get Gene Tree with Alignment → Get Gene Structure → Draw Genes → Draw CIGARs

For a homologous gene:

1. Expand the CIGAR line
   • MD9MD5M3D
   • MDMMMMMMMMMDMMMMMDDD
2. Multiply by 3 (each aligned amino acid corresponds to a three-nucleotide codon)
   • MMMDDDMMMMMMMMMMMMMMMMMMMMMMMMMMMDDDMMMMMMMMMMMMMMMDDDDDDDDD
3. Map to the reference CIGAR
   • MMMMMMMMMMMMMMMMMMMMMDDDMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
   • MMMDDDMMMMMMMMMMMMMMMMMMMMMMMMMMMDDDMMMMMMMMMMMMMMMDDDDDDDDD
4. Split on the exons of the reference gene
   • MMMMMMMMMM MMMMMMMMMM MDDDMMMMMM MMMMMMMMMMMMMMMMM MMMMMM MMMMMMM
   • MMMDDDMMMM MMMMMMMMMM MMMMMMMMMM MMMDDDMMMMMMMMMMM MMMMDD DDDDDDD
5. Split on the exons of the homologous gene
   • MMMDDDMMMM MMMMMMM MMMMMMMMM MMMMMMMDDDM MMMMMMMMMM MMMMDD DDDDDDD
6. Draw
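The per-gene steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the browser's code: it assumes the CIGAR convention visible in step 1 (an optional count before each operation, `M` for an aligned column, `D` for a gap, a missing count meaning 1), and the helper names `expand_cigar`, `to_nucleotide_columns`, `exon_boundaries` and `split_at` are hypothetical. For the exon split it assumes exon lengths are counted in residues of the row's own sequence, so gap columns inside an exon stay with that exon. The reference exon lengths in the usage part are also hypothetical, picked so that the result reproduces the column split shown in step 4.

```python
import re
from itertools import accumulate

# Assumed CIGAR convention (as on the slide): an optional run length before
# each operation, M = aligned column, D = gap, a missing count means 1.
_TOKEN = re.compile(r"(\d*)([MD])")


def expand_cigar(cigar: str) -> str:
    """Step 1: expand a compact CIGAR line to one character per column."""
    tokens = _TOKEN.findall(cigar)
    # Reject input containing characters the token pattern did not consume.
    if "".join(count + op for count, op in tokens) != cigar:
        raise ValueError(f"unsupported CIGAR line: {cigar!r}")
    return "".join(op * int(count or 1) for count, op in tokens)


def to_nucleotide_columns(expanded: str, factor: int = 3) -> str:
    """Step 2: repeat every column `factor` times (one amino acid = one codon)."""
    return "".join(op * factor for op in expanded)


def exon_boundaries(aligned: str, exon_lengths: list[int]) -> list[int]:
    """Steps 4 and 5: end column (exclusive) of each exon in an aligned row.

    Exon lengths are counted in residues ('M' columns) of this row's own
    sequence, so gap columns inside an exon stay with that exon. Gap columns
    after the last residue are attached to the last exon.
    """
    if not exon_lengths or any(n <= 0 for n in exon_lengths):
        raise ValueError("exon lengths must be a non-empty list of positive ints")
    ends = list(accumulate(exon_lengths))
    if aligned.count("M") != ends[-1]:
        raise ValueError("exon lengths do not add up to the residue count")
    cuts: list[int] = []
    residues = 0
    for col, op in enumerate(aligned):
        if op == "M":
            residues += 1
            if residues == ends[len(cuts)]:
                cuts.append(col + 1)  # cut just after this exon's last residue
    cuts[-1] = len(aligned)  # trailing gap columns join the last exon
    return cuts


def split_at(aligned: str, cuts: list[int]) -> list[str]:
    """Cut an expanded row at the given column positions."""
    starts = [0] + cuts[:-1]
    return [aligned[start:end] for start, end in zip(starts, cuts)]


if __name__ == "__main__":
    # Steps 1 and 2 for the homologous gene's CIGAR line from the slide.
    homolog = to_nucleotide_columns(expand_cigar("MD9MD5M3D"))
    # Reference row from step 3: 21 aligned columns, a 3-column gap, 36 more.
    reference = "M" * 21 + "D" * 3 + "M" * 36
    # Hypothetical reference exon lengths in nucleotides (not from the source).
    cuts = exon_boundaries(reference, [10, 10, 7, 17, 6, 7])
    print(cuts)
    print(split_at(reference, cuts))
    print(split_at(homolog, cuts))
```

With those inputs both rows are cut at columns 10, 20, 30, 47, 53 and 60, which matches the pieces in step 4. The same helper can be applied to the homologous row with that gene's own exon lengths for step 5; the slide does not say how trailing gap columns are assigned there, and this sketch simply attaches them to the last exon.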

Examples in Browser

• TGAC’s data
• Ensembl core 73

Existing Tools - Genomicus

Existing Tools - SynView

Future Work

• Testing prototype of the Synteny Browser: http://tgac-browser.tgac.ac.uk/homology_73_4way/
• Search for a particular gene
• Set a homologous gene as the reference and explore the gene tree nodes related to it
• Brassica, Wheat
  – For the different genomes A, B and D
• Release
• BioJS
  – Data format: CIGAR string to represent relationships
• Create MySQL views for easier querying of Ensembl Compara tables
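One of the items above is a data format that uses CIGAR strings to represent relationships. Writing such a format means going from a per-column alignment row back to a compact CIGAR line, the inverse of step 1 of the pipeline. A minimal Python sketch of that step, assuming the convention visible on the pipeline slide (run length before the operation, omitted when it is 1); `compress_cigar` is a hypothetical name, not part of the browser or of BioJS:

```python
from itertools import groupby


def compress_cigar(expanded: str) -> str:
    """Run-length encode an expanded alignment row into a compact CIGAR line.

    Assumed convention: the count precedes the operation and is omitted
    when the run length is 1.
    """
    parts = []
    for op, run in groupby(expanded):
        n = sum(1 for _ in run)  # length of this run of identical operations
        parts.append(op if n == 1 else f"{n}{op}")
    return "".join(parts)


if __name__ == "__main__":
    # Protein-level row from step 1 of the pipeline slide.
    print(compress_cigar("MD" + "M" * 9 + "D" + "M" * 5 + "DDD"))  # prints MD9MD5M3D
    # Nucleotide-level row from step 2 (every column repeated three times).
    print(compress_cigar("MMMDDD" + "M" * 27 + "DDD" + "M" * 15 + "D" * 9))  # prints 3M3D27M3D15M9D
```

Because counts are written only for runs longer than one, the first call returns exactly the CIGAR line used on the pipeline slide.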

Acknowledgements

Robert Davey, Group Leader, Sequencing Informatics, TGAC
Sarah Ayling, Group Leader, Computational Biology, TGAC
Mario Caccamo, Director, TGAC
Javier Herrero, Head of the Bill Lyons Informatics Centre, UCL Cancer Institute

Past (TGAC):
Gemy G. Kaithakottil, Scientific Programmer, Genome Analysis, TGAC
Xingdong Bian, Scientific Programmer, Sequencing Informatics, TGAC

THANKS!