The Aequatus Browser is a web-based tool with novel rendering approaches for visualising homologous, orthologous and paralogous gene structures among different species or subtypes of a common species. P.S. The tool was previously named Synteny Browser, so these slides refer to it by that name.
The Genome Analysis Centre
Building Excellence in Genomics and Computational Bioscience
Anil Thanki, Scientific Programmer
Core Bioinformatics Team
Synteny Browser: Visualising relationships among species
14-May-2014
Synteny Browser Concept
• Evolution
• Synteny
• Homology
• Paralogous
• Orthologous
• Multiple sequence alignment
• CIGAR line
• Ensembl Compara
• Database
• Synteny Browser Pipeline
• Examples in Browser
• Future work
Evolution
Synteny
- Synteny: the occurrence of two or more genes on the same chromosome within one species.
- Conserved synteny: the occurrence of synteny of orthologous genes in two different organisms.
[Figure: conserved synteny between human chr7 and mouse chr5]
Source: mblab.wustl.edu/svn/mblab/trunk/web_internal/presentations/MJB/Synteny.ppt
Homology
… existence of shared ancestry between a pair of structures, or genes, in different species.
- Wikipedia
[Figure: the principle of homology]
Orthologs
… if homologous sequences are inferred to be descended from the same ancestral sequence, separated by a speciation event.
• e.g. Phenylalanine Hydroxylase (PheOH)
• an enzyme that catalyzes the hydroxylation of the aromatic side-chain of phenylalanine to generate tyrosine
http://www.bio.davidson.edu/courses/molbio/molstudents/spring2010/piper/orthologs.html
• Orthologs often, but not always, have the same function.
Paralogs
… if homologous sequences were created by a duplication event within the genome.
• often belong to the same species, but this is not necessary
• e.g. the haemoglobin gene of humans and the myoglobin gene of chimpanzees are paralogs
[Figure credit: Matthew2262's blog]
*.ology
• Ohnology
  • … paralogous genes that have originated by a process of whole-genome duplication (WGD).
• Xenology
  • Homologs resulting from horizontal gene transfer between two organisms are termed xenologs.
  • Typically have similar function in both organisms.
  • Can have different functions, if the new environment is vastly different for the horizontally moving gene.
• Gametology
  • Gametology denotes the relationship between homologous genes on nonrecombining, opposite sex chromosomes.
- Wikipedia
Sequence Alignment
Multiple Sequence Alignment
• Sequence alignment of three or more sequences.
• Used to infer homology
CIGAR Line
• Compact Idiosyncratic Gapped Alignment Report
• The CIGAR string is a mix of letters and numbers that represents an alignment in a shorter, more efficient way.
• It indicates which bases align with the reference (either a match or a mismatch), which are deleted from the reference, and which are insertions that are not in the reference.
• E.g.
  • Ref: CCATACTGAACTGACTAAC
  • Read: ACTAGAATGGCT
• Aligned (gaps shown as -):
  • Ref:  C C A T A C T - G A A C T G A C T A A C
  • Read:         A C T A G A A - T G G C T
• CIGAR: 3M1I3M1D5M
http://genome.sph.umich.edu/wiki/SAM
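The run-length idea behind this CIGAR can be sketched in a few lines of Python. This is only an illustration, not the browser's code: the function name `pairwise_cigar` and the convention of marking gaps with `-` in both rows are assumptions made for the example.

```python
from itertools import groupby


def pairwise_cigar(ref_row: str, read_row: str) -> str:
    """Build a SAM-style CIGAR from two equal-length gapped alignment rows.

    '-' in ref_row marks an insertion (I), '-' in read_row marks a deletion (D),
    and any other column is an alignment match or mismatch (M).
    """
    if len(ref_row) != len(read_row):
        raise ValueError("alignment rows must have the same length")
    ops = []
    for ref_base, read_base in zip(ref_row, read_row):
        if ref_base == "-" and read_base == "-":
            continue  # a column that is empty in both rows carries no information
        ops.append("I" if ref_base == "-" else "D" if read_base == "-" else "M")
    # Run-length encode the operations; SAM writes the count even when it is 1.
    return "".join(f"{len(list(run))}{op}" for op, run in groupby(ops))


# Only the part of the reference covered by the read (reference bases 5 to 16).
print(pairwise_cigar("ACT-GAACTGACT", "ACTAGAA-TGGCT"))  # 3M1I3M1D5M
```

The mismatch in the last block (reference A against read G) still counts as M, which is why the final run is 5M.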
CIGAR Line in Compara
• CIGAR Line in Ensembl Compara
  • Read1: CCATACTGAACTGACTAAC
  • Read2: ACTAGAATGGCT
  • Read3: CCAACTAGAACTGACTAAC
• Aligned:
  • Read1: C C A T A C T - G A A C T G A C T A A C
  • Read2: - - - - A C T A G A A - T G G C T - - -
  • Read3: C C A - A C T A G A A C T G A C T A A C
• CIGAR1: 7MD12M
• CIGAR2: 4D7MD5M3D
• CIGAR3: 3MD16M
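In this Compara-style encoding each row of the multiple alignment gets its own CIGAR: M for a residue, D for a gap, and a count of 1 is left out. A minimal Python sketch of that encoding, assuming gaps are written as `-` (the function name `compara_cigar` is made up for the example):

```python
from itertools import groupby


def compara_cigar(aligned_row: str) -> str:
    """Encode one gapped alignment row: M for a residue, D for a gap ('-').

    A run of length 1 is written without its count, as in the slide's 7MD12M.
    """
    parts = []
    for is_gap, run in groupby(aligned_row, key=lambda ch: ch == "-"):
        length = len(list(run))
        count = "" if length == 1 else str(length)
        parts.append(count + ("D" if is_gap else "M"))
    return "".join(parts)


rows = {
    "Read1": "CCATACT-GAACTGACTAAC",
    "Read2": "----ACTAGAA-TGGCT---",
    "Read3": "CCA-ACTAGAACTGACTAAC",
}
for name, row in rows.items():
    print(name, compara_cigar(row))
```

Because every row is encoded against the same alignment columns, all three CIGARs expand to the same length (20 columns here).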
Compara
• A single database which contains precalculated comparative genomics data
  • Raw genomic sequence
  • Whole genome alignments (tBLAT, BlastZ-net, PECAN)
  • Syntenic regions (based on BlastZ-net)
  • Protein sequences
  • Raw protein alignments
  • Protein family clusters
  • Protein trees
  • Gene orthology / paralogy predictions
- Stephen Fitzgerald
Compara
• Compara is divided into three main parts:
  1. General tables (12)
  2. Genomic alignment tables (8)
  3. Gene trees and homologies tables (24)
• Links to Ensembl Core tables for core genomic information
  • e.g. gene structure, chromosome, etc.
Pipeline
1. Get Species
2. Get Chromosomes
3. Get Member Genes
4. Get Gene Tree with Alignment
5. Get Gene Structure
6. Draw Genes
7. Draw CIGARs
-- Ensembl Compara: gene tree members and their CIGAR lines for one gene
select m1.stable_id, m2.stable_id, m3.*, gam.CIGAR_line
from member m1
join member m2 on (m1.canonical_member_id = m2.member_id)
join gene_tree_node gtn1 on (m2.member_id = gtn1.member_id)
join gene_tree_root gtr on (gtr.root_id = gtn1.root_id)
join gene_align_member gam using (gene_align_id)
join member m3 on (gam.member_id = m3.member_id)
where gtr.clusterset_id = "default"
  and m1.source_name = "ENSEMBLGENE"
  and m1.stable_id = "LOC_Os01g01010";
Pipeline
Ensembl databases: O. sativa, Brachy, A. tauschii, COMPARA
-- Gene structure (transcripts, exons, translation) for one gene; ? is the gene stable ID
select g.gene_id,
       g.seq_region_start as gene_start,
       g.seq_region_end as gene_end,
       g.seq_region_strand as gene_strand,
       g.description as gene_name,
       t.transcript_id,
       t.seq_region_start as transcript_start,
       t.seq_region_end as transcript_end,
       t.description as transcript_name,
       e.exon_id,
       e.seq_region_start as exon_start,
       e.seq_region_end as exon_end,
       tl.translation_id,
       tl.seq_start as translation_start,
       tl.seq_end as translation_end,
       tl.start_exon_id,
       tl.end_exon_id
from gene g
left join transcript t on t.gene_id = g.gene_id
left join exon_transcript et on t.transcript_id = et.transcript_id
left join exon e on et.exon_id = e.exon_id
left join translation tl on tl.transcript_id = t.transcript_id
-- The first three conditions repeat the join conditions, so only the translation join acts as an outer join.
where t.gene_id = g.gene_id
  and t.transcript_id = et.transcript_id
  and et.exon_id = e.exon_id
  and g.stable_id = ?;
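The query returns one flat row per exon, so the rows have to be regrouped before a gene can be drawn. A possible way to nest them, sketched in Python: the function `group_gene_structure` and the sample rows are hypothetical and invented for illustration, and only the dictionary keys follow the column aliases used in the query.

```python
def group_gene_structure(rows):
    """Nest flat gene/transcript/exon rows into gene -> transcripts -> exons."""
    gene = None
    transcripts = {}  # transcript_id -> transcript dict, in first-seen order
    for row in rows:
        if gene is None:
            gene = {
                "gene_id": row["gene_id"],
                "start": row["gene_start"],
                "end": row["gene_end"],
                "strand": row["gene_strand"],
                "transcripts": [],
            }
        transcript = transcripts.get(row["transcript_id"])
        if transcript is None:
            transcript = {
                "transcript_id": row["transcript_id"],
                "start": row["transcript_start"],
                "end": row["transcript_end"],
                "exons": [],
            }
            transcripts[row["transcript_id"]] = transcript
            gene["transcripts"].append(transcript)
        transcript["exons"].append(
            {"exon_id": row["exon_id"], "start": row["exon_start"], "end": row["exon_end"]}
        )
    if gene is not None:
        for transcript in gene["transcripts"]:
            transcript["exons"].sort(key=lambda exon: exon["start"])
    return gene


# Hypothetical rows with made-up IDs and coordinates, for illustration only.
rows = [
    {"gene_id": 1, "gene_start": 100, "gene_end": 900, "gene_strand": 1,
     "transcript_id": 10, "transcript_start": 100, "transcript_end": 900,
     "exon_id": 102, "exon_start": 500, "exon_end": 900},
    {"gene_id": 1, "gene_start": 100, "gene_end": 900, "gene_strand": 1,
     "transcript_id": 10, "transcript_start": 100, "transcript_end": 900,
     "exon_id": 101, "exon_start": 100, "exon_end": 300},
]
gene = group_gene_structure(rows)
print([exon["exon_id"] for exon in gene["transcripts"][0]["exons"]])  # [101, 102]
```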
For Reference:
1. Expand CIGAR
   • 7MD12M
   • MMMMMMMDMMMMMMMMMMMM
2. Multiply 3 times
   • MMMMMMMMMMMMMMMMMMMMMDDDMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
3. Split on exons
   • MMMMMMMMMMMM MMMMMMMM MDDDMMMMMM MMMMMMMMMMMMMMMMM MMMMMM MMMMMMM
4. Draw
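Steps 1 to 3 for the reference gene can be sketched as follows. This is an illustrative sketch under stated assumptions, not the browser's implementation: the function names are invented, the "multiply 3 times" step is read as one alignment column per residue becoming three nucleotide columns, and the exon sizes are hypothetical values chosen so that the output matches the split shown on the slide. Only M columns are counted against an exon's length, since D columns are gaps in this gene.

```python
import re


def expand_cigar(cigar: str) -> str:
    """Expand a Compara-style CIGAR, e.g. '7MD12M' -> 'MMMMMMMDMMMMMMMMMMMM'."""
    return "".join(op * int(count or 1) for count, op in re.findall(r"(\d*)([MD])", cigar))


def to_nucleotide_columns(expanded: str) -> str:
    """Repeat every column three times (three bases per aligned residue)."""
    return "".join(ch * 3 for ch in expanded)


def split_on_exons(columns: str, exon_lengths: list[int]) -> list[str]:
    """Cut the column string so that each piece holds one exon's bases.

    Only 'M' columns consume bases of the gene; 'D' columns are gaps and stay
    inside whichever piece they fall in.
    """
    pieces, pos = [], 0
    for length in exon_lengths:
        start, used = pos, 0
        while pos < len(columns) and used < length:
            used += columns[pos] == "M"
            pos += 1
        pieces.append(columns[start:pos])
    if pos < len(columns) and pieces:
        pieces[-1] += columns[pos:]  # trailing gap columns go to the last piece
    return pieces


expanded = expand_cigar("7MD12M")
columns = to_nucleotide_columns(expanded)
exon_lengths = [12, 8, 7, 17, 6, 7]  # hypothetical exon sizes in bases
print(" ".join(split_on_exons(columns, exon_lengths)))
```

With these exon sizes the third piece comes out as MDDDMMMMMM: seven bases of the exon plus the three gap columns that fall inside it.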
For Homologous gene:
1. Expand CIGAR
   • MD9MD5M3D
   • MDMMMMMMMMMDMMMMMDDD
2. Multiply 3 times
   • MMMDDDMMMMMMMMMMMMMMMMMMMMMMMMMMMDDDMMMMMMMMMMMMMMMDDDDDDDDD
3. Map to Reference CIGAR
   • MMMMMMMMMMMMMMMMMMMMMDDDMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
   • MMMDDDMMMMMMMMMMMMMMMMMMMMMMMMMMMDDDMMMMMMMMMMMMMMMDDDDDDDDD
4. Split on exons matching to Reference
   • MMMMMMMMMM MMMMMMMMMM MDDDMMMMMM MMMMMMMMMMMMMMMMM MMMMMM MMMMMMM
   • MMMDDDMMMM MMMMMMMMMM MMMMMMMMMM MMMDDDMMMMMMMMMMM MMMMDD DDDDDDD
5. Split on exons of homologous gene
   • MMMDDDMMMM MMMMMMM MMMMMMMMM MMMMMMMDDDM MMMMMMMMMM MMMMDD DDDDDDD
6. Draw
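Steps 1 to 4 for the homologous gene can be sketched in Python as below. The helper names are invented for this example, and the column widths are simply read off the reference pieces shown in step 4; the sketch does not cover step 5, and it is not meant as the browser's actual code.

```python
import re


def expand_to_columns(cigar: str, repeat: int = 3) -> str:
    """Expand a Compara-style CIGAR and repeat each column `repeat` times."""
    return "".join(
        op * (int(count or 1) * repeat)
        for count, op in re.findall(r"(\d*)([MD])", cigar)
    )


def split_at_widths(columns: str, widths: list[int]) -> list[str]:
    """Cut a column string into consecutive pieces of the given widths."""
    pieces, pos = [], 0
    for width in widths:
        pieces.append(columns[pos:pos + width])
        pos += width
    return pieces


reference = expand_to_columns("7MD12M")
homolog = expand_to_columns("MD9MD5M3D")
# Both rows come from the same alignment, so they cover the same columns.
assert len(reference) == len(homolog) == 60

# Column widths of the reference pieces shown in step 4 of the slide.
reference_widths = [10, 10, 10, 17, 6, 7]
print(" ".join(split_at_widths(reference, reference_widths)))
print(" ".join(split_at_widths(homolog, reference_widths)))
```

Cutting both rows at the same column offsets keeps each homolog piece directly under the reference exon it aligns to, which is what makes the matches and gaps drawable exon by exon.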
Examples in Browser
• TGAC’s data
• Ensembl core 73
Existing Tools - Genomicus
Existing Tools - SynView
Future Work
• Make a Synteny Browser prototype available for testing
  – http://tgac-browser.tgac.ac.uk/homology_73_4way/
• Search for a particular gene
• Set a homologous gene as a reference and look into gene tree nodes related to it
• Brassica, Wheat
  – For different genomes A, B and D
• Release
• BioJS
  – Data format
  – CIGAR string to represent relationships
• Create MySQL views for easy querying of Ensembl Compara tables
Acknowledgements
• Robert Davey, Group Leader, Sequencing Informatics, TGAC
• Sarah Ayling, Group Leader, Computational Biology, TGAC
• Mario Caccamo, Director, TGAC
• Javier Herrero, Head of the Bill Lyons Informatics Centre, UCL Cancer Institute (past: TGAC)
• Gemy G. Kaithakottil, Scientific Programmer, Genome Analysis, TGAC
• Xingdong Bian, Scientific Programmer, Sequencing Informatics, TGAC
THANKS!