Transcript
Page 1: SyMAP Master's Thesis Presentation

SyMAP

Synteny Mapping and Analysis Program

Austin Shoemaker

Page 2: SyMAP Master's Thesis Presentation

SyMAP Team

• Dr. Cari Soderlund
• Dr. Will Nelson
• Austin Shoemaker
  – Interactive SyMAP views
  – Sytry
    • Testing environment for synteny finding algorithms
  – Worked with the team on:
    • The synteny finding algorithm
    • MySQL database schema

Page 3: SyMAP Master's Thesis Presentation

Background

• Comparative Genomics

• Physical Map

• Computing Synteny

• Properties of FPC to Genome Synteny

Page 4: SyMAP Master's Thesis Presentation

Comparative Genomics

• Compare genomes of different species
• Knowledge of one helps understand the other
  – Gene Function
    • Organism O1 has a gene G1
    • Organism O2 has a gene G2 with a sequence similar to G1
    • G1 and G2 may have similar functions
  – Evolutionary History
    • Genome rearrangements

Page 5: SyMAP Master's Thesis Presentation

Genome Rearrangements

Rearrangement   Scenario   Result
Inversion       ABCDE      ADCBE
Deletion        ABC        AC
Insertion       AB         ACB
Duplication     ABC        ABCB, ABBC
                AB CD      AB BCD
                AB         AB AB
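The rearrangement scenarios listed on this slide can be reproduced on strings of gene labels. The following is a minimal Python sketch; the function names and the index conventions are assumptions made for illustration and are not part of SyMAP.

```python
def inversion(genome: str, start: int, end: int) -> str:
    """Reverse the order of the segment genome[start:end]."""
    return genome[:start] + genome[start:end][::-1] + genome[end:]


def deletion(genome: str, start: int, end: int) -> str:
    """Remove the segment genome[start:end]."""
    return genome[:start] + genome[end:]


def insertion(genome: str, pos: int, segment: str) -> str:
    """Insert a new segment before index pos."""
    return genome[:pos] + segment + genome[pos:]


def duplication(genome: str, start: int, end: int, pos: int) -> str:
    """Copy genome[start:end] and insert the copy before index pos."""
    return insertion(genome, pos, genome[start:end])


print(inversion("ABCDE", 1, 4))      # ADCBE
print(deletion("ABC", 1, 2))         # AC
print(insertion("AB", 1, "C"))       # ACB
print(duplication("ABC", 1, 2, 2))   # ABBC
print(duplication("ABC", 1, 2, 3))   # ABCB
```

Each call mirrors one scenario and result pair from the slide; real rearrangements act on DNA segments rather than single letters.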

Page 6: SyMAP Master's Thesis Presentation

Whole-Genome Duplication

• mya (million years ago)

[Figure: tree from the last common ancestor to rice and maize; labels: diverged 50-70 mya, 70 mya duplication, 11 mya duplication]

Page 7: SyMAP Master's Thesis Presentation

Synteny

• At least two pairs of genes with similar structure and function on the same chromosome
  – Order does not need to be conserved

• Often found using sequenced genomes

• We use a physical map and a genomic sequence

[Figure: genes c, d, e, f, g shown on Genome A and on Genome B]
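The definition of synteny on this slide (at least two gene pairs on the same chromosome, order ignored) can be sketched as a grouping step. This is an illustrative Python example; the tuple layout, the chromosome names, and the function name are assumptions, not SyMAP's data model.

```python
from collections import defaultdict


def syntenic_chromosome_pairs(homologs, min_pairs=2):
    """Group similar gene pairs by the chromosomes they lie on.

    homologs: iterable of (gene_a, chr_a, gene_b, chr_b) tuples
              (hypothetical layout).
    Returns {(chr_a, chr_b): [(gene_a, gene_b), ...]} for every chromosome
    pair that shares at least min_pairs gene pairs. Gene order is ignored.
    """
    groups = defaultdict(list)
    for gene_a, chr_a, gene_b, chr_b in homologs:
        groups[(chr_a, chr_b)].append((gene_a, gene_b))
    return {pair: genes for pair, genes in groups.items() if len(genes) >= min_pairs}


homologs = [
    ("c", "A1", "c'", "B3"),
    ("d", "A1", "d'", "B3"),
    ("e", "A2", "e'", "B3"),
]
syntenic = syntenic_chromosome_pairs(homologs)
print(syntenic)
```

Only the pair (A1, B3) is reported here, because A2 and B3 share a single gene pair.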

Page 8: SyMAP Master's Thesis Presentation

Physical Map

• Expensive to sequence large genomes

• A physical map provides partial ordering of pieces of DNA and pieces of genes

Page 9: SyMAP Master's Thesis Presentation

FPC Map

• FingerPrinted Contigs
• Soderlund et al. 1997
• Type of physical map
• Made up of clones
  – Snippets of DNA
  – We use BAC clones
    • Bacterial artificial chromosome clones
  – Stored in clone libraries

Page 10: SyMAP Master's Thesis Presentation

Making a BAC Clone Library

• Take thousands of copies of a genome
• Cut it up into overlapping pieces (~150,000 base pairs)
  – Restriction enzymes
    • Proteins that cut at specific DNA sequences
  – Partial digestion
    • Restriction enzymes are not allowed to cut at all possible locations, so that the clones overlap
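The difference between full and partial digestion can be illustrated with a small simulation. This is a Python sketch under stated assumptions: the recognition site GAATTC, the toy sequence, the cut probability, and placing the cut at the start of the site are all illustrative choices, not details from the presentation.

```python
import random


def cut_sites(sequence, site):
    """Return the start index of every occurrence of the recognition site."""
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        start = sequence.find(site, start + 1)
    return positions


def digest(sequence, site, cut_probability=1.0, rng=None):
    """Cut the sequence at recognition sites.

    cut_probability=1.0 models full digestion. A smaller value models
    partial digestion, where each site is cut only some of the time.
    For simplicity the cut is placed at the start of the site.
    """
    rng = rng or random.Random()
    fragments, previous = [], 0
    for position in cut_sites(sequence, site):
        if position > previous and rng.random() < cut_probability:
            fragments.append(sequence[previous:position])
            previous = position
    fragments.append(sequence[previous:])
    return fragments


genome = "AAGAATTCGGGAATTCTTTGAATTCCC"
full = digest(genome, "GAATTC")
partial = digest(genome, "GAATTC", cut_probability=0.5, rng=random.Random(0))
print(full)
print(partial)
```

Running the partial digest on many copies of the same genome, each with different random draws, gives fragments that start and end at different sites and therefore overlap.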

Page 11: SyMAP Master's Thesis Presentation

Clones

• Each clone is stored in a well on a microtiter plate

• Do not know the order of the clones, or where each clone is on the chromosome

Page 12: SyMAP Master's Thesis Presentation

Clone Fingerprinting

• Clone fingerprints are found to gather more information on a clone
• Fully digest a clone using restriction enzymes
• If two clones share many fragments, they may overlap
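The idea that shared fragments suggest overlap can be sketched by counting matching fragment sizes between two fingerprints. This Python example is only an illustration: the greedy matching, the tolerance value, and the fragment sizes are assumptions, and it is not the scoring that FPC itself uses.

```python
def shared_fragments(fingerprint_a, fingerprint_b, tolerance=0):
    """Count fragments of clone A matched by a distinct fragment of clone B.

    Each fingerprint is a list of fragment sizes (or migration rates).
    Two fragments match if they differ by at most `tolerance`.
    """
    remaining = sorted(fingerprint_b)
    shared = 0
    for size in sorted(fingerprint_a):
        for index, other in enumerate(remaining):
            if abs(size - other) <= tolerance:
                shared += 1
                del remaining[index]  # each fragment of B is used once
                break
    return shared


clone_a = [1200, 3400, 5100, 800]
clone_b = [1205, 3390, 9000, 800]
print(shared_fragments(clone_a, clone_b, tolerance=15))  # 3
```

A tolerance is needed because measured fragment sizes are inexact; a loose tolerance produces chance matches and a tight one misses true matches.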

Page 13: SyMAP Master's Thesis Presentation

Clone Fingerprinting

• Fragments are run on a gel
  – Shorter fragments migrate faster
  – Measure migration rate
• False positives and false negatives

Page 14: SyMAP Master's Thesis Presentation

FPC

• Assembles fingerprinted clones into contigs
  – Contig → contiguous overlapping clones
• Assembles into many contigs instead of one large contig
  – Unclonable regions
  – Uneven distribution

Page 15: SyMAP Master's Thesis Presentation

Markers

• Markers are pieces of DNA
  – ~300 base pairs
• Hybridization
  – A marker hybridizes to a clone when the clone contains the marker

Page 16: SyMAP Master's Thesis Presentation

BESs

• Expensive to sequence entire clones
• BAC End Sequences
  – BESs are sequences from the ends of BAC clones
  – ~800 base pairs
  – Do not know which end the sequence comes from
  – There are errors in the sequence

Page 17: SyMAP Master's Thesis Presentation

Anchors

• Locations in two genomes found to be similar through a comparison of DNA sequences
• We use marker sequences and BESs searched against a known genome sequence
  – Maize has an FPC map with markers and BESs
  – The rice genome is sequenced

G G C C G T G G T G C T C T T T G C A A T G G G

G G C T G T G G T G C T C T T C G C A A T G G G
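The similarity of the two sequences shown on this slide can be checked position by position. This Python sketch does a simple ungapped comparison for illustration; an actual sequence search would also allow gaps and would score alignments differently, and the function name is an assumption.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of positions at which two equal-length sequences agree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


top = "GGCCGTGGTGCTCTTTGCAATGGG"
bottom = "GGCTGTGGTGCTCTTCGCAATGGG"

# 0-based positions where the two sequences differ
mismatches = [i for i, (a, b) in enumerate(zip(top, bottom)) if a != b]
print(mismatches)                               # [3, 15]
print(round(percent_identity(top, bottom), 1))  # 91.7
```

The two 24-base sequences differ at two positions, so 22 of 24 bases agree.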

Page 18: SyMAP Master's Thesis Presentation

Component Summary

Page 19: SyMAP Master's Thesis Presentation

Finding Chains

Page 20: SyMAP Master's Thesis Presentation

Key Synteny Finding Algorithms

• Vandepoele et al. (2002) – ADHoRe
  – Variable gap size
  – Coefficient of determination to determine the quality of a synteny block
• Haas et al. (2004) – DAGchainer
  – Directed acyclic graph
  – Dynamic programming
  – Gap penalty

Page 21: SyMAP Master's Thesis Presentation

Other Synteny Finding Algorithms

• Key characteristics for us:
  – Dynamic programming
    • Ordering the anchors to form a DAG
  – Gap penalty
  – Variable gap size
• Not appropriate for finding synteny using an FPC map
  – Do not consider the error conditions that arise

Page 22: SyMAP Master's Thesis Presentation

FPC to Genome Synteny

• Properties associated with FPC
  – FPC maps do not cover the entire genome
  – False-positive and false-negative hybridized markers
  – FPC coordinates are approximate
  – Which end of the parent clone a BES belongs to is unknown

Page 23: SyMAP Master's Thesis Presentation

FPC Synteny Properties

[Figure: dot plot of Genome A (FPC map) against Genome B (sequenced genome), positions 1 through c on each axis; anchors are marked x, o, and #]

Page 24: SyMAP Master's Thesis Presentation

Noise

Page 25: SyMAP Master's Thesis Presentation

SyMAP Algorithm

• Anchor (ak, bl)
  – ak is the location on the FPC map of genome GA
  – bl is the location on the genomic sequence of GB
• Directed Acyclic Graph
  – E = {(u, v) : |ak − ai| ≤ MA and 0 ≤ bl − bj ≤ MB}
    • where u = (ai, bj), v = (ak, bl) are anchors
  – Allows edges decreasing along GA
    • Catch off-diagonal anchors
    • Some inversions
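The edge rule can be sketched as a pairwise test over anchors. This Python example reads both bounds as inclusive and adds a tie-break for anchors at the same GB position; both choices, along with the names `build_edges`, `max_a`, and `max_b` (standing in for MA and MB), are assumptions for illustration rather than SyMAP's implementation.

```python
from typing import List, Tuple

# (position on the FPC map of GA, position on the sequence of GB)
Anchor = Tuple[float, float]


def build_edges(anchors: List[Anchor], max_a: float, max_b: float):
    """Return index pairs (i, k) for edges u -> v, u = anchors[i], v = anchors[k].

    An edge requires |ak - ai| <= max_a and 0 <= bl - bj <= max_b.
    Assumption: when two anchors share a GB position, the edge goes only
    from the lower index to the higher one, so the graph stays acyclic.
    """
    edges = []
    for i, (ai, bj) in enumerate(anchors):
        for k, (ak, bl) in enumerate(anchors):
            if i == k:
                continue
            delta_b = bl - bj
            if abs(ak - ai) > max_a or not (0 <= delta_b <= max_b):
                continue
            if delta_b == 0 and i > k:
                continue  # tie-break for equal GB positions
            edges.append((i, k))
    return edges


anchors = [(10, 100), (12, 150), (11, 190), (40, 200)]
edges = build_edges(anchors, max_a=5, max_b=60)
print(edges)  # [(0, 1), (1, 2)]
```

The edge (1, 2) goes from GA position 12 back to 11, which shows how the absolute value on the GA difference admits steps that decrease along GA.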

Page 26: SyMAP Master's Thesis Presentation

SyMAP Algorithm

• Manhattan distance function with scaling
  – D(u, v) = |ak − ai| / tA + |bl − bj| / tB
  – Average distance between anchors may be different along GA and GB
• Dynamic Programming
  – Node(v) = 1 + max(0, max over u ∈ P(v) of (Node(u) − D(u, v)))
    • P(v) is the set of nodes u with an edge (u, v) ∈ E
    • 1 is the score given to an individual anchor
    • Plus the maximum path score for a previous node
    • Penalized by the distance between the nodes
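The scoring recurrence can be sketched as a dynamic program over anchors sorted by their GB position. This Python example is a minimal illustration, not the SyMAP code: the function names, the inclusive edge bounds, the traceback, and all parameter values are assumptions.

```python
def chain_scores(anchors, max_a, max_b, t_a, t_b):
    """Score anchors with Node(v) = 1 + max(0, max_u (Node(u) - D(u, v))).

    Returns (scores, previous); previous[k] is the index of the best
    predecessor of anchor k, or None if a chain starts at k.
    """
    # Visit anchors in increasing GB order so predecessors are scored first.
    order = sorted(range(len(anchors)), key=lambda i: anchors[i][1])
    scores = [1.0] * len(anchors)
    previous = [None] * len(anchors)
    for rank, k in enumerate(order):
        ak, bl = anchors[k]
        for i in order[:rank]:
            ai, bj = anchors[i]
            if abs(ak - ai) > max_a or bl - bj > max_b:
                continue  # no edge (u, v)
            # Scaled Manhattan distance D(u, v)
            distance = abs(ak - ai) / t_a + abs(bl - bj) / t_b
            candidate = 1.0 + scores[i] - distance
            if candidate > scores[k]:
                scores[k] = candidate
                previous[k] = i
    return scores, previous


def best_chain(anchors, scores, previous):
    """Trace back from the highest-scoring anchor."""
    k = max(range(len(anchors)), key=lambda i: scores[i])
    chain = []
    while k is not None:
        chain.append(anchors[k])
        k = previous[k]
    return chain[::-1]


anchors = [(10, 100), (12, 150), (11, 190), (40, 200)]
scores, previous = chain_scores(anchors, max_a=5, max_b=60, t_a=10, t_b=100)
chain = best_chain(anchors, scores, previous)
print(chain)  # [(10, 100), (12, 150), (11, 190)]
```

The isolated anchor (40, 200) keeps the base score of 1, while the three nearby anchors accumulate a score of about 1.8 after the distance penalties.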

Page 27: SyMAP Master's Thesis Presentation

SyMAP Algorithm

• Chains must satisfy constraints
• Number of anchors
• Strength of line
  – Pearson correlation coefficient
  – Required to be more precisely linear the closer they are to the minimal number of anchors
  – Exception for small and dense chains
    • Lower correlation due to errors in the assignment of BES ends or clone ordering within a contig
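The chain constraints can be sketched as a Pearson correlation test whose threshold tightens as a chain approaches the minimal number of anchors. In this Python example every threshold (`min_anchors`, `strict_r`, `relaxed_r`, `relaxed_at`) and the linear relaxation are hypothetical values chosen for illustration; the exception for small, dense chains is not modeled.

```python
from math import sqrt


def pearson(points):
    """Pearson correlation coefficient of (a, b) anchor coordinates."""
    n = len(points)
    mean_a = sum(a for a, _ in points) / n
    mean_b = sum(b for _, b in points) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in points)
    var_a = sum((a - mean_a) ** 2 for a, _ in points)
    var_b = sum((b - mean_b) ** 2 for _, b in points)
    if var_a == 0 or var_b == 0:
        return 0.0  # degenerate chain: all anchors share one coordinate
    return cov / sqrt(var_a * var_b)


def accept_chain(points, min_anchors=4, strict_r=0.95, relaxed_r=0.8, relaxed_at=10):
    """Hypothetical rule: the required |r| relaxes linearly from strict_r at
    min_anchors to relaxed_r at relaxed_at anchors."""
    n = len(points)
    if n < min_anchors:
        return False
    fraction = min(1.0, (n - min_anchors) / (relaxed_at - min_anchors))
    required = strict_r - fraction * (strict_r - relaxed_r)
    # abs() accepts both positive and negative orientation
    return abs(pearson(points)) >= required


forward = [(1, 10), (2, 20), (3, 30), (4, 40)]
inverted = [(1, 40), (2, 30), (3, 20), (4, 10)]
scattered = [(1, 10), (2, 40), (3, 15), (4, 35)]
print(accept_chain(forward), accept_chain(inverted), accept_chain(scattered))
```

Taking the absolute value of the coefficient lets an inverted chain pass the same linearity test as a forward one.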

Page 28: SyMAP Master's Thesis Presentation

Sytry

• Tool for testing synteny finding algorithms
• Allows for modifying the parameters of an algorithm and rerunning
• Results are shown as a dot plot
  – Need to visually confirm results as correct
  – Correct is what looks right to the user

Page 29: SyMAP Master's Thesis Presentation

Automated Parameter Setting

• Difficult to set parameters (e.g., tA and tB)
  – Effects of changes can be unclear
  – Dependent on average distance between anchors and noise
    • Optimal values vary between regions
• Have the algorithm set the gap parameters
  – Attempt to optimize tx for each chain

Page 30: SyMAP Master's Thesis Presentation

Sub-Chains

• Overall orientation of a synteny chain may not be accurate for sub-chains

Page 31: SyMAP Master's Thesis Presentation

Sub-Chain Finder

• Use only anchors that are part of a chain
• Define distance between anchors in terms of the number of anchors that fall between the anchors
• A significant gap signals the start of a possible inversion
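One way to read the sub-chain finder is as a split of a chain wherever consecutive anchors are separated by many other anchors. This Python sketch follows that reading; the function names, the rank-based distance along GA, and the `max_rank_gap` threshold are assumptions, and anchors are assumed to be distinct.

```python
def split_sub_chains(chain, max_rank_gap=1):
    """Split a chain into sub-chains at significant rank gaps.

    chain: list of distinct (a, b) anchors that belong to one chain.
    The distance between consecutive anchors (in GB order) is the number
    of chain anchors that lie between them along GA.
    """
    if not chain:
        return []
    by_b = sorted(chain, key=lambda anchor: anchor[1])
    rank_a = {anchor: rank for rank, anchor in enumerate(sorted(chain))}
    sub_chains = [[by_b[0]]]
    for prev, cur in zip(by_b, by_b[1:]):
        anchors_between = abs(rank_a[cur] - rank_a[prev]) - 1
        if anchors_between > max_rank_gap:
            sub_chains.append([])  # significant gap: start a new sub-chain
        sub_chains[-1].append(cur)
    return sub_chains


def orientation(sub_chain):
    """+1 if GA positions increase with GB positions, -1 if they decrease."""
    return 1 if sub_chain[-1][0] >= sub_chain[0][0] else -1


chain = [(1, 10), (2, 20), (3, 30), (6, 40), (5, 50), (4, 60), (7, 70), (8, 80)]
subs = split_sub_chains(chain)
print(subs)
print([orientation(s) for s in subs])  # [1, -1, 1]
```

Counting anchors instead of measuring coordinate distance makes the gap test independent of how densely a region is covered.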

Page 32: SyMAP Master's Thesis Presentation

Sub-Chains

• Evolutionary history
  – e.g., total number of inversions
• Assigning an accurate orientation to all anchors in a chain
  – Beneficial for fixing the clone end assignment of BES

Page 33: SyMAP Master's Thesis Presentation

BES Clone End Assignments

• BESs are arbitrarily assigned to clone ends
  – Algorithm takes this into account
  – However, the synteny can be distorted when viewing
• Orientation can be used to correct BES assignments
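How orientation might correct a clone's BES assignments can be sketched for a single clone with two BES hits. This Python example is one interpretation of the rule that lines should not cross for positive orientation and should cross for negative orientation; the function name, argument layout, and coordinates are hypothetical.

```python
def correct_bes_ends(clone_left, clone_right, hit_first, hit_second, orientation):
    """Assign two BES hits to the ends of one clone.

    clone_left < clone_right are the clone's end coordinates on the FPC map.
    hit_first and hit_second are the positions of its two BESs on the
    sequenced genome, in arbitrary order.
    orientation is +1 or -1 for the enclosing chain or sub-chain.
    Returns [(fpc_coordinate, sequence_position), ...] such that the two
    connecting lines do not cross for +1 and do cross for -1.
    """
    low, high = sorted((hit_first, hit_second))
    if orientation > 0:
        return [(clone_left, low), (clone_right, high)]
    return [(clone_left, high), (clone_right, low)]


positive = correct_bes_ends(100, 250, 9000, 5000, orientation=+1)
negative = correct_bes_ends(100, 250, 9000, 5000, orientation=-1)
print(positive)  # [(100, 5000), (250, 9000)]
print(negative)  # [(100, 9000), (250, 5000)]
```

The hits themselves are unchanged; only the end of the clone each one is drawn from differs.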

Page 34: SyMAP Master's Thesis Presentation

BES Clone End Assignments
positive orientation → lines should not cross

[Figure: dot plot of A against B, positions 1 through 8 with anchors marked x and o, next to alignment views of A and B]

Page 35: SyMAP Master's Thesis Presentation

BES Clone End Assignments
negative orientation → lines should cross

[Figure: dot plot of A against B, positions 1 through 8 with anchors marked x and o, next to alignment views of A and B]

Page 36: SyMAP Master's Thesis Presentation

SyMAP Views

• Accessible through a web browser
• Static views
  – All synteny blocks ↔ sequenced chromosomes
  – Synteny blocks ↔ sequenced chromosome
• Interactive views
  – Dot plot view
    • Genome to genome
    • Chromosome to chromosome
  – Alignment view
    • FPC ↔ sequenced chromosome
    • FPC ↔ FPC
    • FPC ↔ sequenced chromosome ↔ FPC
  – Close-up view
    • FPC ↔ sequenced chromosome

Page 37: SyMAP Master's Thesis Presentation

All Blocks ↔ Sequenced Chromosomes

Page 38: SyMAP Master's Thesis Presentation

Blocks ↔ Sequenced Chromosome

Page 39: SyMAP Master's Thesis Presentation

Genome ↔ Genome Dot Plot

Page 40: SyMAP Master's Thesis Presentation

Chromosome ↔ Chromosome Dot Plot

Page 41: SyMAP Master's Thesis Presentation

Block ↔ Sequenced Chromosome

Page 42: SyMAP Master's Thesis Presentation

Subset Flipped

Page 43: SyMAP Master's Thesis Presentation

Contig ↔ Sequenced Chromosome

Page 44: SyMAP Master's Thesis Presentation

Filters and Controls

Page 45: SyMAP Master's Thesis Presentation

FPC ↔ Sequenced Chromosome ↔ FPC

Page 46: SyMAP Master's Thesis Presentation

FPC ↔ FPC

Page 47: SyMAP Master's Thesis Presentation

Close-up of Gene

Page 48: SyMAP Master's Thesis Presentation

SyMAP Implementation

• Caching is needed:
  – Downloads large amounts of data from remote database
  – History feature
    • Navigating back and forth between the same views
• Soft References
  – Remain alive as long as the memory is available
• Data objects
  – Hold data in a compact form
  – Converted to view objects when needed

Page 49: SyMAP Master's Thesis Presentation

Results

• www.agcol.arizona.edu/symap
  – Maize and sorghum aligned to rice
  – Maize FPC aligned to sorghum FPC
• Used in editing the maize FPC maps based on their alignment to rice (Wei et al., in preparation)
• Alignment of maize to rice chromosome 3
  – Buell et al. (2005)
• Used in OMAP project
  – Aligning 12 species of rice to the sequenced genome of rice (Wing et al., in preparation)

Page 50: SyMAP Master's Thesis Presentation

Acknowledgements

• Thesis Committee
  – Dr. Cari Soderlund, thesis advisor
  – Dr. Peter Downey
  – Dr. Kobus Bernard

• This work is funded in part by NSF DBI #0115903

www.agcol.arizona.edu/symap