Genome Rearrangements …and YOU!!
Presented by: Kevin Gaittens

Overview
• Bio background
• Definitions and Set-up
• Reality-Desire
• Good Components
• Bad Components
• Fin

Biological Background

• Comparing entire genomes across species

• Need “distance” measure

• Interested in larger differences than just single insertions/deletions etc.

• Genome rearrangements – a chromosome piece (e.g., a gene) is moved or copied to another location, or transferred to another chromosome altogether

Definitions

• Block – a section of the genome, possibly containing more than one gene, treated as one unit

• Homologous – two blocks are homologous when they contain the same genes. Homologous blocks have the same number label

• Reversal – reversing the order of a series of consecutive blocks and also flipping their orientations; distance is measured in number of reversals

Example of Reversal

3 4 1 2 5

3 2 1 4 5

Red – right orientation
Black – left orientation

Goals

• Want the smallest number of reversals that transforms one genome into another
– Parsimony assumption: assume Nature changes optimally

• Desire polynomial time solution

• The oriented version has a poly-time solution; the unoriented version is NP-hard

Example

1 2 3 4 5

5 2 1 3 4

Add circle if orientation changes

One solution

1 2 3 4 5

1 2 5 4 3

1 2 5 3 4

5 2 1 3 4

Breakpoints

• Act as a minimum (they give a lower bound on the number of reversals)
• Happen in the case of:
– first/last label in original not the first/last label in the target
– OR 2 labels are consecutive in original, but not in target
– OR consecutive in original and target, but the relative orientation of the two blocks is different

• …5 4… and …5 4… (same order, different orientations)
– NOTE: If a pair of labels is an exact reversal in the target, there is NO breakpoint
• …4 5… and …5 4… do not have a breakpoint

Breakpoints for Last Example

1 2 3 4 5

Goal reminder:

5 2 1 3 4

1 is different from the first block of the target

No breakpoint between 1 and 2, since they form an exact reversal in the target

2 and 3 are not consecutive in the target

3 and 4 match, thus no breakpoint

5 is different from last in target

4 and 5 are not consecutive in target
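The breakpoint rules can be checked mechanically. Below is a minimal sketch, assuming orientations are encoded as signed integers (+a for a block pointing right, -a for one pointing left) and the extension labels L and R are encoded as 0 and n + 1; the function name `breakpoints` is hypothetical. The colors marking orientation were lost from this example, so the signs used here (α all pointing right, target (+5, -2, -1, +3, +4)) are an assumption reconstructed from the breakpoint analysis above and the three-reversal solution on the earlier slide.

```python
def breakpoints(alpha, beta):
    """Count breakpoints of alpha with respect to beta.

    Assumed encoding: blocks are signed ints (+a = right, -a = left);
    the extension labels L and R are 0 and n + 1.
    """
    n = len(alpha)
    ext_alpha = [0, *alpha, n + 1]
    ext_beta = [0, *beta, n + 1]
    # Ordered pairs that are adjacent in the target.
    adjacent = set(zip(ext_beta, ext_beta[1:]))
    count = 0
    for x, y in zip(ext_alpha, ext_alpha[1:]):
        # No breakpoint if the pair appears in beta as-is, or as an exact
        # reversal (order swapped and both orientations flipped).
        if (x, y) not in adjacent and (-y, -x) not in adjacent:
            count += 1
    return count


alpha = (1, 2, 3, 4, 5)
beta = (5, -2, -1, 3, 4)          # signs reconstructed (assumption)
print(breakpoints(alpha, beta))   # 4: L|1, 2|3, 4|5, 5|R
```

Because 0 only ever appears first and n + 1 only last, the reversed-pair test can never wrongly match a pair involving L or R.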

Mathy Stuff :o)

Let L be a finite set of labels

$L^0 = \bigcup_{a \in L} \{\vec{a}, \overleftarrow{a}\}$

$|x|$ -> remove arrows

Ex: $|\vec{a}| = |\overleftarrow{a}| = a$

Cont’d

An oriented permutation over L is a mapping $\alpha: [1..n] \to L^0$ such that for any $a \in L$, there is exactly one $i \in [1..n]$ with $|\alpha(i)| = a$

Basically, the permutation “picks” an orientation for each label. If $\vec{a}$ is picked, then $\overleftarrow{a}$ will not be

Example

n = 4

L = {1, 2, 3, 4}

α = ( 2, 1, 4, 3 )

So α(3) = 4
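To make the definition concrete, an oriented permutation can be stored as a tuple of signed integers, writing +a for →a and -a for ←a. This encoding and the function name below are assumptions for illustration, not notation from the slides. The check mirrors the definition: every label in L must occur exactly once, in exactly one orientation. The arrows in the example above were lost, so the signs below are illustrative only.

```python
def is_oriented_permutation(alpha, labels):
    """True if alpha picks exactly one orientation of each label in `labels`.

    Assumed encoding: +a stands for ->a and -a for <-a (labels are positive ints).
    """
    # abs() plays the role of |alpha(i)|: it drops the orientation.
    # Each label must then appear exactly once.
    return len(alpha) == len(labels) and sorted(abs(x) for x in alpha) == sorted(labels)


alpha = (2, -1, 4, 3)        # illustrative signs
L = {1, 2, 3, 4}
print(is_oriented_permutation(alpha, L))          # True
print(alpha[3 - 1])                               # alpha(3) = 4 (slides index from 1)
print(is_oriented_permutation((2, -2, 4, 3), L))  # False: label 2 picked twice
```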

Identity Permutation

• Special case

• Permutation I such that I(i) = i for all i between 1 and n

• For n = 3, I = ( 1 2 3)

Reversals

Let i and j be two indices with 1 ≤ i ≤ j ≤ n

[i,j] indicates a reversal affecting elements α(i) through α(j)

Example

Given α = ( 2, 3, 4, 1)

α[2,3] = ( 2, 4, 3, 1)

Note: similar to boxing scheme used earlier

More Math!

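In general:

$$\alpha[i,j](k) = \begin{cases} \overline{\alpha(i+j-k)} & \text{if } i \le k \le j \\ \alpha(k) & \text{otherwise} \end{cases}$$

where $\overline{\alpha(k)}$ means the reversal of orientation of $\alpha(k)$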
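The general formula translates almost line for line into code. This is a sketch assuming a signed-integer encoding (+a = →a, -a = ←a), so reversing an orientation is negation; the name `reverse` is hypothetical. On the slide's example α = (2, 3, 4, 1) it gives (2, -4, -3, 1): the same order as α[2,3] on the slide, with the flipped orientations of 4 and 3 made explicit. The reversal indices in the second part, [3,5], [4,5], [1,3], are one reading of the “One solution” slide, whose orientations were lost.

```python
def reverse(alpha, i, j):
    """Return alpha[i, j], following the slide's definition (indices are 1-based).

    Assumed encoding: signed ints, so flipping an orientation is negation.
    """
    n = len(alpha)
    if not 1 <= i <= j <= n:
        raise ValueError("need 1 <= i <= j <= n")
    # Position k in [i, j] receives the flipped element from position i + j - k;
    # every other position is unchanged.
    return tuple(-alpha[i + j - k - 1] if i <= k <= j else alpha[k - 1]
                 for k in range(1, n + 1))


print(reverse((2, 3, 4, 1), 2, 3))   # (2, -4, -3, 1)

# Replaying the earlier three-reversal solution (indices assumed):
alpha = (1, 2, 3, 4, 5)
for i, j in [(3, 5), (4, 5), (1, 3)]:
    alpha = reverse(alpha, i, j)
print(alpha)                         # (5, -2, -1, 3, 4)
```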

Sorting by Reversals

• Is the main goal

• Given 2 permutations α and β, seek minimum number of reversals to transform α into β

• $\alpha p_1 p_2 p_3 \cdots p_t = \beta$, where $p_1, p_2, \dots, p_t$ are reversals

• The smallest such t is called the reversal distance of α with respect to β and denoted by $d_\beta(\alpha)$
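For very small n, the definition can be checked by brute force: breadth-first search over all signed permutations reachable by reversals returns the minimum t directly. This is a sketch for building intuition, not the polynomial algorithm these slides develop; the signed-integer encoding and the function names are assumptions, and the target signs in the first call are a reconstruction of the earlier example, whose orientations were lost.

```python
from collections import deque


def reverse(alpha, i, j):
    """alpha[i, j] for 1-based i <= j: reverse the slice and flip orientations."""
    a = list(alpha)
    a[i - 1:j] = [-x for x in reversed(a[i - 1:j])]
    return tuple(a)


def reversal_distance(alpha, beta):
    """Exact d_beta(alpha) by breadth-first search over signed permutations.

    The state space has 2^n * n! elements, so this is only usable for small n.
    """
    alpha, beta = tuple(alpha), tuple(beta)
    n = len(alpha)
    dist = {alpha: 0}
    queue = deque([alpha])
    while queue:
        cur = queue.popleft()
        if cur == beta:
            return dist[cur]
        # Try every reversal [i, j] with 1 <= i <= j <= n.
        for i in range(1, n + 1):
            for j in range(i, n + 1):
                nxt = reverse(cur, i, j)
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
    raise ValueError("beta is not a signed permutation of alpha's labels")


print(reversal_distance((1, 2, 3, 4, 5), (5, -2, -1, 3, 4)))  # 3
print(reversal_distance((2, 1), (1, 2)))                      # 3
```

The second call shows that even a simple swap of two right-pointing blocks costs three reversals once orientations must also come out right.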

Sorting con’t

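• Look for reversals that “make progress” towards β:

$d_\beta(\alpha p) < d_\beta(\alpha)$, or equivalently (one reversal changes the distance by at most 1)

$d_\beta(\alpha p) = d_\beta(\alpha) - 1$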

Breakpoints

• Add labels L and R to α to get the “extended version”

• One example of an extended α is:

(L, 2, 3, 1, 6, 5, 4, R)

• If β is the identity, then the breakpoints are at…

Breakpoints

L 2 3 1 6 5 4 R

L 1 2 3 4 5 6 R

2 is not the first block of β

2 and 3 are consecutive, but their orientations are different from what they need to be, and they are not an exact reversal

3 and 1 are not consecutive in β

1 and 6 are not consecutive in β

6 and 5 are consecutive, but not an exact reversal (the orientation of 6 prevents it)

None at 5 4: the reverse pair 4 5 is in β

4 is not the final block in β

Total: 6 breakpoints

Breakpoints con’t

• A reversal cuts the permutation in only two places, so it can remove at most 2 breakpoints

• Thus, b(α) – b(αp) ≤ 2

• This also means that b(α)/2 ≤ d(α)

• This is a lower bound for d(α)
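Spelling out the step from the per-reversal bound to the lower bound: if $p_1, \dots, p_t$ sorts α into β, then β has no breakpoints with respect to itself, and the total drop in breakpoints telescopes:

$$
b(\alpha) = b(\alpha) - b(\beta) = \sum_{k=1}^{t} \bigl[\, b(\alpha p_1 \cdots p_{k-1}) - b(\alpha p_1 \cdots p_k) \,\bigr] \le 2t
$$

Taking an optimal sorting, $t = d(\alpha)$, which gives $b(\alpha)/2 \le d(\alpha)$. In the earlier example, 4 breakpoints give a bound of 2, while the solution shown uses 3 reversals.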

Bps cont’d

• b(α)/2 is lower bound

• However, this is rarely achievable

• Want a better lower bound

• Look to something called the reality-desire diagram

Reality-Desire

• Captures the case where 2 labels are adjacent, but do not “want” to be adjacent

• Reality – the neighbor a given label has in α

• Desire – the neighbor the label has in β

Diagram

• Oriented labels can be viewed as a battery

• Positive terminal at tip of arrow

• Negative at tail

- a +

Example

(Figure: α and αp, with desire and reality edges marked)

Example

Extended α: L 3 2 1 4 5 R

Replace labels by terminals & reality edges:

L -3 +3 +2 -2 +1 -1 -4 +4 +5 -5 R

Add desire edges

Diagram

• To create the reality-desire diagram:
– Arrange all terminal nodes around a circle, with L and R at the top
– L is to the left of R, and all other nodes follow α counterclockwise
– Reality edges lie along the circumference
– Desire edges are the chords

Diagram of Reality-Desire

Reality and desire edges coincide where there is no breakpoint

Interpretation

• The number of cycles in RD(α) (its number of connected parts) is denoted $c_\beta(\alpha)$

• β has no breakpoints with respect to itself

• Notice $c_\beta(\beta) = n+1$
– Why?
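The diagram can also be traced in code. This is a minimal sketch, assuming β is the identity (any β can be reduced to this case by relabeling) and a hypothetical numbering of terminals: L = 0, R = 2n + 1, and for block a the tail (-a) is 2a - 1 and the tip (+a) is 2a. It builds the terminal sequence, pairs neighbors across each gap into reality edges, and follows reality and desire edges alternately to count cycles. Under this numbering the desire edges (L,-1), (+1,-2), …, (+n,R) pair each node m with m XOR 1. On the slide's example it finds 3 cycles. On the identity it finds n + 1, which answers the “Why?”: with no breakpoints, each of the n + 1 reality edges coincides with a desire edge and closes a two-edge cycle on its own.

```python
def terminals(alpha):
    """Terminal sequence of extended alpha around the circle (assumed numbering).

    L = 0, R = 2n + 1; block a has tail (-a) = 2a - 1 and tip (+a) = 2a.
    A right-pointing block (+a) contributes tail then tip; a left-pointing one the reverse.
    """
    seq = [0]
    for a in alpha:
        tail, tip = 2 * abs(a) - 1, 2 * abs(a)
        seq += [tail, tip] if a > 0 else [tip, tail]
    seq.append(2 * len(alpha) + 1)
    return seq


def count_cycles(alpha):
    """c_beta(alpha) for beta = identity: cycles of alternating reality/desire edges."""
    seq = terminals(alpha)
    reality = {}
    for k in range(0, len(seq), 2):   # reality edges join neighbors across each gap
        x, y = seq[k], seq[k + 1]
        reality[x], reality[y] = y, x
    seen, cycles = set(), 0
    for start in seq:
        if start in seen:
            continue
        cycles += 1
        x = start
        while x not in seen:          # follow reality, then desire, until the cycle closes
            y = reality[x]
            seen.update((x, y))
            x = y ^ 1                 # desire partner under this numbering
    return cycles


alpha = (3, -2, -1, 4, -5)   # L -3 +3 +2 -2 +1 -1 -4 +4 +5 -5 R
n = len(alpha)
print(count_cycles(alpha), n + 1 - count_cycles(alpha))  # 3 3
print(count_cycles((1, 2, 3, 4, 5)))                     # 6 = n + 1
```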

Effects of a Reversal

Let (s,t) and (u,v) be two reality edges characterizing a reversal p, with (s,t) preceding (u,v) in the permutation α. Then RD(αp) differs from RD(α) by:
1. Reality edges (s,t) and (u,v) are replaced by (s,u) and (t,v)
2. Desire edges remain unchanged
3. The section of the circle going from node t to node u, including these extremities, in counterclockwise direction, is reversed.

Our Example

Reversing (-1,-4) and (+4, +5)

Definitions

• Let e and f be two reality edges belonging to the same cycle in RD(α)

• If the orientations induced by e and f coincide, they are convergent
– Walk counterclockwise from the start of e (passing through desire edges) until you reach the beginning of f. If the end of f is still counterclockwise, they converge

• Divergent otherwise

Walking Convergent

Still counterclockwise

(+3,+2) to (-1,-4)

How Reversals Affect Cycles

If e and f belong to different cycles, c(αp)=c(α) -1

If e and f belong to the same cycle and converge

c(αp)=c(α)

If e and f belong to the same cycle and diverge

c(αp)=c(α) +1

Summary

If e and f:
• belong to different cycles, c(αp) = c(α) - 1
• belong to the same cycle & converge, c(αp) = c(α)
• belong to the same cycle & diverge, c(αp) = c(α) + 1
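The three cases can be checked on the reality-desire example L -3 +3 +2 -2 +1 -1 -4 +4 +5 -5 R. This sketch assumes β is the identity, a signed-integer encoding, and a hypothetical terminal numbering (L = 0, R = 2n + 1, -a = 2a - 1, +a = 2a); reality edges are identified by their gap index, so gap g lies between extended positions g and g + 1. The first two pairs are the ones highlighted on the earlier slides. The third pair is an extra one chosen for illustration; that it diverges is read off from the +1 it produces, per the summary above.

```python
def terminals(alpha):
    """L = 0, R = 2n + 1; block a has tail (-a) = 2a - 1 and tip (+a) = 2a (assumed)."""
    seq = [0]
    for a in alpha:
        tail, tip = 2 * abs(a) - 1, 2 * abs(a)
        seq += [tail, tip] if a > 0 else [tip, tail]
    return seq + [2 * len(alpha) + 1]


def count_cycles(alpha):
    """Cycles of RD(alpha), taking beta = identity (desire partner of m is m ^ 1)."""
    seq = terminals(alpha)
    reality = {}
    for k in range(0, len(seq), 2):
        reality[seq[k]], reality[seq[k + 1]] = seq[k + 1], seq[k]
    seen, cycles = set(), 0
    for start in seq:
        if start not in seen:
            cycles += 1
            x = start
            while x not in seen:
                y = reality[x]
                seen.update((x, y))
                x = y ^ 1
    return cycles


def reverse(alpha, i, j):
    """alpha[i, j] for 1-based i <= j: reverse the slice and flip orientations."""
    a = list(alpha)
    a[i - 1:j] = [-x for x in reversed(a[i - 1:j])]
    return tuple(a)


def reversal_for_edges(g1, g2):
    """Reversal [i, j] characterized by the reality edges in gaps g1 < g2.

    Gap g is the reality edge between extended positions g and g + 1
    (position 0 is L), so the reversal covers alpha(g1 + 1) .. alpha(g2).
    """
    return g1 + 1, g2


alpha = (3, -2, -1, 4, -5)   # L -3 +3 +2 -2 +1 -1 -4 +4 +5 -5 R
c = count_cycles(alpha)       # 3
pairs = {
    "different cycles: (-1,-4) & (+4,+5)": (3, 4),
    "same cycle, converge: (+3,+2) & (-1,-4)": (1, 3),
    "same cycle, diverge: (L,-3) & (-1,-4)": (0, 3),
}
deltas = {}
for name, (g1, g2) in pairs.items():
    i, j = reversal_for_edges(g1, g2)
    deltas[name] = count_cycles(reverse(alpha, i, j)) - c
    print(f"{name}: c changes by {deltas[name]:+d}")
```

The three lines report changes of -1, +0 and +1, matching the summary.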

Lower Bound

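• Since the number of cycles changes by at most 1 per reversal, we can get a new lower bound on the number of reversals

• Suppose $\alpha p_1 p_2 \cdots p_t = \beta$, so $c_\beta(\alpha p_1 p_2 \cdots p_t) = c_\beta(\beta) = n+1$

$c_\beta(\alpha p_1) - c_\beta(\alpha) \le 1$

$c_\beta(\alpha p_1 p_2) - c_\beta(\alpha p_1) \le 1$

…

$c_\beta(\alpha p_1 \cdots p_t) - c_\beta(\alpha p_1 \cdots p_{t-1}) \le 1$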

Lower Bound

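• Adding these t inequalities (the left-hand sides telescope) gives $n + 1 - c_\beta(\alpha) \le t$

• If $p_1, p_2, \dots, p_t$ is an optimal sorting, then $t = d_\beta(\alpha)$, so

$n + 1 - c_\beta(\alpha) \le d_\beta(\alpha)$

Very good lower bound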
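One reason this bound is good: it is never weaker than the breakpoint bound. A reality edge that is not a breakpoint coincides with a desire edge and forms a cycle on its own, while each breakpoint lies in a cycle with at least two reality edges. With n + 1 reality edges in total, b(α) of them breakpoints:

$$
c_\beta(\alpha) \le \bigl(n + 1 - b(\alpha)\bigr) + \frac{b(\alpha)}{2} \quad\Longrightarrow\quad n + 1 - c_\beta(\alpha) \ge \frac{b(\alpha)}{2}
$$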

Good/Bad Cycles

• A cycle is “good” if it has two divergent reality edges

• If not, it is considered “bad”
• Good cycles have at least two desire edges that cross
– Not all cycles that have crossing edges are good

• Call a cycle “proper” if it has at least four edges

Good/Bad cont’d

• If we only have good cycles, the lower bound d(α) ≥ n+1 – c(α) is an equality

• How could it be possible for it to be an equality if there are a few bad cycles mixed in to start?

Interleave
• Twisting one cycle while breaking another is only possible if some desire edge from one of the cycles crosses some desire edge from the other

• These two cycles “interleave” in this case

Interleave

Interleaving Graph

• Important to verify which cycles interleave with which other cycles

• Take as nodes the proper cycles of RD(α)
• Two nodes are adjacent iff the cycles interleave
• Connected components are classified as good or bad
• If a component contains only bad cycles, it is bad. Otherwise, it is said to be good
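A sketch of building the interleaving graph and labeling its components, assuming β is the identity and a hypothetical terminal numbering (L = 0, R = 2n + 1, tail of block a = 2a - 1, tip = 2a). Two cycles interleave when a desire chord of one crosses a desire chord of the other; with the terminals laid out in circle order, that reduces to an interval test. One substitution to note: rather than walking the circle to test divergence directly, `is_good` applies each candidate reversal and checks whether the cycle count goes up by one, which by the cycle-change summary happens exactly for two divergent edges of the same cycle. All names are illustrative.

```python
from itertools import combinations


def terminals(alpha):
    """Circle order of terminals; L = 0, R = 2n + 1, -a = 2a - 1, +a = 2a (assumed)."""
    seq = [0]
    for a in alpha:
        tail, tip = 2 * abs(a) - 1, 2 * abs(a)
        seq += [tail, tip] if a > 0 else [tip, tail]
    return seq + [2 * len(alpha) + 1]


def reverse(alpha, i, j):
    """alpha[i, j] for 1-based i <= j: reverse the slice and flip orientations."""
    a = list(alpha)
    a[i - 1:j] = [-x for x in reversed(a[i - 1:j])]
    return tuple(a)


def rd_cycles(alpha):
    """Cycles of RD(alpha) for beta = identity, each as a list of its terminals."""
    seq = terminals(alpha)
    reality = {}
    for k in range(0, len(seq), 2):
        reality[seq[k]], reality[seq[k + 1]] = seq[k + 1], seq[k]
    seen, cycles = set(), []
    for start in seq:
        if start in seen:
            continue
        nodes, x = [], start
        while x not in seen:
            y = reality[x]
            seen.update((x, y))
            nodes += [x, y]
            x = y ^ 1                      # desire partner under this numbering
        cycles.append(nodes)
    return seq, cycles


def components(alpha):
    """Components of the interleaving graph as (number of cycles, 'good'/'bad')."""
    seq, cycles = rd_cycles(alpha)
    pos = {node: p for p, node in enumerate(seq)}       # position on the circle
    proper = [cyc for cyc in cycles if len(cyc) >= 4]    # >= 2 reality + 2 desire edges

    def chords(cyc):
        # Desire edges of a cycle as sorted pairs of circle positions.
        return {tuple(sorted((pos[x], pos[x ^ 1]))) for x in cyc}

    def interleave(c1, c2):
        # Chords cross iff exactly one endpoint of one lies strictly inside the other.
        return any(a < p < b < q or p < a < q < b
                   for a, b in chords(c1) for p, q in chords(c2))

    def is_good(cyc):
        # Proxy for "has two divergent reality edges" (see lead-in).
        gaps = sorted({pos[x] // 2 for x in cyc})        # reality edge k sits at 2k, 2k + 1
        return any(len(rd_cycles(reverse(alpha, g1 + 1, g2))[1]) == len(cycles) + 1
                   for g1, g2 in combinations(gaps, 2))

    result, assigned = [], set()
    for s in range(len(proper)):
        if s in assigned:
            continue
        stack, comp = [s], []
        assigned.add(s)
        while stack:                                     # depth-first search
            u = stack.pop()
            comp.append(u)
            for v in range(len(proper)):
                if v not in assigned and interleave(proper[u], proper[v]):
                    assigned.add(v)
                    stack.append(v)
        good = any(is_good(proper[u]) for u in comp)
        result.append((len(comp), "good" if good else "bad"))
    return result


print(components((3, -2, -1, 4, -5)))  # [(1, 'good'), (1, 'good')]
print(components((3, 2, 1)))           # [(2, 'bad')]
```

On the slides' reality-desire example, the two proper cycles are both good and do not interleave. For (3, 2, 1) with every block pointing right, the two cycles interleave into a single bad component.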

RD to Interleave

Gray filled-in circles are good cycles

Choosing a Reversal

Choosing a Reversal

• C is the only good cycle
• Let e = (L, +3), f = (-3, -4), g = (-1, +2)
• f & g converge, so not a good choice

e and g

• e and g diverge and produce 2 good components with 1 cycle each

e and f

• e and f produce a single good component with two cycles

Reversal Choosing cont’d

• A reversal characterized by two divergent edges of the same cycle is a sorting reversal iff its application does not lead to the creation of bad components

• So the reversals characterized by e & f or by e & g are both acceptable

Bad Components

• Good components can be sorted as in the previous slides

• First step in dealing with bad components is to classify them

• Component Y “separates” components X and Z if all chords in RD(α) that link a terminal in X to one in Z cross a desire edge of Y

• E separates F and D
• What are some other separations?

Definitions

• Hurdle – bad component that does not separate two bad components

• Nonhurdle – bad component that separates two bad components

Definitions cont’d

• X protects nonhurdle Y if removal of X would cause Y to become a hurdle
– i.e., whenever Y separates 2 bad components, X is one of them

• Superhurdle – hurdle that protects a nonhurdle

• Simple hurdle – hurdle that does not protect a nonhurdle

F protects E

Classification

Formula for Reversal Distance

d(α) = n + 1 - c(α) + h(α) + f(α)

h(α) = number of hurdles
f(α) = 1 if α is a fortress, 0 otherwise
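The formula can be sanity-checked on small cases by comparing an exact brute-force distance with the cycle bound n + 1 - c(α); by the formula, the difference must equal h(α) + f(α). This is a sketch assuming β is the identity, a signed-integer encoding, and a hypothetical terminal numbering (L = 0, R = 2n + 1, -a = 2a - 1, +a = 2a); it does not compute hurdles itself. The first permutation is the reality-desire example from the earlier slides, where the bound is tight. For (3, 2, 1) with all blocks pointing right, the brute force finds one reversal more than the cycle bound.

```python
from collections import deque
from itertools import permutations, product


def reverse(alpha, i, j):
    """alpha[i, j] for 1-based i <= j: reverse the slice and flip orientations."""
    a = list(alpha)
    a[i - 1:j] = [-x for x in reversed(a[i - 1:j])]
    return tuple(a)


def count_cycles(alpha):
    """c(alpha) for beta = identity, using the assumed terminal numbering."""
    seq = [0]
    for a in alpha:
        tail, tip = 2 * abs(a) - 1, 2 * abs(a)
        seq += [tail, tip] if a > 0 else [tip, tail]
    seq.append(2 * len(alpha) + 1)
    reality = {}
    for k in range(0, len(seq), 2):
        reality[seq[k]], reality[seq[k + 1]] = seq[k + 1], seq[k]
    seen, cycles = set(), 0
    for start in seq:
        if start not in seen:
            cycles += 1
            x = start
            while x not in seen:
                y = reality[x]
                seen.update((x, y))
                x = y ^ 1               # desire partner
    return cycles


def reversal_distance(alpha):
    """Exact distance to the identity by breadth-first search (small n only)."""
    alpha, n = tuple(alpha), len(alpha)
    target = tuple(range(1, n + 1))
    dist, queue = {alpha: 0}, deque([alpha])
    while queue:
        cur = queue.popleft()
        if cur == target:
            return dist[cur]
        for i in range(1, n + 1):
            for j in range(i, n + 1):
                nxt = reverse(cur, i, j)
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)


def gap(alpha):
    """d(alpha) - (n + 1 - c(alpha)); by the formula this should be h(alpha) + f(alpha)."""
    return reversal_distance(alpha) - (len(alpha) + 1 - count_cycles(alpha))


print(gap((3, -2, -1, 4, -5)))   # 0: the cycle bound is tight here
print(gap((3, 2, 1)))            # 1: one extra reversal

# Exhaustive check for n = 3: the cycle bound never exceeds the true distance.
all_perms = [tuple(s * a for s, a in zip(signs, p))
             for p in permutations(range(1, 4))
             for signs in product((1, -1), repeat=3)]
print(all(gap(a) >= 0 for a in all_perms))   # True
```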

Fortress

• A fortress is a permutation with an odd number of hurdles, all of which are superhurdles. It requires an extra reversal, since a nonhurdle will become a hurdle at some point

Definitions

• X and Y are “opposite” hurdles when we find the same number of hurdles walking around the circle counterclockwise from X to Y as we do clockwise.

Note: only possible when the number of hurdles is even

Hurdle Cutting

• Reverse edges in the same component
• Used only with simple hurdles

Final Algorithm
While α is not β:
  If there is a good component in RD(α) then
    pick two divergent edges in this component, ensuring that the reversal does not create a bad component
  Else if h(α) is even then
    merge two opposite hurdles
  Else if there is a simple hurdle then
    apply a reversal cutting this hurdle
  Else // fortress
    merge any two hurdles

Fortress Handling

Fortress, so choose any 2 hurdles and merge

C is good

Complexity

• Constructing RD(α) takes linear time

• Finding the cycles is O(n)

• For each cycle, determine good/bad
– This is O(n) per cycle, so O(n^2) total

• Determining interleaving can be done in O(n^2)

• Counting hurdles etc. can be done in linear time with the other knowledge

Complexity cont’d

• Finding a sorting reversal for good components is the most expensive step, since we need to ensure we don't create bad components

• Since a reversal is identified with a pair of edges, there are O(n^2) candidate reversals

• For each one, checking the resulting permutation takes O(n^2) time, so O(n^4) total

• We need to do this dβ(α) times, and dβ(α) is O(n) (each block can be moved into place with one reversal and flipped with another), so O(n^5) all together

Final Slide, Huzzah!
• Found accurate distance measure for genome movements
• Found a poly-time solution for solving the problem
• Played with fun graphs