Sorting by reversals Bogdan Pasaniuc Dept. of Computer Science & Engineering


Overview

- Biological background
- Definitions
- Unsigned permutations: approximation algorithm
- Sorting signed permutations: simplified algorithm


What is the evolutionary path? What is the ancestral chromosome?

Chromosomes → lists of genes → permutations

[Figure: an unknown ancestor chromosome and two descendants, the human X chromosome and the mouse X chromosome.]


Mutations at the chromosome level:
- Inversion: (1 2 3 4 5 6 7) → (1 4 3 2 5 6 7)
- Transposition: (1 2 3 4 5 6 7) → (1 5 6 2 3 4 7)
- Duplication: (1 2 3 4 5 6 7) → (1 2 3 4 5 2 3 4 6 7)

Inversions:
- also known as reversals
- the most common rearrangement
- most often reflect the differences between and within species

What is the minimum number of reversals required to transform one permutation into another?

The reversal distance is a good approximation of the evolutionary distance.
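As a concrete reference point, the sketch below (our own Python illustration, not code from the slides) applies a reversal to a tuple and computes the exact reversal distance of a small unsigned permutation by breadth-first search over all reversals. `reverse` and `reversal_distance` are hypothetical helper names; the search can visit up to n! permutations, so it is only usable for tiny inputs.

```python
from collections import deque


def reverse(perm, i, j):
    """Return perm with the segment perm[i..j] (0-based, inclusive) reversed."""
    return perm[:i] + perm[i:j + 1][::-1] + perm[j + 1:]


def reversal_distance(perm):
    """Exact minimum number of reversals that sort perm (brute-force BFS, tiny n only)."""
    start, target = tuple(perm), tuple(sorted(perm))
    dist = {start: 0}
    queue = deque([start])
    while queue:
        p = queue.popleft()
        if p == target:
            return dist[p]
        for i in range(len(p)):
            for j in range(i + 1, len(p)):
                q = reverse(p, i, j)
                if q not in dist:  # first visit in BFS order is a shortest path
                    dist[q] = dist[p] + 1
                    queue.append(q)


# The inversion example above: reverse positions 1..3.
print(reverse((1, 2, 3, 4, 5, 6, 7), 1, 3))  # (1, 4, 3, 2, 5, 6, 7)
print(reversal_distance((3, 4, 1, 2)))       # 2
```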


[Figure: Reversals. A chromosome drawn as gene blocks in the order 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.]


[Figure: Reversals. Reversing the blocks 4 to 8 gives 1, 2, 3, 8, 7, 6, 5, 4, 9, 10.]


[Figure: Breakpoints in 1, 2, 3, 8, 7, 6, 5, 4, 9, 10, between 3 and 8 and between 4 and 9.]


Breakpoint: a pair of adjacent positions $(i, i+1)$ such that $|\pi_i - \pi_{i+1}| \neq 1$, i.e. the values $\pi_i$ and $\pi_{i+1}$ are not consecutive.

If $|\pi_i - \pi_{i+1}| = 1$, the values $\pi_i$ and $\pi_{i+1}$ are adjacent.

Introduce $\pi_0 = 0$ and $\pi_{n+1} = n+1$: $(0, 1)$ is a breakpoint if $\pi_1 \neq 1$, and $(n, n+1)$ is a breakpoint if $\pi_n \neq n$.

A reversal affects breakpoints only at its two endpoints, so any reversal can remove or induce at most 2 breakpoints.
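A minimal Python sketch of these definitions, assuming a permutation of 1..n given as a list or tuple: `breakpoints` frames it with 0 and n+1 and counts neighbours that are not consecutive, and the final loop checks, on every permutation of size 5, that one reversal changes the count by at most 2. The helper names are our own.

```python
from itertools import combinations, permutations


def breakpoints(perm):
    """Phi(pi): adjacent pairs of the framed sequence 0, pi_1..pi_n, n+1 that differ by more than 1."""
    p = [0] + list(perm) + [len(perm) + 1]
    return sum(1 for a, b in zip(p, p[1:]) if abs(a - b) != 1)


def reverse(perm, i, j):
    """Reverse perm[i..j] (0-based, inclusive) of the unframed permutation."""
    perm = list(perm)
    perm[i:j + 1] = perm[i:j + 1][::-1]
    return perm


print(breakpoints([1, 2, 3, 8, 7, 6, 5, 4, 9, 10]))  # 2, at (3,8) and (4,9)

# Only the two endpoints of a reversal change, so Phi moves by at most 2.
for perm in permutations(range(1, 6)):
    for i, j in combinations(range(5), 2):
        assert abs(breakpoints(reverse(perm, i, j)) - breakpoints(perm)) <= 2
```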


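Strip: a maximal run of elements with no breakpoint between them; a strip is either increasing or decreasing. By convention, a strip with a single element is decreasing, except for the strips of $\pi_0 = 0$ and $\pi_{n+1} = n+1$, which are increasing.

The identity permutation has no breakpoints, and every other permutation has at least one breakpoint.

Greedy: at each step, remove the maximum number of breakpoints.

Let $\Phi(\pi)$ be the number of breakpoints in $\pi$.
while $\Phi(\pi) > 0$:
    choose a reversal that removes the maximum number of breakpoints (if there is a tie, favor the reversal that leaves a decreasing strip)

Greedy ends in at most $\Phi(\pi)$ steps.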
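The greedy rule can be sketched in Python as below. This is an illustrative brute-force version, not Kececioglu and Sankoff's implementation: it re-evaluates all $\binom{n}{2}$ reversals at every step, and it counts a single-element strip as decreasing except the strips of 0 and n+1 (the usual convention). `frame`, `phi`, `has_decreasing_strip` and `greedy_sort` are hypothetical names; reversals are reported as inclusive positions in the framed permutation.

```python
from itertools import combinations


def frame(perm):
    """Add pi_0 = 0 and pi_{n+1} = n+1 around an unsigned permutation of 1..n."""
    return [0] + list(perm) + [len(perm) + 1]


def phi(p):
    """Number of breakpoints in a framed permutation."""
    return sum(1 for a, b in zip(p, p[1:]) if abs(a - b) != 1)


def has_decreasing_strip(p):
    """Strips are maximal breakpoint-free runs; singletons count as decreasing,
    except the strips at positions 0 and len(p) - 1 (holding 0 and n+1)."""
    last, start = len(p) - 1, 0
    for k in range(len(p)):
        if k == last or abs(p[k] - p[k + 1]) != 1:  # the strip p[start..k] ends here
            if k > start and p[start + 1] == p[start] - 1:
                return True
            if k == start and start not in (0, last):
                return True
            start = k + 1
    return False


def greedy_sort(perm):
    """Return the reversals (i, j), in framed positions, chosen by the greedy rule."""
    p, n, steps = frame(perm), len(perm), []
    while phi(p) > 0:
        best = None
        for i, j in combinations(range(1, n + 1), 2):
            q = p[:i] + p[i:j + 1][::-1] + p[j + 1:]
            # Primary key: breakpoints removed; tie-break: the result keeps a decreasing strip.
            key = (phi(p) - phi(q), has_decreasing_strip(q))
            if best is None or key > best[0]:
                best = (key, (i, j), q)
        _, step, p = best
        steps.append(step)
    return steps


print(greedy_sort([3, 4, 1, 2]))  # [(1, 3), (2, 4)]
```

Replaying the returned reversals on the framed permutation yields the identity; here two reversals suffice although $\Phi = 3$.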


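Quality of approximation

Lemma 1: Every permutation with a decreasing strip has a reversal that removes one breakpoint.

Proof: let $i$ be the smallest element that lies in a decreasing strip; it is the last element of its strip.
- $i-1$ is not in a decreasing strip (by the choice of $i$), so it is the last element of an increasing strip that lies to the left or to the right of $i$.
- The reversal that places $i$ next to $i-1$ removes the breakpoint that followed $i-1$ (if it lies to the left) or $i$ (if it lies to the right).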


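Lemma 2: Let $\pi$ have a decreasing strip. If every reversal that removes one breakpoint leaves a permutation with no decreasing strip, then $\pi$ has a reversal that removes two breakpoints.

Proof: let $i$ be the smallest element in a decreasing strip and $\rho_i$ the reversal that places $i$ next to $i-1$. The increasing strip containing $i-1$ must lie to the left of $i$; otherwise $\rho_i$ would leave the decreasing strip of $i$ untouched.

Let $j$ be the largest element in a decreasing strip and $\rho_j$ the reversal that places $j$ next to $j+1$. For the same reason, the increasing strip containing $j+1$ must lie to the right of $j$.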


Fact 1: $\rho_i$ and $\rho_j$ must overlap.
- $j$ must lie in $\rho_i$; if it does not, then $\pi \circ \rho_i$ still has the decreasing strip that contains $j$.
- $i$ must lie in $\rho_j$; if it does not, then $\pi \circ \rho_j$ still has the decreasing strip that contains $i$.


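Fact 2: $\rho_i = \rho_j$.

If $\rho_i \neq \rho_j$, some strip lies in one of the two reversals but not in the other. Say it lies in $\rho_j$ but not in $\rho_i$:
- if it is an increasing strip, $\pi \circ \rho_j$ has a decreasing strip ($\rho_j$ reverses it)
- if it is a decreasing strip, $\pi \circ \rho_i$ has a decreasing strip ($\rho_i$ leaves it untouched)
The case of a strip in $\rho_i$ but not in $\rho_j$ is symmetric.

Then $\rho = \rho_i = \rho_j$ removes 2 breakpoints.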


Lemma 3: Greedy sorts a permutation with a decreasing strip in at most $\Phi(\pi) - 1$ reversals.

Observation: if the permutation obtained at some step has no decreasing strip, then the reversal that produced it removed 2 breakpoints (by Lemma 2 and the tie-breaking rule). One reversal can then create a decreasing strip, and by Lemma 1 there exists again a reversal that removes at least one breakpoint.

Theorem 1: Greedy sorts every permutation in at most $\Phi(\pi)$ reversals.
- If $\pi$ has a decreasing strip: at most $\Phi(\pi) - 1$ reversals (Lemma 3).
- If $\pi$ has no decreasing strip: every reversal induces a decreasing strip, so after one step Lemma 3 applies, giving at most $\Phi(\pi)$ reversals.


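Corollary: Greedy is a 2-approximation algorithm. Every reversal removes at most 2 breakpoints, so $\mathrm{OPT}(\pi) \ge \Phi(\pi)/2 \ge \mathrm{Greedy}(\pi)/2$, that is, $\mathrm{Greedy}(\pi) \le 2 \cdot \mathrm{OPT}(\pi)$.

Runtime:
- number of steps: $O(n)$
- at each step we need to analyze $\binom{n}{2} = O(n^2)$ reversals (the effect of each on $\Phi$ is checked in constant time, since only its endpoints matter)
- total runtime: $O(n^3)$
- analyzing only the reversals that remove breakpoints brings the total down to $O(n^2)$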


Signed permutations: a reversal also changes the signs of the reversed elements:
(1,2,3,4,5,6,7,8,9,10) → (1,2,3,-8,-7,-6,-5,-4,9,10)

Problem: given a signed permutation, find a minimum-length series of reversals that transforms it into the identity permutation.
- polynomial algorithm (Hannenhalli & Pevzner ’95)
- it relies on several intermediary constructions
- these constructions have since been simplified
- first completely elementary treatment of the problem (Bergeron ’05)
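To pin the problem down, here is a brute-force Python sketch (explicitly not the Hannenhalli-Pevzner algorithm) that applies a signed reversal and finds the minimum number of signed reversals by breadth-first search. `signed_reverse` and `signed_distance` are our own hypothetical names, inputs are assumed to be signed permutations of 1..n given as tuples, and the search space grows as $2^n n!$, so this only works for very small n.

```python
from collections import deque


def signed_reverse(perm, i, j):
    """Reverse perm[i..j] (0-based, inclusive) and flip the sign of every reversed element."""
    return perm[:i] + tuple(-x for x in reversed(perm[i:j + 1])) + perm[j + 1:]


def signed_distance(perm):
    """Minimum number of signed reversals sorting perm into (1, ..., n) (brute-force BFS)."""
    start = tuple(perm)
    target = tuple(range(1, len(start) + 1))
    dist, queue = {start: 0}, deque([start])
    while queue:
        p = queue.popleft()
        if p == target:
            return dist[p]
        for i in range(len(p)):
            for j in range(i, len(p)):  # j == i just flips one element's sign
                q = signed_reverse(p, i, j)
                if q not in dist:
                    dist[q] = dist[p] + 1
                    queue.append(q)


print(signed_reverse(tuple(range(1, 11)), 3, 7))  # (1, 2, 3, -8, -7, -6, -5, -4, 9, 10)
print(signed_distance((2, 1)))                     # 3
```

Note that (2, 1) needs one reversal as an unsigned permutation but three as a signed one, because every reversal also flips signs.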


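Oriented pair: a pair of consecutive integers (in absolute value) with different signs.

Example: (0,3,1,6,5,-2,4,7) has the oriented pairs (3,-2) and (1,-2).

Oriented pairs induce reversals that create consecutive integers, i.e. a pair $\pi_k, \pi_{k+1}$ with $\pi_{k+1} = \pi_k + 1$. For an oriented pair $(\pi_i, \pi_j)$ with $i < j$, reverse $\pi_i \dots \pi_{j-1}$ if $\pi_i + \pi_j = +1$, and $\pi_{i+1} \dots \pi_j$ if $\pi_i + \pi_j = -1$:
- (3,-2): (0,3,1,6,5,-2,4,7) → (0,-5,-6,-1,-3,-2,4,7)
- (1,-2): (0,3,1,6,5,-2,4,7) → (0,3,1,2,-5,-6,4,7)

Oriented reversal: a reversal induced by an oriented pair, i.e. one that creates consecutive integers.

Score of a reversal: the number of oriented pairs in the resulting permutation.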
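The following Python sketch lists oriented pairs, applies the induced reversal, and computes the score, assuming a framed tuple (0, ..., n+1) with 0 treated as positive. It follows the rule used in Bergeron's presentation: for an oriented pair $(\pi_i, \pi_j)$ with $i < j$, reverse $\pi_i \dots \pi_{j-1}$ when $\pi_i + \pi_j = +1$ and $\pi_{i+1} \dots \pi_j$ when $\pi_i + \pi_j = -1$; the score is taken to be the number of oriented pairs in the resulting permutation. The function names are hypothetical.

```python
def oriented_pairs(p):
    """Positions (i, j), i < j, of pairs with consecutive absolute values and
    different signs (0 counts as positive). p is framed: 0, ..., n+1."""
    pos = {abs(x): k for k, x in enumerate(p)}
    pairs = []
    for v in range(len(p) - 1):  # absolute values v and v+1
        i, j = sorted((pos[v], pos[v + 1]))
        if (p[i] >= 0) != (p[j] >= 0):
            pairs.append((i, j))
    return pairs


def oriented_reversal(p, i, j):
    """Reverse and negate the segment induced by the oriented pair at positions i < j."""
    a, b = (i, j - 1) if p[i] + p[j] == 1 else (i + 1, j)  # else: p[i] + p[j] == -1
    return p[:a] + tuple(-x for x in reversed(p[a:b + 1])) + p[b + 1:]


def score(p, i, j):
    """Number of oriented pairs left after the induced reversal."""
    return len(oriented_pairs(oriented_reversal(p, i, j)))


pi = (0, 3, 1, 6, 5, -2, 4, 7)
for i, j in oriented_pairs(pi):
    print((pi[i], pi[j]), oriented_reversal(pi, i, j), score(pi, i, j))
# (1, -2) (0, 3, 1, 2, -5, -6, 4, 7) 2
# (3, -2) (0, -5, -6, -1, -3, -2, 4, 7) 4
```

With this rule the reversal induced by (3,-2) has the maximal score.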


Algorithm 1: as long as $\pi$ has an oriented pair, apply the oriented reversal with the maximal score.

The output is a permutation with only positive elements:
- 0 and n+1 are positive
- if there is a negative element, there exists an oriented pair (going through the absolute values 0, 1, ..., n+1, the sign must change somewhere)

Claim 1: if Algorithm 1 applies $k$ reversals to $\pi$, yielding $\pi'$, then $d(\pi) = d(\pi') + k$.
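Algorithm 1 can be sketched in Python as a loop; the block repeats small helpers for oriented pairs and induced reversals so that it runs on its own. Ties are broken by taking the first maximal pair in the order `oriented_pairs` lists them, which is our arbitrary choice. `algorithm1` is a hypothetical name that returns the resulting permutation together with the number k of reversals applied.

```python
def oriented_pairs(p):
    """Positions (i, j), i < j, of pairs with consecutive absolute values and different signs."""
    pos = {abs(x): k for k, x in enumerate(p)}
    pairs = []
    for v in range(len(p) - 1):
        i, j = sorted((pos[v], pos[v + 1]))
        if (p[i] >= 0) != (p[j] >= 0):
            pairs.append((i, j))
    return pairs


def oriented_reversal(p, i, j):
    """Reverse and negate the segment induced by the oriented pair at positions i < j."""
    a, b = (i, j - 1) if p[i] + p[j] == 1 else (i + 1, j)
    return p[:a] + tuple(-x for x in reversed(p[a:b + 1])) + p[b + 1:]


def algorithm1(p):
    """Apply maximal-score oriented reversals until no oriented pair is left; return (result, k)."""
    p, k = tuple(p), 0
    while True:
        pairs = oriented_pairs(p)
        if not pairs:
            return p, k
        # Score = number of oriented pairs after the reversal; max() keeps the first maximum.
        i, j = max(pairs, key=lambda ij: len(oriented_pairs(oriented_reversal(p, *ij))))
        p = oriented_reversal(p, i, j)
        k += 1


print(algorithm1((0, 3, 1, 6, 5, -2, 4, 7)))  # ((0, 1, 2, 3, 4, 5, 6, 7), 5)
```

The loop always terminates: each oriented reversal creates an adjacency and destroys none.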


Sorting positive permutations:
- positive permutation: a signed permutation whose elements are all positive
- circular order: 0 is the successor of n+1
- reduced: the permutation contains no consecutive elements

Framed interval in $\pi$: an interval of the form $i\ \pi_{j+1}\ \pi_{j+2} \dots \pi_{j+k-1}\ (i+k)$ such that $i < \pi_{j+1}, \pi_{j+2}, \dots, \pi_{j+k-1} < i+k$.

Example: (0 2 5 4 3 6 1 7)

Hurdle: a framed interval that contains no shorter framed interval.

Example: (0 2 5 4 3 6 1 7)
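A brute-force Python sketch of framed intervals and hurdles for a positive, reduced, framed permutation. It assumes the circular order is read with values modulo n+2, so that, for instance, 6 1 7 0 2 counts as the framed interval 6 9 7 8 10; that reading, the `(start, length)` representation and the function names are our own. Intervals of length 2 are ignored.

```python
def framed_intervals(p):
    """All framed intervals of a positive framed permutation p = (0, ..., n+1), read
    circularly with values modulo n+2. Each is returned as (start position, length)."""
    m = len(p)
    found = []
    for s in range(m):
        for length in range(3, m + 1):
            # Values relative to p[s]; a framed interval reads 0, {1..length-2}, length-1.
            v = [(p[(s + t) % m] - p[s]) % m for t in range(length)]
            if v[-1] == length - 1 and sorted(v[1:-1]) == list(range(1, length - 1)):
                found.append((s, length))
    return found


def hurdles(p):
    """Framed intervals that contain no shorter framed interval, as tuples of elements."""
    m = len(p)
    fis = framed_intervals(p)

    def inside(inner, outer):
        offset = (inner[0] - outer[0]) % m  # circular distance from outer start to inner start
        return inner[1] < outer[1] and offset + inner[1] <= outer[1]

    return [tuple(p[(s + t) % m] for t in range(length))
            for s, length in fis if not any(inside(f, (s, length)) for f in fis)]


print(hurdles((0, 2, 5, 4, 3, 6, 1, 7)))  # [(2, 5, 4, 3, 6), (6, 1, 7, 0, 2)]
```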


Idea: create oriented pairs, then apply Algorithm 1.

Operations on hurdles:
- Hurdle cutting: $i\ \pi_{j+1}\ \pi_{j+2} \dots (i+1) \dots \pi_{j+k-1}\ (i+k)$, e.g. (0 1 4 3 2 5) → (0 -3 -4 -1 2 5)
- Hurdle merging: $i \dots (i+k) \dots i' \dots (i'+k')$, e.g. (0 2 5 4 3 6 1 7)

Simple hurdle: cutting it decreases the number of hurdles. Super hurdle: cutting it increases the number of hurdles.

(0 2 5 4 3 -6 1 7)


Algorithm 2:
- if $\pi$ has $2k$ hurdles: merge any two non-consecutive hurdles
- if $\pi$ has $2k+1$ hurdles: cut one simple hurdle (if there is none, merge any two non-consecutive hurdles)

Claim 2: Algorithms 1 and 2 together optimally sort any signed permutation.


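Proof of the claims: the breakpoint graph

1. Replace each positive element $x$ by $2x-1, 2x$ and each negative element $-x$ by $2x, 2x-1$ (0 stays 0 and $n+1$ becomes $2n+1$):
(0 -1 3 5 4 6 -2 7) → (0 2 1 5 6 9 10 7 8 11 12 4 3 13)
2. Arcs: join each pair of values $2i$ and $2i+1$.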


Arcs are oriented if they span an odd number of elements.

Arc overlap graph:
- vertices: the arcs of the breakpoint graph
- edges: between arcs that overlap (their intervals intersect, but neither contains the other)
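The breakpoint-graph construction can be sketched in Python as follows, assuming (our reading of the slides) that arcs join the values 2i and 2i+1 of the doubled permutation, and dropping arcs between adjacent values, which correspond to adjacencies. The function names are hypothetical; arcs are returned as (left position, right position, oriented) and the overlap graph as a set of arc-index pairs.

```python
def double(p):
    """Map a framed signed permutation (0, ..., n+1) to its unsigned doubling:
    x > 0 -> 2x-1, 2x ; -x -> 2x, 2x-1 ; 0 -> 0 ; n+1 -> 2n+1."""
    n = len(p) - 2
    out = [0]
    for x in p[1:-1]:
        out += [2 * x - 1, 2 * x] if x > 0 else [-2 * x, -2 * x - 1]
    return out + [2 * n + 1]


def arcs(p):
    """One arc per pair of values (2i, 2i+1) that are not adjacent in the doubling."""
    d = double(p)
    pos = {v: k for k, v in enumerate(d)}
    result = []
    for i in range(len(p) - 1):
        l, r = sorted((pos[2 * i], pos[2 * i + 1]))
        if r - l > 1:
            result.append((l, r, (r - l - 1) % 2 == 1))  # oriented: odd number of elements spanned
    return result


def overlap_edges(arc_list):
    """Index pairs of arcs whose intervals intersect without either containing the other."""
    edges = set()
    for a in range(len(arc_list)):
        for b in range(a + 1, len(arc_list)):
            (la, ra, _), (lb, rb, _) = arc_list[a], arc_list[b]
            if la < lb < ra < rb or lb < la < rb < ra:
                edges.add((a, b))
    return edges


p = (0, -1, 3, 5, 4, 6, -2, 7)
print(double(p))                     # [0, 2, 1, 5, 6, 9, 10, 7, 8, 11, 12, 4, 3, 13]
print([o for _, _, o in arcs(p)])    # [True, False, True, False, False, False, False]
print(sorted(overlap_edges(arcs(p))))
```

The two oriented arcs match the two oriented pairs of this permutation, (0,-1) and (-2,3).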


Every oriented vertex corresponds to an oriented pair.

Fact 2: the score of the oriented reversal induced by an oriented vertex $v$ is $T + U - O - 1$, where
- $T$ = number of oriented vertices
- $U$ = number of unoriented vertices adjacent to $v$
- $O$ = number of oriented vertices adjacent to $v$

Oriented component: a connected component that contains an oriented vertex.

Safe reversal: a reversal that does not create new unoriented components.
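One way to see where the formula comes from (a sketch of the standard argument, not a proof): the reversal induced by an oriented vertex $v$ complements the subgraph induced by $v$ and its neighbours. So $v$ becomes isolated and unoriented, its $O$ oriented neighbours become unoriented, its $U$ unoriented neighbours become oriented, and all other vertices keep their orientation. Counting oriented vertices, which are the oriented pairs, after the reversal:

```latex
\begin{aligned}
\mathrm{score}(v) &= \underbrace{T}_{\text{oriented before}}
  - \underbrace{1}_{v\ \text{itself}}
  - \underbrace{O}_{\text{oriented neighbours}}
  + \underbrace{U}_{\text{unoriented neighbours}} \\
&= T + U - O - 1.
\end{aligned}
```

For example, with arcs joining the values $2i$ and $2i+1$, the permutation (0,3,1,6,5,-2,4,7) has $T = 2$, and the vertex of the pair (3,-2) has $U = 4$ and $O = 1$, giving $2 + 4 - 1 - 1 = 4$: the number of oriented pairs in (0,-5,-6,-1,-3,-2,4,7).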


Theorem (Hannenhalli & Pevzner): any sequence of oriented safe reversals is optimal.

Theorem: an oriented reversal of maximal score is safe.

Therefore Claim 1 holds. Claim 2 is proven in a similar manner.


J. Kececioglu and D. Sankoff. Exact and approximation algorithms for sorting by reversals, with application to genome rearrangement. 1995.

A. Bergeron. A very elementary presentation of the Hannenhalli-Pevzner theory. 2005.

A. Caprara. Sorting by reversals is difficult. 1997.

S. Hannenhalli and P. Pevzner. Transforming cabbage into turnip: polynomial algorithm for sorting signed permutations by reversals. 1999.