Faster Sorting by Reversals Eric Tannier, Marie-France Sagot INRIA, Lyon, France

Page 1

Faster Sorting by Reversals

Eric Tannier, Marie-France Sagot

INRIA, Lyon, France

Page 2

Motivations

Genome Rearrangements

Human

Mouse

Page 3

Sorting by Reversals

0 7 5 3 -1 -6 -2 4 8 (HS)

0 1 2 3 4 5 6 7 8 (MM)

Page 4

Sorting by Reversals

0 7 5 3 -1 -6 -2 4 8 (HS)

0 1 2 3 4 5 6 7 8 (MM)

0 1 -3 -5 -7 -6 -2 4 8

Page 5

Sorting by Reversals

0 7 5 3 -1 -6 -2 4 8 (HS)

0 1 2 3 4 5 6 7 8 (MM)

0 1 -3 -5 -7 -6 -2 4 8

0 1 -3 -5 -4 2 6 7 8

Page 6

Sorting by Reversals

0 7 5 3 -1 -6 -2 4 8 (HS)

0 1 2 3 4 5 6 7 8 (MM)

0 1 -3 -5 -7 -6 -2 4 8

0 1 -3 -5 -4 2 6 7 8

0 1 -3 -2 4 5 6 7 8
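
The rows above can be replayed in code. What follows is a minimal Python sketch, not taken from the slides: the helper names `reverse_segment` and `replay` are hypothetical, the index pairs are assumptions inferred from the displayed rows, and the fourth pair is the evident last step that the rows do not show.

```python
def reverse_segment(perm, i, j):
    """Return a copy of perm with perm[i..j] (inclusive) reversed and every sign flipped."""
    return perm[:i] + [-x for x in reversed(perm[i:j + 1])] + perm[j + 1:]


def replay(perm, steps):
    """Apply the reversals in order and return every intermediate row."""
    rows = [perm]
    for i, j in steps:
        rows.append(reverse_segment(rows[-1], i, j))
    return rows


human = [0, 7, 5, 3, -1, -6, -2, 4, 8]   # (HS)
mouse = list(range(9))                   # (MM)

# Assumed index pairs (0-based, inclusive), inferred from the rows on the slides.
# The last pair (2, 3) is not shown on the slides; it completes the sort.
steps = [(1, 4), (4, 7), (3, 5), (2, 3)]

rows = replay(human, steps)
for row in rows:
    print(" ".join(str(x) for x in row))
```

Each reversal both reverses the order of a block and flips the sign of every element in it, which is why the block 7 5 3 -1 becomes 1 -3 -5 -7 in the first step.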

Page 8

History

1995 Hannenhalli and Pevzner: first polynomial algorithm, O(n^4)

1996 Berman and Hannenhalli: complexity improvement, O(n^2 α(n))

1997 Kaplan, Shamir and Tarjan: complexity improvement, O(n^2)

1997 Caprara: NP-completeness of the unsigned problem

2003 Bergeron: simple presentation

2003 Ozery-Flato and Shamir: "It is a central problem in the study of genome rearrangements whether one can obtain a subquadratic algorithm for sorting by reversals"

Page 9

The Breakpoint Graph

0 7 5 3 -1 -6 -2 4 8

0 -1 -2 3 4 5 -6 7 8

Reality

Desire

Page 10

The Breakpoint Graph

4 5: 1-cycle, adjacency

3 -4 5: 2-cycle

3 -4 5 6: 3-cycle

3 -4 -4.5 5 6: two 2-cycles

Page 11

The effect of a reversal on the cycles

0 7 5 3 -1 -6 -2 4 8

0 -1 -2 3 4 5 -6 7 8

0 1 -2 -3 4 -5 -6 -7 8

0 1 -3 -5 -7 -6 -2 4 8

0 7 5 3 -1 -6 -2 4 8

0 -1 -2 3 4 5 -6 7 8

0 7 -4 2 6 1 -3 -5 8

0 1 2 -3 -4 -5 6 7 8

Oriented cycle

Non-oriented cycle
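
In each pair of rows on these slides, the second row appears to list the elements in increasing order, each carrying the sign it has in the first row. This is an observation inferred from the examples, not something the slides state. Under that assumption, a reversal in the first row flips exactly the signs of the reversed elements in the second row. A minimal Python sketch (the names `reverse_segment` and `signs_by_value` are hypothetical):

```python
def reverse_segment(perm, i, j):
    """Return a copy of perm with perm[i..j] (inclusive) reversed and every sign flipped."""
    return perm[:i] + [-x for x in reversed(perm[i:j + 1])] + perm[j + 1:]


def signs_by_value(perm):
    """List the elements in increasing absolute value, each with the sign it has in perm.

    Assumption: this is how the second row of each pair on the slides is built.
    """
    return sorted(perm, key=abs)


before = [0, 7, 5, 3, -1, -6, -2, 4, 8]
after = reverse_segment(before, 1, 4)      # reverse the block 7 5 3 -1

print(signs_by_value(before))  # [0, -1, -2, 3, 4, 5, -6, 7, 8]
print(after)                   # [0, 1, -3, -5, -7, -6, -2, 4, 8]
print(signs_by_value(after))   # [0, 1, -2, -3, 4, -5, -6, -7, 8]
```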

Page 12

In the Breakpoint Graph

Oriented cycle = cycle with blue edges joining elements of different signs

Component = set of cycles that do not cross any cycle outside the set

Oriented Component = component with an oriented cycle

Unoriented Component = component with no oriented cycle

Page 13

The theorem of Hannenhalli and Pevzner

d = n + 1 - c + t

where

d = minimum number of reversals

n = size of the permutation

c = number of cycles in the breakpoint graph

t = number of reversals to clear unoriented components
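
Since d is defined as a minimum over all sequences of reversals, it can be checked directly on small inputs. The sketch below is a brute-force breadth-first search, not the Hannenhalli and Pevzner method, and it takes exponential time. The function names and the `max_depth` cutoff are assumptions made for illustration.

```python
def neighbors(perm):
    """Yield every permutation reachable by one signed reversal.

    The first and last elements (the frame 0 and n + 1) are never moved.
    """
    last = len(perm) - 1
    for i in range(1, last):
        for j in range(i, last):
            yield perm[:i] + tuple(-x for x in reversed(perm[i:j + 1])) + perm[j + 1:]


def reversal_distance_bruteforce(perm, max_depth=6):
    """Return the minimum number of reversals sorting perm, or None beyond max_depth."""
    start = tuple(perm)
    target = tuple(range(len(perm)))
    if start == target:
        return 0
    seen = {start}
    frontier = [start]
    for depth in range(1, max_depth + 1):
        next_frontier = []
        for p in frontier:
            for q in neighbors(p):
                if q == target:
                    return depth
                if q not in seen:
                    seen.add(q)
                    next_frontier.append(q)
        frontier = next_frontier
    return None


d = reversal_distance_bruteforce([0, 7, 5, 3, -1, -6, -2, 4, 8])
print(d)
```

This is only practical for very short permutations, which is the reason a closed formula for d matters.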

Page 14

The theorem of Hannenhalli and Pevzner

d = n + 1 - c (when there is no unoriented component)

where

d = minimum number of reversals

n = size of the permutation

c = number of cycles in the breakpoint graph
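
The formula can be evaluated by counting cycles. The sketch below assumes the usual construction of the breakpoint graph, which the slides show only as a picture: each element x is split into the two points 2x-1 and 2x (in that order if x is positive, swapped if x is negative), "reality" edges join neighbouring points of adjacent elements, and "desire" edges join 2k and 2k+1. The function names are hypothetical, and the returned value equals d only when there is no unoriented component.

```python
def count_cycles(perm):
    """Count cycles of the breakpoint graph of a framed signed permutation.

    perm must start with 0 and end with n + 1.
    """
    n = len(perm) - 2
    points = [0]
    for x in perm[1:-1]:
        points += [2 * x - 1, 2 * x] if x > 0 else [-2 * x, -2 * x - 1]
    points.append(2 * n + 1)

    # Reality edges: consecutive pairs of points in the current order.
    reality = {}
    for k in range(0, len(points), 2):
        a, b = points[k], points[k + 1]
        reality[a], reality[b] = b, a

    seen = set()
    cycles = 0
    for start in points:
        if start in seen:
            continue
        cycles += 1
        p = start
        while p not in seen:
            seen.add(p)
            q = reality[p]      # follow a reality edge
            seen.add(q)
            p = q ^ 1           # follow a desire edge: 2k <-> 2k + 1
    return cycles


def distance_without_unoriented(perm):
    """Evaluate n + 1 - c; equals d only if no component is unoriented."""
    n = len(perm) - 2
    return n + 1 - count_cycles(perm)


human = [0, 7, 5, 3, -1, -6, -2, 4, 8]
print(count_cycles(human), distance_without_unoriented(human))  # 4 4
```

For the example permutation, n = 7 and c = 4, so the formula gives 4 reversals.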

Page 15

A bad choice among oriented cycles

0 -1 -2 3 4 5 -6 7 8

0 1 -2 -3 4 5 6 7 8

0 7 5 6 1 -3 -2 4 8

0 7 5 3 -1 -6 -2 4 8

Page 16

Different approaches

Naive: choose any oriented cycle and apply the corresponding reversal; if it creates an unoriented component, choose another one. O(n^3)

Better: test properties that identify oriented cycles whose reversal cannot create an unoriented component. O(n^2)

Our method: bad oriented cycles are good ones... later

Page 17

The algorithm

0 -1 -2 3 4 5 -6 7 8

Solution : empty

AB

CD

Page 18

The algorithm

Solution : D

AB

C

0 1 -2 -3 4 5 6 7 8

Page 19

The algorithm

Solution : D,C

AB

0 1 2 3 4 5 6 7 8

Page 20

The algorithm

Solution : (D,C)

AB

C

0 1 -2 -3 4 5 6 7 8

Page 21

The algorithm

Solution : (D,C)

AB

CD

0 -1 -2 3 4 5 -6 7 8

Page 22

The algorithm

Solution : A...(D,C)

B

CD

0 1 -2 -3 4 -5 -6 -7 8

Page 23

The algorithm

Solution : A,B...(D,C)

CD

0 1 2 -3 -4 -5 6 7 8

Page 24

The algorithm

Solution : A,B,D,C

0 1 2 3 4 5 6 7 8

Page 25

Time complexity

With any classical data structure, performing a reversal takes linear time, so sorting takes at least quadratic time.

Kaplan and Verbin (2003) invented a data structure to represent a permutation that allows picking an oriented cycle and performing a reversal in time O(sqrt(n log n)).

We use the same data structure to sort by reversals in time O(n sqrt(n log n)).
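
The overall bound follows from multiplying the number of operations by the cost of each one. The derivation below is a sketch under two assumptions not stated on the slide: an optimal sequence contains at most n + 1 reversals, and the algorithm performs O(n) operations in total, each at the Kaplan and Verbin cost.

```latex
\[
  \underbrace{O(n)}_{\text{operations, since } d \le n + 1}
  \times
  \underbrace{O\!\left(\sqrt{n \log n}\right)}_{\text{cost per operation}}
  = O\!\left(n \sqrt{n \log n}\right)
  = O\!\left(n^{3/2} \sqrt{\log n}\right)
\]
```

Because n^{3/2} sqrt(log n) grows more slowly than n^2, this bound is subquadratic.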

Page 26

The data structure

0 7 5 3 -1 -6 -2 4 8

[Figure: the elements of this permutation laid out as nodes of the data structure]

Page 27

Future work

Can we improve the time complexity?

Can the method give ideas to
- sort with several (>2) permutations? (NP-hard, Caprara, 2002)
- sort by transpositions? (unknown complexity)