Algorithmic Approaches for Biological Data, Lecture #23

Katherine St. John

City University of New York & American Museum of Natural History

2 May 2016

Outline

Project & Last Day Notes

Complexity Revisited: NP-hardness, Time & Space Complexity

Searching for Optimal Trees

Edit distances between trees

Tree Vectors

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 2 / 54

Project & Last Day Notes

Last week of lectures (and lab).

Next Monday (last day of class):

- Project Presentations (about 5 minutes): brief overview of the biological question, techniques used, and results.

- Open Lab for questions about labs, Rosalind questions, etc.

For those who cannot make Monday, it is possible to do the presentation this Wednesday (see me).

Complexity Revisited: Running Times

Theoretical Estimates on Running Time

big-Oh notation

Complexity Classes

P vs. NP

Comparing Algorithms

Measure the size of the problem, usually called n.

Example: for sorting cards, n is the number of cards.

Different approaches can take different amounts of time.

How long does the algorithm take, as a function of n?

Sorting Algorithms demo

Not in the demo is the built-in Python sort: Timsort (invented by Tim Peters in 2002), a hybrid of merge sort and insertion sort.
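To see that different approaches take different amounts of time, one option is to time a hand-written sort against the built-in one on inputs of growing size. The sketch below is only an illustration: `insertion_sort` and `time_sort` are hypothetical helper names, not part of the demo, and the measured times will vary from machine to machine.

```python
import random
import time


def insertion_sort(items):
    """Return a sorted copy of items using insertion sort (quadratic worst case)."""
    result = list(items)
    for i in range(1, len(result)):
        current = result[i]
        j = i - 1
        # Shift larger elements one slot to the right to make room for current.
        while j >= 0 and result[j] > current:
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = current
    return result


def time_sort(sort_fn, data):
    """Return (sorted_result, elapsed_seconds) for one call of sort_fn on data."""
    start = time.perf_counter()
    result = sort_fn(data)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    random.seed(0)
    for n in (500, 1000, 2000):
        data = [random.random() for _ in range(n)]
        _, slow = time_sort(insertion_sort, data)
        _, fast = time_sort(sorted, data)  # built-in sort (Timsort)
        print(f"n={n}: insertion sort {slow:.4f}s, built-in sorted {fast:.4f}s")
```

Doubling n should roughly quadruple the insertion sort time, while the built-in sort grows much more slowly.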

Analysis of Algorithms

How long does the algorithm take, as a function of n?

If an algorithm looks at each element once (or a constant number of times), the running time is proportional to n, the number of elements.

Then, the algorithm runs in linear time.

We would write “the running time is O(n)” (“big-Oh” notation).

We usually measure the worst-case running time.

Formally: If the running time of an algorithm is f(n) for n items, then we say f(n) = O(g(n)) if there exist N, c > 0 such that for all n > N,

f(n) < c · g(n)

(That is, after some point, f(n) is smaller than c · g(n).)
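As a small worked instance of the definition, take a hypothetical running time f(n) = 3n + 5 and g(n) = n. Choosing c = 4 and N = 5 works, since 3n + 5 < 4n exactly when n > 5. The sketch below (the helper name `bound_holds` is an assumption for illustration) checks the inequality numerically over a finite range; this illustrates the definition but is not a proof.

```python
def f(n):
    """Hypothetical running time: 3 steps per element plus 5 steps of setup."""
    return 3 * n + 5


def g(n):
    """Candidate bounding function: linear growth."""
    return n


def bound_holds(f, g, c, N, n_max):
    """Check f(n) < c * g(n) for every integer n with N < n <= n_max."""
    return all(f(n) < c * g(n) for n in range(N + 1, n_max + 1))


# c = 4 and N = 5 are witnesses that f(n) = O(n).
print(bound_holds(f, g, c=4, N=5, n_max=10_000))
# c = 3 is too small: 3n + 5 is never below 3n.
print(bound_holds(f, g, c=3, N=5, n_max=10_000))
```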

Analyzing Sorts by Running Times

The sorting algorithms vary in running time, depending on the number of elements and the type of data.

Sorting Demo

Thinking about the worst case, how many operations are performed in bubbleSort?

    def bubbleSort(a):                           # Let n be the number of elements in a.
        for j in range(1, len(a)):               # Goes through this loop n-1 times.
            for i in range(1, len(a)):           # Goes through this loop n-1 times per pass.
                if a[i-1] > a[i]:                # Takes constant time.
                    a[i-1], a[i] = a[i], a[i-1]  # Takes constant time.

The lines in the if statement take constant time, but are performed up to (n-1) · (n-1) < n · n times.

Upper bound on running time is O(c · n · n) = O(n²). (For big-Oh notation, drop constants and keep only the largest terms.)
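The operation count can also be checked empirically. The sketch below copies the bubbleSort loops above and adds a counter; `bubble_sort_count` is a hypothetical name used only for this illustration. With these loop ranges the comparison runs (n-1) · (n-1) times, which stays below the n · n bound.

```python
def bubble_sort_count(a):
    """Sort a copy of a with the bubble sort loops; return (sorted_list, comparisons)."""
    a = list(a)
    comparisons = 0
    for j in range(1, len(a)):
        for i in range(1, len(a)):
            comparisons += 1  # one constant-time comparison per inner iteration
            if a[i - 1] > a[i]:
                a[i - 1], a[i] = a[i], a[i - 1]
    return a, comparisons


for n in (10, 100, 500):
    worst_case = list(range(n, 0, -1))  # reverse-sorted input
    result, comparisons = bubble_sort_count(worst_case)
    # Print n, the measured count, and the predicted (n-1)^2.
    print(n, comparisons, (n - 1) ** 2)
```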

Complexity Classes: What is NP-hardness?

P =? NP: Roughly, if the answer to a problem can be checked quickly, can it be computed quickly?

P stands for problems that can be computed quickly (polynomial time).

NP stands for problems that can be checked quickly (nondeterministic polynomial time).

Example: given a geometry proof, you can check if it's correct quickly, but knowing that, is there a quick way to find proofs?
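The gap between checking and finding can be sketched with the subset-sum problem (chosen here as an illustration; it is not from the slides): given a list of numbers, is there a subset that adds up to a target? Checking a proposed subset takes a short loop, while the obvious way to find one tries up to 2^n subsets. The function names below are hypothetical.

```python
from itertools import combinations


def check_certificate(numbers, target, subset):
    """Checking: confirm subset is drawn from numbers and sums to target."""
    remaining = list(numbers)
    for x in subset:
        if x not in remaining:
            return False
        remaining.remove(x)  # each input number may be used only once
    return sum(subset) == target


def find_subset(numbers, target):
    """Finding: try every subset (2**n of them) until one sums to target."""
    for size in range(len(numbers) + 1):
        for subset in combinations(numbers, size):
            if sum(subset) == target:
                return list(subset)
    return None


numbers = [3, 34, 4, 12, 5, 2]
answer = find_subset(numbers, 9)
print(answer, check_certificate(numbers, 9, answer))
```

Whether some cleverer method can always find an answer in polynomial time is exactly the P vs. NP question.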

Millennium Prize Problems

In 2000, the Clay Mathematics Institute announced million-dollar prizes for:

Birch and Swinnerton-Dyer Conjecture

Hodge Conjecture

Navier-Stokes Equations

P vs NP

Poincare Conjecture

Riemann Hypothesis

Yang-Mills Theory

Millennium Prize Problems

[Photo: Grigori Perelman, 1993]

In 2010, CMI announced that Grigori Perelman had solved one of these problems, the Poincaré Conjecture.

He turned down the prize.

P vs. NP

P vs. NP is one of the seven Millennium Prize Problems listed above.

More Examples of NP Problems

[Figure: United Airlines / Continental Airlines route map of North America, © 2011 United Air Lines, Inc.]

Traveling Salesman Problem (TSP): find the shortest path that visits all cities.

Knapsack Problem: fill your backpack with the most valuable objects without exceeding weight restrictions.

Sudoku: find a solution to a (large) Sudoku puzzle.
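To get a feel for why TSP is hard, here is a minimal brute-force sketch, assuming cities are points in the plane and distance is straight-line distance (both assumptions for illustration; the function names are hypothetical). It tries all n! visiting orders, so it is only usable for a handful of cities.

```python
from itertools import permutations
from math import dist


def path_length(cities, order):
    """Total length of the path that visits cities in the given order."""
    return sum(dist(cities[a], cities[b]) for a, b in zip(order, order[1:]))


def shortest_path_brute_force(cities):
    """Try all n! visiting orders; return (best_order, best_length)."""
    best = min(
        permutations(range(len(cities))),
        key=lambda order: path_length(cities, order),
    )
    return list(best), path_length(cities, best)


# Four hypothetical cities on a line, listed out of order.
cities = [(0, 0), (3, 0), (1, 0), (2, 0)]
order, length = shortest_path_brute_force(cities)
print(order, length)
```

With 4 cities there are 24 orders to try; with 20 cities there are already more than 2 · 10^18, which is why exhaustive search does not scale.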


Treasure Cay

Cat IslandAndros Town

Nantucket

LOS ANGELES

SAN FRANCISCO

DENVER

Toronto

Honolulu

Ontario

Kahului

HarrisburgLincoln

Kona

Fargo

Casper SiouxFalls

Bismarck

IthacaBinghamton

Idaho Falls

Kalispell

Billings Duluth

Jackson

Salisbury

Muskegon

Brownsville

Corpus Christi

Harlingen

Laredo

McAllen

Eau Claire

Houghton

Minot

Pierre

Alliance

Chadron

Scottsbluff

Liberal

Kearney

Laramie

Huron

McCook

Dodge CityGreat Bend

Hays

AlamosaPuebloCortez

Farmington

TelluridePage/Lake Powell

Show LowPrescott

Moab

Worland

Sheridan

DickinsonMiles City

SidneyWilliston

Glasgow

Lewistown

Visalia

Hilo

Kapalua

Key West

GrandIsland

Vernal

North PlatteCheyenne

Riverton

LOS ANGELES

SAN FRANCISCO

DENVER

to Anchorage

Bimini

Freeport

George Town

North EleutheraGovernors Harbour

Marsh Harbour

Jamestown

Dubois

BradfordFranklin

Lewisburg

Clearfield

Sarasota/Bradenton

Plattsburgh

Melbourne

Killeen

Del Rio

Mammoth Lakes

Hobbs

Glendive

St. George

�������������

New York (Penn Station)

Boston

Newark(Liberty)

New Haven

Philadelphia

Washington, DC

Stamford

Wilmington

Train RoutesCodeshare / OnePass ServiceOnePass Eligible Service

Traveling Salesman Problem (TSP): find the shortest path that visits all cities.

Knapsack Problem: fill your backpack with the most valuable objects without exceeding weight restrictions.

Sudoku: find a solution to a (large) Sudoku puzzle.
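To make the TSP statement above concrete, here is a minimal brute-force sketch. It assumes a small, made-up distance table (the cities and distances are hypothetical, not taken from the route map) and simply tries every ordering of the cities, which is why it only works for a handful of cities.

```python
from itertools import permutations

def path_length(order, dist):
    """Total length of the path that visits the cities in the given order."""
    return sum(dist[a][b] for a, b in zip(order, order[1:]))

def shortest_path_brute_force(cities, dist):
    """Try all n! orderings and keep the shortest one (hypothetical helper)."""
    best = min(permutations(cities), key=lambda order: path_length(order, dist))
    return list(best), path_length(best, dist)

# Made-up symmetric distances between four cities.
dist = {
    "A": {"B": 1, "C": 4, "D": 3},
    "B": {"A": 1, "C": 2, "D": 5},
    "C": {"A": 4, "B": 2, "D": 1},
    "D": {"A": 3, "B": 5, "C": 1},
}
path, length = shortest_path_brute_force(["A", "B", "C", "D"], dist)
print(path, length)
```

With n cities there are n! orderings to try, so this approach stops being practical after roughly a dozen cities.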


P =? NP: Why Does It Matter?

If you could quickly find solutions to NP-hard problems (i.e. P=NP), then:

Many security systems (such as the Data Encryption Standard (DES) used to send ATM/bank data) would be easily breached.


Scheduling and routing questions (such as the Knapsack question and the Traveling Salesman Problem) could be answered efficiently.

Some hard biological questions (such as protein folding) would be tractable.

P =? NP

P =? NP: Roughly, if the answer to a problem can be checked quickly, can it also be computed quickly?

Solving this will bring

fame,

fortune, and

a change in how algorithms are designed.
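The gap between "checked quickly" and "computed quickly" can be sketched with the Knapsack question from earlier. In the sketch below, the item names, weights, and values are made up, and both function names are hypothetical. Checking a proposed packing takes one pass over the chosen items, while the only general search shown here tries all subsets.

```python
from itertools import combinations

def check_packing(items, chosen, max_weight, target_value):
    """Verify a proposed packing: one pass over the chosen items."""
    weight = sum(items[name][0] for name in chosen)
    value = sum(items[name][1] for name in chosen)
    return weight <= max_weight and value >= target_value

def find_packing(items, max_weight, target_value):
    """Search all 2^n subsets until one passes the check (brute force)."""
    names = list(items)
    for size in range(len(names) + 1):
        for chosen in combinations(names, size):
            if check_packing(items, chosen, max_weight, target_value):
                return chosen
    return None

# Made-up items: name -> (weight, value)
items = {"book": (3, 4), "laptop": (4, 9), "snack": (1, 2)}
print(check_packing(items, ("laptop", "snack"), 5, 11))
print(find_packing(items, 5, 11))
```

The checker is fast for any input size, but the search doubles its work with each added item. Whether a fundamentally faster search always exists is the P =? NP question.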


In Pairs: Analyze Running Time

Give upper bounds on the worst case running time of:

1   def double(n):
        d = 2*n
        return d

2   def sum(n):
        s = 0
        for i in range(n):
            s += i
        return s

3   def sum2(n):
        return n*(n+1)/2

4   def findMin(numList):
        m = numList[0]
        for i in range(1,len(numList)):
            if numList[i] < m:
                m = numList[i]
        return m

5 (Code from text):

6 (Code from text):
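One way to sanity-check a proposed upper bound is to count how many times a loop body runs. The sketch below adds a step counter to copies of two of the functions above; the names sum_steps and find_min_steps are hypothetical and introduced only for this illustration.

```python
def sum_steps(n):
    """Loop-based sum that also reports how many loop iterations it used."""
    s, steps = 0, 0
    for i in range(n):
        s += i
        steps += 1
    return s, steps

def find_min_steps(num_list):
    """Minimum of a list, plus the number of comparisons made."""
    m, steps = num_list[0], 0
    for i in range(1, len(num_list)):
        steps += 1
        if num_list[i] < m:
            m = num_list[i]
    return m, steps

for n in (10, 20, 40):
    print(n, sum_steps(n)[1], find_min_steps(list(range(n, 0, -1)))[1])
```

Doubling n doubles both step counts, which is consistent with a linear upper bound for the loop-based functions. A function with no loop, such as double, does the same amount of work for every n.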


Analyze Space Requirements

How much space an algorithm uses can matter.

Suppose you want to align two long sequences (say 1 million bp each). The dynamic programming will require a matrix with 1 million × 1 million entries, requiring 10^6 × 10^6 = 10^12 places of storage.

This can easily overwhelm the memory on your computer.

We can measure space usage as we did for time complexity.
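As a rough worked example of the storage estimate above, the sketch below converts the number of table entries into terabytes. The 4 bytes per entry is an assumption made for illustration; the actual size depends on how the scores are stored.

```python
def dp_matrix_terabytes(len1, len2, bytes_per_entry=4):
    """Estimate the memory for a len1 x len2 table, in terabytes (10^12 bytes)."""
    entries = len1 * len2
    return entries * bytes_per_entry / 10**12

# Two sequences of 1 million bp each, assuming 4 bytes per entry.
print(dp_matrix_terabytes(10**6, 10**6))  # 4.0
```

Under that assumption the table needs about 4 terabytes, far more than the memory of a typical laptop.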


Analyze Space Requirements

insertionSort sorts “in place” and uses no additional space (beyond a constant number of variables).

Our sequence alignment under Needleman-Wunsch used an additional O(n^2) space to store the dynamic programming array.
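If only the alignment score is needed, the full array is not required: each row of the table depends only on the row before it. The following is a minimal sketch of that idea, not the version used in class. The function name and the scoring values (match +1, mismatch -1, gap -1) are assumptions for illustration.

```python
def nw_score_two_rows(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score using two rows of the table: O(len(b)) space."""
    # Row 0: aligning the empty prefix of a against prefixes of b.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of a against the empty string
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr  # the older row is no longer needed
    return prev[-1]

print(nw_score_two_rows("GATTACA", "GATTACA"))
```

The running time is still proportional to the product of the two lengths; only the space drops. Recovering the alignment itself, not just its score, takes additional work because the traceback information is discarded.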


In Pairs: Analyze Space Requirement

The running time of bubblesort is analyzed above. Give upper bounds on the worst-case space needed for:

1   def double(n):
        d = 2*n
        return d

2   def sum(n):
        s = 0
        for i in range(n):
            s += i
        return s

3   def sum2(n):
        return n*(n+1)/2

4   def findMin(numList):
        m = numList[0]
        for i in range(1,len(numList)):
            if numList[i] < m:
                m = numList[i]
        return m

5   import numpy as np

    def findMaxDist(numList):
        n = len(numList)
        d = np.zeros((n,n))    # n x n array of pairwise distances
        for i in range(n):
            for j in range(n):
                # distance between entries i and j (abs takes one argument)
                d[i,j] = abs(numList[i] - numList[j])
        return np.amax(d)

6   def findMaxDist2(numList):
        n = len(numList)
        m = 0
        for i in range(n):
            for j in range(n):
                # same distance as above, but only the largest is kept
                if abs(numList[i] - numList[j]) > m:
                    m = abs(numList[i] - numList[j])
        return m

7 (Code from text):


More Complexity: Fixed Parameter Tractability

[Figure: trees on the leaves A through J.]

Roughly, the ability to efficiently solve instances that are small with respect to some parameter is called fixed-parameter tractability.

Though NP-hard, some problems can be solved in time polynomial in the size of the input but exponential in the size of a fixed parameter.

Often, the parameter, k, will be the distance between the trees.

For example, the distance between two trees can be calculated by shrinking the common regions and focusing on the differences, whose size can be bounded in terms of k.
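The "shrinking the common regions" step can be sketched for one simple kind of common region: a pair of sibling leaves (a cherry) that appears in both trees. The sketch assumes rooted binary trees written as nested tuples with distinct string leaf labels; this representation and the function names are assumptions made for illustration, and real reductions handle more general common subtrees.

```python
def cherries(tree):
    """Return the set of cherries (pairs of sibling leaves) in a nested-tuple tree."""
    if isinstance(tree, str):
        return set()
    left, right = tree
    if isinstance(left, str) and isinstance(right, str):
        return {frozenset((left, right))}
    return cherries(left) | cherries(right)

def collapse(tree, cherry, label):
    """Replace the given cherry by a single leaf carrying the new label."""
    if isinstance(tree, str):
        return tree
    left, right = tree
    if isinstance(left, str) and isinstance(right, str) and frozenset((left, right)) == cherry:
        return label
    return (collapse(left, cherry, label), collapse(right, cherry, label))

def reduce_common(t1, t2):
    """Repeatedly collapse cherries shared by both trees until none remain."""
    while True:
        common = cherries(t1) & cherries(t2)
        if not common:
            return t1, t2
        for cherry in common:
            label = "(" + ",".join(sorted(cherry)) + ")"
            t1 = collapse(t1, cherry, label)
            t2 = collapse(t2, cherry, label)

t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "B"), "D"), ("C", "E"))
print(reduce_common(t1, t2))
```

Because a collapsed cherry becomes a leaf, repeating the step also shrinks larger subtrees that the two trees share. What remains is the part where the trees actually differ, and the more expensive search only has to run on that smaller instance.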


Recap: Small Parsimony Problem

Last week: given a tree with leaves labeled by sequences, we computed the parsimony score of the tree.

Thinking in terms of time complexity: How long does it take?


Algorithm Design: Scoring Trees Under Parsimony

How do you code this?

- Input: A tree and sequences on the leaves.
- Output: The parsimony score of the tree (with respect to the leaf labels).

What data structures do you need?

- Tree structure
- Count of the number of changes

Algorithm:

- First pass: Starting at the leaves, label the internal nodes (with possibly multiple labels).
- Second pass: Starting at the root, choose a labeling, then work towards the leaves, minimizing the conflicts.


Fitch’s Algorithm: Pseudocode

First pass: Starting at the leaves, label the internal nodes (with possibly multiple labels):

- Given labels for children, compute label for the parent:

      child 1:  A  T   A   T  G
      child 2:  A  A   T   T  G
      parent:   A  AT  AT  T  G

- Go position by position:
  - If overlap, use that label.
  - If no overlap, use the union.

- Useful Python container type: set
  - Has functions for union and intersection of sets.
  - Example (l1 and l2 are lists of labels):

      s1 = set(l1)
      s2 = set(l2)
      print(s1.intersection(s2))
      print(s1.union(s2))
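The first pass described above can be sketched with Python sets as follows. The tree representation (nested tuples of leaf names) and the function names are assumptions made for this illustration. The sketch counts one change each time two children have no label in common at a position.

```python
def fitch_parent(child1, child2):
    """Combine two children's label sets position by position.

    Returns the parent's label sets and the number of positions with no overlap.
    """
    parent, changes = [], 0
    for s1, s2 in zip(child1, child2):
        common = s1 & s2
        if common:
            parent.append(common)      # overlap: use that label
        else:
            parent.append(s1 | s2)     # no overlap: use the union
            changes += 1
    return parent, changes

def fitch_score(tree, sequences):
    """First pass over a nested-tuple tree; returns (root labels, parsimony score)."""
    if isinstance(tree, str):          # leaf: one single-letter set per position
        return [{c} for c in sequences[tree]], 0
    left, right = tree
    l_labels, l_score = fitch_score(left, sequences)
    r_labels, r_score = fitch_score(right, sequences)
    labels, changes = fitch_parent(l_labels, r_labels)
    return labels, l_score + r_score + changes

# The two children from the example above.
seqs = {"x": "ATATG", "y": "AATTG"}
print(fitch_score(("x", "y"), seqs))
```

Each internal node does a constant amount of set work per position (for a fixed alphabet), so the running time is proportional to the number of nodes times the sequence length.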


A A T T G → A AT AT T GI Go position by position:

F If overlap, use that label.F If no overlap, use the union.

I Useful Python container type: setF Has functions for union and intersection of

sets.F s1 = set(l1)

s2 = set(l2)

print s1.intersection(s2)

print s1.union(s2)

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 24 / 54

Fitch’s Algorithm: Pseudocode

First pass: Starting at the leaves, label the internal nodes (with possibly multiple labels):

- Given labels for the children, compute the label for the parent:
    children: A T A T G
              A A T T G
    parent:   A AT AT T G
- Go position by position (for-loop):
  - If overlap, use that label (if-statement).
  - If no overlap, use the union.
- Useful Python container type: set
  - Has functions for union and intersection of sets (set operations).
  - s1 = set(l1)
    s2 = set(l2)
    print(s1.intersection(s2))
    print(s1.union(s2))
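The first pass for a single parent can be sketched in Python as follows. This is a minimal illustration, not the lecture's own code: the function names fitch_parent and to_sets are made up for this example, and each sequence is assumed to be stored as a list of sets, one set per position.

```python
def to_sets(seq):
    """Turn a string such as "ATATG" into one single-letter set per position."""
    return [set(ch) for ch in seq]

def fitch_parent(left, right):
    """Combine the label sets of two children, position by position."""
    parent = []
    for a, b in zip(left, right):      # for-loop over positions
        overlap = a.intersection(b)    # set operation
        if overlap:                    # if-statement: the children share a label
            parent.append(overlap)
        else:                          # no overlap: use the union
            parent.append(a.union(b))
    return parent

left = to_sets("ATATG")
right = to_sets("AATTG")
parent = fitch_parent(left, right)
print(["".join(sorted(s)) for s in parent])   # ['A', 'AT', 'AT', 'T', 'G']
```

Applied to the two children from the slide, this reproduces the parent labeling A AT AT T G.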


Fitch’s Algorithm: Pseudocode

Second pass: Starting at the root, choose a labeling, then work towards the leaves, minimizing the conflicts.

- At the root, choose one labeling:
    A AT AT T G → A A T T G
- For all other nodes, compare to the parent:
    parent: A T A T G
    node:   AT AT G ACT G → A T G T G
- Go position by position (for-loop):
  - If overlap, use that label (if-statement).
  - If no overlap, choose a label from the child.
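The second pass for one node might look like the sketch below. The names choose_root and fitch_choose are hypothetical, and picking the alphabetically smallest label when there is a free choice is an assumption made only to keep the output deterministic; any label in the set is equally valid, so the root choice here (A A A T G) differs from the one on the slide (A A T T G).

```python
def choose_root(root_sets):
    """At the root, pick one label per position (here: the smallest, arbitrarily)."""
    return [min(s) for s in root_sets]

def fitch_choose(parent_labels, node_sets):
    """Pick one label per position for a node, given its parent's chosen labels."""
    chosen = []
    for p, s in zip(parent_labels, node_sets):   # for-loop over positions
        if p in s:                               # overlap with the parent: keep it
            chosen.append(p)
        else:                                    # no overlap: choose from the node's own set
            chosen.append(min(s))
    return chosen

root = choose_root([set("A"), set("AT"), set("AT"), set("T"), set("G")])
print(root)                                      # ['A', 'A', 'A', 'T', 'G']

node_sets = [set("AT"), set("AT"), set("G"), set("ACT"), set("G")]
print(fitch_choose("ATATG", node_sets))          # ['A', 'T', 'G', 'T', 'G']
```

The second call uses the parent A T A T G and the node sets AT AT G ACT G from the slide and yields A T G T G.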

In Pairs: Analyze Fitch’s Algorithm

What are the running time and space requirements for:

- First pass: Starting at the leaves, label the internal nodes (with possibly multiple labels).
- Second pass: Starting at the root, choose a labeling, then work towards the leaves, minimizing the conflicts.
- Printing out all trees on n leaves.
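For the last item, the cost depends on how many trees there are. As a hedged aside that is not on the slide: the standard count of unrooted binary trees on n labeled leaves is (2n-5)!! = 1 · 3 · 5 · ... · (2n-5). The function name below is made up for this sketch.

```python
def num_unrooted_binary_trees(n):
    """Number of unrooted binary trees on n >= 3 labeled leaves: (2n-5)!!."""
    if n < 3:
        raise ValueError("need at least 3 leaves")
    count = 1
    for k in range(3, 2 * n - 4, 2):   # odd factors 3, 5, ..., 2n-5
        count *= k
    return count

for n in (4, 5, 7, 10):
    print(n, num_unrooted_binary_trees(n))
# 4 3
# 5 15
# 7 945
# 10 2027025
```

Because the count grows faster than exponentially in n, any procedure that prints every tree needs at least that much time.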

Searching for Optimal Trees

Large Parsimony Problem: Given sequences for leaves, find the optimal scoring tree.

Visually: think of the trees on a 2D map, where the height above sea level is the score.

Works for any optimality criterion (e.g. the same analogy works for maximum likelihood).

Analogy: Find the Highest Point

(Map image: polymaps.org)

Sampling:

- Choose 1000 random points.
- Find height at each point.
- Output the sampled point with largest height.

Will you reach the highest point?

Only if very lucky or with a very dense sample.
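The sampling strategy can be sketched on a toy landscape. Everything here is an assumption made for illustration: the height function (a small peak near (2, 2) and a taller one near (8, 8)), the 10 by 10 map, and the fixed random seed.

```python
import random

def height(x, y):
    """Toy landscape: a small peak (height 5) and a taller peak (height 10)."""
    small = 5.0 - ((x - 2) ** 2 + (y - 2) ** 2)
    tall = 10.0 - ((x - 8) ** 2 + (y - 8) ** 2)
    return max(small, tall, 0.0)

def sample_highest(num_points=1000, size=10.0, seed=0):
    """Choose random points, find the height at each, return the best one seen."""
    rng = random.Random(seed)
    best_point, best_height = None, float("-inf")
    for _ in range(num_points):
        x, y = rng.uniform(0, size), rng.uniform(0, size)
        h = height(x, y)
        if h > best_height:
            best_point, best_height = (x, y), h
    return best_point, best_height

point, h = sample_highest()
print(point, h)   # close to the tall peak, but not exactly (8, 8)
```

The sample usually lands near the top without hitting it exactly, which is the point of the slide: the true maximum is found only with luck or a very dense sample.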


Analogy: Find the Highest Point

Hill Climbing:

- Start at the harbor.
- Can see 25 meters in all directions.
- Walk upwards, repeat.

Will you reach the highest point?

Maybe, but maybe not.

- Could reach small peaks, but miss the larger ones.
- Start in multiple places to see more.
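Hill climbing with several starting points can be sketched on a small grid. The landscape, the 11 by 11 grid, and the choice of the eight surrounding cells as "what you can see" are all assumptions made for this example.

```python
def height(x, y):
    """Toy landscape: a small peak at (2, 2) and a taller peak at (8, 8)."""
    small = 5 - ((x - 2) ** 2 + (y - 2) ** 2)
    tall = 10 - ((x - 8) ** 2 + (y - 8) ** 2)
    return max(small, tall, 0)

def neighbors(x, y, size=11):
    """The up to eight grid cells visible from (x, y)."""
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            nx, ny = x + dx, y + dy
            if (dx or dy) and 0 <= nx < size and 0 <= ny < size:
                yield nx, ny

def hill_climb(start):
    """Walk to the best visible neighbor until no neighbor is strictly higher."""
    current = start
    while True:
        best = max(neighbors(*current), key=lambda p: height(*p))
        if height(*best) <= height(*current):
            return current
        current = best

print(hill_climb((1, 1)))   # (2, 2): stuck on the small peak
print(hill_climb((6, 6)))   # (8, 8): reaches the taller peak
print(hill_climb((9, 3)))   # (9, 3): flat ground, nothing higher in sight

starts = [(1, 1), (6, 6), (9, 3)]
print(max((hill_climb(s) for s in starts), key=lambda p: height(*p)))   # (8, 8)
```

A single start can end on the small peak or on flat ground; taking the best result over several starts recovers the taller peak in this toy case, though nothing guarantees that in general.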


Local Search Techniques

Goal: Find the point with the optimal score.

Local search techniques prevail:

- Begin with a point.
- Choose the next point from its neighbors (e.g. best scoring).
- Repeat.

Many variations on the theme: branch-and-bound, MCMC, genetic algorithms, ...


Popular Tree Metrics

(Figure: two trees on the same 28 plant taxa, from Nicotiana to Dasyphyllum; the extracted leaf orders differ only in Gerbera and Gazania being swapped.)

Those based on tree rearrangements:

- Subtree Prune and Regraft (SPR)
- Tree Bisection and Reconnection (TBR)
- Nearest Neighbor Interchange (NNI)

Used for searching for optimal trees; NP-hard.

Those based on comparing tree vectors:

- Robinson-Foulds (RF)
- Rooted Triples (RT)
- Quartet Distance
- Billera-Holmes-Vogtmann (BHV or geodesic)

Used for comparing trees; polynomial time.


NNI Metric

(Figure: a tree on leaves A to G, before and after a single NNI move.)

The NNI distance between two trees is the minimal number of NNI moves needed to transform one into the other (NP-hard, DasGupta et al. ‘97).
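A single NNI move can be sketched around one internal edge. The representation is an assumption made for this example: the four subtrees hanging off the edge are passed in as a, b on one side and c, d on the other, and trees are written as nested tuples. The function name nni_neighbors is made up.

```python
def nni_neighbors(a, b, c, d):
    """Given the arrangement ((a, b), (c, d)) around an internal edge,
    return the two other ways to pair up the four subtrees."""
    return [((a, c), (b, d)), ((a, d), (b, c))]

# Subtrees may be single leaves or nested tuples.
for tree in nni_neighbors("A", "B", "C", ("D", "E")):
    print(tree)
# (('A', 'C'), ('B', ('D', 'E')))
# (('A', ('D', 'E')), ('B', 'C'))
```

Each internal edge gives exactly two such neighbors, so a tree's full NNI neighborhood is obtained by repeating this for every internal edge.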

SPR Distance

(Figure: a tree on leaves A to G; the subtree containing A and B is pruned and regrafted elsewhere.)

- SPR distance is the minimal number of moves that transforms one tree into the other.
- SPR for rooted trees is NP-hard (Bordewich & Semple ‘05).
- SPR for unrooted trees is NP-hard (Hickey et al. ‘08).
- SAT-based heuristic (Bonet & S ‘09).


TBR Distance

(Figure: a tree on leaves A to G is bisected and its two parts are reconnected.)

- TBR distance is the minimal number of moves that transforms one tree into the other.
- TBR is NP-hard and FPT (Allen & Steel ‘01).
- TBR has a linear time 5-approximation and a polynomial time 3-approximation (Amenta, Bonet, Mahindru, & S ‘06; Bordewich, McCartin, & Semple ‘08).


Tree Rearrangements

(Figure: a tree on leaves A to G and one neighbor of it under each move: Nearest Neighbor Interchange (NNI), Subtree Prune & Regraft (SPR), and Tree Bisection & Reconnection (TBR).)


Metrics Matter

NNI & SPR Isoscope


Landscapes

(Figure: parsimony score for compatible characters for n = 7 (Urheim, Ford, & S, submitted).)

A treespace with assigned scores is often called a landscape.

What does the landscape look like?

Each landscape depends on the number of taxa and the score of each tree (usually derived from the input character sequences).

- If very smooth, ‘hill climbing’ will work well.
- If very rugged, need more sophisticated searches that use the underlying structure of the space.

(Image from Wikipedia.)

The phylogeny problem is that of finding those trees which optimise some function of the input data.

We may end up with several trees optimising some optimality criterion (say parsimony) and a completely different set of trees optimising another criterion. There are hundreds of phylogenetic optimality criteria! This is a problem, but it's not the one in which I'm interested today.


Adjusting Search Space

(Figure panels: SPR metric, NNI metric. Parsimony score for compatible characters for n = 7 (Urheim, Ford, & S, submitted).)

The same data, organized by different tree metrics.

Attraction Basins

(Image: resalliance.org)

Adjusting Search Space

Simplest case: for compatible character sequences (‘perfect data’):

- Under SPR, there is a single attraction basin.
- Under NNI, multiple attraction basins occur even for perfect data.
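The effect of the neighborhood on the number of attraction basins can be seen in a toy setting. This is only an analogy, not real treespace: the points are positions on a line, the scores are invented, higher is better (as in the highest-point analogy), and a larger radius stands in for a richer move set such as SPR, which contains every NNI move.

```python
scores = [1, 3, 2, 5, 4, 9, 6, 7, 2]   # invented scores, one per point

def local_optima(scores, radius):
    """Indices whose score is at least as high as every point within `radius`."""
    optima = []
    for i, s in enumerate(scores):
        lo, hi = max(0, i - radius), min(len(scores), i + radius + 1)
        if all(s >= scores[j] for j in range(lo, hi)):
            optima.append(i)
    return optima

print(local_optima(scores, 1))   # [1, 3, 5, 7]: four peaks where a climber can stop
print(local_optima(scores, 3))   # [5]: a single peak with the larger neighborhood
```

The scores are identical in both runs; only the definition of "neighbor" changes, and with it the number of places where hill climbing can get stuck.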

In Pairs: Tree Rearrangements

(Figure: a tree on leaves A to G.)

For the tree in the figure:

1 What are the NNI neighbors?

2 Give an SPR neighbor that is not an NNI neighbor.

3 Give a TBR neighbor that is not an SPR neighbor.

4 Find a tree that is at NNI distance 3 but SPR distance 1.

5 Write an algorithm that, given a tree T and a specific edge/node, returns the two NNI neighbors around that edge.


Weighted Trees

(Figure credit: Philippe et al., ‘05)

- Branch weights are part of the model.
- Indicated by length of edges in drawing.
- Two classic trees with same underlying topology.
- The metrics and search spaces above treat them as identical.


Popular Tree Metrics

(Figure: two trees on the same 28 plant taxa, from Nicotiana to Dasyphyllum; the extracted leaf orders differ only in Gerbera and Gazania being swapped.)

Those based on tree rearrangements:

- Subtree Prune and Regraft (SPR)
- Tree Bisection and Reconnection (TBR)
- Nearest Neighbor Interchange (NNI)

Used for searching for optimal trees; NP-hard.

Those based on comparing tree vectors:

- Robinson-Foulds (RF)
- Rooted Triples (RT)
- Quartet Distance
- Billera-Holmes-Vogtmann (BHV or geodesic)

Used for comparing trees; polynomial time.



Trees as Vectors

(Figure: trees T0, T1, and T2 on leaves 1 to 5, with axes labeled by the splits 12|345, 124|35, and 123|45.)

Each tree is written as a 0/1 vector with one coordinate per split:

Split     T0  T1  T2
1|2345     1   1   1
2|1345     1   1   1
3|1245     1   1   1
4|1235     1   1   1
5|1234     1   1   1
12|345     0   1   1
13|245     0   0   0
14|235     0   0   0
15|234     0   0   0
23|145     0   0   0
24|135     0   0   0
25|134     0   0   0
34|125     0   0   0
35|124     0   0   1
45|123     0   1   0

where T0 = (1, 2, 3, 4, 5), T1 = ((1, 2), (3, (4, 5))), and T2 = ((1, 2), (4, (3, 5))).
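The split vectors above can be computed with a short sketch, assuming trees are stored as nested tuples of leaf numbers and that there are exactly five leaves (so every split has one or two leaves on its smaller side). The helper names are hypothetical. Counting the coordinates where two vectors differ gives the Robinson-Foulds distance in its common "number of differing splits" form.

```python
from itertools import combinations

LEAVES = (1, 2, 3, 4, 5)

def all_splits(leaves):
    """Coordinates for n = 5: single-leaf splits first, then two-leaf splits."""
    singles = [frozenset([x]) for x in leaves]
    pairs = [frozenset(p) for p in combinations(leaves, 2)]
    return singles + pairs

def clades(tree):
    """Return (leaf set of tree, leaf sets below every node of tree)."""
    if not isinstance(tree, tuple):            # a single leaf
        leafset = frozenset([tree])
        return leafset, [leafset]
    leafset, found = frozenset(), []
    for child in tree:
        child_leaves, child_clades = clades(child)
        leafset = leafset | child_leaves
        found = found + child_clades
    return leafset, found + [leafset]

def split_vector(tree, leaves=LEAVES):
    """0/1 vector with a 1 for every split displayed by the tree."""
    everything = frozenset(leaves)
    present = set()
    for clade in clades(tree)[1]:
        side = min(clade, everything - clade, key=len)   # name a split by its smaller side
        if side:                                         # skip the root's empty "split"
            present.add(side)
    return [1 if s in present else 0 for s in all_splits(leaves)]

def rf_distance(t1, t2):
    """Number of splits found in exactly one of the two trees."""
    return sum(a != b for a, b in zip(split_vector(t1), split_vector(t2)))

T0 = (1, 2, 3, 4, 5)
T1 = ((1, 2), (3, (4, 5)))
T2 = ((1, 2), (4, (3, 5)))
print(split_vector(T1))     # [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
print(rf_distance(T1, T2))  # 2
```

T1 and T2 share the split 12|345 and differ in 45|123 versus 35|124, which is why the distance is 2.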

BHV Distance

(Figure credit: Billera, Holmes, Vogtmann ‘01)

- Billera, Holmes, and Vogtmann ‘01 define a continuous metric space of trees.
- View each split in a tree as a coordinate in the space.
- Identify (glue together) shared edges of orthants to form the space.


Identify Edges of Orthants

(All images from Billera, Holmes, Vogtmann ‘01)


BHV & NNI Distances

(Figure: trees on leaves 1 to 5 drawn in orthants with axes 123|45, 12|345, 23|145, and 13|245; marked points have coordinates (1,1,0,0), (1,0,1,0), (3,2,0,0), and (3,0,0,2).)

Simplest move between orthants: shrink one coordinate/edge to zero and expand another.

Corresponds to a Nearest Neighbor Interchange (NNI) move on the topology.
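For two trees one NNI move apart, the distance can be sketched as below. This relies on an assumption not stated on the slide: two orthants that share all but one axis can be laid flat next to each other, so the shortest path is a straight line that shrinks one tree's own edge to zero and grows the other's. The function names are made up, and the coordinate order (123|45, 12|345, 23|145, 13|245) is read off the figure labels.

```python
import math

def bhv_same_orthant(x, y):
    """Trees with the same topology: ordinary Euclidean distance on edge lengths."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def bhv_adjacent(shared1, own1, shared2, own2):
    """Trees whose topologies differ in exactly one split.

    shared1, shared2: lengths of the splits both trees have, in the same order.
    own1, own2: length of the one split unique to each tree.
    """
    shared_part = sum((a - b) ** 2 for a, b in zip(shared1, shared2))
    return math.sqrt(shared_part + (own1 + own2) ** 2)

# (1,1,0,0) and (1,0,1,0): both have 123|45 at length 1; each has one own split of length 1.
print(bhv_adjacent([1], 1, [1], 1))   # 2.0
# (3,2,0,0) and (3,0,0,2): both have 123|45 at length 3; own splits of length 2 each.
print(bhv_adjacent([3], 2, [3], 2))   # 4.0
# (1,1,0,0) and (3,2,0,0) share a topology, so the plain Euclidean distance applies.
print(bhv_same_orthant((1, 1, 0, 0), (3, 2, 0, 0)))
```

Trees further apart than one NNI move need the general geodesic computation, which this sketch does not attempt.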


In Pairs: Continuous Treespace

(Upper figure: trees T0, T1, and T2 on leaves 1 to 5, with axes labeled by the splits 12|345, 124|35, and 123|45.)

(Lower figure: orthants with axes 123|45, 12|345, 23|145, and 13|245; marked points have coordinates (1,1,0,0), (1,0,1,0), (3,2,0,0), and (3,0,0,2).)

1 What is the distance between T1 and T2?

2 What is the average (tree at the midpoint) of T1 and T2?

3 Give three trees that have distance 1 to the origin, T0 (star tree).

4 Lower figure: what is the distance between (3,0,0,2) and (1,1,0,0)?

5 What is a good average/consensus for the four trees?

Recap

- Efficiency (in time and space) matters when data gets large.
- More on tree searching next lecture.
- Email lab reports to kstjohn@amnh.org.
- Challenges available at rosalind.info.
