View
3
Download
0
Category
Preview:
Citation preview
Algorithmic Approaches for Biological Data, Lecture #23
Katherine St. John
City University of New YorkAmerican Museum of Natural History
2 May 2016
Outline
Project & Last Day Notes
Complexity Revisited: NP-hardness, Time & SpaceComplexity
Searching for Optimal Trees
Edit distances between trees
Tree Vectors
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 2 / 54
Outline
Project & Last Day Notes
Complexity Revisited: NP-hardness, Time & SpaceComplexity
Searching for Optimal Trees
Edit distances between trees
Tree Vectors
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 2 / 54
Outline
Project & Last Day Notes
Complexity Revisited: NP-hardness, Time & SpaceComplexity
Searching for Optimal Trees
Edit distances between trees
Tree Vectors
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 2 / 54
Outline
Project & Last Day Notes
Complexity Revisited: NP-hardness, Time & SpaceComplexity
Searching for Optimal Trees
Edit distances between trees
Tree Vectors
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 2 / 54
Outline
Project & Last Day Notes
Complexity Revisited: NP-hardness, Time & SpaceComplexity
Searching for Optimal Trees
Edit distances between trees
Tree Vectors
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 2 / 54
Project & Last Day Notes
Last week of lectures (and lab).
Next Monday (last day of class):
I Project Presentations (about 5 minutes):brief overview of biological question,techniques used, and results)
I Open Lab for questions about labs,Rosalind questions, etc.
For those who cannot make Monday, possible todo presentation this Wednesday (see me).
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 3 / 54
Project & Last Day Notes
Last week of lectures (and lab).
Next Monday (last day of class):
I Project Presentations (about 5 minutes):brief overview of biological question,techniques used, and results)
I Open Lab for questions about labs,Rosalind questions, etc.
For those who cannot make Monday, possible todo presentation this Wednesday (see me).
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 3 / 54
Project & Last Day Notes
Last week of lectures (and lab).
Next Monday (last day of class):
I Project Presentations (about 5 minutes):brief overview of biological question,techniques used, and results)
I Open Lab for questions about labs,Rosalind questions, etc.
For those who cannot make Monday, possible todo presentation this Wednesday (see me).
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 3 / 54
Project & Last Day Notes
Last week of lectures (and lab).
Next Monday (last day of class):
I Project Presentations (about 5 minutes):brief overview of biological question,techniques used, and results)
I Open Lab for questions about labs,Rosalind questions, etc.
For those who cannot make Monday, possible todo presentation this Wednesday (see me).
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 3 / 54
Project & Last Day Notes
Last week of lectures (and lab).
Next Monday (last day of class):
I Project Presentations (about 5 minutes):brief overview of biological question,techniques used, and results)
I Open Lab for questions about labs,Rosalind questions, etc.
For those who cannot make Monday, possible todo presentation this Wednesday (see me).
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 3 / 54
Complexity Revisited: Running Times
Theoretical Estimates on Running Time
big-Oh notation
Complexity Classes
P vs. NP
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 4 / 54
Complexity Revisited: Running Times
Theoretical Estimates on Running Time
big-Oh notation
Complexity Classes
P vs. NP
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 4 / 54
Complexity Revisited: Running Times
Theoretical Estimates on Running Time
big-Oh notation
Complexity Classes
P vs. NP
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 4 / 54
Complexity Revisited: Running Times
Theoretical Estimates on Running Time
big-Oh notation
Complexity Classes
P vs. NP
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 4 / 54
Comparing Algorithms
Measure the size of the problem, usually called n.
Example: for sorting cards, n is the number of cards.
Different approaches can take different amounts oftime.
How long does the algorithm take proportional to n?
Sorting Algorithms demoNot in demo is the built-in Python sort: timSort (invented by Tim Peters in
2002) that is hybrid of merge sort and insertion sort.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 5 / 54
Comparing Algorithms
Measure the size of the problem, usually called n.
Example: for sorting cards, n is the number of cards.
Different approaches can take different amounts oftime.
How long does the algorithm take proportional to n?
Sorting Algorithms demoNot in demo is the built-in Python sort: timSort (invented by Tim Peters in
2002) that is hybrid of merge sort and insertion sort.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 5 / 54
Comparing Algorithms
Measure the size of the problem, usually called n.
Example: for sorting cards, n is the number of cards.
Different approaches can take different amounts oftime.
How long does the algorithm take proportional to n?
Sorting Algorithms demoNot in demo is the built-in Python sort: timSort (invented by Tim Peters in
2002) that is hybrid of merge sort and insertion sort.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 5 / 54
Comparing Algorithms
Measure the size of the problem, usually called n.
Example: for sorting cards, n is the number of cards.
Different approaches can take different amounts oftime.
How long does the algorithm take proportional to n?
Sorting Algorithms demoNot in demo is the built-in Python sort: timSort (invented by Tim Peters in
2002) that is hybrid of merge sort and insertion sort.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 5 / 54
Comparing Algorithms
Measure the size of the problem, usually called n.
Example: for sorting cards, n is the number of cards.
Different approaches can take different amounts oftime.
How long does the algorithm take proportional to n?
Sorting Algorithms demoNot in demo is the built-in Python sort: timSort (invented by Tim Peters in
2002) that is hybrid of merge sort and insertion sort.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 5 / 54
Analysis of Algorithms
How long does the algorithm take proportional to n?
If an algorithm looks at each element once (or aconstant number of times), the running time isproportional to n, the number of elements.
Then, the algorithm runs in linear time.
Would write “the running time is O(n).”(“big-Oh” notation).
Usually measure the worst-case running time.
Formally: If the running time of an algorithm is f (n)for n items, then we f (n) = O(g(n))if there exists N, c > 0 such that for all n > N,
f (n) < c · g(n)
(That is, after some point, f (n) is smaller thanc · g(n).)
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 6 / 54
Analysis of Algorithms
How long does the algorithm take proportional to n?
If an algorithm looks at each element once (or aconstant number of times), the running time isproportional to n, the number of elements.
Then, the algorithm runs in linear time.
Would write “the running time is O(n).”(“big-Oh” notation).
Usually measure the worst-case running time.
Formally: If the running time of an algorithm is f (n)for n items, then we f (n) = O(g(n))if there exists N, c > 0 such that for all n > N,
f (n) < c · g(n)
(That is, after some point, f (n) is smaller thanc · g(n).)
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 6 / 54
Analysis of Algorithms
How long does the algorithm take proportional to n?
If an algorithm looks at each element once (or aconstant number of times), the running time isproportional to n, the number of elements.
Then, the algorithm runs in linear time.
Would write “the running time is O(n).”(“big-Oh” notation).
Usually measure the worst-case running time.
Formally: If the running time of an algorithm is f (n)for n items, then we f (n) = O(g(n))if there exists N, c > 0 such that for all n > N,
f (n) < c · g(n)
(That is, after some point, f (n) is smaller thanc · g(n).)
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 6 / 54
Analysis of Algorithms
How long does the algorithm take proportional to n?
If an algorithm looks at each element once (or aconstant number of times), the running time isproportional to n, the number of elements.
Then, the algorithm runs in linear time.
Would write “the running time is O(n).”(“big-Oh” notation).
Usually measure the worst-case running time.
Formally: If the running time of an algorithm is f (n)for n items, then we f (n) = O(g(n))if there exists N, c > 0 such that for all n > N,
f (n) < c · g(n)
(That is, after some point, f (n) is smaller thanc · g(n).)
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 6 / 54
Analysis of Algorithms
How long does the algorithm take proportional to n?
If an algorithm looks at each element once (or aconstant number of times), the running time isproportional to n, the number of elements.
Then, the algorithm runs in linear time.
Would write “the running time is O(n).”(“big-Oh” notation).
Usually measure the worst-case running time.
Formally: If the running time of an algorithm is f (n)for n items, then we f (n) = O(g(n))if there exists N, c > 0 such that for all n > N,
f (n) < c · g(n)
(That is, after some point, f (n) is smaller thanc · g(n).)
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 6 / 54
Analysis of Algorithms
How long does the algorithm take proportional to n?
If an algorithm looks at each element once (or aconstant number of times), the running time isproportional to n, the number of elements.
Then, the algorithm runs in linear time.
Would write “the running time is O(n).”(“big-Oh” notation).
Usually measure the worst-case running time.
Formally: If the running time of an algorithm is f (n)for n items, then we f (n) = O(g(n))
if there exists N, c > 0 such that for all n > N,
f (n) < c · g(n)
(That is, after some point, f (n) is smaller thanc · g(n).)
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 6 / 54
Analysis of Algorithms
How long does the algorithm take proportional to n?
If an algorithm looks at each element once (or aconstant number of times), the running time isproportional to n, the number of elements.
Then, the algorithm runs in linear time.
Would write “the running time is O(n).”(“big-Oh” notation).
Usually measure the worst-case running time.
Formally: If the running time of an algorithm is f (n)for n items, then we f (n) = O(g(n))if there exists N, c > 0 such that for all n > N,
f (n) < c · g(n)
(That is, after some point, f (n) is smaller thanc · g(n).)
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 6 / 54
Analysis of Algorithms
How long does the algorithm take proportional to n?
If an algorithm looks at each element once (or aconstant number of times), the running time isproportional to n, the number of elements.
Then, the algorithm runs in linear time.
Would write “the running time is O(n).”(“big-Oh” notation).
Usually measure the worst-case running time.
Formally: If the running time of an algorithm is f (n)for n items, then we f (n) = O(g(n))if there exists N, c > 0 such that for all n > N,
f (n) < c · g(n)
(That is, after some point, f (n) is smaller thanc · g(n).)
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 6 / 54
Analysis of Algorithms
How long does the algorithm take proportional to n?
If an algorithm looks at each element once (or aconstant number of times), the running time isproportional to n, the number of elements.
Then, the algorithm runs in linear time.
Would write “the running time is O(n).”(“big-Oh” notation).
Usually measure the worst-case running time.
Formally: If the running time of an algorithm is f (n)for n items, then we f (n) = O(g(n))if there exists N, c > 0 such that for all n > N,
f (n) < c · g(n)
(That is, after some point, f (n) is smaller thanc · g(n).)
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 6 / 54
Analyzing Sorts by Running Times
The sorting algorithms vary in running time, depending onnumber of elements and type of data.
Sorting Demo
Thinking about the worst-case, how many operations areperformed in bubbleSort?
def bubbleSort(a): #Let n be # of elements in a.
for j in range(1,len(a)): #Will go through this loop n times.
for i in range(1,len(a)): #Will go through this loop n times.
if a[i-1] > a[i]: #Takes constant time.
a[i-1], a[i] = a[i], a[i-1] #Takes constant time.
The lines in the if statement take constant time, but areperformed n · n time.
Upper bound on running time is O(c · n · n) = O(n2).(For big-Oh notation, drop constants and keep only largest terms.)
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 7 / 54
Analyzing Sorts by Running Times
The sorting algorithms vary in running time, depending onnumber of elements and type of data.
Sorting Demo
Thinking about the worst-case, how many operations areperformed in bubbleSort?
def bubbleSort(a): #Let n be # of elements in a.
for j in range(1,len(a)): #Will go through this loop n times.
for i in range(1,len(a)): #Will go through this loop n times.
if a[i-1] > a[i]: #Takes constant time.
a[i-1], a[i] = a[i], a[i-1] #Takes constant time.
The lines in the if statement take constant time, but areperformed n · n time.
Upper bound on running time is O(c · n · n) = O(n2).(For big-Oh notation, drop constants and keep only largest terms.)
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 7 / 54
Analyzing Sorts by Running Times
The sorting algorithms vary in running time, depending onnumber of elements and type of data.
Sorting Demo
Thinking about the worst-case, how many operations areperformed in bubbleSort?
def bubbleSort(a): #Let n be # of elements in a.
for j in range(1,len(a)): #Will go through this loop n times.
for i in range(1,len(a)): #Will go through this loop n times.
if a[i-1] > a[i]: #Takes constant time.
a[i-1], a[i] = a[i], a[i-1] #Takes constant time.
The lines in the if statement take constant time, but areperformed n · n time.
Upper bound on running time is O(c · n · n) = O(n2).(For big-Oh notation, drop constants and keep only largest terms.)
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 7 / 54
Analyzing Sorts by Running Times
The sorting algorithms vary in running time, depending onnumber of elements and type of data.
Sorting Demo
Thinking about the worst-case, how many operations areperformed in bubbleSort?
def bubbleSort(a): #Let n be # of elements in a.
for j in range(1,len(a)): #Will go through this loop n times.
for i in range(1,len(a)): #Will go through this loop n times.
if a[i-1] > a[i]: #Takes constant time.
a[i-1], a[i] = a[i], a[i-1] #Takes constant time.
The lines in the if statement take constant time, but areperformed n · n time.
Upper bound on running time is O(c · n · n) = O(n2).(For big-Oh notation, drop constants and keep only largest terms.)
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 7 / 54
Analyzing Sorts by Running Times
The sorting algorithms vary in running time, depending onnumber of elements and type of data.
Sorting Demo
Thinking about the worst-case, how many operations areperformed in bubbleSort?
def bubbleSort(a): #Let n be # of elements in a.
for j in range(1,len(a)): #Will go through this loop n times.
for i in range(1,len(a)): #Will go through this loop n times.
if a[i-1] > a[i]: #Takes constant time.
a[i-1], a[i] = a[i], a[i-1] #Takes constant time.
The lines in the if statement take constant time, but areperformed n · n time.
Upper bound on running time is O(c · n · n) = O(n2).(For big-Oh notation, drop constants and keep only largest terms.)
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 7 / 54
Analyzing Sorts by Running Times
The sorting algorithms vary in running time, depending onnumber of elements and type of data.
Sorting Demo
Thinking about the worst-case, how many operations areperformed in bubbleSort?
def bubbleSort(a): #Let n be # of elements in a.
for j in range(1,len(a)): #Will go through this loop n times.
for i in range(1,len(a)): #Will go through this loop n times.
if a[i-1] > a[i]: #Takes constant time.
a[i-1], a[i] = a[i], a[i-1] #Takes constant time.
The lines in the if statement take constant time, but areperformed n · n time.
Upper bound on running time is O(c · n · n) = O(n2).(For big-Oh notation, drop constants and keep only largest terms.)
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 7 / 54
Analyzing Sorts by Running Times
The sorting algorithms vary in running time, depending onnumber of elements and type of data.
Sorting Demo
Thinking about the worst-case, how many operations areperformed in bubbleSort?
def bubbleSort(a): #Let n be # of elements in a.
for j in range(1,len(a)): #Will go through this loop n times.
for i in range(1,len(a)): #Will go through this loop n times.
if a[i-1] > a[i]: #Takes constant time.
a[i-1], a[i] = a[i], a[i-1] #Takes constant time.
The lines in the if statement take constant time, but areperformed n · n time.
Upper bound on running time is O(c · n · n) = O(n2).(For big-Oh notation, drop constants and keep only largest terms.)
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 7 / 54
Analyzing Sorts by Running Times
The sorting algorithms vary in running time, depending onnumber of elements and type of data.
Sorting Demo
Thinking about the worst-case, how many operations areperformed in bubbleSort?
def bubbleSort(a): #Let n be # of elements in a.
for j in range(1,len(a)): #Will go through this loop n times.
for i in range(1,len(a)): #Will go through this loop n times.
if a[i-1] > a[i]: #Takes constant time.
a[i-1], a[i] = a[i], a[i-1] #Takes constant time.
The lines in the if statement take constant time, but areperformed n · n time.
Upper bound on running time is O(c · n · n) = O(n2).(For big-Oh notation, drop constants and keep only largest terms.)
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 7 / 54
Analyzing Sorts by Running Times
The sorting algorithms vary in running time, depending onnumber of elements and type of data.
Sorting Demo
Thinking about the worst-case, how many operations areperformed in bubbleSort?
def bubbleSort(a): #Let n be # of elements in a.
for j in range(1,len(a)): #Will go through this loop n times.
for i in range(1,len(a)): #Will go through this loop n times.
if a[i-1] > a[i]: #Takes constant time.
a[i-1], a[i] = a[i], a[i-1] #Takes constant time.
The lines in the if statement take constant time, but areperformed n · n time.
Upper bound on running time is O(c · n · n) = O(n2).(For big-Oh notation, drop constants and keep only largest terms.)
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 7 / 54
Analyzing Sorts by Running Times
The sorting algorithms vary in running time, depending onnumber of elements and type of data.
Sorting Demo
Thinking about the worst-case, how many operations areperformed in bubbleSort?
def bubbleSort(a): #Let n be # of elements in a.
for j in range(1,len(a)): #Will go through this loop n times.
for i in range(1,len(a)): #Will go through this loop n times.
if a[i-1] > a[i]: #Takes constant time.
a[i-1], a[i] = a[i], a[i-1] #Takes constant time.
The lines in the if statement take constant time, but areperformed n · n time.
Upper bound on running time is O(c · n · n) = O(n2).
(For big-Oh notation, drop constants and keep only largest terms.)
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 7 / 54
Analyzing Sorts by Running Times
The sorting algorithms vary in running time, depending onnumber of elements and type of data.
Sorting Demo
Thinking about the worst-case, how many operations areperformed in bubbleSort?
def bubbleSort(a): #Let n be # of elements in a.
for j in range(1,len(a)): #Will go through this loop n times.
for i in range(1,len(a)): #Will go through this loop n times.
if a[i-1] > a[i]: #Takes constant time.
a[i-1], a[i] = a[i], a[i-1] #Takes constant time.
The lines in the if statement take constant time, but areperformed n · n time.
Upper bound on running time is O(c · n · n) = O(n2).(For big-Oh notation, drop constants and keep only largest terms.)
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 7 / 54
Complexity Classes: What is NP-hardness?
P?= NP: Roughly, if the answer to a
problem can be checked quickly, canit be computed quickly?
P stands for problems that can becomputed quickly (polynomial time).
NP stands for problems that can bechecked quickly (nondeterministicpolynomial time).
Example: given a geometry proof,you can check if its correct quickly,but knowing that, is there a quickway to find proofs?
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 8 / 54
Complexity Classes: What is NP-hardness?
P?= NP: Roughly, if the answer to a
problem can be checked quickly, canit be computed quickly?
P stands for problems that can becomputed quickly (polynomial time).
NP stands for problems that can bechecked quickly (nondeterministicpolynomial time).
Example: given a geometry proof,you can check if its correct quickly,but knowing that, is there a quickway to find proofs?
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 8 / 54
Complexity Classes: What is NP-hardness?
P?= NP: Roughly, if the answer to a
problem can be checked quickly, canit be computed quickly?
P stands for problems that can becomputed quickly (polynomial time).
NP stands for problems that can bechecked quickly (nondeterministicpolynomial time).
Example: given a geometry proof,you can check if its correct quickly,but knowing that, is there a quickway to find proofs?
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 8 / 54
Millennium Prize Problems
In 2000, the Clay Mathematics Institute an-nounced million dollar prizes for:
Birch and Swinnerton-DyerConjecture
Hodge Conjecture
Navier-Stokes Equations
P vs NP
Poincare Conjecture
Riemann Hypothesis
Yang-Mills Theory
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 9 / 54
Millenium Prize Problems
Grigori Perelman, 1993
In 2010, CMI announced Grigori Perelmansolved:
Birch and Swinnerton-DyerConjecture
Hodge Conjecture
Navier-Stokes Equations
P vs NP
Poincare Conjecture
Riemann Hypothesis
Yang-Mills Theory
He turned down the prize.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 10 / 54
Millenium Prize Problems
Grigori Perelman, 1993
In 2010, CMI announced Grigori Perelmansolved:
Birch and Swinnerton-DyerConjecture
Hodge Conjecture
Navier-Stokes Equations
P vs NP
Poincare Conjecture
Riemann Hypothesis
Yang-Mills Theory
He turned down the prize.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 10 / 54
P vs. NP
Birch and Swinnerton-DyerConjecture
Hodge Conjecture
Navier-Stokes Equations
P vs. NP
Poincare Conjecture
Riemann Hypothesis
Yang-Mills Theory
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 11 / 54
More Examples of NP Problems
���������� ���
����
������
������
�������
����
����������� �����
������������
�������������
�������������
����
��� ���� ���
!���
��"������� ���
#��� ������$��� ���
%���
��$������� ���
&���
����� ���� �
����'�������� �����������
�'�����(��)(��
*'�)���
������)�'���
��)���)�'���
+�)(����,����,-
������
��.�-�),
��������''�����
�����(��
��������
�,'�����
������)�
��.�
,�����
���)��,�
������,��
��)���,��
��������
.��������
.-����(
������
������)�(��
.�����(��
��'�*�)���
��+���
���
�)� ���
��.�������
�),�����
.+
+/
�/�/
��������)��
��.�)���.��,
��+������
� � � � � � �
���,����.��
�'��)�
�)������'�����
����/
�//)/�/
�/0/
��' /��)-'���
�������������
��.*����'���1�'��)���)
�)�������.�)���'���
��2 �"�3$��3��� ����&�4�
����
0 50 100 150 Miles
0 50 100 150 200 Kilometers
��'�)���
0 100 200 300 400 Miles
0 100 200 300 400 500 600 Kilometers
��)����
Route lines reflect flights operated by United Airlines, ContinentalAirlines, Inc. and/or their regional partners. For accurate flightschedules, please see www.united.com or www.continental.com.© 2011 United Air Lines, Inc. All Rights Reserved.
Mont Tremblant
MonctonPresque Isle
Yellowknife
Sand Spit
Prince RupertTerrace
Smithers
Fort St. John
Fort McMurrayPrince George
Kamloops
KelownaNanaimo
Penticton
CastlegarCranbrook
LethbridgeMedicine Hat
Thunder Bay
Sault Ste.Marie
North Bay
Sarnia
Grande Prairie
Sudbury
TimminsRouyn-Noranda
Kingston
Baie-Comeau
Wabush
Mont-Joli
Gaspe
Charlottetown
Bathurst
Fredericton
Saint John
Sydney
Goose Bay
Deer LakeGander
Îles de la Madeliene
Windsor
Vancouver
Toronto
Edmonton
Calgary
Winnipeg
Halifax
Ottawa
Victoria
London
City
Regina
Saskatoon
Cullaton LakeEnnadai Lake
Saguenay
Bangor
Miami
Orlando
West Palm Beach
Portland
Seattle
Boise
San Jose
Las Vegas
LOS ANGELES
San Diego
SAN FRANCISCO Oakland
DENVER
Sacramento
Salt Lake City
Tucson
Phoenix/ScottsdaleAlbuquerque
Charleston
Colorado Springs
Greenville/Spartanburg
Savannah
Baltimore
Birmingham
HOUSTON(INTERCONTINENTAL)
Louisville
Memphis
Milwaukee
Philadelphia
San Antonio
St. Louis
Tampa/St. Petersburg
Charlotte
CLEVELAND
Dallas/Fort Worth
Detroit
Jacksonville
Kansas City
NewOrleans
New York (La Guardia) (J.F. Kennedy)
Norfolk/Virginia Beach
Omaha
Albany
Atlanta
Austin
Boston
Columbia
Columbus
NashvilleOklahoma City
Raleigh/Durham
Richmond
WASHINGTON, DC (DULLES)
Hartford/Springfield
Cincinnati
Bozeman
Orange County
Portland
Providence
NEW YORK (NEWARK)
Greensboro/High Point/Winston-Salem
Lexington
GrandRapids
Ft. Lauderdale/Hollywood
Syracuse
Buffalo/Niagara Falls
KnoxvilleTulsa
El Paso
Honolulu
Manchester
Ft. Myers
Kahului
Indianapolis
Minneapolis
Dayton
Allentown
Madison
Pittsburgh
Appleton/Fox Cities
Burlington
CedarRapids/Iowa City
Wausau
DesMoines
Ft.Wayne
Green Bay
WhitePlains
Lansing
Moline
Rochester
SouthBend/Elkhart/Mishawaka
Springfield
Spokane
Wichita
Lincoln
Missoula
Rapid City
Reno/Tahoe
Charleston
Traverse City
Akron/Canton StateCollege
Jackson Hole
Kona
Burbank
Gunnison/CrestedButte
Hayden/SteamboatSprings
Montrose
Vail/Eagle
Fargo
Gillette
Rock Springs
Crescent City
Eureka
Aspen
Wilkes Barre/Scranton
Bakersfield
Charlottesville
Chico
Carlsbad
Cody/Yellowstone
Casper
Eugene
Fresno
SiouxFalls
GrandJunction
Medford
Pasco
Palm Springs
Santa Barbara
Roanoke
Imperial
Inyokern
Monterey
San Luis Obispo
Santa Maria
Yuma
Modesto
Springfield
Redmond
Redding
(Reagan National)
Bismarck
Peoria
Asheville
Augusta
Pensacola
Myrtle Beach
Fayetteville/Ft. Bragg
Gainesville
Hilton Head Island
Huntsville/Decatur
Jacksonville
Long Island/Islip
New Bern
Tri-Cities Regional
Wilmington
Newport News/Williamsburg
GreenvilleNorthwestArkansas
Great Falls
LittleRock
Billings
AltoonaJohnstown
Beckley
ShenandoahValley
ClarksburgMorgantown
Helena
KlamathFalls
North Bend
Midland/Odessa
Chattanooga
Gulfport/Biloxi
Huntington
New Haven
Williamsport
Jackson Montgomery
Mobile
Salisbury
Newburgh
Ft. WaltonBeach
Florence
Durango
Paducah
Brownsville
BatonRouge
Corpus Christi
Harlingen
Laredo
McAllen
Daytona
Lubbock
Amarillo
Dallas (Love)
Waco
College Station
Lafayette
Alexandria
LakeCharles
Shreveport
Beaumont/Pt. Arthur
Tyler
Monroe
Victoria
Erie
LiberalDodge City
Great BendGarden City
Hays
Prescott
Hilo
Flint
Long BeachFlagstaff
Midland/Saginaw
Parkersburg
Lynchburg
Elmira
Hyannis
Bar Harbor
Presque Isle
Nassau
Tallahassee
Treasure Cay
Cat IslandAndros Town
Nantucket
LOS ANGELES
SAN FRANCISCO
DENVER
Toronto
Honolulu
Ontario
Kahului
HarrisburgLincoln
Kona
Fargo
Casper SiouxFalls
Bismarck
IthacaBinghamton
Idaho Falls
Kalispell
Billings Duluth
Jackson
Salisbury
Muskegon
Brownsville
Corpus Christi
Harlingen
Laredo
McAllen
Eau Claire
Houghton
Minot
Pierre
Alliance
Chadron
Scottsbluff
Liberal
Kearney
Laramie
Huron
McCook
Dodge CityGreat Bend
Hays
AlamosaPuebloCortez
Farmington
TelluridePage/Lake Powell
Show LowPrescott
Moab
Worland
Sheridan
DickinsonMiles City
SidneyWilliston
Glasgow
Lewistown
Visalia
Hilo
Kapalua
Key West
GrandIsland
Vernal
North PlatteCheyenne
Riverton
LOS ANGELES
SAN FRANCISCO
DENVER
to Anchorage
Bimini
Freeport
George Town
North EleutheraGovernors Harbour
Marsh Harbour
Jamestown
Dubois
BradfordFranklin
Lewisburg
Clearfield
Sarasota/Bradenton
Plattsburgh
Melbourne
Killeen
Del Rio
Mammoth Lakes
Hobbs
Glendive
St. George
�������������
New York (Penn Station)
Boston
Newark(Liberty)
New Haven
Philadelphia
Washington, DC
Stamford
Wilmington
Train RoutesCodeshare / OnePass ServiceOnePass Eligible Service
Traveling Salesman Problem (TSP): find the shortest paththat visits all cities.
Knapsack Problem: fill your backpack with the most valuableobjects without exceeding weight restrictions.
Sudoku: find a solution to a (large) Sudoku puzzle.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 12 / 54
More Examples of NP Problems
���������� ���
����
������
������
�������
����
����������� �����
������������
�������������
�������������
����
��� ���� ���
!���
��"������� ���
#��� ������$��� ���
%���
��$������� ���
&���
����� ���� �
����'�������� �����������
�'�����(��)(��
*'�)���
������)�'���
��)���)�'���
+�)(����,����,-
������
��.�-�),
��������''�����
�����(��
��������
�,'�����
������)�
��.�
,�����
���)��,�
������,��
��)���,��
��������
.��������
.-����(
������
������)�(��
.�����(��
��'�*�)���
��+���
���
�)� ���
��.�������
�),�����
.+
+/
�/�/
��������)��
��.�)���.��,
��+������
� � � � � � �
���,����.��
�'��)�
�)������'�����
����/
�//)/�/
�/0/
��' /��)-'���
�������������
��.*����'���1�'��)���)
�)�������.�)���'���
��2 �"�3$��3��� ����&�4�
����
0 50 100 150 Miles
0 50 100 150 200 Kilometers
��'�)���
0 100 200 300 400 Miles
0 100 200 300 400 500 600 Kilometers
��)����
Route lines reflect flights operated by United Airlines, ContinentalAirlines, Inc. and/or their regional partners. For accurate flightschedules, please see www.united.com or www.continental.com.© 2011 United Air Lines, Inc. All Rights Reserved.
Mont Tremblant
MonctonPresque Isle
Yellowknife
Sand Spit
Prince RupertTerrace
Smithers
Fort St. John
Fort McMurrayPrince George
Kamloops
KelownaNanaimo
Penticton
CastlegarCranbrook
LethbridgeMedicine Hat
Thunder Bay
Sault Ste.Marie
North Bay
Sarnia
Grande Prairie
Sudbury
TimminsRouyn-Noranda
Kingston
Baie-Comeau
Wabush
Mont-Joli
Gaspe
Charlottetown
Bathurst
Fredericton
Saint John
Sydney
Goose Bay
Deer LakeGander
Îles de la Madeliene
Windsor
Vancouver
Toronto
Edmonton
Calgary
Winnipeg
Halifax
Ottawa
Victoria
London
City
Regina
Saskatoon
Cullaton LakeEnnadai Lake
Saguenay
Bangor
Miami
Orlando
West Palm Beach
Portland
Seattle
Boise
San Jose
Las Vegas
LOS ANGELES
San Diego
SAN FRANCISCO Oakland
DENVER
Sacramento
Salt Lake City
Tucson
Phoenix/ScottsdaleAlbuquerque
Charleston
Colorado Springs
Greenville/Spartanburg
Savannah
Baltimore
Birmingham
HOUSTON(INTERCONTINENTAL)
Louisville
Memphis
Milwaukee
Philadelphia
San Antonio
St. Louis
Tampa/St. Petersburg
Charlotte
CLEVELAND
Dallas/Fort Worth
Detroit
Jacksonville
Kansas City
NewOrleans
New York (La Guardia) (J.F. Kennedy)
Norfolk/Virginia Beach
Omaha
Albany
Atlanta
Austin
Boston
Columbia
Columbus
NashvilleOklahoma City
Raleigh/Durham
Richmond
WASHINGTON, DC (DULLES)
Hartford/Springfield
Cincinnati
Bozeman
Orange County
Portland
Providence
NEW YORK (NEWARK)
Greensboro/High Point/Winston-Salem
Lexington
GrandRapids
Ft. Lauderdale/Hollywood
Syracuse
Buffalo/Niagara Falls
KnoxvilleTulsa
El Paso
Honolulu
Manchester
Ft. Myers
Kahului
Indianapolis
Minneapolis
Dayton
Allentown
Madison
Pittsburgh
Appleton/Fox Cities
Burlington
CedarRapids/Iowa City
Wausau
DesMoines
Ft.Wayne
Green Bay
WhitePlains
Lansing
Moline
Rochester
SouthBend/Elkhart/Mishawaka
Springfield
Spokane
Wichita
Lincoln
Missoula
Rapid City
Reno/Tahoe
Charleston
Traverse City
Akron/Canton StateCollege
Jackson Hole
Kona
Burbank
Gunnison/CrestedButte
Hayden/SteamboatSprings
Montrose
Vail/Eagle
Fargo
Gillette
Rock Springs
Crescent City
Eureka
Aspen
Wilkes Barre/Scranton
Bakersfield
Charlottesville
Chico
Carlsbad
Cody/Yellowstone
Casper
Eugene
Fresno
SiouxFalls
GrandJunction
Medford
Pasco
Palm Springs
Santa Barbara
Roanoke
Imperial
Inyokern
Monterey
San Luis Obispo
Santa Maria
Yuma
Modesto
Springfield
Redmond
Redding
(Reagan National)
Bismarck
Peoria
Asheville
Augusta
Pensacola
Myrtle Beach
Fayetteville/Ft. Bragg
Gainesville
Hilton Head Island
Huntsville/Decatur
Jacksonville
Long Island/Islip
New Bern
Tri-Cities Regional
Wilmington
Newport News/Williamsburg
GreenvilleNorthwestArkansas
Great Falls
LittleRock
Billings
AltoonaJohnstown
Beckley
ShenandoahValley
ClarksburgMorgantown
Helena
KlamathFalls
North Bend
Midland/Odessa
Chattanooga
Gulfport/Biloxi
Huntington
New Haven
Williamsport
Jackson Montgomery
Mobile
Salisbury
Newburgh
Ft. WaltonBeach
Florence
Durango
Paducah
Brownsville
BatonRouge
Corpus Christi
Harlingen
Laredo
McAllen
Daytona
Lubbock
Amarillo
Dallas (Love)
Waco
College Station
Lafayette
Alexandria
LakeCharles
Shreveport
Beaumont/Pt. Arthur
Tyler
Monroe
Victoria
Erie
LiberalDodge City
Great BendGarden City
Hays
Prescott
Hilo
Flint
Long BeachFlagstaff
Midland/Saginaw
Parkersburg
Lynchburg
Elmira
Hyannis
Bar Harbor
Presque Isle
Nassau
Tallahassee
Treasure Cay
Cat IslandAndros Town
Nantucket
LOS ANGELES
SAN FRANCISCO
DENVER
Toronto
Honolulu
Ontario
Kahului
HarrisburgLincoln
Kona
Fargo
Casper SiouxFalls
Bismarck
IthacaBinghamton
Idaho Falls
Kalispell
Billings Duluth
Jackson
Salisbury
Muskegon
Brownsville
Corpus Christi
Harlingen
Laredo
McAllen
Eau Claire
Houghton
Minot
Pierre
Alliance
Chadron
Scottsbluff
Liberal
Kearney
Laramie
Huron
McCook
Dodge CityGreat Bend
Hays
AlamosaPuebloCortez
Farmington
TelluridePage/Lake Powell
Show LowPrescott
Moab
Worland
Sheridan
DickinsonMiles City
SidneyWilliston
Glasgow
Lewistown
Visalia
Hilo
Kapalua
Key West
GrandIsland
Vernal
North PlatteCheyenne
Riverton
LOS ANGELES
SAN FRANCISCO
DENVER
to Anchorage
Bimini
Freeport
George Town
North EleutheraGovernors Harbour
Marsh Harbour
Jamestown
Dubois
BradfordFranklin
Lewisburg
Clearfield
Sarasota/Bradenton
Plattsburgh
Melbourne
Killeen
Del Rio
Mammoth Lakes
Hobbs
Glendive
St. George
�������������
New York (Penn Station)
Boston
Newark(Liberty)
New Haven
Philadelphia
Washington, DC
Stamford
Wilmington
Train RoutesCodeshare / OnePass ServiceOnePass Eligible Service
Traveling Salesman Problem (TSP): find the shortest paththat visits all cities.
Knapsack Problem: fill your backpack with the most valuableobjects without exceeding weight restrictions.
Sudoku: find a solution to a (large) Sudoku puzzle.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 12 / 54
More Examples of NP Problems
���������� ���
����
������
������
�������
����
����������� �����
������������
�������������
�������������
����
��� ���� ���
!���
��"������� ���
#��� ������$��� ���
%���
��$������� ���
&���
����� ���� �
����'�������� �����������
�'�����(��)(��
*'�)���
������)�'���
��)���)�'���
+�)(����,����,-
������
��.�-�),
��������''�����
�����(��
��������
�,'�����
������)�
��.�
,�����
���)��,�
������,��
��)���,��
��������
.��������
.-����(
������
������)�(��
.�����(��
��'�*�)���
��+���
���
�)� ���
��.�������
�),�����
.+
+/
�/�/
��������)��
��.�)���.��,
��+������
� � � � � � �
���,����.��
�'��)�
�)������'�����
����/
�//)/�/
�/0/
��' /��)-'���
�������������
��.*����'���1�'��)���)
�)�������.�)���'���
��2 �"�3$��3��� ����&�4�
����
0 50 100 150 Miles
0 50 100 150 200 Kilometers
��'�)���
0 100 200 300 400 Miles
0 100 200 300 400 500 600 Kilometers
��)����
Route lines reflect flights operated by United Airlines, ContinentalAirlines, Inc. and/or their regional partners. For accurate flightschedules, please see www.united.com or www.continental.com.© 2011 United Air Lines, Inc. All Rights Reserved.
Mont Tremblant
MonctonPresque Isle
Yellowknife
Sand Spit
Prince RupertTerrace
Smithers
Fort St. John
Fort McMurrayPrince George
Kamloops
KelownaNanaimo
Penticton
CastlegarCranbrook
LethbridgeMedicine Hat
Thunder Bay
Sault Ste.Marie
North Bay
Sarnia
Grande Prairie
Sudbury
TimminsRouyn-Noranda
Kingston
Baie-Comeau
Wabush
Mont-Joli
Gaspe
Charlottetown
Bathurst
Fredericton
Saint John
Sydney
Goose Bay
Deer LakeGander
Îles de la Madeliene
Windsor
Vancouver
Toronto
Edmonton
Calgary
Winnipeg
Halifax
Ottawa
Victoria
London
City
Regina
Saskatoon
Cullaton LakeEnnadai Lake
Saguenay
Bangor
Miami
Orlando
West Palm Beach
Portland
Seattle
Boise
San Jose
Las Vegas
LOS ANGELES
San Diego
SAN FRANCISCO Oakland
DENVER
Sacramento
Salt Lake City
Tucson
Phoenix/ScottsdaleAlbuquerque
Charleston
Colorado Springs
Greenville/Spartanburg
Savannah
Baltimore
Birmingham
HOUSTON(INTERCONTINENTAL)
Louisville
Memphis
Milwaukee
Philadelphia
San Antonio
St. Louis
Tampa/St. Petersburg
Charlotte
CLEVELAND
Dallas/Fort Worth
Detroit
Jacksonville
Kansas City
NewOrleans
New York (La Guardia) (J.F. Kennedy)
Norfolk/Virginia Beach
Omaha
Albany
Atlanta
Austin
Boston
Columbia
Columbus
NashvilleOklahoma City
Raleigh/Durham
Richmond
WASHINGTON, DC (DULLES)
Hartford/Springfield
Cincinnati
Bozeman
Orange County
Portland
Providence
NEW YORK (NEWARK)
Greensboro/High Point/Winston-Salem
Lexington
GrandRapids
Ft. Lauderdale/Hollywood
Syracuse
Buffalo/Niagara Falls
KnoxvilleTulsa
El Paso
Honolulu
Manchester
Ft. Myers
Kahului
Indianapolis
Minneapolis
Dayton
Allentown
Madison
Pittsburgh
Appleton/Fox Cities
Burlington
CedarRapids/Iowa City
Wausau
DesMoines
Ft.Wayne
Green Bay
WhitePlains
Lansing
Moline
Rochester
SouthBend/Elkhart/Mishawaka
Springfield
Spokane
Wichita
Lincoln
Missoula
Rapid City
Reno/Tahoe
Charleston
Traverse City
Akron/Canton StateCollege
Jackson Hole
Kona
Burbank
Gunnison/CrestedButte
Hayden/SteamboatSprings
Montrose
Vail/Eagle
Fargo
Gillette
Rock Springs
Crescent City
Eureka
Aspen
Wilkes Barre/Scranton
Bakersfield
Charlottesville
Chico
Carlsbad
Cody/Yellowstone
Casper
Eugene
Fresno
SiouxFalls
GrandJunction
Medford
Pasco
Palm Springs
Santa Barbara
Roanoke
Imperial
Inyokern
Monterey
San Luis Obispo
Santa Maria
Yuma
Modesto
Springfield
Redmond
Redding
(Reagan National)
Bismarck
Peoria
Asheville
Augusta
Pensacola
Myrtle Beach
Fayetteville/Ft. Bragg
Gainesville
Hilton Head Island
Huntsville/Decatur
Jacksonville
Long Island/Islip
New Bern
Tri-Cities Regional
Wilmington
Newport News/Williamsburg
GreenvilleNorthwestArkansas
Great Falls
LittleRock
Billings
AltoonaJohnstown
Beckley
ShenandoahValley
ClarksburgMorgantown
Helena
KlamathFalls
North Bend
Midland/Odessa
Chattanooga
Gulfport/Biloxi
Huntington
New Haven
Williamsport
Jackson Montgomery
Mobile
Salisbury
Newburgh
Ft. WaltonBeach
Florence
Durango
Paducah
Brownsville
BatonRouge
Corpus Christi
Harlingen
Laredo
McAllen
Daytona
Lubbock
Amarillo
Dallas (Love)
Waco
College Station
Lafayette
Alexandria
LakeCharles
Shreveport
Beaumont/Pt. Arthur
Tyler
Monroe
Victoria
Erie
LiberalDodge City
Great BendGarden City
Hays
Prescott
Hilo
Flint
Long BeachFlagstaff
Midland/Saginaw
Parkersburg
Lynchburg
Elmira
Hyannis
Bar Harbor
Presque Isle
Nassau
Tallahassee
Treasure Cay
Cat IslandAndros Town
Nantucket
LOS ANGELES
SAN FRANCISCO
DENVER
Toronto
Honolulu
Ontario
Kahului
HarrisburgLincoln
Kona
Fargo
Casper SiouxFalls
Bismarck
IthacaBinghamton
Idaho Falls
Kalispell
Billings Duluth
Jackson
Salisbury
Muskegon
Brownsville
Corpus Christi
Harlingen
Laredo
McAllen
Eau Claire
Houghton
Minot
Pierre
Alliance
Chadron
Scottsbluff
Liberal
Kearney
Laramie
Huron
McCook
Dodge CityGreat Bend
Hays
AlamosaPuebloCortez
Farmington
TelluridePage/Lake Powell
Show LowPrescott
Moab
Worland
Sheridan
DickinsonMiles City
SidneyWilliston
Glasgow
Lewistown
Visalia
Hilo
Kapalua
Key West
GrandIsland
Vernal
North PlatteCheyenne
Riverton
LOS ANGELES
SAN FRANCISCO
DENVER
to Anchorage
Bimini
Freeport
George Town
North EleutheraGovernors Harbour
Marsh Harbour
Jamestown
Dubois
BradfordFranklin
Lewisburg
Clearfield
Sarasota/Bradenton
Plattsburgh
Melbourne
Killeen
Del Rio
Mammoth Lakes
Hobbs
Glendive
St. George
�������������
New York (Penn Station)
Boston
Newark(Liberty)
New Haven
Philadelphia
Washington, DC
Stamford
Wilmington
Train RoutesCodeshare / OnePass ServiceOnePass Eligible Service
Traveling Salesman Problem (TSP): find the shortest paththat visits all cities.
Knapsack Problem: fill your backpack with the most valuableobjects without exceeding weight restrictions.
Sudoku: find a solution to a (large) Sudoku puzzle.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 12 / 54
P?= NP : Why Does It Matter?
wiki
If you could quickly find solutionsto NP-hard problems (i.e. P=NP),then
Many security systems (suchas the Data EncryptionStandard (DES) used to sendATM/bank data) would beeasily breached.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 13 / 54
P?= NP : Why Does It Matter?
���������� ���
����
������
������
�������
����
����������� �����
������������
�������������
�������������
����
��� ���� ���
!���
��"������� ���
#��� ������$��� ���
%���
��$������� ���
&���
����� ���� �
����'�������� �����������
�'�����(��)(��
*'�)���
������)�'���
��)���)�'���
+�)(����,����,-
������
��.�-�),
��������''�����
�����(��
��������
�,'�����
������)�
��.�
,�����
���)��,�
������,��
��)���,��
��������
.��������
.-����(
������
������)�(��
.�����(��
��'�*�)���
��+���
���
�)� ���
��.�������
�),�����
.+
+/
�/�/
��������)��
��.�)���.��,
��+������
� � � � � � �
���,����.��
�'��)�
�)������'�����
����/
�//)/�/
�/0/
��' /��)-'���
�������������
��.*����'���1�'��)���)
�)�������.�)���'���
��2 �"�3$��3��� ����&�4�
����
0 50 100 150 Miles
0 50 100 150 200 Kilometers
��'�)���
0 100 200 300 400 Miles
0 100 200 300 400 500 600 Kilometers
��)����
Route lines reflect flights operated by United Airlines, ContinentalAirlines, Inc. and/or their regional partners. For accurate flightschedules, please see www.united.com or www.continental.com.© 2011 United Air Lines, Inc. All Rights Reserved.
Mont Tremblant
MonctonPresque Isle
Yellowknife
Sand Spit
Prince RupertTerrace
Smithers
Fort St. John
Fort McMurrayPrince George
Kamloops
KelownaNanaimo
Penticton
CastlegarCranbrook
LethbridgeMedicine Hat
Thunder Bay
Sault Ste.Marie
North Bay
Sarnia
Grande Prairie
Sudbury
TimminsRouyn-Noranda
Kingston
Baie-Comeau
Wabush
Mont-Joli
Gaspe
Charlottetown
Bathurst
Fredericton
Saint John
Sydney
Goose Bay
Deer LakeGander
Îles de la Madeliene
Windsor
Vancouver
Toronto
Edmonton
Calgary
Winnipeg
Halifax
Ottawa
Victoria
London
City
Regina
Saskatoon
Cullaton LakeEnnadai Lake
Saguenay
Bangor
Miami
Orlando
West Palm Beach
Portland
Seattle
Boise
San Jose
Las Vegas
LOS ANGELES
San Diego
SAN FRANCISCO Oakland
DENVER
Sacramento
Salt Lake City
Tucson
Phoenix/ScottsdaleAlbuquerque
Charleston
Colorado Springs
Greenville/Spartanburg
Savannah
Baltimore
Birmingham
HOUSTON(INTERCONTINENTAL)
Louisville
Memphis
Milwaukee
Philadelphia
San Antonio
St. Louis
Tampa/St. Petersburg
Charlotte
CLEVELAND
Dallas/Fort Worth
Detroit
Jacksonville
Kansas City
NewOrleans
New York (La Guardia) (J.F. Kennedy)
Norfolk/Virginia Beach
Omaha
Albany
Atlanta
Austin
Boston
Columbia
Columbus
NashvilleOklahoma City
Raleigh/Durham
Richmond
WASHINGTON, DC (DULLES)
Hartford/Springfield
Cincinnati
Bozeman
Orange County
Portland
Providence
NEW YORK (NEWARK)
Greensboro/High Point/Winston-Salem
Lexington
GrandRapids
Ft. Lauderdale/Hollywood
Syracuse
Buffalo/Niagara Falls
KnoxvilleTulsa
El Paso
Honolulu
Manchester
Ft. Myers
Kahului
Indianapolis
Minneapolis
Dayton
Allentown
Madison
Pittsburgh
Appleton/Fox Cities
Burlington
CedarRapids/Iowa City
Wausau
DesMoines
Ft.Wayne
Green Bay
WhitePlains
Lansing
Moline
Rochester
SouthBend/Elkhart/Mishawaka
Springfield
Spokane
Wichita
Lincoln
Missoula
Rapid City
Reno/Tahoe
Charleston
Traverse City
Akron/Canton StateCollege
Jackson Hole
Kona
Burbank
Gunnison/CrestedButte
Hayden/SteamboatSprings
Montrose
Vail/Eagle
Fargo
Gillette
Rock Springs
Crescent City
Eureka
Aspen
Wilkes Barre/Scranton
Bakersfield
Charlottesville
Chico
Carlsbad
Cody/Yellowstone
Casper
Eugene
Fresno
SiouxFalls
GrandJunction
Medford
Pasco
Palm Springs
Santa Barbara
Roanoke
Imperial
Inyokern
Monterey
San Luis Obispo
Santa Maria
Yuma
Modesto
Springfield
Redmond
Redding
(Reagan National)
Bismarck
Peoria
Asheville
Augusta
Pensacola
Myrtle Beach
Fayetteville/Ft. Bragg
Gainesville
Hilton Head Island
Huntsville/Decatur
Jacksonville
Long Island/Islip
New Bern
Tri-Cities Regional
Wilmington
Newport News/Williamsburg
GreenvilleNorthwestArkansas
Great Falls
LittleRock
Billings
AltoonaJohnstown
Beckley
ShenandoahValley
ClarksburgMorgantown
Helena
KlamathFalls
North Bend
Midland/Odessa
Chattanooga
Gulfport/Biloxi
Huntington
New Haven
Williamsport
Jackson Montgomery
Mobile
Salisbury
Newburgh
Ft. WaltonBeach
Florence
Durango
Paducah
Brownsville
BatonRouge
Corpus Christi
Harlingen
Laredo
McAllen
Daytona
Lubbock
Amarillo
Dallas (Love)
Waco
College Station
Lafayette
Alexandria
LakeCharles
Shreveport
Beaumont/Pt. Arthur
Tyler
Monroe
Victoria
Erie
LiberalDodge City
Great BendGarden City
Hays
Prescott
Hilo
Flint
Long BeachFlagstaff
Midland/Saginaw
Parkersburg
Lynchburg
Elmira
Hyannis
Bar Harbor
Presque Isle
Nassau
Tallahassee
Treasure Cay
Cat IslandAndros Town
Nantucket
LOS ANGELES
SAN FRANCISCO
DENVER
Toronto
Honolulu
Ontario
Kahului
HarrisburgLincoln
Kona
Fargo
Casper SiouxFalls
Bismarck
IthacaBinghamton
Idaho Falls
Kalispell
Billings Duluth
Jackson
Salisbury
Muskegon
Brownsville
Corpus Christi
Harlingen
Laredo
McAllen
Eau Claire
Houghton
Minot
Pierre
Alliance
Chadron
Scottsbluff
Liberal
Kearney
Laramie
Huron
McCook
Dodge CityGreat Bend
Hays
AlamosaPuebloCortez
Farmington
TelluridePage/Lake Powell
Show LowPrescott
Moab
Worland
Sheridan
DickinsonMiles City
SidneyWilliston
Glasgow
Lewistown
Visalia
Hilo
Kapalua
Key West
GrandIsland
Vernal
North PlatteCheyenne
Riverton
LOS ANGELES
SAN FRANCISCO
DENVER
to Anchorage
Bimini
Freeport
George Town
North EleutheraGovernors Harbour
Marsh Harbour
Jamestown
Dubois
BradfordFranklin
Lewisburg
Clearfield
Sarasota/Bradenton
Plattsburgh
Melbourne
Killeen
Del Rio
Mammoth Lakes
Hobbs
Glendive
St. George
�������������
New York (Penn Station)
Boston
Newark(Liberty)
New Haven
Philadelphia
Washington, DC
Stamford
Wilmington
Train RoutesCodeshare / OnePass ServiceOnePass Eligible Service
United Airlines
If you could quickly find solutionsto NP-hard problems (i.e. P=NP),then
Scheduling and routingquestions (such as theKnapsack question andTraveling Salesman Problem)could be done efficiently.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 14 / 54
P?= NP : Why Does It Matter?
wiki
If you could quickly find solutionsto NP-hard problems (i.e. P=NP),then
Some hard biologicalquestions (such as proteinfolding) would be tractable.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 15 / 54
P?= NP
P?= NP: Roughly, if the answer to
a problem can be checked quickly,can it be computed quickly?
Solving this, will bring
fame,
fortune, and
change how algorithms aredesigned
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 16 / 54
P?= NP
P?= NP: Roughly, if the answer to
a problem can be checked quickly,can it be computed quickly?
Solving this, will bring
fame,
fortune, and
change how algorithms aredesigned
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 16 / 54
P?= NP
P?= NP: Roughly, if the answer to
a problem can be checked quickly,can it be computed quickly?
Solving this, will bring
fame,
fortune, and
change how algorithms aredesigned
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 16 / 54
P?= NP
P?= NP: Roughly, if the answer to
a problem can be checked quickly,can it be computed quickly?
Solving this, will bring
fame,
fortune, and
change how algorithms aredesigned
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 16 / 54
In Pairs: Analyze Running Time
Give upper bounds on the worst case running time of:
1 def double(n):
d = 2*n
return d
2 def sum(n):
s = 0
for i in range(n):
s += i
return s
3 def sum2(n):
return n*(n+1)/2
4 def findMin(numList):
m = numList[0]
for i in range(1,len(numList)):
if numList[i] < m:
m = numList[i]
return m
5 (Code from text):
6 (Code from text):
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 17 / 54
Analyze Space Requirements
How much space an algorithm uses canmatter.
If you want to align two long sequences(say 1 million bp each). The dynamicprogramming will require a matrix with 1million × 1 million entries, requiring106 × 106 = 1012 places of storage.
Can easily overwhelm the memory onyour computer.
Can measure space usage as we did fortime complexity.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 18 / 54
Analyze Space Requirements
How much space an algorithm uses canmatter.
If you want to align two long sequences(say 1 million bp each). The dynamicprogramming will require a matrix with 1million × 1 million entries, requiring106 × 106 = 1012 places of storage.
Can easily overwhelm the memory onyour computer.
Can measure space usage as we did fortime complexity.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 18 / 54
Analyze Space Requirements
How much space an algorithm uses canmatter.
If you want to align two long sequences(say 1 million bp each). The dynamicprogramming will require a matrix with 1million × 1 million entries, requiring106 × 106 = 1012 places of storage.
Can easily overwhelm the memory onyour computer.
Can measure space usage as we did fortime complexity.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 18 / 54
Analyze Space Requirements
How much space an algorithm uses canmatter.
If you want to align two long sequences(say 1 million bp each). The dynamicprogramming will require a matrix with 1million × 1 million entries, requiring106 × 106 = 1012 places of storage.
Can easily overwhelm the memory onyour computer.
Can measure space usage as we did fortime complexity.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 18 / 54
Analyze Space Requirements
insertionSort sorts “in place”, and use noadditional space.
Our sequence alignment underNeedleman-Wunsch used an additionalO(n2) space to store the dynamicprogramming array.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 19 / 54
Analyze Space Requirements
insertionSort sorts “in place”, and use noadditional space.
Our sequence alignment underNeedleman-Wunsch used an additionalO(n2) space to store the dynamicprogramming array.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 19 / 54
Analyze Space Requirements
insertionSort sorts “in place”, and use noadditional space.
Our sequence alignment underNeedleman-Wunsch used an additionalO(n2) space to store the dynamicprogramming array.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 19 / 54
In Pairs: Analyze Space Requirement
The running time of bubblesort is analyzed above. Give upper bounds on theworst case space needed for:
1 def double(n):
d = 2*n
return d
2 def sum(n):
s = 0
for i in range(n):
s += i
return s
3 def sum2(n):
return n*(n+1)/2
4 def findMin(numList):
m = numList[0]
for i in range(1,len(numList)):
if numList[i] < m:
m = numList[i]
return m
5 def findMaxDist(numList):
n = len(numList)
d = np.zeros(n,n)
for i in range(1,n):
for j in range(1,n)
d[i,j] = abs(i,j)
return np.amax(d)
6 def findMaxDist2(numList):
n = len(numList)
m = 0
for i in range(1,n):
for j in range(1,n)
if abs(i,j) > m:
m = abs(i,j)
return m
7 (Code from text):
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 20 / 54
More Complexity: Fixed Parameter Tractability
A
J
E
D
H
C
F
G
BA
I
J
I
H
G
F
ED
C
B
A
J
E
D
H
C
F
G
BA
I
J
I
H
G
F
ED
C
B
Roughly, the ability to efficiently calculate instances that are smallwith respect to some parameter is called fixed parameter tractability.
Though NP-hard, some problems can be solved in time polynomial inthe size of the input size but exponential in the size of a fixedparameter.Often, the parameter, k , will be the distance between the trees.For example, the distance between the two trees can be calculated byshrinking the common regions and focusing on the differences, whichcan be bounded by k.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 21 / 54
More Complexity: Fixed Parameter Tractability
A
J
E
D
H
C
F
G
BA
I
J
I
H
G
F
ED
C
B
A
J
E
D
H
C
F
G
BA
I
J
I
H
G
F
ED
C
B
Roughly, the ability to efficiently calculate instances that are smallwith respect to some parameter is called fixed parameter tractability.Though NP-hard, some problems can be solved in time polynomial inthe size of the input size but exponential in the size of a fixedparameter.
Often, the parameter, k , will be the distance between the trees.For example, the distance between the two trees can be calculated byshrinking the common regions and focusing on the differences, whichcan be bounded by k.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 21 / 54
More Complexity: Fixed Parameter Tractability
A
J
E
D
H
C
F
G
BA
I
J
I
H
G
F
ED
C
B
A
J
E
D
H
C
F
G
BA
I
J
I
H
G
F
ED
C
B
Roughly, the ability to efficiently calculate instances that are smallwith respect to some parameter is called fixed parameter tractability.Though NP-hard, some problems can be solved in time polynomial inthe size of the input size but exponential in the size of a fixedparameter.Often, the parameter, k , will be the distance between the trees.
For example, the distance between the two trees can be calculated byshrinking the common regions and focusing on the differences, whichcan be bounded by k.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 21 / 54
More Complexity: Fixed Parameter Tractability
A
J
E
D
H
C
F
G
BA
I
J
I
H
G
F
ED
C
B
A
J
E
D
H
C
F
G
BA
I
J
I
H
G
F
ED
C
B
Roughly, the ability to efficiently calculate instances that are smallwith respect to some parameter is called fixed parameter tractability.Though NP-hard, some problems can be solved in time polynomial inthe size of the input size but exponential in the size of a fixedparameter.Often, the parameter, k , will be the distance between the trees.For example, the distance between the two trees can be calculated byshrinking the common regions and focusing on the differences, whichcan be bounded by k.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 21 / 54
Recap: Small Parsimony Problem
Last Week: given a tree with leaveslabeled by sequences, computed theparsimony score of the tree.
Thinking in terms of time complexity:How long does it take?
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 22 / 54
Recap: Small Parsimony Problem
Last Week: given a tree with leaveslabeled by sequences, computed theparsimony score of the tree.
Thinking in terms of time complexity:How long does it take?
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 22 / 54
Algorithm Design: Scoring Trees Under Parsimony
AMNH
How do you code this?
I Input: A tree and sequences on the leaves.I Output: The parsimony score of the tree
(with respect to the leaf labels).
What data structures do you need?
I Tree structureI Count of the number changes
Algorithm:
I First pass: Starting at the leaves, label theinternal leaves (with possible multiple labels).
I Second pass: Starting at the root, choose alabeling, then work towards the leavesminimizing the conflicts.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 23 / 54
Algorithm Design: Scoring Trees Under Parsimony
AMNH
How do you code this?
I Input: A tree and sequences on the leaves.
I Output: The parsimony score of the tree(with respect to the leaf labels).
What data structures do you need?
I Tree structureI Count of the number changes
Algorithm:
I First pass: Starting at the leaves, label theinternal leaves (with possible multiple labels).
I Second pass: Starting at the root, choose alabeling, then work towards the leavesminimizing the conflicts.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 23 / 54
Algorithm Design: Scoring Trees Under Parsimony
AMNH
How do you code this?
I Input: A tree and sequences on the leaves.I Output: The parsimony score of the tree
(with respect to the leaf labels).
What data structures do you need?
I Tree structureI Count of the number changes
Algorithm:
I First pass: Starting at the leaves, label theinternal leaves (with possible multiple labels).
I Second pass: Starting at the root, choose alabeling, then work towards the leavesminimizing the conflicts.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 23 / 54
Algorithm Design: Scoring Trees Under Parsimony
AMNH
How do you code this?
I Input: A tree and sequences on the leaves.I Output: The parsimony score of the tree
(with respect to the leaf labels).
What data structures do you need?
I Tree structureI Count of the number changes
Algorithm:
I First pass: Starting at the leaves, label theinternal leaves (with possible multiple labels).
I Second pass: Starting at the root, choose alabeling, then work towards the leavesminimizing the conflicts.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 23 / 54
Algorithm Design: Scoring Trees Under Parsimony
AMNH
How do you code this?
I Input: A tree and sequences on the leaves.I Output: The parsimony score of the tree
(with respect to the leaf labels).
What data structures do you need?
I Tree structure
I Count of the number changes
Algorithm:
I First pass: Starting at the leaves, label theinternal leaves (with possible multiple labels).
I Second pass: Starting at the root, choose alabeling, then work towards the leavesminimizing the conflicts.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 23 / 54
Algorithm Design: Scoring Trees Under Parsimony
AMNH
How do you code this?
I Input: A tree and sequences on the leaves.I Output: The parsimony score of the tree
(with respect to the leaf labels).
What data structures do you need?
I Tree structureI Count of the number changes
Algorithm:
I First pass: Starting at the leaves, label theinternal leaves (with possible multiple labels).
I Second pass: Starting at the root, choose alabeling, then work towards the leavesminimizing the conflicts.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 23 / 54
Algorithm Design: Scoring Trees Under Parsimony
AMNH
How do you code this?
I Input: A tree and sequences on the leaves.I Output: The parsimony score of the tree
(with respect to the leaf labels).
What data structures do you need?
I Tree structureI Count of the number changes
Algorithm:
I First pass: Starting at the leaves, label theinternal leaves (with possible multiple labels).
I Second pass: Starting at the root, choose alabeling, then work towards the leavesminimizing the conflicts.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 23 / 54
Algorithm Design: Scoring Trees Under Parsimony
AMNH
How do you code this?
I Input: A tree and sequences on the leaves.I Output: The parsimony score of the tree
(with respect to the leaf labels).
What data structures do you need?
I Tree structureI Count of the number changes
Algorithm:
I First pass: Starting at the leaves, label theinternal leaves (with possible multiple labels).
I Second pass: Starting at the root, choose alabeling, then work towards the leavesminimizing the conflicts.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 23 / 54
Fitch’s Algorithm: Pseudocode
AMNH
First pass: Starting at the leaves, label the internal
leaves (with possible multiple labels):
I Given labels for children, compute label forthe parent:A T A T G
A A T T G → A AT AT T GI Go position by position:
F If overlap, use that label.F If no overlap, use the union.
I Useful Python container type: setF Has functions for union and intersection of
sets.F s1 = set(l1)
s2 = set(l2)
print s1.intersection(s2)
print s1.union(s2)
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 24 / 54
Fitch’s Algorithm: Pseudocode
AMNH
First pass: Starting at the leaves, label the internal
leaves (with possible multiple labels):
I Given labels for children, compute label forthe parent:A T A T G
A A T T G → A AT AT T GI Go position by position:
F If overlap, use that label.F If no overlap, use the union.
I Useful Python container type: setF Has functions for union and intersection of
sets.F s1 = set(l1)
s2 = set(l2)
print s1.intersection(s2)
print s1.union(s2)
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 24 / 54
Fitch’s Algorithm: Pseudocode
AMNH
First pass: Starting at the leaves, label the internal
leaves (with possible multiple labels):
I Given labels for children, compute label forthe parent:A T A T G
A A T T G → A AT AT T GI Go position by position:
F If overlap, use that label.F If no overlap, use the union.
I Useful Python container type: setF Has functions for union and intersection of
sets.F s1 = set(l1)
s2 = set(l2)
print s1.intersection(s2)
print s1.union(s2)
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 24 / 54
Fitch’s Algorithm: Pseudocode
AMNH
First pass: Starting at the leaves, label the internal
leaves (with possible multiple labels):
I Given labels for children, compute label forthe parent:A T A T G
A A T T G → A AT AT T GI Go position by position:
F If overlap, use that label.
F If no overlap, use the union.
I Useful Python container type: setF Has functions for union and intersection of
sets.F s1 = set(l1)
s2 = set(l2)
print s1.intersection(s2)
print s1.union(s2)
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 24 / 54
Fitch’s Algorithm: Pseudocode
AMNH
First pass: Starting at the leaves, label the internal
leaves (with possible multiple labels):
I Given labels for children, compute label forthe parent:A T A T G
A A T T G → A AT AT T GI Go position by position:
F If overlap, use that label.F If no overlap, use the union.
I Useful Python container type: setF Has functions for union and intersection of
sets.F s1 = set(l1)
s2 = set(l2)
print s1.intersection(s2)
print s1.union(s2)
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 24 / 54
Fitch’s Algorithm: Pseudocode
AMNH
First pass: Starting at the leaves, label the internal
leaves (with possible multiple labels):
I Given labels for children, compute label forthe parent:A T A T G
A A T T G → A AT AT T GI Go position by position:
F If overlap, use that label.F If no overlap, use the union.
I Useful Python container type: set
F Has functions for union and intersection ofsets.
F s1 = set(l1)
s2 = set(l2)
print s1.intersection(s2)
print s1.union(s2)
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 24 / 54
Fitch’s Algorithm: Pseudocode
AMNH
First pass: Starting at the leaves, label the internal
leaves (with possible multiple labels):
I Given labels for children, compute label forthe parent:A T A T G
A A T T G → A AT AT T GI Go position by position:
F If overlap, use that label.F If no overlap, use the union.
I Useful Python container type: setF Has functions for union and intersection of
sets.
F s1 = set(l1)
s2 = set(l2)
print s1.intersection(s2)
print s1.union(s2)
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 24 / 54
Fitch’s Algorithm: Pseudocode
AMNH
First pass: Starting at the leaves, label the internal
leaves (with possible multiple labels):
I Given labels for children, compute label forthe parent:A T A T G
A A T T G → A AT AT T GI Go position by position:
F If overlap, use that label.F If no overlap, use the union.
I Useful Python container type: setF Has functions for union and intersection of
sets.F s1 = set(l1)
s2 = set(l2)
print s1.intersection(s2)
print s1.union(s2)
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 24 / 54
Fitch’s Algorithm: Pseudocode
AMNH
First pass: Starting at the leaves, label the internal
leaves (with possible multiple labels):
I Given labels for children, compute label forthe parent:A T A T G
A A T T G → A AT AT T GI Go position by position:
F If overlap, use that label.F If no overlap, use the union.
I Useful Python container type: setF Has functions for union and intersection of
sets.F s1 = set(l1)
s2 = set(l2)
print s1.intersection(s2)
print s1.union(s2)
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 24 / 54
Fitch’s Algorithm: Pseudocode
AMNH
First pass: Starting at the leaves, label the internal
leaves (with possible multiple labels):
I Given labels for children, compute label forthe parent:A T A T G
A A T T G → A AT AT T GI Go position by position:
F If overlap, use that label.F If no overlap, use the union.
I Useful Python container type: setF Has functions for union and intersection of
sets.F s1 = set(l1)
s2 = set(l2)
print s1.intersection(s2)
print s1.union(s2)
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 24 / 54
Fitch’s Algorithm: Pseudocode
AMNH
First pass: Starting at the leaves, label the internal
leaves (with possible multiple labels):
I Given labels for children, compute label forthe parent:A T A T G
A A T T G → A AT AT T GI Go position by position: for-loop
F If overlap, use that label. if-statementF If no overlap, use the union.
I Useful Python container type: setF Has functions for union and intersection of
sets.F s1 = set(l1) set operations
s2 = set(l2)
print s1.intersection(s2)
print s1.union(s2)
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 25 / 54
Fitch’s Algorithm: Pseudocode
AMNH
Second pass: Starting at the root, choose a labeling,
then work towards the leaves minimizing the conflicts.
I At root, choose one labeling:A AT AT T G → A A T T G
I For all other nodes, compare to the parent:A T A T G
AT AT G ACT G → A T G T GI Go position by position:
F If overlap, use that label.F If no overlap, choose label from child.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 26 / 54
Fitch’s Algorithm: Pseudocode
AMNH
Second pass: Starting at the root, choose a labeling,
then work towards the leaves minimizing the conflicts.
I At root, choose one labeling:
A AT AT T G → A A T T GI For all other nodes, compare to the parent:
A T A T G
AT AT G ACT G → A T G T GI Go position by position:
F If overlap, use that label.F If no overlap, choose label from child.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 26 / 54
Fitch’s Algorithm: Pseudocode
AMNH
Second pass: Starting at the root, choose a labeling,
then work towards the leaves minimizing the conflicts.
I At root, choose one labeling:A AT AT T G → A A T T G
I For all other nodes, compare to the parent:
A T A T G
AT AT G ACT G → A T G T GI Go position by position:
F If overlap, use that label.F If no overlap, choose label from child.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 26 / 54
Fitch’s Algorithm: Pseudocode
AMNH
Second pass: Starting at the root, choose a labeling,
then work towards the leaves minimizing the conflicts.
I At root, choose one labeling:A AT AT T G → A A T T G
I For all other nodes, compare to the parent:A T A T G
AT AT G ACT G
→ A T G T GI Go position by position:
F If overlap, use that label.F If no overlap, choose label from child.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 26 / 54
Fitch’s Algorithm: Pseudocode
AMNH
Second pass: Starting at the root, choose a labeling,
then work towards the leaves minimizing the conflicts.
I At root, choose one labeling:A AT AT T G → A A T T G
I For all other nodes, compare to the parent:A T A T G
AT AT G ACT G → A T G T G
I Go position by position:F If overlap, use that label.F If no overlap, choose label from child.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 26 / 54
Fitch’s Algorithm: Pseudocode
AMNH
Second pass: Starting at the root, choose a labeling,
then work towards the leaves minimizing the conflicts.
I At root, choose one labeling:A AT AT T G → A A T T G
I For all other nodes, compare to the parent:A T A T G
AT AT G ACT G → A T G T GI Go position by position:
F If overlap, use that label.F If no overlap, choose label from child.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 26 / 54
Fitch’s Algorithm: Pseudocode
AMNH
Second pass: Starting at the root, choose a labeling,
then work towards the leaves minimizing the conflicts.
I At root, choose one labeling:A AT AT T G → A A T T G
I For all other nodes, compare to the parent:A T A T G
AT AT G ACT G → A T G T GI Go position by position:
F If overlap, use that label.F If no overlap, choose label from child.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 26 / 54
Fitch’s Algorithm: Pseudocode
AMNH
Second pass: Starting at the root, choose a labeling,
then work towards the leaves minimizing the conflicts.
I At root, choose one labeling:A AT AT T G → A A T T G
I For all other nodes, compare to the parent:A T A T G
AT AT G ACT G → A T G T GI Go position by position: for-loop
F If overlap, use that label. if-statementF If no overlap, choose label from child.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 27 / 54
In Pairs: Analyze Fitch’s Algorithm
AMNH
What is the running time and space require-ments for:
First pass: Starting at the leaves, label theinternal leaves (with possible multiplelabels).
Second pass: Starting at the root, choosea labeling, then work towards the leavesminimizing the conflicts.
Print out all tree on n leaves.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 28 / 54
Searching for Optimal Trees
Large Parsimony Problem: Given sequencesfor leaves, find the optimal scoring tree.
Visually: think of the trees on a 2D map andthe height above sea level is the score.
Works for any optimality criteria (i.e. sameanalogy works for maximum likelihood).
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 29 / 54
Searching for Optimal Trees
Large Parsimony Problem: Given sequencesfor leaves, find the optimal scoring tree.
Visually: think of the trees on a 2D map andthe height above sea level is the score.
Works for any optimality criteria (i.e. sameanalogy works for maximum likelihood).
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 29 / 54
Searching for Optimal Trees
Large Parsimony Problem: Given sequencesfor leaves, find the optimal scoring tree.
Visually: think of the trees on a 2D map andthe height above sea level is the score.
Works for any optimality criteria (i.e. sameanalogy works for maximum likelihood).
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 29 / 54
Analogy: Find the Highest Point
polymaps.org
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 30 / 54
Analogy: Find the Highest Point
Sampling:
Choose 1000 random points.
Find height at each point.
Output the sampled point with largestheight.
Will you reach the highest point?
Only if very lucky or a very dense sample.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 31 / 54
Analogy: Find the Highest Point
Sampling:
Choose 1000 random points.
Find height at each point.
Output the sampled point with largestheight.
Will you reach the highest point?
Only if very lucky or a very dense sample.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 31 / 54
Analogy: Find the Highest Point
Sampling:
Choose 1000 random points.
Find height at each point.
Output the sampled point with largestheight.
Will you reach the highest point?
Only if very lucky or a very dense sample.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 31 / 54
Analogy: Find the Highest Point
Sampling:
Choose 1000 random points.
Find height at each point.
Output the sampled point with largestheight.
Will you reach the highest point?
Only if very lucky or a very dense sample.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 31 / 54
Analogy: Find the Highest Point
Sampling:
Choose 1000 random points.
Find height at each point.
Output the sampled point with largestheight.
Will you reach the highest point?
Only if very lucky or a very dense sample.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 31 / 54
Analogy: Find the Highest Point
Hill Climbing:
Start at the harbor.
Can see 25 meters in all directions.
Walk upwards, repeat.
Will you reach the highest point?
Maybe, but maybe not.
I Could reach small peaks, but missthe larger ones.
I Start in multiple places to see more.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 32 / 54
Analogy: Find the Highest Point
Hill Climbing:
Start at the harbor.
Can see 25 meters in all directions.
Walk upwards, repeat.
Will you reach the highest point?
Maybe, but maybe not.
I Could reach small peaks, but missthe larger ones.
I Start in multiple places to see more.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 32 / 54
Analogy: Find the Highest Point
Hill Climbing:
Start at the harbor.
Can see 25 meters in all directions.
Walk upwards, repeat.
Will you reach the highest point?
Maybe, but maybe not.
I Could reach small peaks, but missthe larger ones.
I Start in multiple places to see more.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 32 / 54
Analogy: Find the Highest Point
Hill Climbing:
Start at the harbor.
Can see 25 meters in all directions.
Walk upwards, repeat.
Will you reach the highest point?
Maybe, but maybe not.
I Could reach small peaks, but missthe larger ones.
I Start in multiple places to see more.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 32 / 54
Analogy: Find the Highest Point
Hill Climbing:
Start at the harbor.
Can see 25 meters in all directions.
Walk upwards, repeat.
Will you reach the highest point?
Maybe, but maybe not.
I Could reach small peaks, but missthe larger ones.
I Start in multiple places to see more.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 32 / 54
Analogy: Find the Highest Point
Hill Climbing:
Start at the harbor.
Can see 25 meters in all directions.
Walk upwards, repeat.
Will you reach the highest point?
Maybe, but maybe not.
I Could reach small peaks, but missthe larger ones.
I Start in multiple places to see more.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 32 / 54
Analogy: Find the Highest Point
Hill Climbing:
Start at the harbor.
Can see 25 meters in all directions.
Walk upwards, repeat.
Will you reach the highest point?
Maybe, but maybe not.
I Could reach small peaks, but missthe larger ones.
I Start in multiple places to see more.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 32 / 54
Local Search Techniques
Goal: Find the point with the optimal score
Local search techniques prevail:
I Begin with a pointI Choose the next point from its neighbors (e.g. best scoring)I Repeat
Many variations on the theme: branch-and-bound, MCMC, geneticalgorithms,...
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 33 / 54
Local Search Techniques
Goal: Find the point with the optimal score
Local search techniques prevail:
I Begin with a pointI Choose the next point from its neighbors (e.g. best scoring)I Repeat
Many variations on the theme: branch-and-bound, MCMC, geneticalgorithms,...
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 33 / 54
Local Search Techniques
Goal: Find the point with the optimal score
Local search techniques prevail:
I Begin with a point
I Choose the next point from its neighbors (e.g. best scoring)I Repeat
Many variations on the theme: branch-and-bound, MCMC, geneticalgorithms,...
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 33 / 54
Local Search Techniques
Goal: Find the point with the optimal score
Local search techniques prevail:
I Begin with a pointI Choose the next point from its neighbors (e.g. best scoring)
I Repeat
Many variations on the theme: branch-and-bound, MCMC, geneticalgorithms,...
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 33 / 54
Local Search Techniques
Goal: Find the point with the optimal score
Local search techniques prevail:
I Begin with a pointI Choose the next point from its neighbors (e.g. best scoring)I Repeat
Many variations on the theme: branch-and-bound, MCMC, geneticalgorithms,...
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 33 / 54
Local Search Techniques
Goal: Find the point with the optimal score
Local search techniques prevail:
I Begin with a pointI Choose the next point from its neighbors (e.g. best scoring)I Repeat
Many variations on the theme: branch-and-bound, MCMC, geneticalgorithms,...
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 33 / 54
Popular Tree Metrics
NicotianaCampanulaScaevolaStokesiaDimorphothecaSenecioGerberaGazaniaEchinopsFeliciaTagetesChromolaenaBlennospermaCoreopsisVernoniaCacosmiaCichoriumAchilleaCarthamnusFlaveriaPiptocarpaHelianthusTragopogonChrysanthemumEupatoriumLactucaBarnadesiaDasyphyllum
NicotianaCampanulaScaevolaStokesiaDimorphothecaSenecioGazaniaGerberaEchinopsFeliciaTagetesChromolaenaBlennospermaCoreopsisVernoniaCacosmiaCichoriumAchilleaCarthamnusFlaveriaPiptocarpaHelianthusTragopogonChrysanthemumEupatoriumLactucaBarnadesiaDasyphyllum
Those based on tree rearrangements:
Subtree Prune and Regraft (SPR)
Tree Bisection and Reconnection (TBR)
Nearest Neighbor Interchange (NNI)
Used for Searching for Optimal Trees, NP-hard
Those based on comparing tree vectors:
Robinson-Foulds (RF)
Rooted Triples (RT)
Quartet Distance
Billera-Holmes-Vogtmann (BHV or geodesic))
Used for comparing trees, poly time
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 34 / 54
Popular Tree Metrics
NicotianaCampanulaScaevolaStokesiaDimorphothecaSenecioGerberaGazaniaEchinopsFeliciaTagetesChromolaenaBlennospermaCoreopsisVernoniaCacosmiaCichoriumAchilleaCarthamnusFlaveriaPiptocarpaHelianthusTragopogonChrysanthemumEupatoriumLactucaBarnadesiaDasyphyllum
NicotianaCampanulaScaevolaStokesiaDimorphothecaSenecioGazaniaGerberaEchinopsFeliciaTagetesChromolaenaBlennospermaCoreopsisVernoniaCacosmiaCichoriumAchilleaCarthamnusFlaveriaPiptocarpaHelianthusTragopogonChrysanthemumEupatoriumLactucaBarnadesiaDasyphyllum
Those based on tree rearrangements:
Subtree Prune and Regraft (SPR)
Tree Bisection and Reconnection (TBR)
Nearest Neighbor Interchange (NNI)
Used for Searching for Optimal Trees, NP-hard
Those based on comparing tree vectors:
Robinson-Foulds (RF)
Rooted Triples (RT)
Quartet Distance
Billera-Holmes-Vogtmann (BHV or geodesic))
Used for comparing trees, poly time
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 34 / 54
Popular Tree Metrics
NicotianaCampanulaScaevolaStokesiaDimorphothecaSenecioGerberaGazaniaEchinopsFeliciaTagetesChromolaenaBlennospermaCoreopsisVernoniaCacosmiaCichoriumAchilleaCarthamnusFlaveriaPiptocarpaHelianthusTragopogonChrysanthemumEupatoriumLactucaBarnadesiaDasyphyllum
NicotianaCampanulaScaevolaStokesiaDimorphothecaSenecioGazaniaGerberaEchinopsFeliciaTagetesChromolaenaBlennospermaCoreopsisVernoniaCacosmiaCichoriumAchilleaCarthamnusFlaveriaPiptocarpaHelianthusTragopogonChrysanthemumEupatoriumLactucaBarnadesiaDasyphyllum
Those based on tree rearrangements:
Subtree Prune and Regraft (SPR)
Tree Bisection and Reconnection (TBR)
Nearest Neighbor Interchange (NNI)
Used for Searching for Optimal Trees, NP-hard
Those based on comparing tree vectors:
Robinson-Foulds (RF)
Rooted Triples (RT)
Quartet Distance
Billera-Holmes-Vogtmann (BHV or geodesic))
Used for comparing trees, poly time
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 34 / 54
Popular Tree Metrics
NicotianaCampanulaScaevolaStokesiaDimorphothecaSenecioGerberaGazaniaEchinopsFeliciaTagetesChromolaenaBlennospermaCoreopsisVernoniaCacosmiaCichoriumAchilleaCarthamnusFlaveriaPiptocarpaHelianthusTragopogonChrysanthemumEupatoriumLactucaBarnadesiaDasyphyllum
NicotianaCampanulaScaevolaStokesiaDimorphothecaSenecioGazaniaGerberaEchinopsFeliciaTagetesChromolaenaBlennospermaCoreopsisVernoniaCacosmiaCichoriumAchilleaCarthamnusFlaveriaPiptocarpaHelianthusTragopogonChrysanthemumEupatoriumLactucaBarnadesiaDasyphyllum
Those based on tree rearrangements:
Subtree Prune and Regraft (SPR)
Tree Bisection and Reconnection (TBR)
Nearest Neighbor Interchange (NNI)
Used for Searching for Optimal Trees, NP-hard
Those based on comparing tree vectors:
Robinson-Foulds (RF)
Rooted Triples (RT)
Quartet Distance
Billera-Holmes-Vogtmann (BHV or geodesic))
Used for comparing trees, poly time
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 34 / 54
NNI Metric
C
ED
BA
G
F →
A
E
C
D
B
GF
The NNI distance between two trees is the minimal number of movesneeded to transform one to the other (NP-hard, DasGupta et al. ‘97).
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 35 / 54
SPR Distance
F
G
ED
C
A B A B
F
G
ED
CA B
F
G
ED
C
SPR distance is the minimal number of moves that transforms onetree into the other.
SPR for rooted trees is NP-hard (Bordewich & Semple ‘05).
SPR for unrooted trees is NP-hard (Hickey et al. ‘08).
SAT-based heuristic (Bonet & S ‘09).
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 36 / 54
SPR Distance
F
G
ED
C
A B A B
F
G
ED
CA B
F
G
ED
C
SPR distance is the minimal number of moves that transforms onetree into the other.
SPR for rooted trees is NP-hard (Bordewich & Semple ‘05).
SPR for unrooted trees is NP-hard (Hickey et al. ‘08).
SAT-based heuristic (Bonet & S ‘09).
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 36 / 54
TBR Distance
F
G
ED
C
A B
F
G
ED
C
A B
F
G
ED
C
A B BA
CD
EF
G
TBR distance is the minimal number of moves that transforms onetree into the other.
TBR is NP-hard and FPT. (Allen & Steel ‘01)
TBR has a linear time 5-approximation and a polynomial time3-approximation (Amenta, Bonet, Mahindru, & S ‘06;
Bordewich, McCartin, & Semple ‘08)
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 37 / 54
TBR Distance
F
G
ED
C
A B
F
G
ED
C
A B
F
G
ED
C
A B BA
CD
EF
G
TBR distance is the minimal number of moves that transforms onetree into the other.
TBR is NP-hard and FPT. (Allen & Steel ‘01)
TBR has a linear time 5-approximation and a polynomial time3-approximation (Amenta, Bonet, Mahindru, & S ‘06;
Bordewich, McCartin, & Semple ‘08)
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 37 / 54
TBR Distance
F
G
ED
C
A B
F
G
ED
C
A B
F
G
ED
C
A B BA
CD
EF
G
TBR distance is the minimal number of moves that transforms onetree into the other.
TBR is NP-hard and FPT. (Allen & Steel ‘01)
TBR has a linear time 5-approximation and a polynomial time3-approximation (Amenta, Bonet, Mahindru, & S ‘06;
Bordewich, McCartin, & Semple ‘08)
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 37 / 54
Tree Rearrangements
C D E
A
B
G
F
CF
A
B
D
E
GE
A
C
D
G
F
BA
F
C
B
D
E
GNearest Neighbor Interchange Subtree Prune & Regraft Tree Bisection & Reconnection
(NNI) (SPR) (TBR)
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 38 / 54
Tree Rearrangements
C D E
A
B
G
F
CF
A
B
D
E
GE
A
C
D
G
F
BA
F
C
B
D
E
GNearest Neighbor Interchange Subtree Prune & Regraft Tree Bisection & Reconnection
(NNI) (SPR) (TBR)
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 38 / 54
Metrics Matter
NNI & SPR Isoscope
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 39 / 54
Metrics Matter
NNI & SPR Isoscope
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 39 / 54
Landscapes
Parsimony score for compatible characters for n = 7 (Urheim, Ford, & S, submitted)
A treespace with assigned scores is often called a landscape.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 40 / 54
What does the landscape look like?
Each landscape depends on the number of taxa and the score of each tree(usually derived from the inputted character sequences).
(from wikipedia)
If very smooth, ‘hill climbing’ will workwell.
The phylogeny problem is that of finding those trees which optimise some function of the input data.
We may end up with several trees optimising some optimality criterion (say parsimony) and a completely different set oftrees optimising another criterion. There are hundreds of phylogenetic optimality criteria! This is a problem, but it's not theone in which I'm interested today.
If very rugged, need more sophisticatedsearches that use the underlying struc-ture of the space.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 41 / 54
What does the landscape look like?
Each landscape depends on the number of taxa and the score of each tree(usually derived from the inputted character sequences).
(from wikipedia)
If very smooth, ‘hill climbing’ will workwell.
The phylogeny problem is that of finding those trees which optimise some function of the input data.
We may end up with several trees optimising some optimality criterion (say parsimony) and a completely different set oftrees optimising another criterion. There are hundreds of phylogenetic optimality criteria! This is a problem, but it's not theone in which I'm interested today.
If very rugged, need more sophisticatedsearches that use the underlying struc-ture of the space.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 41 / 54
What does the landscape look like?
Each landscape depends on the number of taxa and the score of each tree(usually derived from the inputted character sequences).
(from wikipedia)
If very smooth, ‘hill climbing’ will workwell.
The phylogeny problem is that of finding those trees which optimise some function of the input data.
We may end up with several trees optimising some optimality criterion (say parsimony) and a completely different set oftrees optimising another criterion. There are hundreds of phylogenetic optimality criteria! This is a problem, but it's not theone in which I'm interested today.
If very rugged, need more sophisticatedsearches that use the underlying struc-ture of the space.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 41 / 54
Adjusting Search Space
SPR metric NNI metric
Parsimony score for compatible characters for n = 7 (Urheim, Ford, & S, submitted)
The same data, organized by different tree metrics.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 42 / 54
Attraction Basins
resalliance.org
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 43 / 54
Adjusting Search Space
SPR metric NNI metric
Parsimony score for compatible characters for n = 7 (Urheim, Ford, & S, submitted)
Simplest Case: for compatible character sequences (‘perfect data’):
Under SPR, there is a single attraction basin.
Under NNI, multiple attraction basins occur even for perfect data.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 44 / 54
In Pairs: Tree Rearrangements
F
G
ED
C
A B
For the tree on the left:
1 What are the NNI neighbors?
2 Give a SPR neighbor that is not a NNI neighbor.
3 Give a TBR neighbor that is not an SPR neighbor.
4 Find a tree that is NNI distance 3 but SPRdistance 1.
5 Write an algorithm for given a tree T and aspecific edge/node, the two NNI neighbors aroundthat edge.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 45 / 54
In Pairs: Tree Rearrangements
F
G
ED
C
A B
For the tree on the left:
1 What are the NNI neighbors?
2 Give a SPR neighbor that is not a NNI neighbor.
3 Give a TBR neighbor that is not an SPR neighbor.
4 Find a tree that is NNI distance 3 but SPRdistance 1.
5 Write an algorithm for given a tree T and aspecific edge/node, the two NNI neighbors aroundthat edge.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 45 / 54
In Pairs: Tree Rearrangements
F
G
ED
C
A B
For the tree on the left:
1 What are the NNI neighbors?
2 Give a SPR neighbor that is not a NNI neighbor.
3 Give a TBR neighbor that is not an SPR neighbor.
4 Find a tree that is NNI distance 3 but SPRdistance 1.
5 Write an algorithm for given a tree T and aspecific edge/node, the two NNI neighbors aroundthat edge.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 45 / 54
In Pairs: Tree Rearrangements
F
G
ED
C
A B
For the tree on the left:
1 What are the NNI neighbors?
2 Give a SPR neighbor that is not a NNI neighbor.
3 Give a TBR neighbor that is not an SPR neighbor.
4 Find a tree that is NNI distance 3 but SPRdistance 1.
5 Write an algorithm for given a tree T and aspecific edge/node, the two NNI neighbors aroundthat edge.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 45 / 54
In Pairs: Tree Rearrangements
F
G
ED
C
A B
For the tree on the left:
1 What are the NNI neighbors?
2 Give a SPR neighbor that is not a NNI neighbor.
3 Give a TBR neighbor that is not an SPR neighbor.
4 Find a tree that is NNI distance 3 but SPRdistance 1.
5 Write an algorithm for given a tree T and aspecific edge/node, the two NNI neighbors aroundthat edge.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 45 / 54
Weighted Trees
Philippe et al., ‘05
Branch weights are part of the model.
Indicated by length of edges in drawing.Two classic trees with same underlying topology.The metrics and search spaces above treat them as identical.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 46 / 54
Weighted Trees
Philippe et al., ‘05
Branch weights are part of the model.Indicated by length of edges in drawing.
Two classic trees with same underlying topology.The metrics and search spaces above treat them as identical.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 46 / 54
Weighted Trees
Philippe et al., ‘05
Branch weights are part of the model.Indicated by length of edges in drawing.Two classic trees with same underlying topology.
The metrics and search spaces above treat them as identical.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 46 / 54
Weighted Trees
Philippe et al., ‘05
Branch weights are part of the model.Indicated by length of edges in drawing.Two classic trees with same underlying topology.The metrics and search spaces above treat them as identical.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 46 / 54
Popular Tree Metrics
NicotianaCampanulaScaevolaStokesiaDimorphothecaSenecioGerberaGazaniaEchinopsFeliciaTagetesChromolaenaBlennospermaCoreopsisVernoniaCacosmiaCichoriumAchilleaCarthamnusFlaveriaPiptocarpaHelianthusTragopogonChrysanthemumEupatoriumLactucaBarnadesiaDasyphyllum
NicotianaCampanulaScaevolaStokesiaDimorphothecaSenecioGazaniaGerberaEchinopsFeliciaTagetesChromolaenaBlennospermaCoreopsisVernoniaCacosmiaCichoriumAchilleaCarthamnusFlaveriaPiptocarpaHelianthusTragopogonChrysanthemumEupatoriumLactucaBarnadesiaDasyphyllum
Those based on tree rearrangements:
Subtree Prune and Regraft (SPR)
Tree Bisection and Reconnection (TBR)
Nearest Neighbor Interchange (NNI)
Used for Searching for Optimal Trees, NP-hard
Those based on comparing tree vectors:
Robinson-Foulds (RF)
Rooted Triples (RT)
Quartet Distance
Billera-Holmes-Vogtmann (BHV or geodesic))
Used for comparing trees, poly time
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 47 / 54
Popular Tree Metrics
NicotianaCampanulaScaevolaStokesiaDimorphothecaSenecioGerberaGazaniaEchinopsFeliciaTagetesChromolaenaBlennospermaCoreopsisVernoniaCacosmiaCichoriumAchilleaCarthamnusFlaveriaPiptocarpaHelianthusTragopogonChrysanthemumEupatoriumLactucaBarnadesiaDasyphyllum
NicotianaCampanulaScaevolaStokesiaDimorphothecaSenecioGazaniaGerberaEchinopsFeliciaTagetesChromolaenaBlennospermaCoreopsisVernoniaCacosmiaCichoriumAchilleaCarthamnusFlaveriaPiptocarpaHelianthusTragopogonChrysanthemumEupatoriumLactucaBarnadesiaDasyphyllum
Those based on tree rearrangements:
Subtree Prune and Regraft (SPR)
Tree Bisection and Reconnection (TBR)
Nearest Neighbor Interchange (NNI)
Used for Searching for Optimal Trees, NP-hard
Those based on comparing tree vectors:
Robinson-Foulds (RF)
Rooted Triples (RT)
Quartet Distance
Billera-Holmes-Vogtmann (BHV or geodesic))
Used for comparing trees, poly time
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 47 / 54
Trees as Vectors
1
2 3 4
5
T1
1
2 4 3
5
T2T0
12|345
124|35
123|45K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 48 / 54
Trees as Vectors
1
2 3 4
5
T1
1
2 4 3
5
T2T0
12|345
124|35
123|45
1|2
34
52|1
34
53|1
24
54|1
23
55|1
23
41
2|3
45
13|2
45
14|2
35
15|2
34
23|1
45
24|1
35
25|1
34
34|1
25
35|1
24
45|1
23
T0 = (1, 2, 3, 4, 5) 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0T1 = ((1, 2), (3, (4, 5)) 1 1 1 1 1 1 0 0 0 0 0 0 0 0 1T2 = ((1, 2), (4, (3, 5)) 1 1 1 1 1 1 0 0 0 0 0 0 0 1 0
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 49 / 54
BHV Distance
Billera, Holmes, Vogtmann ‘01
Billera, Holmes, and Vogtmann ‘01have a continuous metric space oftrees.
View each split in a tree as acoordinate in the space.
Identify edges of orthants to formspace
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 50 / 54
BHV Distance
Billera, Holmes, Vogtmann ‘01
Billera, Holmes, and Vogtmann ‘01have a continuous metric space oftrees.
View each split in a tree as acoordinate in the space.
Identify edges of orthants to formspace
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 50 / 54
BHV Distance
Billera, Holmes, Vogtmann ‘01
Billera, Holmes, and Vogtmann ‘01have a continuous metric space oftrees.
View each split in a tree as acoordinate in the space.
Identify edges of orthants to formspace
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 50 / 54
Identify Edges of Orthants
(All images from Billera, Holmes, Vogtmann ‘01)
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 51 / 54
Identify Edges of Orthants
(All images from Billera, Holmes, Vogtmann ‘01)
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 51 / 54
Identify Edges of Orthants
(All images from Billera, Holmes, Vogtmann ‘01)
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 51 / 54
BHV & NNI Distances
1 5
423
1 5
432
3 5
412
123|45
12|345
23|145
13|245
1
2 3 4
5
(1,1,0,0)
(1,0,1,0)
1
32
4
5
3
2
1
4
5
(3,2,0,0)
1
2 3 4
5
(3,0,0,2)
Simplest move between orthants: shrink coordinate/edge and expand.
Corresponds to a Nearest Neighbor Interchange (NNI) move on the topology.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 52 / 54
BHV & NNI Distances
1 5
423
1 5
432
3 5
412
123|45
12|345
23|145
13|245
1
2 3 4
5
(1,1,0,0)
(1,0,1,0)
1
32
4
5
3
2
1
4
5
(3,2,0,0)
1
2 3 4
5
(3,0,0,2)
Simplest move between orthants: shrink coordinate/edge and expand.
Corresponds to a Nearest Neighbor Interchange (NNI) move on the topology.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 52 / 54
In Pairs: Continuous Treespace
1
2 3 4
5
T1
1
2 4 3
5
T2T0
12|345
124|35
123|45
123|45
12|345
23|145
13|245
1
2 3 4
5
(1,1,0,0)
(1,0,1,0)
1
32
4
5
3
2
1
4
5
(3,2,0,0)
1
2 3 4
5
(3,0,0,2)
1 What is the distance between T1 and T2?
2 What is the average (tree at the midpoint) ofT1 and T2?
3 Give three trees that have distance 1 to theorigin, T0 (star tree).
4 Lower figure: what is the distance between(3,0,0,2) and (1,1,0,0)?
5 What is a good average/consensus for thefour trees?
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 53 / 54
Recap
Efficiency (in time and space) matterswhen data gets large.
More on tree searching next lecture.
Email lab reports to kstjohn@amnh.org.
Challenges available at rosalind.info.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 54 / 54
Recap
Efficiency (in time and space) matterswhen data gets large.
More on tree searching next lecture.
Email lab reports to kstjohn@amnh.org.
Challenges available at rosalind.info.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 54 / 54
Recap
Efficiency (in time and space) matterswhen data gets large.
More on tree searching next lecture.
Email lab reports to kstjohn@amnh.org.
Challenges available at rosalind.info.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 54 / 54
Recap
Efficiency (in time and space) matterswhen data gets large.
More on tree searching next lecture.
Email lab reports to kstjohn@amnh.org.
Challenges available at rosalind.info.
K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 54 / 54
Recommended