CENTRE FOR NEWFOUNDLAND STUDIES TOTAL OF 10 PAGES ONLY MAY BE XEROXED (Without Author's Permission)

CENTRE FOR NEWFOUNDLAND STUDIES · collections.mun.ca/PDFs/theses/Wareham_HaroldTodd.pdf · ON THE COMPUTATIONAL COMPLEXITY OF INFERRING EVOLUTIONARY TREES BY © Harold Todd Wareham


ON THE COMPUTATIONAL COMPLEXITY OF INFERRING

EVOLUTIONARY TREES

BY

© Harold Todd Wareham

A thesis submitted to the School of Graduate

Studies in partial fulfillment of the

requirements for the degree of

Master of Science

Department of Computer Science

Memorial University of Newfoundland

December 1992

St. John's Newfoundland


National Library of Canada

Bibliothèque nationale du Canada

Acquisitions and Bibliographic Services Branch
Direction des acquisitions et des services bibliographiques

395 Wellington Street, Ottawa, Ontario K1A 0N4

395, rue Wellington, Ottawa (Ontario) K1A 0N4

The author has granted an irrevocable non-exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of his/her thesis by any means and in any form or format, making this thesis available to interested persons.

The author retains ownership of the copyright in his/her thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without his/her permission.

L'auteur a accordé une licence irrévocable et non exclusive permettant à la Bibliothèque nationale du Canada de reproduire, prêter, distribuer ou vendre des copies de sa thèse de quelque manière et sous quelque forme que ce soit pour mettre des exemplaires de cette thèse à la disposition des personnes intéressées.

L'auteur conserve la propriété du droit d'auteur qui protège sa thèse. Ni la thèse ni des extraits substantiels de celle-ci ne doivent être imprimés ou autrement reproduits sans son autorisation.

ISBN 0-315-82&42-8

Canada


Abstract

The process of reconstructing evolutionary trees can be viewed formally as an optimization problem. Recently, decision problems associated with the most commonly used approaches to reconstructing such trees have been shown to be NP-complete [Day87, DJS86, DS86, DS87, GF82, Kri88, KM86]. In this thesis, a framework is established that incorporates all such problems studied to date. Within this framework, the NP-completeness results for decision problems are extended by applying theorems from [CT91, Gas86, GKR92, JVV86, KST89, Kre88, Sel91] to derive bounds on the computational complexity of several functions associated with each of these problems, namely

• evaluation functions, which return the cost of the optimal tree(s),

• solution functions, which return an optimal tree,

• spanning functions, which return the number of optimal trees,

• enumeration functions, which systematically enumerate all optimal trees, and

• random-selection functions, which return a randomly-selected member of the set of optimal trees.

Where applicable, bounds are also presented for the versions of these functions that are restricted to trees of a given cost or of cost less than or greater than a given limit. Based in part on these results and theorems from [BH90, GJ79, KMB81, Kre88], bounds are derived on how closely polynomial-time algorithms can approximate optimal trees. In particular, it is shown using the recent results of [ALMSS92] that no phylogenetic inference optimal-cost solution problem examined in this thesis has a polynomial-time approximation scheme unless P = NP.


Acknowledgements

The research reported in this thesis and the studies leading to it have taken [...] to others; this is particularly true of graduate students. My apologies to those whose help I have forgotten to or did not mention. Their various kindnesses must be acknowledged by the fact that I am, at last, writing these acknowledgements.

I would like to thank the School of Graduate Studies, the Alumni Association of Memorial University of Newfoundland, William Day, and Wlodek Zuberek for the financial assistance under which I carried out the first two years of my studies. For the past two years, I have been very fortunate to work part-time for Richard Greatbatch and Brad de Young of the Physical Oceanography group at the MUN Department of Physics. I would like to thank both them and the staff of the Physical Oceanography group for providing the money, the environment, and the patience that have allowed me to complete and write up my research.

I would like to thank the Dean of Science, the Dean of Student Affairs and Services, the Canadian Institute for Advanced Research, William Day, and Paul Gillard for the generous travel funding I have received over the last four years. The conferences they have enabled me to attend will benefit me always.

I would like to thank the staff of the Queen Elizabeth II Library for much friendly assistance over the years. I owe a similar debt to the staff and faculty of the Department of Computer Science, particularly Elaine Boone, Jennifer Button, Sharon Hynes, Patricia Murphy, Mike Rayment, and Nolan White. Paul Fardy, David Fifield, and Allan Goulding helped with typesetting this thesis in LaTeX.

I know that I am only partially aware of all that has been done for me by Jane Foltz and Paul Gillard, who have served as Heads of my Department over the course of my studies; this small thanks must for now suffice.

I would like to thank all of the people who sent me reprints and manuscripts; in particular, I would like to thank Mark Krentel, Alan Selman, Madhu Sudan, and Seinosuke Toda, whose manuscripts are the basis for many results in this thesis. I would also like to thank the people who have clarified my thinking and my results via e-mail and personal conversations, notably Joe Felsenstein, Bill Gasarch, Arun Jagota, Jim Kadin, Johannes Köbler, Mark Krentel, Manrique Mata-Montero, Gary Olsen, Alan Selman, and Jacobo Torán. In particular, I would like to thank Johannes Köbler, for suggesting simplifications and completions for several results in Section 4.1.2, Mark Krentel, for the discussions that led (among other things) to the NP-hardness reduction for the F11GT[2:::] problem, and Bill Gasarch, who is indirectly responsible for almost all results reported in this thesis.

I would like to thank my many friends, for putting up with my erratic grad student behavior these last few years and simultaneously making such continued behavior possible.

I would like to thank my parents for their individual and invaluable support of me over my years of study.

Finally, I would like to thank my supervisor, William Day, for his many years of patience, sound advice, and assistance.


Contents

Abstract

Acknowledgements

Contents

List of Tables

List of Figures

1 Introduction
    1.1 Organization of This Thesis

2 Notation
    2.1 Graphs, Hypergraphs, and Trees
    2.2 Computational Complexity Theory

3 Computational Problems in Phylogenetic Systematics
    3.1 Phylogenetic Systematics
    3.2 NP-Complete Problems in Phylogenetic Systematics
        3.2.1 Phylogenetic Parsimony
        3.2.2 Character Compatibility
        3.2.3 Distance Matrix Fitting
        3.2.4 Summary

4 The Computational Complexity of Phylogenetic Inference Functions
    4.1 Function Complexity Classes
        4.1.1 Classes Within FP^NP
        4.1.2 Classes Within FPSPACE(poly): Counting Classes
    4.2 Evaluation Functions
    4.3 Solution Functions
    4.4 Spanning Functions
    4.5 Enumeration Functions
    4.6 Random Generation Functions
    4.7 Summary

5 The Approximability of Phylogenetic Inference Functions
    5.1 Types of Approximability
    5.2 Absolute Approximability
    5.3 Fully Polynomial and Polynomial Time Approximation Schemes
    5.4 Relative Approximability
    5.5 Approximability by Neural Networks
    5.6 Summary

6 Conclusion

References

A Phylogenetic Systematics and the Inference of Reticulation

B The Computational Complexity of Phylogenetic Parsimony Problems Incorporating Explicit Graphs


List of Tables

1 Phylogenetic parsimony criteria
2 Basic NP-complete decision problems
3 Phylogenetic parsimony decision problem schemata (non-reticulate trees)
4 Phylogenetic parsimony decision problem schemata (non-reticulate trees) (cont'd from Table 3)
5 Applicability of input character restrictions to phylogenetic parsimony criteria
6 Phylogenetic parsimony decision problems (non-reticulate trees)
7 Phylogenetic parsimony decision problems (non-reticulate trees) (cont'd from Table 6)
8 Phylogenetic parsimony decision problem schemata (reticulate trees)
9 Reductions for phylogenetic parsimony decision problems
10 Reductions for phylogenetic parsimony decision problems (cont'd from Table 9)
11 Steiner Tree decision problems
12 Character compatibility decision problems
13 Character compatibility decision problems (cont'd from Table 12)
14 Reductions for character compatibility decision problems
15 Distance matrix fitting decision problems
16 Auxiliary decision problems for NP-hardness proofs of distance matrix fitting decision problems
17 Reductions for distance matrix fitting decision problems
18 Reductions for distance matrix fitting decision problems (cont'd from Table 17)
19 Correspondence between phylogenetic inference problems in this thesis and problems in the literature
20 Computational complexities of phylogenetic inference functions
21 Formulations of SAT in first-order logic
22 Formulations of SAT in first-order logic (cont'd from Table 21)
23 A polynomial-time relative approximation algorithm for STEINER TREE IN GRAPHS
24 Approximability of phylogenetic inference optimal-cost solution functions
25 Non-approximability of various levels of the Function Bounded NP Query Hierarchy


List of Figures

1 Graphs and hypergraphs
2 Types of hyperarcs
3 Types of metrics
4 A discrete character tree
5 Character-state trees
6 Structure of subgraph Orr used in reduction from X3C to FBlfT~[/"t]
7 Reductions among phylogenetic inference decision problems
8 Restriction reductions among phylogenetic parsimony decision problems
9 Function classes within FP^NP
10 Function classes within FPSPACE(poly): counting classes
11 Difficulties with the reduction from weighted MIN-VERTEX COVER to weighted phylogenetic parsimony evaluation problems
12 Conditions for multiple partition levels on subgraph (,',.
13 Reductions among implicit and explicit graph unweighted Binary Wagner parsimony decision problems


By and large, as the mass of knowledge grows, men devote little attention to the dead. Yet it is the dead who are frequently our pathfinders, and we walk all unconsciously along the roads they have chosen for us.

Loren Eiseley, The Man Who Saw Through Time

Dedicated in Memory of

Loren Eiseley

(1907 - 1977)


1 Introduction

Phylogenetic systematics is the subdiscipline of evolutionary biology that deals with reconstructing the tree that represents the evolutionary relationships of a set of species. Such trees are used to create taxonomic classifications for species and to evaluate alternative hypotheses of adaptation, evolutionary mechanism, and ancient geographical relationships [EC80, FB90, Nei87, NP81]. As the data are seldom available to reconstruct the actual historical tree, one takes as an estimate of this tree a subset of the set of all possible trees that are best relative to some biologically-relevant criterion.

Many approaches to reconstructing evolutionary trees have been developed over the last thirty years [Fel88, PHS92, SO90]. These approaches are of two types [SO90, p. 412]: method-based approaches, which integrate the criterion for tree selection directly into the method for searching the set of all possible trees, and criterion-based approaches, in which the criterion and search method are distinct. Method-based approaches obtain optimal trees quickly, but do not rank the suboptimal trees and the alternative hypotheses encoded by these trees. Criterion-based methods do give such rankings but are much slower because known algorithms have to evaluate all possible trees; hence, practical implementations of criterion-based approaches settle for deriving approximations to the optimal trees rather than the optimal trees themselves.

Consider the formal computational problems associated with these types of approaches. The problems associated with method-based approaches typically have polynomial-time algorithms, and are thus of little interest here. In this thesis, I will be concerned with the problems associated with criterion-based approaches. Since 1982, decision problems associated with the most commonly-used approaches in phylogenetic systematics have been shown to be NP-complete [Day87, DJS86, DS86, DS87, GF82, Kri88, KM86]. While this implies that related problems such as producing optimal trees are at least as hard as these NP-complete decision problems, it is not known exactly how much harder these problems are, or how closely fast algorithms may approximate optimal trees. This latter problem is especially important because trees of slightly different or even the same cost can imply very different evolutionary hypotheses [Mad91, p. 315]. There are many examples in the biological literature of hypotheses that have been modified or retracted in light of different estimates of the optimal tree, e.g. the "Out of Africa" hypothesis for the origin of the human mitochondrial DNA gene pool [MRS92, SSV92].

In this thesis, I will derive bounds on the computational complexities of several functions based on phylogenetic inference problems:

• evaluation functions, which return the cost of the optimal tree(s);

• solution functions, which return an optimal tree;

• spanning functions, which return the number of optimal trees;

• enumeration functions, which systematically enumerate all optimal trees; and

• random-selection functions, which return a randomly-selected member of the set of optimal trees.
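As an illustration only, the five function types listed above can be sketched in Python over an explicitly listed set of candidate trees. The tree labels, the costs in `TOY_COSTS`, and all function names are hypothetical and are not taken from this thesis; the brute-force search over an explicit list is an assumption made for readability, since in the problems studied here the set of candidate trees is exponentially large and is never written out.

```python
import random

def evaluation(trees, cost):
    """Evaluation function: the cost of the optimal tree(s)."""
    return min(cost(t) for t in trees)

def enumeration(trees, cost):
    """Enumeration function: all optimal trees, in input order."""
    best = evaluation(trees, cost)
    return [t for t in trees if cost(t) == best]

def solution(trees, cost):
    """Solution function: one optimal tree (here, the first one found)."""
    return enumeration(trees, cost)[0]

def spanning(trees, cost):
    """Spanning function: the number of optimal trees."""
    return len(enumeration(trees, cost))

def random_selection(trees, cost, rng=random):
    """Random-selection function: a uniformly chosen optimal tree."""
    return rng.choice(enumeration(trees, cost))

# Hypothetical toy instance: three candidate trees with made-up costs.
TOY_COSTS = {"((A,B),C)": 4, "((A,C),B)": 4, "((B,C),A)": 5}

if __name__ == "__main__":
    trees = list(TOY_COSTS)
    print(evaluation(trees, TOY_COSTS.get))
    print(spanning(trees, TOY_COSTS.get))
    print(enumeration(trees, TOY_COSTS.get))
```

In this sketch every function is computed from the enumeration; the point of the later sections is precisely that such a shortcut is not available in polynomial time, so the functions may differ in complexity.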

Bounds are also derived for those functions that return trees of a given cost or of cost less than or greater than a given limit. In addition, I will derive bounds on the approximability of the solution functions associated with the most commonly-used approaches to phylogenetic inference. Results are given not only for evolutionary trees based exclusively on dichotomous speciation events but also for trees incorporating such events as hybridization and recombination.

The results in this thesis have been obtained by applying existing techniques to a set of closely-related problems. These results are of significance to computational complexity theory to the extent that, by isolating aspects of problems that cause unexpected increases or decreases in complexity, they suggest further avenues for research. The biological relevance of these results is more problematic. Some biologists have argued that these results are not applicable because (1) the defined problems are too general, and problems of practical interest may be solvable in polynomial time, and (2) the framework of asymptotic worst-case analyses in which these results were derived is unrealistic (J. S. Farris and M. F. Mickevich, personal communication). In formulating the problems examined in this thesis, there have been undeniable tradeoffs of fidelity to biological reality for the sake of tractability of analysis. However, such tradeoffs underlie many applications of mathematics to real problems, and are not only unavoidable but necessary in the initial stages of an investigation. The purpose of this thesis is not to present results of direct relevance to biologists, but to lay a theoretical framework in which such results may one day be derived.

1.1 Organization of This Thesis

This thesis is laid out in four sections.

In Section 2, I give various definitions used in this thesis, including graph-theoretic definitions of non-reticulate and reticulate evolutionary trees and an introduction to computational complexity theory.

In Section 3, I review basic concepts in phylogenetic analysis as well as all previously-defined phylogenetic inference decision problems and the reductions by which they have been shown to be NP-complete. A framework is given that incorporates all such problems studied to date. This section also includes definitions and reductions for new problems involving reticulate trees, as well as several new reductions for previously-defined problems. The tree of reductions among all problems examined in this thesis is shown in Figures 7 and 8, and the correspondence between phylogenetic inference problems examined in this thesis and those in the literature is given in Table 19.

In Section 4, I use the OptP hierarchy [GKR92, Kre88] and paddability [CT91, Gas86] to classify the phylogenetic inference evaluation problems into two groups within FP^NP. The complexities of these problems, along with several other properties of these problems and theorems from [JVV86, KST89, Sel91], are used to derive bounds on the complexities of the associated solution, spanning, enumeration, and random-generation functions. All bounds and hardness results derived in this section are summarized in Table 20.

In Section 5, I use results from [BH90, GJ79, KMB81, Kre88] to derive lower bounds on the approximability of phylogenetic inference problems by polynomial-time algorithms. In particular, it is shown using the recent results of Arora et al. [ALMSS92] that no phylogenetic inference optimal-cost solution problem examined in this thesis has a polynomial-time approximation scheme unless P = NP. All bounds on approximability derived in this section are summarized in Table 24.

Each of Sections 3, 4, and 5 begins with a subsection on notation particular to that section and concludes with a summary of the results derived in that section. Brief discussions of the biological relevance of these results are given at the end of each such summary.


2 Notation

This section consists of graph-theoretic definitions of evolutionary trees and an introduction to computational complexity theory. First, there are some general definitions.

Define alphabet $\Sigma = \{0, 1\}$, all strings $x$ as being members of $\Sigma^*$, and all languages $L$ as being members of $2^{\Sigma^*}$. Let $|x|$ be the length of string $x$, $|L|$ be the cardinality of $L$, and $L^l$ be the set of all strings in $L$ with length $l$. For a language $L$, co-$L = \Sigma^* - L$. Define $(x, y)$ as an invertible function that encodes pairs of strings into a single string, and $\chi_L$ as the characteristic function of language $L$, i.e. $\chi_L(x) = 1$ if $x \in L$ and 0 otherwise.
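One possible invertible pairing function over $\Sigma = \{0, 1\}$, together with a characteristic function for a finite language, can be sketched as follows. The particular encoding (doubling each bit of the first string and using `01` as a separator) and the names `pair`, `unpair`, and `chi` are assumptions chosen for illustration; the definitions above only require that some invertible encoding exists.

```python
def pair(x, y):
    """Encode binary strings x and y as one binary string.

    Each bit of x is doubled ("0" -> "00", "1" -> "11"), so the first
    aligned occurrence of "01" unambiguously marks the end of x.
    """
    return "".join(b + b for b in x) + "01" + y

def unpair(z):
    """Invert pair(): recover (x, y) from the encoded string z."""
    i = 0
    while z[i:i + 2] != "01":   # skip the doubled bits of x
        i += 2
    x = z[:i:2]                 # keep one bit out of every doubled pair
    y = z[i + 2:]               # everything after the separator
    return x, y

def chi(language, x):
    """Characteristic function of a language given as a finite set of strings."""
    return 1 if x in language else 0

if __name__ == "__main__":
    z = pair("10", "011")
    print(z, unpair(z))
    print(chi({"0", "11"}, "11"), chi({"0", "11"}, "10"))
```

The encoded string has length $2|x| + |y| + 2$, so both directions run in time linear in the input, which is the property typically needed when pairing is used inside polynomial-time reductions.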

Let $\mathcal{N} = \{0, 1, 2, \ldots\}$ be the nonnegative integers, $Q^+$ be the nonnegative rational numbers, and $R^+$ be the nonnegative real numbers. Given functions $f : X \to Y$ and $g : Y \to Z$, let $dom(f)$ and $rng(f)$ be the domain and range of $f$, respectively, and $g \circ f : X \to Z$ be the composition of $f$ and $g$, i.e. $(g \circ f)(x) = g(f(x))$. If function $f$ is not defined on input $x$, then $f(x) = \bot$. If $\forall x \in X \; \{|rng(f(x))| = 1\}$, $f$ is single-valued; else, $f$ is multivalued. A function $f : \mathcal{N} \to \mathcal{N}$ is smooth if the function $g : 1^n \to 1^{f(n)}$ is polynomial-time computable and $f(x) \leq f(y)$ for all $x \leq y$ [Kre88, p. 493]. For an arbitrary total order $R$ on binary strings, define the ordering $P_R$ as the pair of functions $(E, L)$ such that $E(i, l)$ returns the $i$th member of $(\Sigma^*)^l$ under $R$ and $L(x, y, l)$ indicates if $x \leq y$ under $R$ for $x, y \in (\Sigma^*)^l$; this thesis will focus on those orderings for which $E$ and $L$ are computable in polynomial time (see Section 4.5).
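For concreteness, the following sketch gives the pair $(E, L)$ for one familiar total order, namely lexicographic order on binary strings of a fixed length. The choice of lexicographic order and the convention that $i$ is counted from 0 are assumptions made here for illustration; the definition above allows any total order $R$.

```python
def E(i, l):
    """Return the i-th (0-indexed) binary string of length l in lexicographic order."""
    if not 0 <= i < 2 ** l:
        raise IndexError("there are only 2**l strings of length l")
    if l == 0:
        return ""                       # the empty string is the only member
    return format(i, "b").zfill(l)      # binary expansion of i, left-padded with 0s

def L(x, y, l):
    """Return True if x <= y in lexicographic order, for strings of length l."""
    if len(x) != l or len(y) != l:
        raise ValueError("both strings must have length l")
    return x <= y                       # Python string comparison is lexicographic

if __name__ == "__main__":
    print([E(i, 3) for i in range(8)])
    print(L("011", "100", 3))
```

Both functions run in time polynomial in $l$ and in the length of the binary representation of $i$, so this ordering is of the polynomial-time computable kind that the text restricts attention to.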

There are several types of bounds for a numerical function. These bounds can be represented by classes of functions [BDG88, p. 35]:

• $O(f)$ is the set of functions $g$ such that for some $r > 0$ and for all but finitely many $n$, $g(n) < r \cdot f(n)$.

• $o(f)$ is the set of functions $g$ such that for every $r > 0$ and for all but finitely many $n$, $g(n) < r \cdot f(n)$.

• $\Omega(f)$ is the set of functions $g$ such that for some $r > 0$ and for infinitely many $n$, $g(n) > r \cdot f(n)$.

Classes $O(f)$, $o(f)$, and $\Omega(f)$ correspond to loose upper, strict upper, and loose lower bounds, respectively. These classes can also be defined over whole classes of functions rather than a single function, e.g. $O(poly)$, $o(polylog)$, where $poly = \bigcup_k n^k = n^{O(1)}$ and $polylog = \bigcup_k \log^k n = \log^{O(1)} n$. All logarithms in this thesis will be to base 2.
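The inequalities in these three definitions can be spot-checked numerically. The sketch below is only finite evidence: the definitions quantify over all but finitely many (or infinitely many) $n$, which no finite loop can establish. The helper names and the sample functions are assumptions introduced for illustration.

```python
import math

def below_on_range(g, f, r, n0, n_max):
    """Check g(n) < r*f(n) for every n in [n0, n_max] (the O and o inequality)."""
    return all(g(n) < r * f(n) for n in range(n0, n_max + 1))

def above_somewhere(g, f, r, n0, n_max):
    """Check g(n) > r*f(n) for at least one n in [n0, n_max] (the Omega inequality)."""
    return any(g(n) > r * f(n) for n in range(n0, n_max + 1))

if __name__ == "__main__":
    linear = lambda n: n
    # 3n + 5 < 4n once n > 5, consistent with 3n + 5 being in O(n) with r = 4.
    print(below_on_range(lambda n: 3 * n + 5, linear, 4, 6, 10_000))
    # log2(n) < 0.01 * n from n = 1000 on, consistent with log n being in o(n).
    print(below_on_range(math.log2, linear, 0.01, 1000, 10_000))
    # n^2 > 100 * n for n > 100, consistent with n^2 being in Omega(n).
    print(above_somewhere(lambda n: n * n, linear, 100, 1, 1_000))
```

For $o(f)$ the inequality must hold for every $r > 0$, so a single value of $r$ such as 0.01 illustrates the definition without verifying it.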

2.1 Graphs, Hypergraphs, and Trees

A graph $G = (V, E)$ is a set $V$ of vertices and a set $E$ of edges such that each edge links a pair of vertices. Edges in which one vertex is designated the source and the other the target are called arcs; a graph composed of arcs is a directed graph. A path between vertices $u$ and $v$ in a graph $G$ is a sequence of alternating vertices and edges $v_1 e_1 v_2 \ldots v_n e_n v_{n+1}$ such that $e_i$ is an edge in $G$ between $v_i$ and $v_{i+1}$, $e_i$ and $e_{i+1}$ are distinct edges in $G$, $v_1 = u$, and $v_{n+1} = v$; directed paths are defined similarly. A graph is connected if there is a path between each pair of vertices in the graph; if there is an edge between each pair of vertices in the graph, the graph is complete. If all edges in a graph lie on a single path between a pair of vertices, the graph is linear. A hypergraph $H = (V, E)$ is a set $V$ of vertices and a set $E$ of hyperedges such that each hyperedge links a group instead of a pair of vertices. A hyperedge whose vertex-set has been partitioned into disjoint target and source vertex-sets is called a hyperarc; a hypergraph composed of hyperarcs is a directed hypergraph. A hyperpath between vertices $u$ and $v$ in a hypergraph $H$ is a sequence of alternating vertices and hyperedges $v_1 e_1 v_2 \ldots v_n e_n v_{n+1}$ such that $e_i$ is a hyperedge in $H$ linking $v_i$ and $v_{i+1}$, $e_i$ and $e_{i+1}$ are distinct hyperedges in $H$, $v_1 = u$, and $v_{n+1} = v$; directed hyperpaths are defined similarly. If a graph has a function associating numbers (i.e. weights) with its vertices (edges), the graph is called a vertex- (edge-) weighted graph; otherwise, it is an unweighted graph. Weighted and unweighted hypergraphs are defined similarly. Hypergraphs are useful in simplifying and generalizing results from graph theory, especially those results dealing with combinatorial problems [Ber73, p. viii]. For other standard graph and hypergraph definitions, see [Ber73, Ber85]; the definition of hyperarc is from [ADS83].
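The definitions of connected and complete graphs translate directly into short checks. The sketch below assumes an undirected graph given as a collection of vertices and a collection of two-element edges; the representation and the function names are illustrative choices, not notation from this thesis.

```python
from collections import deque
from itertools import combinations

def is_connected(vertices, edges):
    """True if there is a path between each pair of vertices (breadth-first search)."""
    vertices = set(vertices)
    if not vertices:
        return True
    adjacent = {v: set() for v in vertices}
    for u, v in edges:
        adjacent[u].add(v)
        adjacent[v].add(u)
    start = next(iter(vertices))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in adjacent[u] - seen:    # set difference builds a fresh set
            seen.add(w)
            queue.append(w)
    return seen == vertices

def is_complete(vertices, edges):
    """True if there is an edge between each pair of distinct vertices."""
    present = {frozenset(e) for e in edges}
    return all(frozenset(p) in present for p in combinations(set(vertices), 2))

if __name__ == "__main__":
    path = [("a", "b"), ("b", "c")]
    triangle = path + [("a", "c")]
    print(is_connected("abc", path), is_complete("abc", path))
    print(is_connected("abc", triangle), is_complete("abc", triangle))
```

The three-vertex path used in the demonstration is also a linear graph in the sense defined above, since all of its edges lie on a single path between its two end vertices.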

A pat.h from any \'Prll•x to itself is called a cycle. A graph that does not

8

Page 27: CENTRE FOR NEWFOUNDLAND STUDIEScollections.mun.ca/PDFs/theses/Wareham_HaroldTodd.pdf · ON TilE COMPIJTATIONAL COMPLEXITY OF INFERRING EVOLUTIONARY TREES BY 0 II arold Todd Wareham

' • '' ~· ~.. • - · ••- .- ' .,.. ,, ·- -~ ' •' '' '' ,_,.... ·-·•'"H • I( .. , .,._.~ , ,.,.....,- ~, "'.e."' • · .. ·>,._,._,,..,.,...,..,., -'II••• ,"'1: "C ... ...,..,.

(a}

(h) (f)

(c.)

(d) (h)

Figure I: Graphs and hypNgraphs: (a) graph; (h) dirc•d•!d graph; (•·) •lin·ct•·d acyclic graph; (d) direc:tcd t.n•e; (e) hyp(•rgraplt; (f) din•d.1•d IJyju•rgrHph; (~)

directed Berge acyclic hypergraph; (h) directl'd hyp(!rl.n·•··

!J

Page 28: CENTRE FOR NEWFOUNDLAND STUDIEScollections.mun.ca/PDFs/theses/Wareham_HaroldTodd.pdf · ON TilE COMPIJTATIONAL COMPLEXITY OF INFERRING EVOLUTIONARY TREES BY 0 II arold Todd Wareham

nmt.aiu ally ryciPs is (/.f'yrlif'. A11 ac:yrlk c:onner.ted graph is called a tree. Directed

cydf•:; mul din~c:t.Pd ;u:yc:lk AI'Hphs are rlc•finecl similarly. Define a dircclcd tree as

il dirf'f'I.Pd rtrydi<· graph t.hitl satisfies thre1~ additioual restrictions:

I. t.lu • r·~· i:; a. dist.irrp;Hi:;lwd verf.<•x cai!P.d llu~ mol,

~. tiH'I'f' j:; at. ]Past on<· dir·<•f·t<·d path from t.hc root to every vertex in t.he tree,

and

a. tllf' root. nt~~rrol. IH' tlw targ('t, of any arc aud every other vertex is the target

of <'Xilcl.)y Oil<~ ii.I'C'.
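The three restrictions on directed trees can be checked mechanically. The following is a minimal sketch, assuming that vertices are hashable labels and that arcs are given as (source, target) pairs; the function name `is_directed_tree` and this input encoding are illustrative and do not come from the thesis.

```python
from collections import deque


def is_directed_tree(vertices, arcs, root):
    """Check the directed-tree restrictions on a directed graph.

    vertices: iterable of hashable labels (assumed encoding).
    arcs: iterable of (source, target) pairs.
    root: the candidate distinguished vertex (restriction 1).
    """
    vertices = set(vertices)
    if root not in vertices:
        return False

    indegree = {v: 0 for v in vertices}
    children = {v: [] for v in vertices}
    for source, target in arcs:
        if source not in vertices or target not in vertices:
            return False
        indegree[target] += 1
        children[source].append(target)

    # Restriction 3: the root is the target of no arc and every other
    # vertex is the target of exactly one arc.
    if indegree[root] != 0:
        return False
    if any(indegree[v] != 1 for v in vertices if v != root):
        return False

    # Restriction 2: every vertex is reachable from the root.
    seen = {root}
    queue = deque([root])
    while queue:
        v = queue.popleft()
        for w in children[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen == vertices
```

No separate cycle test appears in this sketch: once restriction 3 holds there are exactly n - 1 arcs on n vertices, so reachability of every vertex from the root already rules out directed cycles.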

A hyperpath from any vertex to itself is called a Berge cycle. A hypergraph that does not contain any Berge cycles is Berge acyclic. Directed Berge cycles and directed Berge acyclic hypergraphs are defined similarly. Unlike graphs, there are many types of acyclicity for hypergraphs [Duk85] which are based on Berge cycles that satisfy additional restrictions; however, Berge acyclicity implies each of these other types of acyclicity [Fag83, Theorem 6.1]. Define a directed hypertree as a directed Berge-acyclic hypergraph that satisfies four additional restrictions:

1. there is a distinguished vertex called the root,

2. there is at least one directed hyperpath from the root to every vertex in the hypertree,

3. the root cannot be in the target-set of any hyperarc and every other vertex is in the target-set of exactly one hyperarc, and

4. there are only three types of hyperarcs in the hypertree (see Figure 2):

(i) one source vertex and one target vertex (2-hyperarc),

(ii) two source vertices and one target vertex (3-hyperarc), or

(iii) two source vertices and two target vertices (4-hyperarc).

Figure 2: Types of hyperarcs: (a) 2-hyperarcs; (b) 3-hyperarcs; (c) 4-hyperarcs.

Note that the correspondence of arcs to 2-hyperarcs makes directed trees special cases of directed hypertrees.
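Restriction 4 depends only on the sizes of a hyperarc's source and target sets. The sketch below classifies a hyperarc accordingly; the function name `hyperarc_type` and the representation of a hyperarc as a pair of vertex collections are assumptions made for illustration.

```python
def hyperarc_type(sources, targets):
    """Name the hyperarc type permitted in a directed hypertree.

    Returns "2-hyperarc", "3-hyperarc" or "4-hyperarc", or None when the
    hyperarc has a shape that restriction 4 does not allow.
    """
    sources, targets = frozenset(sources), frozenset(targets)
    # A hyperarc partitions its vertex-set into disjoint source and target sets.
    if sources & targets:
        raise ValueError("source and target vertex-sets must be disjoint")
    permitted = {
        (1, 1): "2-hyperarc",  # an ordinary arc
        (2, 1): "3-hyperarc",  # two sources, one target
        (2, 2): "4-hyperarc",  # two sources, two targets
    }
    return permitted.get((len(sources), len(targets)))
```

An ordinary arc (u, v) is classified as a 2-hyperarc by `hyperarc_type([u], [v])`, which mirrors the observation that directed trees are special cases of directed hypertrees.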

Trees are used in evolutionary biology to represent evolutionary relationships between species. In the biological literature, directed trees are called rooted trees and undirected trees are called unrooted trees. In evolutionary trees, edges are interpreted as species undergoing evolutionary change (lineages), vertices are interpreted as speciation events in which new species are generated, and the root vertex is interpreted as the most recent common ancestor of the species being studied. All types of trees give an estimate of the pattern of speciation events; however, only directed trees hypothesize the direction in which evolutionary change has proceeded. Many of the trees in this thesis will be edge-weighted trees, in which each edge's weight is interpreted as the amount of evolutionary change undergone by the species corresponding to that edge.

The restrictions above on directed trees and hypertrees guarantee the following biologically necessary properties: (1) no species can give rise to one of its ancestors, and (2) each species arises from exactly one speciation event. All types of trees can represent dichotomous speciation events; directed hypertrees can also represent two more complex evolutionary events using their 3- and 4-hyperarcs: hybridization (the creation of a third entity from two parent entities) and recombination (an altering of one or both of two entities). Such events involving the creation of two or more paths between pairs of vertices in a tree are called reticulations, and trees incorporating these events are said to be reticulate. Reticulation as defined here is applicable not only to problems involving hybridization and introgression as defined in evolutionary biology [Fun85, StaC75], but also to problems involving horizontal gene transfer [Sne75], multi-allele recombination events [Hei90], and transmission of copying errors in medieval manuscripts [Lee88]. See Appendix A for further discussion of reticulation.

Note that in the biological literature, graph-theoretic trees are often given different names depending on what they represent and the methods by which they were derived i.e. phylogram, dendrogram, cladogram, and that a single tree may on occasion imply a whole class of trees [HP84].

2.2 Computational Complexity Theory

For a more in-depth treatment of computational complexity theory, see [BDG88, BDG90, GJ79, HS78, Joh90, WW86]. Biologists will find [Day92] a good introduction to certain topics in this section.

There are many types of formal computational problems e.g. decision, evaluation, counting. These problems can be unified using the framework developed in [WW86, pp. 100-101] cf. [JVV86]. Define a relation $R : \Sigma^* \times \Sigma^*$ on pairs of objects e.g. (boolean formulas) $\times$ (truth assignments to boolean variables); (graphs) $\times$ (cliques). Formal computational problems can be viewed as functions defined on the projection of $R$ onto a given element $x$.

• Decision Problem (PROB):

For some boolean-valued predicate $G$ defined on $\Sigma^*$,

\[ \text{PROB}(x) = \exists y \, [(x,y) \in R \wedge G(y)]. \]

• Solution Problem (SOL-PROB):

For some boolean-valued predicate $G$ defined on $\Sigma^*$,

\[ \text{SOL-PROB}(x) = \{\, y \mid (x,y) \in R \wedge G(y) \,\}. \]

If relation $R$ has an associated valuation function $b : R \rightarrow N$, $R$ corresponds intuitively to an optimization problem. Define the following problems on such $R$.


• Given-cost Solution Problem (SOL-VAL.EQ-PROB):

\[ \text{SOL-VAL.EQ-PROB}((x,k)) = \{\, y \mid (x,y) \in R \wedge b(x,y) = k \,\} \]

• Given-limit Solution Problem (SOL-VAL.LE-PROB, SOL-VAL.GE-PROB):

\[ \text{SOL-VAL.LE-PROB}((x,k)) = \{\, y \mid (x,y) \in R \wedge b(x,y) \leq k \,\} \]

\[ \text{SOL-VAL.GE-PROB}((x,k)) = \{\, y \mid (x,y) \in R \wedge b(x,y) \geq k \,\} \]

• Optimal-cost Evaluation Problem (MIN-PROB, MAX-PROB):

\[ \text{MIN-PROB}(x) = \min_{(x,y) \in R} b(x,y) \]

\[ \text{MAX-PROB}(x) = \max_{(x,y) \in R} b(x,y) \]

• Optimal-cost Solution Problem (SOL-X-PROB, X $\in$ {MIN, MAX}):

\[ \text{SOL-X-PROB}(x) = \{\, y \mid (x,y) \in R \wedge b(x,y) = \text{X-PROB}(x) \,\} \]

Three other types of problems may be defined on the ranges of Y, Y $\in$ {SOL-PROB, SOL-X-PROB} (X $\in$ {MIN, MAX, VAL.EQ, VAL.LE, VAL.GE}).

• Spanning Problem (SPAN-Y):

SPAN-Y($x$) $= |\{Y(x)\}|$.

• Random-Selection Problem (RAND-Y):

RAND-Y($x$) $= y$, where $y$ is a randomly-selected member of $\{Y(x)\}$.

• Enumeration Problem (ENUM-Y):

ENUM-Y($x, i$) $= y$, where $y$ is the $i$-th member of $\{Y(x)\}$ under some polynomial-time ordering $P$.
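To make the relationship between these problem types concrete, the sketch below instantiates several of them for the (graphs) x (cliques) relation, taking the valuation b to be clique size. It is a brute-force illustration that enumerates all vertex subsets and therefore runs in exponential time. The function names, the graph encoding, and the choice of ordering are all assumptions made for this example.

```python
from itertools import combinations


def solutions(graph):
    """Return every y with (x, y) in R: the non-empty cliques of graph x.

    graph is a pair (vertices, edges), where edges is a set of frozensets.
    Cliques are listed by size, then lexicographically (the ordering P).
    """
    vertices, edges = graph
    found = []
    for size in range(1, len(vertices) + 1):
        for subset in combinations(sorted(vertices), size):
            if all(frozenset(pair) in edges for pair in combinations(subset, 2)):
                found.append(subset)
    return found


def b(graph, y):
    """Valuation function: the cost of a clique is its number of vertices."""
    return len(y)


def prob(graph, G):
    """PROB(x): does some solution satisfy the predicate G?"""
    return any(G(y) for y in solutions(graph))


def sol_val_eq(graph, k):
    """SOL-VAL.EQ-PROB((x, k)): all solutions of cost exactly k."""
    return [y for y in solutions(graph) if b(graph, y) == k]


def max_prob(graph):
    """MAX-PROB(x): the largest cost (assumes at least one vertex)."""
    return max(b(graph, y) for y in solutions(graph))


def sol_max(graph):
    """SOL-MAX-PROB(x): all solutions attaining the optimal cost."""
    best = max_prob(graph)
    return [y for y in solutions(graph) if b(graph, y) == best]


def span(ys):
    """SPAN-Y(x): the number of members of Y(x)."""
    return len(ys)


def enum(ys, i):
    """ENUM-Y(x, i): the i-th member (1-indexed) under the fixed ordering."""
    return ys[i - 1]


# Illustrative instance: a triangle 1-2-3 with a pendant vertex 4 joined to 3.
graph = ({1, 2, 3, 4},
         {frozenset(e) for e in [(1, 2), (1, 3), (2, 3), (3, 4)]})
```

On this instance, `sol_max(graph)` contains the single clique `(1, 2, 3)`, and `span(sol_val_eq(graph, 2))` counts the four edges of the graph.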


Each of these problems corresponds to a function. A decision problem also corresponds to the language composed of the subset of its instances whose solution is "yes". A problem $X$ is said to be solved by an algorithm if for any input $x$, that algorithm computes a single value from $\{f(x)\}$ for the function $f$ embodied in that problem. Let $X_s$ denote the set of single-valued functions corresponding to algorithms that solve problem $X$. Inputs to a problem will be called instances and outputs will be called solutions.

Define deterministic Turing machines (DTM), nondeterministic TM (NTM), and deterministic and nondeterministic oracle TM (DOTM, NOTM) that recognize languages (acceptors) and compute functions (transducers) in the standard manner [BDG88, GJ79]. A DTM transducer $N$ computes $y$ on input $x$ ($N(x) \rightarrow y$) if $y$ is the final contents of $N$'s output tape for the computation of $N$ on $x$. A NTM transducer $N$ computes $y$ on input $x$ ($N(x) \mapsto y$) if there is an accepting computation of $N$ on $x$ such that $y$ is the final contents of $N$'s output tape [Sel91, p. 3]. DTM transducers compute single-valued functions, and NTM transducers compute partial multivalued functions. Any input to a TM transducer that does not have an accepting computation computes symbol $\perp$. An OTM that forces all queries to be made simultaneously is non-adaptive, while an OTM that allows queries to be made on the basis of answers to previous queries is adaptive.

The computational resources used by an algorithm to solve an instance of a problem can be visualized as the computational resources used by the TM which corresponds to that algorithm. Let $r_A(x)$ be the amount of resource $R$ used by algorithm $A$ on input $x$. For a function $f : N \rightarrow N$ and a computational resource $R$, an algorithm $A$ is $f$-$R$ ($f$-$R$ computable) e.g. polynomial-time, polynomial-time computable, if $r_A(|x|) \in O(f)$. A problem is $f$-$R$ if there is an $f$-$R$ algorithm that solves that problem.

Problems can be grouped into complexity classes based on bounds on computational resources required to solve those problems e.g. DTIME(poly), which is the set of all problems solvable by polynomial-time DTM. Some standard complexity classes for decision problems are:

P          All decision problems solvable by polynomial-time DTM.

NP         All decision problems solvable by polynomial-time NTM.

PSPACE     All decision problems solvable by polynomial-space DTM.

EXPTIME    All decision problems solvable by exponential-time DTM.

It is known that $P \subseteq NP \subseteq PSPACE \subseteq EXPTIME$, and that $P \subset EXPTIME$ [BDG88, Proposition 3.1]. Several classes of complexity between P and NP are:


UP         All decision problems solvable by polynomial-time NTM such that for each input, there is at most one accepting computation.

FewP       All decision problems solvable by polynomial-time NTM such that for each input $I$ and a fixed polynomial $p$, there are at most $p(|I|)$ accepting computations.

R          All decision problems solvable by polynomial-time NTM such that for each input $I$, either there are no accepting computations or else at least half of all computations are accepting.

Many problems are of complexity between NP and PSPACE, and are within the levels of the Polynomial Hierarchy [MS72]:

\[ \Theta_0^p = \Delta_0^p = \Sigma_0^p = \Pi_0^p = P \]

\[ \Delta_{k+1}^p = P^{\Sigma_k^p}, \qquad \Sigma_{k+1}^p = NP^{\Sigma_k^p}, \qquad \Pi_{k+1}^p = \text{co-}\Sigma_{k+1}^p \]

\[ \Theta_{k+1}^p = P^{\Sigma_k^p[O(\log n)]} \]

\[ PH = \bigcup_{k=1}^{\infty} \Sigma_k^p \]

where $P^Y$ ($NP^Y$) is the class of problems solvable by polynomial-time DOTM (NOTM) that can use any oracle in class $Y$, and $P^{Y[f(n)]}$ ($P_{\|}^{Y[f(n)]}$) is the class of problems solvable by polynomial-time adaptive (non-adaptive) DOTM that can ask up to $f(n)$ queries to an oracle in class $Y$. Levels $\Delta_k^p$, $\Sigma_k^p$, and $\Pi_k^p$ were defined in [MS72], and level $\Theta_k^p$ was defined in [WagK88]. It is known that $\Theta_k^p \subseteq \Delta_k^p \subseteq \Sigma_k^p \cup \Pi_k^p \subseteq P^{\Sigma_k^p[1]} \subseteq \Theta_{k+1}^p$, that $PH \subseteq PSPACE$, and that if for some $k$, $\Pi_k^p = \Sigma_k^p$ then $PH = \Sigma_k^p$ [Sto77, WagK90, Wra77]. Two working hypotheses in complexity theory are that $P \neq NP$ and that PH does not collapse to any finite level.

Many of the language complexity classes above can be restated as classes of single-valued functions. Let FX denote the class of functions computed by TM used to define language class X e.g. FNP, $F\Delta_k^p$, FPSPACE. Define FPSPACE(poly) as those functions in FPSPACE whose outputs are polynomially bounded in

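the length of the input, and FPH $= \bigcup_{k=1}^{\infty} F\Delta_k^p$. It is known that $\Delta_k^p \subset F\Delta_k^p$, $FP^{\Sigma_k^p[f(n)]} \subseteq FP^{\Sigma_k^p[f(n)+1]}$, $FP^{\Sigma_k^p[f(n)]} \not\subseteq FP^{\Sigma_k^p[f(n)-1]}$ unless P = NP, and $FP^{\Sigma_{k+1}^p[1]} \not\subseteq FP^{\Sigma_k^p[f(n)]}$ unless FPH $= FP^{\Sigma_k^p}$ [Gas92, Kre92b]. The relationships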

within and between classes in FPH have been established only at the lowest levels (see Section 4.1.1); those relationships known to date suggest that classes in FPH behave very differently from their analogues in PH [Gas92]. Classes of multivalued functions are also possible. There are many restrictions of polynomial-time NTM transducers that generate such classes [Sel91]; one such restriction is $F_g$, the subset of functions $f$ in class $F$ such that $graph(f) = \{(x,y) \mid f(x) \mapsto y\} \in P$ i.e. outputs can be checked in polynomial time [Sel91, Val76]. Define FMPH $= \bigcup_{k=1}^{\infty} F\Sigma_k^p$. [...] are called NPMV, NPMV$_g$, and NPMV[...] in [FHOS92, Sel91], and NTM in FNP$_g$ which compute total functions are called NP metric TM in [Kre88]. Functions in FNP$_g$ compute the solutions associated with decision problems in NP. For further discussion about these and other function classes, see Section 4.1.

A reduction $\Pi \propto \Pi'$ is an algorithm that solves problem $\Pi$ by using an algorithm for problem $\Pi'$. Reductions order problems by computational hardness. The two main types of reductions between decision problems are:

• many-one ($\leq_m^p$): $A \leq_m^p B$ if there is a polynomial-time function $f$ such that $x \in A$ if and only if $f(x) \in B$.

• Turing ($\leq_T^p$): $A \leq_T^p B$ if there is a polynomial-time function using $B$ as an oracle that determines if $x \in A$.

A generalization of many-one reducibility called metric reducibility holds between single-valued functions.

Definition 1 (adapted from [Kre88], p. 493) Let $f, g : \Sigma^* \rightarrow \Sigma^*$. A metric reduction from $f$ to $g$ is a pair of polynomial-time functions $(T_1, T_2)$, where $T_1 : \Sigma^* \rightarrow \Sigma^*$ and $T_2 : \Sigma^* \times \Sigma^* \rightarrow \Sigma^*$, such that $f(x) = T_2(x, g(T_1(x)))$ for all $x \in \Sigma^*$.
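The shape of a metric reduction, $f(x) = T_2(x, g(T_1(x)))$, can be illustrated with a textbook pair of functions that is not taken from the thesis: let f be the size of a largest clique, g the size of a largest independent set, $T_1$ graph complementation, and $T_2$ the map that passes g's answer through unchanged. The sketch below assumes the same (vertices, edges) encoding of graphs used for illustration elsewhere, and evaluates g by brute force purely so that the example is runnable.

```python
from itertools import combinations


def max_independent_set_size(graph):
    """g: size of a largest independent set (brute force, exponential time)."""
    vertices, edges = graph
    for size in range(len(vertices), 0, -1):
        for subset in combinations(sorted(vertices), size):
            if not any(frozenset(p) in edges for p in combinations(subset, 2)):
                return size
    return 0


def t1(graph):
    """T1: map an instance of f to an instance of g (the complement graph)."""
    vertices, edges = graph
    all_pairs = {frozenset(p) for p in combinations(vertices, 2)}
    return vertices, all_pairs - edges


def t2(graph, answer):
    """T2: map g's answer back to f's answer (here, the identity)."""
    return answer


def max_clique_size(graph):
    """f computed through the reduction: f(x) = T2(x, g(T1(x)))."""
    return t2(graph, max_independent_set_size(t1(graph)))


# Illustrative instance: a triangle 1-2-3 with a pendant vertex 4 joined to 3.
graph = ({1, 2, 3, 4},
         {frozenset(e) for e in [(1, 2), (1, 3), (2, 3), (3, 4)]})
```

Both `t1` and `t2` run in polynomial time, as the definition requires; only the stand-in for g is expensive.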

The following variant holds between problems.

Definition 2 Let $\Pi$ and $\Pi'$ be problems and SOL-X($I$) be the set of solutions associated with instance $I$ of problem X. A metric reduction from $\Pi$ to $\Pi'$ is a pair of polynomial-time functions $(T_1, T_2)$, where $T_1 : I \rightarrow I'$ and $T_2 : I \times S' \rightarrow S$, such that for any single-valued function $f$ that solves $\Pi'$, $T_2(I, f(T_1(I))) \in$ SOL-$\Pi$($I$) for any instance $I$ of $\Pi$.

This reducibility is a restricted version of the Turing reducibility between partial multivalued functions defined in [FHOS92]. Note that the definitions of these metric reducibilities are equivalent for problems that are single-valued. Another relation called refinement can also hold between multivalued functions. Given multivalued functions $f$ and $g$, $g$ is a refinement of $f$ if $dom(f) = dom(g)$ and for all $x \in dom(g)$ and all $y$, if $g(x) \mapsto y$ then $f(x) \mapsto y$. These relations can also hold between whole classes of functions. For instance, if $F$ and $G$ are two classes of partial multivalued functions, then $F \subseteq_c G$ if every $f \in F$ has a refinement in $G$ [Sel91, p. 4]. Both inclusion and refinement relations can hold between multivalued function classes, and single-valued classes can be included in multivalued classes (indeed, this is equivalent to refinement); however, only the single-valued refinement relation can hold between multivalued function classes and single-valued classes.

Given two problems $x$ and $y$ and a reducibility $r$, $x$ and $y$ are computationally equivalent if $x$ $r$-reduces to $y$ and $y$ $r$-reduces to $x$. Given a class of problems $X$ and a reducibility $r$, a problem $y$ is said to be $X$-hard if each problem in $X$ $r$-reduces to $y$. If $y$ is $X$-hard and is also in $X$, $y$ is $X$-complete; if $y$ is $X$-hard and is not in $X$, $y$ is properly $X$-hard. Two limited types of reductions are:

• arithmetic-equivalence reductions: Problems $\Pi$ and $\Pi'$ differ only in their cost-functions $b_\Pi$ and $b_{\Pi'}$, and there exists a pair of polynomial-time functions $(T_1, T_2)$ such that for all instances $x$ of $\Pi$ and $\Pi'$, $b_\Pi(x) = T_1(b_{\Pi'}(x))$ and $b_{\Pi'}(x) = T_2(b_\Pi(x))$.

• restriction reductions: Problems $\Pi$ and $\Pi'$ differ only in that $dom(\Pi) \subset dom(\Pi')$ i.e. $\Pi$ is a subproblem of $\Pi'$.

By definition, restriction and arithmetic-equivalence reductions are many-one and metric reductions.

Though some of the problems discussed in this thesis are most naturally defined on $R^+$, all problems will be restricted to $Q^+$. Real numbers in general cannot be used because irrational numbers e.g. $\sqrt{2}$, cannot be represented within a computer whose running time is bounded by a function of the length of its input. All irrational numbers that arise in calculations must also be eliminated or approximated e.g. $\sqrt{x} \rightarrow \lceil \sqrt{x} \rceil$. A case study in how a real-number problem is modified to be computable is given for the Euclidean Minimal Steiner Tree problem in [GGJ77]. The lower bounds given by such modified problems on the complexity of the actual problems are the best that can be done within computational complexity theory as it currently exists. However, there may be other options [BSS89, Ko91].


Though problems will be defined on $Q^+$ for the convenience of readers, all problems will actually operate on $N$. This is easily done by multiplying out the rational denominators i.e. $\{\frac{1}{3}, \frac{3}{4}\} \rightarrow \{4, 9\} \div 12$. Thus, the bit-representation length of numbers will be proportional to their value. This ensures that the length of certain small rational numbers will not exceed that of larger numbers (e.g. though $\frac{3}{4} < 13$, $|3| + |4| > |13|$; however, $|3| < |13 \cdot 4|$). This property is necessary in several proofs in Sections 3.2.1 and 3.2.3.
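Multiplying out denominators amounts to scaling every value by the least common multiple of the denominators. The following is a minimal sketch using Python's standard `fractions` and `math` modules (`math.lcm` requires Python 3.9 or later); the function name `clear_denominators` and the sample values are illustrative assumptions.

```python
from fractions import Fraction
from math import lcm  # available from Python 3.9


def clear_denominators(values):
    """Scale a list of Fractions to integers by their common denominator.

    Returns (integers, common), where integers[i] / common == values[i].
    """
    common = lcm(*(v.denominator for v in values))
    # Each product has denominator 1, so int() is exact here.
    return [int(v * common) for v in values], common


scaled, common = clear_denominators([Fraction(3, 4), Fraction(13)])
# After scaling, the smaller value also has the shorter binary encoding.
lengths = [n.bit_length() for n in scaled]
```

Here `int.bit_length()` stands in for the bit-representation length: before scaling, 3/4 needs the bits of both 3 and 4, while after scaling it is stored as a single integer that is shorter than the scaled form of 13.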


3 Computational Problems in Phylogenetic Systematics

This section begins with an overview of various concepts in phylogenetic systematics. This is followed by a review of certain decision problems associated with phylogenetic analysis using the phylogenetic parsimony, character compatibility, and various of the distance matrix fitting criteria, and a review of the reductions by which these problems have been shown to be NP-complete [Day83, Day87, DJS86, DS86, DS87, Kri88, KM86]. This section also includes definitions and reductions for several new phylogenetic parsimony problems that allow limited amounts of reticulation, as well as a new reduction for the Additive Evolutionary Tree problem [Day83].

3.1 Phylogenetic Systematics

Systematics is the subdiscipline of biology that deals with ordering species into sets of groups (systems) according to various kinds of relationships between species (e.g. ecological roles, geographical proximity, overall similarity) [Ax87, Hen66]. Phylogenetic systematics is in turn the subdiscipline of biological systematics concerned with ordering species based on their evolutionary relationships; specifically, species are grouped together by descent from a common ancestor, and these groups are nested hierarchically to make an evolutionary tree. The process of reconstructing evolutionary trees is called phylogenetic analysis (phylogenetic inference), and the evolutionary trees so reconstructed are called phylogenies. The units that are ordered in phylogenetic analysis are called taxa. Two types of data are typically available to reconstruct evolutionary relationships among taxa:

• Discrete Character Matrix: The data are an $m$-by-$d$ matrix giving the values possessed by each of a set of $m$ taxa for each of a set of $d$ characteristics. These characteristics are called characters and their values are character states. For example, a character flower colour might have character states blue, yellow, and red. Character-states are grouped into characters by the relation of homology [Ax87, EC80, Wil81].¹ The vector of character-states over all taxa for a particular character is a character pattern, and the vector of character-states over all characters for a particular taxon is a character distribution. If a character has only two states, it is binary; else, it is unconstrained (multistate). If a character has a graph imposed on its states whose edges specify the allowable changes from one state to another, the character is ordered; else, the character is unordered. If the edges of an ordered character's graph are directed, the character is polarized; else, it is unpolarized. If the edges of an ordered character's graph have weights, the character is weighted; else, it is unweighted. Ordered characters are typically based on linear, complete, or tree graphs. In polarized characters, if state X is the source of a directed path to state Y, X is ancestral to Y and Y is derived relative to X. The state that is ancestral to all character-states in a polarized character is the ancestral state for that character. By convention, the ancestral and derived states in polarized binary characters are written as 0 and 1.

¹ Homology is the relation among different structures in different species that evolved from a common ancestral species (e.g. the character mammalian fore-limb that has states arm (human beings, apes), foreleg (dogs, horses, tigers), wings (bats), and flippers (dolphins, whales)). There are other kinds of relations among observed character-states, such as analogy, the relation of similar structures in different species that have arisen independently in several ancestral species (e.g. the character wings that groups together the wings of insects, birds, and bats). All character-state relations give evolutionary information of some sort; however, only homology delimits groups of species sharing a common ancestor, and thus only characters formed by homology are useful in reconstructing evolutionary trees.

In the case of molecular sequences, homology can hold among different sequences from different species, as well as between different positions in different sequences; indeed, the problem of establishing sequence-position homology is that of deriving a sequence alignment [SO90, pp. 416-417]. Note that in molecular biology, the term "homology" is also a synonym for sequence similarity [MH90, pp. 7-9].

• Distance Matrix: The data are an $m$-by-$m$ matrix giving a measure of dissimilarity or similarity between each pair of taxa in a set $S$ of $m$ taxa. The terms "similarity" and "dissimilarity" denote quantities that are precisely defined and inversely related; when rigor is not required or specified, both will be denoted by the term "distance" [SO90]. Let $M_n$ be the set of non-negative rational real-valued matrices on $n$ taxa, and $B_n \subset M_n$ be the set of all matrices whose off-diagonal values are in {1, 4}. Call members of $B_n$ and $M_n$ binary and unconstrained matrices respectively, by analogy with discrete characters. Let $X_{S'}$ be matrix $X$ on $S$ restricted to $S' \subset S$. Every matrix represents a distance function $d : S^2 \rightarrow R$, which may satisfy some subset of the following properties.

1. [...]

2. $\forall x, y \in S$, $d(x,y) = 0$ implies $x = y$.

3. $\forall x, y \in S$, $d(x,y) = d(y,x)$.

4. $\forall x, y, z \in S$, $d(x,y) \leq d(x,z) + d(z,y)$.

5. $\forall x, y, z \in S$, $d(x,y) \leq \max[d(x,z), d(z,y)]$.

6. $\forall x, y, z, w \in S$, $d(x,y) + d(z,w) \leq \max[d(x,z) + d(y,w), d(x,w) + d(y,z)]$.

Conditions (4), (5), and (6) are known as the triangle, ultrametric, and additive inequalities, respectively. A function that satisfies conditions (1), (2), and (3) is a semimetric; if condition (4) is also satisfied, the function is a metric. Metrics that satisfy conditions (5) and (6) are known as ultrametrics and tree metrics, respectively (see Figure 3). The number of distinct off-diagonal values in an ultrametric is the height of that ultrametric. Tree metrics and ultrametrics can be represented as trees; let $U_n$ ($A_n$) be the set of all ultrametric (additive) trees on $n$ taxa, $U_{n,q} \subset U_n$ be the set of all ultrametric trees on $n$ taxa of height at most $q$, $1 \leq q \leq n(n-1)/2$, and $\pi_U : U_n \rightarrow M_n$ ($\pi_A : A_n \rightarrow M_n$) be the function that maps an ultrametric (additive) tree onto its ultrametric (tree metric). In this thesis, $A_n$ will be restricted to $A_n^d$ (discretized additive trees) whose edges have length $k/2$, $k > 0$; note that ultrametrics drawn from $A_n^d$ have integer off-diagonal entries. Ultrametric trees are by definition rooted while additive trees can be either rooted or unrooted. Ultrametric trees correspond to rooted additive trees in which each leaf is the same distance from the root.

Figure 3: Types of metrics (taken from [Day88]): (a) a metric and a possible representation in the 2-D Euclidean plane; (b) an ultrametric and its associated ultrametric tree; (c) a tree metric and its associated additive tree.
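The triangle, ultrametric, and additive inequalities can be tested directly on a small distance matrix by checking every tuple of taxa. The sketch below does this by brute force. The helper names and the three example matrices are illustrative assumptions (they are not the matrices of Figure 3), and distances are stored as a nested dictionary rather than a matrix.

```python
from itertools import product


def make_distance(taxa, pairs):
    """Build a symmetric lookup d[x][y] with a zero diagonal."""
    d = {x: {y: 0 for y in taxa} for x in taxa}
    for (x, y), value in pairs.items():
        d[x][y] = d[y][x] = value
    return d


def satisfies_triangle(d):
    """Triangle inequality: d(x,y) <= d(x,z) + d(z,y)."""
    return all(d[x][y] <= d[x][z] + d[z][y]
               for x, y, z in product(d, repeat=3))


def satisfies_ultrametric(d):
    """Ultrametric inequality: d(x,y) <= max[d(x,z), d(z,y)]."""
    return all(d[x][y] <= max(d[x][z], d[z][y])
               for x, y, z in product(d, repeat=3))


def satisfies_additive(d):
    """Additive inequality on all 4-tuples of taxa."""
    return all(d[x][y] + d[z][w] <= max(d[x][z] + d[y][w], d[x][w] + d[y][z])
               for x, y, z, w in product(d, repeat=4))


# Shortest-path distances on a 4-cycle: a metric, but not a tree metric.
square = make_distance("abcd", {("a", "b"): 1, ("b", "c"): 1, ("c", "d"): 1,
                                ("a", "d"): 1, ("a", "c"): 2, ("b", "d"): 2})

# Path lengths in a tree with pendant edges a=1, b=2, c=1, d=3 and one
# internal edge of length 1 separating {a, b} from {c, d}.
tree = make_distance("abcd", {("a", "b"): 3, ("a", "c"): 3, ("a", "d"): 5,
                              ("b", "c"): 4, ("b", "d"): 6, ("c", "d"): 4})

# Three taxa in which the two largest distances of the triple are equal.
ultra = make_distance("abc", {("a", "b"): 2, ("a", "c"): 4, ("b", "c"): 4})
```

The `tree` example satisfies the additive inequality but not the ultrametric one, since its leaves are not equidistant from any root, whereas `ultra` satisfies both.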

Discrete character matrices are generated by examining the taxa of interest. Distance matrices are generated directly via certain techniques (i.e. immunological assay, DNA-DNA hybridization) or derived from discrete character matrices by applying a distance function defined on pairs of character distributions. Raw distance matrices must often be transformed into matrices that reflect "true" evolutionary distances [SO90, pp. 422-436]. These types of data are not ideal for the task of reconstructing evolutionary history, but they are sufficient: as taxa originate by inheritance with modification, each ancestral lineage in the evolutionary tree has left its signature in its descendants, either as character states that have propagated to that lineage's descendants, or as a certain evolutionary distance by which each such descendant is separated from every other taxon in the tree. Hence, many of the ancestral lineages, as well as the details of the process by which ancestral lineages gave rise to the observed taxa, can be reconstructed using the types of data above [EC80].

There are several other useful representations for evolutionary trees besides tree graphs. In trees constructed using discrete character data, each vertex in the tree has its own set of character-state values. These trees can be summarized by the character-state sets of their vertices (see Figure 4). Alternatively, for each character, one can map the set of vertices possessing each character-state onto that character's character-state graph to create a cladistic character, and summarize a tree by its set of cladistic characters. Cladistic characters are often easier to visualize as trees of subsets (see Figure 5). Discrete character matrices and individual characters may also be summarized by cladistic characters. Non-reticulate edge-weighted trees can be summarized by their patristic matrices, $P = [p_{ij}]$, where $p_{ij}$ is the sum of the weights of all edges on the path between taxa $i$ and $j$ in the given tree. A tree whose patristic matrix is an ultrametric of height $q$ can be represented [JS71, pp. -1~-- !iO] [KM86, p. 312] as a $(q+1)$-length sequence of pairs $(P_i, l_i)$ such that

2. $l_i$ is an integer such that $0 = l_1 < l_2 < \ldots < l_{q+1}$,

3. $P_i$ is a proper refinement of $P_{i+1}$ ($1 \leq i \leq q$), and

For example, the partition representation of the ultrametric tree of height 4 in Part (b) of Figure 3 is

$(P_1, l_1) = (\{\{1\}, \{2\}, \{3\}, \{4\}, \{5\}\}, 0)$,

$(P_2, l_2) = (\{\{1\}, \{2\}, \{4\}, \{3, 5\}\}, 15)$,

$(P_3, l_3) = (\{\{1, 2\}, \{4\}, \{3, 5\}\}, 25)$,

$(P_4, l_4) = (\{\{1, 2, 4\}, \{3, 5\}\}, 50)$, and

$(P_5, l_5) = (\{\{1, 2, 3, 4, 5\}\}, 60)$.
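The partition representation can be turned back into a distance matrix mechanically. The following Python sketch does so for the example above, under the assumption that the distance between two taxa is the level $l_i$ of the first partition in which they share a block; the function names are hypothetical and are not taken from [JS71] or [KM86].

```python
from itertools import combinations

# Partition representation of the height-4 example: (partition, level) pairs,
# finest partition first.
PARTITION_REP = [
    ([{1}, {2}, {3}, {4}, {5}], 0),
    ([{1}, {2}, {4}, {3, 5}], 15),
    ([{1, 2}, {4}, {3, 5}], 25),
    ([{1, 2, 4}, {3, 5}], 50),
    ([{1, 2, 3, 4, 5}], 60),
]


def is_proper_refinement(fine, coarse):
    """True if every block of `fine` lies inside a block of `coarse` and they differ."""
    nested = all(any(block <= big for big in coarse) for block in fine)
    return nested and len(fine) > len(coarse)


def ultrametric_from_partitions(rep):
    """Assumed rule: d(x, y) is the level of the first partition joining x and y."""
    taxa = sorted(set().union(*rep[0][0]))
    dist = {}
    for x, y in combinations(taxa, 2):
        for partition, level in rep:
            if any(x in block and y in block for block in partition):
                dist[(x, y)] = level
                break
    return dist


def is_ultrametric(dist, taxa):
    """Check d(x, z) <= max(d(x, y), d(y, z)) for all triples of taxa."""
    def d(a, b):
        return 0 if a == b else dist[(min(a, b), max(a, b))]
    return all(d(x, z) <= max(d(x, y), d(y, z))
               for x in taxa for y in taxa for z in taxa)


DIST = ultrametric_from_partitions(PARTITION_REP)
```

Under this reading, taxa 3 and 5 are at distance 15 and taxa 1 and 2 at distance 25, and `is_ultrametric` can be used to confirm that the recovered distances satisfy the ultrametric inequality.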

By convention, the weight of an edge in a tree reconstructed using discrete-character data is the sum of the weights of all character-state changes (character-state transitions) between the vertices defining that edge (see Figure 4).
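As a small illustration of this convention, the sketch below computes edge weights and tree length for binary characters, where each difference in state between the two endpoints of an edge is assumed to be a single transition. The vertex labels and the tree are hypothetical and are not those of Figure 4.

```python
def edge_weight(u, v, weights=None):
    """Sum of the weights of the characters whose states differ between u and v."""
    if len(u) != len(v):
        raise ValueError("vertices must have the same number of characters")
    if weights is None:
        weights = [1] * len(u)  # unweighted characters
    return sum(w for a, b, w in zip(u, v, weights) if a != b)


def tree_length(edges, weights=None):
    """Length of a tree given as a list of (vertex, vertex) pairs."""
    return sum(edge_weight(u, v, weights) for u, v in edges)


# Hypothetical tree on four vertices described by 5 binary characters.
EDGES = [("00000", "10000"), ("10000", "11000"), ("10000", "10100")]
UNWEIGHTED_LENGTH = tree_length(EDGES)
WEIGHTED_LENGTH = tree_length(EDGES, weights=[2, 1, 1, 1, 1])
```

For multistate characters the weight of a change would instead depend on the path between the two states in the character-state graph, which this sketch does not model.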

Each approach to phylogenetic analysis considered in this thesis embodies a criterion that assigns a cost to each possible tree relative to a particular data set. The trees selected by each approach as the best estimates of the actual evolutionary tree for a data set are the trees whose cost is optimal for that data set under that approach's criterion. Hence, each approach to phylogenetic analysis is an optimization problem.

Figure 4: A discrete character tree. This unrooted tree is based on 5 unweighted binary characters, and has a length of 9. The number on an edge denotes the character whose state has changed on that edge. Note that there are multiple character-state transitions in characters 2 and 5.

Several of the most popular approaches to phylogenetic analysis that use discrete character data are:

• Phylogenetic Parsimony [Hen66, KF69]: Selects the evolutionary tree of shortest length that reproduces the character distributions for the given taxa, where the length of a tree is the sum of the weights of all edges in the tree. The hypothesis encoded in this tree is preferred because it explains as much of the observed character distributions as possible by character-state transitions in a common ancestor, and invokes the fewest ad hoc hypotheses of subsequent character-state change [Far83].

Figure 5: Character-state trees (adapted from [Day88]). Part (a) shows three character-state trees $C_1$, $C_2$, and $C_3$. Part (b) shows the trees of subsets corresponding to each of these characters as determined by discrete character matrix $X$ on the set of taxa $S = \{A, B, C, D, E, F\}$.

Criterion                     Character-State Transition Restrictions            Character Order Type

Wagner (Linear)         WL    No restrictions.                                   Linear
Wagner (General)        WG    No restrictions.                                   Ordered
Fitch                   Fi    No restrictions.                                   Complete
Camin-Sokal             CS    No transitions from derived to ancestral states.   Ordered
Dollo                   Do    One transition from ancestral to derived state     Linear
                              per character.
Chromosome Inversion    CI    One transition from ancestral to heterozygous      Linear
(Polymorphism)                state per character; no transitions from
                              ancestral to derived or from derived to
                              ancestral or heterozygous states.
Generalized             Ge    Specified for each character.                      Ordered

Table 1: Phylogenetic parsimony criteria.

There are several phylogenetic parsimony criteria, each of which encodes a different model of evolution by placing different restrictions on the types and numbers of character-state transitions allowable in a tree (see Table 1). The Wagner Linear [KF69], Wagner General, and Fitch [Fit71] criteria assume the simplest model of evolution, in which character-state change is reversible. The Camin-Sokal criterion [CS65] assumes that character change is irreversible, while the Dollo criterion [Far77] assumes that character-state change is reversible but character-state origin is unique. The Chromosome Inversion criterion [Far78] is a restricted Dollo criterion whose characters have three states: ancestral (A), derived (D), and heterozygous (H = {A, D}). The Generalized parsimony criterion [SC83] represents characters as matrices of distances between character states (stepmatrices), which allows this criterion to simulate all possible parsimony criteria by placing appropriate restrictions on the state-transition weights [SO90, Figure 11, p. 464].

Note that in the biological literature, phylogenetic parsimony methods are also called cladistic parsimony or cladistic methods, and that the term "phylogenetic systematics" is on occasion restricted to the inference of evolutionary trees by phylogenetic parsimony methods.

• Character Compatibility [ME85]: Reconstructs the evolutionary tree from the largest subset of the given characters that are pairwise compatible, where two cladistic characters $K$ and $L$ are compatible if there exists a tree of subsets $M$ such that the trees of subsets $K$ and $L$ of these characters are subsets of $M$. For example, in Figure 5, characters $C_1$ and $C_2$ are compatible and $C_2$ and $C_3$ are compatible, but $C_1$ and $C_3$ are not compatible.

• Maximum Likelihood [Fel81]: Selects the evolutionary tree that has the greatest probability of producing the frequencies of each type of character-pattern in the data, i.e. the maximum likelihood $P(\mathit{Characters} \mid \mathit{Tree})$, relative to some probabilistic model of character-state change.

• Invariants (Evolutionary Parsimony) [CF87, Lak87]: Selects the evolutionary tree that best satisfies its associated invariants, which are algebraic constraints on the observed frequencies of each type of character-pattern that hold for that tree over all possible discrete character matrices. The set of invariants for each evolutionary tree is derived relative to some probabilistic model of character-state change.
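The character compatibility criterion in the list above lends itself to a short illustration. The Python sketch below assumes that a tree of subsets is a collection of subsets of the taxa in which every two members are either disjoint or nested; under that assumption, two characters are compatible exactly when the union of their trees of subsets is again such a collection. The three characters are hypothetical (they are not the characters of Figure 5), and the search for a largest compatible subset is plain brute force.

```python
from itertools import combinations


def is_tree_of_subsets(collection):
    """True if every two subsets in the collection are disjoint or nested."""
    return all(a.isdisjoint(b) or a <= b or b <= a
               for a, b in combinations(collection, 2))


def compatible(k, l):
    """K and L are compatible if K united with L is itself a tree of subsets."""
    return is_tree_of_subsets(set(k) | set(l))


def largest_compatible_subset(characters):
    """Exhaustive search for a largest pairwise-compatible set of characters."""
    names = list(characters)
    for size in range(len(names), 0, -1):
        for subset in combinations(names, size):
            if all(compatible(characters[a], characters[b])
                   for a, b in combinations(subset, 2)):
                return set(subset)
    return set()


# Hypothetical cladistic characters on taxa A-F, written as trees of subsets.
S = frozenset("ABCDEF")
CHARACTERS = {
    "C1": {S, frozenset("AB"), frozenset("ABC")},
    "C2": {S, frozenset("EF")},
    "C3": {S, frozenset("BC")},
}
BEST = largest_compatible_subset(CHARACTERS)
```

Here C1 and C3 clash because {A, B} and {B, C} overlap without being nested. The exhaustive search takes time exponential in the number of characters, which is in keeping with the NP-completeness results reviewed in Section 3.2.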

There are many approaches to phylogenetic analysis using distance matrix data, all of which assume that the given distances represent or closely approximate actual evolutionary distances between taxa. Most of these approaches compute the ultrametric or tree metric corresponding to the tree that has the minimal distance from the given semimetric according to some statistic. Many of these statistics are based on the Minkowski metrics $L_q$, $q \geq 1$, defined on pairs of matrices $D$ and $P$ on taxa $S$.

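$$L_q(D, P) = \Big\{ \sum_{x,y \in S} |D_{xy} - P_{xy}|^q \Big\}^{1/q} \qquad (1)$$

$$L_\infty(D, P) = \max_{x,y \in S} |D_{xy} - P_{xy}| \qquad (2)$$

Several such statistics for semimetric $D$ and ultrametric or tree metric $P$ are

$$F_\alpha(D, P) = \sum_{x,y \in S} |D_{xy} - P_{xy}|^\alpha \quad [\alpha \in \{1, 2\}] \qquad (3)$$

and

$$F(D, P) = 100 \times \frac{\sum_{x,y \in S} |D_{xy} - P_{xy}|}{\sum_{x,y \in S} D_{xy}} \qquad (4)$$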

where $F_1$ is the f-statistic [Far72], $F_2$ is the least-squares fit criterion [CE67], and $F$ was defined in [PW76]. Note that each statistic in both of the groups $\{L_1, F_1, F\}$ and $\{L_2, F_2\}$ is arithmetically equivalent to other members of its group. One can also view the given distances not as targets to be approximated but as lower bounds on what should be approximated. This is embodied in the concept of dominance, i.e. for metrics $D$ and $D'$ on a set of objects $S$, $D$ dominates $D'$ ($D \geq D'$) if $\forall x, y \in S$, $D_{xy} \geq D'_{xy}$ [JS71, p. rl::!]. Though dominance was originally proposed for fitting ultrametric trees, it has also been used in some methods for fitting additive trees [SO90, p. 451].
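The statistics in equations (1) to (4) and the dominance relation can be computed directly. The following Python sketch stores each matrix as a dictionary keyed by pairs of taxa and, as an assumption about the index set of the sums, ranges over unordered pairs of distinct taxa; the function names and the three-taxon matrices are hypothetical.

```python
from itertools import combinations


def pairs(taxa):
    """Unordered pairs of distinct taxa (assumed index set of the sums)."""
    return list(combinations(sorted(taxa), 2))


def minkowski(D, P, taxa, q):
    """L_q(D, P) for q >= 1, as in equation (1)."""
    return sum(abs(D[x, y] - P[x, y]) ** q for x, y in pairs(taxa)) ** (1.0 / q)


def l_inf(D, P, taxa):
    """L_infinity(D, P), as in equation (2)."""
    return max(abs(D[x, y] - P[x, y]) for x, y in pairs(taxa))


def f_alpha(D, P, taxa, alpha):
    """F_alpha(D, P) for alpha in {1, 2}, as in equation (3)."""
    if alpha not in (1, 2):
        raise ValueError("alpha must be 1 or 2")
    return sum(abs(D[x, y] - P[x, y]) ** alpha for x, y in pairs(taxa))


def f_percent(D, P, taxa):
    """F(D, P), as in equation (4)."""
    return 100.0 * f_alpha(D, P, taxa, 1) / sum(D[x, y] for x, y in pairs(taxa))


def dominates(D, D2, taxa):
    """True if D_xy >= D2_xy for every pair of taxa."""
    return all(D[x, y] >= D2[x, y] for x, y in pairs(taxa))


# Hypothetical semimetric D and ultrametric P on three taxa.
TAXA = [1, 2, 3]
D = {(1, 2): 4, (1, 3): 7, (2, 3): 9}
P = {(1, 2): 4, (1, 3): 8, (2, 3): 8}
```

On any input, `f_alpha(D, P, taxa, 1)` equals `minkowski(D, P, taxa, 1)` and `f_alpha(D, P, taxa, 2)` is the square of `minkowski(D, P, taxa, 2)`, which illustrates the sense in which the members of each group are arithmetically equivalent.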

Each approach to phylogenetic analysis embodies some model of the evolutionary process of character change; some are more explicit than others in the statement of the model that they use. The trees produced by each approach are useful to the extent that one believes in the model embodied by that approach. See [Fel88, PHS92, SO90] for a complete review of approaches to phylogenetic analysis and computer programs implementing these approaches.

3.2 NP-Complete Problems in Phylogenetic Systematics

Since 1982, decision problems for the major phylogenetic parsimony criteria [Day83, DJS86, DS87, GF82], the character compatibility criterion [DS86], and various distance matrix fitting criteria for ultrametric and additive trees [Day83, Day87, Kri86, Kri88, KM86] have been shown NP-complete using reductions from the NP-complete problems given in Table 2. As later sections of this thesis will make extensive use of both these definitions and these reductions, they will be reviewed in this section. Each reduction will be given a formal definition in the style of [Kar72], followed by a sketch of its proof of correctness.

VERTEX COVER (VC) [GT1]

Instance: A graph $G = (V, E)$ and a positive integer $K \leq |V|$.

Question: Is there a vertex cover of size $K$ or less for $G$, that is, a subset $V' \subseteq V$ such that $|V'| \leq K$ and, for each edge $\{u, v\} \in E$, at least one of $u$ or $v$ belongs to $V'$?

EXACT COVER BY 3-SETS (X3C) [SP2]

Instance: A set $X$ with $|X| = 3q$ and a collection $C$ of 3-element subsets of $X$.

Question: Does $C$ contain an exact cover for $X$, that is, a subcollection $C' \subseteq C$ such that every element of $X$ occurs in exactly one member of $C'$?

CLIQUE [GT19]

Instance: A graph $G = (V, E)$ and a positive integer $J \leq |V|$.

Question: Does $G$ contain a clique of size $J$ or more, that is, a subset $V' \subseteq V$ such that $|V'| \geq J$ and every two vertices in $V'$ are joined by an edge in $E$?

Table 2: Basic NP-complete decision problems (taken from [GJ79]). The reference numbers assigned to these problems in the list of NP-complete decision problems in [GJ79] are given in square brackets.
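Since VERTEX COVER is the source problem for the parsimony reductions reviewed below, a concrete form of its instance and question may be helpful. The Python sketch below separates the polynomial-time check of a candidate cover (the certificate check that places VC in NP) from an exhaustive decision procedure; the function names and the example graph are hypothetical.

```python
from itertools import combinations


def is_vertex_cover(edges, cover, k):
    """Certificate check: |cover| <= k and every edge has an endpoint in cover."""
    cover = set(cover)
    return len(cover) <= k and all(u in cover or v in cover for u, v in edges)


def has_vertex_cover(vertices, edges, k):
    """Exhaustive decision procedure; exponential in |V| in the worst case."""
    return any(is_vertex_cover(edges, subset, k)
               for size in range(min(k, len(vertices)) + 1)
               for subset in combinations(vertices, size))


# Hypothetical instance: a triangle on {1, 2, 3} with a pendant vertex 4.
V = [1, 2, 3, 4]
E = [(1, 2), (2, 3), (1, 3), (3, 4)]
```

For this graph no single vertex touches every edge, while {1, 3} does, so the answer is "no" for K = 1 and "yes" for K = 2.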

3.2.1 Phylogenetic Parsimony

Each of these problems is given as input a discrete character matrix for $m$ taxa and $d$ characters, and operates on an implicit graph $G$ whose vertices are the set of all $d$-dimensional points defined by the states of the given characters and whose edges are specified by the allowable transitions between the states in these characters. Each phylogenetic parsimony problem seeks the evolutionary tree in $G$ of minimum length that includes the given taxa, subject to the restrictions on character-state transitions that are particular to that problem's criterion (see Table 1). The given characters can be restricted in various ways to generate a family of phylogenetic parsimony problem "schemata" (see Tables 3, 4, and 8); each phylogenetic parsimony criterion can then be applied to these schemata to generate problems. The hierarchy of subproblems generated by these schemata will be useful in later sections of this thesis.

Consider the following restrictions on the given characters:

• Cladistic vs. Ordered vs. Qualitative: A cladistic (C) problem is given polarized characters, an ordered (O) problem is given ordered characters, and a qualitative (Q) problem is given unordered characters. Each problem finds solutions that are consistent with its characters; however, ordered and qualitative problems must also find character polarizations and orderings for which solutions exist. The cladistic / qualitative distinction was made in [DJS86, EM77, EM80] for binary characters; as qualitative and ordered problems are equivalent for binary characters, the distinction of ordering is only applicable to unconstrained characters.

Cladistic problems correspond to phylogenetic analysis procedures that produce explicitly rooted trees, ordered problems correspond to procedures that produce either rooted or unrooted trees, and qualitative problems correspond to procedures such as Transformation Series Analysis [Mic82] that simultaneously produce trees and derive character ordering and polarization from the given data (cf. [Lip92]).

• Binary vs. Unconstrained: A problem is binary (B) if it is restricted to binary characters; otherwise, the problem is unconstrained (U).

• Weighted vs. Unweighted: A problem is unweighted (U) if it is restricted to unweighted characters; otherwise, the problem is weighted (W).

The five schemata generated by the first two of these restrictions are given in Tables 3 and 4; the remaining restriction yields a total of ten schemata. The validity of these restrictions for each of the phylogenetic parsimony criteria is shown in Table 5. Restrictions do not apply to a particular criterion if they conflict with the restrictions imposed by that criterion, e.g. Dollo criterion characters can only have three states; Fitch criterion characters are by definition unweighted and ordered. The application of all phylogenetic parsimony criteria to valid schemata yields 39 phylogenetic parsimony problems (see Tables 6 and 7).

Additional phylogenetic parsimony problems may be generated by allowing evolutionary trees to include limited amounts of reticulation. Consider the problem schemata in Table 8 defined for each non-reticulate phylogenetic parsimony problem X. These schemata restrict the amounts of available (the SHX and SRX schemata) or allowable (the k-HX and k-RX schemata) reticulation that can occur in a tree. See Appendix A for further discussion of these schemata. Each can be applied to all phylogenetic parsimony problems created so far, giving a total of 156 phylogenetic parsimony problems. One such problem is k-RUUOWL, the k-Recombination under Unweighted Unconstrained Ordered Wagner Linear parsimony problem. Note that as reticulation is always directed, the trees produced by these problems are rooted.

BINARY CLADISTIC X (BCX)

Instance: Positive integer $d$; a subset $S$ of $\{0, 1\}^d$; and a positive integer $B$.

Question: Is there a phylogeny satisfying criterion X that includes $S$, is rooted at the root-type vertex, and has length at most $B$?

BINARY QUALITATIVE X (BQX)

Instance: Same as BCX, except that no character is required to be directed.

Question: Is there a phylogeny satisfying criterion X that includes $S$ and has length at most $B$?

Table 3: Phylogenetic parsimony decision problem schemata (non-reticulate trees) (adapted from [DJS86]). These schemata are stated relative to a phylogenetic parsimony criterion X. If X $\in$ {CI, CS}, root-type is "all-ancestral"; if X = Do, root-type is "all-derived".

Note that the statements of problems given above differ from [DJS86, DS87] in that the bound $B$ is on the number of edges rather than the number of vertices in the tree. The two formulations are equivalent; however, the former allows a more natural interpretation of weighted problems.

UNCONSTRAINED CLADISTIC X (UCX)

Instance: Positive integer $d$; sets $A_1, \ldots, A_d$ of character-states, and directed character-state graphs $G_1, \ldots, G_d$ specifying allowable transitions among these states; a subset $S$ of $A_1 \times \ldots \times A_d$; and a positive integer $B$.

Question: Is there a phylogeny satisfying criterion X and the given directed character-state graphs that includes $S$, is rooted at the root-type vertex, and has length at most $B$?

UNCONSTRAINED ORDERED X (UOX)

Instance: Same as UCX, except that none of the character-state graphs are directed.

Question: Is there some polarization of the given character-state graphs that allows a phylogeny satisfying criterion X that includes $S$ and has length at most $B$?

UNCONSTRAINED QUALITATIVE X (UQX)

Instance: Same as UCX, except that none of the character-state graphs are ordered.

Question: Is there some ordering and polarization of the given character-state graphs that allows a phylogeny satisfying criterion X that includes $S$ and has length at most $B$?

Table 4: Phylogenetic parsimony decision problem schemata (non-reticulate trees) (cont'd from Table 3).

                                Unweighted /   Binary /         Cladistic / Ordered /   #
Criterion                       Weighted       Unconstrained    Qualitative             Prob.

Wagner Linear           WL      √              √                O, Q                    8
Wagner General          WG      √              √                O, Q                    8
Fitch                   Fi      -              √                O                       2
Camin-Sokal             CS      √              √                C, O, Q                 12
Dollo                   Do      √              √                C, O, Q                 12
Chromosome Inversion    CI      -              -                C, O, Q                 3
Generalized             Ge      √              √                O                       4
Total                                                                                   39

Table 5: Applicability of input character restrictions to phylogenetic parsimony criteria. The given total number of problems is smaller than expected because some of these problems are equivalent; see Tables 6 and 7 for details.
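The per-criterion problem counts in Table 5 can be checked by enumerating the admissible combinations of restrictions. The Python sketch below does this under one assumption that is not stated in the text: a restriction column offering only one option for a criterion contributes no letter to the acronym, which is what the acronyms in Tables 6 and 7 (e.g. BFi, CCI, UBGe) suggest. The dictionary and function names are hypothetical.

```python
from itertools import product

# Choices offered by each criterion, as read from Table 5.  A column with a
# single entry offers no choice and contributes no letter to the acronym
# (an assumption made to match the acronyms used in Tables 6 and 7).
APPLICABLE = {
    "WL": (("U", "W"), ("B", "U"), ("O", "Q")),
    "WG": (("U", "W"), ("B", "U"), ("O", "Q")),
    "Fi": (("",), ("B", "U"), ("",)),
    "CS": (("U", "W"), ("B", "U"), ("C", "O", "Q")),
    "Do": (("U", "W"), ("B", "U"), ("C", "O", "Q")),
    "CI": (("",), ("",), ("C", "O", "Q")),
    "Ge": (("U", "W"), ("B", "U"), ("",)),
}


def problem_acronyms(criterion):
    """All weight / constraint / character-type combinations for one criterion."""
    return ["".join(choice) + criterion
            for choice in product(*APPLICABLE[criterion])]


COUNTS = {c: len(problem_acronyms(c)) for c in APPLICABLE}
RAW_TOTAL = sum(COUNTS.values())
```

The enumeration reproduces the per-criterion counts of Table 5; the raw sum exceeds the total reported there because, as the caption of Table 5 notes, some of the generated problems are equivalent and are merged in Tables 6 and 7.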

Acronym     Problem

UBW         Unweighted Binary Wagner
  UBOWL       Unweighted Binary Ordered Wagner Linear
  UBQWL       Unweighted Binary Qualitative Wagner Linear
  UBOWG       Unweighted Binary Ordered Wagner General
  UBQWG       Unweighted Binary Qualitative Wagner General
  BFi         Binary Fitch
UUW         Unweighted Unconstrained Wagner
  UUQWG       Unweighted Unconstrained Qualitative Wagner General
  UFi         Unconstrained Fitch
WBW         Weighted Binary Wagner
  WBOWL       Weighted Binary Ordered Wagner Linear
  WBQWL       Weighted Binary Qualitative Wagner Linear
  WBOWG       Weighted Binary Ordered Wagner General
  WBQWG       Weighted Binary Qualitative Wagner General
UUOWL       Unweighted Unconstrained Ordered Wagner Linear
UUQWL       Unweighted Unconstrained Qualitative Wagner Linear
WUOWL       Weighted Unconstrained Ordered Wagner Linear
WUQWL       Weighted Unconstrained Qualitative Wagner Linear
UUOWG       Unweighted Unconstrained Ordered Wagner General
WUOWG       Weighted Unconstrained Ordered Wagner General
WUQWG       Weighted Unconstrained Qualitative Wagner General

Table 6: Phylogenetic parsimony decision problems (non-reticulate trees). Each group of equivalent problems is indented, and appears after the acronym for that group.

Acronym     Problem

UBCCS       Unweighted Binary Cladistic Camin-Sokal
UBQCS       Unweighted Binary Qualitative Camin-Sokal
  UBOCS       Unweighted Binary Ordered Camin-Sokal
  UBQCS       Unweighted Binary Qualitative Camin-Sokal
UUCCS       Unweighted Unconstrained Cladistic Camin-Sokal
UUOCS       Unweighted Unconstrained Ordered Camin-Sokal
UUQCS       Unweighted Unconstrained Qualitative Camin-Sokal
WBCCS       Weighted Binary Cladistic Camin-Sokal
WBOCS       Weighted Binary Ordered Camin-Sokal
WBQCS       Weighted Binary Qualitative Camin-Sokal
WUCCS       Weighted Unconstrained Cladistic Camin-Sokal
WUOCS       Weighted Unconstrained Ordered Camin-Sokal
WUQCS       Weighted Unconstrained Qualitative Camin-Sokal
UBCDo       Unweighted Binary Cladistic Dollo
UBQDo       Unweighted Binary Qualitative Dollo
  UBODo       Unweighted Binary Ordered Dollo
  UBQDo       Unweighted Binary Qualitative Dollo
UUCDo       Unweighted Unconstrained Cladistic Dollo
UUODo       Unweighted Unconstrained Ordered Dollo
UUQDo       Unweighted Unconstrained Qualitative Dollo
WBCDo       Weighted Binary Cladistic Dollo
WBODo       Weighted Binary Ordered Dollo
WBQDo       Weighted Binary Qualitative Dollo
WUCDo       Weighted Unconstrained Cladistic Dollo
WUODo       Weighted Unconstrained Ordered Dollo
WUQDo       Weighted Unconstrained Qualitative Dollo
CCI         Cladistic Chromosome Inversion
OCI         Ordered Chromosome Inversion
QCI         Qualitative Chromosome Inversion
UBGe        Unweighted Binary Generalized
UUGe        Unweighted Unconstrained Generalized
WBGe        Weighted Binary Generalized
WUGe        Weighted Unconstrained Generalized

Table 7: Phylogenetic parsimony decision problems (non-reticulate trees) (cont'd from Table 6).

SELECT HYBRIDIZATION UNDER X (SHX)

Instance: Same as for problem X, with an additional parameter $H$, a given polynomial-sized (in the parameters of X) set of 3-hyperarcs.

Question: Same as for X, with the additional condition that the phylogeny can include any subset of the 3-hyperarcs in $H$.

k-HYBRIDIZATION UNDER X (k-HX)

Instance: Same as for problem X, except that the implicit graph incorporates a fixed type of 3-hyperarc, and there is an additional parameter $k$, a positive integer.

Question: Same as for X, with the additional condition that the phylogeny can include $\leq k$ 3-hyperarcs of the fixed type.

SELECT RECOMBINATION UNDER X (SRX)

Instance: Same as for problem X, with an additional parameter $R$, a given polynomial-sized (in the parameters of X) set of 4-hyperarcs.

Question: Same as for X, with the additional condition that the phylogeny can include any subset of the 4-hyperarcs in $R$.

k-RECOMBINATION UNDER X (k-RX)

Instance: Same as for problem X, except that the implicit graph incorporates a fixed type of 4-hyperarc, and there is an additional parameter $k$, a positive integer.

Question: Same as for X, with the additional condition that the phylogeny can include $\leq k$ 4-hyperarcs of the fixed type.

Table 8: Phylogenetic parsimony decision problem schemata (reticulate trees). These schemata are stated relative to a non-reticulate phylogenetic parsimony problem X.

It is not obvious at first glance that the problems above are in NP. Conventional tree-traversal algorithms can be modified to check all parsimony criteria for both non-reticulate and reticulate trees in time polynomial in the size of the candidate solution [StaT80, Section 3], but such solutions are not guaranteed to be of size polynomial in the size of the instance because they might be as large as the implicit graph on $O(2^d)$ vertices from which they are taken. However, under certain additional restrictions, the problems defined above can be shown to be in NP. Consider the relationship between solution cost and size.

Lemma 3 A polynomial-time nondeterministic computation is guaranteed to find all solutions $Y$ to an instance $I$ of an unweighted (weighted) parsimony problem X such that $b_X(Y) \leq p(|I|)$ ($b_X(Y) \leq p(|I|) W_{min}(I)$) for some polynomial $p$.

Proof: Observe that the largest solution of cost $k$ to an instance $I$ of an unweighted parsimony problem is a tree on $k + 1$ vertices, and that the largest solution of cost $k$ to an instance $I$ of a weighted parsimony problem is a tree on $k / W_{min}(I)$ vertices with edges of weight $W_{min}(I)$, where $W_{min}(I)$ ($W_{max}(I)$) is the smallest (largest) character-transition weight in the given instance. ∎

Solutions satisfying these bounds exist for each non-reticulate phylogenetic parsimony problem defined above. For Wagner Linear, Wagner General, Fitch, Camin-Sokal, and Generalized problems, this solution is a tree rooted at the all-ancestral vertex which has paths to each taxon that use the appropriate character-state transitions to generate the states for that taxon; the solution for the Dollo (Chromosome Inversion) problem is a tree rooted at the all-ancestral (all-A) vertex that has a path to the all-derived (all-H) vertex, which then has paths to each taxon that use the appropriate "reversal" transitions to generate the states for that taxon. Each of these solutions is of size $O(mdl(\log W_{max}))$ and cost $O(mdl(W_{max}))$, where $l$ is the length of the longest path between two states in any character-state graph in the instance and $W_{max} = 1$ if the problem is unweighted. As the cost of a solution is proportional to its size in unweighted problems, any solutions (including optimal solutions) better than those given above have costs that satisfy the bound in Lemma 3.

Corollary 4 A polynomial-time nondeterministic computation is guaranteed to find all optimal solutions of any instance of a non-reticulate unweighted phylogenetic parsimony problem.

This relationship does not hold for weighted problems; solutions of lower cost may exist that are larger than solutions of higher cost. However, if the problem is restricted to those instances $I$ such that $W_{max}(I) \leq p(|I|) W_{min}(I)$ for some polynomial $p$, any solutions (including optimal solutions) better than those given above must have cost $k \leq O(mdl W_{max}) \leq p'(|I|) W_{max}(I) \leq p''(|I|) W_{min}(I)$ for some polynomials $p'$, $p''$, and thus have costs that satisfy the bound in Lemma 3.

Corollary 5 A polynomial-time nondeterministic computation is guaranteed to find all optimal solutions of any instance $I$ of a non-reticulate weighted phylogenetic parsimony problem such that $W_{max}(I) \leq p(|I|) W_{min}(I)$ for some polynomial $p$.

Thus, all non-reticulate unweighted and weighted phylogenetic parsimony problems defined above whose weights are so restricted are in NP. As solutions to cladistic problems are also solutions to ordered and qualitative problems, and as each reticulate problem can incorporate a number of reticulations at most polynomial in the parameters of its non-reticulate counterpart, all parsimony problems defined above are in NP.

The reductions given in [DJS86, DS87] that establish the NP-hardness of the non-reticulate unweighted binary Camin-Sokal, Dollo, and Chromosome Inversion phylogenetic parsimony problems are given in Tables 9 and 10. These reductions use the basic idea of Karp's reduction from EXACT COVER to STEINER TREE IN GRAPHS [Kar72]: namely, reduce some problem involving the selection of a subcollection of a collection of subsets on a set of items onto a three-level tree in which the leaves correspond to the items, the root to the selected subcollection, and the remaining internal vertices to the subsets in this subcollection.² In the reductions in Tables 9 and 10, the items are the edges of $G$ and the subsets in the collection are the sets of edges adjacent to each of the vertices in $G$. The trees that are solutions in each of the reduced instances contain subtrees that have three levels ([DJS86, Lemma 1]; [DS87, Lemma 2]), where the internal vertices selected on the second level of each tree correspond to satisfying vertex covers for the original instances. In the case of the Dollo and Chromosome Inversion problems, each solution tree has a "tail" composed of the vertices in $Y$ which ensures that the tree has a root that is consistent with its problem's criterion. Moreover, one can construct trees from satisfying vertex covers that correspond to satisfying

² The reduction given in [Kar72] is flawed, as the reader can verify for items T = {a, b, c, d, e, f, g} and collection of subsets S = {{a, b, c}, {c, d, e}, {e, f, g}}; the edges of weight 0 allow a solution tree of length 3 to the reduced instance, even though the original instance has no exact cover. Krentel has fixed this problem by a variant on [Kar72] that yields a reduction from SET COVER to STEINER TREE IN GRAPHS [GKR92, Theorem 3.4].


VC $\leq_m$ UBCCS / UBQCS [DJS86]

    d = |V|, where each character corresponds to a particular vertex v ∈ V.

    S = O ∪ X, where O is the all-ancestral vertex, and X is the set of vertices corresponding to the edges in E (for e = {u, v} in E, there are 1's in the characters corresponding to u and v and 0's elsewhere).

    B = K + |E|

VC $\leq_m$ UBCDo / UBQDo (adapted from [DJS86])

    d = 3|V|, where characters 2|V| + 1 to d correspond to the vertices in V.

    S = O ∪ X ∪ Y, where O is the all-ancestral vertex, X is the set of vertices corresponding to the edges in E, and Y is the set of vertices y_i, 1 ≤ i ≤ d, such that y_i has 1's in characters 1 to i and 0's elsewhere.

Table 9: Reductions for phylogenetic parsimony decision problems.

Note that the reductions given for the Dollo and Chromosome Inversion problems differ from [DJS86, DS87] in that d = 3|V| instead of 2K + |V|. The proofs given in [DJS86, DS87] still work for these modified reductions; moreover, these modifications simplify the transformation of these many-one reductions to metric reductions in Section 4.2.
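The Camin-Sokal reduction in Table 9 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`vc_to_ubccs`, `three_level_tree_length`, `hamming`) and the tuple encoding of taxa are hypothetical and do not come from [DJS86]. The second function builds the three-level tree described above (root O, one unit vector per cover vertex, one leaf per edge) and sums the character changes along its edges.

```python
def hamming(a, b):
    """Number of characters in which two binary taxa differ."""
    return sum(x != y for x, y in zip(a, b))


def vc_to_ubccs(vertices, edges, k):
    """Map a VERTEX COVER instance (G = (V, E), K) to (d, S, B) as in Table 9."""
    index = {v: i for i, v in enumerate(vertices)}
    d = len(vertices)                       # one character per vertex
    taxa = [tuple([0] * d)]                 # O, the all-ancestral taxon
    for u, v in edges:                      # X: one taxon per edge {u, v}
        row = [0] * d
        row[index[u]] = row[index[v]] = 1
        taxa.append(tuple(row))
    return d, taxa, k + len(edges)          # B = K + |E|


def three_level_tree_length(vertices, edges, cover):
    """Length of the three-level tree induced by a vertex cover (hypothetical helper)."""
    index = {v: i for i, v in enumerate(vertices)}
    d = len(vertices)
    root = tuple([0] * d)

    def unit(v):
        return tuple(1 if i == index[v] else 0 for i in range(d))

    # Level 1 -> level 2: root to the unit vector of each cover vertex.
    length = sum(hamming(root, unit(v)) for v in cover)
    # Level 2 -> level 3: each edge taxon hangs below one covering endpoint.
    for u, v in edges:
        parent = u if u in cover else v
        if parent not in cover:
            raise ValueError("not a vertex cover")
        leaf = tuple(1 if i in (index[u], index[v]) else 0 for i in range(d))
        length += hamming(unit(parent), leaf)
    return length
```

For a triangle on vertices a, b, c with K = 2, the cover {a, b} yields a tree of length 2 + 3 = 5, which equals the bound B = K + |E|.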


VC $\leq_m$ UBCCI / UBQCI (adapted from [DS87])

    d = 3|V|, where characters 2|V| + 1 to d correspond to the vertices in V.

    S = H ∪ X ∪ Y, where H is the all-H vertex, X is the set of vertices corresponding to the edges in E (for e = {u, v} ∈ E, there are B's in the characters corresponding to u and v and H's elsewhere), and Y is the set of vertices y_i, 1 ≤ i ≤ d, such that y_i has A's in characters 1 to i and H's elsewhere.

    B = K + 3|V| + |E|

Table 10: Reductions for phylogenetic parsimony decision problems (cont'd from Table 9).

trees for the reduced instances ([DJS86, Theorems 2 and 3]; [DS87, Theorem 3]). The trees will have the three-level structure as long as the all-ancestral (or all-H) vertex is included in S; hence, these reductions simultaneously show that the cladistic and qualitative versions of each problem are NP-hard. For the same reason, the reduction for the Camin-Sokal problems also shows that the unweighted binary Wagner problem is NP-hard [DJS86, p. 41].

The non-reticulate binary unweighted problems are restrictions of all other non-reticulate and reticulate problems (set k = 0 (k-HX, k-RX) and R = 0 (SIIX, SHX)); thus, all Wagner, Fitch, Camin-Sokal, Dollo, and Chromosome Inversion problems are NP-hard. As any ordered problem can be solved by an appropriately structured instance of the Generalized parsimony problem, all Generalized parsimony problems are also NP-hard. Hence, all phylogenetic parsimony problems considered above are NP-complete.

A proof that UBW and UUW are NP-complete was given previous to that in [DJS86] by Graham and Foulds [GF82], using a reduction from X3C. The elegant reduction from UUW to WUOWL given in [Day83] does not work as stated there, because Day uses a version of UUW that includes the implicit graph in the instance and this version has not been shown to be NP-complete (see Appendix B). However, with slight modifications, this reduction does work for UUW as defined above.

The phylogenetic parsimony problems described above are closely related to the STEINER TREE IN GRAPHS (STG) and RECTILINEAR STEINER TREE (RST) problems (see Table 11). The phylogenetic parsimony problems are like STG in that the solution is drawn from a graph, and like RST in that this solution domain is implicit. The relationship is not exact in either case because none of the phylogenetic parsimony problems defined above include their implicit graphs in their instances (cf. Appendix B), and only the simplest phylogenetic parsimony problems are defined on d-dimensional rectilinear spaces. Despite these differences, certain of the STG and RST solution and approximation algorithms [BR91, Ric89, Sny92, Win87] can be modified to solve particular phylogenetic parsimony problems; see Section 5.4 and Appendix B.


STEINER TREE IN GRAPHS (STG) [ND12]

Instance: Graph G = (V, E), subset S ⊆ V, positive integer K ≤ |V| − 1.

Question: Is there a Steiner tree T for S in G with length ≤ K, that is, a subtree T of G that includes all vertices in S and contains no more than K edges?

RECTILINEAR STEINER TREE (RST) [ND13]

Instance: Set P = {(x_1, y_1), ..., (x_n, y_n)} of integer co-ordinates in the plane; positive integer L.

Question: Is there a rectilinear Steiner tree T with length ≤ L, that is, a tree T composed of horizontal and vertical line segments linking the points in P such that the sum of the lengths of all line segments in that tree is no more than L?

Table 11: Steiner Tree decision problems (taken from [GJ79]).

3.2.2 Character Compatibility

Each of these problems is given as input a set of d characters defined on a set of m objects, and seeks the largest pairwise compatible subset of the given characters. The cladistic / ordered / qualitative and binary / unconstrained character restrictions made in the last section are also applicable to character compatibility problems. A collection of ordered (qualitative) characters is compatible if its character-state sets can be polarized (ordered and polarized) to make the collection a compatible set of cladistic characters [DS86, p. 225]. The five character compatibility problems so defined are given in Tables 12 and 13.

Each of these character compatibility problems is obviously in NP, as solution


BINARY CLADISTIC COMPATIBILITY (BCC)

Instance: Set of m objects; a collection C of d binary cladistic characters, as described by a d-by-m character-by-object matrix X; and a positive integer B ≤ d.

Question: Does the collection of characters C have a compatible collection C' ⊆ C such that |C'| ≥ B?

BINARY QUALITATIVE COMPATIBILITY (BQC)

Instance: Set of m objects; a collection C of d binary qualitative characters, as described by a d-by-m character-by-object matrix X; and a positive integer B ≤ d.

Question: Does the collection of characters C have a polarization such that there is a compatible collection C' ⊆ C such that |C'| ≥ B?

Table 12: Character compatibility decision problems (adapted from [DS86]).

sets of characters are subsets of the given set of characters. The reductions given in [DS86] which establish that BCC and BQC are NP-hard are given in Table 14. The problems CLIQUE and BCC are very similar: both problems are looking for the subset of largest size such that a particular relation holds between every pair of elements in that subset. Let K be the character-pattern for a particular character, and K(x) be the character-state in K of taxon x; for two binary characters K_i and K_j on the set of taxa S, K_i and K_j are incompatible if and only if all three of the elements (1,0), (0,1), and (1,1) are in (K_i × K_j)(S) [EJM76, Theorem 2.3]. By this result, pairs of characters in the reduced instance that correspond to vertices not joined by an edge in G are incompatible. Hence, any


UNCONSTRAINED CLADISTIC COMPATIBILITY (UCC)

Instance: Collection C of d cladistic characters defined on a set of m objects; a positive integer B ≤ d.

Question: Does the collection of characters C have a compatible collection C' ⊆ C such that |C'| ≥ B?

UNCONSTRAINED ORDERED COMPATIBILITY (UOC)

Instance: Collection C of d ordered characters defined on a set of m objects; a positive integer B ≤ d.

Question: Does the collection of characters C have a polarization such that there is a compatible collection C' ⊆ C such that |C'| ≥ B?

UNCONSTRAINED QUALITATIVE COMPATIBILITY (UQC)

Instance: Collection C of d qualitative characters defined on a set of m objects; a positive integer B ≤ d.

Question: Does the collection of characters C have a polarization and an ordering such that there is a compatible collection C' ⊆ C such that |C'| ≥ B?

Table 13: Character compatibility decision problems (cont'd from Table 12).


CLIQUE $\leq_m$ BCC [DS86]

    d = |V|
    m = 3d(d − 1)/2
    X = [x_{i,j}], 1 ≤ i ≤ d, 1 ≤ j ≤ m

    X has a character-column for each vertex in V, and three taxon-rows for each unordered pair of vertices in V. For each edge {u, v} not in E, set the row-entries in column u for that edge to 011, and the row-entries in column v to 101. All other entries in X are 0.

    B = J

BCC $\leq_m$ BQC [DS86]

    d' = d
    m' = 2m
    X' = [x'_{i,j}], 1 ≤ i ≤ d', 1 ≤ j ≤ m'

    where the taxa corresponding to rows (m + 1) ≤ i ≤ m' exhibit the ancestral character-states of the characters in X.

    B' = B

BQC $\leq_m$ BCC [DS86]

    d' = d
    m' = m
    X' = [x'_{i,j}], 1 ≤ i ≤ d', 1 ≤ j ≤ m'

    where a character's most frequently occurring state becomes that character's ancestral state in X'.

    B' = B

Table 14: Reductions for character compatibility decision problems.


collection of pairwise compatible characters must correspond to a set of vertices in G that form a clique [DS86, Proposition 4], completing the proof. The key to the reduction from BCC to BQC is that binary qualitative characters behave like binary cladistic characters in which the state occurring most frequently has been set to ancestral [McM77, Lemma and Theorem 1]. This can be forced by adding taxa [DS86, Proposition 2]. The reduction from BQC to BCC holds by similar reasoning [DS86, Proposition 4]. As binary characters are restrictions of unconstrained characters, problems UCC, UOC, and UQC are also NP-complete.
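The CLIQUE to BCC reduction and the pairwise incompatibility test of [EJM76, Theorem 2.3] can be checked on small graphs with the following Python sketch. The function names and the dictionary-of-columns layout are assumptions made for illustration, and the exhaustive search in `largest_compatible` is exponential; it is there only to confirm the correspondence on toy inputs, not as a practical algorithm.

```python
from itertools import combinations


def incompatible(col_a, col_b):
    """Two binary cladistic characters are incompatible iff the state
    pairs (1,0), (0,1) and (1,1) all occur among the taxa."""
    pairs = set(zip(col_a, col_b))
    return {(1, 0), (0, 1), (1, 1)} <= pairs


def clique_to_bcc(vertices, edges):
    """One character-column per vertex, three taxon-rows per unordered
    vertex pair; non-adjacent pairs receive the patterns 011 and 101."""
    edge_set = {frozenset(e) for e in edges}
    pairs = list(combinations(vertices, 2))
    columns = {v: [0] * (3 * len(pairs)) for v in vertices}
    for p, (u, v) in enumerate(pairs):
        if frozenset((u, v)) not in edge_set:
            columns[u][3 * p:3 * p + 3] = [0, 1, 1]
            columns[v][3 * p:3 * p + 3] = [1, 0, 1]
    return columns


def largest_compatible(columns):
    """Brute-force search for a largest pairwise compatible set of characters."""
    names = list(columns)
    for size in range(len(names), 0, -1):
        for subset in combinations(names, size):
            if all(not incompatible(columns[a], columns[b])
                   for a, b in combinations(subset, 2)):
                return set(subset)
    return set()
```

On a graph with vertices 1 to 4 and edges {1,2}, {2,3}, {1,3}, {3,4}, the matrix has m = 3 · 4 · 3 / 2 = 18 taxon-rows, and the largest compatible character set is {1, 2, 3}, which is exactly the largest clique.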

3.2.3 Distance Matrix Fitting

Each of these problems is given as input a semimetric on m taxa. Some problems seek either the ultrametric or additive tree that has the closest fit to this semimetric according to that problem's statistic; others seek the ultrametric or additive tree of shortest length that is dominant to this semimetric. The distance matrix fitting problems defined in [Day83, Day87, Kri88, KM86] are given in Table 15. Many of these problems were shown to be NP-complete via reductions from certain of their subproblems given in Table 16.

As with the phylogenetic parsimony problems, the distance matrix fitting problems are in NP subject to certain restrictions on instance weights. For an instance I and associated solution Y, let $W_{maxdiff}(I, Y)$ be the maximum difference between any weight in I and any weight in Y.


FITTING UNCONSTRAINED MATRICES TO ULTRAMETRIC TREES VIA STATISTIC X (FUUT[X]) [$X \in \{F_1, F_2\}$]

Instance: Set S of n taxa; semimetric $D \in \mathcal{M}_n$; and a positive integer B.

Question: Does there exist an ultrametric tree $U \in \mathcal{U}_n$ such that $X(D, \pi_U(U)) \leq B$?

FITTING UNCONSTRAINED MATRICES TO DOMINANT ULTRAMETRIC TREES VIA STATISTIC X (FUUT[$X, \geq$]) [$X \in \{F_1, F_2\}$]

Instance: Set S of n taxa; semimetric $D \in \mathcal{M}_n$; and a positive integer B.

Question: Does there exist an ultrametric tree $U \in \mathcal{U}_n$ such that $X(D, \pi_U(U)) \leq B$ and $\pi_U(U) \geq D$?

FITTING UNCONSTRAINED MATRICES TO DISCRETIZED ADDITIVE TREES VIA STATISTIC X (FUDT[X]) [$X \in \{F_1, F_2, F\}$]

Instance: Set S of n taxa; semimetric $D \in \mathcal{M}_n$; and a positive integer B.

Question: Does there exist a discretized additive tree $T$ such that $X(D, \pi_A(T)) \leq B$?

FITTING UNCONSTRAINED MATRICES TO GRAPH-BASED DOMINANT ADDITIVE TREES (FUGT[$\geq$])

Instance: Complete graph G = (V, E), |V| = n; semimetric $D \in \mathcal{M}_n$ defined on all pairs of vertices in G; set of taxa S ⊆ V; and a positive integer B.

Question: Is there a subtree T of G that includes S such that $\sum_{\{x,y\} \in T} D(x, y) \leq B$ and $[\pi_A(T)]_S \geq D_S$?

Table 15: Distance matrix fitting decision problems (adapted from [Day83, KM86, Day87, Kri88]).


Lemma 6 A polynomial-time nondeterministic computation is guaranteed to find all solutions Y to an instance I of a distance matrix fitting problem X such that $\log W_{maxdiff}(I, Y) \leq p(|I|)$, for some polynomial p.

Proof: Observe that all of these problems have solutions in which the number of elements in the solution is polynomial in the size of the instance, i.e., ultrametrics of size $|S|^2$ (FUUT[X], FUUT[$X, \geq$]), trees with at most $2|S| - 1$ vertices (FUDT[X]), trees with at most $|V|$ vertices (FUGT[$\geq$]). To complete the proof, observe that the costs of solutions Y to instances I of distance matrix fitting problems whose statistics are based on $L_1$ and $L_2$ are $O(|S|^2 (W_{maxdiff}(I, Y)))$ and $O(|S|^2 (W_{maxdiff}(I, Y))^2)$, respectively. ∎

Corollary 7 A polynomial-time nondeterministic computation is guaranteed to find all solutions Y to an instance I of a distance matrix fitting problem X whose cost is at most $O(2^{p(|I|)})$, for some polynomial p.

Each distance matrix fitting problem defined above has solutions satisfying this bound, i.e., ultrametrics with off-diagonal entries = $D_{max}$ (FUUT[X], FUUT[$X, \geq$]), any tree containing S such that every edge has weight $D_{max}$ (FUDT[X]), any valid solution tree (FUGT[$\geq$]). As solution size is proportional to solution cost, all optimal solutions satisfy the bound in Corollary 7.

Corollary 8 A polynomial-time nondeterministic computation is guaranteed to find all optimal solutions of any instance of a distance matrix fitting problem.


FITTING BINARY MATRICES TO ULTRAMETRIC TREES OF HEIGHT 2 VIA STATISTIC X (FBUT2[X]) [$X \in \{F_1, F_2\}$]

Instance: Set S of n taxa; semimetric $D \in \mathcal{B}_n$; and a positive integer B.

Question: Does there exist an ultrametric tree $U \in \mathcal{U}_{n,2}$ such that $X(D, \pi_U(U)) \leq B$?

FITTING BINARY MATRICES TO DOMINANT ULTRAMETRIC TREES OF HEIGHT 2 VIA STATISTIC X (FBUT2[$X, \geq$]) [$X \in \{F_1, F_2\}$]

Instance: Set S of n taxa; semimetric $D \in \mathcal{B}_n$; and a positive integer B.

Question: Does there exist an ultrametric tree $U \in \mathcal{U}_{n,2}$ such that $X(D, \pi_U(U)) \leq B$ and $\pi_U(U) \geq D$?

Table 16: Auxiliary decision problems for NP-hardness proofs of distance matrix fitting decision problems (adapted from [KM86, Day87, Kri88]).

Hence, all distance matrix fitting problems defined above are in NP.

Problem FUUT[$F_1$] was shown to be NP-hard via a reduction from FBUT2[$F_1$]. A reduction which establishes that FBUT2[$F_1$] is NP-hard is given in Table 17. This reduction is adapted from a Turing reduction from X3C to SOL-MIN-FBUT2[$F_1$] given in [KM86]. An instance of X3C has a solution if and only if the graph G created by this reduction has a vertex-partition into 3-vertex triangles [KM86, Lemma 5]. It can be shown that the ultrametric of minimal cost for the reduced instance of FBUT2[$F_1$] will always have such a partition if and only if there is an exact cover for the original instance of X3C; moreover, the maximum nonoverlapping (not necessarily exact) cover for the original instance of X3C can be easily derived from this ultrametric. An optimal tree for any instance of


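FBUT2[$F_1$] will always have off-diagonal entries $\in \{1, 2\}$ [KM86, Lemma 3]. For such an ultrametric tree $U = \{\{\{\{s_1\}, \ldots, \{s_{|S|}\}\}, 0\}, \{\{I_1, \ldots, I_r\}, 1\}, \{\{S\}, 2\}\}$, let $i_p = |I_p|$ and $j_p = |\{\{i,j\} \subseteq I_p \mid d_{i,j} = 1\}|$, $1 \leq p \leq r$. By [KM86, Lemma 4],

$$\begin{aligned}
F_1(D, U) &= \sum_{p=1}^{r} \sum_{\{i,j\} \subseteq I_p} |d_{i,j} - 1| \;+ \sum_{1 \leq r' < r'' \leq r} \; \sum_{i \in I_{r'}} \sum_{j \in I_{r''}} |d_{i,j} - 2| \\
&= \sum_{p=1}^{r} \left[ \binom{i_p}{2} - j_p \right] + |\{\{i,j\} \mid d_{i,j} = 1\}| - \sum_{p=1}^{r} j_p
\end{aligned} \qquad (5)$$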

No subpartition in the second partition of an ultrametric U that is minimal under $F_1$ can group together vertices from different subgraphs $G_i$ and $G_j$, as the tree in which these vertices are grouped separately by subgraph would have lower cost by equation 5 above. Hence, each subpartition in the second partition must be based on vertices from the same subgraph $G_\alpha$. Note that $F_1(D, U)$ is minimal when the subpartitions in the second partition specify a partition of G (and thus individual $G_\alpha$) into the largest possible complete subgraphs. The reader can verify that the optimal partition of any subgraph $G_\alpha$ into complete subgraphs under $F_1$ is either into the four triangles

$$\ldots, \; \{x_{\alpha,3}, y_{\alpha,3,1}, y_{\alpha,3,2}\}, \; \{y_{\alpha,1,3}, y_{\alpha,2,3}, y_{\alpha,3,3}\} \qquad (6)$$

or the three triangles,


plus single vertices drawn from the set $\{x_{\alpha,1}, x_{\alpha,2}, x_{\alpha,3}\}$, depending on whether the single vertices in this set have or have not been partitioned into $G_\alpha$. Let these two sets of $G_\alpha$ be denoted by $G_M$ and $G_N$; note that $|G_M| + |G_N| = |C|$. As single-vertex groupings do not affect the cost of U under $F_1$, the cost of a minimal ultrametric U is

$$\begin{aligned}
F_1(D, U) &= 0 + |\{\{i,j\} \mid d_{i,j} = 1\}| - \sum_{p=1}^{r} j_p \\
&= |E| - 3(4|G_M| + 3|G_N|) \\
&= |E| - 3(|G_M| + 3|C|)
\end{aligned} \qquad (8)$$

A subgraph $G_\alpha$ is partitioned into four triangles if and only if the corresponding 3-set is in the maximal nonoverlapping cover of the original instance of X3C, i.e., the elements $\{x_{\alpha,1}, x_{\alpha,2}, x_{\alpha,3}\}$ are partitioned into that $G_\alpha$. Hence, the original instance of X3C has an exact cover if and only if $|G_M| = q$, i.e., $F_1(D, U) = |E| - 3(q + 3|C|)$, completing the proof. As FBUT2[$F_1$] is computationally equivalent to FBUT[$F_1$], the corresponding problem with no restrictions on the height of the ultrametric [KM86, Lemma 2], and as FBUT[$F_1$] is a restriction of FUUT[$F_1$], these problems are also NP-hard, and thus NP-complete.
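The $F_1$ cost of a height-2 ultrametric can be evaluated directly from a partition of the taxa, which makes the counting argument above easy to check on small inputs. The sketch below is illustrative only: the function names and the dictionary encoding of D are assumptions, and `closed_form` implements the count $\sum_p [\binom{i_p}{2} - 2 j_p] + |\{\{i,j\} \mid d_{i,j} = 1\}|$, obtained by noting that a within-cluster pair costs 1 exactly when $d_{i,j} = 2$ and a between-cluster pair costs 1 exactly when $d_{i,j} = 1$.

```python
from itertools import combinations


def distance_from_graph(nodes, edges):
    """Binary semimetric: d = 1 for adjacent pairs, d = 2 otherwise."""
    edge_set = {frozenset(e) for e in edges}
    return {frozenset(p): (1 if frozenset(p) in edge_set else 2)
            for p in combinations(nodes, 2)}


def f1_cost(d, clusters):
    """Sum of |d(i,j) - u(i,j)|, where u = 1 within a cluster and 2 across."""
    cluster_of = {x: p for p, c in enumerate(clusters) for x in c}
    cost = 0
    for i, j in combinations(sorted(cluster_of), 2):
        u = 1 if cluster_of[i] == cluster_of[j] else 2
        cost += abs(d[frozenset((i, j))] - u)
    return cost


def closed_form(d, clusters):
    """Same cost, computed from cluster sizes i_p and within-cluster edge counts j_p."""
    total = sum(1 for v in d.values() if v == 1)      # |{{i,j} : d = 1}|
    for c in clusters:
        i_p = len(c)
        j_p = sum(1 for i, j in combinations(sorted(c), 2)
                  if d[frozenset((i, j))] == 1)
        total += i_p * (i_p - 1) // 2 - 2 * j_p
    return total
```

For two triangles joined by a single edge (|E| = 7), grouping each triangle as a cluster gives cost 7 − 6 = 1, whereas leaving every vertex in its own cluster gives cost 7.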

Problem FUUT[$F_2$] can be shown NP-complete in a similar fashion. By a variant of [KM86, Lemma 3], the optimal trees for any instance of FBUT2[$F_2$] will always have off-diagonal entries $\in \{1, 2\}$. As $|d - p| = |d - p|^2$ when $d, p \in \{1, 2\}$, $F_1(D, \pi_U(U)) = F_2(D, \pi_U(U))$ for any $D \in \mathcal{B}_n$ and $U \in \mathcal{U}_{n,2}$. Thus FBUT2[$F_1$] and FBUT2[$F_2$] are arithmetically equivalent. As FBUT2[$F_2$] and FBUT[$F_2$]


Figure 6: Structure of subgraph $G_\alpha$ used in reduction from X3C to FBUT2[$F_1$] (Figure 2 from [KM86]).

are also computationally equivalent by a variant on [KM86, Lemma 2], and as FBUT[$F_2$] is a restriction of FUUT[$F_2$], these problems are NP-complete as well.

Problem FUUT[$F_1, \geq$] was originally shown to be NP-hard via a reduction from a restricted version of VERTEX PARTITION INTO TRIANGLES [Kri88]. The NP-hardness of FBUT2[$F_1, \geq$] can also be established by a reduction from X3C analogous to that given above for FBUT2[$F_1$], whose proof is more intuitive because dominance forces the partition of individual $G_\alpha$ into complete subgraphs. The latter reduction will be used in later sections of this thesis. As [KM86, Lemmas 2 and 3] can be modified to work for the corresponding dominant ultrametric problems, the reasoning above by which FUUT[X], $X \in \{F_1, F_2\}$, was shown NP-complete also shows that FUUT[$X, \geq$], $X \in \{F_1, F_2\}$, are NP-complete.


X3C $\leq_m$ FBUT2[$F_1$] (adapted from [KM86])

    n = 3(q + 3|C|)
    S = X ∪ {$y_{\alpha,\beta,\gamma}$ | α ∈ {1, 2, ..., |C|}, β, γ ∈ {1, 2, 3}}
    D = [$d_{i,j}$]

    where D is defined relative to a graph G = (S, E) composed of the union of the graphs $G_\alpha = (V_\alpha, E_\alpha)$, 1 ≤ α ≤ |C|. Each subgraph $G_\alpha$ corresponds to an element $C_\alpha \in C$, $C_\alpha = \{x_{\alpha,1}, x_{\alpha,2}, x_{\alpha,3}\}$, $x_{\alpha,\{1,2,3\}} \in X$, and has the structure shown in Figure 6. Given G, define D as

    $d_{i,j} = 0$ if i = j,

    $d_{i,j} = 1$ if {i, j} ∈ E,

    $d_{i,j} = 2$ otherwise.

    B = |E| − 3(q + 3|C|)

FBUT2[X] $\leq_m$ FUDT[X] ($X \in \{F_1, F_2\}$) [Day87]

    n' = n + φ,

where c.p = t/J'ln-1 ruul t/' = I .!'in - I.

    S' = S + $y_i$, 1 ≤ i ≤ φ

    $D' = [d'_{i,j}] = \begin{bmatrix} D & M \\ M' & \mathbf{1} \end{bmatrix}$, where $M = [m_{i,j}]$, $m_{i,j} = 2$ for all 1 ≤ i ≤ n and 1 ≤ j ≤ φ, M' is the transpose of M, and $\mathbf{1}$ is a square matrix with zeros on, but ones off, the main diagonal.

    B' = B

Table 17: Reductions for distance matrix fitting decision problems.


The reduction given in [Day87] which establishes that FUDT[X], $X \in \{F_1, F_2\}$, is NP-hard is given in Table 17. Day requires that |S| be an even integer ≥ 4 in his version of FBUT2[X], which can be ensured by replicating some $C_i \in C$ in the given instance of X3C. An optimal discretized additive tree T for the reduced instance of FUDT[X] can be transformed in polynomial time into a tree consisting of two subtrees, an ultrametric tree U of height 2 on S attached by an edge of length 0.5 to a subtree rooted at vertex v that is attached to all vertices $y_i$, 1 ≤ i ≤ φ, by edges of length 0.5, such that $X(D, \pi_U(U)) = X(D, \pi_A(T))$ [Day87, Proposition 3]; moreover, an optimal ultrametric for the original instance of FBUT2[X] can be similarly transformed into an optimal solution for the reduced instance of FUDT[X] [Day87, Proposition 5]. Hence, FUDT[X] is NP-complete, and by the arithmetic equivalence of the $F_1$ and $F$ statistics, FUDT[F] is NP-complete as well.

A reduction which establishes that FUGT[$\geq$] is NP-complete is given in Table 18. This reduction is based on the reduction from VERTEX COVER to UBQCS and UBCCS given in Section 3.2.1. For graph G in an instance of FUGT[$\geq$] created by this reduction, define a canonical tree as a subtree T of G that contains S and is composed of edges of the types $\{*, v_i\}$ and $\{v_i, e_j\}$, $e_j = \{v_i, x\} \in E_{VC}$.

Lemma 9 Every instance of FUGT[$\geq$] created by the reduction in Table 18 has a minimal-length tree that is canonical.

Proof: Let T be a minimal-length tree of length B for a reduced instance I of


VC $\leq_m$ FUGT[$\geq$]

    V = {*} ∪ {$v_i$ | 1 ≤ i ≤ $|V_{VC}|$} ∪ {$e_j$ | 1 ≤ j ≤ $|E_{VC}|$}
    D = [$d_{i,j}$], where
        $d(*, v_i) = 1$
        $d(*, e_j) = 2$
        $d(v_i, v_j) = 4$
        $d(v_i, e_j) = 1$ if $e_j = \{v_i, x\} \in E_{VC}$
        $d(v_i, e_j) = 3$ otherwise
        $d(e_i, e_j) = 2$
    S = {*} ∪ {$e_j$ | 1 ≤ j ≤ $|E_{VC}|$}
    B = K + $|E_{VC}|$

Table 18: Reductions for distance matrix fitting decision problems (cont'd from Table 17).

FUGT[$\geq$]. If T is not a canonical tree, it contains one or more edges of the types ... T' from T by replacing each non-canonical edge X as follows:

1. ... that $e_i = \{v_k, z\} \in E_{VC}$. The former case cannot occur because it would create a tree with a length B' < B; the latter case produces a tree T' of equal length.

2. $X = \{v_i, v_j\}$: Assume without loss of generality that there is already an edge $\{*, v_i\}$ or $\{*, v_j\}$ in T, and replace X by the edge of this pair that is not in T. This cannot occur because it would create a tree with a length B' < B.

3. $X = \{e_i, e_j\}$: Assume without loss of generality that there is an edge $\{v_k, e_i\}$ in T. If there is a vertex $v_l \in T$ such that $e_j = \{v_l, z\} \in E_{VC}$, replace X by the edge $\{v_l, e_j\}$; else, replace X by the edges $\{*, v_l\}$ and $\{v_l, e_j\}$ such that $e_j = \{v_l, z\} \in E_{VC}$. The former case cannot occur because it would create a tree with a length B' < B; the latter case produces a tree T' of equal length.

4. $X = \{v_i, e_j\}$ such that $v_i$ is not a vertex of $e_j$: If there exists a vertex $v_k$ in T such that $e_j = \{v_k, z\} \in E_{VC}$, replace X by edge $\{v_k, e_j\}$; else, replace X by the pair of edges $\{*, v_k\}$ and $\{v_k, e_j\}$. Neither case can occur because each would create a tree with length B' < B.

The created tree T' has the same length as T and still connects all vertices in S; moreover, as T' contains no non-canonical edges, it is a canonical tree. ∎

Canonical trees have several useful properties. The path lengths of a canonical tree T are such that $[\pi_A(T)]_S \geq D_S$. Moreover, the vertices in the second level of a canonical tree T correspond to a satisfying vertex cover for the original instance.

Theorem 10 FUGT[$\geq$] is NP-complete.

Proof: By Corollary 8, FUGT[$\geq$] is in NP. Consider the reduction from VERTEX COVER to FUGT[$\geq$] given in Table 18. This reduction is polynomial time. Moreover, optimal solutions for an original instance of VERTEX COVER and its reduced instance of FUGT[$\geq$] can be created from each other. If the original instance of VERTEX COVER has a satisfying vertex cover $V^* \subseteq V_{VC}$ of size K' ≤ K, construct the canonical tree linking the vertices of $V^*$ with the vertices $\{e_j\}$ and *; this tree has length $K' + |E_{VC}| \leq B$, and is thus a solution to the reduced instance. If the reduced instance of FUGT[$\geq$] has a solution tree T of length $B \leq K + |E_{VC}|$, construct the canonical tree T' corresponding to T of length B' ≤ B. The vertex cover $V^*$ defined by the vertices in the second level of T' has size $|V^*| = B' - |E_{VC}| \leq B - |E_{VC}| \leq K$; thus, $V^*$ is a satisfying vertex cover for the original instance. ∎
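The forward direction of this argument can be illustrated with a short Python sketch that builds the canonical tree for a given vertex cover and checks its length and the dominance condition on the taxa. The function names and the node labels ("*", ("v", i), ("e", j)) are hypothetical conventions chosen for the example. The sketch relies only on the distances that the canonical tree actually uses, namely $d(*, v_i) = 1$ and $d(v_i, e_j) = 1$ for $v_i \in e_j$, and on the taxon-to-taxon distances $d(*, e_j) = 2$ and $d(e_i, e_j) = 2$.

```python
from collections import deque
from itertools import combinations


def canonical_tree(edges, cover):
    """Edges {*, v} for each cover vertex, plus one edge {v, e_j} per graph edge e_j."""
    tree = [("*", ("v", v)) for v in sorted(cover)]
    for j, e in enumerate(edges):
        attach = next((v for v in sorted(cover) if v in e), None)
        if attach is None:
            raise ValueError("not a vertex cover")
        tree.append((("v", attach), ("e", j)))
    return tree      # every edge has weight 1, so tree length == len(tree)


def path_lengths(tree, source):
    """Breadth-first path lengths from source (all edge weights are 1)."""
    adj = {}
    for a, b in tree:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def dominates_on_taxa(tree, num_edges):
    """Check that tree path lengths between taxa are at least 2, the value of D on S."""
    taxa = ["*"] + [("e", j) for j in range(num_edges)]
    return all(path_lengths(tree, a)[b] >= 2 for a, b in combinations(taxa, 2))
```

For a triangle with K = 2 and cover {0, 1}, the canonical tree has 2 + 3 = 5 edges, matching B = K + $|E_{VC}|$, and every pair of taxa is at tree distance 2 or 4.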

A Turing reduction from SOL-MIN-FBUT[$F_2$] to SOL-MIN-FBUT2[$F_2$] is given in [Kri86, Theorem 2]; however, unlike the reduction from X3C to SOL-MIN-FBUT2[$F_1$] given in [KM86], it is not obvious how to convert this Turing reduction into a many-one reduction. Several problems that involve fitting semimetrics to dominant and subdominant ultrametrics using statistics $F_1$, $F_2$, and $L_\infty$ are shown to be solvable in polynomial time in [Kri86, Kri88]; related problems involving other statistics are examined in [Day92]. The reduction from UUW to FUGT[$\geq$] given in [Day83] does not work for the same reasons as Day's reduction from UUW to WUOWL (see Section 3.2.1); however, the former reduction cannot be fixed by using the implicit-graph version of UUW defined above because the reduction uses an intermediate problem (CONSENSUS PROBLEM IN CLASSIFICATION) which requires that the implicit graph be included in the instance.


3.2.4 Summary

Figures 7 and 8 show the various reductions described in this section, and Table 19 gives the correspondence between problems in these figures and problems defined in the literature. Note that all but ten of these reductions are either by restriction or by arithmetic equivalence. All of these problems are inter-reducible by virtue of being NP-complete; however, the pattern of reductions in this diagram will be significant in later sections of this thesis. Note that each of these reductions requires only that a solution exist that has a given cost, not that a solution have a cost above or below a given limit; moreover, the proofs of each of these reductions $\Pi \leq_m^p \Pi'$ give algorithms for converting solutions of cost $c$ to instances of $\Pi$ into solutions of cost $c'$ for reduced instances of $\Pi'$ and vice versa, such that these $c$ and $c'$ are related arithmetically. The former property, along with the reductions for CLIQUE and VERTEX COVER given in [GJ79, Section 3.1], establishes that all corresponding given-cost phylogenetic inference decision problems are NP-complete. Both of these properties will also be useful in later sections of this thesis.

The decision problems given in this section do not answer questions typically asked by systematic biologists, and are thus not relevant in themselves. However, the NP-completeness results for these problems do suggest that fast i.e. polynomial time algorithms do not exist for these problems, and that efforts should be

Figure 7: Reductions among phylogenetic inference decision problems. Reductions $\Pi \leq_m^p \Pi'$ are denoted by arrows from $\Pi$ to $\Pi'$. Arrows marked by a and r correspond to reductions by arithmetic equivalence and restriction, respectively.


Problem type               Thesis                    Literature

Phylogenetic Parsimony     UB{C,Q}CS                 {C,Q}CS [DJS86]
                           UB{C,Q}Do                 {C,Q}DO [DJS86]
                           UB{C,Q}CI                 {C,Q}CI [DJS86]
                           UBW                       SPQ [GF82, Day83]
                           UUW                       SPP [GF82, Day83]
                           WUOWL                     WTP [Day83]

Character Compatibility    B{Q,C}C                   B{Q,C}C [DS86]
                           U{Q,C}C                   U{Q,C}C [DS86]

Distance Matrix Fitting    FBUT[F1]                  HIC† [KM86]
                           FBUT2[F1]                 HIC† [KM86], Δ1† [Kri86], FUT[1] [Day87]
                           FBUT2[F2]                 Δ2† [Kri86], FUT[2] [Day87]
                           FUUT[F1]                  Δ1† [Kri86], HIC† [KM86]
                           FUUT[F2]                  Δ2† [Kri86]
                           Ftrt!T[J.' >]             "- P·l [Kri88]
                           FUDT[α], α ∈ {F1, F2}     FAT[α], α ∈ {1, 2} [Day87]
                           FUGT[≥]                   AET [Day83]

Table 19: Correspondence between phylogenetic inference problems in this thesis and problems in the literature. All solution problems are marked with daggers (†); all other problems are decision problems.


Figure 8: Restriction reductions among phylogenetic parsimony decision problems. These problems are stated relative to a phylogenetic parsimony criterion X. Note that each problem above is also linked by restriction reductions to each of its four corresponding reticulate problems (see Table 14).

focused on the design of polynomial-time approximation algorithms which guarantee solutions that are close to optimal [Day83, GF82]. These reductions can also be used to determine the computational hardness of more complex problems (Section 4) and to place limits on the kinds of polynomial-time approximations that can exist for phylogenetic inference problems (Section 5).


4 The Computational Complexity of Phylogenetic Inference Functions

In the last section, certain decision problems associated with various phylogenetic inference criteria were shown to be NP-complete. By a folklore result in theoretical computer science, each of the corresponding solution problems is solved by a function in $FP^{NP}$ [GJ79, Chapter 5]. However, this says little about the hardness of more complex problems based on these criteria. In this section, I will derive various bounds on the complexities of the evaluation, solution, spanning, enumeration, and random-generation functions associated with the optimal-cost, given-cost, and given-limit versions of the phylogenetic inference problems.

The reader should remember that results given below for phylogenetic parsimony and distance matrix fitting given-cost and given-limit problems apply only to those problems in which the cost-parameter $k$ satisfies the restrictions given in Lemma 3 and Corollary 7.

It will often be convenient below to have a single binary encoded representation (canonical representation) of each solution to a problem, so that individual solutions are not output more than once. Such representations exist for all problems examined in this thesis:

• Character Compatibility: Represent characters by character-state adjacency matrices whose character-states are in instance input order, and represent sets of characters by such matrices in instance input order.

• Phylogenetic Parsimony: Represent characters as above, and represent trees by vertex-adjacency matrices whose vertices are in lexicographic order relative to character-state instance input order. Reticulations are stored in a separate list by source-vertex set lexicographic order relative to character-state instance order.

• Distance Matrix Fitting: Represent ultrametrics by their corresponding ultrametric matrices whose vertices are in input instance order. Represent additive trees by their inorder traversal sequences [Sta80, Section 3], where the tree is rooted at the least vertex in input instance order and the left-right ordering of subtrees is replaced by an ordering on the basis of the least vertex in a subtree.

These canonical representations can be encoded and decoded in polynomial time using slightly modified standard algorithms [Sta80, Section 3]. All TM solving problems examined in this section will be assumed to operate on and to output canonical representations.
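The rooting and subtree-ordering rule for additive trees can be illustrated with a minimal sketch. It assumes the tree is given as an undirected adjacency mapping and that `order` gives each vertex's position in instance input order; the function name and the nested-tuple output are illustrative choices made here, standing in for the traversal sequence used in the thesis.

```python
def canonical_form(adj, order):
    """Return a nested-tuple form of an unordered tree.

    adj   : dict mapping each vertex to an iterable of its neighbours
    order : dict mapping each vertex to its position in input order
    (Both parameter shapes are assumptions made for this sketch.)
    """
    # Root the tree at the least vertex in input order.
    root = min(adj, key=order.__getitem__)

    def build(v, parent):
        # Recursively canonicalise every subtree hanging below v.
        subtrees = [build(w, v) for w in adj[v] if w != parent]
        # Order subtrees by the least vertex they contain.
        subtrees.sort(key=lambda s: s[0])
        least = min([order[v]] + [s[0] for s in subtrees])
        return least, (v, tuple(s[1] for s in subtrees))

    return build(root, None)[1]


order = {"a": 0, "b": 1, "c": 2, "d": 3}
t1 = {"a": ["b", "c"], "b": ["a"], "c": ["a", "d"], "d": ["c"]}
t2 = {"c": ["d", "a"], "a": ["c", "b"], "d": ["c"], "b": ["a"]}
same = canonical_form(t1, order) == canonical_form(t2, order)
```

Because subtrees below a vertex are disjoint, their least vertices are distinct, so the ordering is unambiguous and two listings of the same tree give the same output.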

4.1 Function Complexity Classes

This section will give a brief overview of some function classes that will be used below. These classes fall mainly into two regions: within $FP^{NP}$ and within FPSPACE(poly). The relations between all classes defined in this section are shown in Figures 9 and 10.

4.1.1 Classes Within $FP^{NP}$

There are two hierarchies of interest within $FP^{NP}$:

2. The OptP[$f(n)$] hierarchy, where $f$ is smooth and $f \in O(poly)$ [GKR92, Kre88]: For polynomial-time NTM $N \in FNP_g$, let $opt^N(x)$ be the optimal value (largest for a maximization problem, smallest for a minimization problem) computed by $N$ for input $x$.

Definition 11 (adapted from [Kre88], p. 493) A function $f : \Sigma^* \rightarrow Z$ is in OptP (optimization polynomial time) if there is a polynomial-time NTM $N \in FNP_g$ such that $f(x) = opt^N(x)$ for all $x \in \Sigma^*$. We say that $f$ is in OptP[$z(n)$] if $f \in$ OptP and the length of $f(x)$ in binary is bounded by $z(|x|)$ for all $x \in \Sigma^*$.

Though all OptP[$f(n)$] functions are contained in $FP^{NP[f(n)]}$, each function in $FP^{NP[f(n)]}$ metrically reduces to some OptP[$f(n)$] function [Kre88, Theorem 3.2(i)]. Thus, all OptP[$f(n)$]-complete functions are also $FP^{NP[f(n)]}$-complete.
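The containment of OptP[$f(n)$] in $FP^{NP[f(n)]}$ can be illustrated by a minimal sketch in which an optimal value of at most $n$ is found by binary search over a decision oracle, using a logarithmic number of queries. MIN-VERTEX COVER is used as the example; the function names are hypothetical, and the brute-force routine merely stands in for a single NP oracle query.

```python
from itertools import combinations


def has_cover_of_size_at_most(n, edges, k):
    """Stand-in for one NP oracle query (answered by brute force here)."""
    for size in range(min(k, n) + 1):
        for cand in combinations(range(n), size):
            chosen = set(cand)
            if all(u in chosen or v in chosen for u, v in edges):
                return True
    return False


def min_vertex_cover_value(n, edges):
    """Binary search on the cost; returns (optimal value, queries used)."""
    lo, hi, queries = 0, n, 0  # the full vertex set is always a cover
    while lo < hi:
        mid = (lo + hi) // 2
        queries += 1
        if has_cover_of_size_at_most(n, edges, mid):
            hi = mid       # a cover of size <= mid exists
        else:
            lo = mid + 1   # every cover needs more than mid vertices
    return lo, queries


value, queries = min_vertex_cover_value(4, [(0, 1), (1, 2), (2, 3), (3, 0)])
```

Since the optimal value lies in a range of $n + 1$ candidates, the search asks at most $\lceil \log_2 (n+1) \rceil$ queries, which is the sense in which a polynomially bounded optimum corresponds to $O(\log n)$ oracle calls.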


Note that metric reductions can stretch input by a polynomial amount. One consequence of this stretching is that a function that is complete for OptP[$f(n)$] is also complete for OptP[$f(n^{O(1)})$] [GKR92, p. 7]. It is most noticeable in the names for certain classes defined using big-O notation e.g. OptP[$O(\log \log n)$] is more properly written as OptP[$\log \log n + O(1)$].

The following class relations are known:

• For every smooth function $f$,

  - For $f(n) \leq \frac{1}{2} \log n$, $FP^{NP[f(n)-1]} \subset FP^{NP[f(n)]}$ unless P = NP [Kre88, Theorem 4.2].

  - For $f(n) \leq (1 - \epsilon) \log n$, $\epsilon \in (0, 1]$, $FP^{NP[f(n)-1]} \subset FP^{NP[f(n)]}$ unless P = NP [Bei88].

  - For $f(n) \in O(\log n)$, $FP^{NP[f(n)-1]} \subset FP^{NP[f(n)]}$ unless $\Sigma_3^p = \Pi_3^p$ [ABG90].

• $FP^{NP[O(\log n)]} \subset FP^{NP}$ unless P = NP [Kre88, Theorem 4.1].

• $FP^{NP[O(\log n)]} \subset FP^{NP}_{\|}$ unless R = NP [Sel91, Theorem 13] and FewP = P [Sel91, Corollary 4(i)].

• $FP^{NP}_{\|} \subset FP^{NP}$ if and only if $P^{NP[O(\log n)]} \subset P^{NP}$ [Sel91, Theorem 1].

Other separations hold under more exotic assumptions [Bei88, Bei91]. Research to date has focused on all classes below $FP^{NP[O(\log n)]}$ and the class $FP^{NP}$.

Though many results have been imported directly from language-based to function-based classes, there have been some notable surprises, in particular the non-equivalence of $FP^{NP}_{\|}$ and $FP^{NP[O(\log n)]}$ and the separation of $FP^{NP[O(\log n)]}$, $FP^{NP}_{\|}$, and $FP^{NP}$.

Metric reducibility suffices to show hardness for most single-valued function classes. Functions are shown $FP^{NP}_{\|}$-hard via the property of paddability [CT91, Gas86]. Recall that all problems are based on relations $R : I \times S$ on instances $I$ and solutions $S$. Let SOL-$\Pi(x)$ be the set of solutions for an instance $x$ of a problem $\Pi$.

Definition 12 ([CT91], Definition 4.2) A problem $\Pi$ is paddable if there is a pair of polynomial-time functions $h_1 : 2^I \rightarrow I$ and $h_2 : 2^I \times S \rightarrow 2^S$ such that for all finite sets $\{x_1, x_2, \ldots, x_m\} \in 2^I$ and all single-valued functions $f$ that solve $\Pi$, if $x = h_1((x_1, x_2, \ldots, x_m))$ then $h_2((x_1, x_2, \ldots, x_m), f(x)) = (y_1, y_2, \ldots, y_m)$, where $y_i \in$ SOL-$\Pi(x_i)$ for $1 \leq i \leq m$.

Paddability was defined implicitly in [Gas86]. Gasarch realized that if instances of a paddable problem $\Pi$ can encode an X-hard problem then $\Pi$ is $FP^{X}_{\|}$-hard. If X = NP, paddable problems are $FP^{NP[O(\log n)]}$-hard [Gas86, Theorem 8] (the essential idea is that a binary $O(\log n)$-depth NP query tree contains at most a polynomial number of NP queries). Chen and Toda defined and named paddability independent of Gasarch's work, and stated their results in terms of $FP^{NP}_{\|}$-hardness.
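As a hedged illustration of Definition 12, the following sketch shows a pair of padding functions for the minimum vertex cover solution problem, chosen here only as a familiar example and not taken from the thesis. The function `h1` packs several graphs into one by disjoint union and `h2` splits a solution of the packed graph back into per-instance solutions; the graph encoding as `(n, edges)` pairs and the brute-force solver are assumptions of this sketch.

```python
from itertools import combinations


def h1(instances):
    """Pack graphs (n, edges) into a single graph by disjoint union."""
    total, edges = 0, []
    for n, es in instances:
        edges.extend((u + total, v + total) for u, v in es)
        total += n
    return total, edges


def h2(instances, cover):
    """Split a cover of the packed graph into one cover per instance."""
    parts, offset = [], 0
    for n, _ in instances:
        parts.append(sorted(v - offset for v in cover
                            if offset <= v < offset + n))
        offset += n
    return parts


def solve_min_cover(n, edges):
    """Any single-valued solution function f; brute force for the demo."""
    for k in range(n + 1):
        for cand in combinations(range(n), k):
            chosen = set(cand)
            if all(u in chosen or v in chosen for u, v in edges):
                return cand


instances = [(3, [(0, 1), (1, 2)]), (2, [(0, 1)])]
packed = h1(instances)
answers = h2(instances, solve_min_cover(*packed))
```

The construction works because a minimum cover of a disjoint union restricts to a minimum cover of each component, so one call to the solution function answers all packed instances at once.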


Theorem 13 ([CT91], Lemma 4.1) Let $\Pi$ be a paddable problem whose associated decision problem $L_\Pi$ is NP-hard. Then $\Pi$ is $FP^{NP}_{\|}$-hard.

Proof (sketch): Define function $Q_\Pi(x_1, x_2, \ldots, x_m) =$

NP-hard, any function in $FP^{NP}_{\|}$ can be solved using a single call to $Q_\Pi$; however, as $\Pi$ is paddable, any instance of $Q_\Pi$ can be solved using a single call to any solution function for $\Pi$. ∎

Their interpretation is the more powerful in light of Selman's results showing that $FP^{NP}_{\|}$ is intermediate in hardness between $FP^{NP[O(\log n)]}$ and $FP^{NP}$ (see above). The following variant of paddability will be useful below:

Definition 14 A problem $\Pi$ is paddable with respect to a problem $\Pi'$ if there is a pair of polynomial-time functions $h_1 : 2^{I'} \rightarrow I$ and $h_2 : 2^{I'} \times S \rightarrow 2^{S'}$ such that for all finite sets $\{x'_1, x'_2, \ldots, x'_m\} \in 2^{I'}$ and all single-valued functions $f$ that solve $\Pi$, if $x = h_1((x'_1, x'_2, \ldots, x'_m))$ then $h_2((x'_1, x'_2, \ldots, x'_m), f(x)) = (y'_1, y'_2, \ldots, y'_m)$, where $y'_i \in$ SOL-$\Pi'(x'_i)$ for $1 \leq i \leq m$.

Theorem 15 If problem $\Pi$ is paddable with respect to an NP-hard problem $\Pi'$ then $\Pi$ is $FP^{NP}_{\|}$-hard.

Proof: The proofs by which the analogous result holds for paddable problems ([Gas86, Theorem 8]; Theorem 13 above) require only that $Q_\Pi$ be based on some NP-hard problem, which need not be $L_\Pi$. ∎


Theorem 16 If problem $\Pi$ metrically reduces to problem $\Pi'$ and problem $\Pi$ is paddable with respect to a problem $\Pi''$, then problem $\Pi'$ is paddable with respect to $\Pi''$.

Proof: By definition, there exist polynomial-time functions $h_1$ and $h_2$ such that for $x = \{x_1, \ldots, x_m\} \in 2^{I''}$, $y = \{y_1, \ldots, y_m\} \in 2^{S''}$, and any single-valued function $f$ that solves $\Pi$, $(y) = h_2((x), f(h_1((x))))$, and functions $T_1$, $T_2$ such that for every single-valued function $g$ that solves $\Pi'$, there exists a function $f$ that solves $\Pi$ such that $f(x) = T_2(x, g(T_1(x)))$. Define functions $h'_1 : 2^{I''} \rightarrow I'$ as $h'_1((x)) = T_1(h_1((x)))$ and $h'_2 : 2^{I''} \times S' \rightarrow 2^{S''}$ as $h'_2((x), y) = h_2((x), T_2(h_1((x)), y))$ such that $(y) = h'_2((x), g(h'_1((x))))$. Functions $h'_1$ and $h'_2$ are polynomial time and show that $\Pi'$ is paddable with respect to $\Pi''$. ∎

Note that paddability as defined here is distinct from paddability as traditionally defined in computational complexity theory ([BDG88, pp. 74-75]; [BDG90, pp. 122-123]).

Four classes of multivalued functions will also be used below:

• $NPMV = FNP$ and $NPMV_g = FNP_g$ [Sel91].

• $NPMV \circ FP^{NP}$ is the set of all partial, multivalued functions that are computed by polynomial-time NTM transducers that are allowed to ask up to a polynomial number of adaptive NP queries before nondeterminism is invoked.

• $NPMV_g \circ FP^{NP}$ is the set of all functions $f \in NPMV \circ FP^{NP}$ such that the nondeterministic phase of the computation is restricted to $NPMV_g$.

• $(NPMV \circ FP^{NP})_g$ is the set of all functions $f \in NPMV \circ FP^{NP}$ such that $graph(f) \in P$.

The NPMV-composition class notation is adapted from that in [FHOS92]. Valiant noticed that all solution problems associated with NP decision problems are in $NPMV_g$ ([Sel91, p. 4]; [Val76]). Class $NPMV_g \circ FP^{NP}$ is useful because it corresponds to those solution problems whose associated decision and evaluation problems are in NP and OptP, respectively. The following class relations are known:

Lemma 17 ([Sel91], Proposition 7) If $f \in FP^{NP[O(\log n)]}$ and $graph(f) \in P$ then $f \in FP$.

Proof: Implicit in the proof of [Kre88, Theorem 4.1]. ∎

Corollary 18 The following hold:

1. $(NPMV \circ FP^{NP})_g = NPMV_g$.

2. $NPMV_g \subseteq NPMV$, $NPMV_g \circ FP^{NP} \subseteq NPMV \circ FP^{NP}$, $NPMV_g \subseteq NPMV_g \circ FP^{NP}$, and $NPMV \subseteq NPMV \circ FP^{NP}$.

3. $NPMV \subseteq_c FP^{NP}$ [Sel91, p. 10].

4. $FP^{NP[O(\log n)]} \subseteq NPMV$ if and only if NP = co-NP [Sel91, Theorem 4].

5. $FP^{NP[O(\log n)]} \subseteq NPMV_g$ if and only if P = NP [Sel91, Theorem 3].

6. $NPMV_g \subseteq_c FP^{NP[O(\log n)]}$ if and only if P = NP [Sel91, Theorem 3].

7. $FP^{NP} =_c NPMV_g \circ FP^{NP}$.

8. $NPMV_g \circ FP^{NP} \subseteq NPMV$ if and only if NP = co-NP.

Proof:

Proof of (1): The leftwards inclusion is trivial. The rightwards inclusion follows by this simulation: for any machine $M$ corresponding to a function $f$ in $(NPMV \circ FP^{NP})_g$, nondeterministically guess all possible sequences of NP query answers, compute nondeterministically relative to these queries, and accept a computed output if it is valid (which can be checked in polynomial time, as $graph(f) \in P$).

Proof of (3): Follows from the prefix-search technique (see Section 4.3).

Proof of (4): The leftwards implication follows from the collapse of the Polynomial Hierarchy. The rightwards implication follows because the characteristic function for any language in co-NP is in $FP^{NP[1]}$.

Proof of (5): The leftwards implication follows from the collapse of the Polynomial Hierarchy. The rightwards implication follows from Lemma 17.

Proof of (6): Similar to proof for (5).

Proof of (7): Follows from definitions and proof for (3).

Proof of (8): Follows from parts (4) and (7). ∎

The major relations that are still open are $NPMV_g \subseteq_c FP^{NP}_{\|}$ [Sel91, p. 23], $NPMV \subseteq NPMV_g \circ FP^{NP}$, and $NPMV \circ FP^{NP} \subseteq NPMV_g \circ FP^{NP}$.

Note that any optimal-cost solution problem can be simulated by asking the number of NP queries required to determine the optimal cost (see Section 4.1) and then using this cost as the input to the corresponding given-cost problem. Hence, if enough computational power is available, any function in $NPMV_g \circ FP^{NP}$ can be reduced to a function in $NPMV_g$, i.e. the given-cost solution problem. This simulation will be used in many of the proofs given below.
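The two-phase simulation described above can be sketched as follows, again using minimum vertex cover purely as an illustrative stand-in for a phylogenetic inference problem. The helper names are hypothetical, and the linear scan in `optimal_cost` is a placeholder for the NP queries that would determine the optimal cost.

```python
from itertools import combinations


def covers_of_cost(n, edges, k):
    """Given-cost problem: yield every vertex cover of size exactly k."""
    for cand in combinations(range(n), k):
        chosen = set(cand)
        if all(u in chosen or v in chosen for u, v in edges):
            yield cand


def optimal_cost(n, edges):
    """Evaluation phase (a linear scan standing in for the NP queries)."""
    for k in range(n + 1):
        if next(covers_of_cost(n, edges, k), None) is not None:
            return k


def optimal_cost_solutions(n, edges):
    """Optimal-cost problem: evaluate first, then call the given-cost solver."""
    k = optimal_cost(n, edges)
    return k, list(covers_of_cost(n, edges, k))


cost, solutions = optimal_cost_solutions(4, [(0, 1), (1, 2), (2, 3), (3, 0)])
```

Once the optimal cost is fixed, the second phase is exactly a given-cost computation, which is why results for given-cost functions transfer to optimal-cost functions when the evaluation phase is affordable.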

4.1.2 Classes Within FPSPACE(poly): Counting Classes

The classes of interest within FPSPACE(poly) belong to three hierarchies of counting classes, which are based on two different modes of counting solutions. For a polynomial-time NTM transducer $N \in FPH$, let $\#_N(x)$ be the number of accepting paths i.e. the total number of solutions, and $Span_N(x)$ be the number of different solutions computed by $N$ on input $x$.
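The difference between the two counting modes can be seen in a minimal sketch of a toy nondeterministic transducer, invented here for illustration. Each computation path guesses a subset of the input weights, accepts if the subset sums to the target, and outputs the sorted multiset of chosen weights; distinct paths may therefore produce the same output.

```python
from itertools import product


def accepting_outputs(weights, target):
    """Outputs of all accepting paths of the toy transducer."""
    outputs = []
    for bits in product((0, 1), repeat=len(weights)):  # one path per guess
        chosen = [w for w, b in zip(weights, bits) if b]
        if sum(chosen) == target:
            outputs.append(tuple(sorted(chosen)))
    return outputs


def count_paths(weights, target):
    """#_N(x): the number of accepting paths."""
    return len(accepting_outputs(weights, target))


def count_span(weights, target):
    """Span_N(x): the number of distinct outputs."""
    return len(set(accepting_outputs(weights, target)))


paths = count_paths([1, 1, 2], 3)
span = count_span([1, 1, 2], 3)
```

On weights `[1, 1, 2]` and target 3 there are two accepting paths (either copy of weight 1 together with weight 2), but both output the same multiset, so the path count exceeds the span count.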


Figure 9: Function classes within $FP^{NP}$ (adapted from Figure 1 of [Sel91]). Inclusion relations are denoted by unmarked arrows and refinement relations by arrows marked with c. Certain relationships that are not marked are possible; see main text.


1. The #-hierarchy, #PH, whose $k$th level, $\#(F\Sigma_k^p)$, $k \geq 1$, is the class of functions $f(x)$ such that $f(x) = \#_N(x)$ for some $N \in F\Sigma_k^p$. This hierarchy is equivalent to that defined in [Val79b] on classes in PH instead of FPH.

2. The Span-hierarchy, SpanPH [KST89], whose $k$th level, $Span(F\Sigma_k^p)$, $k \geq 1$, is the class of functions $f(x)$ such that $f(x) = Span_N(x)$ for some $N \in F\Sigma_k^p$.

3. The function bounded #P query hierarchy, $FP^{\#P[f(n)]}$.

Let the first and second levels of #PH (SpanPH) be written #P and #NP (SpanP and SpanNP), and define hardness of functions in the classes of these hierarchies relative to metric reducibility. The following class relations are known:

Corollary 19 The following hold:

1. #PH, SpanPH, and $FP^{\#P} \subseteq$ FPSPACE(poly).

2. $FPH \subseteq FP^{\#P[1]}$ [TW92, Theorem 5.1].

3. If either $\#P \subseteq FPH$ or $FPH \subseteq \#P$ then PH collapses to a finite level [TW92, Corollary 5.7, Part 1].

4. $\#PH \subseteq FP^{\#P[1]}$ [TW92, Theorem 4.1].

5. For $k \geq 1$, $\#(F\Sigma_k^p) \subseteq Span(F\Sigma_k^p)$ [KST89, Generalization of Proposition 4.7].

6. For $k \geq 1$, $Span(F\Sigma_k^p) \subseteq \#(F\Sigma_{k+1}^p)$ [KST89, Generalization of Proposition 4.5].

7. For $k \geq 1$, $\#(F\Sigma_k^p) = Span(F\Sigma_k^p)$ if and only if $U\Sigma_k^p = \Sigma_k^p$ [KST89, Generalization of Theorem 4.9].

8. For $k \geq 1$, $Span(F\Sigma_k^p) = \#(F\Sigma_{k+1}^p)$ if and only if $\Sigma_k^p = \Pi_k^p$ [KST89, Generalization of Theorem 4.11].

9. #PH = SpanPH.

Proof:

Proof of (1): Any polynomial-time NTM acceptor or transducer can be simulated in PSPACE [BDG88, Theorem 2.8(b)]; hence, by reserving space for constant $k + 1$ such simulations, any $F\Sigma_k^p$ computation can be simulated in FPSPACE. A counter of accepting paths can be attached to any such simulation to calculate $\#_N(x)$; to calculate $Span_N(x)$, count only those accepting paths whose output values have not been encountered before in the simulation i.e. resimulate $N$ up to the current accepting path. As the output of any function in these hierarchies is polynomially bounded in the length of the input, these hierarchies are in FPSPACE(poly).

Proof of (5 - 8): The proof for (5) is a straightforward modification of that in [KST89]. Lemma 4.3 in [KST89] can be restated in terms of oracles $A, A' \in \Sigma_k^p$ if line 4 in the algorithm on page 367 is deleted, and the condition "or $(\exists i\, (q_i, z_i) \notin B)$" is added to the definition of ORACLE on page 368. Using this lemma, it is easy to prove generalized versions of Proposition 4.5 and Corollary 4.6 in [KST89], from which (6), (7), and (8) follow. As $\Sigma_k^p = \Pi_k^p$ implies that $\#PH = \#(F\Sigma_{k+1}^p)$, the leftwards portion of (8) implies the stronger result that $Span(F\Sigma_k^p) = \#PH$ [Köb89].

Proof of (9): Follows from (5) and (6). ∎

The counting functions of interest in this thesis are all in class $Span(NPMV_g \circ FP^{NP})$, which is in the low end of SpanPH. The following class relations are known:

Corollary 20 The following hold:

1. $SpanP \subseteq Span(NPMV \circ FP^{NP})$.

3. $Span(NPMV \circ FP^{NP}) \subseteq \#NP$.

4. $Span(NPMV_g \circ FP^{NP}) \subseteq SpanP$ if and only if NP = co-NP.

Proof:

Proofs of (1 - 2): By definition. As $SpanP = Span(NPMV)$, relation $SpanP \subseteq Span(NPMV_g \circ FP^{NP})$ is open in part because relation $NPMV \subseteq NPMV_g \circ FP^{NP}$ is open (see Section 4.1.1).

Proof of (3): Consider a NOTM $M$ which computes a function $f \in NPMV \circ FP^{NP}$, and let $M_{NPMV}$ be the machine in $NPMV$ invoked in the second phase of the computation of $M$. Define the following oracle on input $x$ and output $y$ for $M_{NPMV}$:

$A(x, y)$ = {There is a computation path of $M_{NPMV}$ on input $x$ that produces output $y$}.

Oracle $A$ is in NP. Consider the NOTM $M'$ which computes a function $g$ in $F\Sigma_2^p$: $M'$ guesses an output $y$ of $M_{NPMV}$, performs the initial $FP^{NP}$ phase of the computation of $M$, formulates input $x$ to $M_{NPMV}$, and uses a single call to oracle $A$ to see if $M_{NPMV}$ on input $x$ outputs $y$. If the answer to oracle $A$ is "yes", $M'$ outputs $y$; else, $M'$ rejects. Each distinct output of $M$ is produced by $M'$ exactly once; hence, $Span(M) = \#(M')$.

Proof of (4): The proof of the rightwards part is a variant of that for the rightwards part of [KST89, Theorem 4.11]. Let $L$ be a language in NP. Define machine $M$ in $NPMV_g \circ FP^{NP}$ which asks a single question to the oracle in NP for membership in $L$, and outputs "1" on all computation paths if the oracle rejects i.e. input $x \notin L$, and otherwise has no accepting computation. Let $f = Span(M)$; note that $f(x) > 0$ if and only if $x \notin L$. However, by hypothesis, $f$ is also the Span function of some machine in $NPMV$. As this machine computes co-$L$, co-$L \in NP$ and NP = co-NP. To prove the leftwards part, note that if NP = co-NP, then SpanP = #NP by Part (8) of Corollary 19 above; the wanted result then follows from (1), (2), and (3). ∎

Note that by the results of Corollary 19, even though #P is contained in SpanP, the two are of equal computational hardness, i.e. every function in SpanP

Figure 10: Function classes within FPSPACE(poly): counting classes.

metrically reduces to a function in #P; this parallels the relationship between OptP[$f(n)$] and $FP^{NP[f(n)]}$.

4.2 Evaluation Functions

Much of the early work on evaluation problems focused on decision problems that approximate evaluation problems; see [WagK87, WagK88, WagK90] for a review of this work. Two approaches to directly determining the complexity of evaluation problems involve using paddability and the OptP hierarchy (see Section 4.1.1). Upper bounds on problem complexity within the function bounded NP query hierarchy are easily established using the OptP hierarchy. As many-one reductions often correspond to metric reductions, the same also holds for completeness results; indeed, Gasarch, Krentel, and Rappoport [GKR92, p. 1] conjecture that OptP[$f(n)$]-completeness is the normal behavior of evaluation problems corresponding to NP-complete decision problems. However, paddability is still useful in those cases when the transformation from many-one to metric reduction is not obvious.

Consider upper bounds on the complexity of the evaluation problems for the phylogenetic inference criteria examined in this thesis. By Corollaries 1, 5, and 8, all character compatibility problems and unweighted phylogenetic parsimony and distance matrix fitting problems have optimal costs that are polynomially bounded, and all weighted problems have optimal costs that are exponentially bounded.

Corollary 21 All character compatibility and unweighted phylogenetic parsimony and distance matrix fitting evaluation problems examined in this thesis are in OptP[$O(\log n)$]. All weighted phylogenetic parsimony and distance matrix fitting evaluation problems examined in this thesis are in OptP.

By definition, weighted problems in which the magnitude of the largest weight is polynomially bounded are also in OptP[$O(\log n)$]; hence, let "unweighted" also refer to such problems. By results from [Kre88] cited above, one can read "OptP[$f(n)$]" as "$FP^{NP[f(n)]}$" in the remainder of this section.

Consider now completeness results, starting with the unweighted problems. MAX-CLIQUE and MIN-VERTEX COVER are both OptP[$O(\log n)$]-complete ([Kre88, Theorem 2.2]; [GKR92, Theorem 3.3]). Define MAX-X3C as the size of the largest non-overlapping, rather than exact, cover by a subset of the given 3-sets. As X3C is a generalization of 3DM [GJ79, p. 53], MAX-X3C is a generalization of MAX-3DM; as MAX-3DM is OptP[$O(\log n)$]-complete [GKR92], so is MAX-X3C. The reductions from these problems to character compatibility and unweighted phylogenetic parsimony and distance matrix fitting problems given in Sections 3.2.1, 3.2.2, and 3.2.3 give arithmetic relations between the costs of optimal solutions, and thus yield the following metric reductions:

• MIN-VERTEX COVER$(x)$ = MIN-X$(x) - |E|$
  ($X \in \{$UBCCS, UBQCS, UBW, UBGe$\}$)

• MIN-VERTEX COVER$(x)$ = MIN-X$(x) - (3|V| + |E|)$
  ($X \in \{$UBCDo, UBQDo, UBCCI, UBQCI$\}$)

• MAX-CLIQUE$(x)$ = MAX-BCC$(x)$

• MAX-BCC$(x)$ = MAX-BQC$(x)$

• MAX-X3C$(x)$ = $((|E| -$ MIN-FBUT2$[F_1](x))/3) - 3|C|$

• MIN-FBUT2$[F_1](x)$ = MIN-FBDT2$[F_1](x)$

• MIN-FBUT2$[F_1, \geq](x)$ = MIN-FBDT2$[F_1, \geq](x)$

• MIN-VERTEX COVER$(x)$ = MIN-FBGT$[\geq](x) - |E|$

Theorem 22 All character compatibility and unweighted phylogenetic parsimony and distance matrix fitting evaluation problems examined in this thesis are $OptP[O(\log n)]$-complete.

Consider completeness results for the weighted problems. Ordinarily, a weighted evaluation problem is shown $OptP$-complete by a variant of the reduction used to show its unweighted version to be $OptP[O(\log n)]$-complete [GKR92]. However, the required modifications are not obvious for either phylogenetic parsimony or distance matrix fitting problems. For example, consider weighted MIN-VERTEX COVER, the problem that associates weights with the vertices of a graph and returns the sum of the weights of the minimum weight vertex cover. This problem is $OptP$-complete [GKR92, Theorem 3.3]; however, the difficulty with modifying the many-one reductions to phylogenetic parsimony problems given in Section 3.2.1 is that optimal solutions to the reduced instance neither consistently minimize nor maximize the weights of the vertices of the candidate cover, but instead minimize the weight of the whole tree (see Figure 11). This complicates the extraction of the cost of the useful portions of the solution from the cost of the whole solution. This difficulty can be resolved in the same way as for weighted MIN-STEINER TREE IN GRAPHS [GKR92], by including in the instance an explicit weighting function for all edges in the implicit graph. This version of each weighted phylogenetic parsimony problem is $OptP$-complete; however, it violates the spirit of the original biological problem, and thus will not be considered here further. Similar difficulties occur in attempts to modify the many-one reductions for distance matrix fitting problems given in Section 3.2.3.

By the restriction reductions from all unweighted to weighted phylogenetic parsimony and distance matrix fitting problems, the evaluation problems corresponding to the latter are $OptP[O(\log n)]$-hard. However, it is possible to do better using paddability:

Theorem 23 The following hold:

1. MIN-WBCCS and MIN-WBQCS are paddable with respect to VERTEX COVER.

2. MIN-WBCDo and MIN-WBQDo are paddable with respect to VERTEX COVER.


[Figure 11 graphic: weighted example graphs (a) and (b) on vertices A, B, and C, each shown above its candidate trees with tree costs (C) and vertex cover costs (VC).]

Figure 11: Difficulties with the reduction from weighted MIN-VERTEX COVER to weighted phylogenetic parsimony evaluation problems. Graphs are shown on top, and all possible trees for each graph under the reduction in Table 9 and the costs of these trees (C) and their corresponding vertex covers (VC) are given below. The numbers in parentheses in the graphs denote the weights associated with particular vertices. Note that for the graph in (a), the minimal tree in the reduced instance also yields a minimal vertex cover, which is not the case for the graph in (b).


3. MIN-WBW is paddable with respect to VERTEX COVER.

4. MIN-WBCCI and MIN-WBQCI are paddable with respect to VERTEX COVER.

5. MIN-WBGe is paddable with respect to VERTEX COVER.

Proof:

Proof of (1): Assume without loss of generality that all given instances $x_1, x_2, \ldots, x_k$ of VERTEX COVER have the same number of vertices, and can thus be mapped by the reduction $f$ in Table 9 into $k$ instances of UBCCS, each of which has $d$ characters and $m_i$ taxa, $1 \leq i \leq k$. Let $m^* = \max m_i$. Construct an instance $x'$ of MIN-WBCCS on $d' = kd$ characters $c_1, \ldots, c_{d'}$ split into zones $z_i = c_{(i-1)d+1}, \ldots, c_{id}$, $1 \leq i \leq k$, with each zone corresponding to one of the given instances of UBCCS. Let $S' = \bigcup_{i=1}^{k} S_i$, with each $s \in S_i$ being mapped into its appropriate zone as in $f(x_i)$, with zeroes in the characters of all other zones. Give each character in zone $z_i$ weight $w_{z_i} = (m^* d + 1)^{(i-1)}$. Note that the maximum weight in $x'$ has a number of bits polynomial in $k$, $m^*$, and $d$. Hence, function $h_1$ is polynomial time.

No path in an optimal tree $T$ for instance $x'$ can include a vertex $v$ such that $v$ has characters with state 1 in two different zones. Suppose that such a path $p$ exists, and assume without loss of generality that there are no vertices from $S'$ on this path. Denote the two zones by $z'$ and $z''$, the first two vertices surrounding $v$ on $p$ that have 1-states totally within one zone by $x$ and $y$, and the number of 1-states in $x$ and $y$ by $l_x$ and $l_y$. Suppose $x$ and $y$ are in the same zone; assume this zone is $z'$. Create a path $p'$ by taking each vertex in $p$ and retaining only those edges wholly in zone $z'$, i.e. project path $p$ onto the characters in zone $z'$. Path $p'$ still connects $x$ and $y$, and is shorter than $p$, which contradicts the optimality of $T$. Alternatively, suppose $x$ and $y$ are in zones $z'$ and $z''$, respectively. Assume without loss of generality that there is a path from $x$ to 0 in $z'$. Any path from $x$ to $y$ must contain at least $l_x + l_y$ edges and have length at least $(l_x)w_{z'} + (l_y)w_{z''}$. Consider tree $T'$ that replaces the path $p$ with a path from $y$ to 0 of length $(l_y)w_{z''}$. Tree $T'$ is shorter than tree $T$, which is a contradiction. Hence, all edges in the optimal tree are between vertices in their own zones, and the cost of the optimal tree for $x'$ corresponds to the summed costs of an optimal tree for each zone times the weight for that zone. Recall from Section 3.2.1 that an unweighted binary Camin-Sokal tree on $m$ taxa and $d$ characters has optimal length not greater than $md$; thus, the costs of optimal trees for each zone cannot overflow into the costs for trees in other zones, and the cost of the tree corresponding to any $x_i$ can be easily extracted from the cost for $x'$. Hence, function $h_2$ is also polynomial time, establishing paddability.
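The cost-packing step in the proof of (1) can be illustrated with a small Python sketch. It is not taken from the thesis: the function names `zone_weights`, `combine_costs`, and `extract_costs` are hypothetical, and the sketch assumes only what the proof states, namely that zone $i$ carries weight $(m^* d + 1)^{(i-1)}$ and that each zone's unweighted optimal cost is at most $m^* d$. Under those assumptions the combined cost is a number written in base $m^* d + 1$, so each zone's cost can be read back as one digit.

```python
def zone_weights(k, m_star, d):
    """Weight of zone i (1-based) is (m_star * d + 1) ** (i - 1)."""
    base = m_star * d + 1
    return [base ** (i - 1) for i in range(1, k + 1)]


def combine_costs(zone_costs, m_star, d):
    """Pack per-zone costs (each assumed <= m_star * d) into one total cost."""
    weights = zone_weights(len(zone_costs), m_star, d)
    return sum(cost * weight for cost, weight in zip(zone_costs, weights))


def extract_costs(total, k, m_star, d):
    """Recover the k per-zone costs by reading base-(m_star * d + 1) digits."""
    base = m_star * d + 1
    costs = []
    for _ in range(k):
        total, digit = divmod(total, base)  # lowest zone first
        costs.append(digit)
    return costs


packed = combine_costs([5, 0, 12], m_star=3, d=4)
recovered = extract_costs(packed, k=3, m_star=3, d=4)
```

Because every digit stays below the base, no zone's cost can carry into another zone's position, which is the "cannot overflow" property the proof relies on.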

Proof of (2): Given a set of instances $x_1, x_2, \ldots, x_k$ of VERTEX COVER, construct an instance $x'$ as in (1) above with two additions: (1) there are $(k+1)$ zones, and the $(k+1)$-th zone of maximum weight is designated $z^{+}$, and (2) $S'$ is augmented by $y_i$, $1 \leq i \leq (k+1)d$, such that $y_i$ has 1's in positions $i$ to $(k+1)d$.

Consider an optimal tree $T$ for $x'$. All edges $\{\{y_j, y_{(j+1)}\}, 1 \leq j < (k+1)d\} \cup \{\{y_{(k+1)d}, 0\}\}$ are in $T$ by the reasoning given in [DJS86], and by reasoning similar to that for (1) above, there are no paths in $T$ between vertices in different zones. Moreover, there is no path $p$ from any vertex $u$ in a zone $z'$ to any $y_j$. Suppose such a path $p$ exists; project $p$ onto characters in $z'$, i.e. create a path from $u$ to 0. As $y_j$ and 0 are already connected, this yields a tree shorter than $T$, which is a contradiction. Hence, the cost of $T$ is the cost of edges $\{\{y_j, y_{(j+1)}\}, 1 \leq j < (k+1)d\} \cup \{\{y_{(k+1)d}, 0\}\}$ plus the summed costs of an optimal tree for each zone times the weight of that zone. By reasoning similar to that for (1) above, functions $h_1$ and $h_2$ are polynomial time, establishing paddability.

Proof of (3 - 5): The proofs for (3) and (4) are variants of those for (1) and (2), respectively. As any ordered phylogenetic parsimony problem can be simulated by an appropriately-structured instance of the Generalized parsimony problem, (5) can be proved by a variant on any of these other proofs. ∎

Corollary 24 All weighted phylogenetic parsimony evaluation problems examined in this thesis are $FP^{NP}$-hard.

Similar results hold for several of the distance matrix fitting problems.

Theorem 25 The following hold:

1. MIN-FUUT$[F_1]$ is paddable with respect to X3C.

2. MIN-FUUT$[F_1, \geq]$ is paddable with respect to X3C.

3. MIN-FUGT$[\geq]$ is paddable with respect to VERTEX COVER.

Proof:

Proof of (1): Assume without loss of generality that all given instances $x_1, x_2, \ldots, x_k$ of X3C have the same number of vertices, and can thus be mapped by the reduction $f$ in Table 17 into $k$ instances of FBUT$[F_1]$, each of which has graphs $G_i$ with $e_i$ edges. Let $e^* = \max e_i$. Construct the instance $x'$ of MIN-FUUT$[F_1]$ as a distance matrix $D'$ based on an underlying graph $G' = \bigcup_{i=1}^{k} G_i$ such that $d'_{i,j} = 0$ if $i = j$, $d'_{i,j} = (e^*+1)^k - (e^*+1)^{(l-1)}$ if $\{i,j\} \in E_l$ (the edge set of $G_l$), and $d'_{i,j} = (e^*+1)^k$ otherwise. The maximum weight in $x'$ has a number of bits polynomial in $e^*$ and $k$. Hence, function $h_1$ is polynomial time.

No partition in an optimal ultrametric tree $T$ for instance $x'$ can join vertices from different component graphs. Assume that two such vertices $u$ and $v$ are joined at level $l$. Consider $T'$ that instead joins $u$ and $v$ at level $(e^*+1)^k$. As $d_{u,v} = (e^*+1)^k$, $T'$ is of lower cost than $T$, which is a contradiction. Hence, all partitions must join vertices within individual $G_i$. A partition of $G_i$ into either three or four triangles in the manner described in Section 3.2.3 is optimal at level $(e^*+1)^k - (e^*+1)^{(i-1)}$. Moreover, there can be no other joining of vertices in $G_i$ until level $(e^*+1)^k$. Suppose there was a partition of $G_i$ at level $l$, $(e^*+1)^k - (e^*+1)^{(i-1)} < l < (e^*+1)^k$, which joined two previously separate groups of vertices $X$ and $Y$. Let $G_X$, $G_Y$, and $G_{X \cup Y}$ be the subgraphs of $G_i$ induced by the vertex-sets $X$, $Y$, and $X \cup Y$, respectively. Let $e_p$ be the number of edges in $G_X$ and $G_Y$, $e_u$ be the number of edges in $G_{X \cup Y}$ less $e_p$, and $e_t$ be $(|X| + |Y| - 1)(|X| + |Y|)/2 - (e_p + e_u)$. Note that $e_p$ is the number of previously used edges, $e_u$ is the number of unused edges, and $e_t$ is the number of possible edges. Figure 12 shows that the partition at level $l$ can exist if and only if $e_t \leq e_u$. As the previously used edges form complete subgraphs on $X$ and $Y$, $e_p = |X|(|X|-1)/2 + |Y|(|Y|-1)/2$, and $e_t = |X||Y| - e_u$; hence, this condition can be rewritten as $|X||Y|/2 \leq e_u$. Consider the following three cases for the simplest possible partitions at level $l$:

• $X$ and $Y$ are single vertices: As no edge joins any two of these vertices in $G_i$, $e_u = 0$. Hence, the condition becomes $1/2 \leq 0$, which is a contradiction.

• $X$ is a single vertex and $Y$ is a triangle: As there is at most one edge joining $X$ and $Y$ in $G_i$, $e_u = 1$. Hence, the condition becomes $3/2 \leq 1$, which is a contradiction.

• $X$ and $Y$ are triangles in either equation 6 or equation 7: As there are at most two edges joining $X$ and $Y$ in $G_i$, $e_u = 2$. Hence, the condition becomes $9/2 \leq 2$, which is a contradiction.

Using the argument above for the joining of two groups, the reader can verify that no joining of three or more groups at level $l$ can occur in an optimal tree. Therefore, no partition at level $l$ can exist in an optimal tree.
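The three-case check in this argument can be replayed mechanically. The following Python sketch is illustrative only: the function name `level_can_exist` and the symbols `size_x`, `size_y`, and `e_u` are assumptions of this example. It takes the condition to be that the number of possible edges between the groups, $|X||Y|$ minus the unused edges, is at most the number of unused edges, and it assumes the three cases pair group sizes (1, 1), (1, 3), and (3, 3) with 0, 1, and 2 unused edges respectively.

```python
def level_can_exist(size_x, size_y, e_u):
    """Check the condition e_t <= e_u, where e_t = |X||Y| - e_u.

    size_x, size_y: number of vertices in the two groups being joined.
    e_u: number of graph edges running between the two groups (unused edges).
    """
    e_t = size_x * size_y - e_u  # vertex pairs between X and Y with no edge
    return e_t <= e_u


# The three simplest joins, with the largest e_u each case allows.
cases = {
    "single vertex + single vertex": (1, 1, 0),
    "single vertex + triangle": (1, 3, 1),
    "triangle + triangle": (3, 3, 2),
}
results = {name: level_can_exist(*args) for name, args in cases.items()}
```

Under these assumptions every entry of `results` is `False`, matching the conclusion that no intermediate level can appear in an optimal tree.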


[Figure 12 graphic: groups $X$ and $Y$ joined at level $l$, shown relative to level $(e^*+1)^k - (e^*+1)^{(i-1)}$.]

Figure 12: Conditions for multiple partition levels on subgraph $G_i$. If level $l$ is not used in the optimal tree, the cost of groups $X$ and $Y$ is $e_u(d_1 + d_2)$; else, the cost is $e_t d_1 + e_u d_2$. Thus, level $l$ can exist in an optimal tree only if $e_t d_1 + e_u d_2 \leq e_u(d_1 + d_2)$, i.e. $e_t \leq e_u$.

Hence, an optimal ultrametric tree $T'$ for instance $x'$ will consist of $k$ nontrivial partitions at levels $(e^*+1)^i$, $0 \leq i \leq (k-1)$, corresponding to solutions to the $k$ instances of X3C, and will have cost equal to the summed values of $F_1$ for these solutions. Recall from Section 3.2.3 that an optimal solution for $x_i$ in $T'$ will have [...] in $G_i$ that are induced by that solution; thus, the costs of optimal ultrametric trees for each $x_i$ cannot overflow into the costs of optimal ultrametric trees for other $x_j$, and the cost of the solution for any $x_i$ can be easily extracted from the cost for $x'$. Hence, function $h_2$ is polynomial time, establishing paddability.

Proof of (2): A variant of that given above for (1), made less complex by dominance.


Proof of (3): Assume without loss of generality that all given instances $x_1, x_2, \ldots, x_k$ of VERTEX COVER have the same number of vertices $v$ and edges $e$. Construct the instance $x'$ of MIN-FUGT$[\geq]$ as $V' = \{*\} \cup \{\bigcup_{i=1}^{k} V_i - \{*\}\}$, $S' = \{*\} \cup \{\bigcup_{i=1}^{k} S_i - \{*\}\}$, and $D'$ such that all distances between pairs of vertices in the same $V_i$ are those given in the reduction in Table 18 multiplied by $(v + e + 1)^{(i-1)}$, and all distances between pairs of vertices in different instances are sums of the edges on the path between those vertices in a canonical tree for $x'$. The reader can verify that in an optimal tree for $x'$, there will be no edges between vertices in different $V_i$, as these will be forbidden by the constraint of dominance. Hence, the cost of the optimal tree will be the sum of the weights of all edges in optimal trees for each $V_i$. Note that the sum of weights for each $V_i$ will be less than $(v + e)(v + e + 1)^{(i-1)}$; hence, the costs of optimal trees for each $x_i$ cannot overflow into the costs for optimal trees for other $x_j$, and the cost of the solution for any $x_i$ can be easily extracted from the cost for $x'$. Hence, function $h_2$ is polynomial time, establishing paddability. ∎

It is unfortunate that most of the weighted distance matrix fitting evaluation problems do not yield to paddability proofs of the style above. The exponential increase in the length of the weights required to separate optimal solutions for each instance under the $F_2$ statistic complicates proofs for MIN-FUUT$[F_2]$ and MIN-FUUT$[F_2, \geq]$, and it is not obvious how one could show paddability for FUDT$[F_1]$, FUDT$[F]$, or FUDT$[F_2]$.


Corollary 26 The following hold:

1. MIN-FUUT$[F_1]$, MIN-FUUT$[F_1, \geq]$, and MIN-FUGT$[\geq]$ are $FP^{NP}$-hard.

2. MIN-FUDT$[F_1]$, MIN-FUDT$[F]$, MIN-FUUT$[F_2]$, MIN-FUDT$[F_2]$, and MIN-FUUT$[F_2, \geq]$ are properly $FP^{NP[O(\log n)]}$-hard.

4.3 Solution Functions

Solution problems have been studied indirectly via their approximation by decision problems [GJ79] and evaluation problems [GKR92, Kre88]. More recently, these problems have been studied directly using paddability [CT91, Gas86] and multivalued function classes such as $NPMV_g$ [Sel91]. The techniques developed in this latter work will be used in this section.

There are several types of solution functions.

1. A function that computes a single solution [GJ79, Chapter 5].

2. A function that computes but cannot enumerate all solutions, i.e. a function in $NPMV$ [Sel91].

3. An index-driven function $g(i, x)$ that computes the $i$-th solution for instance $x$ under some polynomial-time ordering $P$ on binary strings.

Definitions (1) and (2) will be discussed in this section; definition (3) describes the enumeration functions in Section 4.5 and will be discussed there.


Consider functions of the type in definition (1) above. Following [CT91], the focus will be on bounds on the complexity of SOL-X, the class of single-valued functions that compute solutions to problem X. Certain properties are known to imply upper bounds on SOL-X: problems that have a polynomial number of feasible solutions for any instance are in $FP^{NP}_{tt}$ [Sel91, Proposition 5], and problems that are polynomial-invertible in the sense of [WagK87], i.e. all solutions of cost $k$ can be enumerated in polynomial time, are of complexity equivalent to their cost functions. However, none of the problems examined in this thesis exhibit either of these properties. Consider instead lower bounds. By the prefix-search technique, which builds an optimal solution bit by bit by consulting an $NP$ solution-prefix oracle ([BDG88, p. 61]; [GJ79, Chapter 5]), every problem X has at least one member of SOL-X in $FP^{NP}$; hence, the lower bound can be no harder than $FP^{NP}$. As no optimal-cost solution function can be easier than its associated evaluation function, lower bounds can be derived from the complexity of the associated evaluation functions. Such bounds can be improved for phylogenetic inference problems by applying Theorem 16.
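The prefix-search technique mentioned above can be sketched in Python. This is a toy illustration, not the construction in [BDG88] or [GJ79]: the names `prefix_search` and `make_bruteforce_oracle` are hypothetical, solutions are assumed to be fixed-length bit strings, and the $NP$ solution-prefix oracle is replaced by a brute-force stand-in that enumerates all strings, which is only feasible for tiny examples.

```python
from itertools import product


def prefix_search(n_bits, has_optimal_extension):
    """Build an optimal solution bit by bit with one oracle query per bit.

    has_optimal_extension(prefix) must answer whether some optimal
    solution starts with the given prefix.
    """
    prefix = ""
    for _ in range(n_bits):
        if has_optimal_extension(prefix + "0"):
            prefix += "0"
        else:
            # Some optimal solution extends the current prefix, so if "0"
            # fails then "1" must succeed.
            prefix += "1"
    return prefix


def make_bruteforce_oracle(n_bits, cost):
    """Stand-in for the NP oracle: exhaustive search over all bit strings."""
    candidates = ["".join(bits) for bits in product("01", repeat=n_bits)]
    best = min(cost(s) for s in candidates)
    optimal = [s for s in candidates if cost(s) == best]

    def oracle(prefix):
        return any(s.startswith(prefix) for s in optimal)

    return oracle


# Toy instance: the cheapest 3-bit string is the one encoding the number 5.
oracle = make_bruteforce_oracle(3, lambda s: abs(int(s, 2) - 5))
solution = prefix_search(3, oracle)
```

The loop makes a number of oracle queries equal to the solution length, which is why the technique yields a polynomial-time procedure relative to the oracle.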

Theorem 27 All single-valued functions solving all phylogenetic inference optimal-cost solution problems examined in this thesis are $FP^{NP}_{tt}$-hard.

Proof: As noted in Section 3.2.4, all reductions in Section 3.2 give algorithms for transforming optimal solutions for original and reduced instances into one another. Hence, the metric reductions from MAX-X3C, MAX-CLIQUE, and MIN-VERTEX COVER to unweighted phylogenetic inference evaluation problems given in Section 4.2 can be modified to give metric reductions between the corresponding optimal-cost solution problems. SOL-MAX-CLIQUE is paddable [CT91, Theorem 4.2], and SOL-MAX-X3C and SOL-MIN-VERTEX COVER can be shown paddable via functions that simply combine the given instances into one instance without adding any new components. ∎

Consider now functions of the type in definition (2) above. All phylogenetic inference optimal-cost solution problems defined in this thesis are in $NPMV_g \circ FP^{NP}$, and all corresponding given-cost and given-limit solution problems are in $NPMV_g$. This definition is useful primarily for visualizing the set of solutions associated with particular instances of a problem, and highlighting the computational structures for different types of solution functions, e.g. the two-phase nature of $NPMV_g \circ FP^{NP}$ computations (see Section 4.1.1). However, it is also possible to derive results using this definition, such as the following lower bound on the complexity of single-valued functions for the phylogenetic inference given-cost and given-limit solution problems.

Theorem 28 All single-valued functions solving all phylogenetic inference given-cost and given-limit solution problems examined in this thesis are properly $FP^{NP[O(\log n)]}$-hard unless $P = NP$.

Proof: Cook's generic reduction from decision problems in $NP$ to SAT [GJ79, Section 2.6] is a generic metric reduction from every solution problem in $NPMV_g$ to SOL-SAT. The reader can verify that the reductions from SAT to CLIQUE, VERTEX COVER, and X3C [GJ79, Section 3.1], and from these problems to the phylogenetic inference decision problems (see Section 3.2) are also metric reductions between the corresponding given-cost and given-limit solution problems. Hence, any single-valued function that solves any of the phylogenetic inference given-cost and given-limit solution problems can be used to construct a single-valued function that solves any problem in $NPMV_g$. To complete the proof, recall that $NPMV_g \subseteq_c FP^{NP[O(\log n)]}$ implies $P = NP$ [Sel91, Theorem 3]. ∎

4.4 Spanning Functions

Counting problems were first defined and studied in [Val79a, Val79b, Sim77]. This early work considered the number of (not necessarily distinct) solutions encoded by a nondeterministic computation, and has led via threshold-acceptance mechanisms to the work on probabilistic computation [Joh90, Section 4]. There has been a recent resurgence of interest in counting for counting's sake [SchU90, Tor91, WagK86a, WagK86b], including the counting of distinct solutions [KST89], which will be the focus in this section.

All phylogenetic inference given-cost and given-limit problems examined in this thesis are in $SpanP$, and all corresponding optimal-cost spanning problems are in $Span(NPMV_g \circ FP^{NP})$. At present, there are no lower bounds known on the complexities of any of these problems. Only the reduction from CLIQUE to BCC gives a one-to-one solution mapping, which yields a metric reduction between the corresponding optimal-cost, given-cost, and given-limit spanning problems. Hence, all character compatibility spanning problems are at least as hard as the corresponding problems for CLIQUE. Unfortunately, none of these problems for CLIQUE are known to be hard for either $\#P$ or $SpanP$. It is interesting that only the versions of CLIQUE and VERTEX COVER that count locally optimal solutions have been shown to be $\#P$-complete [Val79a, Theorem 1].

Several trivial but intriguing bounds emerge for any spanning problem X by applying binary search arguments. The following hold for unweighted problems,

• SPAN-SOL-OPT-X $\in FP^{\text{SPAN-SOL-VAL.EQ-X}}$

• SPAN-SOL-OPT-X $\in FP^{\text{SPAN-SOL-VAL.LE-X}[O(\log n)]}$

• SPAN-SOL-VAL.LE-X $\in FP^{\text{SPAN-SOL-VAL.EQ-X}}$

weighted problems,

• SPAN-SOL-OPT-X $\in FP^{\text{SPAN-SOL-VAL.LE-X}}$

and for all problems:

• SPAN-SOL-VAL.EQ-X $\in FP^{\text{SPAN-SOL-VAL.LE-X}[1]}$

Note that if either of the given-cost or given-limit spanning problems is in $FPH$, all three problems are in $FPH$; however, the optimal-cost spanning problem can be in $FPH$ without implying anything about the complexity of the other two problems (see Section 4.7).

4.5 Enumeration Functions

All existing definitions of enumerability in complexity theory (see [HHSY91] for a review) are concerned with enumerating languages rather than the ranges of functions for particular inputs. The enumeration problem considered here was defined more for the convenience of its users than theoretical tractability; however, it may still be of some use in pure complexity-theoretic investigations. Though any function in $FPH$ can be simulated in $FPSPACE(poly)$, it is not obvious that any such function can be enumerated in $FPSPACE(poly)$.

Theorem 29 Given a problem $\Pi$ in $F\Sigma_2^p$ and a polynomial-time ordering $P$ on binary strings, the problem of computing the $k$-th optimal solution under $P$ for an instance $x$ of $\Pi$ is in $FPSPACE(poly)$.

Proof: Let $N \in F\Sigma_2^p$ be a polynomial-time NOTM transducer that computes the solutions of $\Pi$. Let $p(n)$ be the polynomial bounding the running time of $N$ and assume that all solutions have length $p(n)$. Define the following function:

RANK$(N, P, x, y) = |\{w \mid w$ is a solution to $N$ on input $x$ and $w < y$ under ordering $P\}|$.

RANK can be computed in $FPSPACE(poly)$ in the same way as functions in $SpanPH$ (Corollary 19, Part (1)). Using RANK, a binary search can determine which of the $2^{p(n)}$ possible solutions has exactly $(i - 1)$ solutions preceding it under $P$, i.e. the $i$-th solution under $P$. Note that this binary search is conducted on the ordering of possible solution strings under $P$. This procedure is in $FP^{FPSPACE(poly)} = FPSPACE(poly)$. ∎
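The binary search over RANK can be sketched as follows. This Python fragment is a hedged illustration rather than the procedure of the proof: `ith_solution` and `rank_oracle` are hypothetical names, candidate solutions are modelled as integers in the range $0$ to $2^{n} - 1$ with numeric order standing in for the ordering $P$, and the oracle is assumed to return the number of solutions strictly smaller than its argument.

```python
def ith_solution(i, n_bits, rank_oracle):
    """Return the i-th (1-based) solution, or None if fewer than i exist.

    rank_oracle(y) is assumed to return the number of solutions w with
    w < y, for integers y between 0 and 2 ** n_bits inclusive.
    """
    lo, hi = 0, 2 ** n_bits - 1
    if rank_oracle(hi + 1) < i:
        return None  # there are fewer than i solutions in total
    # Find the smallest y such that at least i solutions are <= y.
    while lo < hi:
        mid = (lo + hi) // 2
        if rank_oracle(mid + 1) >= i:
            hi = mid
        else:
            lo = mid + 1
    return lo


# Toy solution set standing in for the outputs of the transducer.
solutions = {3, 9, 12}


def toy_rank(y):
    return sum(1 for w in solutions if w < y)


second = ith_solution(2, n_bits=4, rank_oracle=toy_rank)
```

The search makes a number of oracle calls proportional to `n_bits`, so for solutions of polynomial length the number of RANK evaluations is polynomial.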

Theorem 30 Given a problem $\Pi$ in $F\Sigma_2^p$ and a polynomial-time ordering $P$ on binary strings, the problem of computing the $k$-th optimal solution under $P$ for an instance $x$ of $\Pi$ is in $FP^{\#P}$.

Proof: Modify NTM $N$ in the preceding proof to obtain $N'$, which takes as additional input a binary string $y$ and produces only those solutions $w$ such that $w < y$ under ordering $P$. The proof follows by observing that RANK$(N, P, x, y)$ = SPAN$(N'(x, y))$ and that $SpanP \subseteq FP^{\#P[1]}$ by Corollary 18. ∎

Corollary 31 All phylogenetic inference optimal-cost, given-cost, and given-limit enumeration problems examined in this thesis are in $FP^{\#P}$.

The only known lower bound for these problems is implied by the observation that, for a problem X, SPAN-X $\in FP^{\text{ENUM-X}}$, i.e. no enumeration problem can be easier than its associated spanning problem.

4.6 Random Generation Functions

Though there are many papers on the random generation of particular types of graphs, general random-generation problems have been formulated and studied as a class only in [JVV86]. The results of the previous section suggest that random generation problems are in $FRP^{\#P}$; however, Jerrum, Valiant, and Vazirani have given a procedure of complexity $FRP^{\Sigma_2^p}$ which uses Stockmeyer's $FP^{\Sigma_2^p}$ approximation procedure for functions in $\#P$ (see Section 5.6) to generate outputs of any $NPMV_g$ machine at random under a uniform distribution. Recall that an $NP$ query can be simulated by an appropriate $\Sigma_2^p$ query.

Corollary 32 All phylogenetic inference optimal-cost, given-cost, and given-limit random-generation problems examined in this thesis are in $FRP^{\Sigma_2^p}$.

As any random-generation function is also a solution function, the lower bounds on the complexities of solution functions given in Section 4.3 also apply to random-generation functions.

These results have an irrevocably academic flavor because they depend on access to a source of truly random bits. Though it is impossible to obtain random bits by purely arithmetical methods, there are techniques for generating near-random bit sequences and for expanding random "seed" sequences into longer random sequences. The interested reader is referred to [LV90b, Section 1] for an introduction to mathematical definitions of randomness, and [Riv90, Section 7] for a summary of methods for generating random sequences.
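The idea of expanding a short seed into a longer sequence can be illustrated with a minimal sketch. The construction below (a cryptographic hash run in counter mode) and the name `expand_seed` are assumptions chosen for illustration; they are not taken from [Riv90] and say nothing about which generators the cited surveys recommend.

```python
import hashlib


def expand_seed(seed: bytes, n_bytes: int) -> bytes:
    """Stretch a short seed into n_bytes of pseudo-random output.

    Illustrative only: hashes seed || counter with SHA-256 and
    concatenates the digests until enough bytes are produced.
    """
    out = bytearray()
    counter = 0
    while len(out) < n_bytes:
        block = hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
        out.extend(block)
        counter += 1
    return bytes(out[:n_bytes])


if __name__ == "__main__":
    stream = expand_seed(b"short random seed", 100)
    print(len(stream), stream[:8].hex())
```

The output is fully determined by the seed, so the quality of the resulting sequence can be no better than the randomness of the seed itself.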


-Optimal-Cost Unwcight.ed J Wl'ight.t.'tl G i \'t'll- ( ~o~t ( H\'t'n- Limit.

Decision - N p -COlli plt'l.l' Evaluation ppNt'IO\Iof!;n)LCJ FPI\"''-hanl t -Solution F flt'-hard, pt'OJ>l'rl)' F pN I'(O(lu~rtll.h ;ml.

E NPA/l' o ppNJ' ,'1 E N J> 1\1\';,

C:panning E Span(N Pl\IV, o ppNr) E Spall P 1-- E J.'J>#I-Enumeration

Random E F!fJ>'f-~

Generation

Table 20: Computational complexities of phylogenetic inference functions.

† Most weighted distance matrix fitting evaluation problems are only known to be properly $FP^{NP[O(\log n)]}$-hard (see Corollary 26).

4.7 Summary

All complexity results obtained in this section for phylogenetic inference problems are given in Table 20. Optimal-cost solution problems are provably harder than the corresponding given-cost and given-limit solution problems because of the NP queries allowed to optimal-cost problems. However, this difference seems to disappear for more complex versions of these problems. Though this difference may re-assert itself when completeness results are available, the relations between the spanning versions of these problems suggest otherwise (see Section 4.4). I conjecture that for problems more complex than computing solutions, optimal-cost problems are easier than their corresponding given-cost and given-limit problems.

Solution problems are of greatest interest to biologists, as these problems are concerned with the trees that define evolutionary hypotheses. Moreover, they are the only problems that have been investigated in the literature, albeit by assessing particular algorithms solving these problems [LP85, Pla89]. Several of the other problems treated above also have biological applications. For instance, spanning results give lower bounds on the running time of branch-and-bound algorithms that solve the corresponding solution problems [Sto85, Val79a]. Also, as each phylogeny incorporates a different hypothesis of character change, all such hypotheses should be considered to get an accurate idea of what is implied about phylogeny by a particular data set [Mad91, MHS92], which could be done by enumerating all phylogenies.

The results given in this section do not directly put upper or lower bounds on the time complexities of algorithms solving these problems; at present, it is only known that these problems, by virtue of being in FPSPACE, can be solved in exponential time. However, these results do give the relative hardnesses of these problems, and may suggest guidelines for algorithm designers about which approaches may not be useful for solving these problems.


5 The Approximability of Phylogenetic Inference Functions

The results in previous sections suggest that polynomial-time algorithms providing exact solutions for phylogenetic inference optimal-cost problems probably do not exist. However, fast algorithms may exist if one is willing to settle for approximate solutions whose cost is within some fixed interval or ratio of the optimal cost. In this section, I will derive some limits on the types of approximations that are available to phylogenetic inference problems.

5.1 Types of Approximability

This section gives a brief overview of types of approximation algorithms and some class-based approaches to proving that various of these approximations cannot exist for a given problem. For in-depth reviews of topics in this section, see [BJY89, GJ79, HS78, Mot92].

Given a problem X, an instance I of X, and an approximation algorithm $A_X$ for X, let $OPT_X(I)$ be the cost of the optimal solution for I, $A_X(I)$ be the cost of the solution for I found by $A_X$, and $MAX_X(I)$ be the largest of the costs of all solutions of I; further, let $Y = OPT_X(I)$ if X is a minimization problem, and $Y = A_X(I)$ if X is a maximization problem. There are several measures of the quality of an approximation [OM90, p. 6]:


• Absolute Error Measure: $\varepsilon_a(I) = |OPT_X(I) - A_X(I)|$

The different Y are applied to map the error-measure values for minimization and maximization problems into the same interval, namely $[0, +\infty)$, for easier comparison [GJ79, p. 128]. There are several types of approximation algorithms defined by various bounds on the quality of the resulting approximations:

1. Absolute (Additive) Approximation

• $|OPT_X(I) - A_X(I)| \leq f(|I|)$

2. Polynomial-Time Approximation Schemes

• Polynomial-Time Approximation Scheme (PTAS): For all $\epsilon > 0$, there exists an algorithm $A_X$ such that $|OPT_X(I) - A_X(I)| \leq \epsilon Y$ and the runtime of $A_X$ is polynomial in $|I|$ for each $\epsilon$.

• Fully Polynomial-Time Approximation Scheme (FPTAS): For all $\epsilon > 0$, there exists an algorithm $A_X$ such that $|OPT_X(I) - A_X(I)| \leq \epsilon Y$ and the runtime of $A_X$ is polynomial in $|I|$ and $1/\epsilon$.

The algorithm $A_X$ can be either a single algorithm (uniform PTAS) or a family of algorithms (non-uniform PTAS).

3. Relative (Multiplicative) Approximation

• $|OPT_X(I) - A_X(I)| \leq cY$, $c > 0$
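The bounds listed above can be checked numerically for a single instance once the optimal cost and the cost returned by an algorithm are known. The following is a minimal sketch; the function names are hypothetical, and the code only evaluates the inequalities for given numbers rather than establishing that an algorithm meets them on all instances.

```python
def normaliser(opt_cost, alg_cost, minimisation):
    """Y from the text: OPT_X(I) for minimisation, A_X(I) for maximisation."""
    return opt_cost if minimisation else alg_cost


def absolute_error(opt_cost, alg_cost):
    """|OPT_X(I) - A_X(I)|."""
    return abs(opt_cost - alg_cost)


def meets_absolute_bound(opt_cost, alg_cost, bound):
    """Absolute (additive) approximation: error <= f(|I|), passed in as `bound`."""
    return absolute_error(opt_cost, alg_cost) <= bound


def meets_relative_bound(opt_cost, alg_cost, c, minimisation):
    """Relative (multiplicative) approximation: error <= c * Y."""
    y = normaliser(opt_cost, alg_cost, minimisation)
    return absolute_error(opt_cost, alg_cost) <= c * y


if __name__ == "__main__":
    # Minimisation instance: optimum 10, algorithm returns 12.
    print(meets_relative_bound(10, 12, 0.25, minimisation=True))
    # Maximisation instance: optimum 10, algorithm returns 8, so Y = 8.
    print(meets_relative_bound(10, 8, 0.2, minimisation=False))
```

A PTAS or FPTAS corresponds to being able to pass `meets_relative_bound` for every choice of `c > 0`, with the runtime conditions stated in the definitions.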

In the following, "a polynomial-time algorithm with a relative (an absolute) approximation c" will be abbreviated as "a relative (an absolute) approximation c". There are several variants on these definitions in the literature that are created by using different error measures or implying asymptotic rather than absolute error bounds. One such variant (indeed, the first [Joh74] and preferred notation) represents relative approximations as a single factor, using the ratios $A_X(I)/OPT_X(I)$ (minimization problems) and $OPT_X(I)/A_X(I)$ (maximization problems). As some of these definitions are not equivalent and it is not always clear which definition is being used, the reader must exercise caution in comparing results from different sources. In this thesis, all approximability definitions and results will be phrased as above in terms of the absolute error measure, because (1) this measure unifies the three types of approximation algorithms described above, and (2) this measure is the formulation of choice in the proof techniques [ALMSS92, Kre88, PY91] used in this section.

Traditionally, the theory of approximation algorithms has been concerned with proofs that certain types of approximability did not exist for particular problems, with approximation-preserving reductions, and with necessary and sufficient conditions for the existence of various approximation algorithms for a given problem; see [BJY89, HS78, GJ79, Mot92] for a review of this work. Within the last four years, two approaches have emerged that are based on hierarchies of approximability classes:

1. The Algorithmic Approximability Hierarchy [CP91, OM90]: Define the class NPO of all NP optimization problems, and its subclasses FPTAS, PTAS, and APX consisting of all problems that have FPTAS, PTAS, and relative approximation algorithms, respectively. Orponen and Mannila [OM90] defined NPO, and showed that several problems are NPO-complete under a relative-approximation preserving reduction. Crescenzi and Panconesi [CP91] defined FPTAS, PTAS, and APX, and gave artificial problems that are complete for PTAS and APX. It is known that FPTAS $\subset$ PTAS $\subset$ APX $\subset$ NPO unless P = NP [CP91, Theorem 6], and that a problem that is hard for a particular class cannot have an approximation algorithm from a lower class unless P = NP.

2. The Logical-Form Approximability Hierarchy [KT90, KT91, PR90, PY91]: The algorithmic approach to defining approximability does not give insight into why problems are approximable [BJY89, p. 220]; moreover, it is not clear how one defines a notion of "approximate computation", let alone classes of such computations, using the Turing Machine encoding of problems [PY91, p. 426]. Building on the work of Fagin [Fag74], Papadimitriou and Yannakakis [PY91] initiated the study of approximability classes that do not involve computation, that is, classes of approximable problems defined by the syntactic structure of the logic formulas that describe the solutions of those problems. Many classes have been defined in this framework [KT90, KT91, PR90]; only MAX NP and MAX SNP will be described here. Fagin showed that the class NP could be represented in logic as the class of problems whose solutions S can be expressed by formulas with structure $\exists S\, \forall x\, \exists y\, \phi(x, y, S)$ where $\phi$ is quantifier free. Given this formulation, Papadimitriou and Yannakakis defined MAX NP as the class of problems whose solutions have the form $\max_S |\{x \mid \exists y\, \phi(x, y, S)\}|$, that is, problems whose solutions S satisfy the maximum number of different x rather than all of them. Papadimitriou and Yannakakis also define subclass SNP of NP of the form $\exists S\, \forall x\, \phi(x, S)$ and subclass MAX SNP of MAX NP. The formulation of SAT, the boolean formula satisfiability problem, in each class is given in Tables 21 and 22.

Class MAX SNP will be important later in this section, as will the following reducibility.

Definition 33 ([PY91], p. 427) Let $\Pi$ and $\Pi'$ be two optimization (maximization or minimization) problems. We say that $\Pi$ L-reduces to $\Pi'$ ($\Pi \leq_L \Pi'$) if there are two polynomial-time algorithms f, g and constants $\alpha, \beta > 0$ such that for each instance I of $\Pi$:

(L1) Algorithm f produces an instance $I' = f(I)$ of $\Pi'$, such that the optima of I and I', $OPT(I)$ and $OPT(I')$, respectively, satisfy $OPT(I') \leq \alpha\, OPT(I)$.


SAT $\in$ NP

Instance: Boolean formula S in conjunctive normal form, i.e. clauses composed of variables linked by disjunctions (logical OR), which are linked by conjunctions (logical AND).

Formula: $\exists T\, \forall c\, \exists x\, [(P(c, x) \wedge x \in T) \vee (N(c, x) \wedge \neg(x \in T))]$,

where P and N encode the instance S ($P(c, x)$ means that variable x appears unnegated in clause c of S; $N(c, x)$ means that variable x appears negated in clause c of S) and T is the set of true variables corresponding to a particular assignment for S.

MAX SAT $\in$ MAX NP

Instance: Boolean formula S in conjunctive normal form.

Formula: $\max_T |\{c \mid \exists x\, [(P(c, x) \wedge x \in T) \vee (N(c, x) \wedge \neg(x \in T))]\}|$,

where P, N, and T are as defined for SAT.

Table 21: Formulations of SAT in first-order logic (adapted from [KT90, PY91]).
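The MAX SAT formula in Table 21 can be read operationally: for each candidate set T of true variables, count the clauses c for which some variable x witnesses the bracketed condition, and take the maximum count. The sketch below does this by brute force over all T. The encoding (parallel lists `pos` and `neg` holding, per clause, the unnegated and negated variables, standing in for P and N) and the function names are assumptions made for illustration, and the search is exponential in the number of variables.

```python
from itertools import combinations


def satisfied_clauses(pos, neg, true_vars):
    """Count clauses c with some x such that
    (P(c, x) and x in T) or (N(c, x) and x not in T)."""
    count = 0
    for p_vars, n_vars in zip(pos, neg):
        if any(x in true_vars for x in p_vars) or any(
            x not in true_vars for x in n_vars
        ):
            count += 1
    return count


def max_sat(variables, pos, neg):
    """max over all T of |{c : clause c is satisfied under T}| (brute force)."""
    best = 0
    for k in range(len(variables) + 1):
        for t in combinations(variables, k):
            best = max(best, satisfied_clauses(pos, neg, set(t)))
    return best


def sat(variables, pos, neg):
    """SAT holds exactly when some T satisfies every clause."""
    return max_sat(variables, pos, neg) == len(pos)


if __name__ == "__main__":
    # Clauses: (x), (not x), (x or y)
    pos = [{"x"}, set(), {"x", "y"}]
    neg = [set(), {"x"}, set()]
    print(max_sat(["x", "y"], pos, neg), sat(["x", "y"], pos, neg))
```

The SAT formula differs only in requiring the condition for all clauses, which is why `sat` can be expressed through `max_sat` here.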


3SAT $\in$ SNP

Instance: Boolean formula S in conjunctive normal form, in which each clause has at most 3 variables.

Formula: $\exists T\, \forall (x_1, x_2, x_3)\, [$

$((x_1, x_2, x_3) \in C_0 \rightarrow x_1 \in T \vee x_2 \in T \vee x_3 \in T) \wedge$

$((x_1, x_2, x_3) \in C_1 \rightarrow x_1 \notin T \vee x_2 \in T \vee x_3 \in T) \wedge$

$((x_1, x_2, x_3) \in C_2 \rightarrow x_1 \notin T \vee x_2 \notin T \vee x_3 \in T) \wedge$

$((x_1, x_2, x_3) \in C_3 \rightarrow x_1 \notin T \vee x_2 \notin T \vee x_3 \notin T)\, ]$,

where $C_0$, $C_1$, $C_2$, and $C_3$ encode the instance S ($c = (x_1, x_2, x_3) \in C_i$ means that variables $x_1, \ldots, x_i$ are negated and variables $x_{i+1}, \ldots, x_3$ are unnegated in clause c of S) and T is the set of true variables corresponding to a particular assignment for S.

MAX-3SAT $\in$ MAX SNP

Instance: Boolean formula S in conjunctive normal form, in which each clause has at most 3 variables.

Formula: $\max_T |\{(x_1, x_2, x_3) \mid [$

$((x_1, x_2, x_3) \in C_0 \rightarrow x_1 \in T \vee x_2 \in T \vee x_3 \in T) \wedge$

$((x_1, x_2, x_3) \in C_1 \rightarrow x_1 \notin T \vee x_2 \in T \vee x_3 \in T) \wedge$

$((x_1, x_2, x_3) \in C_2 \rightarrow x_1 \notin T \vee x_2 \notin T \vee x_3 \in T) \wedge$

$((x_1, x_2, x_3) \in C_3 \rightarrow x_1 \notin T \vee x_2 \notin T \vee x_3 \notin T)\, ]\}|$

Table 22: Formulations of SAT in first-order logic (cont'd from Table 21).


(L2) Given any solution of I' with cost c', algorithm g produces a solution of I with cost c such that $|c - OPT(I)| \leq \beta\, |c' - OPT(I')|$.
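Conditions (L1) and (L2) are simple inequalities once the relevant costs are known, so they can be spot-checked on individual instances. The sketch below does only that; the function names and the sample numbers are hypothetical, and passing such a check on some instances does not show that a reduction is an L-reduction, since the conditions must hold for every instance and every solution.

```python
def satisfies_l1(opt_src, opt_dst, alpha):
    """(L1): OPT(I') <= alpha * OPT(I)."""
    return opt_dst <= alpha * opt_src


def satisfies_l2(opt_src, opt_dst, cost_src, cost_dst, beta):
    """(L2): |c - OPT(I)| <= beta * |c' - OPT(I')|."""
    return abs(cost_src - opt_src) <= beta * abs(cost_dst - opt_dst)


if __name__ == "__main__":
    # Hypothetical instance pair: OPT(I) = 2, OPT(I') = 8, alpha = 6, beta = 1.
    print(satisfies_l1(2, 8, alpha=6))
    # A solution of I' with cost 9 mapped back to a solution of I with cost 3.
    print(satisfies_l2(2, 8, cost_src=3, cost_dst=9, beta=1))
```

The absolute values make the same check usable when one problem is a maximization problem and the other a minimization problem.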

L-reducibility is closely related to most of the other defined approximation-preserving reducibilities [CP91, OM90]; indeed, a constrained L-reducibility applicable to pairs of maximization or minimization problems was defined independently by H. Simon [Sim89]. Following Simon, $r = \alpha\beta$ will be called the expansion of a given L-reduction.

Lemma 34 ([PY91], Proposition 1) L-reductions compose.

Lemma 35 ([PY91], Proposition 2) If $\Pi \leq_L \Pi'$ with expansion r, and $\Pi'$ has a relative approximation $\epsilon$, then $\Pi$ has a relative approximation $r\epsilon$.
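The reasoning behind Lemma 35 can be sketched by chaining the two conditions of Definition 33. The derivation below is an illustration rather than the proof in [PY91]; it assumes both problems are minimization problems, so that the normalizing quantity Y is the optimal cost in each case.

```latex
% c' is the cost returned by the relative approximation for \Pi' on I' = f(I);
% c is the cost of the solution of I that g builds from it.
\begin{align*}
|c - OPT(I)| &\le \beta\,|c' - OPT(I')| && \text{by (L2)}\\
             &\le \beta\,\epsilon\,OPT(I') && \text{relative approximation } \epsilon \text{ for } \Pi'\\
             &\le \alpha\beta\,\epsilon\,OPT(I) && \text{by (L1)}\\
             &= r\,\epsilon\,OPT(I).
\end{align*}
```

As f, g, and the approximation algorithm for $\Pi'$ all run in polynomial time, so does their composition.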

Corollary 36 If $\Pi \leq_L \Pi'$ with expansion r, and "$\Pi$ has a relative approximation $r\epsilon$" $\Rightarrow$ X, then "$\Pi'$ has a relative approximation $\epsilon$" $\Rightarrow$ X.

Note that restriction reductions are trivial L-reductions in which $\alpha = \beta = 1$.

The following two theorems give bounds on approximability using the relationships among classes in the function and language bounded NP query hierarchies:

Theorem 37 ([WagK89], Corollary 16) No evaluation problem A such that the corresponding decision problem $A_{odd}$, i.e. "Is $OPT_A(x)$ odd?", is $P^{NP[O(\log n)]}$-hard can have an absolute approximation $\leq O(\log n)$ unless $PH = \Theta_2^p$.


Theorem 38 ([Kre88], Theorem 4.3) For every OptP[f(n)]-hard evaluation problem, where f(n) is smooth and $f(n) \in O(\log n)$, there exists $\epsilon > 0$ such that every absolute approximation must have value $\geq \frac{1}{4} 2^{f(n^{\epsilon})}$ infinitely often.

The proofs of each of these theorems note that an absolute approximation algorithm reduces the range in which the cost of the optimal solution lies, and that a sufficiently reduced range may be searched with fewer NP queries than are required to solve the evaluation problem. Krentel's theorem will be used in the following sections. This theorem implies almost all known connections between approximability and the function bounded NP query hierarchy.
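The counting idea behind these proofs can be made concrete with a small simulation. The sketch below binary-searches an integer range for the optimal cost using threshold questions of the form "is OPT at most k?", each standing in for one NP query. The function name is hypothetical, and the `optimum` argument only simulates the oracle's answers; in the real setting the optimum is of course unknown.

```python
def queries_to_pin_down(lo, hi, optimum):
    """Binary-search [lo, hi] for `optimum`.

    Each loop iteration models one NP query "is OPT <= mid?".
    Returns (value found, number of queries used).
    """
    queries = 0
    while lo < hi:
        mid = (lo + hi) // 2
        queries += 1
        if optimum <= mid:
            hi = mid
        else:
            lo = mid + 1
    return lo, queries


if __name__ == "__main__":
    # Without an approximation: the optimum may lie anywhere in [0, 1023].
    print(queries_to_pin_down(0, 1023, optimum=700))
    # With an absolute approximation of 698 and error at most 3: search [695, 701].
    print(queries_to_pin_down(695, 701, optimum=700))
```

A range of size m needs about $\log_2 m$ queries, so an absolute approximation that is too good would let the evaluation problem be solved with fewer queries than its hardness permits.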

5.2 Absolute Approximability

There are many elegant proofs which show that absolute approximations do not exist for particular problems [GJ79, HS78, WW86]. However, such results can also be derived for classes of problems.

Theorem 39 The following hold:

1. No OptP[$c \log \log n + O(1)$]-hard evaluation problem can have an absolute approximation $c \leq o(\log n)$ unless P = NP.

2. No OptP[$O(\log n)$]-hard evaluation problem can have an absolute approximation $c \leq o(poly)$ unless P = NP.


3. No $FP^{NP}_{\|}$-hard evaluation problem can have an absolute approximation $c \leq O(poly)$ unless R = NP and P = FewP.

4. No OptP-complete evaluation problem can have an absolute approximation $c \leq O(poly)$ unless P = NP.

Proof:

Proofs of (1 - 2): Follow from Theorem 38.

Proof of (3): Follows from [Sel91, Theorem 12] and [Sel91, Corollary 4(ii)].

Proof of (4): Follows from [Kre88, Theorem 4.1]. ∎
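To see how the bound of Theorem 38 scales with the number of queries, it can be instantiated for two sample query bounds. The choices of f below are illustrative assumptions (base-2 logarithms, and both functions taken to satisfy the smoothness requirement); they are not the exact functions used in the cited proofs.

```latex
% \epsilon is the constant supplied by Theorem 38; logarithms are base 2.
\begin{align*}
f(n) = \log n:\quad
  & \tfrac14\, 2^{f(n^{\epsilon})} = \tfrac14\, 2^{\epsilon \log n} = \tfrac14\, n^{\epsilon},\\
f(n) = \log\log n:\quad
  & \tfrac14\, 2^{f(n^{\epsilon})} = \tfrac14\, 2^{\log(\epsilon \log n)} = \tfrac{\epsilon}{4}\, \log n .
\end{align*}
```

The first bound grows polynomially in n and the second logarithmically, which is consistent with the polynomial and logarithmic thresholds appearing in parts 2 and 1 of Theorem 39.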

Corollary 40 The following hold:

1. No character compatibility, unweighted phylogenetic parsimony, or unweighted distance-matrix fitting optimal-cost solution problem examined in this thesis has an absolute approximation $c \leq o(poly)$ unless P = NP.

2. No weighted phylogenetic parsimony optimal-cost solution problem examined in this thesis has an absolute approximation $c \leq O(poly)$ unless R = NP and P = FewP.

:1. Nour of SOI.~-MIN-FU/!7[FJ], SOL-MIN-FUUT[Ft, ?:.}, or

.WJI--MIN-Fl!U1[?:.} have au ab.~olulc approrimalion c $ O(poly) unless R

= NP and P = FewP.


4. Nonr of SOL-M IN-FUDJ/FJ]. SOL-.\1/X-Ffi 1>'/[f.'j. ,I.,'OI.-.\IIN-I·'l:I'T{F1}.

SOL-MIN-FUDT{f2}. or SOL-.\1/N-/·'(i{!'!fl•;. ?.} IHII't' till llb.·mlrth llflfll'o.r­

imafion c $ o(poly) unlr .... -. I' == N I'·

Note that results 2 and 3 in Corollary 40 can also be derived using the paddability of the associated evaluation problems and theorems from [Nig75] (see also [WW86, Section 9.1.2.1]).

5.3 Fully Polynomial and Polynomial Time Approximation Schemes

Consider fully polynomial time approximation schemes. Garey and Johnson derived sufficient conditions for FPTAS non-approximability using the notion of strong NP-completeness. An NP-complete decision problem is strongly NP-complete if it has an NP-complete subproblem in which all numbers are bounded by some polynomial of the instance length [GJ79, p. 95]. No solution problem whose corresponding decision problem is strongly NP-complete and whose optimal cost is polynomially bounded can have an FPTAS unless P = NP [GJ79, Theorem 6.8 and Corollary]. These conditions are satisfied by all unweighted phylogenetic inference optimal-cost solution problems examined in this thesis. Garey and Johnson also defined pseudo-polynomial reductions, which preserve strong NP-completeness [GJ79, p. 101]. The reader can verify that all reductions given in Section 3.2 from unweighted to weighted problems are also pseudo-polynomial reductions; hence, all weighted decision problems examined in this thesis are also strongly NP-complete. These same reductions also map into subproblems of the solution problems corresponding to these weighted problems whose optimal costs are polynomially bounded.
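The step from a polynomially bounded optimal cost to FPTAS non-approximability can be sketched as follows. This is a simplified rendering rather than the argument in [GJ79]: it assumes that solution costs are non-negative integers and that p is a (hypothetical) polynomial with $OPT_X(I) \leq p(|I|)$ for every instance I.

```latex
% Run the FPTAS with \epsilon = 1/(p(|I|)+1).
% Y is OPT_X(I) for minimization and A_X(I) \le OPT_X(I) for maximization,
% so Y \le p(|I|) in both cases.
\begin{align*}
|OPT_X(I) - A_X(I)| \;\le\; \epsilon\, Y \;\le\; \frac{p(|I|)}{p(|I|)+1} \;<\; 1 .
\end{align*}
% Integer costs then force A_X(I) = OPT_X(I). The runtime is polynomial in
% |I| and 1/\epsilon = p(|I|)+1, hence polynomial in |I|.
```

An FPTAS would therefore yield an exact polynomial-time algorithm for a problem whose decision version is NP-complete, which is possible only if P = NP.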

Theorem 41 No phylogenetic inference optimal-cost solution problem examined in this thesis has an FPTAS unless P = NP.

Consider polynomial time approximation schemes. The traditional approach to PTAS non-approximability involves proving that the given problem cannot have a relative approximation c, c > 0 unless P = NP [HS78, GJ79, WW86]. Such proofs typically derive contradictions by using polynomial-time graph expansions to "amplify" relative approximation algorithms such that cost-restricted NP-complete subproblems can be solved in polynomial time. However, few problems have the cost-restricted NP-complete subproblems required by this approach. A more widely-applicable technique has recently emerged from the study of interactive proof systems (see [Joh92] for an insightful review of the results described below). In what was initially thought to be an isolated result, Feige et al. [FGLSS91] showed that no constant relative approximation c, c > 0 exists for SOL-MAX-CLIQUE unless $NP \subseteq DTIME(n^{\log \log n})$. This result has been dramatically improved by Arora et al. [ALMSS92]:

Theorem 42 ([ALMSS92], Theorem 3) If there is a PTAS for SOL-MAX-3SAT then P = NP.


This result is significant because SOL-MAX-3SAT is complete for MAX SNP [PY91, Theorem 2], and many problems can be shown MAX SNP-hard via L-reductions.

Theorem 43 (Proposition 2, [ALMSS92]) There does not exist a PTAS for any MAX SNP-hard problem unless P = NP.

Several MAX SNP-hard problems of particular interest are:

• SOL-MIN-STEINER TREE IN GRAPHS [BP89, Theorem 4.2],

• SOL-MIN-VERTEX COVER-B, in which each vertex in the given graph has degree $\leq B$, and SOL-MIN-VERTEX COVER [PY91, Theorem 2(d)],

• SOL-MAX-CLIQUE [CFS91, Theorem 6], and

• SOL-MAX-X3C-B, in which each element in the given set occurs in $\leq B$ 3-sets, and SOL-MAX-X3C [Kan91, Corollary 4].

Using these problems, it is possible to show many of the problems examined in this thesis to be MAX SNP-hard. In the following, define a canonical solution for a problem X as a solution to an instance of X produced by the reductions given in Section 3.2, e.g. the canonical trees for FUGT[$\geq$].

Lemma 44 The following hold:

1. Given a solution W of cost c to an instance of SOL-MIN-UBCCS or SOL-MIN-UBQCS derived by the reduction from VERTEX COVER given in [DJS86] (see Table 9), in polynomial time we can find a canonical solution W' with cost $c' \leq c$.

2. Given a solution W of cost c to an instance of SOL-MIN-UBCDo or SOL-MIN-UBQDo derived by the reduction from VERTEX COVER given in [DJS86] (see Table 9), in polynomial time we can find a canonical solution W' with cost $c' \leq c$.

3. Given a solution W of cost c to an instance of SOL-MIN-UBCCI or SOL-MIN-UBQCI derived by the reduction from VERTEX COVER given in [DS87] (see Table 10), in polynomial time we can find a canonical solution W' with cost $c' \leq c$.

4. Given a solution W of cost c to an instance of SOL-MIN-FBUT2[$F_1$] derived by the reduction from X3C given in [KM86] (see Table 17), in polynomial time we can find a canonical solution W' with cost $c' \leq c$.

5. Given a solution W of cost c to an instance of SOL-MIN-FUDT[$F_\alpha$] ($\alpha \in \{1, 2\}$) derived by the reductions from FBUT2[$F_\alpha$] given in [Day87] (see Table 17), in polynomial time we can find a canonical solution W' of cost $c' \leq c$.

Proof:

Proof of (1): If $c > 2|X|$, then replace W by the tree W' in which each member $x \in X$, $x = \{v_i, v_j\}$, is connected to $\emptyset$ by edges $\{\{v_i, v_j\}, \{v_i\}\}$ and $\{\{v_i\}, \emptyset\}$; this tree is canonical, and has cost $c' < c$. Otherwise, create W' by trimming W to remove all leaf vertices not in X, and applying the tree transformations in [DJS86, Lemma 1] to the leaf farthest from $\emptyset$ until the tree is canonical. These tree transformations do not increase tree cost; moreover, as each such transformation removes at least one non-canonical vertex and there can be at most $c - (|X| + 1)$ such vertices, this algorithm is polynomial time.

Proofs of (2 - 3): Analogous to that for (1), using the tree transformations in [DJS86, Theorem 3] (Dollo) and [DS87, Lemma 2 and Theorem 3] (Chromosome Inversion).

Proof of (4): Create W' as follows: if there are partitions that group vertices not connected by edges in the created graph G, then break all such partitions into partitions that only group vertices connected by edges in G. For each group of vertices corresponding to a subgraph $G_c$, if the "corners" $\{x_{c,1}, x_{c,2}, x_{c,3}\}$ are all included in partitions of $G_c$, then replace the set of partitions for the vertices of $G_c$ with the four triangle-partitions in equation 6; else, replace with the three triangle-partitions in equation 7 and the appropriate subset of single-vertex partitions drawn from the set $\{x_{c,1}, x_{c,2}, x_{c,3}\}$. These transformations do not increase the cost, as these triangle-partitions are optimal under the $F_1$ statistic (see Section 3.2.3); moreover, as there are only $|C|$ such groups to transform, the algorithm is polynomial time.

Proof of (5): Create tree W' by applying the tree transformations in [Day87, Proposition 3] to W to create a discretized additive tree W'' with an ultrametric subtree $W_U$, and then applying the transformation in [Day87, Proposition 4] to reduce $W_U$ to an ultrametric subtree of height 2. These transformations do not increase the cost of the tree, and can be performed in polynomial time. ∎

Theorem 45 The following hold:

1. SOL-MIN-VERTEX COVER-B $\leq_L$ SOL-MIN-UBCCS and SOL-MIN-UBQCS.

2. SOL-MIN-VERTEX COVER-B $\leq_L$ SOL-MIN-UBCDo and SOL-MIN-UBQDo.

3. SOL-MIN-VERTEX COVER-B $\leq_L$ UBW.

4. SOL-MIN-VERTEX COVER-B $\leq_L$ SOL-MIN-UBCCI and SOL-MIN-UBQCI.

5. SOL-MIN-VERTEX COVER-B $\leq_L$ SOL-MIN-UBGe.

6. SOL-MAX-CLIQUE $\leq_L$ SOL-MAX-BCC.

7. SOL-MAX-BCC $\leq_L$ SOL-MAX-BQC.

8. SOL-MIN-VERTEX COVER-B $\leq_L$ SOL-MIN-FUGT[$\geq$].

9. SOL-MAX-X3C-B $\leq_L$ SOL-MIN-FBUT2[$F_1$].

10. SOL-MIN-FBUT2[$F_\alpha$] $\leq_L$ SOL-MIN-FUDT[$F_\alpha$] ($\alpha \in \{1, 2\}$).


Proof:

Proof of (1): Consider the reduction from VC to CS (= UBCCS and UBQCS) given in [DJS86] (see Table 9). As $OPT_{VC_B} \geq |E|/B$ and $OPT_{CS} \leq 2|E|$, then $OPT_{CS} \leq 2B\, OPT_{VC_B}$, satisfying condition (L1) with $\alpha = 2B$. As both problems are minimization problems, condition (L2) can be rewritten as $c_{VC_B} \leq OPT_{VC_B} + \beta(c_{CS} - OPT_{CS})$. For any canonical solution for UBCCS, $c_{VC_B} = c_{CS} - |E|$; moreover, such a solution is guaranteed by Part 1 of Lemma 44. Setting $\beta = 1$ makes condition (L2) equivalent to $c_{VC_B} \leq c_{CS} - |E|$. Hence, this reduction is an L-reduction.
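The inequality $OPT_{VC_B} \geq |E|/B$ used in the proof of (1) holds because each vertex of degree at most B can cover at most B edges. The sketch below checks this bound, and the resulting (L1) inequality $2|E| \leq 2B\, OPT_{VC_B}$, on small hypothetical graphs by brute force. The graphs, function names, and exponential-time search are illustrative assumptions and play no part in the reduction of [DJS86].

```python
from itertools import combinations


def min_vertex_cover_size(vertices, edges):
    """Size of a smallest vertex cover, found by exhaustive search."""
    for k in range(len(vertices) + 1):
        for cover in combinations(vertices, k):
            chosen = set(cover)
            if all(u in chosen or v in chosen for u, v in edges):
                return k
    return len(vertices)


def max_degree(vertices, edges):
    """Largest number of edges incident to any single vertex (the bound B)."""
    return max(sum(1 for e in edges if v in e) for v in vertices)


if __name__ == "__main__":
    # Two disjoint stars, each with a centre joined to three leaves.
    vertices = ["a", "b1", "b2", "b3", "c", "d1", "d2", "d3"]
    edges = [("a", "b1"), ("a", "b2"), ("a", "b3"),
             ("c", "d1"), ("c", "d2"), ("c", "d3")]
    opt = min_vertex_cover_size(vertices, edges)
    b = max_degree(vertices, edges)
    print(opt, b, opt >= len(edges) / b, 2 * len(edges) <= 2 * b * opt)
```

On the two-star graph both inequalities are tight, which shows that the constant $\alpha = 2B$ cannot be lowered by this counting argument alone.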

Proof of (2): Consider the reduction from VC to Do (= UBCDo and UBQDo) given in [DJS86] (see Table 9). As $OPT_{VC_B} \geq |E|/B$, $OPT_{Do} \leq 3|V| + 2|E|$, and $|V| \leq |E|$, then $OPT_{Do} \leq 5B\, OPT_{VC_B}$, satisfying condition (L1) with $\alpha = 5B$. The remainder of the proof follows that for (1), substituting the appropriate part of Lemma 44 to obtain $\beta = 1$.

Proofs of (3 - 5): The proof for (3) is identical to (1). The proof of (4) is a variant of that for (2) which uses the reduction given in Table 10 to yield an L-reduction with $\alpha = 5B$ and $\beta = 1$. As the Generalized parsimony criterion can simulate any ordered phylogenetic parsimony problem, (5) can be proved by a variant on any of the proofs for (1 - 4).

Proofs of (6 - 7): By the reductions given in Table 14, solutions to SOL-MAX-BCC (SOL-MAX-BQC) yield solutions to SOL-MAX-CLIQUE (SOL-MAX-BCC) of the same cost. Hence, these reductions yield L-reductions with $\alpha = \beta = 1$.

Proof of (8): The reduction given in [BP89] which shows the MAX SNP-hardness of SOL-MIN-STEINER TREE IN GRAPHS is actually from SOL-MIN-VERTEX COVER-B to SOL-MIN-STEINER(1,2), a version of SOL-MIN-STEINER TREE IN GRAPHS whose input is complete graphs with edge-lengths $\in \{1, 2\}$. However, SOL-MIN-STEINER(1,2) is a subproblem of SOL-MIN-FUGT[$\geq$]. As all solutions to any instance of SOL-MIN-STEINER(1,2) will satisfy the dominance condition, SOL-MIN-VERTEX COVER-B L-reduces to SOL-MIN-FUGT[$\geq$] with $\alpha = 2B$ and $\beta = 1$.

Proof of (9): Consider the reduction from X3C to FBUT2[$F_1$] given in [KM86] (see Table 17). Note that in a $B$-bounded instance of X3C, $3(B-1)+1$ is the maximum number of 3-sets that can share one of the values of a particular 3-set. Hence, the selection of any 3-set can prevent the selection of at most $3(B-1)$ other 3-sets in $C$; thus, $OPT_{X3C\text{-}B} \ge |C|/(3(B-1)+1)$. As $OPT_{FBUT2[F_1]} \le |E|$ and $|E| = 21|C|$, we have $OPT_{FBUT2[F_1]} \le 21(3(B-1)+1)\,OPT_{X3C\text{-}B}$, satisfying condition (L1) with $\alpha = 21(3(B-1)+1)$. As X3C is a maximization problem and FBUT2[$F_1$] is a minimization problem, condition (L2) can be rewritten as $c_{X3C\text{-}B} \ge OPT_{X3C\text{-}B} - \beta\,(c_{FBUT2[F_1]} - OPT_{FBUT2[F_1]})$. For any canonical solution for FBUT2[$F_1$], $c_{X3C\text{-}B} = (|E| - c_{FBUT2[F_1]})/3 - 3|C|$; moreover, such a solution is guaranteed by Part 4 of Lemma 44. Setting $\beta = 1/3$ makes condition (L2) equivalent to $c_{X3C\text{-}B} \ge (|E| - c_{FBUT2[F_1]})/3 - 3|C|$. Hence, this reduction is an L-reduction.
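The last step of the proof of (9) can be written out explicitly. The following is only a sketch: it assumes that the canonical-solution identity also holds at the optimum, i.e. that $OPT_{X3C\text{-}B} = (|E| - OPT_{FBUT2[F_1]})/3 - 3|C|$, which is implied but not stated in the proof.

```latex
\begin{align*}
c_{X3C\text{-}B}
  &\ge OPT_{X3C\text{-}B}
     - \tfrac{1}{3}\bigl(c_{FBUT2[F_1]} - OPT_{FBUT2[F_1]}\bigr) \\
  &= \frac{|E| - OPT_{FBUT2[F_1]}}{3} - 3|C|
     - \frac{c_{FBUT2[F_1]}}{3} + \frac{OPT_{FBUT2[F_1]}}{3} \\
  &= \frac{|E| - c_{FBUT2[F_1]}}{3} - 3|C| .
\end{align*}
```

Under that assumption the two $OPT_{FBUT2[F_1]}$ terms cancel, and a canonical solution meets the resulting bound with equality.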

Proof of (10): Consider the reduction from FBUT2[$F_\alpha$] to FUDT[$F_\alpha$], $\alpha \in \{1,2\}$, given in [Day87] (see Table 17). As $OPT_{FBUT2[F_\alpha]} = OPT_{FUDT[F_\alpha]}$, condition (L1) is satisfied with $\alpha = 1$. As both problems are minimization problems, condition (L2) can be rewritten as $c_{FBUT2[F_\alpha]} \le OPT_{FBUT2[F_\alpha]} + \beta\,(c_{FUDT[F_\alpha]} - OPT_{FUDT[F_\alpha]})$. For any canonical solution to FUDT[$F_\alpha$], $c_{FUDT[F_\alpha]} = c_{FBUT2[F_\alpha]} + Y_\alpha + Z_\alpha$, where $Y_\alpha = 0$ and $Z_\alpha \ge 0$ [Day87, p. 465]; moreover, such a solution is guaranteed by Part 5 of Lemma 44. Setting $\beta = 1$ makes condition (L2) equivalent to $c_{FBUT2[F_\alpha]} \le c_{FUDT[F_\alpha]}$. Hence, this reduction is an L-reduction. ∎
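Conditions (L1) and (L2), as they are used in the proofs above, can be checked mechanically for given cost values. The sketch below is illustrative only: the function names are hypothetical, and the numbers are made up to have the shape used in the proof of (10) (equal optima, target cost equal to source cost plus a non-negative term `z`), not taken from any real instance.

```python
from fractions import Fraction


def satisfies_l1(opt_source, opt_target, alpha):
    """(L1): the optimum of the target instance is at most alpha times the source optimum."""
    return opt_target <= alpha * opt_source


def satisfies_l2(opt_source, cost_source, opt_target, cost_target, beta):
    """(L2): the source error is at most beta times the target error."""
    return abs(opt_source - cost_source) <= beta * abs(opt_target - cost_target)


# Hypothetical values shaped like the proof of (10).
opt_source = opt_target = 10          # both optima coincide
cost_source = 12                      # cost of the recovered source solution
z = 3                                 # non-negative extra term in the target cost
cost_target = cost_source + z

l1_holds = satisfies_l1(opt_source, opt_target, alpha=1)
l2_holds = satisfies_l2(opt_source, cost_source, opt_target, cost_target, beta=1)
l2_too_tight = satisfies_l2(opt_source, cost_source, opt_target, cost_target,
                            beta=Fraction(1, 3))
```

With these values both conditions hold for $\alpha = \beta = 1$, while the smaller constant $\beta = 1/3$ is too tight for the same pair of solutions.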

The arithmetic equivalence rcdudions from F'BllT[Fd t.o FBll'l'~l/·~1] and

FBUT[F1 , 2::} to FBUT2[F2, 2::) given in Sc•diou a.~. :\ m<• L-n·cluct.ious wit.h n =

$\beta = 1$. However, the reduction from FUDT[$F_1$] to FUDT[$F$] does not seem to be an L-reduction; though condition (L1) is satisfied ($\alpha = 100$), condition (L2) does not hold under any constant $\beta$.

Corollary 46 No phylogenetic inference optimal-cost solution problem examined in this thesis (excluding SOL-MIN-FUDT[$F$]) has a PTAS unless P = NP.

Less dramatic but nonetheless intriguing PTAS non-approximability results can be derived using Theorem 38. A PTAS for an OptP[$f(n)$]-complete evaluation problem $X$ implies that there exists for each $\epsilon$, $0 < \epsilon \le 1$, and all instances $I$ of $X$, a polynomial-time algorithm $A$ such that $\epsilon \ge \frac{|A_X(I) - OPT_X(I)|}{OPT_X(I)} \ge \frac{|A_X(I) - OPT_X(I)|}{2^{f(|I|)}}$. By Theorem 38, there exists an $\epsilon > 0$ such that $\frac{|A_X(I) - OPT_X(I)|}{2^{f(|I|)}} \ge \frac{2^{f(|I|^{\epsilon})}}{2^{f(|I|)+1}}$ infinitely often for each such $A$ unless P = NP. For certain $f$ this lower bound is a positive-valued function, which implies that polynomial-time algorithms for certain $\epsilon$, and hence PTAS, do not exist for $X$.

Theorem 47 If a smooth function $f(n) \in O(\log n)$ is such that $g(n) = \frac{2^{f(n^{\epsilon})}}{2^{f(n)+1}}$ and $\lim_{n \to \infty} g(n) > 0$ for all $\epsilon > 0$, then no OptP[$f(n)$]-complete problem has a PTAS unless P = NP.

Corollary 48 No OptP[$c \log\log n + O(1)$]-complete problem has a PTAS unless P = NP.

Though the relevant levels of the OptP hierarchy in these results are too low to be of consequence in this thesis, these are the first results which show that specific portions of the bounded NP query hierarchy are PTAS non-approximable.

5.4 Relative Approximability

Several of the phylogenetic inference problems examined in this thesis have relative approximations derived from approximation algorithms for related problems. STEINER TREE IN GRAPHS has a relative approximation of $1 - 2/|L|$, where $|L| \le |S|$ is the number of leaves in the optimal tree [KMB81]. The algorithm guaranteeing this approximation is given in Table 23. Note that the two crucial operations in this algorithm (finding the length of, and producing, a shortest path between two given vertices) can be done in polynomial time using standard shortest-path algorithms [CLR91, Section 26] on a character-by-character basis in an implicit graph, provided there are no restrictions on character-state transitions.
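The five steps of Algorithm H (Table 23) can be sketched in code for an explicitly given graph. This is a minimal illustration, not the implementation of [KMB81]: the function names and the adjacency-dictionary input format are assumptions made here, the graph is assumed to be connected with comparable vertex labels, and the implicit character-state graph discussed above is not modelled.

```python
import heapq
from itertools import combinations


def shortest_paths(adj, source):
    """Dijkstra: return (dist, prev) maps for shortest paths from source."""
    dist, prev, heap = {source: 0}, {}, [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in adj[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    return dist, prev


def minimal_spanning_tree(vertices, edges):
    """Kruskal: edges are (length, u, v) triples; ties are broken arbitrarily."""
    parent = {v: v for v in vertices}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.append((w, u, v))
    return tree


def algorithm_h(adj, terminals):
    """Return {(u, v): length} for an approximate Steiner tree spanning terminals."""
    terminals = set(terminals)
    # Step 1: complete graph G1 on S, weighted by shortest-path lengths in G.
    paths = {s: shortest_paths(adj, s) for s in terminals}
    g1 = [(paths[u][0][v], u, v) for u, v in combinations(sorted(terminals), 2)]
    # Step 2: minimal spanning tree T1 of G1.
    t1 = minimal_spanning_tree(terminals, g1)
    # Step 3: subgraph GS of G: expand each T1 edge into a shortest path in G.
    gs = set()
    for _, u, v in t1:
        prev = paths[u][1]
        while v != u:
            p = prev[v]
            gs.add((adj[p][v], min(p, v), max(p, v)))
            v = p
    # Step 4: minimal spanning tree TS of GS.
    ts = minimal_spanning_tree({x for _, u, v in gs for x in (u, v)}, gs)
    # Step 5: repeatedly delete edges that end in a non-terminal leaf.
    tree = {(u, v): w for w, u, v in ts}
    while True:
        degree = {}
        for u, v in tree:
            degree[u] = degree.get(u, 0) + 1
            degree[v] = degree.get(v, 0) + 1
        doomed = [e for e in tree
                  if any(degree[x] == 1 and x not in terminals for x in e)]
        if not doomed:
            return tree
        for e in doomed:
            del tree[e]


# Hypothetical example: terminals a, b, d joined through the Steiner vertex c.
graph = {
    "a": {"c": 1, "b": 3},
    "b": {"c": 1, "a": 3},
    "c": {"a": 1, "b": 1, "d": 1},
    "d": {"c": 1},
}
tree = algorithm_h(graph, {"a", "b", "d"})
```

In this small example the returned tree uses the three unit-length edges through `c`, which happens to be optimal; in general only the relative approximation stated above is guaranteed.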

Theorem 49 All non-reticulate Wagner Linear, Wagner General, and Fitch phylogenetic parsimony optimal-cost solution problems examined in this thesis have relative approximations of $1 - 2/|L|$.

The application of this algorithm to phylogenetic parsimony problems was discovered independently by Gusfield [Gus91, Theorem 2.1]. Indeed, the algorithm and result above also apply to the problem of constructing minimal-length trees on molecular sequences, as long as the function computing minimal evolutionary change (edit distance) between pairs of sequences is a metric [Gus91, Section 3]. Unfortunately, this algorithm does not seem applicable to other phylogenetic parsimony problems, as the proof that the ratio above holds depends on the existence of a path between each pair of vertices in $S$ in the implicit graph [KMB81, Theorem 1]. All such paths may not exist in cladistic problems; moreover, it is not obvious how a character-ordering and orientation could be chosen for a qualitative problem in polynomial time such that all required paths existed, let alone how such a character-ordering or orientation could be enforced in subsequent stages of the algorithm. SOL-MAX-CLIQUE has a relative approximation of $O(n / \log^2 n)$ [BH90] which, by the L-reductions in the last section, yields identical


Algorithm H:

Input: an undirected weighted graph $G = (V, E, d)$ and a set of vertices $S \subseteq V$.
Output: a Steiner tree, $T_H$, for $G$ and $S$.

Steps:

1. Construct the complete undirected weighted graph $G_1 = (V_1, E_1, d_1)$ from $G$ and $S$.

2. Find the minimal spanning tree, $T_1$, of $G_1$; if there are several minimal spanning trees, pick an arbitrary one.

3. Construct the subgraph, $G_S$, of $G$ by replacing each edge in $T_1$ by its corresponding shortest path in $G$; if there are several shortest paths, pick an arbitrary one.

4. Find the minimal spanning tree, $T_S$, of $G_S$; if there are several minimal spanning trees, pick an arbitrary one.

5. Construct a Steiner tree, $T_H$, from $T_S$ by deleting edges in $T_S$, if necessary, so that all leaves in $T_H$ are in $S$.

Table 23: A polynomial-time relative approximation algorithm for STEINER TREE IN GRAPHS (adapted from [KMB81])


approximations for all character compat.ihilit.y problems.

Theorem 50 All character compatibility optimal-cost solution problems examined in this thesis have relative approximations of $O(n / \log^2 n)$.

No relative approximations are known for any of the distance matrix fitting problems examined in this thesis, though there are relative approximations for related clustering problems; see [Day92] for a review of these results.

Theorem 43 actually states that MAX-3SAT has no relative approximation $\epsilon$ for some $\epsilon > 0$ [ALMSS92, Footnote, p. 7]; thus, by Corollary 36, the L-reductions in the previous section imply bounds on relative approximability as well. Unfortunately, values of $\epsilon$ derived to date using the construction in Theorem 43 imply only trivial lower bounds [Joh92, pp. 519-520]. Other estimates for these bounds may be derived from the best known relative approximations on SOL-MIN-VERTEX COVER-B and SOL-MAX-X3C-B:

• SOL-MIN-VERTEX COVER-B has re!Rt.ive Rppmximett.ioll (: = {O.:.ll'i , O.'l!i,

0.50, 0.56, 0.60, 0.64, 0.67, 0.69 I, 0. 71 } for :J $ H :5 II, aud r: $ 2( u1:~~- 1 ) -

1 for B > II [MS8:J].

• SOL~MAX~X3C-B has relative approximaticm r; = ( IJ- I), /J 2 :J IJ'B!)(),

Theorem :J].

The only nontrivial lower bound on relative approximability is for the character compatibility problems, and is based on a result from [FGLSS91] as improved by Arora et al. [ALMSS92]:


Theorem 51 ([ALMSS92], Theorem 5) There exists an $\epsilon > 0$ such that, if SOL-MAX-CLIQUE has a relative approximation of $n^{\epsilon}$, then P = NP.

Corollary 52 There exists an $\epsilon > 0$ such that, if any character compatibility problem examined in this thesis has a relative approximation of $n^{\epsilon}$, then P = NP.

As a final note, relative approximations are known for certain counting problems. By Theorem 3.1 of [Sto85], all #P functions $f(x)$ have approximations $f_{app}(x)$ in $F\Delta_3^p$ such that $(1 - \epsilon) f_{app}(x) < f(x) < (1 + \epsilon) f_{app}(x)$ for all polynomials $p$, where $\epsilon = 1/p(|x|)$; Theorem 7.1 of [KST89] extends this result to SpanP problems. Recall that an NP query can be simulated by an appropriate $\Sigma_2^p$ query.

Theorem 53 All phylogenetic inference optimal-cost, given-cost, and given-limit spanning problems examined in this thesis have relative approximations of $\epsilon$ in $F\Delta_3^p$, where $\epsilon = 1/p(|I|)$ for any polynomial $p$.

5.5 Approximability by Neural Networks

There has been much interest in recent years in computing approximate solutions to optimization problems using instance-specific neural networks [HT85a, HT85b]. In the discrete-time version of this model treated by Bruck and Goodman [BG90], a neural network is described by a set of two-state nodes $V$, a set of arcs with weights $W_{i,j}$ that specify the input from node $i$ to node $j$, and a state-change threshold value $T_i$ for each node. Let the state of node $i$ at time $t$ be $V_i(t)$, and let the state of the network at time $t$ be the vector $V(t)$. At each time $t$ after initialization, the states of each vertex $V_i$ in some subset $S \subseteq V$ are updated by

$$V_i(t+1) = \begin{cases} +1 & \text{if } \sum_{j=1}^{|V|} W_{j,i} V_j(t) \ge T_i \\ -1 & \text{otherwise} \end{cases} \qquad (13)$$

A state $V(t)$ is stable if $V(t) = V(t + 1)$. Such networks are always guaranteed to get to a stable state [BG90, pp. 130-131] which corresponds to some solution to the associated problem. Consider the following restricted class of such neural networks that have symmetric weights and satisfy the following properties [BG90, p. 132]:

• Each stable state corresponds to an optimal solution of the encoded instance $I$ of the associated problem $X$, and that solution can be derived from this state in polynomial time.

• The network's description is of size polynomial in $|I|$.

A problem $X$ is said to be solvable by a neural network if there exists an algorithm $A_X$ which can, for any instance of $X$, generate the corresponding neural network in polynomial time. Note that such a network may potentially take exponential time to reach a stable state.
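The update rule and the notion of a stable state can be made concrete with a small simulation. This is a sketch under stated assumptions rather than the construction of [BG90]: nodes are updated one at a time (a serial schedule, i.e. the updated subset is a single node at each step), the weight matrix is symmetric with a zero diagonal, and the function names and the three-node example are hypothetical.

```python
def next_state(weights, thresholds, state, i):
    """State of node i after one update: +1 if its weighted input reaches T_i, else -1."""
    total = sum(weights[j][i] * state[j] for j in range(len(state)))
    return 1 if total >= thresholds[i] else -1


def is_stable(weights, thresholds, state):
    """A state is stable if updating any node leaves it unchanged."""
    return all(next_state(weights, thresholds, state, i) == state[i]
               for i in range(len(state)))


def run_serial(weights, thresholds, state, max_sweeps=1000):
    """Update nodes one at a time until no node changes in a full sweep."""
    state = list(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(state)):
            new = next_state(weights, thresholds, state, i)
            if new != state[i]:
                state[i] = new
                changed = True
        if not changed:
            return state
    raise RuntimeError("no stable state reached within max_sweeps")


# Hypothetical three-node network with symmetric weights and zero thresholds.
weights = [
    [0, 1, -1],
    [1, 0, -1],
    [-1, -1, 0],
]
thresholds = [0, 0, 0]
final = run_serial(weights, thresholds, [1, -1, 1])
```

The `max_sweeps` cap is only a safeguard for the sketch; as noted above, the number of steps needed to reach a stable state is not polynomially bounded in general.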

Theorem 54 ([BG90], Proposition 1) If an NP-hard problem is solvable by a neural network then NP = co-NP.

Corollary 55 No phylogenetic inference optimal-cost, given-cost, or given-limit problem examined in this thesis can be solved by a neural network unless NP = co-NP.

Alternatively, each stable state can correspond to an approximate, rather than an optimal, solution. Many traditional proofs of approximability [HS78, GJ79] can be trivially modified to show that certain approximations by neural networks for NP-hard problems are not possible unless NP = co-NP [BG90, Yao92]. Indeed, any proof in which an optimal solution can be derived in polynomial time using a given type of approximate solution can be so modified. By analogy with PTAS, define a Polynomial-Time Neural Approximation Scheme (PTNAS) for a problem $X$ as an algorithm $A$ which, given an instance $I$ of $X$ and an integer $k$, $k > 0$, produces in polynomial time a neural network that produces solutions whose cost is within a factor of $k$ of optimal. The following results stated above can be rephrased in terms of approximability by neural networks:

Corollary 56 The following hold:

1. If there is a PTNAS for SOL-MAX-3SAT then NP = co-NP.

2. There does not exist a PTNAS for any MAX SNP-hard problem unless NP = co-NP.

3. There exists an $\epsilon > 0$ such that, if SOL-MAX-CLIQUE has a relative approximation of $n^{\epsilon}$ by a neural network, then NP = co-NP.


Proof: Proofs of (1) and (3) (sketch): Construct the exact-solution neural networks for every problem in NP from the assumed PTNAS for SOL-MAX-3SAT and SOL-MAX-CLIQUE by using the generic reductions from all languages in NP to MAX-3SAT and MAX-CLIQUE given in the original proofs in [ALMSS92, FGLSS91] as neural network description-encoding and solution-decoding functions. As NP-complete problems are by definition NP-hard, the results hold by Theorem 54.

Note that unlike the proofs given in [BG90, Yao92], these proofs do not involve deriving optimal-cost solution neural networks for SOL-MAX-3SAT and SOL-MAX-CLIQUE from their respective PTNAS.

Proof of (2): Note that L-reductions preserve PTNAS-approximability as well as PTAS-approximability. ∎

Corollary 57 No phylogenetic inference optimal-cost solution problem examined in this thesis has a PTNAS unless NP = co-NP.

A construction similar to that in Corollary 56 can also be used to show that no MAX SNP-hard problem has a randomized PTAS (RPTAS) [BS92, KL83], i.e. a PTAS which for each $\epsilon$, $0 < \epsilon < 1$, guarantees a solution with the required cost with at least probability $(1 - \epsilon)$, unless R = NP [Joh92, p. 519].

The results in this section apply only to the restricted class of neural networks considered in [BG90]. Less constrained types of neural networks may exist for these problems; for example, Jagota [Jag92] has designed asymmetric-weighted neural networks for MAX-CLIQUE that perform extremely well on average.

5.6 Summary

The known theoretical and algorithmic lower limits on approximability for the phylogenetic inference problems examined in this thesis are given in Table 24. Though the logic-formulation of approximability has produced the most dramatic results, the various theorems derived using the work of [GJ79, Kre88] should not be dismissed, as these theorems establish a tentative connection between various types of approximability and the levels of the function bounded NP query hierarchy. Though the correspondence is not exact ([Kre88, p. 492]; [CP91, p. 243]), there is a pattern of approximability and non-approximability (see Table 25). This pattern may assume greater significance in the light of future discoveries of lower limits on approximability.

The results above imply that polynomial-time algorithms whose approximation bounds hold over all instances do not exist for any phylogenetic inference optimal-cost solution problem for any of the closest types of bounds (i.e. absolute approximation, FPTAS, PTAS). These results do not invalidate either existing phylogenetic inference approximation algorithms or phylogenies produced by these algorithms - other kinds of fast approximation may be possible (e.g. asymptotic approximations, whose bounds hold for all but finitely many


                                        Approximability
                                        Theoretical Lower Limit                     Algorithmic Lower Limit

Phylogenetic       WL, WG, Fi           $\epsilon$ rel. app., $\epsilon > 0$        $(1 - 2/|L|)$ rel. app.
Parsimony          CS, Do, Cl, Ge       $\epsilon$ rel. app., $\epsilon > 0$

Character                               $n^{\epsilon}$ rel. app., $\epsilon > 0$    $O(n / \log^2 n)$ rel. app.
Compatibility

Distance Matrix    FUDT[$F$]            no $O(poly)$ abs. app., no FPTAS
Fitting            All Others           $\epsilon$ rel. app., $\epsilon > 0$

Table 24: Approximability of phylogenetic inference optimal-cost solution functions.

Ahsolut.<~ A pproximat.ions o(log n) o(poly) O(poly) FPTAS

F pNP(cloglogn+O(J)J X - - X F pNP(O(Iogn)) X X ., X FP,NP

F~Nr X X X '{

X X X J

X - Whole class is non-approximable. J - Members of class are approximable.

- Approximability not relevant.

P'I'AS X t J J J

Table 25: Non-approximability of various levels of the Function Bounded NP Query Hierarchy.

t Applies only to complete problems.


instances; subexponential-time approximation), and phylogenies derived from instances encountered in practice may be among those that are close to optimal. However, in any application such as phylogenetic inference in which the degree of optimality of approximate solutions is important (see Section 1), no approximation algorithm or solution produced by such an algorithm should be trusted until analysis has shown exactly how good an approximation that algorithm gives.


6 Conclusion

In this thesis, I have established a framework that incorporates all phylogenetic inference decision problems studied to date. Within this framework, I have derived various bounds on the evaluation, solution, spanning, enumeration, and random-generation versions of the optimal-cost, given-cost, and given-limit phylogenetic inference problems. I have also derived lower bounds on the approximability of phylogenetic inference solution problems. These results are summarized in Tables 20 and 24. These results show yet again that decision problems conceal many facets of the complexity of their underlying optimization problems. The complexity of more complex versions of optimization problems should be investigated not only to better assess the true difficulty of the underlying problems, but also because such complexities may have ramifications for how closely these problems can be approximated by fast algorithms.

Future directions for research are:

• Determining the precise complexity of phylogenetic inference evaluation and optimal-cost solution problems. If these problems are provably easier than $FP^{NP}$, more classes will need to be described between $FP^{NP[O(\log n)]}$ and $FP^{NP}$. Such a set of classes might belong to the function analogue of the hierarchy developed in [CS92].

• Determining the precise complexity of phylogenetic inference spanning and enumeration problems. This may be possible using classes from the hierarchies of functions defined in [Kre92a, Lad89, WagK86a, WagK86b].

• Finding algorithms with guaranteed relative approximations for the distance matrix fitting problems and the remainder of the phylogenetic parsimony problems. The latter may be possible by recently-developed algorithms that improve on that given in [KMB81]; see [BR91] and references.

• Deriving approximability results for phylogenetic inference given-limit and given-cost problems, based not on algorithms that guarantee solutions of a particular cost but algorithms that are either polynomial-time or correct on all but some polynomially-bounded subset of their instances. Such a framework is described in [SchU86, Section 3] and [BDG90, Section 6]. This framework is also applicable to optimal-cost solution functions.

Results from the growing literature on computational learning theory [Kea90, LV90a, M IJ.IHB] and the computational complexity of local search heuristics [JPY88, Yan90] may also be applicable to the further analysis of phylogenetic inference problems.


References

[ABG91] Amir, A., Beigel, R., and Gasarch, W. I. Some Connections between Bounded Query Classes and Non-Uniform Complexity. Manuscript, 1991.

[ALMSS92] Arora, S., Lund, C., Motwani, R., Sudan, M., and Szegedy, M. Proof Verification and Intractability of Approximation Problems. Manuscript, 1992.

[ADS86] Ausiello, G., D'Atri, A., and Sacca, D. Minimal Representation of Directed Hypergraphs. SIAM Journal on Computing, 15(2), 419-431, 1986.

[ANI90] Ausiello, G., Nanni, U., and Italiano, G. F. Dynamic Maintenance of Directed Hypergraphs. Theoretical Computer Science, 72, 97-117, 1990.

[Ax87] Ax, P. The Phylogenetic System: The Systematization of Organisms on the Basis of their Phylogenesis. Translated by R. P. S. Jeffries. John Wiley, New York, 1987.

[BDG88] Balcázar, J., Díaz, J., and Gabarró, J. Structural Complexity I. EATCS Monographs on Theoretical Computer Science no. 11. Springer-Verlag, Berlin, 1988.

[BDG90] Balcázar, J., Díaz, J., and Gabarró, J. Structural Complexity II. EATCS Monographs on Theoretical Computer Science no. 22. Springer-Verlag, Berlin, 1990.

[BFMY83] Beeri, C., Fagin, R., Maier, D., and Yannakakis, M. On the Desirability of Acyclic Database Schemes. Journal of the Association for Computing Machinery, 30(3), 479-513, 1983.

[Bei88] Beigel, R. NP-hard Sets are p-superterse unless R=NP. Technical Report 4, The Johns Hopkins University, Department of Computer Science, 1988.

[Bei91] Beigel, R. Bounded Queries to SAT and the Boolean hierarchy. Theoretical Computer Science, 84, 199-223, 1991.

[BS92] Berman, P. and Schnitger, G. On the Complexity of Approximating the Independent Set Problem. Information and Computation, 96, 77-94, 1992.

[BP89] Bern, M. and Plassmann, P. The Steiner Problem With Edge Lengths 1 and 2. Information Processing Letters, 32, 171-176, 1989.

[Ber73] Berge, C. Graphs and Hypergraphs. Translated by E. Minieka. North-Holland, Amsterdam, 1973.

[Ber85] Berge, C. Graphs. Second revised edition. North-Holland, Amsterdam, 1985.

[BR91] Berman, P. and Ramaiyer, V. Improved Approximations for the Steiner Tree Problem. Manuscript, 1991.

[BSS89] Blum, L., Shub, M., and Smale, S. On a Theory of Computation and Complexity over the Real Numbers: NP-completeness, Recursive Functions, and Universal Machines. Bulletin of the American Mathematical Society (New Series), 21(1), 1-46, 1989.

[BH90] Boppana, R. and Halldórsson, M. M. Approximating Maximum Independent Sets by Excluding Subgraphs. In J. R. Gilbert and R. Karlsson (eds.) SWAT 90. Lecture Notes in Computer Science no. 447, Springer-Verlag, Berlin, 1990. 13-25.

[BG90] Bruck, J. and Goodman, J. W. On the Power of Neural Networks for Solving Hard Problems. Journal of Complexity, 6, 129-135, 1990.

[BJY89] Bruschi, D., Joseph, D., and Young, P. A Structural Overview of NP Optimization Problems. In H. Djidjev (ed.) Optimal Algorithms: Proceedings of the International Symposium, Lecture Notes in Computer Science no. 401, Springer-Verlag, Berlin, 1989. 205-231.

[CS65] Camin, J. H. and Sokal, R. R. A Method for Deducing Branching Sequences in Phylogeny. Evolution, 19, 311-326, 1965.

[CS92] Castro, J. and Seara, C. Characterizations of Some Complexity Classes between $\Theta_2^p$ and $\Delta_2^p$. In A. Finkel and M. Jantzen (eds.) STACS '92: 9th Annual Symposium on Theoretical Aspects of Computer Science, Lecture Notes in Computer Science no. 577, Springer-Verlag, Berlin, 1992. 305-318.

[CF87] Cavender, J. A. and Felsenstein, J. S. Invariants of Phylogenies in a Simple Case with Discrete States. Journal of Classification, 4, 57-71, 1987.

[CE67] Cavalli-Sforza, L. L. and Edwards, A. W. F. Phylogenetic Analysis: Models and Estimation Procedures. American Journal of Human Genetics, 19, 233-257, 1967; see also Evolution, 21, 550-570, 1967.

[CT91] Chen, Z.-Z. and Toda, S. On the Complexity of Computing Optimal Solutions. Manuscript, 1991.

[CLR91] Cormen, T. H., Leiserson, C. E., and Rivest, R. L. Introduction to Algorithms. MIT Press, Cambridge, MA, 1991.

[CFS91] Crescenzi, P., Fiorini, C., and Silvestri, R. A Note on the Approximation of the MAX CLIQUE Problem. Information Processing Letters, 40(1), 1-5, 1991.

[CP91] Crescenzi, P. and Panconesi, A. Completeness in Approximation Classes. Information and Computation, 93, 241-262, 1991.

[Day83] Day, W. H. E. Computationally Difficult Parsimony Problems in Phylogenetic Systematics. Journal of Theoretical Biology, 103, 429-438, 1983.

[Day87] Day, W. H. E. Computational Complexity of Inferring Phylogenies from Dissimilarity Matrices. Bulletin of Mathematical Biology, 49(4), 461-467, 1987.

[Day88] Day, W. H. E. Class Notes, Computer Science 6758: Special Topics in Computer Applications, 1988.

[Day92] Day, W. H. E. Complexity Theory: An Introduction for Practitioners of Classification. In P. Arabie, G. de Soete, and L. J. Hubert (eds.) Clustering and Classification, World Scientific Publishing, Teaneck, NJ, 1992.

[DJS86] Day, W. H. E., Johnson, D. S., and Sankoff, D. The Computational Complexity of Inferring Rooted Phylogenies by Parsimony. Mathematical Biosciences, 81, 33-42, 1986.

[DS86] Day, W. H. E. and Sankoff, D. Computational Complexity of Inferring Phylogenies by Compatibility. Systematic Zoology, 35(2), 224-229, 1986.

[DS87] Day, W. H. E. and Sankoff, D. Computational Complexity of Inferring Phylogenies from Chromosome Inversion Data. Journal of Theoretical Biology, 124, 213-218, 1987.

[DT90] Díaz, J. and Torán, J. Classes of Bounded Nondeterminism. Mathematical Systems Theory, 23(1), 21-32, 1990.

[DRA92] Dorado, O., Rieseberg, L. H., and Arias, D. M. Chloroplast DNA Introgression in Southern California Sunflowers. Evolution, 46(2), 566-572, 1992.

[Duk85] Duke, R. Types of Cycles in Hypergraphs. Annals of Discrete Mathematics, 27, 399-418, 1985.

[EC80] Eldredge, N. and Cracraft, J. Phylogenetic Patterns and the Evolutionary Process: Method and Theory in Comparative Biology. Columbia University Press, New York, 1980.

[EJM76] Estabrook, G. F., Johnson, C. S., and McMorris, F. R. An Algebraic Analysis of Discrete Characters. Discrete Mathematics, 16, 141-147, 1976.

[EM77] Estabrook, G. F. and McMorris, F. R. When are Two Qualitative Taxonomic Characters Compatible? Journal of Mathematical Biology, 4, 195-200, 1977.

[EM80] Estabrook, G. F. and McMorris, F. R. When is One Estimate of Evolutionary Relationships a Refinement of Another? Journal of Mathematical Biology, 10, 367-373, 1980.

[Fag74] Fagin, R. Generalized First-order Spectra and Polynomial Time Recognizable Sets. In R. M. Karp (ed.) Complexity of Computations, SIAM-AMS Proceedings no. 7. American Mathematical Society, Providence, RI, 1974. 43-73.

[Fag83] Fagin, R. Degrees of Acyclicity for Hypergraphs and Relational Database Schemes. Journal of the Association for Computing Machinery, 30(3), 514-550, 1983.

[Far72] Farris, J. S. Estimating Phylogenetic Trees from Distance Matrices. American Naturalist, 106, 645-668, 1972.

[Far77] Farris, J. S. Phylogenetic Analysis under Dollo's Law. Systematic Zoology, 26, 77-88, 1977.

[Far78] Farris, J. S. Inferring Phylogenetic Trees from Chromosome Inversion Data. Systematic Zoology, 27, 275-284, 1978.

[Far83] Farris, J. S. The Logical Basis of Phylogenetic Analysis. In N. I. Platnick and V. A. Funk (eds.) Advances in Cladistics, Volume 2: Proceedings of the Second Meeting of the Willi Hennig Society, Columbia University Press, New York, 1983. 7-36.

[FGLSS91] Feige, U., Goldwasser, S., Lovász, L., Safra, M., and Szegedy, M. Approximating Clique is Almost NP-Complete. In Proceedings of the Thirty-Second Annual IEEE Symposium on the Foundations of Computer Science, IEEE Computer Society Press, Washington, D. C., 1991. 2-14.

[Fel81] Felsenstein, J. S. Evolutionary Trees from DNA Sequences: A Maximum Likelihood Approach. Journal of Molecular Evolution, 17, 368-376, 1981.

[Fel82] Felsenstein, J. S. How Can We Infer Geography and History from Gene Frequencies? Journal of Theoretical Biology, 96, 9-20, 1982.

[Fel88] Felsenstein, J. S. Phylogenies from Molecular Sequences: Inference and Reliability. Annual Review of Genetics, 22, 521-565, 1988.

[FHOS92] Fenner, S., Homer, S., Ogiwara, M., and Selman, A. L. On Using Oracles That Compute Functions. Manuscript, 1992. To appear in STACS '93: 10th Annual Symposium on Theoretical Aspects of Computer Science.


[Fit71] Fitch, W. M. Toward Defining the Course of Evolution: Minimal Change for a Specific Tree Topology. Systematic Zoology, 20, 406-416, 1971.

[Fun85] Funk, V. A. Phylogenetic Patterns and Hybridization. Annals of the Missouri Botanical Garden, 72, 681-715, 1985.

[FB90] Funk, V. A. and Brooks, D. R. Phylogenetic Systematics as the Basis of Comparative Biology. Smithsonian Institution Press, Washington, D. C., 1990.

[GW83] Galperin, H. and Wigderson, A. Succinct Representations of Graphs. Information and Control, 56, 183-198, 1983.

[GGJ77] Garey, M. R., Graham, R. L., and Johnson, D. S. The Complexity of Computing Steiner Minimal Trees. SIAM Journal on Applied Mathematics, 32(4), 835-859, 1977.

[GJ79] Garey, M. R. and Johnson, D. S. Computers and Intractability. W. H. Freeman, New York, 1979.

[Gas86] Gasarch, W. I. The Complexity of Optimization Functions. Technical Report no. 1652, University of Maryland, Department of Computer Science, 1986.

[Gas92] Gasarch, W. I. Personal communication, July 11, 1992.


[GKR91] Gasarch, W. I., Krentel, M. W., and Rappoport, K. J. OptP as the Normal Behavior of NP-Complete Problems. Manuscript, 1991. To appear in Mathematical Systems Theory.

[GF82] Graham, R. L. and Foulds, L. R. Unlikelihood That Minimal Phylogenies for a Realistic Biological Study Can Be Constructed in Reasonable Computational Time. Mathematical Biosciences,

[Gra81] Grant, V. Plant Speciation. Second edition. Columbia University Press, New York, 1981.

[Gus91] Gusfield, D. The Steiner Tree Problem, Historical Reconstruction, and Phylogeny. Manuscript, 1991. This is a revised version of Technical Report 332, Department of Computer Science, Yale University, 1984, by the same author.

[Gus93] Gusfield, D. Efficient Methods for Multiple Sequence Alignment with Guaranteed Error Bounds. Bulletin of Mathematical Biology, 55(1), 141-154, 1993.

[Hei90] Hein, J. Reconstructing Evolution of Sequences Subject to Recombination Using Parsimony. Mathematical Biosciences, 98, 185-200, 1990.


[HHSY91] Hemachandra, L., Hoene, A., Siefkes, D., and Young, P. On Sets Polynomially Enumerable by Iteration. Theoretical Computer Science, 80, 203-225, 1991.

[HP84] Hendy, M. D. and Penny, D. Cladograms Should Be Called Trees. Systematic Zoology, 33(2), 245-247, 1984.

[Hen66] Hennig, W. Phylogenetic Systematics. Translated by D. D. Davis and R. Zangerl. University of Illinois Press, Urbana, IL, 1966.

[HT85a] Hopfield, J. J., and Tank, D. W. Neural Computations of Decisions in Optimization Problems. Biological Cybernetics, 52, 141-152, 1985.

[HT85b] Hopfield, J. J., and Tank, D. W. Computing with Neural Circuits: A Model. Science, 233, 625-633, 1986.

[HS78] Horowitz, E. and Sahni, S. Fundamentals of Computer Algorithms. Computer Science Press, Rockville, MD, 1978.

[Hum83] Humphries, C. J. Primary Data in Hybrid Analysis. In N. I. Platnick and V. A. Funk (eds.) Advances in Cladistics, Volume 2: Proceedings of the Second Meeting of the Willi Hennig Society, Columbia University Press, New York, 1983. 89-103.


[Jag92] Jagota, A. Efficiently Approximating MAX-CLIQUE in a Hopfield-style Network. To appear in International Joint Conference on Neural Networks, IEEE Computer Society Press, Washington, D. C., 1992.

[JS71] Jardine, N. and Sibson, R. Mathematical Taxonomy. John Wiley, London, 1971.

[JVV86] Jerrum, M. R., Valiant, L. G., and Vazirani, V. V. Random Generation of Combinatorial Structures from a Uniform Distribution. Theoretical Computer Science, 43, 169-188, 1986.

[Joh74] Johnson, D. S. Approximation Algorithms for Combinatorial Problems. Journal of Computer and System Sciences, 9, 256-278, 1974.

[Joh90] Johnson, D. S. A Catalog of Complexity Classes. In J. van Leeuwen (ed.) Handbook of Theoretical Computer Science, Volume A: Algorithms and Complexity, MIT Press, Cambridge, MA, 1990. 67-161.

[Joh92] Johnson, D. S. The NP-completeness Column: An Ongoing Guide (23rd Edition). Journal of Algorithms, 13, 502-524, 1992.


[JPY88] Johnson, D. S., Papadimitriou, C. H., and Yannakakis, M. How Easy is Local Search? Journal of Computer and System Sciences, 37, 79-100, 1988.

[Kan91] Kann, V. Maximum Bounded 3-dimensional Matching is MAX SNP-complete. Information Processing Letters, 37, 27-35, 1991.

[Kar72] Karp, R. M. Reducibility Among Combinatorial Problems. In R. E. Miller and J. W. Thatcher (eds.) Complexity of Computer Computations, Plenum Press, New York, 1972. 85-103.

[KL83] Karp, R. M. and Luby, M. Monte-Carlo Algorithms for Enumeration and Reliability Problems. In Proceedings of the Twenty-Fourth Annual IEEE Symposium on the Foundations of Computer Science, IEEE Computer Society Press, Washington, D. C., 1983. 56-64.

[Kea90] Kearns, M. J. The Computational Complexity of Machine Learning. MIT Press, Cambridge, MA, 1990.

[KF69] Kluge, A. G. and Farris, J. S. Quantitative Phyletics and the Evolution of Anurans. Systematic Zoology, 18(1), 1-32, 1969.

[Ko91] Ko, K.-I. Complexity Theory of Real Functions, Birkhäuser, Boston, MA, 1991.

Page 172: CENTRE FOR NEWFOUNDLAND STUDIEScollections.mun.ca/PDFs/theses/Wareham_HaroldTodd.pdf · ON TilE COMPIJTATIONAL COMPLEXITY OF INFERRING EVOLUTIONARY TREES BY 0 II arold Todd Wareham

[Kob92] Köbler, J. Personal communication, July 23, 1992.

[KST89] Köbler, J., Schöning, U., and Torán, J. On Counting and Approximation. Acta Informatica, 26, 363-379, 1989.

[KT90] Kolaitis, P. G., and Thakur, M. N. Logical Definability of NP Optimization Problems. Technical report UCSC-CRL-90-48. University of California at Santa Cruz, Department of Computer and Information Sciences, 1990. To appear in Information and Computation.

[KT91] Kolaitis, P. G., and Thakur, M. N. Approximation Properties of NP Minimization Classes. In Proceedings of the Sixth Annual Conference on Structure in Complexity Theory. IEEE Computer Society Press, Washington, D. C., 1991. 353-366. To appear in Journal of Computer and System Sciences.

[KMB81] Kou, L., Markowsky, G., and Berman, L. A Fast Algorithm for Steiner Trees. Acta Informatica, 15, 141-145, 1981.

[Kre88] Krentel, M. W. The Complexity of Optimization Problems. Journal of Computer and System Sciences, 36(3), 490-509, 1988.

[Kre92a] Krentel, M. W. Generalizations of OptP to the Polynomial Hierarchy. Theoretical Computer Science, 97(2), 183-198, 1992.


[Kre92b] Krentel, M. W. Personal communication, July 13, 1992.

[Kri86] Křivánek, M. On the Computational Complexity of Clustering. In E. Diday, Y. Escoufier, L. Lebart, J. Pages, Y. Schektman, and R. Tomassone (eds.) Data Analysis and Informatics IV: Proceedings of the Fourth International Symposium on Data Analysis and Informatics, Elsevier Science (North-Holland), Amsterdam, 1986. 89-96.

[Kri88] Křivánek, M. The Complexity of Ultrametric Partitions on Graphs. Information Processing Letters, 27, 265-270, 1988.

[KM86] Křivánek, M. and Morávek, J. NP-Hard Problems in Hierarchical-Tree Clustering. Acta Informatica, 23, 311-323, 1986.

[Lad89] Ladner, R. E. Polynomial Space Counting Problems. SIAM Journal on Computing, 18(6), 1087-1097, 1989.

[Lak87] Lake, J. A. A Rate-Independent Technique for Analysis of Nucleic Acid Sequences: Evolutionary Parsimony. Molecular Biology and Evolution, 4(2), 167-191, 1987.

[Lat82] Lathrop, G. M. Evolutionary Trees and Admixture: Phylogenetic Inference When Some Populations are Hybridized. Annals of Human Genetics, 46, 245-255, 1982.

Page 174: CENTRE FOR NEWFOUNDLAND STUDIEScollections.mun.ca/PDFs/theses/Wareham_HaroldTodd.pdf · ON TilE COMPIJTATIONAL COMPLEXITY OF INFERRING EVOLUTIONARY TREES BY 0 II arold Todd Wareham

[Lee88] Lee, A. R. BLUDGEON: A Blunt Instrument for the Analysis of Contamination in Textual Traditions. In Y. Choueka (ed.) Computers in Linguistic and Literary Computing: Literary and Linguistic Computing 1988 Proceedings of the Fifteenth Annual Conference, Champion-Slatkine, Paris, 1990. 261-292.

[LW92] Lengauer, T. and Wagner, K. W. The Correlation between the Complexities of the Nonhierarchical and Hierarchical Versions of Graph Problems. Journal of Computer and System Sciences,

[LV90a] Li, M. and Vitanyi, P. M. B. Inductive Reasoning and Kolmogorov Complexity. Journal of Computer and System Sciences, 44, 343-384, 1992.

[LV90b] Li, M. and Vitanyi, P. M. B. Kolmogorov Complexity and Its Applications. In J. van Leeuwen (ed.) Handbook of Theoretical Computer Science, Volume A: Algorithms and Complexity, MIT Press, Cambridge, MA, 1990. 187-254.

[Lip92] Lipscomb, D. L. Parsimony, Homology, and the Analysis of Multistate Characters. Cladistics, 8(1), 45-65, 1992.

[LP85] Luckow, M. and Pimentel, R. A. An Empirical Comparison of Numerical Wagner Computer Programs. Cladistics, 1(1), 47-66,


1985.

[McD90] McDade, L. Hybrids and Phylogenetic Systematics I. Patterns of Character Expression in Hybrids and their Implications for Cladistic Analysis. Evolution, 44(6), 1685-1700, 1990.

[McM77] McMorris, F. R. On the Compatibility of Binary Qualitative Taxonomic Characters. Bulletin of Mathematical Biology, 39, 133-138, 1977.

[Mad91] Maddison, D. R. The Discovery and Importance of Multiple Islands of Most-Parsimonious Trees. Systematic Zoology, 40(3), 315-328, 1991.

[MRS92] Maddison, D. R., Ruvolo, M. and Swofford, D. L. Geographic Origins of Human Mitochondrial DNA: Phylogenetic Evidence from Control Region Sequences. Systematic Biology, 41(1), 111-124, 1992.

[ME85] Meacham, C. A. and Estabrook, G. F. Compatibility Methods in Systematics. Annual Review of Ecology and Systematics, 16,

[MS72] Meyer, A. R. and Stockmeyer, L. J. The Equivalence Problem for Regular Expressions with Squaring Requires Exponential Space. In the 13th Annual IEEE Symposium on Switching and Automata


Theory, IEEE Computer Society Press, Washington, D. C., 1972. 125-129.

[Mic82] Mickevich, M. F. Transformation Series Analysis. Systematic Zoology, 31(4), 461-478, 1982.

[MHJ89] Milosavljevic, A., Haussler, D., and Jurka, J. Informed Parsimonious Inference of Prototypical Genetic Sequences. In R. Rivest, D. Haussler, and M. K. Warmuth (eds.) COLT '89: Proceedings of the Second Annual Workshop on Computational Learning Theory, Morgan Kaufmann Publishers, San Mateo, CA, 1989. 102-117.

[MS83] Monien, B. and Speckenmeyer, E. Some Further Approximation Algorithms for the Vertex Cover Problem. In G. Ausiello and M. Protasi (eds.) CAAP '83. Lecture Notes in Computer Science no. 159. Springer-Verlag, Berlin, 1983. 341-349.

[MH90] Moritz, C. and Hillis, D. M. Molecular Systematics: Context and Controversies. In D. M. Hillis and C. Moritz (eds.) Molecular Systematics, Sinauer Associates, Sunderland, MA, 1990. 1-10.

[Mot92] Motwani, R. Lecture Notes on Approximation Algorithms: Part I. Manuscript, 1992.

[Nei87] Nei, M. Molecular Evolutionary Genetics. Columbia University Press, New York, 1987.


[Nel83] Nelson, G. Reticulation in Cladograms. In N. I. Platnick and V. A. Funk (eds.) Advances in Cladistics, Volume 2: Proceedings of the Second Meeting of the Willi Hennig Society, Columbia University Press, New York, 1983. 105-111.

[NP81] Nelson, G. and Platnick, N. I. Systematics and Biogeography: Cladistics and Vicariance. Columbia University Press, New York, 1981.

[Nig75] Nigmatullin, R. G. Complexity of the Approximate Solution of Combinatorial Problems. Doklady Akademii Nauk SSSR, 224, 289-292, 1975 (in Russian). English translation (incorporating author's corrections) in Soviet Mathematics Doklady, 16, 1199-1203, 1975.

[OM90] Orponen, P., and Mannila, H. On Approximation-Preserving Reductions: Complete Problems and Robust Measures. Manuscript, 1990.

[PR90] Panconesi, A. and Ranjan, D. Quantifiers and Approximation (Extended Abstract). In Proceedings of the 22nd ACM Symposium on Theory of Computing, ACM Press, Washington, D. C., 1990. 446-456.


[PY86] Papadimitriou, C. H. and Yannakakis, M. A Note on Succinct Representations of Graphs. Information and Control, 71, 181-185, 1986.

[PY91] Papadimitriou, C. H. and Yannakakis, M. Optimization, Approximation, and Complexity Classes. Journal of Computer and System Sciences, 43, 425-440, 1991.

[PHS92] Penny, D., Hendy, M. D., and Steel, M. A. Progress with Methods for Constructing Evolutionary Trees. Trends in Ecology and Evolution, 7(3), 73-79, 1992.

[Phi84] Phipps, J. B. Problems of Hybridity in the Cladistics of Crataegus (Rosaceae). In W. F. Grant (ed.) Plant Biosystematics. Academic Press, Toronto, 1984. 417-438.

[Pla89] Platnick, N. I. An Empirical Comparison of Microcomputer Parsimony Programs, II. Cladistics, 5(2), 145-161, 1989.

[PW76] Prager, E. M. and Wilson, A. C. Congruency of Phylogenies Derived from Different Proteins: A Molecular Analysis of the Phylogenetic Position of Cracid Birds. Journal of Molecular Evolution, 9, 45-57, 1976.

[Ric89] Richards, D. Fast Heuristic Algorithms for Rectilinear Steiner Trees. Algorithmica, 4, 191-207, 1989.


[Riv90] Rivest, R. L. Cryptography. In J. van Leeuwen (ed.) Handbook of Theoretical Computer Science, Volume A: Algorithms and Complexity, MIT Press, Cambridge, MA, 1990. 717-755.

[SC83] Sankoff, D. D. and Cedergren, R. J. Simultaneous Comparison of Three or More Sequences Related by a Tree. In D. Sankoff and J. B. Kruskal (eds.) Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison, Addison-Wesley, Reading, MA, 1983. 253-263.

[SchR86] Schoch, R. M. Phylogeny Reconstruction in Paleontology. Van Nostrand Reinhold, New York, 1986.

[SchU86] Schöning, U. Complexity and Structure. Lecture Notes in Computer Science no. 211, Springer-Verlag, Berlin, 1986.

[SchU90] Schöning, U. The Power of Counting. In A. L. Selman (ed.) Complexity Theory Retrospective, Springer-Verlag, Berlin, 1990. 204-223.

[Sel91] Selman, A. L. A Taxonomy of Complexity Classes of Functions. Technical report, State University of New York at Buffalo, Department of Computer Science, 1991. To appear in Journal of Computer and System Sciences. An abbreviated version appeared


in Bulletin of the European Association for Theoretical Computer Science, 45, 114-130, 1991.

[Sel92] Selman, A. L. Personal communication, July 6, 1992.

[SimH89] Simon, H. U. Continuous Reductions Among Combinatorial Optimization Problems. Acta Informatica, 26, 771-785, 1989.

[SimJ77] Simon, J. On the Difference between One and Many. In A. Salomaa and M. Steinby (eds.) Automata, Languages, and Programming - 4th Colloquium, Lecture Notes in Computer Science no. 52, Springer-Verlag, Berlin. 480-490.

[Sne75] Sneath, P. H. A. Cladistic Representation of Reticulate Evolution. Systematic Zoology, 24, 360-368, 1975.

[Sny92] Snyder, T. L. On the Exact Location of Steiner Points in General Dimension. SIAM Journal on Computing, 21(1), 163-180, 1992.

[StaC75] Stace, C. A. Hybridization. In C. A. Stace (ed.) Hybridization and the Flora of the British Isles, Academic Press, London, 1975. 1-90.

[StaT80] Standish, T. A. Data Structure Techniques. Addison-Wesley, Reading, MA, 1980.


[Sto77] Stockmeyer, L. J. The Polynomial Hierarchy. Theoretical Computer Science, 3, 1-22, 1977.

[Sto85] Stockmeyer, L. J. On Approximation Algorithms for #P. SIAM Journal on Computing, 14, 849-861, 1985.

[SSV92] Stoneking, M., Sherry, S. T., and Vigilant, L. Geographic Origin of Human Mitochondrial DNA Revisited. Systematic Biology, 41(3), 384-391, 1992.

[SO90] Swofford, D. L. and Olsen, G. J. Phylogeny Reconstruction. In D. M. Hillis and C. Moritz (eds.) Molecular Systematics, Sinauer Associates, Sunderland, MA, 1990. 411-501.

[Tho82] Thorpe, R. S. Reticulate Evolution and Cladism: Tests for the Direction of Evolution. Experientia, 38, 1242-1244, 1982.

[TW92] Toda, S. and Watanabe, O. Polynomial-time 1-Turing Reductions from #PH to #P. Theoretical Computer Science, 100(1), 205-221, 1992.

[Tor91] Torán, J. Complexity Classes Defined by Counting Quantifiers. Journal of the Association for Computing Machinery, 38(3), 753-774, 1991.

Page 182: CENTRE FOR NEWFOUNDLAND STUDIEScollections.mun.ca/PDFs/theses/Wareham_HaroldTodd.pdf · ON TilE COMPIJTATIONAL COMPLEXITY OF INFERRING EVOLUTIONARY TREES BY 0 II arold Todd Wareham

[Val76] Valiant, L. G. The Relative Complexity of Checking and Evaluating. Information Processing Letters, 5, 20-23, 1976.

[Val79a] Valiant, L. G. The Complexity of Enumeration and Reliability Problems. SIAM Journal on Computing, 8, 410-421, 1979.

[Val79b] Valiant, L. G. The Complexity of Computing the Permanent. Theoretical Computer Science, 8, 189-201, 1979.

[WagK86a] Wagner, K. W. The Complexity of Combinatorial Problems with Succinct Input Representations. Acta Informatica, 23, 325-356, 1986.

[WagK86b] Wagner, K. W. Some Observations on the Connection between Counting and Recursion. Theoretical Computer Science, 47, 131-147, 1986.

[WagK87] Wagner, K. W. More Complicated Questions about Maxima and Minima, and Some Closures of NP. Theoretical Computer Science, 51, 53-80, 1987.

[WagK88] Wagner, K. W. Bounded Query Computation. In the Proceedings of the Third Annual Conference on Structure in Complexity Theory, IEEE Computer Society Press, Washington, D. C., 1988. 260-277.


[WagK89] Wagner, K. W. Number-of-Query Hierarchies. Technical Report no. 4, Institut für Informatik, Bayerische Julius-Maximilians-Universität, Würzburg, 1989.

[WagK90] Wagner, K. W. Bounded Query Classes. SIAM Journal on Computing, 19(5), 833-846, 1990.

[WW86] Wagner, K. and Wechsung, G. Computational Complexity Theory.

[WagW68] Wagner Jr., W. H. Hybridization, Taxonomy, and Evolution. In V. H. Heywood (ed.) Modern Methods in Plant Taxonomy. Botanical Society of the British Isles Conference Report no. 10. Academic Press, London, 1968. 113-138.

[WagW80] Wagner Jr., W. H. Origin and Philosophy of the Groundplan-divergence Method of Cladistics. Systematic Botany, 5(2), 173

[Wil81] Wiley, E. O. Phylogenetics: The Theory and Practice of Phylogenetic Systematics. John Wiley, New York, 1981.

[WSBF91] Wiley, E. O., Siegel-Causey, D., Brooks, D. R., and Funk, V. A. The Compleat Cladist: A Primer of Phylogenetic Procedures. Special Publication no. 19. University of Kansas Museum of Natural History, Lawrence, Kansas, 1991.


[Win87] Winter, P. Steiner Problems in Networks: A Survey. Networks, 17, 129-167, 1987.

[Wra77] Wrathall, C. Complete Sets and the Polynomial-time Hierarchy. Theoretical Computer Science, 3, 23-33, 1977.

[Yan90] Yannakakis, M. The Analysis of Local Search Problems and their Heuristics. In C. Choffrut and T. Lengauer (eds.) STACS '90: 7th Annual Symposium on Theoretical Aspects of Computer Science, Lecture Notes in Computer Science no. 415, Springer-Verlag.

[Yao92] Yao, X. Finding Approximate Solutions to NP-hard Problems by Neural Networks is Hard. Information Processing Letters, 41(2), 93-98, 1992.


A Phylogenetic Systematics and the Inference of Reticulation

In this appendix, I will give a short review of various approaches to inferring reticulation, followed by a justification of the reticulate phylogenetic parsimony problem schemata defined in Section :t:U. For in-depth reviews of the topics in this appendix, see [Fun85, Gra81, SchR86, StaC75].

Reticulate events as described in Section 2.1 are part of biological evolution. Hybridization has occurred frequently in many groups of plants and less frequently among animals, notably in birds and fishes [Gra81, pp. :W:! :!II·I], and introgression, a form of recombination in which characteristics are passed via hybrids from one species to another, seems to occur with greater frequency than previously thought, especially among the cytoplasmic and nuclear genes in plants and animals (see [DRA92] and references). The evolutionary significance of such reticulation has been debated for decades; for instance, hybridization has been viewed as mere noise on the underlying substrate of dichotomous evolution [WagW68], as an important force in particular groups at particular times [Gra81, pp. 17!1-189], and as the dominant force in plant evolution [StaC75, Lotsy (1916) quoted on p. 24]. Regardless of such debate, reticulation and its inference is crucial to many investigators.

There are many traditional biological heuristics for recognizing individual instances of hybridization and recombination, based on the intermediate nature of the produced character-states and various attributes of the proposed hybrid and its parent species, e.g., geographical distribution, parental interfertility, and experimental re-creation of the hybrid [Gra81, StaC75]; some of these heuristics have been coded as numerical measures (hybrid indices) ([Gra81, pp. '2Qj-- ~f1J]; [StaC75, pp. 7·1 ~~]). Recently, algorithmic methods have been proposed for inferring hybridization under the compatibility [Sne75], maximum likelihood [Fel82, Lat82], and phylogenetic parsimony [Fun85, Hei90, Lee88, Nel83, Phi84, Tho82, WagW80] criteria. The focus in this appendix will be on those methods based on the phylogenetic parsimony criterion.

All known parsimony-based methods infer hybridization using the character conflict induced by hybrids. In a phylogenetic parsimony analysis, the theoretical lower limit on cost is that each character-state transition event occurs only once in a tree; the portion of a tree's cost above this theoretical minimum consists of additional hypotheses of character-state transition (homoplasy) which are required to explain character states that did not arise only once in that tree. The phylogenetic parsimony criterion, in preferring trees of minimum cost, minimizes homoplasy. When the possibility of error in character analysis has been ruled out, homoplasy is a sign that evolutionary processes not belonging to the single-transition, dichotomous-speciation model have occurred. Reticulation as defined in this thesis comprises one such set of processes.
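The relationship between tree cost, the theoretical minimum, and homoplasy described above can be sketched in a few lines of Python. The sketch is illustrative only and rests on assumptions that are not taken from the thesis: characters are treated as unordered, so a character with k observed states needs at least k - 1 transitions; the cost of a character on a fixed rooted binary tree is counted with the bottom-up pass of [Fit71]; and the names `fitch`, `homoplasy`, and the four-taxon data are hypothetical.

```python
from typing import Dict, List, Set, Tuple, Union

# A rooted binary tree: a leaf is a taxon name, an internal node is a pair.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, states: Dict[str, int]) -> Tuple[Set[int], int]:
    """Return (candidate root states, number of transitions) for one
    unordered character on a fixed tree, using a bottom-up pass."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    # Disjoint child sets force one extra transition at this node.
    return left_set | right_set, left_cost + right_cost + 1


def homoplasy(tree: Tree, characters: List[Dict[str, int]]) -> int:
    """Sum, over all characters, of the transitions beyond the minimum."""
    extra = 0
    for states in characters:
        _, cost = fitch(tree, states)
        minimum = len(set(states.values())) - 1  # each state arises once
        extra += cost - minimum
    return extra


# Hypothetical example: four taxa, two binary characters.
tree: Tree = (("A", "B"), ("C", "D"))
characters = [
    {"A": 0, "B": 0, "C": 1, "D": 1},  # agrees with the tree
    {"A": 0, "B": 1, "C": 0, "D": 1},  # conflicts with the tree
]
print(homoplasy(tree, characters))
```

On this data the first character needs one transition and the second needs two, so the tree carries one unit of homoplasy, all of it due to the conflicting character.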


Following [Fun85], all parsimony-based methods for inferring hybridization can be classified into three approaches, depending on how they deal with homoplasy.

1. Include reticulation implicitly via the homoplasy in the most parsimonious tree [NP81].

2. Identify and remove hybrid taxa before phylogenetic analysis, and introduce reticulation after phylogenetic analysis to accommodate these taxa on the basis of homoplasy in the most parsimonious tree [WagW80].

3. Include all taxa in the phylogenetic analysis, and introduce reticulation and hybrid taxa as necessary either during [Phi84] or after [Fun85, Lee88, Nel83, Tho82] analysis, on the basis of homoplasy.

Each of these approaches has intrinsic difficulties because reticulation can be characterized by a wide variety of character-state patterns, both within the produced taxa and within any non-reticulate tree including these taxa [Fun85, Hum83, McD90, StaC75]. Moreover, these approaches are not satisfactory for defining criterion-based problems because they are based on specific algorithms and heuristics (see Section 1).

There are no general difficulties with inferring reticulation using the parsimony
criterion: reticulations remove homoplasy by unifying the occurrence of seemingly
incompatible character-states into a single event, and can thus be ranked (as are
trees) by the decrease in homoplasy that they induce. However, there are several
specific difficulties.

1. As appropriate reticulation can represent any number of character-state
   transitions in one event, unbounded reticulation renders dichotomous
   speciation irrelevant and the phylogenetic parsimony criterion meaningless
   [NP81, pp. 217-218].

2. As homoplasy is used to justify the addition of reticulation, it is no longer
   possible to use homoplasy as a sign of possible error in character analysis
   and coding (the "self-illuminating" property of phylogenetic parsimony
   analysis [Wil81]).

3. It is much more difficult to infer phylogenies by hand using Hennigian
   argumentation [Wil81, WSBF91] or by algorithm when reticulation is allowed;
   moreover, the produced phylogenies cannot be readily used as the basis for
   hierarchical Linnean classifications of species.

The first two of these difficulties are actually guidelines for the formulation of
useful computational problems. By (1), a problem should only be able to infer
a limited amount of well-defined reticulation for a given instance, and this limit
should be under the control of the investigator. By (2), such a problem should
only be invoked after a non-reticulate analysis has been performed to detect
possible errors in character coding, and to determine if there is any homoplasy
that can be explained by reticulation. The difficulties in (3) are consequences
of searching for phylogenetic trees in a richer hypothesis-space, and must be
accepted if reticulate hypotheses are desirable.

The reticulate problem schemata defined in Section 3.3.1 satisfy the first
condition above, and a procedure patterned after that given in [Nel83], in which
reticulations are added one at a time to the most parsimonious tree such that
the homoplasy removed with each insertion is maximized, will satisfy the second.
Such a procedure using these schemata would differ from that in [Nel83] in that it
would be able to search over the whole space of available reticulate phylogenies,
not just those that can be reached by additions of reticulation to the most
parsimonious non-reticulate phylogeny, and may thus be able to find less obvious but
equally valid solutions. This procedure is not immune to the problems discussed
above of recognizing the patterns of homoplasy that imply reticulation, or the
possibility that the observed homoplasy may have other causes, e.g., multiple
speciation, ecological convergence, or the inclusion of ancestral taxa in the given taxa
[NP81, p. 265]. Moreover, this procedure is not so much a method for producing
phylogenetic trees as an aid for exploring the space of phylogenetic hypotheses.
However, this is consistent with the viewpoint that systematics does not so much
derive evolutionary history as obtain successively better approximations to it.
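The one-at-a-time insertion step described above can be sketched as a bounded greedy loop. This is only an illustration under stated assumptions: candidate reticulation events are opaque labels, the scoring function `homoplasy_removed` is supplied by the caller, and `max_events` stands for the investigator-controlled limit called for by guideline (1). None of these names come from [Nel83] or from this thesis, and the sketch does not search the full space of reticulate phylogenies.

```python
def add_reticulations(candidates, homoplasy_removed, max_events):
    """Greedily choose up to max_events reticulation events.

    candidates        : iterable of candidate events (any objects)
    homoplasy_removed : function(chosen_so_far, candidate) -> int, the further
                        homoplasy removed by adding candidate to chosen_so_far
    max_events        : upper bound on the number of events to insert
    """
    chosen = []
    remaining = list(candidates)
    while remaining and len(chosen) < max_events:
        # Score every remaining candidate against the current selection.
        gains = [(homoplasy_removed(chosen, c), i) for i, c in enumerate(remaining)]
        # Prefer the largest gain; break ties in favour of the earlier candidate.
        best_gain, best_index = max(gains, key=lambda g: (g[0], -g[1]))
        if best_gain <= 0:
            break  # no remaining event explains any further homoplasy
        chosen.append(remaining.pop(best_index))
    return chosen
```

A caller might, for instance, score a candidate by the number of homoplasious characters it explains that are not already explained by the events chosen so far.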

The beauty and power of these reticulate problem schemata is that they do not
depend on the precise structure of the permitted reticulation events. This allows
investigators to define reticulation events appropriate to their needs, and
renders the corresponding NP-completeness proofs for such problems trivial. Other
schemata may be derived by allowing weighted reticulations or
polynomially-bounded sets of forbidden reticulations.

The hypergraph formalism given in this thesis should be adopted to describe
reticulate events in phylogenetic systematics. Hyperarcs provide unified
representations of complex evolutionary phenomena. Moreover, such a formalism will
make the recognition and transfer of relevant results from other fields easier.
The NP-completeness results given in this thesis are one example. Of perhaps
more practical use would be the application of work done in database design
[ADS86, ANI90, BFMY83, Fag83] to algorithms for constructing reticulate
phylogenetic trees.

B The Computational Complexity of Phylogenetic Parsimony Problems Incorporating Explicit Graphs

Consider the following decision problem:

UNWEIGHTED BINARY WAGNER PARSIMONY WITH GRAPH ($UBW_G$)

Instance: Positive integer $d$; graph $G = (V, E)$, where $V = \{0,1\}^d$ and $E = \{\{u,v\} : u,v \in V$ and $u$ and $v$ differ in exactly one position$\}$; a subset $S$ of $\{0,1\}^d$; and a positive integer $B$.

Question: Is there a phylogeny satisfying the Wagner phylogenetic parsimony criterion that includes $S$ and has length at most $B$?
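To make the graph component of the instance concrete, the following sketch builds $G = (V, E)$ for a given $d$. Representing vertices as bit strings and edges as frozensets is a choice made for this illustration, and the function name `hypercube` is hypothetical; $d$ is assumed to be a positive integer as in the problem statement.

```python
def hypercube(d):
    """Return (V, E) for the d-dimensional binary hypercube, d >= 1.

    Vertices are d-character strings over {0, 1}; an edge joins two
    vertices that differ in exactly one position.
    """
    V = [format(i, "0{}b".format(d)) for i in range(2 ** d)]
    E = set()
    for v in V:
        for pos in range(d):
            flipped = "1" if v[pos] == "0" else "0"
            u = v[:pos] + flipped + v[pos + 1:]
            E.add(frozenset((u, v)))  # unordered pair, stored once
    return V, E
```

The graph has $2^d$ vertices and $d \cdot 2^{d-1}$ edges, so writing it out explicitly makes the instance exponentially large in $d$.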

This problem differs from problem $UBW$ defined in Section 3.3.1 by including the
$d$-dimensional graph explicitly in its instance. Both of these problems are in NP;
however, $UBW$ has been shown NP-complete [DJS86, GF82], and the complexity
of $UBW_G$ is unknown. The complexity of $UBW_G$ is of interest not only because
it has been used in proofs of NP-completeness [Day83], but also because it would
be interesting to know by exactly how much the exponential padding of the input
instance with $G$ reduces the complexity of $UBW$.

To this end, consider the following restrictions on a phylogenetic parsimony
problem $\Pi$: let $\Pi^{O(poly)}$ be the subproblem of $\Pi$ restricted to instances such that
$|S| \le p(d)$ for some polynomial $p$, and $\Pi^{O(exp)}$ be the subproblem of $\Pi$ restricted
to instances such that $c2^d \le |S|$ for some constant $c$, $c > 0$. The former restriction
highlights, and the latter isolates, the complexity introduced by padding. If the
complexity of $UBW_G$ cannot be determined directly, these restricted subproblems
may still give lower bounds.^a
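The effect of the two restrictions on instance size can be written out as follows. This is a sketch under an assumed encoding (each vertex written as a $d$-bit string, $G$ written out in full, and $B \le d2^d$ so that $B$ adds only $O(d)$ bits); the symbols $n$ and $n_G$ are introduced here for illustration only.

```latex
\begin{align*}
  n   &= \Theta(d\,|S|)  && \text{size of an instance without } G,\\
  n_G &= 2^{\Theta(d)}   && \text{size of the same instance with } G \text{ added},\\
  |S| \le p(d)      &\;\Longrightarrow\; n = d^{O(1)}, \quad n_G = 2^{n^{\Omega(1)}},\\
  c\,2^{d} \le |S|  &\;\Longrightarrow\; n = 2^{\Theta(d)}, \quad n_G = n^{O(1)}.
\end{align*}
```

Under the polynomial restriction the padding is exponential in the unpadded instance, whereas under the exponential restriction the two instance sizes are polynomially related.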

Consider the complexities of $UBW$ and $UBW_G$. As mentioned already, $UBW$
is NP-complete. If $UBW_G$ is NP-complete, then $UBW \le_m^p UBW_G$; such a
reduction is difficult to visualize, because it implies that a problem on dimension $d$ can
be mapped onto an equivalent problem of dimension $O(\log d)$. Alternatively, the
padding introduced by $G$ might yield polynomial algorithms via algorithms that
solve the problem STEINER TREE IN GRAPHS (see Section 3.2.1). However,
all known STG algorithms, including those restricted to $d$-dimensional graphs,
are linear in $|G|$ and exponential in $|S|$ [Sny92, Win87].

Consider now the complexities of $UBW^{O(exp)}$ and $UBW_G^{O(exp)}$. These problems
are computationally equivalent, i.e., $UBW_G^{O(exp)} \le_m^p UBW^{O(exp)}$ (discard $G$),
and $UBW^{O(exp)} \le_m^p UBW_G^{O(exp)}$ (add $G$, which can be constructed in time
linear in the size of an instance of $UBW^{O(exp)}$). Both of these problems reduce to
$UBW_G$; however, for reasons similar to those given above, it is not obvious that
they are either NP-complete or in P.
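The two reductions just described ("discard $G$" and "add $G$") can be sketched directly. The tuple layout of an instance and the function names are assumptions made for this illustration; vertices are represented as integers whose binary expansions are the $d$-bit strings.

```python
def cube_graph(d):
    """Vertices 0 .. 2^d - 1 read as d-bit strings; edges flip exactly one bit."""
    vertices = list(range(1 << d))
    edges = [(v, v ^ (1 << i)) for v in vertices for i in range(d)
             if v < v ^ (1 << i)]  # list each undirected edge once
    return vertices, edges


def add_graph(instance):
    """Map an instance (d, S, B) of the implicit problem to (d, G, S, B)."""
    d, S, B = instance
    return (d, cube_graph(d), S, B)


def discard_graph(instance):
    """Map an instance (d, G, S, B) of the explicit problem to (d, S, B)."""
    d, _G, S, B = instance
    return (d, S, B)
```

When $|S| \ge c2^d$ the input already lists a constant fraction of the $2^d$ vertices, so the cost of generating $G$ is comparable to the cost of reading the input; without that restriction `add_graph` may take time exponential in the size of its input.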

The complexity of the third pair of problems, $UBW^{O(poly)}$ and $UBW_G^{O(poly)}$,
is the most interesting. The reduction from VERTEX COVER given in [DJS86]
shows that $UBW^{O(poly)}$ is NP-complete.

^a Since this thesis was submitted to the referees, I have found out that the weighted version of $UBW_G$ (actually, the weighted version of $UBW_G^{O(exp)}$) has been shown to be NP-complete [<hl!;!)l, Section 6]. While this does not immediately affect the problems examined in this section, it may be a stimulus for further research.

Theorem 58 $UBW^{O(poly)} \le_m^p UBW_G^{O(poly)}$ if and only if P = NP.

Proof: The implication from right to left is trivial. The implication from left
to right follows by this construction: A reduction from an instance of $UBW^{O(poly)}$
to $UBW_G^{O(poly)}$ must map a polynomial number of vertices in a graph of
dimension $d$ into a polynomial number of vertices in a graph of logarithmically lower
dimension; the resulting instance is also an instance of $UBW^{O(poly)}$. Repeat this
process a polynomial number of times to produce an instance of dimension $O(1)$,
which can be solved in constant time. This yields a polynomial algorithm for
$UBW^{O(poly)}$, which implies that P = NP. ∎
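The logarithmic drop in dimension used in the proof can be spelled out as follows. Here $n$ is the size of the input instance, $q$ is a polynomial bounding the output size of the assumed reduction, and $G'$ and $d'$ are the graph and dimension of the output instance; these symbols are introduced for illustration and do not appear elsewhere in the thesis.

```latex
\begin{align*}
  n      &= d^{O(1)}                        && \text{since } |S| \le p(d),\\
  2^{d'} &\le |G'| \le q(n)                 && \text{since the output instance contains } G' \text{ explicitly},\\
  d'     &\le \log_2 q(n) = O(\log n) = O(\log d).
\end{align*}
```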

Corollary 59 If P $\neq$ NP then $UBW_G^{O(poly)}$ is not NP-complete.

All optimal solutions to instances of $UBW_G^{O(poly)}$ are of size polynomial in $d$ (see
Section 3.2.1), and hence of size polylogarithmic in the instance of $UBW_G^{O(poly)}$.
Thus, $UBW_G^{O(poly)}$ is in $\beta_{polylog}$, the class of decision problems requiring only
polylogarithmic nondeterminism, which is probably strictly contained between
P ($= \beta_{\log n}$) and NP ($= \bigcup_{k \ge 1} \beta_{n^k}$) [DT90, p. 22]. Moreover, by the
Dreyfus-Wagner STG algorithm ([Sny92, Section 2], [Win87, Section 1.2]), $UBW_G^{O(poly)}$ is
in $O(n^{O(\log n)})$.
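For small instances, the Dreyfus-Wagner dynamic program mentioned above can be sketched as follows. The sketch assumes, as the discussion of STEINER TREE IN GRAPHS suggests, that the length of a most parsimonious phylogeny for $S$ equals the length of a shortest tree connecting $S$ in $G$, and it uses the fact that shortest-path distance in $G$ is Hamming distance. Vertices are integers read as $d$-bit strings; the function names are hypothetical.

```python
def hamming(u, v):
    """Number of positions in which two vertices (ints read as bit strings) differ."""
    return bin(u ^ v).count("1")


def steiner_length(d, terminals):
    """Length of a shortest tree in the d-cube connecting all terminals."""
    terms = sorted(set(terminals))
    k, n = len(terms), 1 << d
    if k <= 1:
        return 0
    INF = float("inf")
    full = (1 << k) - 1
    # best[mask][v]: shortest tree joining the terminals in mask and vertex v
    best = [[INF] * n for _ in range(full + 1)]
    for i, t in enumerate(terms):
        for v in range(n):
            best[1 << i][v] = hamming(t, v)
    for mask in range(1, full + 1):
        if mask & (mask - 1) == 0:
            continue  # single-terminal masks were initialised above
        # Step 1: join two smaller trees at a common vertex u.
        joined = [INF] * n
        sub = (mask - 1) & mask
        while sub:
            rest = mask ^ sub
            for u in range(n):
                cost = best[sub][u] + best[rest][u]
                if cost < joined[u]:
                    joined[u] = cost
            sub = (sub - 1) & mask
        # Step 2: attach v to the joined tree by a shortest path from u.
        for v in range(n):
            best[mask][v] = min(joined[u] + hamming(u, v) for u in range(n))
    return min(best[full])


def ubw_g(d, S, B):
    """Decide the sketched problem: is there a connecting tree of length at most B?"""
    return steiner_length(d, S) <= B
```

The table has $2^{|S|}$ rows of $2^d$ entries each, so the running time is polynomial in $|G|$ but exponential in $|S|$, in line with the behaviour of STG algorithms noted earlier.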

There is even circumstantial evidence that $UBW_G^{O(poly)}$ is in P. An encoding
of a graph $G = (V, E)$ is succinct if it is of size polylogarithmic in $|V|$ [GW83]; an
example of such an encoding is a polylogarithmically sized circuit that computes
the adjacency matrix for $G$. Though $UBW^{O(poly)}$ does not explicitly incorporate
an encoding of $G$, its problem instances will always be of size polylogarithmic
in $G$; hence, $UBW^{O(poly)}$ can be considered as the succinctly encoded version of
$UBW_G^{O(poly)}$. In general (cf. [LW92]), succinct encodings precisely exponentiate
the time complexity of graph problems, e.g., the succinct version of the trivial
graph property existence of a triangle is NP-hard [GW83, Theorem 2.1], and
the succinct version of the NP-complete problem 3-COLORABILITY is
NEXP-complete [PY86, Corollary]. If a problem $\Pi$ is P-hard via a certain type of
reduction called a projection from the Circuit Value Problem, then the succinct
encoding version of $\Pi$ is EXP-hard [PY86, p. 184]; if, in turn, the succinct
encoding version of $\Pi$ is NP-complete, then P $\neq$ NP.

Corollary 60 If $UBW_G^{O(poly)}$ is P-hard via a projection from the Circuit Value Problem, then P $\neq$ NP.

Many classical polynomial-time reductions can be easily made into projections
[PY86, p. 182]; this may also be true of the log-time reductions used to establish
P-hardness. As P $\neq$ NP probably cannot be proved in our standard system of
logic [GJ79, p. 186], it is unlikely that $UBW_G^{O(poly)}$ can be proved to be P-hard,
and likely that it is in P.
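The notion of a succinct encoding used in this argument can be illustrated for the hypercube itself. In the sketch below, a short adjacency rule stands in for the small circuit of the definition, and the full adjacency matrix is generated from it only for comparison; the function names are hypothetical and vertices are integers read as $d$-bit strings.

```python
def adjacent(u, v):
    """Adjacency test for the d-cube that never stores the graph.

    Two vertices are adjacent exactly when their bit strings differ in one
    position, i.e. when u XOR v has a single bit set.
    """
    diff = u ^ v
    return diff != 0 and diff & (diff - 1) == 0


def adjacency_matrix(d):
    """Expand the rule above into the explicit 2^d x 2^d adjacency matrix."""
    n = 1 << d
    return [[1 if adjacent(u, v) else 0 for v in range(n)] for u in range(n)]
```

The rule reads only the $2d$ bits of its two arguments, while the matrix it describes has $4^d$ entries; this gap is the exponential difference between the succinct and full-graph versions of a problem.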

[Figure 13 diagram: nodes $UBW^{O(poly)}$, $UBW$, $UBW_G$ and $UBW_G^{O(poly)}$, placed against the classes NP-C, NPI, P and NP.]

Figure 13: Reductions among implicit and explicit graph Unweighted Binary Wagner parsimony decision problems. Reductions $\Pi \le_m^p \Pi'$ are denoted by arrows from $\Pi$ to $\Pi'$. The abbreviations NP-C and NPI stand for the classes NP-complete and NP-intermediate (= NP - (P $\cup$ NP-C)), respectively.

The known relations among problems examined in this section are summarized
in Figure 13. I conjecture that $UBW_G^{O(poly)}$ is in P and that $UBW^{O(exp)}$,
$UBW_G^{O(exp)}$, and $UBW_G$ are all strictly contained between P and NP-complete.
To my knowledge, $UBW_G^{O(poly)}$ and $UBW^{O(poly)}$ are the only problem-pair such
that the complexity of the succinct encoding version is known but the complexity
of the full graph version is unknown. This in itself makes them candidates for
further research.
