Sober, E. and Steel, M. (2014). Time and Knowability in Evolutionary Processes

Embed Size (px)

Citation preview

  • 8/11/2019 Sober, E. and Steel, M. (2014). Time and Knowability in Evolutionary Processes

    1/23

    Time and Knowability in Evolutionary ProcessesAuthor(s): Elliott Sober and Mike Steel

    Source: Philosophy of Science, Vol. 81, No. 4 (October 2014), pp. 558-579Published by: The University of Chicago Presson behalf of the Philosophy of Science AssociationStable URL: http://www.jstor.org/stable/10.1086/677954.

    Accessed: 26/09/2014 15:47

    Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at.http://www.jstor.org/page/info/about/policies/terms.jsp

    .JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of

    content in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms

    of scholarship. For more information about JSTOR, please contact [email protected].

    .

    The University of Chicago Pressand Philosophy of Science Associationare collaborating with JSTOR to

    digitize, preserve and extend access to Philosophy of Science.

    http://www.jstor.org

    This content downloaded from 128.237.236.236 on Fri, 26 Sep 2014 15:47:41 PMAll use subject to JSTOR Terms and Conditions

    http://www.jstor.org/action/showPublisher?publisherCode=ucpresshttp://www.jstor.org/action/showPublisher?publisherCode=psahttp://www.jstor.org/stable/10.1086/677954?origin=JSTOR-pdfhttp://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/stable/10.1086/677954?origin=JSTOR-pdfhttp://www.jstor.org/action/showPublisher?publisherCode=psahttp://www.jstor.org/action/showPublisher?publisherCode=ucpress
  • 8/11/2019 Sober, E. and Steel, M. (2014). Time and Knowability in Evolutionary Processes

    2/23

    Time and Knowability in

    Evolutionary Processes

    Elliott Sober and Mike Steel*y

    Historical sciences like evolutionary biology reconstruct past events by using the traces

    that the past has bequeathed to the present. Markov chain theory entails that the passageof time reduces the amount of information that the present provides about the past. Here

    we use a Moran process framework to show that some evolutionary processes destroyinformation faster than others. Our results connect with Darwins principle that adaptive

    similarities provide scant evidence of common ancestry whereas neutral and deleterious

    similarities do better. We also describe how the branching in phylogenetic trees affects

    the information that the present supplies about the past.

    1. Introduction. What is the epistemic relation of present to past? Absent atime machine, we are trapped in the present and must rely on present tracesto learn about the past. There are memory traces inside the skull, but outsidethere are tree rings, fossils, and traces of other kinds. People use these tracesto reconstruct the past. Sometimes they simply assume that the traces pro-vide unerring information about the past, but often they realize that the jumpfrom present to past is subject to error. A bevy of epistemic concepts can

    be pressed into service to investigate the relation of present traces to pastevents, ranging from strong concepts like knowledge and certainty to more

    modest ones like justi

    ed belief and evidence.

    Received February 2014; revised April 2014.

    *To contact the authors, please write to: Elliott Sober, Philosophy Department, University of

    Wisconsin, Madison, WI, USA; e-mail: [email protected]. Mike Steel, Biomathematics Re-

    search Centre, University of Canterbury, Christchurch, New Zealand.yElliott Sober presented this paper in 2012 at the University of Bordeaux, the London School

    of Economics, and the Institute for Mathematical Philosophy at the Ludwig Maximilian

    University in Munich, and received valuable comments. We are grateful for these and also

    to Elchanan Mossel for helpful comments. Elliott Sober thanks the William F. Vilas Trust of

    the University of WisconsinMadison, and Mike Steel thanks the Allan Wilson Centre (New

    Zealand).

    Philosophy of Science, 81 (October 2014) pp. 558579. 0031-8248/2014/8104-0009$10.00

    Copyright 2014 by the Philosophy of Science Association. All rights reserved.

    558

    This content downloaded from 128.237.236.236 on Fri, 26 Sep 2014 15:47:41 PM

    All use subject to JSTOR Terms and Conditions

    http://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsp
  • 8/11/2019 Sober, E. and Steel, M. (2014). Time and Knowability in Evolutionary Processes

    3/23

    We are interested in how the natural processes connecting past to presentconstrain our ability to know about the past by looking at the traces found in

    the present. An optimistic view of these processes is that the past is po-tentially an open book; all we need do is understand the connecting pro-cesses correctly and look around for the right traces. If the relation of the

    past state of a system to its present state were deterministic and one to one,this optimistic view would be correct. If only we could know the presentstate with sufcient precision, and if only we could grasp the true mappingfunction that connects present and past, we would be home free. This op-timism is something that Laplace1814, 4afrmed when he discusseduneintelligencenow referred to as a demon:

    We may regard the present state of the universe as the effect of its past and

    the cause of its future. An intellect which at a certain moment would know

    all forces that set nature in motion, and all positions of all items of which

    nature is composed, if this intellect were also vast enough to submit these

    data to analysis, it would embrace in a single formula the movements of

    the greatest bodies of the universe and those of the tiniest atom; for such

    an intellect nothing would be uncertain and the future just like the past

    would be present before its eyes.

    It is worth noting that determinism is not sufcient for the optimistic view tobe true; without the one-to-one assumption, distinct states of the past may

    map onto the same state of the present, with the consequence that the exactstate of the past cannot be retrieved from even a perfectly precise grasp ofthe present.

    Is determinism necessary for the optimistic view to be true? It is not,provided that we set to one side strong concepts like knowledge and cer-tainty and take up an epistemic evaluation that is more modest. Consider,for example, a process in which the system is, at each moment, in one oftwo statescoded 0 and 1. Suppose Past5 0 makes Present5 0 extremely

    probablesay, 0.96and that Past5 1 makes Present5 1 extremely prob-able as well say, 0.98. This means that when we observe the systems

    present state, we gain strong evidence that discriminates between the twohypotheses Past5 0 and Past5 1. We cannot infer from Present5 0 thatthe past state was certainly 0; in fact, we cannot even infer that the past statewas probably 0. But we can conclude that the observation favors the hy-

    pothesis that Past5 0 over the hypothesis that Past5 1. This conclusion islicensed by what Hacking1965, 5962calls the Law of Likelihood:

    Observation O favors hypothesis H1 over hypothesis H2 if and only if

    PrOjH1>PrOjH2.

    Royall 1997, 911 suggests that this qualitative principle should besupplemented by a quantitative measure of favoring:

    TIME AND KNOWABILITY IN EVOLUTIONARY PROCESSES 559

    This content downloaded from 128.237.236.236 on Fri, 26 Sep 2014 15:47:41 PM

    All use subject to JSTOR Terms and Conditions

    http://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsp
  • 8/11/2019 Sober, E. and Steel, M. (2014). Time and Knowability in Evolutionary Processes

    4/23

    The degree to which O favours H1 overH2is given by the likelihood ra-

    tio PrOjH1=PrOjH2.

    Royall further suggests that a reasonable convention for separating strongevidence from weak is a ratio of 8. Royalls suggestion entails that the prob-abilistic process just described has the consequence that Present5 0 pro-vides strong evidence favoring Past5 0 over Past5 1, since the likelihoodratio is 0.96/0.025 48.

    This simple example should not be overinterpreted. If there were a pro-cess connecting past to present in the way described, the present would pro-vide strong evidence about the past. Do not forget the if. Perhaps there aresuch processes, especially when the past under discussion is the recent past.But what if we consider not just the recent past, but past times that are moreand more ancient? How does increasing the temporal separation between

    present and past affect the amount of information that the present providesabout the past?

    2. Two Theorems. A simple theorem provides an answer to the questionjust posedCover and Thomas 2006. Consider a system that at any time isin one ofnpossible statess1,s2, . . . ,sn. For simplicity we shall think of thesystem as evolving in discrete time steps. We stipulate that the system hasthe following two properties:

    The Markov property. For any two times t1

  • 8/11/2019 Sober, E. and Steel, M. (2014). Time and Knowability in Evolutionary Processes

    5/23

    This condition asserts that it is possible to move from any given state to anyother state including the original state in a fixed finite number of steps with

    a probability that remains bounded above zero. Regularity for finite-statetime-homogeneous Markov processes is equivalent to the condition thatthe Markov chain is aperiodic and irreduciblefor details, see Hggstrm2002, corollary 4.1.

    For any system of this sort, the following result holds a precise state-ment and proof is provided in app. A:

    Exponential information loss theorem. If a nite-state system satises the

    Markov property and regularity, then IPast; Presentis less than or equal toa term that approaches zero exponentially fast as the time between the

    present and past increases.

    Here IX; Y is the mutual information linking the two variables. If thevariables are discrete, the formula for this quantity is

    IX;Y 5 oyYo

    xX

    px; ylog px; y

    pxpy

    ;

    wherepx,yis the joint probability thatX5 xandY5 y, andpx,pyarethemarginalprobabilities thatX5 x, and thatY5 y, respectively. Mutual

    information measures how muchon averageyou learn about the state ofone of the variables by observing the state of the other. Its value is zero whenXand Yare independent; otherwise it is positive. Mutual information is sym-metrical:IX; Y5IY;X.

    The exponential information loss theorem generalizes the special casewhere transition probabilities are constant with timeSober and Steel 2011,

    proposition 6. This generalization is important, since many processesin-cluding the biological examples we will discuss often change their ratesfrom one period of time to another.

    Note that the theorem does not ensure a monotonic decline in informa-

    tion as the temporal separation of past and present is increased. That extraelement is provided by a different result, the so-called Data Processing In-equalityDPI; Cover and Thomas 2006, 32:

    The Data Processing Inequality: In a causal chain from a distal cause D to

    a proximate causePto an effectE, ifPscreens-offDfromE, thenIE;Dis less than or equal to both IE; Pand IP; D.

    For a discrete-state process, these two inequalities are strict wheneverPis nei-

    ther perfectly correlated with D or with E, nor is P independent of them seeapp. B. The Data Processing Inequality does not require that the process link-

    ingD to Pis the same as the process linkingPto E.

    TIME AND KNOWABILITY IN EVOLUTIONARY PROCESSES 561

    This content downloaded from 128.237.236.236 on Fri, 26 Sep 2014 15:47:41 PM

    All use subject to JSTOR Terms and Conditions

    http://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsp
  • 8/11/2019 Sober, E. and Steel, M. (2014). Time and Knowability in Evolutionary Processes

    6/23

    The information-processing inequality is chain internal. It does notsay that the present always provides more information about recent eventsthan about ones that are older. Considergure 1. Suppose thatRscreens-offA from D1 & D2 Then the information-processing inequality says thatID1& D2;R ID1& D2;A. It does not say that ID1& D2&: : :&

    D100;R ID1& D2&: : :& D100;A. It is perfectly possible that the hun-dred descendants that now exist provide more information about A thanthey do aboutR. Note thatR does not screen-offA fromD1& D2& . . . &D100. The heightened information that the leaves provide aboutA is not dueto the simple fact thatA has a larger number of outgoing lineages than Rdoes; it is possible to modify this tree so that each nonleaf node includingthe root node A has just two descending lineages Sober and Steel 2011,g. 5b, 24334, and still the leaves provide more information aboutAthanthey do aboutR, owing to the values of the transition probabilities attach-ing to branches.

    Both the exponential information loss theorem and the Data ProcessingInequality are very general. They characterize any system whose laws of mo-tion have therequisite probabilisticfeatures. The system mightbe a chamber ofgas, but it also might be an evolving population of organisms. Indeed, if therewere disembodied spirits that changed probabilistically, the results wouldapply to them. Both results are more general than physicsthey cover thesystems and properties that are discussed in the laws of physics, but they alsoapplyto systems andpropertiesthatarenot.In addition, both results are a priorimathematical truths, although of course it is an empirical matter whether agiven system satises the antecedent of the conditional that each result ex-

    pressesSober 2011a.

    Figure 1. A case in which it is possible for the present to provide more informationabout an ancient eventAthan about a more recent eventR.

    562 ELLIOTT SOBER AND MIKE STEEL

    This content downloaded from 128.237.236.236 on Fri, 26 Sep 2014 15:47:41 PM

    All use subject to JSTOR Terms and Conditions

    http://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsp
  • 8/11/2019 Sober, E. and Steel, M. (2014). Time and Knowability in Evolutionary Processes

    7/23

    3. Five Evolutionary Processes. Since our interest here is in how evolu-tionary processes affect the amount of information that the present provides

    about the past, it is worth making clear how the exponential informationloss theorem applies to models of biological evolution. In phylogenetics,the rapid loss of information in models of nucleotide substitution with timehas been highlighted as a signicant problem for using DNA sequence datato accurately resolve deep divergences of species lineages for a recent re-view, see Salichos and Rokas2013and for inferring ancestral states deepwithin a given treeMossel 2003; Gascuel and Steel 2014.

    We will return to phylogenies in a later section, but for now we considerinformation loss in population genetics. The application of information the-ory to population genetics has been investigated a bit. For example, Frieden,

    Plastino, and Soffer2001explore a variational principle

    extreme phys-ical informationbased on Fisher information to study genotype frequencychanges. The questions we consider here are different, as our primaryinterest is in the relative ranking of likelihood ratios across different evo-lutionary processes, including both drift and different types of selection.The Moran 1962 models of evolution will be our workhorse in whatfollows. We consider a population containing Nindividuals. The popula-tion evolves through a sequence of discrete temporal moments howlong a moment is will not matter. At each moment, one of those N in-dividuals produces a copy of itself and one of those Nindividuals dies. We

    consider two traits A and B; each individual has one of them or the other.At any moment, the population is in one ofN1 1 statesranging from 0%A to 100% A. This Moran framework can be articulated in different waysto represent different evolutionary processes. For example, if individualsare chosen at random to reproduce and die, then we have a drift process.Selection processes of different kinds can be represented by letting Aindividuals have chances of dying or reproducing that differ from those

    possessed by B individuals. A population undergoing a Moran processforms a Markov chain, with its recent past screening-off its more remote

    past from the present.

    Suppose we observe the population in the present and see that all Nin-dividuals are in state A. How much information does that observation pro-vide about the state of the population at some earlier time? If all states areaccessible to each otherwhich requires that mutations can prevent the pop-ulation from getting stuckat 100% A or 100% B, then the exponentialinformation loss theorem applies and so the mutual information declinesasymptotically to zero with time. However, if there is no mutation, then the

    population will evolve to either 100% A or 100% B and will stay there. In thiscase, the present state of the population provides information about its pasteven if the two are innitely separated. For example, if we observe that the

    population is now 100% A and the population has been evolving by drift, this

    TIME AND KNOWABILITY IN EVOLUTIONARY PROCESSES 563

    This content downloaded from 128.237.236.236 on Fri, 26 Sep 2014 15:47:41 PM

    All use subject to JSTOR Terms and Conditions

    http://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsp
  • 8/11/2019 Sober, E. and Steel, M. (2014). Time and Knowability in Evolutionary Processes

    8/23

    observation favors the hypothesis that the population was at more than 95%A at some earlier time over the hypothesis that it was 5% A or less, and this

    is true regardless of the time separation between the past and the present.John Maynard Keynes1924, chap. 3once said thatin the long run, weare all dead.His point was to pooh-pooh the relevance of claims about theinnite long run. What should matter to us mortals is nite time. This pointapplies to the bearing of the exponential information loss theorem on ourknowledge of the evolutionary past. Who cares if mutual information goesto zero as the time separating present from past goes to innity? Life onearth is a mere 3.8 billion years old. What is relevant is that informationdecays monotonically in Markov chains. But in addition, we know thatthere are different kinds of evolutionary process. Which of these speeds the

    loss of information and which slows it?The ve processes we want to investigate are represented in gure 2;

    each of them can result in our present observation that all N of the in-dividuals in the population now have trait A. In panel i, trait A was favored

    by selection. In panel iii, there was selection against trait A. In panel ii, thetraits are equal in tness, and so the traits evolved by pure drift. In panel iv,selection favored the majority trait. And in panel v, selection favored theminority trait. We are not asking which of these process hypotheses is most

    plausible, given the observed state of the population at present. Rather, wewant to explore what happens to information loss under each of these ve

    scenarios, in each case thinking of the process in the context of the Moranframework of a nite population ofxed sizeN.

    To compare the ve processes in this respect, we calculate the followinglikelihood ratio for each of them:

    Rij5PrPresent5NjPast5 i

    PrPresent5NjPast5j; fori >j:

    Although our main interest is to compare cases in which i is close toNandjis close to 0, the Moran framework makes it easy to derive results for the

    more general case in which i > j.So the observation is that all Nof the individuals now in the population

    have trait A. Does this observation favor the hypothesis that there wereexactlyi individuals at some past time who had trait A over the hypothesisthat there were exactly j, where i > j? It does; Rij> 1 for each of the veprocesses we are consideringsee app. C. Our question is how the mag-nitude ofRijdepends on the underlying evolutionary process. It is worthnoting that although the likelihood ratio Rij is past-directed in that itdescribes the degree to which a present observation discriminates betweentwo hypotheses about the past, evaluating this ratio requires one to consider

    564 ELLIOTT SOBER AND MIKE STEEL

    This content downloaded from 128.237.236.236 on Fri, 26 Sep 2014 15:47:41 PM

    All use subject to JSTOR Terms and Conditions

    http://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsp
  • 8/11/2019 Sober, E. and Steel, M. (2014). Time and Knowability in Evolutionary Processes

    9/23

    Figure 2. Five processes that can result in all N of the individuals now in thepopulation having trait A.

    This content downloaded from 128.237.236.236 on Fri, 26 Sep 2014 15:47:41 PM

    All use subject to JSTOR Terms and Conditions

    http://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsp
  • 8/11/2019 Sober, E. and Steel, M. (2014). Time and Knowability in Evolutionary Processes

    10/23

    two future-directedprobabilitiesthe probability of reaching Present5Nif the system begins at Past5 i and the probability of reaching Present5Nifthe system begins at Past5 j.

    We begin by adopting an assumption that we treated above with Keynes-ian disdain. Let us assume that the temporal separation of past and presentis innite and that there is zero mutation. We will relax this idealization indue course. For each of the processes we are considering, we now can de-scribe what the value ofRijis for each pair of values fori andjsuch thati >

    j. For example, under neutral evolution the value ofRij is i/j. The valuesfor the other four processes are given in appendix C. The ordering of theRijvalues for the ve processes is depicted in gure 3.

    Let us rst consider three of the cases described in gure 3drift and thetwo cases of frequency-independent selection. The ordering ofRijvalues forthese three processes means that the observation that all the individuals inthe population now have trait A provides more information about the paststate of the population the less probable it was that A would evolve toxation. Selection for A is at the bottom of the pile, with neutrality next, andselection against A at the top.

    The ordering of these three processes has an intuitive interpretation.Suppose we observe that trait A is xed in a population and we wish toestimate whether A was commonat frequencyior rareat frequencyj

  • 8/11/2019 Sober, E. and Steel, M. (2014). Time and Knowability in Evolutionary Processes

    11/23

    formal proof concerning how theRijvalues for different processes compare,see appendix C.

    The three results depicted in

    gure 3 that describe frequency-independentprocesses echo an insight that Darwin expresses in the Origin: Adaptivecharacters, although of the utmost importance to the welfare of the being,are almost valueless to the systematist. For animals belonging to two mostdistinct lines of descent, may readily become adapted to similar conditions,and thus assume a close external resemblance; but such resemblances willnot revealwill rather tend to conceal their blood-relationship to their

    proper lines of descentDarwin 1859, 427. Darwin illustrates this idea bygiving an example: whales and sh both have ns, but this is not strongevidence for their common ancestry, since the trait is an adaptation for swim-

    ming through water. Far stronger evidence for common ancestry is providedby similarities that are useless or deleterious. One of us has called this ideaDarwins Principle, in view of its centrality to Darwins frameworkSober2008, 2011b.

    Darwins topic in the passage quoted is inferring common ancestry, notinferring the past state of a lineage from its present state, but the episte-mologies are similar. Here is a simplied argument that illustrates why.Suppose trait A has probability pof being xed in a recent species. If twospecies x and y diverged from their most recent common ancestor veryrecently, the probability that they both have trait A is approximatelyp. On

    the other hand, if they have no common ancestor, then the probability ofthem both having A is p2. This means that the likelihood ratio of the hy-

    potheses x and y have a recent common ancestor and x and y have nocommon ancestor is approximately p/p2 5 1/p under Markovian traitevolution. Thus, the smallerp is, the greater this likelihood ratio will be.And the value forpif there is selection for A is larger than the value forpifthere is drift, which in turn is larger than the value forp if there is selectionagainst A.

    There are two cases of frequency-dependent selection represented ingure 3. One of themfrequency-dependent selection for the majority trait

    shows that it would be an overstatement to say that the current state of thepopulationall individuals having trait Aalways provides scant evidenceconcerning the populations past state if trait A evolved because of natu-ral selection. It matters a great deal what sort of selection process we aretalking about. Frequency-dependent selection for the majority trait is bet-ter than drift in terms of how much information the present state of a line-age provides about its past. Figure 3 also locates the evidential meaning offrequency-dependent selection for the minority trait; it has an informationalyield that is worse than that provided by drift. As noted in appendix C, ourresults for both cases of frequency-dependent selection make the assump-

    tion thatj

  • 8/11/2019 Sober, E. and Steel, M. (2014). Time and Knowability in Evolutionary Processes

    12/23

    dependent selection cases is always maintained as shown in gure 3; it isan innocent assumption since, as noted above, we are mainly interested in

    the case where i and jare majority and minority values, respectively.1

    Figure 3 provides only a partial ordering of the ve cases depicted. Thereason for this is that a comparison of, say, frequency-independent selectionagainst A with frequency-dependent selection for the majority trait woulddepend on the values of specic parameters.

    We now can remove the idealization of innite time and zero mutation.The ordering of theRijratios for the ve processes, when time is innite andmutation is zero, is the same as the ordering of those processes for thefollowing slightly different likelihood ratio, when time is niteand suf-ciently largeand there is a sufciently small mutational input:

    PrPresent5NjPastN

    PrPresent5NjPast 0:

    See appendix D for a proof of this ordinal equivalence result. Here PastNand Past0just mean any states close to Nand to 0respectively,which are then held xed across the ve models. With mutational input,the exponential information loss theorem applies to all ve processes. Themutual information between present and past declines monotonically astheir temporal separation increases, but the decline is faster under some

    processes than it is under others.Our results are derived within the setting of the Moran model of popu-

    lation genetics. This is a nite-state Markov chain that forms a continuantprocessEwens 2010. That is, at each step the number of individuals car-rying a particular allele in the population ofxed sizeNeither goes up by 1,or goes down by 1, or it stays the same. We expect similar conclusionsconcerning the partial ordering of the Rij ratios for other continuant pro-cesses, provided the transition probabilities faithfully reect the varioustypes of selection being compared. Our results also apply to a slightly dif-ferent problem: how does the kind of evolutionary process at work in line-

    ages affect the strength of the evidence that similarity provides for commonancestrySober 2008?

    4. Mutual Information and the Likelihood Ratio. We began our discus-sion of information loss by using the concept of mutual information butthen shifted to considering the likelihood ratio. Have we illicitly changed

    1. The situation is messier if you want to consider frequency-dependent processes forthe full range of all i, jvalues such thati >j. Ifi >j>N/2, then frequency-dependentselection for the majority trait is rather like frequency-independent selection for trait A,and we know thatRijfor frequency-independent selection for trait A is less than Rijfordrift.

    568 ELLIOTT SOBER AND MIKE STEEL

    This content downloaded from 128.237.236.236 on Fri, 26 Sep 2014 15:47:41 PM

    All use subject to JSTOR Terms and Conditions

    http://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsp
  • 8/11/2019 Sober, E. and Steel, M. (2014). Time and Knowability in Evolutionary Processes

    13/23

    horses in midstream? We think not. Consider the mutual information be-tween the binary random variableEthat takes the value 1 if allele A is xed

    in the present populationE5

    0 otherwise, and the frequencyXof allele Ain the past population, under some strictly positive prior distribution. In thiscase, mutual information and the likelihood ratio are linked as follows:

    IX;E 5 0precisely whenRij5 1for all past statesi; j:

    If we now observeE5 1, then instead of comparing IX;Eacross the dif-ferent processes, it is more relevant to compare the values ofRij. That is, weare thinking of a single observationPresent5 N, so we are not consider-ing all the possible observations that we might make of the present state.

    This is why we used the likelihood ratioRij rather than mutual information tocarry out the cross-process comparison.

    5. The Impact of Branching on Information. We have emphasized thatloss of information within a lineage is a fact of life. However, the branchingthat takes place in evolution is a force that pushes in the opposite direction,since it creates new lineages. As illustrated in gure 1, it is possible for anancient ancestor to have more present-day descendants than a more recentancestor has, and this means that the information lost to the passage oftime can be offset by the proliferation of descendants that each bear wit-

    ness to the ancient ancestors state. If the process is a symmetric one ontwo states, it is possible to describe precisely how often branching mustoccur if information loss is to be offset in this wayEvans et al. 2000; formore general processes, one must usually be content with upper and lower

    bounds.It might be asked why one needs to worry about observing descendants

    to infer the characteristics of ancestors. Doesnt observing fossils provide asimpler and more denitive solution? Our answer has three parts. First, onecannot assume that a fossil comes from an ancestor of extant species; it may

    just be an ancient relative. Second, fossils provide evidence about the

    morphological hard parts of ancient organisms; molecular characters, not tomention phenotypic features of physiology and behavior, typically do notfossilize. And nally, fossil traces degrade and are subject to the exponen-tial information loss theorem if fossils change state in conformity with theregularity assumption and have the Markov property.

    Just as the process at work in lineages has an impact on information loss,so too does the topology of the branching process itself. It might seemintuitive to conjecture that the star phylogeny shown in gure 4i is betterthan the bifurcating topology shown in gure 4ii in the sense that the formertopology allows the observations to provide more information about the

    root than the latter topology does. In order to hold other factors xed, we

    TIME AND KNOWABILITY IN EVOLUTIONARY PROCESSES 569

    This content downloaded from 128.237.236.236 on Fri, 26 Sep 2014 15:47:41 PM

    All use subject to JSTOR Terms and Conditions

    http://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsp
  • 8/11/2019 Sober, E. and Steel, M. (2014). Time and Knowability in Evolutionary Processes

    14/23

    assume that the two topologies have the same number of leaves and that theprocess at work in branches is the same in the two topologiesin particular,that the expected number of substitutionsthe branch lengthsbetweenthe root and each leaf match up for the two trees. The conjecture just statedseems reasonable, since in gure 4i the observations are independent ofeach other, conditional on the state of the root, whereas in 4ii, the obser-

    vations are not conditionally independent. The guess is correct for a two-state symmetrical process when we compare the two trees in gure 4Sober1989, 280. More generally, this guess holds whenever we compare any

    binary tree with a matching star phylogeny on the same number of leaveswhen a two-state symmetric process is at workEvans et al. 2000. But,surprisingly, the guess is not always true for symmetric Markov processeswith more than two states.

    More precisely, consider a completely balanced binary tree with 2n leaves,on which a constant symmetric process on ve or more states operateswith a constant substitution probability on each edge of the tree. Then if this

    substitution probability lies in a certain region, and n is large enough, themutual information between the leaf states and the root state can be higherfor the binary tree than for a comparable star tree on the same number ofleavesSly 2011, theorem 1.2. Here comparablemeans that the expectednumber of substitutions from the root to any tip is the same in both trees.In other words, the root state of a tree can sometimes be more accurately

    predicted from the state of its leaves when the tree is binaryand the leavesare correlatedthan when the tree is a starand the leaves are independent,where the two trees have the same marginal distribution. A similar resultholds for strongly asymmetric two-state modelsMossel 2001, theorem 1.

    So the intuitive maxim that testimony from independent witnesses of an

    Figure 4. Does observing the four leaves of the star phylogenyi provide moreinformation about the state of the root than observing the four leaves of the bi-furcating phylogenyii?

    570 ELLIOTT SOBER AND MIKE STEEL

    This content downloaded from 128.237.236.236 on Fri, 26 Sep 2014 15:47:41 PM

    All use subject to JSTOR Terms and Conditions

    http://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsp
  • 8/11/2019 Sober, E. and Steel, M. (2014). Time and Knowability in Evolutionary Processes

    15/23

    event provides more information about the event than testimony from oth-erwise similar dependent witnesses is sometimes false.

    6. Conclusion. In summary, the view of evolution as an information-destroying processis basically right, but it overlooks some interesting de-tails. For some special processese.g., zero mutation in simple populationgenetic models, information never completely disappears, even after innitetime. For example, under a drift model with zero mutation, some informationconcerning whether allele A was initially in the minority or the majority isalways detectable at any time in the present frequency of A which eventuallyxes at 0 orN in the population. At the other extreme there are certaindiscrete timeprocesses for which the information can collapse completely

    to zero in nite timeMossel 1998; Sober and Steel 2011, 233.The more usual situation lies between these two extremes; for processes

    that are Markovian and regular, the information between past and presentdecays at an exponential rate and vanishes only in the limit. Here we canstill compare the relative support such models provide in estimating an an-cestral state from an observation today. For the ve models considered, thissupport varies in a predictable way depending on the type of model assumed.

    We are well aware that the Moran framework contains various simpli-cations e.g., constant population size, but the models simplicity allows forexplicit calculations and results and does not raise questions as to whether

    the ordering might depend on the complexities that might be introduced in amore intricate model. We hope that our results will provide a foundation forexploring how adding complexities affects information loss. In philosophy aswell as in science, it is reasonable to walk before one runs.

    The estimation of an ancestral state from the leaves of a phylogenetic treeexhibits further subtleties, along with a surprise: the independent estimatesobtained from a star phylogeny may or may not be more informative thanthe correlated estimates obtained from a binary tree, depending on the num-

    ber of states, the size of the tree, and the substitution rate. The episte-mological principle that the testimony of independent witnesses always

    provides more evidence than the testimony of otherwise similar dependentwitnessesis wrong.

    Appendix A

    Formal Statement and Proof of the Exponential Loss Theorem

    Proposition 1. SupposeXt;t 0 is any discrete, nite-state Markov process

    that satises the following condition. For some > 0, and integerN>0 the

    following inequality holds for all t 0 and states i, j:

    TIME AND KNOWABILITY IN EVOLUTIONARY PROCESSES 571

    This content downloaded from 128.237.236.236 on Fri, 26 Sep 2014 15:47:41 PM

    All use subject to JSTOR Terms and Conditions

    http://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsp
  • 8/11/2019 Sober, E. and Steel, M. (2014). Time and Knowability in Evolutionary Processes

    16/23

    PrXt1N5jjXt5 i :: A1

    ThenIX0;Xt Cexp2

    ctfor constants C, c>

    0.Proof. LetYn be the N-step Markov chain dened by Yn 5 XnNfor alln 0. Note that equation A1 implies thatYn satises the following in-equality for alln,i,j:

    PrYn11 5jjYn 5 i : A2

    We will show that

    IY0;Yn Bexp2bn A3

    for constantsB,b >0, from which proposition 1 follows because ift5 nN1 rfor 0 r

  • 8/11/2019 Sober, E. and Steel, M. (2014). Time and Knowability in Evolutionary Processes

    17/23

    By the law of total probability we can express PrY0

    k115jjY

    0

    k5 i as

    follows:

    PrY0k115jjH & Y0

    k5 iPrHjY0

    k5 i 1 PrY0

    k115jjH & Y0

    k5 iPrHjY0

    k5 i

    5 pk11j1 PrYk11 5jjYk5 i 2 pk11j 5 PrYk11 5jjYk 5 i:

    Summarizing, we have: Y0

    05 Y0, and PrY

    0

    k115jjY

    0

    k5 i 5 PrYk11 5jjYk

    5 i, and so Yk and Y0

    k describe the same Markov chain. In particular,

    IY0;Yn 5IY0

    0;Y

    0

    n, and pnj 5 PrY

    0

    n5j. Let

    pni; j :5 PrY0 5 i & Yn 5j 5 PrY0

    05 i & Y

    0

    n5j;

    and letAn

    be the event thatHoccurs at least once in the rstn biased cointosses that are performed in the construction of the chain Y0

    1; Y0

    2; : : :. Then

    pni, jcan be written as the sum of two terms:

    PrY0

    n5jjAn& Y

    0

    05 iPrAn& Y

    0

    05 i

    1 PrY0

    n5jjAn& Y

    0

    05 iPrAn& Y

    0

    05 i: A5

    Now, the rst term inA5is exactly pnj12 12 np0isince

    conditional onAn, Y0

    nis independent ofY

    0

    0and, in addition, Y

    0

    nis dis-

    tributed asp

    n; andAn and Y

    0

    0are independent, and so

    PrAn& Y0

    05 i 5 PrAnPrY

    0

    05 i 5 12 12 np0i:

    The second term inA5can be written as p0i multiplied by a term thatlies between 0 and12 n, since

    0 PrAn& Y0

    05 i PrAn 5 12

    n:

    Thus, selecting b >0 so thate2b 5 12 we can write:

    pni; j 5 p0ipnj 1Oe2bn; A6

    where, as usual, fn 5 Ognis shorthand for the statement thatfnisat most some constant times gn. Finally, observe that, fromA2, pnj > 0 and so, from A6:

    IY0

    0;Y

    0

    n 5o

    i;j

    pni; jlog pni; j

    p0ipnj

    5o

    i;j

    pni; jlog 11 Oe2bn Be2bn;

    for a constantB > 0. Since IY0;Yn 5IY0

    0;Y

    0

    n, this establishes A3, as

    required. QED

    TIME AND KNOWABILITY IN EVOLUTIONARY PROCESSES 573

    This content downloaded from 128.237.236.236 on Fri, 26 Sep 2014 15:47:41 PM

    All use subject to JSTOR Terms and Conditions

    http://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsp
  • 8/11/2019 Sober, E. and Steel, M. (2014). Time and Knowability in Evolutionary Processes

    18/23

    Appendix B

    Proposition 2. Suppose thatX

    Y

    Zforms a Markov chain, where thestate spaces for X, Y, and Z are discrete, and PrX5x & Y5y> 0 andPrY5y & Z5z> 0for all choices of statesx,y,zforX, Y,Z, respectively.Then the DPI is an equality if and only ifX, Y, and Zare mutually inde-

    pendent.

    Proof. First, observe that ifX, Y, Zare independent then they are pair-wise independent and so IX; Y 5 IX; Z 5 IY; Z 5 0 and thus equal-ity holds trivially. Next, suppose that PrX5x & Y5y> 0and PrY5y & Z5z> 0 for all choices of statesx,y,zforX,Y,Z, respectively. Then

    PrX5x & Y5y & Z5z> 0 holds alsosinceX YZis a Markovchain. Suppose further that the DPI is an equality; we will show thatX,Y,andZare independent. Since the DPI is an equality,X YZandXZ Yare both Markov chainsCover and Thomas 2006. We write pxyzas shorthand for the probability PrX5x & Y5y & Z5z and similarlyfor conditional and marginal probabilities thus, e.g., pxjz 5 PrX5xjZ5z. First observe that the positivity condition pxyz > 0 for allx,y,zimplies thatpxy,pxz,pyz,px,py,pzare also strictly positive.SinceX Y Zis a Markov chain, and pxy >0:

    pxyz 5pzjxypxy 5pzjypxy; B1

    and sinceXZ Yis also a Markov chain, andpxz >0, we havepxyz5 py|xz pxz 5 py|z pxz. Applying Bayess theorem, the last termcan be written as

    pzjypy

    pz pxz

    note thatpz >0and so, combining this with equationB1gives

    pxyz 5pzjypxy 5pzjypy

    pz pxz;

    and so

    pzjypxypz 5pzjypypxz: B2

    Sincepz|y > 0becausepyz > 0we can cancel this term on the left andright of equationB2to obtain:

    pxypz 5pypxz: B3

    574 ELLIOTT SOBER AND MIKE STEEL

    This content downloaded from 128.237.236.236 on Fri, 26 Sep 2014 15:47:41 PM

    All use subject to JSTOR Terms and Conditions

    http://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsp
  • 8/11/2019 Sober, E. and Steel, M. (2014). Time and Knowability in Evolutionary Processes

    19/23

    Now, we can further write pxy 5 px|y py and pxz 5 px|z pzwhich, upon substitution into equationB3 gives:

    pxjypypz 5pypxjzpz;

    inotherwords,px|y5px|z notingthatpy,pz>0. Now, this equationmust hold for all choices ofx,y,andzsopx|y must be constant asy varieswhich implies thatXandYare independent. SimilarlyXandZare indepen-dent. Finally, reversing the two Markov chains gives thatZis independentofY. ThusX,YandZare pairwise independent.

    Moreover they are independent as a triple since X YZis a Markovchain and so

    pxyz 5pzjxypxy 5pzjypxpy 5pyzpx 5pxpypz:

    QED

    Appendix C

    RijRatios with Zero Mutation at the Infinite Time Limit

    Consider the Moran model in population genetics, with two trait values A

    and B and population size N. Let Xtf0; 1; : : : ; Ng be the number ofcopies of A in the population at time t. In this section we assume zeromutation, and we consider neutral evolution, selection for A, selectionagainst A, and frequency-dependent selection for the majority state andagainst the majority state. Since each of these Markov processes has ab-sorbing states 0 andNbecause of zero mutationeventually one allele will

    be xed and the other lost. Let Ef0; Ng be this end state, and Sf0; 1; : : : ; Ngthe starting statethusS5 X0andE5 limt Xt.

    We are interested in comparing the ratio of conditional probabilities:

    Rij :5PrE5NjS5 iPrE5NjS5j

    ;

    fori > junder the various models.

    Proposition 3.

    i Under neutral evolutionRij5 i=j, for all 0

  • 8/11/2019 Sober, E. and Steel, M. (2014). Time and Knowability in Evolutionary Processes

    20/23

    iii For any two values i,jf1, 2, . . . ,Ngwithi >jthe Rijvalue forselection against A exceeds the Rij value for neutral evolution,

    which in turn exceeds the Rijvalue for selection for A.iv For frequency-dependent selection, where the tness of trait A isproportional to its frequency, the associatedRijvalue exceeds thatfor neutral evolution for alli,jf1, 2, . . . ,Ngwithi > jprovidedthatj< N/2.

    v For frequency-dependent selection, where the tness of trait A isproportional to the frequency of the alternative trait B, the asso-ciatedRijvalue is lower than that for neutral evolution for alli,jf1, 2, . . . , Ngwith i >j, provided thatj1 for all i >j.

    Proof:Part i. For any integerx : 0 x Nit is a well-known resultformany neutral modelsthat

    PrE5NjS5x 5x=N

    see, e.g., eq. 3.49 of Ewens 2010, from which i immediately follows.Part ii. From Ewens2010, eq. 3.66we have, for anyx f1, . . . , Ng:

    PrE5NjS5x512

    c

    x

    12 cN;

    for a positive constantc which is greater than 1 for selection against A andless than 1 for selection for A. Part ii now follows immediately.

    Part iii. By parts i and ii, part iii is equivalent to the assertions that fori >j, ifc 0, 1then:

    12 ci

    12 cjjandc >1 then:

    12 ci

    12 cj>

    i

    j:

    Now, using the identity 1 2 ck5 1 2 c 1 1 c 1 c2 1 . . . 1 ck21wehave

    12 ci

    12 cj 5

    11 : : :1 c i21

    11 : : : 1 c j21 ; C1

    576 ELLIOTT SOBER AND MIKE STEEL

    This content downloaded from 128.237.236.236 on Fri, 26 Sep 2014 15:47:41 PM

    All use subject to JSTOR Terms and Conditions

    http://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsp
  • 8/11/2019 Sober, E. and Steel, M. (2014). Time and Knowability in Evolutionary Processes

    21/23

    and so we wish to compare

    11 : : :1 c i21

    11 : : :1 cj21 and i

    j;

    which is equivalent to comparing

    11 : : : 1 c i21

    i and

    11 : : :1 cj21

    j :

    Now, the left-hand side is merely the average of the terms ck fromk5 0 uptok5 i 2 1 while the right-hand side is the average of these terms up toj21 and sincei >jthe left-hand side is smaller than the right when c 1. This completes the proof.Part iv. When selection is frequency dependent, we need to use the

    expression:

    PrE5NjS5 i 5 11 oi21

    j51P

    j

    k51

    gk

    fk

    ! 11 o

    N21

    j51P

    j

    k51

    gk

    fk

    !; C2

    wherefkandgkdenote the tnesses of alleles A and B, respectively, when kindividuals have allele type Asee, e.g., Huang and Traulsen 2010, eq. 6.If we now take the tness of trait A to be proportional to its frequency, thatis fk 5 a kfor a constanta >0, and take the tness of trait B to also be

    proportional to its frequencywith the same coefcientthat is,gk5 aN2 kthen equationC2 gives

    Rij5 oi21

    m50

    N2 1

    m

    oj21

    m50

    N2 1

    m

    : C3

    As before, this ratio exceeds i/jprecisely when the average of the rsti terms

    N2 1

    m

    form 5 0, 1, 2 . . .exceeds the average of the rstjterms

    N2 1

    m

    ;

    this holds for all i >jwith j 0, and take the tness of trait

    TIME AND KNOWABILITY IN EVOLUTIONARY PROCESSES 577

    This content downloaded from 128.237.236.236 on Fri, 26 Sep 2014 15:47:41 PM

    All use subject to JSTOR Terms and Conditions

    http://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsp
  • 8/11/2019 Sober, E. and Steel, M. (2014). Time and Knowability in Evolutionary Processes

    22/23

  • 8/11/2019 Sober, E. and Steel, M. (2014). Time and Knowability in Evolutionary Processes

    23/23