MHP 2 Phylogenetics - SFU.ca

  • Phylogenetic Methods

    How to reconstruct phylogenies: Algorithms vs. Optimality; Parsimony; Models and Distances

    [Figure: rooted mtDNA tree from Gagneux et al. (1999). Tips: Modern Humans, Neanderthal, Western Gorilla, Mountain Gorilla, Eastern Gorilla, Sumatran Orangutan, Bornean Orangutan, Bonobo, Western Chimpanzee, Central Chimpanzee, Eastern Chimpanzee.]

    Humans are a recently evolved species, and human genetic diversity is very low compared to other apes!

  • Due to drift, samples of mitochondrial DNA are related as a tree, and if we can reconstruct that tree, we should be able to infer many things about the populations

    Background: how many different trees?

    [Figure: the three possible unrooted trees for four taxa A, B, C and D.]

    # of taxa    # of unrooted trees
    2            1
    3            1
    4            3
    5            15
    6            105
    7            945
    8            10395
    9            135135
    10           2027025

    4 taxa = 3 trees; in general there are $(2n-5)!!$ unrooted trees for n taxa

    10 taxa = ~$2 \times 10^6$ trees

    How many rooted trees?

    For four taxa: The root can be placed on any branch or internode. The total number of branches on an unrooted tree is 2n-3 where n=number of taxa. Therefore, the number of rooted trees corresponding with one unrooted tree is 2n-3

    [Figure: one unrooted tree for A, B, C and D, and a rooted tree (C D B A) obtained by placing the root on one of its branches.]


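    Terms and concepts: how many different trees?

    # of taxa    # of unrooted trees    # of rooted trees
    2            1                      1
    3            1                      3
    4            3                      15
    5            15                     105
    6            105                    945
    7            945                    10395
    8            10395                  135135
    9            135135                 2027025
    10           2027025                34459425

    4 taxa = 15 trees: $(2n-3)(2n-5)!! = (2n-3)!!$ rooted trees for n taxa

    10 taxa = ~$3 \times 10^7$ trees

    [Figure: for four taxa, each of the three unrooted trees yields 5 rooted trees (3 × 5 = 15). The five rootings of one unrooted tree are drawn with tip orders (A B C D), (C D A B), (A B D C), (C D B A) and (A B C D).]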
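The counts in the table follow from the double-factorial formulas, and a short sketch can reproduce them. The function names below are illustrative choices, not taken from the slides.

```python
def double_factorial(m):
    """Product m * (m-2) * (m-4) * ... down to 1; taken as 1 for m <= 1."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result


def n_unrooted_trees(n_taxa):
    """Number of unrooted bifurcating trees: (2n-5)!!"""
    return double_factorial(2 * n_taxa - 5)


def n_rooted_trees(n_taxa):
    """Number of rooted bifurcating trees: (2n-3)!! = (2n-3) * (2n-5)!!"""
    return double_factorial(2 * n_taxa - 3)


for n in range(2, 11):
    print(n, n_unrooted_trees(n), n_rooted_trees(n))
```

Each rooted count equals the unrooted count times the 2n-3 branches on which a root can be placed.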

  • Algorithms vs. Optimality

    UPGMA, Neighbor-Joining and ‘Branch and Bound’ are algorithms (or recipes): on their own they do not optimize anything. Maximum Likelihood, Maximum Parsimony and Least-Squares are optimality criteria: they do not specify how candidate hypotheses are arrived at (e.g. how trees are sampled) but do offer a yardstick for assessing which hypotheses are preferred.

    Algorithms are fast (N-J runs in low-order polynomial time in n) but guarantee neither a ‘right’ answer nor an evaluation of fit.

    Maximum Parsimony

    We do not believe that evolution is parsimonious, but we believe that the characters we choose evolve in such a way that maximum parsimony offers the best chance of recovering the correct relationships.

    Tree length becomes our optimality criterion: choose the shortest tree among all contenders...

    $$L(t) = \sum_{k=1}^{B} \sum_{j=1}^{N} w_j \, \mathrm{diff}(x_{k'j}, x_{k''j})$$

    minimize L(t), the length of the tree

    We calculate the length of a tree, L(t), as the sum across all B branches, for all N characters, each given weight $w_j$, and each change having a cost diff(x, y). Here $x_{k'j}$ and $x_{k''j}$ are the states of character j at the two ends of branch k.

    How to reconstruct phylogeny: Parsimony

    A aat tcg ctt cta gga atc tgc cta atc ctg
    B ... ..a ..g ..a .t. ... ... t.. ... ..a
    C ... ..a ..c ..c ... ..t ... ... ... t.a
    D ... ..a ..a ..g ..g ..t ... t.t ..t t..

    Parsimony assumes discrete data that represent state changes along a tree, i.e. each column is a character whose variation is due to evolution.

    Alignment: the art of producing such columns

  • How to reconstruct phylogeny: methods - parsimony

    [Figure: the three unrooted trees for taxa A-D with the changes in characters 2, 3 and 4 mapped onto their branches. The tree grouping A with B and C with D has Length=3; the trees grouping A with C, and A with D, each have Length=5.]

          1 2 3 4
        A a c a t
        B a c a t
        C a g g t
        D a g g a

    Character 1 is invariant, character 4 is unique (it changes in one taxon only), and characters 2 and 3 are parsimony informative.

    •  Parsimony allows the use of all known evolutionary information in building a tree.

    •  Parsimony scores every possible tree by the number of evolutionary changes needed to explain the observed data.

    •  The best tree is the one that requires the fewest (homoplasious) changes.

    •  Only synapomorphies are parsimony-informative
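The three tree lengths can be checked with a small sketch of Fitch's counting procedure for unordered characters with unit weights. The slides do not name a counting algorithm, so treating it as Fitch's is an assumption, as are the function names and the nested-tuple tree encoding. Each unrooted tree is rooted arbitrarily on its internal branch, which does not change the count.

```python
def fitch(tree, states):
    """Return (possible state set, number of changes) for one character.

    `tree` is a tip name (str) or a pair of subtrees; `states` maps tip -> state.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    # No state in common: one change is needed at this node.
    return left_set | right_set, left_cost + right_cost + 1


def tree_length(tree, matrix):
    """Sum the Fitch change counts over all characters (all weights = 1)."""
    n_chars = len(next(iter(matrix.values())))
    return sum(
        fitch(tree, {taxon: seq[j] for taxon, seq in matrix.items()})[1]
        for j in range(n_chars)
    )


matrix = {"A": "acat", "B": "acat", "C": "aggt", "D": "agga"}
trees = {
    "((A,B),(C,D))": (("A", "B"), ("C", "D")),
    "((A,C),(B,D))": (("A", "C"), ("B", "D")),
    "((A,D),(C,B))": (("A", "D"), ("C", "B")),
}
lengths = {name: tree_length(tree, matrix) for name, tree in trees.items()}
print(lengths)
```

Characters 2 and 3 each cost one change on the first tree and two on the others, which is where the difference between Length=3 and Length=5 comes from.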


    How to reconstruct phylogeny: methods - parsimony

    To distinguish ancestral from derived character states (and thus allow for a temporal dimension) an ‘outgroup’ (i.e. the sistergroup of the taxa of interest) is added. Using outgroup comparisons, the most parsimonious rooted tree can be found.

    [Figure: the three unrooted trees for taxa A-D, with Length=3, Length=5 and Length=5.]

    Shortest unrooted tree, but where’s the root?

          1 2 3 4
        A a c a t
        B a c a t
        C a g g t
        D a g g g
        E c g a t


  • How to reconstruct phylogeny: methods - parsimony

    [Figure: the five rooted versions of the shortest tree, each with outgroup E attached at the root. Tip orders: (C D B A), (A B D C), (C D A B), (A B C D), (A B C D). Changes in characters 1 to 4 are mapped onto the branches; on some rootings characters 2 and 3 need an extra change (“Homoplasious change!”). Tree lengths: 5, 4, 5, 5, 5.]


    Maximum Parsimony and the LBA

    •  MP uses the observed states of characters to infer the shortest set of paths (shortest tree). No probability is involved.

    •  For datasets where different branches have different rates of evolution (i.e. fast and slow branches), the MP tree may not be the most likely tree.

    •  This is Long-Branch Attraction (LBA), or the inconsistency caused by heterogeneous rates across the tree.

    MP has no model or way to correct for LBA.

    We can use a model to “correct for multiple hits”. This can be done in either a discrete or a continuous framework. Let’s look at the continuous one first.

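  • [Figure: observed distance p (y-axis, 0 to 0.8) plotted against the estimated number of substitutions µt (x-axis, 0 to 2.5). The curve is linear at low µt and shows saturation at high µt.]

    $$p = \tfrac{3}{4}\left(1 - e^{-\frac{4}{3}\mu t}\right)$$

    $$\mu t = -\tfrac{3}{4}\ln\left(1 - \tfrac{4}{3}p\right)$$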

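    This true tree produces these data:

    [Figure: the true unrooted tree. Taxa 1 and 2 sit on one side of the internal branch and taxa 3 and 4 on the other. The branches to taxa 1 and 3 have length 1.0; the branches to taxa 2 and 4 and the internal branch have length 0.1.]

    Observed distances:

             1      2      3      4
        1    0.0    0.577  0.704  0.599
        2           0.0    0.599  0.247
        3                  0.0    0.577
        4                         0.0

    And the least-squares tree on the observed distances is incorrect, just like the MP tree would be:

    [Figure: least-squares tree grouping taxon 1 with 3 and taxon 2 with 4; branch-length labels 0.12, 0.35 and 0.11.]

    The corrected distances would be:

             1      2      3      4
        1    0.0    1.1    2.1    1.2
        2           0.0    1.2    0.3
        3                  0.0    1.1
        4                         0.0

    And the least-squares tree* on the corrected distances is now correct:

    [Figure: the true tree again, with branch lengths 0.1, 0.1, 0.1, 1.0 and 1.0.]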
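The two distance matrices can be regenerated from the true branch lengths with the two formulas given earlier. The sketch below then picks a four-taxon topology with the four-point criterion (the split with the smallest sum of within-pair distances). That criterion is a simpler stand-in used here for illustration, not the least-squares fit the slides refer to, and all names are illustrative.

```python
import math


def jc_observed(d):
    """Expected observed proportion of differing sites p for true distance d."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


def jc_corrected(p):
    """Invert the formula above: estimated substitutions for observed p."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Path lengths on the true tree: long branches (1.0) lead to taxa 1 and 3.
true_d = {(1, 2): 1.1, (1, 3): 2.1, (1, 4): 1.2,
          (2, 3): 1.2, (2, 4): 0.3, (3, 4): 1.1}
observed = {pair: jc_observed(d) for pair, d in true_d.items()}
corrected = {pair: jc_corrected(p) for pair, p in observed.items()}


def best_split(dist):
    """Four-point criterion: choose the split with the smallest pair sum."""
    sums = {
        "12|34": dist[(1, 2)] + dist[(3, 4)],
        "13|24": dist[(1, 3)] + dist[(2, 4)],
        "14|23": dist[(1, 4)] + dist[(2, 3)],
    }
    return min(sums, key=sums.get)


print({pair: round(p, 3) for pair, p in observed.items()})
print("observed:", best_split(observed))    # groups the two long branches
print("corrected:", best_split(corrected))  # recovers the true split
```

On the raw distances the two long branches (taxa 1 and 3) are pulled together; after the correction the true grouping of 1 with 2 and 3 with 4 has the smallest sum.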

    How to reconstruct phylogeny: distance methods

    Phenograms (i.e. phenetic trees) are obtained using distance methods to build trees from comparative data.

    A pairwise distance matrix contains the estimated number of different sites between all pairs of sequences.

    Then, a tree is constructed: e.g. by linking the least distant pairs of taxa, followed by successively more distant taxa.

    Example 1: aligned DNA sequences, distance matrix, ultrametric tree

        A a c c g a t c g t a a c g t
        B . . . . g . . . c . . . . .
        C . . t . t . . a g . g . a .
        D g . g a c c . c a . c . t c

             A    B    C
        B    2
        C    6    6
        D   10   10   10

    [Figure: ultrametric tree for A, B, C and D with branch lengths labeled.]

    Example 2: aligned DNA sequences, distance matrix, additive tree

        A t g g c g t g a a g c g a c
        B c . a . t . . . g . . a . t
        C c . a . . . a . g . . a g t
        D c a a t t c a g t a g a g g

             A    B    C
        B    6
        C    7    3
        D   14   10    9

    [Figure: additive tree for A, B, C and D with branch lengths labeled.]
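As a check on the two distance matrices, the raw number of differing sites can be counted directly from the alignments. This is a minimal sketch: the helper names are illustrative, and it assumes that a dot means "same as the first sequence".

```python
from itertools import combinations


def expand(reference, row):
    """Replace '.' by the reference state at that site."""
    return [ref if state == "." else state for ref, state in zip(reference, row)]


def distance_matrix(rows):
    """Count differing sites for every pair of taxa (raw distances)."""
    names = list(rows)
    reference = rows[names[0]].split()
    seqs = {name: expand(reference, rows[name].split()) for name in names}
    return {
        (a, b): sum(x != y for x, y in zip(seqs[a], seqs[b]))
        for a, b in combinations(names, 2)
    }


alignment_1 = {
    "A": "a c c g a t c g t a a c g t",
    "B": ". . . . g . . . c . . . . .",
    "C": ". . t . t . . a g . g . a .",
    "D": "g . g a c c . c a . c . t c",
}
alignment_2 = {
    "A": "t g g c g t g a a g c g a c",
    "B": "c . a . t . . . g . . a . t",
    "C": "c . a . . . a . g . . a g t",
    "D": "c a a t t c a g t a g a g g",
}
d1 = distance_matrix(alignment_1)
d2 = distance_matrix(alignment_2)
print(d1)
print(d2)
```

In the first matrix C is equally far from A and B, and D equally far from all three, which is what allows an ultrametric tree; the second matrix does not have that property.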

    How to reconstruct phylogeny: distance methods

    Distance methods can use clustering algorithms (N-J) or an optimality criterion (least squares, minimum evolution) to convert the distances to a tree.

    Both can use raw or modelled distances:

    raw: percent different sites (if aligned)
    modelled: corrected with a model (e.g. Jukes-Cantor model)

    Neighbor-Joining Algorithm

    N-J is similar in flavour to the more intuitive UPGMA, but doesn’t force every tip to be equidistant from a root (indeed, it only produces unrooted trees: you have to root them by knowing the outgroup or by using, e.g., midpoint rooting).

    Many fast tree-building programs build N-J trees.

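  • [Figure: bootstrapping. The original alignment is

        1 RANDOM
        2 RONDON
        3 RONFON

    Pseudoreplicates (500+) are made by resampling its columns with replacement, e.g.

        RRDAOM    AMRAON    NNNDON
        RRDOON    ONROON    NNNDON
        RRFOON    ONROON    NNNFON

    A tree of taxa 1, 2 and 3 is built from each pseudoreplicate. In the bootstrap tree, the grouping found in 66% of the pseudoreplicate trees is labeled 66%.]

    [Figure: Consensus neighbor-joining tree of 104 human mtDNA complete sequences, with African, European and Asian/Native American lineages labeled.]

    Mishmar D et al. PNAS 2003;100:171-176 ©2003 by National Academy of Sciences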
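The resampling step of the bootstrap can be sketched in a few lines. This covers only the creation of pseudoreplicates from the RANDOM/RONDON/RONFON example; building a tree from each one is left out. The function name and the fixed random seed are illustrative assumptions.

```python
import random


def bootstrap_columns(alignment, rng):
    """Draw as many columns as the alignment has, with replacement."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}


alignment = {"1": "RANDOM", "2": "RONDON", "3": "RONFON"}
rng = random.Random(0)  # fixed seed only so that the sketch is repeatable
pseudoreplicates = [bootstrap_columns(alignment, rng) for _ in range(500)]

for replicate in pseudoreplicates[:3]:
    print(replicate)
```

Every column of a pseudoreplicate is an intact column of the original alignment, so the taxa stay aligned with each other while the weight given to each site varies between replicates.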

    Likelihood

    $$\mathrm{Lik}(h) \propto P(D \mid h, m)$$

    The likelihood of a hypothesis (e.g. of a tree) is proportional to the probability of the data arising (the sequences) given the hypothesis and a model

    (This says nothing about the probability of the model)

  • Maximum Likelihood

    The ML is reached at the point that the hypothesis produces the highest probability of seeing the data

    D: HHTTHTHHTTT (11 tosses of a coin)
    m: independent tosses with some p = Prob(Heads)
    Likelihoods for different h correspond to different values of p. Which h is most likely?

    $$\mathrm{Lik}(h) \propto P(D \mid h, m)$$

    [Figure: likelihood of h (y-axis, 0 to 0.0006) plotted against all possible h (= p; x-axis, 0 to 1).]

    Data: HHTTHTHHTTT

    $$P(\text{data} \mid \text{hyp}, m) = p\,p\,(1-p)(1-p)\,p\,(1-p)\,p\,p\,(1-p)(1-p)(1-p) = p^5 (1-p)^6$$

    we are usually interested in the h (the tree) that returns the ML not in the ML itself (any one tree is not very likely)
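The coin example can be evaluated numerically. The sketch below scans a grid of values for p and reports the one with the highest likelihood; the grid resolution and the names are illustrative choices.

```python
def likelihood(p_heads, tosses):
    """P(data | p) for independent tosses: p^(#heads) * (1-p)^(#tails)."""
    heads = tosses.count("H")
    tails = tosses.count("T")
    return p_heads ** heads * (1.0 - p_heads) ** tails


data = "HHTTHTHHTTT"
grid = [i / 1000 for i in range(1001)]           # candidate hypotheses h (= p)
best_p = max(grid, key=lambda p: likelihood(p, data))
max_lik = likelihood(5 / 11, data)               # analytic optimum: 5 heads in 11

print(best_p, max_lik)
```

The maximum sits near p = 5/11 and the likelihood there is only about 0.0005, in line with the remark that the value of the ML itself is small and that it is the hypothesis returning it that matters.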

    •  The method of Maximum Likelihood attempts to reconstruct a phylogeny using an explicit model of evolution.

    •  Since each nucleotide site is assumed to evolve independently, the likelihood is calculated separately for each site. The product of the likelihoods for each site provides the overall likelihood of the observed data - FOR ONE OF ALL POSSIBLE TREE SHAPES

    •  Even with simple models of evolutionary change, the computational task is enormous, making this among the slowest of all phylogenetic methods.

    How to reconstruct phylogeny: methods - maximum likelihood

    [Figure: one of many possible ways the pattern of nucleotides at a given site could have evolved, on a tree with internal nodes x=a and y=g and tip states a, g and c.]

    Model of sequence evolution: No change: p=0.7; Substitution: p=0.3

    -ln(((0.7*0.3*0.3*0.3)+(all other combinations))*(all other sites))

    Models

    There are many models of evolution, which have different numbers of parameters to estimate when calculating the Lik. (J-C has 1; Kimura, 2; HKY, 5; HKY+Inv, 6; etc.)

    There are two types of parameters:

    variation in possible substitutions (ti, tv, A->C vs A->T)
    variation across the sequence (gamma, invariant sites)

    For discrete characters, Mk1 model has one parameter – we will (I hope) derive the model fully when we have some data

  • How to choose a model?

    1.  One can first build a N-J tree on raw distances,
    2.  then calculate the Max. Lik. of the data on that tree under different models of evolution.
    3.  Compare the relationship between the number of parameters and the actual ML fit to decide on the model, e.g. the one with the lowest value for the Akaike Information Criterion (AIC).

    $$\mathrm{AIC} = -2\ln(\mathrm{Lik}) + 2k$$

    The first term, $-2\ln(\mathrm{Lik})$, gets smaller with more complex models (better fit, higher Lik); the second term, $2k$, where k is the number of parameters, gets bigger with more complex models.
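A minimal sketch of the comparison in step 3 follows. The parameter counts are the ones quoted on the Models slide; the log-likelihood values are made-up numbers for illustration only, not results from any dataset.

```python
def aic(log_lik, n_params):
    """Akaike Information Criterion: -2 ln(Lik) + 2k."""
    return -2.0 * log_lik + 2.0 * n_params


# (hypothetical ln(Lik) on the N-J tree, number of parameters k)
candidates = {
    "J-C": (-2050.0, 1),
    "Kimura": (-2031.0, 2),
    "HKY": (-2029.5, 5),
    "HKY+Inv": (-2029.2, 6),
}
scores = {model: aic(log_lik, k) for model, (log_lik, k) in candidates.items()}
best_model = min(scores, key=scores.get)

for model, score in scores.items():
    print(model, round(score, 1))
print("preferred:", best_model)
```

With these invented numbers the richer models fit slightly better but not by enough to pay for their extra parameters, so the two-parameter model has the lowest AIC.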

    How to reconstruct phylogeny: search methods

    Before we can assess the ‘goodness’ of competing phylogenetic hypotheses (i.e. trees) using an optimality criterion, we have to build tree shapes. Assessing all possible tree shapes (i.e. exhaustive searches) takes a mighty long time for large numbers of taxa. A shortcut is provided by ‘hill-climbing’ algorithms (i.e. heuristic searches), of which many different flavors exist. They all follow this philosophy:

    •  Similar tree shapes have a similar ‘goodness’ (e.g. likelihoods).

    •  So, by starting with a tree (any old tree) and changing the shape in small steps, while constantly keeping track of which changes are improvements and which are not, the best tree will eventually be found (although such a search can get stuck on a local optimum).

    How to reconstruct phylogeny: search methods: exhaustive searching

    [Figure: “Optimality” (y-axis) plotted over all possible tree shapes (x-axis), with every tree exhaustively evaluated.]

    An exhaustive search will return the optimal tree shape ‘A’ after evaluating all possible trees

    How to reconstruct phylogeny: search methods: heuristic searching

    [Figure: the same landscape of “Optimality” over all possible tree shapes. From starting tree ‘B’, ‘hill climbing’ moves towards the optimal tree ‘A’; only the heuristically evaluated trees along the way are scored, a small part of the exhaustively evaluated set.]

    Molecular Evolution and the Neutral Theory

    The molecular clock for haemoglobins was assumed in 1962 by Linus Pauling and Emile Zuckerkandl, shown later by Margoliash (1964), and by P&Z in 1965...

    "the discovery of the molecular clock stands out as the most significant result of research in molecular evolution." (R. Lewin)

    "a very important idea that has turned out to be much truer than people thought at the time." (F. Crick)

    "one of the most elegantly simple concepts in biology, but it is also one of the most contentious." (S. Easteal et al.)

    [Image credit: pauling.library.oregonstate.edu]

    Margoliash, PNAS 1964

  • Motoo Kimura (1966, 1983)

    Motoo described how substitutions might occur

    A substitution is the replacement of one allele for another as the predominant allele in a population

    [Figure: N=10 individuals; the orange allele changes in frequency... perhaps due to selection]

    Substitutions occur at some background rate due to drift in both large and small populations

    [Figure: the orange allele changes in frequency... due to drift]

    The Neutral Theory simply states that most genetic variation is neither due to, nor acted on by, Natural Selection

    This drift is not working against selection (as in small pops) but is simply 'not seen' by selection...

  • Neutral Theory

    The assumption is that most gene products are already at their optimum. Selection weeds out the (very) harmful mutations and all the variation we actually see is the leftover, neutral variation created by mutation and drift. (Very few new mutations are subject to positive selection.)

    http://online.itp.ucsb.edu/online/infobio01/ohta/oh/01.html

    The Neutral theory was proposed to explain the clock-like substitution pattern among species

    Number of neutral mutations created per generation: 2Nν (where ν is a subset of µ, the overall mutation rate)

    e.g.: if the neutral mutation rate is $10^{-6}$ per position per generation, and if each position is represented $10^6$ times ($2N = 10^6$), then expect 1 new mutation per base per generation (i.e. someone is carrying that new mutation)

    [Figure: 6 different copies of an allele – here we focus on the ‘black’ one. Drift... one lineage eventually drifts to fixation, and the chance it is the one we were looking at is 1/6]

    For a new neutral mutation, the probability that it ‘becomes’ the most common one is the same as its initial frequency = $(2N)^{-1}$ (all have equal chance). You can think of ‘becomes the most common one’ as ‘being the ancestor of the most common one’

  • If we focus on the black allele, it has a 1/6 chance of replacing others

    rate of substitution (in substitutions/gen):

    $$k = 2N\nu \times (2N)^{-1} = \nu$$

    where $2N\nu$ is the number of candidate mutations and $(2N)^{-1}$ is the chance for each mutation to ‘fix’

    The substitution rate of neutral mutations k is equal to the neutral mutation rate ν under the Neutral Model

    How can this be??

    Remember, we are looking only at those mutations that are neutral. In small populations, the number of new mutations is low (not a lot of DNA to mutate) = 2N is small. But drift is fast - i.e. the chance that any allele increases in frequency is high (1/2N is relatively large).

    In large populations, there are lots of new mutations (2N large). But the chance that any one of them substitutes is low, because drift is slow (1/2N is small).

    So effects of population size on number of mutations and drift of those mutations cancel each other out.
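The cancellation can be made concrete with a little arithmetic. The sketch below uses the neutral mutation rate from the earlier example (10^-6 per position per generation); the population sizes are arbitrary illustrative values.

```python
import math

nu = 1e-6  # neutral mutation rate per position per generation

substitution_rates = {}
for n_individuals in (100, 10_000, 1_000_000):
    copies = 2 * n_individuals              # 2N gene copies in a diploid population
    new_mutations = copies * nu             # candidate neutral mutations per generation
    p_fix = 1.0 / copies                    # chance that each one drifts to fixation
    substitution_rates[n_individuals] = new_mutations * p_fix
    print(n_individuals, new_mutations, p_fix, substitution_rates[n_individuals])
```

The number of new mutations grows with N and the fixation chance shrinks with N by the same factor, so k comes out equal to ν for every population size.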

    [Figure: a tree of shark, quoll and human. Random substitutions arise along the branches, and daughter species inherit random substitutions.]

    All this occurs at rate ν and so we have a clock

  • And so, for the same locus, if ν is similar for all species, you get a ‘clock’ in generations

    ν must differ between genes (Table 7.1)

    The relevant rate is the neutral mutation rate, not the overall mutation rate. (The overall rate could also be lineage specific.) But for highly constrained proteins, most mutations aren’t neutral - they are selected out, and don’t ‘count’

    1.  Loci with fewer constraints evolve faster (and vice versa: e.g. histones don’t seem to evolve amino acid changes at all)

    2.  Synonymous changes (e.g. 3rd base) evolve faster still

    3.  Pseudogenes and (some) introns may evolve at true µ rate (and they do evolve most quickly across lineages)

    To recap:

    1.  The rate of substitution k = neutral mutation rate ν

    2.  Different genes have different constraints, so k differs between them (different proportions of mutations are neutral)

    3.  Mutations occur at meiosis, so they accumulate generation by generation, not year by year.

    4.  Many data are consistent with a constant rate of substitution per year ??

    Molecular Evolution. The clock

    Posterior probabilities

    In theory, we can use Bayes’ theorem to convert likelihoods into actual probabilities (‘posterior probabilities’).

    e.g. we want to know how probable it is that a particular coin has p(heads)=0.8 (biased) versus having p(heads)=0.5 (unbiased) (this is NOT the same as estimating p from data)

    A box has 10% of its coins biased. Choose a coin at random: p(biased)=0.1 [“prior”]

  • Posterior Probability for “Biased”

    Now toss it 10 times, get HHTHHTTHHH (ML estimate of p = 0.7, so neither 0.5 nor 0.8)

    p(data | biased) = $0.8^7 \times 0.2^3$ = 0.00167
    p(data | unbiased) = $0.5^{10}$ = 0.000976

    likelihood ratio LR = 0.00167/0.000976 = 1.71
    prior odds ratio = 0.1/0.9 = 0.11
    posterior odds ratio = LR × prior odds = 1.71 × 0.11 = 0.19

    Posterior Probabilities

    $$p(\text{biased} \mid \text{data}) = \frac{p(\text{data} \mid \text{biased})\, p(\text{biased})}{p(\text{data})} = \frac{p(\text{data} \mid \text{biased})\, p(\text{biased})}{p(\text{data} \mid \text{biased})\, p(\text{biased}) + p(\text{data} \mid \text{unbiased})\, p(\text{unbiased})}$$

    $$= \frac{0.00167\,(0.1)}{0.00167\,(0.1) + 0.000976\,(0.9)} = 0.16$$

    Here p(data | biased) is the likelihood, p(biased) is the prior for the hypothesis, and p(data) is the unconditional probability of the data.

    Or P(biased | data) = odds/(1+odds) = 0.19/1.19 = 0.16 (remember, p(biased) before data was 0.1)

    In general:

    $$p(\text{hyp} \mid \text{data}) = \frac{p(\text{data} \mid \text{hyp})\, p(\text{hyp})}{\sum_{\text{hyp}} p(\text{hyp})\, p(\text{data} \mid \text{hyp})}$$

    The left-hand side is the posterior; the numerator is the likelihood times the prior. The denominator is the normalizing constant, but it is impossible to get (since there are ‘infinite’ ways to get data)
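The coin calculation can be reproduced directly. This sketch uses only the numbers from the slide (ten tosses, p(heads) of 0.8 or 0.5, prior of 0.1 for "biased"); the variable names are illustrative.

```python
def sequence_likelihood(p_heads, tosses):
    """P(data | hypothesis) for independent tosses."""
    heads = tosses.count("H")
    tails = tosses.count("T")
    return p_heads ** heads * (1.0 - p_heads) ** tails


tosses = "HHTHHTTHHH"
prior = {"biased": 0.1, "unbiased": 0.9}
p_heads = {"biased": 0.8, "unbiased": 0.5}

likelihood = {h: sequence_likelihood(p_heads[h], tosses) for h in prior}
p_data = sum(likelihood[h] * prior[h] for h in prior)  # normalizing constant
posterior = {h: likelihood[h] * prior[h] / p_data for h in prior}

likelihood_ratio = likelihood["biased"] / likelihood["unbiased"]
posterior_odds = likelihood_ratio * prior["biased"] / prior["unbiased"]

print(likelihood)
print(round(likelihood_ratio, 2), round(posterior_odds, 2))
print(round(posterior["biased"], 2))
```

With only two hypotheses the normalizing constant is a two-term sum, so it can be computed exactly here. The difficulty described above arises when the sum runs over an enormous number of hypotheses, such as all trees.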

  • MCMC (e.g. the Metropolis-Hastings algorithm) gets rid of the denominator!

    Markov chain Monte Carlo (MCMC) techniques sample trees in proportion to their likelihoods*priors (so, uncorrected probabilities, just the numerators), and so allow for estimates of the posterior probabilities.

    How does it do that?

    By keeping a random sample of ‘hypotheses’ in storage in proportion to their likelihood*priors. So, if a hypothesis is found 80% of the time, it has an 80% posterior probability of being true.

    The result is represented by the consensus of your 1000000 MCMC trees

    MCMC gets around needing to know P(data)
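A toy version of the idea, applied to the two-hypothesis coin example rather than to trees, is sketched below. It is a plain Metropolis sampler that only ever uses likelihood × prior. The function name, the number of steps and the random seed are illustrative assumptions, and real phylogenetic MCMC proposes changes to trees and model parameters instead.

```python
import random

# Unnormalized weights: likelihood * prior for each hypothesis (no P(data) needed).
weights = {
    "biased": 0.8 ** 7 * 0.2 ** 3 * 0.1,
    "unbiased": 0.5 ** 10 * 0.9,
}


def metropolis(weights, n_steps, rng):
    """Visit hypotheses in proportion to their weights; return visit frequencies."""
    states = list(weights)
    current = "unbiased"
    visits = {state: 0 for state in states}
    for _ in range(n_steps):
        proposal = rng.choice([s for s in states if s != current])
        # Accept with probability min(1, weight ratio); the normalizer cancels.
        if rng.random() < min(1.0, weights[proposal] / weights[current]):
            current = proposal
        visits[current] += 1
    return {state: visits[state] / n_steps for state in states}


frequencies = metropolis(weights, 100_000, random.Random(1))
print(frequencies)
```

The fraction of steps spent on "biased" should come out close to the posterior of 0.16 obtained from Bayes' theorem, even though the denominator was never computed.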