
Sequence evolution lab - Dalhousie Universityawarnach.mathstat.dal.ca/~joeb/biol3046/PDFs/lab/... · PART 5: The “lab protocol” 13 Tree and Model 16 Table of site pattern probabilities


BIOL3046 SEQUENCE EVOLUTION “LAB” (NOV 2018)

CONTENTS

Background:

PART 1: Extending microevolution to macroevolution: “origin-fixation models”

PART 2: The macro-evolutionary substitution process is a series of “steps” in a stochastic process of sequential fixation

PART 3: A Markov model of nucleotide macro-evolution

PART 4: How to simulate the evolution of DNA sequence data

Sequence evolution lab:

PART 5: The “lab protocol”

Tree and Model

Table of site pattern probabilities

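[Figure: phylogenetic tree relating Species 1, Species 2, Species 3 and Species 4 to the Root]

[Figure: Markov chain between DNA states A, C, G and T]

DNA instantaneous rate matrix* (rows: from; columns: to; state order A, C, G, T):

$$Q = \begin{pmatrix} -3\alpha & \alpha & \alpha & \alpha \\ \alpha & -3\alpha & \alpha & \alpha \\ \alpha & \alpha & -3\alpha & \alpha \\ \alpha & \alpha & \alpha & -3\alpha \end{pmatrix}$$

* This is equivalent to the simplest model for DNA (Jukes and Cantor, 1969).

Species 1 ATGCTAGCGACTAT
Species 2 ATGCTTGTATCCGT
Species 3 ATGTTTGCGGCCGC
Species 4 ATGTTAACACCTAA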


PART 1: Extending microevolution to macroevolution: “origin-fixation models”

If we consider a population genetic process run over a long period of time (say, 1000’s of generations), then we can view the evolutionary process as being dominated by the following two factors:

1. Origin: The rate that mutation generates new variation in the population.

2. Fixation: The probability that a new variant gets fixed, or lost, over time.

From our studies of population genetics we already have a good sense of these processes. Furthermore we understand that the second factor (probability of fixation) reflects the interplay between drift and selection (unless the variant is neutral; then only drift explains fixation dynamics). Thus we can express macroevolution in terms of explicit population genetic parameters.

1. Origin: e.g., mutation from A → G has a per-site rate of μAG

2. Fixation: probability of fixing G in the population is determined by the selection coefficient sAG and the effective population size Ne

While this is explicit in terms of population genetic parameters, it is a simplification because it makes predictions by ignoring the details of the actual population polymorphism that occurred during the process. The figure below illustrates the shift to an origin-fixation framework

[Figure: allele frequency (0 to 1.0) plotted against time, with the corresponding DNA state in the origin-fixation process shown for each interval]


Assumptions: We will see that the above simplification will lead us to some very powerful tools. But these simplifications are only justified if the following assumptions can be made:

1. Mutations are rare such that they only enter a population at loci that are monomorphic (i.e., μij.Ne << 1)

2. The interval of macro-evolutionary time (Δt) is such that a mutation is highly likely to be fixed or lost.

3. μij.Ne is small enough so that other mutations are highly unlikely to be fixed during that same interval, Δt.

Result of assumptions: the residence time of a population polymorphism is much shorter than the time between mutation events (see Figure above).

The origin-fixation family: These simplifications allowed the development of a wide variety of evolutionary models in which mutations of any kind (e.g., nucleotide, amino acids, indels, transposon events) enter a population at one point in time and go to fixation, or are lost, depending on an explicit fixation probability. The individual models have different names, but the family is called origin-fixation models, or mutation-selection models.


PART 2: The macro-evolutionary substitution process is a series of “steps” in a stochastic process of sequential fixation.

Substitutions: The framework developed in Part 1 implies a model whereby the population “jumps” directly from one fixed state to another over macro-evolutionary time. These jumps are what evolutionary biologists infer as substitutions in macro-evolutionary time.

Evolutionary “walk” in sequence space: If the origin-fixation process does not involve natural selection (i.e., all the mutations are fixed via genetic drift), then the substitutions are the sequential steps in a neutral walk. If the origin-fixation process does involve natural selection, then the substitutions are the sequential steps in an adaptive walk. Each is a process of tracing an evolutionary path through some form of DNA sequence space.

Markov chain for nucleotide evolution: Since the sequence of “steps” (Figure in Part 1) results from a process of “proposal” (origin) and “acceptance” (fixation), the evolution of a gene by substitution defines a Markov chain. This Markov chain is a stochastic process where the states are nucleotides at a single locus, and the steps in the chain are substitutions between states occurring with a rate equal to the population fixation rate.

population rate = origination rate × fixation rate

The population rate, q(i,j), can be fully defined as:

$$q(i,j) = 2N_e\mu_{ij} \times \pi(s_{ij}, N_e) \qquad (1)$$

Where i and j are nucleotide states, 2Neμij is the total rate at which mutations occur within a diploid population and π(sij,Ne) represents the probability of fixation given that the mutation has occurred.
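As a rough numerical illustration of equation (1), the sketch below evaluates q(i,j) for a few selection coefficients. These notes do not give the functional form of π(sij, Ne), so the sketch assumes Kimura's diffusion approximation for a single new mutation in a diploid population whose census size equals Ne. The function names and the parameter values are hypothetical.

```python
import math

def fixation_probability(s, ne):
    """Assumed form of pi(s, Ne): Kimura's diffusion approximation for one
    new mutation in a diploid population (census size taken equal to Ne)."""
    if abs(s) < 1e-12:
        # Neutral limit: a new mutation starts at frequency 1/(2Ne).
        return 1.0 / (2.0 * ne)
    return (1.0 - math.exp(-2.0 * s)) / (1.0 - math.exp(-4.0 * ne * s))

def population_rate(mu, s, ne):
    """Equation (1): q(i,j) = 2 * Ne * mu_ij * pi(s_ij, Ne)."""
    return 2.0 * ne * mu * fixation_probability(s, ne)

mu, ne = 1e-8, 10_000  # hypothetical per-site mutation rate and population size
for s in (-1e-4, 0.0, 1e-4):
    print(f"s = {s:+.0e}  q = {population_rate(mu, s, ne):.3e}")
```

Under this assumed form, the neutral case (s = 0) gives a population rate equal to the mutation rate, while deleterious and beneficial mutations fall below and above it, respectively.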

Evolutionary “move rule” in the Markov chain: The Markov chain describes the evolutionary dynamics of a DNA sequence, by substitution, as a long run average. This means that we can predict the properties of the process when it is at equilibrium. Assuming that the population starts out fixed for some nucleotide (i = A, C, G or T), this model predicts that the amount of time to the next fixation event, t* (i.e., the time to the next “jump”, or change in nucleotide state), is exponentially distributed. The mean of this exponential distribution is the inverse of the substitution rate given the current state = i of the process (the current state of the Markov chain). The substitution rate (λ) conditioned on state = i is the overall rate away from state = i to each of its k = 3 nucleotide “neighbors” [$\lambda = \sum_{j \neq i} q(i,j)$]. If you can generate a uniform(0,1) random number = u, and you have a value for the substitution rate (λ), then you can easily generate an exponentially distributed time interval (t*).

$$t^* = -\frac{1}{\lambda}\ln(u) \qquad (2)$$

The Markov chain also determines the probability of the next nucleotide state, j, in the evolution process (i.e., the state at the time of the next fixation). This probability of a particular substitution ($p^*_{i \to j}$) is simply the ratio of the population fixation rate for state j to the overall rate:

$$p^*_{i \to j} = \frac{\text{population rate}}{\text{total rate}} = \frac{q(i,j)}{\sum_{k \neq i} q(i,k)} \qquad (3)$$

Equation 3 above represents the “move rule” of a Markov chain model for nucleotide evolution. Remember that the dynamics of this process are fully determined by explicit population genetic parameters (equation 1). This is a very powerful framework because:

i. The Markov chain is a model that extends population genetics theory to the macro-evolution of a gene.

ii. The process can be specified using macro-evolutionary rates even if there is no knowledge of the population genetic parameter values (explained in the next section).
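The claim that equation (2) yields exponentially distributed intervals with mean 1/λ can be checked by simulation. The following is a minimal sketch: the rate 0.886 is the total rate away from A used later in these notes, while the seed, the sample size and the function name `waiting_time` are arbitrary choices made for illustration.

```python
import math
import random

def waiting_time(rate, u):
    """Equation (2): t* = -(1/lambda) * ln(u), for u in (0, 1]."""
    return -math.log(u) / rate

rng = random.Random(1)   # fixed seed so the run is repeatable
rate = 0.886             # total rate away from state A in the example Q-matrix
n = 100_000
# 1 - random() lies in (0, 1], which avoids taking ln(0).
draws = [waiting_time(rate, 1.0 - rng.random()) for _ in range(n)]
sample_mean = sum(draws) / n
print(f"sample mean = {sample_mean:.3f}, 1/lambda = {1.0 / rate:.3f}")
```

With this many draws the sample mean should sit close to 1/λ, which is the mean of the exponential distribution described above.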


PART 3: A Markov model of nucleotide macro-evolution

Evolutionary model: We can now think about how to fully specify a continuous-time Markov chain that determines the rate of evolution from any one nucleotide state to any other nucleotide state. The Markov chain shown in the figure represents the structure of a model of nucleotide evolution. The model is constructed by converting the illustration into a matrix of all possible rates of change. The rates are denoted by the parameter α (we will discuss the role of −3α later). The rates are instantaneous because any fixation event represents one “step” in an evolutionary walk through the DNA state space. When the matrix takes on a specific set of values it is referred to as the instantaneous rate matrix, and is denoted by Q.

[Figure: Markov chain between DNA states A, C, G and T]

The loop arrows indicate that in addition to a change in state, it is also possible to observe no change in state after a certain amount of time.

DNA instantaneous rate matrix (rows: from; columns: to; state order A, C, G, T):

$$Q = \begin{pmatrix} -3\alpha & \alpha & \alpha & \alpha \\ \alpha & -3\alpha & \alpha & \alpha \\ \alpha & \alpha & -3\alpha & \alpha \\ \alpha & \alpha & \alpha & -3\alpha \end{pmatrix}$$

[Figure: DNA states in the origin-fixation process; each substitution step is labelled with its instantaneous rate within the Q-matrix (e.g., qAC, qCG)]


Consider the instantaneous rate matrix below. For the time being it does not matter where the numbers came from (we will return to that later). The point is that we now have quantities that determine the rate of nucleotide change starting with a given nucleotide (A, C, G or T) in an ancestor and ending in any state in the descendant (given by each row of the matrix).

$$Q = \{q_{ij}\} = \begin{pmatrix} -0.886 & 0.190 & 0.633 & 0.063 \\ 0.253 & -0.696 & 0.127 & 0.316 \\ 1.266 & 0.190 & -1.519 & 0.063 \\ 0.253 & 0.949 & 0.127 & -1.329 \end{pmatrix} \qquad (4)$$

Let’s take nucleotide A as an example. Assume the ancestor has an A; then we need only look at the first row, as it gives the rates for starting with an A and ending in an A (1st column), C (2nd column), G (3rd column), or T (4th column).

$$Q = \{q_{ij}\} = \begin{pmatrix} -0.886 & 0.190 & 0.633 & 0.063 \\ \cdot & \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot & \cdot \end{pmatrix} \qquad (5)$$

Based on the 1st row, the rate of change from A to, say, G is 0.633. The overall rate of change away from A (to any of the k = 3 “neighbor” states) is qAC + qAG + qAT = 0.886. Now we return to the meaning of −3α in a previous figure; we can see that for the matrix above, the total rate of change away from nucleotide A is simply −qAA = 0.886.

Transition probability matrix for evolutionary time: The fully specified Q-matrix is for an instant in time. But, you say, macro-evolution happens over long periods of time! How can the Q-matrix be useful for that? The answer is that the Q-matrix can be generalized to any amount of time (t).

SIDE-NOTE: Units of time. For convenience, the Q-matrix is typically scaled so that the mean rate is one expected nucleotide substitution per unit of time. This means that the evolutionary time between two gene sequences can, very conveniently, be measured as the mean number of substitutions per site for that gene. Details will be presented elsewhere.
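Both properties of the example Q-matrix, that the total rate away from a state equals the negative of its diagonal entry and that the matrix is scaled to a mean rate of one, can be checked numerically. This is a minimal sketch, assuming that the “mean rate” is the average of the per-state total rates weighted by the equilibrium frequencies (0.4, 0.3, 0.2, 0.1) given in the lab protocol of Part 5. The function name `total_rate` is hypothetical.

```python
STATES = "ACGT"
Q = [
    [-0.886, 0.190, 0.633, 0.063],
    [0.253, -0.696, 0.127, 0.316],
    [1.266, 0.190, -1.519, 0.063],
    [0.253, 0.949, 0.127, -1.329],
]
PI = [0.4, 0.3, 0.2, 0.1]  # equilibrium frequencies from the lab protocol

def total_rate(q, i):
    """Overall rate away from state i: sum of the off-diagonal entries of row i."""
    return sum(rate for j, rate in enumerate(q[i]) if j != i)

for i, state in enumerate(STATES):
    print(f"{state}: total rate = {total_rate(Q, i):.3f}, -q_ii = {-Q[i][i]:.3f}")

# Assumed definition of the mean rate: frequency-weighted average of total rates.
mean_rate = sum(PI[i] * total_rate(Q, i) for i in range(4))
print(f"mean rate = {mean_rate:.4f}")
```

The weighted mean comes out within rounding error of one substitution per unit of time, which is consistent with the scaling described in the side-note.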


Given some amount of evolutionary time, t, there is some probability that a substitution process started at nucleotide A would be observed to end at nucleotide G. This is called a transition probability, and is denoted by pAG(t). The generalized transition probability is denoted by pij(t). These values are NOT the same as those given in the Q-matrix. The values for all the transition probabilities, pij(t), are given in a different matrix denoted P(t). The transition probability matrix, P(t), is obtained as follows:

$$P(t) = e^{Qt}$$

The transition probabilities for the case of t = 0.5 are given below.

[Figure: a branch of length time = 0.5 from the ancestral nucleotide A to the descendant nucleotide G. If time is long, there could be one or more changes in state along this evolutionary history.]

Transition probability, pAG(0.5): the probability of this change accounts for ALL POSSIBLE ways (different changes) for the process to start in A and end in G!

Instantaneous rate matrix (instant in time):

$$Q = \begin{pmatrix} -0.886 & 0.190 & 0.633 & 0.063 \\ 0.253 & -0.696 & 0.127 & 0.316 \\ 1.266 & 0.190 & -1.519 & 0.063 \\ 0.253 & 0.949 & 0.127 & -1.329 \end{pmatrix}$$

Transition probability matrix (time, t, = 0.5):

$$P(0.5) = e^{Q \times 0.5} = \begin{pmatrix} 0.708 & 0.081 & 0.183 & 0.027 \\ 0.109 & 0.738 & 0.054 & 0.099 \\ 0.367 & 0.081 & 0.524 & 0.027 \\ 0.109 & 0.298 & 0.054 & 0.539 \end{pmatrix}$$
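The relationship P(t) = e^{Qt} can be evaluated numerically. These notes do not say how P(0.5) was computed; the sketch below swaps in one common approach, a truncated Taylor series combined with scaling and squaring, written in plain Python. The function names and the choices of 10 squarings and 12 series terms are assumptions made for illustration.

```python
Q = [
    [-0.886, 0.190, 0.633, 0.063],
    [0.253, -0.696, 0.127, 0.316],
    [1.266, 0.190, -1.519, 0.063],
    [0.253, 0.949, 0.127, -1.329],
]

def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(q, t, squarings=10, terms=12):
    """Approximate exp(q * t): Taylor series of the scaled matrix, then squaring."""
    n = len(q)
    scale = t / (2 ** squarings)
    m = [[q[i][j] * scale for j in range(n)] for i in range(n)]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        # term becomes m^k / k!
        term = [[x / k for x in row] for row in mat_mul(term, m)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)  # undo the scaling: exp(A) = exp(A/2)^2
    return result

P = mat_exp(Q, 0.5)
for row in P:
    print(" ".join(f"{p:.3f}" for p in row))
```

Each row of the result sums to one, and the entries should agree with the tabulated P(0.5) to about three decimal places (small differences can arise because Q itself is rounded).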


PART 4: How to simulate the evolution of DNA sequence data

The Markov model framework we have developed in these notes is very powerful. One of the things we can do with it is generate a DNA sequence alignment according to the evolutionary process represented by our model! To do this we will need to exploit three properties of the Markov chain that we have already covered:

1. The Q matrix gives us the instantaneous rates for each “step” in an evolutionary walk through the DNA state space (eq 4).

2. The time between fixation events is exponentially distributed (eq 2).

3. A “move rule” can be constructed from the probability of the next nucleotide state (eq 3).

A simple example of starting with nucleotide A: Let’s assume that the ancestor had nucleotide A at some location in its genome (later we will learn how to choose a starting nucleotide, rather than just assuming it was known). For this case we only need to look at the 1st row of matrix Q. To use this information, we need to know how much time elapses until the ancestral state (A) changes to some other state. Since this time interval is exponentially distributed we can use equation (2) of these notes to obtain this interval of time.

[Figure: a branch of total time = 0.5 from the ancestral nucleotide A to an unknown descendant nucleotide (?), alongside the Markov chain showing the rates qAC, qAG and qAT away from A]


Since we are starting at state A, the value of λ = 0.886 (from Q above). We also need a value for u, which is a uniform(0,1) random number. I used a 10-sided die to generate this number (u = 0.901). Plugging the values of λ and u into equation (2) from these notes gives the time to the next substitution = 0.118.

Now that we know when the next substitution will occur, we can use a “move rule” derived from the 1st row of the Q-matrix to compute the probabilities of change from A to C, to G and to T at that instant in time.

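$$p^*_{A \to C} = \frac{q_{AC}}{-q_{AA}} = \frac{0.190}{0.886} = 0.214$$

$$p^*_{A \to G} = \frac{q_{AG}}{-q_{AA}} = \frac{0.633}{0.886} = 0.714$$

$$p^*_{A \to T} = \frac{q_{AT}}{-q_{AA}} = \frac{0.063}{0.886} = 0.072$$

(The last ratio is 0.071 to three decimal places; it is taken as 0.072 here so that the three probabilities sum to 1.)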

To decide which of the possible nucleotide substitutions occurred, we will need to draw another uniform(0,1) random number. Again, we will use the 10-sided die for this, and denote this number as v.

If the value of v is between 0.000 and 0.213 we say that A changed to C
If the value of v is between 0.214 and 0.927 we say that A changed to G
If the value of v is between 0.928 and 0.999 we say that A changed to T

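[Figure: the branch from the ancestral nucleotide A (total time = 0.5) with the first fixation event placed at t1 = 0.118; the descendant nucleotide is still unknown (?)]

1st row of matrix Q:

$$Q = \{q_{ij}\} = \begin{pmatrix} -0.886 & 0.190 & 0.633 & 0.063 \\ \cdot & \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot & \cdot \end{pmatrix}$$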


I used a 10-sided die to generate v = 0.741, which means (based on the above move rule) that A changed to G. Now we can update our history. Given we currently have a G at this site in the gene, the 3rd row of the Q matrix is now the relevant row. The process above is simply repeated: (i) draw an exponentially distributed interval of time and (ii) use a new move rule to decide which nucleotide substitution occurred at the next fixation event.
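The first step of this worked example can be reproduced in a few lines. This is a minimal sketch using the two die-generated numbers quoted in the text (0.901 for the waiting time and 0.741 for the move rule); the function names are hypothetical, and the move rule walks the neighbours in A, C, G, T order, as the ranges in the text do.

```python
import math

STATES = "ACGT"
Q_ROW_A = [-0.886, 0.190, 0.633, 0.063]  # 1st row of Q (current state A)

def waiting_time(rate, u):
    """Exponentially distributed interval from a uniform(0,1) number u."""
    return -math.log(u) / rate

def next_state(row, current, v):
    """Move rule: accumulate q_ij / lambda over the neighbours of `current`."""
    rate = -row[current]
    cumulative = 0.0
    last = None
    for j, q in enumerate(row):
        if j == current:
            continue
        cumulative += q / rate
        last = j
        if v < cumulative:
            return j
    return last  # guards against rounding when v is very close to 1

t1 = waiting_time(-Q_ROW_A[0], 0.901)
new = STATES[next_state(Q_ROW_A, 0, 0.741)]
print(f"t1 = {t1:.3f}, A changed to {new}")  # t1 = 0.118, A changed to G
```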

[Figure: the history is updated to A → G at t1 = 0.118; the Markov chain now shows the rates qGA, qGC and qGT away from G]

[Figure: a second fixation event, G → T, after a further interval t2 = 0.283; the descendant nucleotide is still unknown (?)]


The process is ended when an interval of time is drawn that is longer than the “remaining” length of the branch. This tells you that the state at the actual end of the branch is equal to the state chosen for the last fixation event.
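The stopping rule can be sketched as a loop over one branch. In the sketch below the function name and the `draw` callback are hypothetical. The first two scripted numbers (0.901, 0.741) are the die rolls quoted in the text; the remaining three (0.651, 0.960, 0.810) are not given in these notes and were chosen here only so that the run follows the same A → G → T history as the example.

```python
import math

STATES = "ACGT"
Q = [
    [-0.886, 0.190, 0.633, 0.063],
    [0.253, -0.696, 0.127, 0.316],
    [1.266, 0.190, -1.519, 0.063],
    [0.253, 0.949, 0.127, -1.329],
]

def simulate_branch(start, length, draw):
    """Evolve one site along one branch; `draw()` returns uniform(0,1) numbers."""
    state = STATES.index(start)
    elapsed = 0.0
    history = []  # (cumulative time of fixation event, new state)
    while True:
        rate = -Q[state][state]
        elapsed += -math.log(draw()) / rate   # exponential waiting time
        if elapsed > length:                  # interval over-runs the branch
            return STATES[state], history
        v, cumulative = draw(), 0.0
        neighbours = [j for j in range(4) if j != state]
        new_state = neighbours[-1]            # fallback if rounding leaves v uncovered
        for j in neighbours:
            cumulative += Q[state][j] / rate  # move rule
            if v < cumulative:
                new_state = j
                break
        state = new_state
        history.append((round(elapsed, 3), STATES[state]))

scripted = iter([0.901, 0.741, 0.651, 0.960, 0.810])  # last three are assumed values
end_state, history = simulate_branch("A", 0.5, lambda: next(scripted))
print(end_state, history)
```

The third interval pushes the elapsed time past 0.5, so the loop stops and reports the state chosen at the last fixation event.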

Extending evolution from one nucleotide site to the evolution of a complete gene: This is easy if we are willing to make the “independence assumption”; specifically, this is the assumption that sites evolve independently of each other. This will be true for neutral sites, and for sites where fitness does not depend on the state at another site. However, when fitness does depend on the state at another site (epistasis), then the independence assumption is violated. Under the independence assumption, the evolution of a full sequence of, say, 100 nucleotides is very easy. The process that is used for the 1st site is simply used again for 99 more sites (but, of course, using independent uniform(0,1) random numbers whenever they are needed).

Extending evolution from one lineage to an entire phylogenetic tree: Again, this is easy. The process described above is started from the root of a given tree (with branch lengths), and proceeds along each path until it hits a node of the tree (a bifurcation). The descendant state chosen for that node of the tree is used as the ancestor for each bifurcating path, and the process is repeated until each tip-species of the tree is reached, and a state is chosen for that species. Simulating a multi-species alignment involves simulating the data at a site for all species within a given tree, and repeating this simulation process for the required number of sites.
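The two extensions, many independent sites and a whole tree, can be sketched together. The tree topology and branch lengths below are hypothetical placeholders, since the lab's actual tree is supplied only as a figure, and the internal node names are invented. The sketch also uses Python's pseudo-random generator and `random.choices` as a stand-in for the die-based move rule.

```python
import math
import random

STATES = "ACGT"
Q = [
    [-0.886, 0.190, 0.633, 0.063],
    [0.253, -0.696, 0.127, 0.316],
    [1.266, 0.190, -1.519, 0.063],
    [0.253, 0.949, 0.127, -1.329],
]
PI = [0.4, 0.3, 0.2, 0.1]  # equilibrium frequencies used to pick the root state

# Hypothetical tree: each internal node maps to (child, branch length) pairs.
TREE = {
    "Root": [("Node12", 0.1), ("Node34", 0.1)],
    "Node12": [("Species 1", 0.2), ("Species 2", 0.2)],
    "Node34": [("Species 3", 0.2), ("Species 4", 0.2)],
}

def evolve_branch(state, length, rng):
    """Return the state index at the end of a branch of the given length."""
    elapsed = 0.0
    while True:
        rate = -Q[state][state]
        elapsed += -math.log(1.0 - rng.random()) / rate
        if elapsed > length:
            return state
        others = [j for j in range(4) if j != state]
        state = rng.choices(others, weights=[Q[state][j] for j in others])[0]

def simulate_site(rng):
    """Evolve one site from the root to every tip; returns {tip name: nucleotide}."""
    root = rng.choices([0, 1, 2, 3], weights=PI)[0]
    tips, stack = {}, [("Root", root)]
    while stack:
        node, state = stack.pop()
        if node not in TREE:          # a tip species
            tips[node] = STATES[state]
            continue
        for child, length in TREE[node]:
            stack.append((child, evolve_branch(state, length, rng)))
    return tips

def simulate_alignment(n_sites, seed=0):
    """Repeat the single-site simulation independently for n_sites columns."""
    rng = random.Random(seed)
    columns = [simulate_site(rng) for _ in range(n_sites)]
    return {name: "".join(col[name] for col in columns) for name in sorted(columns[0])}

for name, seq in simulate_alignment(45, seed=1).items():
    print(name, seq)
```

Each call to `simulate_site` draws fresh random numbers, which is how the independence assumption enters the sketch.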

[Figure: completed history for the branch: A → G at t1 = 0.118, then G → T after t2 = 0.283; the next interval, t3 = 0.159, over-runs the branch, so there is no further change in state and the descendant nucleotide is T]


PART 5: The sequence evolution “lab protocol”

Objective: Use a Markov model of DNA evolution to evolve a set of gene sequences over a phylogeny of 4 species. Most of the sequence will be evolved in class. Based on a class size of 35, this means that we will work together to generate 35 DNA sites for the gene. You will then evolve another 10 sites for this gene on your own. This means that you will generate a multi-species alignment of at least 45 DNA sites for 4 species.

The model of DNA evolution: Three kinds of prior information are required to simulate DNA sequence evolution (this is our “model”).

1. Tree with branch lengths
2. Equilibrium DNA frequencies
3. Q-matrix

A figure is provided at the end of this document that supplies all this information.

Use a 10-sided die to generate a uniform(0,1) random number: The interval between 0 and 1 is continuous. However, because we only need precision to 3 decimal places, it is easy to generate a random number within the interval with just a single 10-sided die. At this level of precision our interval is 0.000 to 0.999. To generate a random number, simply start with 0._ _ _, and roll the single die three times, filling in each level of precision with the result of a single roll.
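The die procedure amounts to filling in three decimal digits. This is a minimal sketch, assuming the die faces are numbered 0 to 9; the function names are hypothetical, and `roll_uniform` substitutes a pseudo-random generator for a physical die.

```python
import random

def die_uniform(rolls):
    """Combine three rolls of a 10-sided die (faces 0-9) into 0.d1d2d3."""
    d1, d2, d3 = rolls
    return (100 * d1 + 10 * d2 + d3) / 1000

def roll_uniform(rng=random):
    """Simulate the three rolls with a pseudo-random generator."""
    return die_uniform([rng.randrange(10) for _ in range(3)])

print(die_uniform([9, 0, 1]))  # 0.901
```

Rolling 0, 0, 0 gives exactly 0.000, for which the logarithm in the waiting-time formula is undefined. Re-rolling in that case is one option, though these notes do not address it.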

The process has three basic steps:

1. Choose the nucleotide state at the root of the tree according to the nucleotide equilibrium frequencies for this model.

2. Start at the root and evolve sequences along each branch of the tree from each ancestral node to each descendant node.

3. Record the final states at the tips of the tree in the multi-sequence alignment.

This basic procedure is repeated for every site that you want to evolve for this full “gene sequence”.


Three steps in detail:

1. For a given site, choose the nucleotide state at the root of the tree.

a. Roll a 10-sided die to obtain a uniform(0,1) random number.

b. The equilibrium frequencies are: πA = 0.4, πC = 0.3, πG = 0.2, & πT = 0.1; therefore, select the state at the root of the tree as follows:

i. from 0.000 - 0.399 …choose A
ii. from 0.400 - 0.699 …choose C
iii. from 0.700 - 0.899 …choose G
iv. from 0.900 - 0.999 …choose T
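The lookup in step 1b can be written as a comparison against cumulative upper bounds. In this minimal sketch the bounds 0.4, 0.7, 0.9 and 1.0 are the running totals of the equilibrium frequencies above; the function and constant names are hypothetical.

```python
# Cumulative upper bounds of the equilibrium frequencies (0.4, 0.3, 0.2, 0.1).
ROOT_BOUNDS = [("A", 0.4), ("C", 0.7), ("G", 0.9), ("T", 1.0)]

def choose_root(u):
    """Return the root state for a uniform(0,1) number u, following step 1b."""
    for state, upper in ROOT_BOUNDS:
        if u < upper:
            return state
    raise ValueError("u must lie in [0, 1)")

for u in (0.399, 0.400, 0.699, 0.700, 0.899, 0.900, 0.999):
    print(u, choose_root(u))
```

Writing the bounds as literals, rather than summing the frequencies at run time, keeps boundary values such as 0.900 on the intended side of each comparison.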

2. Start at the root of the tree and repeat the below process to systematically evolve the DNA for each descendant branch until you have reached all the tips of the tree. For each internal branch, the state of the descendant node is used as the ancestral state for the next branch.

a. Record the current DNA state. (If at the root, this is the state chosen in step 1 above.)

b. Choose the row of the Q-matrix based on the current DNA state.

c. Draw a uniform(0,1) random number (u).

d. Use (i) the random number u and (ii) the total rate of change from the nucleotide (from the Q-matrix) to obtain an exponentially distributed time interval to the next substitution.

  i. Does the interval exceed the end of the branch?

    1. IF YES:

      a. Set the state at the descendant node (the end of the branch) equal to the current DNA state.

      b. Select an “un-evolved” branch that already has a state for its ancestral node and start again with step 2a.

    2. IF NO: do step 2e next.

e. Use the Q-matrix to create the move rule for the next DNA substitution.

f. Draw a new uniform(0,1) random number.

g. Use that random number with the move rule to select the new state of the DNA in the Markov chain.

h. Return to step 2a above.


3. Record the data and repeat as necessary

a. Check that this step was reached only after all branches have been evolved (i.e., all tips of the tree now have a DNA state).

b. Record the DNA state for each species at the tip of the tree as a single position within the multi-sequence alignment.

Species 1 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ Species 2 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ Species 3 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ Species 4 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

c. If you have not yet reached the desired total length of the DNA sequence, then return to step 1 and begin the process again to generate the data for another site in the alignment.


Tree and Model

1. Tree with branch lengths

[Figure: tree relating Species 1, Species 2, Species 3 and Species 4 to the Root]

2. Equilibrium DNA frequencies

πA    πC    πG    πT
0.4   0.3   0.2   0.1

3. Q-matrix (rows: FROM; columns: TO)

        A        C        G        T
A    -0.886    0.190    0.633    0.063
C     0.253   -0.696    0.127    0.316
G     1.266    0.190   -1.519    0.063
T     0.253    0.949    0.127   -1.329


Pattern Probabilities (I): Site pattern probabilities for 4-species alignment (page 1)

Wednesday, July 25, 12

AAAA -- 0.199465   AGAA -- 0.014711   CAAA -- 0.018317   CGAA -- 0.001490
AAAC -- 0.004185   AGAC -- 0.000725   CAAC -- 0.000628   CGAC -- 0.000210
AAAG -- 0.014711   AGAG -- 0.019868   CAAG -- 0.001490   CGAG -- 0.002878
AAAT -- 0.001395   AGAT -- 0.000242   CAAT -- 0.000166   CGAT -- 0.000048
AACA -- 0.009075   AGCA -- 0.000843   CACA -- 0.005277   CGCA -- 0.000669
AACC -- 0.000703   AGCC -- 0.000315   CACC -- 0.004524   CGCC -- 0.002262
AACG -- 0.000843   AGCG -- 0.002202   CACG -- 0.000669   CGCG -- 0.002304
AACT -- 0.000121   AGCT -- 0.000048   CACT -- 0.000375   CGCT -- 0.000188
AAGA -- 0.028625   AGGA -- 0.005985   CAGA -- 0.003304   CGGA -- 0.001065
AAGC -- 0.000702   AGGC -- 0.000755   CAGC -- 0.000210   CGGC -- 0.000209
AAGG -- 0.005985   AGGG -- 0.032738   CAGG -- 0.001065   CGGG -- 0.006655
AAGT -- 0.000234   AGGT -- 0.000252   CAGT -- 0.000048   CGGT -- 0.000059
AATA -- 0.003025   AGTA -- 0.000281   CATA -- 0.000959   CGTA -- 0.000120
AATC -- 0.000121   AGTC -- 0.000048   CATC -- 0.000360   CGTC -- 0.000180
AATG -- 0.000281   AGTG -- 0.000734   CATG -- 0.000120   CGTG -- 0.000420
AATT -- 0.000154   AGTT -- 0.000073   CATT -- 0.000404   CGTT -- 0.000202
ACAA -- 0.004185   ATAA -- 0.001395   CCAA -- 0.000628   CTAA -- 0.000166
ACAC -- 0.005482   ATAC -- 0.000350   CCAC -- 0.009592   CTAC -- 0.000415
ACAG -- 0.000725   ATAG -- 0.000242   CCAG -- 0.000210   CTAG -- 0.000048
ACAT -- 0.000350   ATAT -- 0.001594   CCAT -- 0.000415   CTAT -- 0.001214
ACCA -- 0.000703   ATCA -- 0.000121   CCCA -- 0.004524   CTCA -- 0.000375
ACCC -- 0.019527   ATCC -- 0.000752   CCCC -- 0.167489   CTCC -- 0.005866
ACCG -- 0.000315   ATCG -- 0.000048   CCCG -- 0.002262   CTCG -- 0.000188
ACCT -- 0.000752   ATCT -- 0.001546   CCCT -- 0.005866   CTCT -- 0.007452
ACGA -- 0.000702   ATGA -- 0.000234   CCGA -- 0.000210   CTGA -- 0.000048
ACGC -- 0.001837   ATGC -- 0.000116   CCGC -- 0.004796   CTGC -- 0.000208
ACGG -- 0.000755   ATGG -- 0.000252   CCGG -- 0.000209   CTGG -- 0.000059
ACGT -- 0.000116   ATGT -- 0.000535   CCGT -- 0.000208   CTGT -- 0.000607
ACTA -- 0.000121   ATTA -- 0.000154   CCTA -- 0.000360   CTTA -- 0.000404
ACTC -- 0.001781   ATTC -- 0.000517   CCTC -- 0.011625   CTTC -- 0.001716
ACTG -- 0.000048   ATTG -- 0.000073   CCTG -- 0.000180   CTTG -- 0.000202
ACTT -- 0.000517   ATTT -- 0.004711   CCTT -- 0.001716   CTTT -- 0.013873


Pattern Probabilities (II): Site pattern probabilities for 4-species alignment (page 2)

GAAA -- 0.045565   GGAA -- 0.005060   TAAA -- 0.006106   TGAA -- 0.000497
GAAC -- 0.001004   GGAC -- 0.000453   TAAC -- 0.000166   TGAC -- 0.000048
GAAG -- 0.005060   GGAG -- 0.017648   TAAG -- 0.000497   TGAG -- 0.000959
GAAT -- 0.000335   GGAT -- 0.000151   TAAT -- 0.000099   TGAT -- 0.000038
GACA -- 0.002514   GGCA -- 0.000532   TACA -- 0.000959   TGCA -- 0.000120
GACC -- 0.000315   GGCC -- 0.000194   TACC -- 0.000548   TGCC -- 0.000274
GACG -- 0.000532   GGCG -- 0.002904   TACG -- 0.000120   TGCG -- 0.000420
GACT -- 0.000048   GGCT -- 0.000036   TACT -- 0.000215   TGCT -- 0.000108
GAGA -- 0.014437   GGGA -- 0.008240   TAGA -- 0.001101   TGGA -- 0.000355
GAGC -- 0.000476   GGGC -- 0.001251   TAGC -- 0.000048   TGGC -- 0.000059
GAGG -- 0.008240   GGGG -- 0.056794   TAGG -- 0.000355   TGGG -- 0.002218
GAGT -- 0.000159   GGGT -- 0.000417   TAGT -- 0.000038   TGGT -- 0.000030
GATA -- 0.000838   GGTA -- 0.000177   TATA -- 0.001119   TGTA -- 0.000143
GATC -- 0.000048   GGTC -- 0.000036   TATC -- 0.000231   TGTC -- 0.000116
GATG -- 0.000177   GGTG -- 0.000968   TATG -- 0.000143   TGTG -- 0.000488
GATT -- 0.000073   GGTT -- 0.000040   TATT -- 0.000893   TGTT -- 0.000447
GCAA -- 0.001004   GTAA -- 0.000335   TCAA -- 0.000166   TTAA -- 0.000099
GCAC -- 0.001837   GTAC -- 0.000116   TCAC -- 0.001389   TTAC -- 0.000240
GCAG -- 0.000453   GTAG -- 0.000151   TCAG -- 0.000048   TTAG -- 0.000038
GCAT -- 0.000116   GTAT -- 0.000535   TCAT -- 0.000240   TTAT -- 0.002009
GCCA -- 0.000315   GTCA -- 0.000048   TCCA -- 0.000548   TTCA -- 0.000215
GCCC -- 0.009764   GTCC -- 0.000376   TCCC -- 0.019456   TTCC -- 0.001275
GCCG -- 0.000194   GTCG -- 0.000036   TCCG -- 0.000274   TTCG -- 0.000108
GCCT -- 0.000376   GTCT -- 0.000773   TCCT -- 0.001275   TTCT -- 0.006924
GCGA -- 0.000476   GTGA -- 0.000159   TCGA -- 0.000048   TTGA -- 0.000038
GCGC -- 0.001823   GTGC -- 0.000117   TCGC -- 0.000694   TTGC -- 0.000120
GCGG -- 0.001251   GTGG -- 0.000417   TCGG -- 0.000059   TTGG -- 0.000030
GCGT -- 0.000117   GTGT -- 0.000530   TCGT -- 0.000120   TTGT -- 0.001005
GCTA -- 0.000048   GTTA -- 0.000073   TCTA -- 0.000231   TTTA -- 0.000893
GCTC -- 0.000891   GTTC -- 0.000258   TCTC -- 0.004935   TTTC -- 0.003240
GCTG -- 0.000036   GTTG -- 0.000040   TCTG -- 0.000116   TTTG -- 0.000447
GCTT -- 0.000258   GTTT -- 0.002355   TCTT -- 0.003240   TTTT -- 0.031522