ISMB 2007 Review Kyung-Ah Sohn

Page 1: ISMB 2007 Review

ISMB 2007 Review

Kyung-Ah Sohn

Page 2: ISMB 2007 Review

Bayesian Association of haplotypes and non-genetic factors to regulatory and phenotypic variation in human populations

Jim C. Huang, Anitha Kannan and John Winn

University of Toronto; Microsoft Research, Cambridge

Page 3: ISMB 2007 Review

A statistical method for alignment-free comparison of regulatory sequences

Miriam R. Kantorovitz, Gene E. Robinson and Saurabh Sinha

UIUC, USA

Page 4: ISMB 2007 Review

Motivation

How do we measure the similarity between two regulatory DNA sequences in an alignment-free manner?

For sequences that do not show any statistically significant alignment, e.g. two sequences that are not orthologous yet are functionally related

For detecting regulatory regions in a new genome that are homologous to known enhancers or promoters, which show significantly less alignment than coding sequences

Page 5: ISMB 2007 Review

Comparison of k-word frequency distribution

How to compare two $4^k$-dimensional vectors of k-word counts?

1. Euclidean distance

2. Information-theoretic measure such as the KL distance (Kullback-Leibler divergence)

3. Geometric measure such as the cosine of the angle between the count vectors

4. Statistical measure such as the correlation coefficient
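As a rough illustration of the four options listed above, the sketch below builds the k-word count vectors for two DNA strings and compares them. The function names, the `ACGT` alphabet, and the pseudocount used in the KL computation are assumptions made for this example, not details from the slides.

```python
import math
from collections import Counter
from itertools import product


def kword_counts(seq, k, alphabet="ACGT"):
    """Vector of overlapping k-word counts, one entry per possible k-word."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts["".join(w)] for w in product(alphabet, repeat=k)]


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def kl_divergence(u, v, pseudo=1.0):
    # The pseudocount is an assumption of this sketch; it avoids log(0).
    pu = [a + pseudo for a in u]
    pv = [b + pseudo for b in v]
    su, sv = sum(pu), sum(pv)
    return sum((a / su) * math.log((a / su) / (b / sv)) for a, b in zip(pu, pv))


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pearson(u, v):
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)
```

Note that `cosine` and `pearson` divide by zero for an all-zero or constant vector, so a real implementation would need to guard against sequences shorter than k.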

Page 6: ISMB 2007 Review

Contribution of this paper

D2 score: Alignment-free similarity measure defined as the number of k-word matches

D2z score: normalized measure that captures the statistical significance of D2 score

Reduce the time complexity from $O(4^{2k})$ to $O(4^k)$

Page 7: ISMB 2007 Review

D2 score

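For $A = A_1 A_2 \ldots A_{n_1}$ and $B = B_1 B_2 \ldots B_{n_2}$:

$$D_2(A,B) = \sum_{(i,j) \in I} Y_{(i,j)}, \qquad I = \{(i,j) \mid 1 \le i \le n_1 - k + 1,\ 1 \le j \le n_2 - k + 1\}$$

$Y_{(i,j)}$: indicator variable for a match between the k-words starting at position i in A and at position j in B

$D_2(A,B)$: the number of k-word matches between the two sequences A and B, including overlaps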
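The definition of D2 as a count of matching k-word position pairs can be sketched directly as a double loop over start positions. This is a deliberately naive version for illustration; the function name is hypothetical.

```python
def d2_indicator_sum(a, b, k):
    """Count pairs (i, j) where the k-word at i in a equals the k-word at j in b."""
    total = 0
    for i in range(len(a) - k + 1):
        for j in range(len(b) - k + 1):
            # Y(i, j) = 1 exactly when the two k-words match.
            if a[i:i + k] == b[j:j + k]:
                total += 1
    return total
```

For example, `d2_indicator_sum("ACGTACGT", "CGTA", 2)` counts the matches for the words CG (twice), GT (twice) and TA (once), giving 5.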

Page 8: ISMB 2007 Review

D2 score

The inner product of the vectors of word counts in A and B

Let $W = \{w_1, w_2, \ldots, w_{d^k}\}$: the set of all k-words on the alphabet of size d

$N^A_w$: the number of times w appears in the sequence A

$N^A = (N^A_{w_1}, N^A_{w_2}, \ldots, N^A_{w_{d^k}})$

Then

$$D_2(A,B) = \langle N^A, N^B \rangle = \sum_{w \in W} N^A_w N^B_w$$
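The inner-product form suggests counting each k-word once per sequence and then summing the products of the counts. A minimal sketch, with hypothetical function names, using a sparse counter so that only words that actually occur are stored:

```python
from collections import Counter


def kword_counter(seq, k):
    """Sparse k-word counts: only words that occur in seq are stored."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2_inner_product(a, b, k):
    """D2 as the inner product of the two k-word count vectors."""
    na, nb = kword_counter(a, k), kword_counter(b, k)
    # A Counter returns 0 for missing words, so unmatched words contribute nothing.
    return sum(count * nb[word] for word, count in na.items())
```

This gives the same value as summing the match indicators over all position pairs, but takes time roughly linear in the sequence lengths instead of proportional to their product.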

Page 9: ISMB 2007 Review

D2z score

$$\mathrm{D2z}(A,B) = \frac{D_2(A,B) - E(D_2)}{\sigma(D_2)}$$

where $E(D_2)$ and $\sigma(D_2)$: the expectation and the standard deviation of $D_2(A,B)$

Approximately standard normal when the lengths of the sequences are large enough

How to compute $E(D_2)$ and $\sigma(D_2)$?

1. IID case

2. Markov model case
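To make the normalization concrete, the sketch below computes a z-score of the form (D2 - E(D2)) / σ(D2), but estimates the mean and standard deviation by shuffling the letters of both sequences. This Monte Carlo estimate is a stand-in chosen for illustration only; the paper derives E(D2) and σ(D2) analytically. The function names, trial count, and seed are assumptions.

```python
import random
import statistics
from collections import Counter


def d2(a, b, k):
    na = Counter(a[i:i + k] for i in range(len(a) - k + 1))
    nb = Counter(b[i:i + k] for i in range(len(b) - k + 1))
    return sum(count * nb[word] for word, count in na.items())


def shuffled(seq, rng):
    letters = list(seq)
    rng.shuffle(letters)
    return "".join(letters)


def d2z_shuffle(a, b, k, trials=200, seed=0):
    """z-score of D2 against a null built by shuffling both sequences."""
    rng = random.Random(seed)
    null = [d2(shuffled(a, rng), shuffled(b, rng), k) for _ in range(trials)]
    mean = statistics.mean(null)
    sd = statistics.pstdev(null)
    if sd == 0:
        raise ValueError("null distribution has zero variance")
    return (d2(a, b, k) - mean) / sd
```

Shuffling preserves each sequence's letter composition, so this null corresponds loosely to the IID case rather than the Markov model case.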

Page 10: ISMB 2007 Review

Expectation

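IID model

$$E(Y_{(i,j)}) = \Pr(Y_{(i,j)} = 1) = \prod_{l=0}^{k-1} \Pr(A_{i+l} = B_{j+l}) = \Big(\sum_a f^A_a f^B_a\Big)^k = g_{1,1}^k$$

$$E(D_2) = (n_1 - k + 1)(n_2 - k + 1)\, g_{1,1}^k$$

where $f^A_a$: background probability of letter a in the sequence A

$$g_{x,y} = \sum_a (f^A_a)^x (f^B_a)^y$$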
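Under the IID model, the expectation is the number of position pairs times the k-th power of the probability that one letter of A matches one letter of B. A small sketch of that calculation follows. Estimating the background probabilities from the sequence itself, as `letter_freqs` does, is an assumption of this example, and all names are hypothetical.

```python
from collections import Counter


def letter_freqs(seq, alphabet="ACGT"):
    """Background letter probabilities, estimated here from the sequence itself."""
    counts = Counter(seq)
    return {a: counts[a] / len(seq) for a in alphabet}


def g(fa, fb, x, y):
    """g_{x,y}: sum over letters a of (f_a^A)^x * (f_a^B)^y."""
    return sum(fa[a] ** x * fb[a] ** y for a in fa)


def expected_d2_iid(n1, n2, k, fa, fb):
    """E(D2) = (n1 - k + 1) * (n2 - k + 1) * g_{1,1}^k under the IID model."""
    return (n1 - k + 1) * (n2 - k + 1) * g(fa, fb, 1, 1) ** k
```

With uniform letter probabilities, `g(fa, fb, 1, 1)` is 0.25, so two sequences of length 100 and k = 2 give 99 * 99 * 0.0625 = 612.5625 expected matches.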

Page 11: ISMB 2007 Review

Expectation

Markov Model

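$$E(Y_{(i,j)}) = \Pr(Y_{(i,j)} = 1) = \sum_{|w| = k} \Pr\nolimits_A(w)\, \Pr\nolimits_B(w)$$

$$= \sum_{|w| = k} \Pr\nolimits_A(w_1)\, \Pr\nolimits_A(w \mid w_1)\, \Pr\nolimits_B(w_1)\, \Pr\nolimits_B(w \mid w_1)$$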
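For the Markov case, the match probability of one position pair is the sum, over all k-words w, of the probability of w under the model for A times its probability under the model for B. The sketch below assumes a first-order chain for each sequence, specified by an initial distribution and a transition table. These inputs and the function names are illustrative assumptions, and the enumeration over all words is the straightforward approach, not an optimized one.

```python
from itertools import product


def word_prob(word, init, trans):
    """Probability of a word under a first-order Markov chain."""
    p = init[word[0]]
    for prev, cur in zip(word, word[1:]):
        p *= trans[prev][cur]
    return p


def match_prob_markov(k, init_a, trans_a, init_b, trans_b, alphabet="ACGT"):
    """Sum over all k-words w of Pr_A(w) * Pr_B(w)."""
    return sum(
        word_prob(w, init_a, trans_a) * word_prob(w, init_b, trans_b)
        for w in product(alphabet, repeat=k)
    )
```

When both chains are uniform, every k-word has probability 4^(-k) in each sequence, so the sum reduces to the IID value 4^(-k).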

Page 12: ISMB 2007 Review

Variance

$$\mathrm{Var}(D_2) = \sum_{(i,j) \in I} \mathrm{Var}(Y_{(i,j)}) + \sum_{(i,j) \ne (s,t)} \mathrm{Cov}(Y_{(i,j)}, Y_{(s,t)})$$

Page 13: ISMB 2007 Review

Variance – IID case

Case (a): Cov(Y(i,j), Y(s,t))=0

Case (b):

Case (c): …

Page 14: ISMB 2007 Review

Variance – Markov Model

Case (a)

Page 15: ISMB 2007 Review

Evaluation and Comparison

Evaluate whether functionally and/or evolutionarily related sequence pairs are scored better than unrelated pairs of sequences randomly chosen from the genome

Positive set: a set of CRMs known to regulate expression in the same tissue

Negative set: a set of equally many randomly chosen non-coding sequences

Compare each pair of sequences in the positive set, and likewise each pair in the negative set; sort all the scores into one combined list; then count how many of the pairs in the top half of this list are from the positive set
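The evaluation procedure above can be sketched as follows. The scoring function is passed in as a parameter. The plain D2 count used here is only a placeholder, and sorting in descending order assumes that a higher score means greater similarity. All names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def d2(a, b, k=3):
    na = Counter(a[i:i + k] for i in range(len(a) - k + 1))
    nb = Counter(b[i:i + k] for i in range(len(b) - k + 1))
    return sum(count * nb[word] for word, count in na.items())


def top_half_positive_count(positive, negative, score=d2):
    """Return (positive pairs in the top half, size of the top half)."""
    scored = [(score(a, b), True) for a, b in combinations(positive, 2)]
    scored += [(score(a, b), False) for a, b in combinations(negative, 2)]
    # Rank every pair in one combined list, best score first.
    ranked = sorted(scored, key=lambda item: item[0], reverse=True)
    top = ranked[: len(ranked) // 2]
    return sum(1 for _, is_positive in top if is_positive), len(top)
```

Because both sets have the same size, they contribute the same number of pairs, so a score that separates them perfectly places only positive pairs in the top half.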

Page 16: ISMB 2007 Review

Evaluation on functionally related regulatory sequences

Page 17: ISMB 2007 Review

Evaluation on orthologous regulatory sequences

Page 18: ISMB 2007 Review

Summary

Proposed a new sequence similarity score

Page 19: ISMB 2007 Review

Semiparametric functional mapping of quantitative trait loci governing long-term HIV dynamics

Song Wu, Jie Yang and Rongling Wu

Department of Statistics, University of Florida

Page 20: ISMB 2007 Review

HIV dynamics

Bi-exponential model for short-term dynamic changes of HIV virion copies in AIDS patients after initiation of HAART

$$V(t) = P_1 e^{-\lambda_1 t} + P_2 e^{-\lambda_2 t}$$

$V(t)$: plasma load at time t

$\lambda_1$, $\lambda_2$: viral decay rates in the first and second phase

$P_1$, $P_2$: baseline viral loads when the treatment is initiated
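A short sketch of the bi-exponential model, written as the sum of two decaying exponentials with constant rates. The parameter values in the usage note are invented purely for illustration and do not come from the slides.

```python
import math


def viral_load(t, p1, p2, lam1, lam2):
    """Bi-exponential plasma load: p1 * exp(-lam1 * t) + p2 * exp(-lam2 * t)."""
    return p1 * math.exp(-lam1 * t) + p2 * math.exp(-lam2 * t)
```

With hypothetical values such as `viral_load(t, 1e5, 1e3, 0.5, 0.05)`, the load at t = 0 equals the sum of the two baseline loads. The first term dominates early on, and the slower second term governs the tail.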

Limitation: the model does not capture the characteristics of long-term HIV viral load changes

Page 21: ISMB 2007 Review

HIV dynamics

Two phases of viral load decay

The early rapid decay – λ1

The late slow decay, corresponding to the clearance of free and latent viruses

It is not sensible to assume a constant λ2 over a long-term treatment period

$$V(t) = P_1 e^{-\lambda_1 t} + P_2 e^{-\lambda_2(t)\, t}$$

Page 22: ISMB 2007 Review

Natural cubic spline

Piecewise third-order polynomial function that passes through a set of control points

Estimate λ2(t) using a cubic spline
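To illustrate what a natural cubic spline for λ2(t) looks like, the sketch below builds the interpolating spline through given control points (second derivative zero at both ends) and plugs it into the viral load model as a time-varying second-phase decay rate. It does not perform the statistical estimation described on the slide. The control points are assumed to be supplied, and all names are hypothetical.

```python
import bisect
import math


def natural_cubic_spline(xs, ys):
    """Natural cubic spline through (xs[i], ys[i]); xs must be increasing."""
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = 3.0 * (ys[i + 1] - ys[i]) / h[i] - 3.0 * (ys[i] - ys[i - 1]) / h[i - 1]
    # Forward sweep of the tridiagonal system for the quadratic coefficients.
    l = [1.0] * (n + 1)
    mu = [0.0] * (n + 1)
    z = [0.0] * (n + 1)
    for i in range(1, n):
        l[i] = 2.0 * (xs[i + 1] - xs[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / l[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / l[i]
    # Back substitution; c[0] and c[n] stay 0 (natural boundary conditions).
    b = [0.0] * n
    c = [0.0] * (n + 1)
    d = [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (ys[j + 1] - ys[j]) / h[j] - h[j] * (c[j + 1] + 2.0 * c[j]) / 3.0
        d[j] = (c[j + 1] - c[j]) / (3.0 * h[j])

    def spline(x):
        # Pick the segment containing x, clamping to the first or last segment.
        j = min(max(bisect.bisect_right(xs, x) - 1, 0), n - 1)
        dx = x - xs[j]
        return ys[j] + b[j] * dx + c[j] * dx ** 2 + d[j] * dx ** 3

    return spline


def viral_load_long_term(t, p1, p2, lam1, lam2_of_t):
    """Viral load with a time-varying second-phase decay rate lam2_of_t(t)."""
    return p1 * math.exp(-lam1 * t) + p2 * math.exp(-lam2_of_t(t) * t)
```

Outside the range of the control points, this sketch simply extends the first or last cubic segment, which may not be appropriate for real data.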

Page 23: ISMB 2007 Review

Quantitative genetic model

              Marker      QTL

Alleles       M / m       A / a

Frequency     p / 1-p     q / 1-q

Marker and QTL are genetically associated

D: linkage disequilibrium

Four haplotypes MA, Ma, mA, and ma, with frequencies

p11 = pq + D, p10 = p(1-q) - D,

p01 = (1-p)q - D, p00 = (1-p)(1-q) + D
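The four haplotype frequencies follow directly from the allele frequencies p and q and the linkage disequilibrium D. A small sketch, with a hypothetical function name and a validity check added for this example:

```python
def haplotype_frequencies(p, q, d):
    """Frequencies of haplotypes MA, Ma, mA, ma from p = freq(M), q = freq(A), and D."""
    freqs = {
        "MA": p * q + d,
        "Ma": p * (1 - q) - d,
        "mA": (1 - p) * q - d,
        "ma": (1 - p) * (1 - q) + d,
    }
    # D is only admissible if every haplotype frequency stays within [0, 1].
    if any(f < 0 or f > 1 for f in freqs.values()):
        raise ValueError("D is out of range for the given allele frequencies")
    return freqs
```

The frequencies sum to 1, the marginals recover p and q, and D can be recovered as p11 * p00 - p10 * p01.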

Page 24: ISMB 2007 Review

Linear model linking genetic and residual effects