What is DNA Copy Number

Page 1: What is DNA Copy Number

What is DNA copy number?

Normally, each somatic cell contains 2 copies of every chromosome.

Page 2: What is DNA Copy Number

What is DNA copy number?

One of the earliest observed “copy number” changes is trisomy of chromosome 21 in Down syndrome.

Page 3: What is DNA Copy Number

What is DNA copy number?

In fact, it became apparent later that chromosome aberrations come in all forms and sizes.

Page 4: What is DNA Copy Number

High density DNA copy number data

Page 5: What is DNA Copy Number

Array-based Comparative Genomic Hybridization

Page 6: What is DNA Copy Number

Figures from Garnis et al. (2004)

Page 7: What is DNA Copy Number

DNA Copy Number Data from Different Platforms

Page 8: What is DNA Copy Number

Why analyze DNA copy number?

Cancer genomics

Page 9: What is DNA Copy Number

Why analyze DNA copy number?

Douglas et al. (2004), colorectal cancer.

Page 10: What is DNA Copy Number

Why analyze DNA copy number?

Copy number polymorphisms in HapMap samples

CNVs in natural populations may be risk factors for disease.

Page 11: What is DNA Copy Number

Statistical methods for single sample, total copy number segmentation

1. Circular Binary Segmentation (CBS) algorithm of Olshen et al. (2004)

2. HMM-based methods (Fridlyand et al. (2004), Lai et al. (2007))

3. Wavelet-based methods of Hsu et al. (2005)

4. Cluster Along Chromosomes (CLAC) method of Wang et al. (2005)

5. Many others: CBS, HMM, GLAD, CNV, CGHseg, Quantreg, Wavelet, Lowess, ChARM, GA, L1 regularization, ACE...

Page 16: What is DNA Copy Number

HMM Model of Fridlyand et al. (2004)

This is a classic application of hidden Markov models:

▶ The underlying states $1, \dots, K$ represent the “true” copy number.

▶ Given state $k$, the observed intensity levels are $N(\mu_k, \sigma^2)$.

▶ The transition matrices and emission parameters are estimated by EM.

▶ The AIC or BIC criterion is used to choose $K$.
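
As an illustration of the machinery behind such a model, the sketch below runs a scaled forward-backward pass for a Gaussian HMM with a shared variance and returns the posterior state probabilities together with the log-likelihood. This is only the E-step that an EM fit would repeat, not the procedure of Fridlyand et al. The function names `forward_backward` and `bic`, and the choice to pass the parameter count in explicitly, are assumptions made for this example.

```python
import numpy as np

def forward_backward(y, means, sigma, trans, init):
    """Posterior state probabilities and log-likelihood of a Gaussian HMM.

    Illustrative sketch: K states with means `means`, a common standard
    deviation `sigma`, transition matrix `trans` and initial law `init`.
    """
    y = np.asarray(y, dtype=float)
    means = np.asarray(means, dtype=float)
    trans = np.asarray(trans, dtype=float)
    init = np.asarray(init, dtype=float)
    n, K = len(y), len(means)

    # Emission densities N(y_t; mu_k, sigma^2), shape (n, K).
    z = (y[:, None] - means[None, :]) / sigma
    emis = np.exp(-0.5 * z ** 2) / (sigma * np.sqrt(2.0 * np.pi))

    # Forward pass, rescaled at every step to avoid underflow.
    alpha = np.zeros((n, K))
    scale = np.zeros(n)
    alpha[0] = init * emis[0]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, n):
        alpha[t] = (alpha[t - 1] @ trans) * emis[t]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]

    # Backward pass, using the same scaling constants.
    beta = np.ones((n, K))
    for t in range(n - 2, -1, -1):
        beta[t] = (trans @ (emis[t + 1] * beta[t + 1])) / scale[t + 1]

    gamma = alpha * beta  # gamma[t, k] = P(state_t = k | y_1..y_n)
    return gamma, float(np.log(scale).sum())

def bic(loglik, n_params, n_obs):
    """BIC = -2 log L + (number of free parameters) * log(n)."""
    return -2.0 * loglik + n_params * np.log(n_obs)

# Toy usage with made-up numbers: two states, at log-ratio 0 and 1.
gamma, loglik = forward_backward(
    [0.0, 0.1, -0.1, 1.0, 0.9, 1.1],
    means=[0.0, 1.0], sigma=0.2,
    trans=[[0.9, 0.1], [0.1, 0.9]], init=[0.5, 0.5],
)
```

To choose $K$, one would fit the model for several values of $K$ and compare `bic(loglik, n_params, n_obs)` across fits, with `n_params` counted according to whatever parameterization is actually used.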

Page 17: What is DNA Copy Number

A Bayesian Model for Inference

When we estimate model parameters, confidence intervals are desirable!

1. Confidence bands on estimated copy number.

2. How certain are we that $[i, j]$ contains a CNV?

3. Confidence intervals on the aberration boundaries.

4. Confidence intervals on global measures of “complexity”, such as total number of aberrations.

Page 22: What is DNA Copy Number

Observations

1. For array-CGH data, there is a known baseline at 0.

2. Due to mosaicism, the data is drawn from mixtures of discrete copy number levels, and thus is continuous.

3. In some tumors the number of distinct levels is very high.

Page 25: What is DNA Copy Number

Fitted Levels

Page 26: What is DNA Copy Number

Heterogeneity of cancer samples

Image from: http://science.kennesaw.edu/ mhermes/cisplat/cisplat19.htm

Page 29: What is DNA Copy Number

Stochastic Change Model

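$S_t \in \{\text{baseline}, \text{changed}\}$

Baseline state: $\theta_t = 0$; changed state: $\theta_t \sim N(\mu, v)$.

If $S_t$ jumps, $\theta_t$ takes on a new value. Otherwise $\theta_t = \theta_{t-1}$.

$y_t = \theta_t + \sigma \epsilon_t, \quad \epsilon_t \sim N(0, 1)$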

Page 31: What is DNA Copy Number

Stochastic Change Model

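$P(S_t = \text{changed} \mid S_{t-1} = \text{baseline}) = p$

$P(S_t = \text{different changed state} \mid S_{t-1} = \text{changed}) = b$

$P(S_t = \text{baseline} \mid S_{t-1} = \text{changed}) = c$

This can be modeled with a 3-state Markov model with transition matrix:

$$P = \begin{pmatrix} 1 - p & \tfrac{1}{2}p & \tfrac{1}{2}p \\ c & a & b \\ c & b & a \end{pmatrix}.$$

Here $a = 1 - b - c$ is the probability of remaining in the same changed state, so that each row sums to 1.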
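
To make the generative model concrete, the following sketch simulates it. It encodes the chain with one baseline state and two changed states, using rows (1 − p, p/2, p/2), (c, a, b) and (c, b, a) with a = 1 − b − c, so that a jump between changed states shows up as a switch of label. Starting the chain in the baseline state, the function name `simulate_change_model` and the parameter values in the usage lines are assumptions made for illustration.

```python
import numpy as np

def simulate_change_model(n, mu, sigma, v, p, b, c, seed=0):
    """Simulate (S_t, theta_t, y_t) from the stochastic change model.

    State 0 is baseline; states 1 and 2 are the two "changed" labels.
    Assumption of this sketch: the chain starts in the baseline state.
    """
    rng = np.random.default_rng(seed)
    a = 1.0 - b - c  # probability of staying in the same changed state
    P = np.array([
        [1.0 - p, p / 2.0, p / 2.0],
        [c, a, b],
        [c, b, a],
    ])
    states = np.zeros(n, dtype=int)
    theta = np.zeros(n)
    for t in range(1, n):
        states[t] = rng.choice(3, p=P[states[t - 1]])
        if states[t] == 0:
            theta[t] = 0.0                           # baseline level
        elif states[t] == states[t - 1]:
            theta[t] = theta[t - 1]                  # no jump: level carried over
        else:
            theta[t] = rng.normal(mu, np.sqrt(v))    # jump: new level from N(mu, v)
    y = theta + sigma * rng.standard_normal(n)       # y_t = theta_t + sigma * eps_t
    return states, theta, y

# Illustrative parameter values only.
states, theta, y = simulate_change_model(
    500, mu=1.0, sigma=0.3, v=0.5, p=0.05, b=0.02, c=0.1, seed=1
)
```

Plotting `y` against `theta` shows the piecewise-constant signal with a known baseline at 0 that the model is meant to capture.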

Page 32: What is DNA Copy Number

Estimating $\theta_t$, $S_t$

We can compute:

$E(\theta_t \mid y_{1:n})$: “smoothed” estimate of the mean

$P(S_t = \text{changed} \mid y_{1:n})$: probability of a CNV at $t$

$P(\text{CNV at } [i,j] \mid y_{1:n})$: probability of an aberration at $[i, j]$

Page 33: What is DNA Copy Number

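Estimating $\theta_t$, $S_t$

The posterior distribution of $\theta_t$ given $Y_n$ ($1 \le t \le n$) is a mixture of normal distributions and a point mass at 0:

$$\theta_t \mid Y_n \sim \alpha_t \delta_0 + \sum_{1 \le i \le t \le j \le n} \beta_{ijt} \, N(\mu_{ij}, v_{ij}).$$

The parameters of this distribution can be computed by recursive formulas.

$$E(\theta_t \mid y_{1:n}) = \sum_{1 \le i \le t \le j \le n} \beta_{ijt} \, \mu_{ij},$$

$$P(S_t = \text{changed} \mid y_{1:n}) = 1 - \alpha_t,$$

$$P(\text{CNV at } [i,j] \mid y_{1:n}) = \beta_{ijt},$$

where

$$\alpha_t = \alpha^*_t / A_t, \qquad \beta_{ijt} = \beta^*_{ijt} / A_t, \qquad A_t = \alpha^*_t + \sum_{1 \le i \le t \le j \le n} \beta^*_{ijt},$$

$$\alpha^*_t = p_t \left[ (1 - p) \tilde{p}_{t+1} + c \, \tilde{q}_{t+1} \right] / c,$$

$$\beta^*_{ijt} = \begin{cases} q_{i,t} \left( p \, \tilde{p}_{t+1} + b \, \tilde{q}_{t+1} \right) / p, & i \le t = j, \\ a \, q_{i,t} \, \tilde{q}_{j,t+1} \, \psi_{i,t} \, \psi_{t+1,j} / (p \, \psi \, \psi_{i,j}), & i \le t < j. \end{cases}$$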
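
A small numerical sketch may help in reading the mixture. It assumes that the segment parameters $\mu_{ij}$ and $v_{ij}$ are the usual conjugate-normal posterior mean and variance of a level with prior $N(\mu, v)$ observed with noise variance $\sigma^2$ over positions $i, \dots, j$; the slides do not spell this out. The weights passed to `mixture_summary` are made-up numbers, not outputs of the recursions, and both function names are hypothetical.

```python
import numpy as np

def segment_posterior(y_seg, mu, v, sigma):
    """Conjugate posterior N(mean, var) of a segment level.

    Assumed form: prior N(mu, v), observations y_seg with noise sd sigma.
    """
    y_seg = np.asarray(y_seg, dtype=float)
    post_var = 1.0 / (1.0 / v + len(y_seg) / sigma ** 2)
    post_mean = post_var * (mu / v + y_seg.sum() / sigma ** 2)
    return post_mean, post_var

def mixture_summary(alpha, betas, seg_means):
    """Posterior mean of theta_t and P(changed) from the mixture weights.

    alpha is the weight of the point mass at 0; betas are the weights of
    the normal components, whose means are seg_means.
    """
    betas = np.asarray(betas, dtype=float)
    seg_means = np.asarray(seg_means, dtype=float)
    if not np.isclose(alpha + betas.sum(), 1.0):
        raise ValueError("mixture weights must sum to 1")
    post_mean = float(betas @ seg_means)  # the point mass at 0 contributes nothing
    prob_changed = 1.0 - alpha
    return post_mean, prob_changed

# Four observations equal to 1, prior N(0, 1), unit noise variance.
m, s2 = segment_posterior([1.0, 1.0, 1.0, 1.0], mu=0.0, v=1.0, sigma=1.0)

# Hypothetical weights: 0.25 on baseline, 0.5 and 0.25 on two candidate segments.
mean_t, p_changed = mixture_summary(0.25, [0.5, 0.25], [0.8, 1.2])
```

The segment posterior shrinks the sample mean toward the prior mean, and the smoothed estimate of $\theta_t$ is then a weighted average of such segment means across all candidate segments containing $t$.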

Page 35: What is DNA Copy Number

Hyperparameter Estimation

The model was defined as:

$y_t = \theta_t + \sigma \epsilon_t, \quad \epsilon_t \sim N(0, 1)$

Baseline state: $\theta_t = 0$; changed state: $\theta_t \sim N(\mu, v)$.

$S_t$ is modeled by a 3-state Markov model with transition matrix:

$$P = \begin{pmatrix} 1 - p & \tfrac{1}{2}p & \tfrac{1}{2}p \\ c & a & b \\ c & b & a \end{pmatrix}.$$

The hyperparameters of this model are $\mu, \sigma, v, a, b, c, p$.

The likelihood of the data as a function of these hyperparameters can be expressed by recursive formulas. Maximum-likelihood values, computed by the EM algorithm, are used.
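
One way such a recursion can look is sketched below. It is a forward filter derived directly from the model as stated on this slide, not a transcription of the authors' formulas: at each position it tracks the probability of the baseline state and, for every possible start of the current changed segment, the probability of that segment, and it accumulates the log-likelihood from the one-step predictive densities. It assumes a = 1 − b − c and that the chain starts from its stationary distribution, which puts mass c/(p + c) on baseline. The function names are hypothetical, and the cost is O(n²).

```python
import numpy as np

def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def change_model_loglik(y, mu, sigma, v, p, b, c):
    """Log-likelihood and filtered P(S_t = changed | y_1..y_t).

    Sketch under stated assumptions: a = 1 - b - c, stationary start.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    a = 1.0 - b - c
    s2 = sigma ** 2
    csum = np.concatenate(([0.0], np.cumsum(y)))  # csum[t] = y_0 + ... + y_{t-1}
    pi0 = c / (p + c)                             # stationary mass on baseline

    loglik = 0.0
    filt = np.zeros(n)
    w0, w = pi0, np.zeros(0)  # w[i]: weight of a changed segment that began at i
    for t in range(n):
        if t == 0:
            prior0, prior_new, prior_cont = pi0, 1.0 - pi0, np.zeros(0)
        else:
            G = w.sum()
            prior0 = w0 * (1.0 - p) + c * G   # stay in, or return to, baseline
            prior_new = w0 * p + b * G        # a new changed segment starts at t
            prior_cont = a * w                # the current segment continues

        # Predictive density of y_t for a segment that covers i..t-1.
        starts = np.arange(t)
        m = t - starts
        post_var = 1.0 / (1.0 / v + m / s2)
        post_mean = post_var * (mu / v + (csum[t] - csum[starts]) / s2)
        dens_cont = normal_pdf(y[t], post_mean, post_var + s2)

        joint0 = prior0 * normal_pdf(y[t], 0.0, s2)
        joint_new = prior_new * normal_pdf(y[t], mu, v + s2)
        joint = np.concatenate((prior_cont * dens_cont, [joint_new]))

        total = joint0 + joint.sum()          # p(y_t | y_0..y_{t-1})
        loglik += np.log(total)
        w0, w = joint0 / total, joint / total
        filt[t] = 1.0 - w0
    return loglik, filt
```

The returned log-likelihood could be handed to a general-purpose numerical optimizer over $(\mu, \sigma, v, p, b, c)$; that would be a substitute for, not a reproduction of, the EM procedure mentioned above.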

Page 37: What is DNA Copy Number

Confidence Bands for BT474

Page 38: What is DNA Copy Number

Inference on Measures of Genome Complexity