(2) Ratio statistics of gene expression levels and applications to microarray data analysis

Bioinformatics, Vol. 18, no. 9, 2002

Yidong Chen, Vishnu Kamat, Edward R. Dougherty, Michael L. Bittner, Paul S. Meltzer, and Jeffery M. Trent


Outline

Introduction

Ratio Statistics

Quality Metric for Ratio Statistics

Conclusion


Introduction

Motivation: Expression-based analysis for large families of genes has recently become possible owing to the development of cDNA microarrays, which allow simultaneous measurement of transcript levels for thousands of genes. For each spot on a microarray, signals in two channels must be extracted from their backgrounds. This requires algorithms to extract the signals arising from tagged mRNA hybridized to arrayed cDNA locations, and algorithms to determine the significance of signal ratios.


Introduction (cont.)

Results:

1. Estimation of signal ratios from the two channels, and of the significance of those ratios.

2. A refined hypothesis test in which the measured intensities forming the ratio are assumed to be combinations of signal and background. The new method involves a signal-to-noise ratio (SNR), and for a high SNR the new test reduces, to a close approximation, to the original test. The effect of low SNR on the ratio statistics constitutes the main theme of the paper.

3. A quality metric formulated for each spot.


Ratio Statistics


Ratio Statistics assuming a constant coefficient of variation

Consider a microarray having $n$ genes, with red and green fluorescent expression values labeled by $R_1, R_2, \ldots, R_n$ and $G_1, G_2, \ldots, G_n$, respectively.

Hypothesis test:

$$H_0: \mu_{R_k} = \mu_{G_k} \qquad \text{vs.} \qquad H_1: \mu_{R_k} \neq \mu_{G_k}$$

Assumption (constant coefficient of variation $c$):

$$\sigma_{R_k} = c\,\mu_{R_k}, \qquad \sigma_{G_k} = c\,\mu_{G_k},$$

so that $\sigma_{R_k} = \sigma_{G_k}$ under $H_0$.


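Ratio Statistics assuming a constant coefficient of variation (cont.)

Ratio test statistic:

$$T_k = R_k / G_k$$

Assuming $R_k$ and $G_k$ to be independent, normally and identically distributed, $T_k$ has the density function

$$f_{T_k}(t; c) = \frac{(1+t)\sqrt{1+t^2}}{\sqrt{2\pi}\,c\,(1+t^2)^2}\exp\left[-\frac{(t-1)^2}{2c^2(1+t^2)}\right].$$

The parameter $c$ can be estimated from the observed ratios $t_1, \ldots, t_n$; its maximum-likelihood estimate is

$$\hat c = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\frac{(t_i-1)^2}{1+t_i^2}}.$$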
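The ratio density $f_T(t; c)$ and the maximum-likelihood estimate $\hat c = \sqrt{(1/n)\sum_i (t_i-1)^2/(1+t_i^2)}$ translate directly into code. The Python sketch below evaluates both; the helper `simulate_ratios`, its seed, and the 500 to 5,000 range of true expression levels are hypothetical choices made only to exercise the functions, not values from the paper.

```python
import math
import random


def ratio_density(t: float, c: float) -> float:
    """f_T(t; c) for T = R/G under H0 with a constant coefficient of variation c.

    Intended for t >= 0; terms of order exp(-1/c**2) are neglected.
    """
    u = 1.0 + t * t
    norm = (1.0 + t) * math.sqrt(u) / (math.sqrt(2.0 * math.pi) * c * u * u)
    return norm * math.exp(-((t - 1.0) ** 2) / (2.0 * c * c * u))


def estimate_c(ratios: list[float]) -> float:
    """Maximum-likelihood estimate c_hat = sqrt(mean((t - 1)^2 / (1 + t^2)))."""
    total = sum((t - 1.0) ** 2 / (1.0 + t * t) for t in ratios)
    return math.sqrt(total / len(ratios))


def simulate_ratios(n: int, c: float, seed: int = 0) -> list[float]:
    """Hypothetical self-self data: R and G share a mean and have cv c."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(n):
        mu = rng.uniform(500.0, 5000.0)  # assumed range of true expression levels
        ratios.append(rng.gauss(mu, c * mu) / rng.gauss(mu, c * mu))
    return ratios


if __name__ == "__main__":
    # Midpoint-rule check that the density integrates to about 1 on [0, 50].
    h = 50.0 / 20000
    area = sum(ratio_density((i + 0.5) * h, 0.2) for i in range(20000)) * h
    print(f"integral ~ {area:.4f}")
    print(f"c_hat ~ {estimate_c(simulate_ratios(5000, 0.2, seed=1)):.3f}")
```

Integrating `ratio_density` numerically over $t \ge 0$ is a quick sanity check that the formula is normalized for small $c$.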


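Ratio Statistics assuming a constant coefficient of variation (cont.)

Self-self experiment or duplicate spots: let $T_k = R_k/G_k$ and $T'_k = R'_k/G'_k$ be two ratio measurements of gene $k$, with observed values $t_k$ and $t'_k$. Then

$$\Delta\log T_k = \log t_k - \log t'_k = (\log R_k - \log R'_k) - (\log G_k - \log G'_k).$$

Under a constant coefficient of variation $c$, each log-intensity (natural logarithm) has, to first order, standard deviation

$$\sigma_{\log R} \approx \frac{\sigma_R}{\mu_R} = c.$$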


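Ratio Statistics assuming a constant coefficient of variation (cont.)

Confidence interval

1. The confidence interval is obtained by integrating the ratio density function.

2. The confidence interval is determined by the parameter $c$; one can estimate $c$ either from pre-selected housekeeping genes or from a set of duplicate genes. Since each log-intensity has variance approximately $c^2$, and assuming the four log-intensities are independent,

$$\sigma^2_{\Delta\log T} = \sigma^2_{\log R} + \sigma^2_{\log R'} + \sigma^2_{\log G} + \sigma^2_{\log G'} \approx 4c^2.$$

Therefore, $c \approx \sigma_{\Delta\log T}/2$.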
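Both routes can be sketched in Python under stated assumptions: the interval is obtained by accumulating $f_T(t; c)$ with the midpoint rule up to a hypothetical cutoff `t_max`, and $c$ is taken as $\sigma_{\Delta\log T}/2$ over duplicate pairs, using natural logarithms so that $\sigma_{\log R} \approx c$ holds. The duplicate simulator and its parameters are illustrative only.

```python
import math
import random
import statistics


def ratio_density(t: float, c: float) -> float:
    """f_T(t; c) under H0 with constant cv c (t >= 0)."""
    u = 1.0 + t * t
    norm = (1.0 + t) * math.sqrt(u) / (math.sqrt(2.0 * math.pi) * c * u * u)
    return norm * math.exp(-((t - 1.0) ** 2) / (2.0 * c * c * u))


def ratio_confidence_interval(c: float, alpha: float = 0.01,
                              t_max: float = 50.0, steps: int = 100_000):
    """Return (t_lo, t_hi) leaving alpha/2 probability in each tail of f_T."""
    h = t_max / steps
    cdf = 0.0
    t_lo = None
    for i in range(steps):
        cdf += ratio_density((i + 0.5) * h, c) * h  # midpoint rule
        upper_edge = (i + 1) * h
        if t_lo is None and cdf >= alpha / 2:
            t_lo = upper_edge
        if cdf >= 1.0 - alpha / 2:
            return t_lo, upper_edge
    raise ValueError("t_max too small for the requested confidence level")


def estimate_c_from_duplicates(pairs) -> float:
    """pairs: iterable of ((R, G), (R', G')); uses Var(log T - log T') ~ 4 c^2."""
    deltas = [math.log(r1 / g1) - math.log(r2 / g2)
              for (r1, g1), (r2, g2) in pairs]
    return statistics.pstdev(deltas) / 2.0


def simulate_duplicates(n: int, c: float, seed: int = 0):
    """Hypothetical duplicate spots with equal means and cv c; skips non-positive draws."""
    rng = random.Random(seed)
    pairs = []
    while len(pairs) < n:
        mu = rng.uniform(500.0, 5000.0)
        r1, g1, r2, g2 = (rng.gauss(mu, c * mu) for _ in range(4))
        if min(r1, g1, r2, g2) > 0:
            pairs.append(((r1, g1), (r2, g2)))
    return pairs
```

Because $\sigma_{\log R} \approx c$ is a first-order approximation, the duplicate-based estimate runs slightly high as $c$ grows.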


Ratio Statistics for low signal-to-noise ratio

The actual expression intensity measurement is of the form

$$R_k = S_{R_k} + (B_{R_k} - \mu_{BR}),$$

where $S_{R_k}$ is the expression intensity measurement (signal) of gene $k$, $B_{R_k}$ is the fluorescent background level, and $\mu_{BR}$ is the mean background level. The green channel $G_k$ is modeled in the same way.


Ratio Statistics for low signal-to-noise ratio (cont.)

Null hypothesis of interest:

$$H_0: \mu_{S_{R_k}} = \mu_{S_{G_k}}, \quad \text{equivalently} \quad H_0: \mu_{R_k} = \mu_{G_k},$$

since

$$\mu_{R_k} = E[R_k] = E[S_{R_k} + (B_{R_k} - \mu_{BR})] = \mu_{S_{R_k}}.$$

Test statistic:

$$T_k = R_k / G_k$$


Ratio Statistics for low signal-to-noise ratio (cont.)

Major differences from the constant-cv model:

1. The assumption of a constant cv applies to $S_{R_k}$ and $S_{G_k}$, not to $R_k$ and $G_k$.

2. The density of $T_k$ derived under the constant-cv assumption is therefore not applicable.

SNR (signal-to-noise ratio)


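SNR (signal-to-noise ratio)

Assuming that $S_{R_k}$ and $B_{R_k}$ are independent,

$$\sigma^2_{R_k} = \sigma^2_{S_{R_k}} + \sigma^2_{BR} = c^2\mu^2_{S_{R_k}} + \sigma^2_{BR}.$$

Define the signal-to-noise ratio of gene $k$ in the red channel as

$$SNR_{R_k} = \frac{E[S_{R_k}]}{\sqrt{E[(B_{R_k} - \mu_{BR})^2]}} = \frac{\mu_{S_{R_k}}}{\sigma_{BR}}.$$

Since $\mu_{R_k} = \mu_{S_{R_k}}$, the coefficient of variation of the measured intensity is

$$c^2_{R_k} = \frac{\sigma^2_{R_k}}{\mu^2_{R_k}} = c^2 + \frac{\sigma^2_{BR}}{\mu^2_{S_{R_k}}} = c^2 + \frac{1}{SNR^2_{R_k}}.$$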
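A short Monte Carlo check of $c_{R_k}^2 = c^2 + 1/SNR_{R_k}^2$, assuming normal signal and background as in the model $R_k = S_{R_k} + (B_{R_k} - \mu_{BR})$. The function names and the parameter values used with them are hypothetical.

```python
import math
import random
import statistics


def effective_cv(c: float, snr: float) -> float:
    """c_R = sqrt(c^2 + 1/SNR^2): cv of the background-corrected intensity."""
    return math.sqrt(c * c + 1.0 / (snr * snr))


def simulated_cv(mu_s: float, c: float, mu_b: float, sigma_b: float,
                 n: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo cv of R = S + (B - mu_b), S ~ N(mu_s, c*mu_s), B ~ N(mu_b, sigma_b)."""
    rng = random.Random(seed)
    values = [rng.gauss(mu_s, c * mu_s) + (rng.gauss(mu_b, sigma_b) - mu_b)
              for _ in range(n)]
    return statistics.pstdev(values) / statistics.fmean(values)
```

For example, with $c = 0.2$ and $SNR_{R_k} = 5$ the measured cv grows from 0.2 to $\sqrt{0.08} \approx 0.28$, so weak spots need a wider confidence interval than the constant-cv model suggests.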


The expression intensity scatter plot


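Confidence interval for the test statistic

Assumption: $S_{R_k}$, $S_{G_k}$, $B_{R_k}$, $B_{G_k}$ are normally distributed and independent.

Under $H_0$, $\mu_{S_{R_k}} = \mu_{S_{G_k}} = p$, so

$$T_k = \frac{R_k}{G_k} = \frac{S_{R_k} + (B_{R_k} - \mu_{BR})}{S_{G_k} + (B_{G_k} - \mu_{BG})} = \frac{N(p, \sigma_{S_{R_k}}) + N(0, \sigma_{BR})}{N(p, \sigma_{S_{G_k}}) + N(0, \sigma_{BG})},$$

where $N(\mu, \sigma)$ denotes a normal random variable with mean $\mu$ and standard deviation $\sigma$.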


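Confidence interval for the test statistic (cont.)

Under the assumption of constant cv for the signal (without the background), $\sigma_{S_{R_k}} = \sigma_{S_{G_k}} = cp$. Define

$$\sigma_B = \max\{\sigma_{BR}, \sigma_{BG}\} \ \text{(variance parameter)}, \qquad s = p/\sigma_B \ \text{(signal-to-noise ratio)},$$

together with the background std ratio $\sigma_{BR}/\sigma_{BG}$. Dividing the numerator and denominator by $\sigma_B$ gives

$$T = \frac{N(s, cs) + N(0, \sigma_{BR}/\sigma_B)}{N(s, cs) + N(0, \sigma_{BG}/\sigma_B)},$$

so the distribution of $T$ depends only on $c$, $s$, and $\sigma_{BR}/\sigma_{BG}$.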
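Because $T = [N(s, cs) + N(0, \sigma_{BR}/\sigma_B)] / [N(s, cs) + N(0, \sigma_{BG}/\sigma_B)]$ depends only on $c$, $s$, and the background std ratio, its 99% interval can be approximated by simulation. The slides do not say how their intervals were computed; the Monte Carlo approach, the sample size, and the nearest-rank quantile rule below are assumptions for illustration.

```python
import random


def simulate_t(s: float, c: float, sigma_br: float, sigma_bg: float,
               n: int = 20_000, seed: int = 0) -> list[float]:
    """Draw T = [N(s, cs) + N(0, sigma_br/sigma_b)] / [N(s, cs) + N(0, sigma_bg/sigma_b)]."""
    sigma_b = max(sigma_br, sigma_bg)  # variance parameter
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        num = rng.gauss(s, c * s) + rng.gauss(0.0, sigma_br / sigma_b)
        den = rng.gauss(s, c * s) + rng.gauss(0.0, sigma_bg / sigma_b)
        if den != 0.0:
            draws.append(num / den)
    return draws


def empirical_interval(values: list[float], level: float = 0.99) -> tuple[float, float]:
    """Central interval holding `level` of the sample, using nearest-rank quantiles."""
    ordered = sorted(values)
    tail = (1.0 - level) / 2.0
    last = len(ordered) - 1
    return ordered[int(tail * last)], ordered[int((1.0 - tail) * last)]
```

Comparing `empirical_interval(simulate_t(1000.0, 0.2, 1.0, 1.0))` with `empirical_interval(simulate_t(3.0, 0.2, 1.0, 1.0))` shows how the interval widens as the SNR drops.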


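The 99% confidence interval for the ratio statistic

[Figure: 99% confidence interval for the ratio statistic, with $c = 0.2$ and background std ratio $\sigma_{BR}/\sigma_{BG}$ of (a) 100 (or 1) and (b) 1.]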


Correction of background estimation

Owing to interaction between the fluorescent signal and the background, local-background estimation is often biased.

To estimate the bias difference, we find the relationship between the red and green intensities under the null hypothesis by assuming a linear relation, $G = aR + b$.


Correction of background estimation (cont.)

Simulation

1. Generate 10,000 data points from an exponential distribution with parameter 2,000 to simulate 10,000 gene expression levels.

2. Simulate the intensity measurement for each channel by a normal distribution whose mean is the intensity drawn from the exponential distribution and whose cv is a constant 0.2.

3. Simulate the background level by a normal distribution:
(1) no bias: background level ~ N(0, 100)
(2) some bias: background level ~ N(b, 100)

Without bias, the simulated ratio follows

$$T = \frac{N(p, cp) + N(0, \sigma_{BR})}{N(p, cp) + N(0, \sigma_{BG})}.$$
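The three simulation steps can be sketched as follows. Assumptions not fixed by the slide: the exponential parameter 2,000 is treated as the mean, the second argument of N(·, ·) is a standard deviation, and the bias $b$ is applied to the green channel only. The function name `simulate_channels` is hypothetical.

```python
import random
import statistics


def simulate_channels(n=10_000, mean_level=2000.0, cv=0.2, bg_sd=100.0, bias=0.0, seed=0):
    """Simulate n (R, G) pairs following steps 1-3.

    Assumptions: mean_level is the exponential mean, bg_sd is a standard deviation,
    and the background bias is added to the green channel only.
    """
    rng = random.Random(seed)
    red, green = [], []
    for _ in range(n):
        level = rng.expovariate(1.0 / mean_level)  # step 1: true expression level
        # steps 2-3: constant-cv signal plus normally distributed background
        red.append(rng.gauss(level, cv * level) + rng.gauss(0.0, bg_sd))
        green.append(rng.gauss(level, cv * level) + rng.gauss(bias, bg_sd))
    return red, green


if __name__ == "__main__":
    for b in (0.0, 500.0):
        red, green = simulate_channels(bias=b, seed=1)
        offset = statistics.fmean(g - r for r, g in zip(red, green))
        print(f"bias={b:>5}: mean(G - R) = {offset:.1f}")
```

Plotting `green` against `red` on log axes for `bias=500.0` versus `bias=0.0` is one way to look for the distortion the biased background introduces at low intensities.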


Scatter plot of simulated expression data

[Figure: (a) 10,000 data points with no bias from background estimation; (b) 10,000 data points with a background estimation bias of 500. Panel (b) exhibits the dog-leg effect.]


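Correction of background estimation (cont.)

To fit $G = aR + b$, we employ a chi-square fitting method that minimizes

$$\chi^2(a, b) = \sum_{k=1}^{N}\frac{(G_k - aR_k - b)^2}{\sigma^2_{G_k} + a^2\sigma^2_{R_k}}.$$

With $\sigma^2_{R_k} = c^2\hat R_k^2 + \sigma^2_{BR}$ and $\sigma^2_{G_k} = c^2\hat G_k^2 + \sigma^2_{BG}$, setting $\partial\chi^2/\partial b = 0$ gives

$$b = \frac{\sum_{k=1}^{N}(G_k - aR_k)\big/\left[c^2(\hat G_k^2 + a^2\hat R_k^2) + \sigma^2_{BG} + a^2\sigma^2_{BR}\right]}{\sum_{k=1}^{N}1\big/\left[c^2(\hat G_k^2 + a^2\hat R_k^2) + \sigma^2_{BG} + a^2\sigma^2_{BR}\right]}.$$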
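A minimal sketch of this fit. For a fixed slope $a$, setting $\partial\chi^2/\partial b = 0$ gives $b$ in closed form as a weighted mean of $G_k - aR_k$, so $\chi^2$ only needs a one-dimensional search over $a$. Golden-section search over the hypothetical bracket [0.5, 2] is a swapped-in choice, not necessarily the authors' optimizer; the observed $R_k$, $G_k$ stand in for $\hat R_k$, $\hat G_k$, and `make_demo_data` is a hypothetical generator for trying the fit.

```python
import math
import random


def make_demo_data(n=5000, bias=500.0, cv=0.2, bg_sd=100.0, seed=0):
    """Hypothetical (R, G) data: exponential levels (mean 2,000), green background biased by `bias`."""
    rng = random.Random(seed)
    red, green = [], []
    for _ in range(n):
        level = rng.expovariate(1.0 / 2000.0)
        red.append(rng.gauss(level, cv * level) + rng.gauss(0.0, bg_sd))
        green.append(rng.gauss(level, cv * level) + rng.gauss(bias, bg_sd))
    return red, green


def _variances(red, green, a, c, sigma_br, sigma_bg):
    """sigma_G^2 + a^2 sigma_R^2 per spot; observed R_k, G_k replace the hats (assumption)."""
    return [c * c * (g * g + a * a * r * r) + sigma_bg ** 2 + a * a * sigma_br ** 2
            for r, g in zip(red, green)]


def intercept_for_slope(red, green, a, c, sigma_br, sigma_bg):
    """Closed-form b from d(chi^2)/db = 0 at fixed slope a."""
    w = _variances(red, green, a, c, sigma_br, sigma_bg)
    num = sum((g - a * r) / wk for r, g, wk in zip(red, green, w))
    return num / sum(1.0 / wk for wk in w)


def chi_square(red, green, a, b, c, sigma_br, sigma_bg):
    w = _variances(red, green, a, c, sigma_br, sigma_bg)
    return sum((g - a * r - b) ** 2 / wk for r, g, wk in zip(red, green, w))


def fit_line(red, green, c, sigma_br, sigma_bg, a_lo=0.5, a_hi=2.0, tol=1e-6):
    """Minimize chi^2 over a by golden-section search with b profiled out; returns (a, b)."""
    def profile(a):
        b = intercept_for_slope(red, green, a, c, sigma_br, sigma_bg)
        return chi_square(red, green, a, b, c, sigma_br, sigma_bg)

    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    lo, hi = a_lo, a_hi
    x1, x2 = hi - inv_phi * (hi - lo), lo + inv_phi * (hi - lo)
    f1, f2 = profile(x1), profile(x2)
    while hi - lo > tol:
        if f1 < f2:  # minimum lies in [lo, x2]
            hi, x2, f2 = x2, x1, f1
            x1 = hi - inv_phi * (hi - lo)
            f1 = profile(x1)
        else:  # minimum lies in [x1, hi]
            lo, x1, f1 = x1, x2, f2
            x2 = lo + inv_phi * (hi - lo)
            f2 = profile(x2)
    a = (lo + hi) / 2.0
    return a, intercept_for_slope(red, green, a, c, sigma_br, sigma_bg)
```

Profiling out $b$ keeps the search one-dimensional, which is why a bracketing method is enough here.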


Quality Metric for Ratio Statistics

For a given cDNA target, the following factors affect ratio measurement quality:

(1) Weak fluorescent intensities

(2) A smaller than normal detected target area

(3) A very high local background level

(4) A high standard deviation of target intensity


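(1) Fluorescent intensity measurement quality

Under the null hypothesis, the signal means are equal, so that

$$\min\{SNR_R, SNR_G\} = \frac{\mu_R}{\max\{\sigma_{BR}, \sigma_{BG}\}} = \frac{\mu_R}{\sigma_B}.$$

We replace $\mu_R$ and $\sigma_B$ by their null-hypothesis estimators, $(R+G)/2$ and $\hat\sigma_B$, to obtain

$$w_I = \begin{cases} 0, & \dfrac{R+G}{2\hat\sigma_B} \le 3,\\[6pt] \dfrac{R+G}{6\hat\sigma_B} - 1, & 3 < \dfrac{R+G}{2\hat\sigma_B} \le 6,\\[6pt] 1, & \text{otherwise}. \end{cases}$$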
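The intensity-quality rule as a function, assuming $\hat\sigma_B = \max\{\hat\sigma_{BR}, \hat\sigma_{BG}\}$ and a middle branch equal to the linear ramp $(R+G)/(6\hat\sigma_B) - 1$ between the thresholds 3 and 6. The name `intensity_quality` is hypothetical.

```python
def intensity_quality(r: float, g: float, sigma_br: float, sigma_bg: float) -> float:
    """w_I from the null-hypothesis SNR estimate (R + G) / (2 * sigma_B_hat)."""
    sigma_b = max(sigma_br, sigma_bg)  # sigma_B_hat
    snr = (r + g) / (2.0 * sigma_b)
    if snr <= 3.0:
        return 0.0
    if snr <= 6.0:
        return snr / 3.0 - 1.0  # equals (R + G) / (6 sigma_B_hat) - 1
    return 1.0
```

The ramp is continuous at both thresholds, rising from 0 at an estimated SNR of 3 to 1 at an estimated SNR of 6.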


(2) Target area measurement quality

Let $A_M$ be the area of the mask of the cDNA target for a particular print-tip, and let $A_{T_k}$ be the area of the two largest connected components of target $k$. The proportional area of each target is $a_k = A_{T_k}/A_M$. We define the area measurement quality by

$$w_a = \begin{cases} 0, & a_k < a_{\min},\\[4pt] \dfrac{a_k - a_{\min}}{0.20 - a_{\min}}, & a_{\min} \le a_k < 0.20,\\[4pt] 1, & \text{otherwise}, \end{cases} \qquad a_{\min} = \max\{10/A_M,\ 0.05\}.$$
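A sketch of the area metric, assuming the piecewise form $w_a = 0$ below $a_{\min} = \max\{10/A_M, 0.05\}$, a linear ramp up to 0.20, and 1 above. The helper that measures the two largest connected components uses 4-connectivity and a breadth-first search; both choices, and the function names, are assumptions for illustration.

```python
from collections import deque


def two_largest_components_area(mask: list[list[int]]) -> int:
    """Total pixel count of the two largest 4-connected foreground components."""
    rows, cols = len(mask), len(mask[0]) if mask else 0
    seen = [[False] * cols for _ in range(rows)]
    sizes = []
    for i in range(rows):
        for j in range(cols):
            if mask[i][j] and not seen[i][j]:
                seen[i][j] = True
                queue, size = deque([(i, j)]), 0
                while queue:  # breadth-first flood fill of one component
                    y, x = queue.popleft()
                    size += 1
                    for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                        if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                sizes.append(size)
    return sum(sorted(sizes, reverse=True)[:2])


def area_quality(target_area: float, mask_area: float) -> float:
    """w_a from a_k = A_T / A_M with a_min = max(10 / A_M, 0.05)."""
    a_k = target_area / mask_area
    a_min = max(10.0 / mask_area, 0.05)
    if a_k < a_min:
        return 0.0
    if a_k < 0.20:
        return (a_k - a_min) / (0.20 - a_min)
    return 1.0
```

Keeping only the two largest components means that scattered speckles inside the mask do not inflate the measured target area.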


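(3) Background flatness quality

Define the background flatness quality as

$$w_b = \min\{w_{BR}, w_{BG}\},$$

where $w_{BR}$ is a piecewise function that compares the local red background of spot $k$ with the array-level background statistics $\mu_{BR}$ and $\sigma_{BR}$: it equals 1 up to a threshold of 4, decreases between the thresholds 4 and 6, and equals 0 beyond 6. $w_{BG}$ is defined similarly.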


(4) Signal intensity consistency quality

[Figure: typical target shapes, with intensity coefficients of variation cv = 0.48, 0.45, 0.31 (first row) and cv = 0.81, 0.98, 0.59 (second row).]


(4) Signal intensity consistency quality (cont.)

Letting $cv_{k,\min}$ denote the minimum of the intensity coefficients of variation for the red and green channels,

$$w_s = \begin{cases} 1, & cv_{k,\min} \le 0.9,\\[4pt] \dfrac{1.1 - cv_{k,\min}}{0.2}, & 0.9 < cv_{k,\min} \le 1.1,\\[4pt] 0, & cv_{k,\min} > 1.1. \end{cases}$$
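A sketch of $w_s$ with the thresholds 0.9 and 1.1. How the per-target cv is measured is an assumption here: population standard deviation of the pixel intensities divided by their mean. The function names are hypothetical.

```python
import statistics


def intensity_cv(pixels: list[float]) -> float:
    """Coefficient of variation of the pixel intensities inside one target (assumed definition)."""
    return statistics.pstdev(pixels) / statistics.fmean(pixels)


def consistency_quality_from_cv(cv_min: float) -> float:
    """w_s: 1 up to 0.9, linear down to 0 at 1.1, and 0 beyond."""
    if cv_min <= 0.9:
        return 1.0
    if cv_min <= 1.1:
        return (1.1 - cv_min) / 0.2
    return 0.0


def consistency_quality(red_pixels: list[float], green_pixels: list[float]) -> float:
    """Apply w_s to the smaller of the red and green intensity cvs."""
    cv_min = min(intensity_cv(red_pixels), intensity_cv(green_pixels))
    return consistency_quality_from_cv(cv_min)
```

Taking the minimum over the two channels means a spot is penalized only when both channels show highly variable pixel intensities.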