Differentially expressed genes 09/19/07

Page 1: Differentially expressed genes

Differentially expressed genes

09/19/07

Page 2: Differentially expressed genes

Identify differentially expressed genes

Page 3: Differentially expressed genes

Fold Change

• Based on the expression index, select genes with high fold change (e.g., R/G > 3).

• Advantages:

– Intuitive.

– A larger fold change may indicate greater biological impact.

• Drawback:

– Reliable estimates are difficult to get.
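As a rough illustration of the fold-change rule, the sketch below flags genes whose red/green intensity ratio exceeds a cutoff. The function name `fold_change_hits`, the toy intensities, and the symmetric treatment of down-regulation (R/G < 1/3) are assumptions made for this example; the slide only states R/G > 3.

```python
import math

def fold_change_hits(red, green, threshold=3.0):
    """Return indices of genes whose R/G ratio exceeds `threshold` in either direction.

    Working on the log2 scale makes up- and down-regulation symmetric:
    R/G > 3 and R/G < 1/3 both give |log2(R/G)| > log2(3).
    """
    cutoff = math.log2(threshold)
    hits = []
    for i, (r, g) in enumerate(zip(red, green)):
        if abs(math.log2(r / g)) > cutoff:
            hits.append(i)
    return hits

# Hypothetical expression indices for four genes (red and green channels).
red = [900.0, 120.0, 40.0, 310.0]
green = [250.0, 110.0, 160.0, 300.0]

hits = fold_change_hits(red, green)
print(hits)  # gene 0 is 3.6-fold up, gene 2 is 4-fold down
```

The rule uses no replicate information, which is why a single noisy low-intensity measurement can pass the cutoff.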

Page 4: Differentially expressed genes

Fold change is noisy

[Figure: scatter plot of log-transformed expression in replicate #1 (x-axis) versus log-transformed expression in replicate #2 (y-axis).]

Noise is very high at low intensity.

Page 5: Differentially expressed genes

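SAM

• Significance Analysis of Microarrays (SAM) considers a signal-to-noise ratio:

$$d(i) = \frac{\bar{x}_I(i) - \bar{x}_U(i)}{s(i) + s_0}$$

where $\bar{x}_I(i)$ and $\bar{x}_U(i)$ are the mean expression levels of gene i in conditions I and U, $s(i)$ is the gene-specific standard deviation of repeated measurements, and $s_0$ is a small positive constant.

• d(i) can be large if either the signal is large or the noise is low. Therefore, it is different from fold change.

• Genes are ranked by d(i). The top candidate genes correspond to the most positive or most negative d(i).

(Tusher et al. 2001)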
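A minimal sketch of a SAM-style signal-to-noise statistic, d = (mean in I − mean in U) / (s + s0). It assumes that s is the pooled standard error used in a two-sample t-test and that s0 is a user-chosen constant; the function name `sam_d` and the toy numbers are illustrative only.

```python
import math
from statistics import mean

def sam_d(x_i, x_u, s0=0.0):
    """Signal-to-noise statistic for one gene.

    x_i, x_u: replicate measurements in conditions I and U.
    s0: small positive constant that damps genes with very small variance.
    """
    n1, n2 = len(x_i), len(x_u)
    m1, m2 = mean(x_i), mean(x_u)
    # Pooled sum of squared deviations around each group mean.
    ss = sum((x - m1) ** 2 for x in x_i) + sum((x - m2) ** 2 for x in x_u)
    a = (1.0 / n1 + 1.0 / n2) / (n1 + n2 - 2)
    s = math.sqrt(a * ss)
    return (m1 - m2) / (s + s0)

# Gene A: large difference, moderate noise. Gene B: small difference, tiny noise.
gene_a = ([10.0, 12.0, 14.0], [4.0, 6.0, 8.0])
gene_b = ([5.1, 5.0, 5.2], [4.6, 4.5, 4.7])

for s0 in (0.0, 0.5):
    print(s0, round(sam_d(*gene_a, s0=s0), 2), round(sam_d(*gene_b, s0=s0), 2))
```

With s0 = 0 the low-noise gene B scores higher than gene A even though its difference is only 0.5; with s0 = 0.5 the order reverses, which shows why the constant is added to the denominator.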

Page 6: Differentially expressed genes

Permutation test

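[Figure: arrays 1 to 6 with their true labels I and U, and the same arrays in permuted order (5, 3, 4, 6, 1, 2) assigned to the relabeled groups “I” and “U”.]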

If a gene is expressed at the same level in the I and U conditions, then relabeling the arrays will not change the distribution of d.

Page 7: Differentially expressed genes

SAM

• To test for statistical significance, the arrays are randomly permuted.

• For each permutation p, compute the statistics and rank them to obtain dp(i).

• Calculate the expected value over permutations:

$$d_E(i) = E\big(d_p(i)\big)$$

• The idea is that for truly differentially expressed genes, d(i) should be greater than dE(i).

• Select those d(i) that differ from dE(i) by more than a threshold level Δ.
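The permutation steps can be sketched as follows: rank the statistics within each permutation, average rank by rank to get an expected ordered statistic, and flag genes whose observed ranked statistic deviates from it by more than a threshold. This is a simplified sketch, not the published SAM procedure: `d_stat`, `sam_select`, the default `s0`, the toy data, and the per-gene rule |d − dE| > delta are all assumptions for illustration.

```python
import math
import random

def d_stat(values, labels, s0=0.1):
    """SAM-style statistic for one gene; labels are 1 (condition I) or 0 (condition U)."""
    g1 = [v for v, lab in zip(values, labels) if lab == 1]
    g0 = [v for v, lab in zip(values, labels) if lab == 0]
    m1, m0 = sum(g1) / len(g1), sum(g0) / len(g0)
    ss = sum((v - m1) ** 2 for v in g1) + sum((v - m0) ** 2 for v in g0)
    a = (1 / len(g1) + 1 / len(g0)) / (len(g1) + len(g0) - 2)
    return (m1 - m0) / (math.sqrt(a * ss) + s0)

def sam_select(data, labels, delta, n_perm=200, seed=0):
    """Return indices of genes whose ranked d differs from the expected d by more than delta."""
    rng = random.Random(seed)
    # Observed statistics, sorted, keeping track of the gene index.
    observed = sorted((d_stat(row, labels), i) for i, row in enumerate(data))
    rank_sums = [0.0] * len(data)
    perm = list(labels)
    for _ in range(n_perm):
        rng.shuffle(perm)  # relabel the arrays
        for rank, d in enumerate(sorted(d_stat(row, perm) for row in data)):
            rank_sums[rank] += d
    d_expected = [s / n_perm for s in rank_sums]  # average of the rank-th statistic
    return sorted(i for (d, i), e in zip(observed, d_expected) if abs(d - e) > delta)

# Toy data: 50 genes on 8 arrays; only gene 0 differs between the groups.
noise = random.Random(1)
labels = [1, 1, 1, 1, 0, 0, 0, 0]
data = [[noise.gauss(0.0, 1.0) for _ in labels] for _ in range(50)]
data[0] = [v + 20.0 * lab for v, lab in zip(data[0], labels)]

selected = sam_select(data, labels, delta=2.0)
print(selected)
```

Comparing rank by rank, rather than gene by gene, means the largest observed statistic is judged against the largest statistic expected by chance.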

Page 8: Differentially expressed genes
Page 9: Differentially expressed genes

Statistical hypothesis testing

• Null hypothesis H0: there is no association between the expression levels and the sample groups.

• Alternative hypothesis H1: there is an association.

• Differentially expressed genes correspond to rejection of the null hypothesis.

• Genes are selected regardless of fold change.

Page 10: Differentially expressed genes

Single Hypothesis Testing

• Calculate the value of a test statistic.

• IF the value is very unlikely given the null hypothesis H0, THEN

– H0 is rejected and H1 is accepted.

– The gene is called differentially expressed.

• ELSE

– H0 is not rejected.

– The gene is not called differentially expressed.

Page 11: Differentially expressed genes

Rejection Region

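[Figure: density of the t-value (x-axis: t-value, y-axis: density), with the rejection region marked.]

If t falls outside the rejection region, do not reject H0.

If t falls inside the rejection region, reject H0.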

Page 12: Differentially expressed genes

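Two types of errors

[Figure: density of the t-value (x-axis: t-value, y-axis: density).]

Significance level: $\alpha = \Pr(t \in \text{rejection region} \mid H_0)$, the probability of a type I error (false positive).

Power: $\Pr(t \in \text{rejection region} \mid H_1)$, one minus the probability of a type II error (false negative).

p-value: the minimum significance level required to reject H0.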

Page 13: Differentially expressed genes

p-value

The p-value is the probability, under H0, of obtaining a result at least as extreme as the observed value of the test statistic.

It is also the minimum significance level required to reject H0.

Page 14: Differentially expressed genes

Choice of test statistic

Standard t-test

$$t_i = \frac{\bar{y}_{i1} - \bar{y}_{i2}}{\sqrt{(s_{i1}^2 + s_{i2}^2)/n}}$$

• If the yij are Gaussian distributed, then under H0 ti follows Student's t-distribution.

• A p-value is calculated from the t-distribution with 2n − 2 degrees of freedom.

Issues:

• When n is small, the denominator is an unreliable estimate of the variance.

• The assumption that the yij are Gaussian is often violated in real data.
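A small worked sketch of the two-sample t-statistic with equal group sizes, t = (mean1 − mean2) / sqrt((s1² + s2²)/n), with 2n − 2 degrees of freedom. The function name `t_statistic` and the example values are made up for illustration.

```python
import math
from statistics import mean, variance

def t_statistic(y1, y2):
    """Two-sample t-statistic for two groups of equal size n.

    Returns (t, degrees_of_freedom) with degrees_of_freedom = 2n - 2.
    """
    n = len(y1)
    if len(y2) != n:
        raise ValueError("this form of the statistic assumes equal group sizes")
    # statistics.variance is the unbiased sample variance (divides by n - 1).
    t = (mean(y1) - mean(y2)) / math.sqrt((variance(y1) + variance(y2)) / n)
    return t, 2 * n - 2

# Hypothetical log-expression values of one gene in two groups of four arrays.
group1 = [7.9, 8.4, 8.1, 8.0]
group2 = [7.1, 7.4, 6.9, 7.0]

t, df = t_statistic(group1, group2)
print(round(t, 3), df)
```

The two-sided p-value would then come from the t-distribution with `df` degrees of freedom, for example `2 * scipy.stats.t.sf(abs(t), df)` if SciPy is available. With only four replicates per group, the variance estimate in the denominator rests on very few points, which is the first issue noted on the slide.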

Page 15: Differentially expressed genes

Variance shrinkage

• Basic idea: The variance at different genes should be correlated. If the data are noisy, then they are likely to be noisy everywhere. Thus one can use the information from other genes to estimate the variance at a given gene.

Page 16: Differentially expressed genes

Variance shrinkage

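(Smyth 2004) Assume

$$s_i^2 \mid \sigma_i^2 \sim \frac{\sigma_i^2}{d_i}\,\chi^2_{d_i}, \quad \text{where } d_i = 2n - 2,$$

and

$$\frac{1}{\sigma_i^2} \sim \frac{1}{d_0 s_0^2}\,\chi^2_{d_0},$$

where d0 and s0² correspond to the pooled data.

Then

$$\hat{\sigma}_i^2 = \frac{d_0 s_0^2 + d_i s_i^2}{d_0 + d_i}$$

Modify the t-statistic by replacing si² with $\hat{\sigma}_i^2$.

The new statistic obeys a t-distribution with d0 + di degrees of freedom.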
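To make the shrinkage concrete, the sketch below computes the shrunken variance (d0·s0² + di·si²)/(d0 + di) and the resulting moderated t-statistic for one gene. It assumes si² is the pooled within-group variance (so the standard error of the mean difference is sqrt(2·si²/n)) and takes d0 and s0² as given; in practice they would be estimated from all genes, which is not shown. The function names are illustrative.

```python
import math

def moderated_variance(s2, d, s0_sq, d0):
    """Shrink the gene-wise variance s2 (d degrees of freedom) toward s0_sq (d0 degrees of freedom)."""
    return (d0 * s0_sq + d * s2) / (d0 + d)

def moderated_t(mean_diff, s2, n, s0_sq, d0):
    """Moderated t-statistic for two groups of size n.

    Assumes s2 is the pooled within-group variance with d = 2n - 2 degrees of freedom.
    Returns (t, degrees_of_freedom) with degrees_of_freedom = d0 + d.
    """
    d = 2 * n - 2
    s2_shrunk = moderated_variance(s2, d, s0_sq, d0)
    return mean_diff / math.sqrt(2.0 * s2_shrunk / n), d0 + d

# A gene with a small difference (0.5) and an implausibly small variance (0.01), n = 3.
ordinary_t = 0.5 / math.sqrt(2.0 * 0.01 / 3)
mod_t, mod_df = moderated_t(mean_diff=0.5, s2=0.01, n=3, s0_sq=0.25, d0=4)
print(round(ordinary_t, 2), round(mod_t, 2), mod_df)
```

In this example the shrunken variance is 0.13 rather than 0.01, so the statistic drops from about 6.1 to about 1.7 while the degrees of freedom rise from 4 to 8.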

Page 17: Differentially expressed genes

Permutation test

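[Figure: arrays 1 to 6 with their true labels normal and cancer, and the same arrays in permuted order (5, 3, 4, 6, 1, 2) assigned to the relabeled groups “normal” and “cancer”.]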

If H0 is correct, then relabeling the arrays will not change the distribution of the test statistic.

Page 18: Differentially expressed genes

Permutation p-value

• Permutation test

– For the b-th permutation, b = 1, …, B:

• Permute the n columns (array labels) of the data matrix X.

• Compute the test statistics t1,b, …, tm,b, one for each hypothesis Hi (that the i-th gene is not differentially expressed).

– The permutation distribution of the test statistic Ti for hypothesis Hi is given by ti,1, …, ti,B. For two-sided alternative hypotheses, the permutation p-value for hypothesis Hi is

$$p_i^* = \frac{1}{B}\sum_{b=1}^{B} I\big(|t_{i,b}| \ge |t_i|\big)$$

where I(·) is the indicator function, equal to 1 if the condition in parentheses is true and 0 otherwise.
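A minimal sketch of the permutation p-value: for each gene, it is the fraction of B label permutations whose |t| is at least the observed |t|. The helper `t_stat`, the function name `permutation_p_values`, and the toy matrix are assumptions; random permutations are sampled rather than enumerated.

```python
import math
import random
from statistics import mean, variance

def t_stat(values, labels):
    """Two-sample t-statistic for one gene; labels are 1 or 0 with equal group sizes."""
    g1 = [v for v, lab in zip(values, labels) if lab == 1]
    g0 = [v for v, lab in zip(values, labels) if lab == 0]
    return (mean(g1) - mean(g0)) / math.sqrt((variance(g1) + variance(g0)) / len(g1))

def permutation_p_values(X, labels, B=1000, seed=0):
    """Two-sided permutation p-value for every row (gene) of X."""
    rng = random.Random(seed)
    t_obs = [abs(t_stat(row, labels)) for row in X]
    counts = [0] * len(X)
    perm = list(labels)
    for _ in range(B):
        rng.shuffle(perm)  # permute the array labels (columns of X)
        for i, row in enumerate(X):
            if abs(t_stat(row, perm)) >= t_obs[i]:
                counts[i] += 1
    return [c / B for c in counts]

labels = [1, 1, 1, 1, 0, 0, 0, 0]
X = [
    [10.0, 11.0, 10.5, 11.5, 1.0, 2.0, 1.5, 2.5],  # groups clearly separated
    [5.0, 6.0, 5.5, 6.5, 6.5, 5.5, 6.0, 5.0],      # identical values in both groups
]
p_perm = permutation_p_values(X, labels)
print(p_perm)
```

With four arrays per group there are only 70 distinct labelings, so the smallest attainable two-sided p-value is 2/70 ≈ 0.029, no matter how strong the effect is.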

Page 19: Differentially expressed genes

Permutation p-value

[Figure: scaled permutation distribution compared with the t-distribution (x-axis from −10 to 10); panels: "H0 is correct" and "H0 is rejected".]

Page 20: Differentially expressed genes

Multiple hypothesis testing

• Microarray experiments measure the expression levels of thousands of genes.

• The hypothesis testing procedure is applied once for each gene.

• A large number of false positives may result.

Cutoff at p = 0.05 for 6000 genes:

6000 × 0.05 = 300 genes falsely rejected

If the number of real targets is about 100, then most rejected genes are false targets.
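The arithmetic can be checked with a short sketch. It assumes, optimistically, that every real target is detected (power = 1), and it counts only the non-target genes as possible false positives, which gives about 295 rather than the slide's rounded 300. The function name is illustrative.

```python
def expected_false_fraction(m, alpha, n_true, power=1.0):
    """Expected false positives and their share among all rejected genes.

    m: number of genes tested; alpha: per-gene p-value cutoff;
    n_true: number of truly differentially expressed genes;
    power: assumed probability of detecting each true target.
    """
    false_pos = (m - n_true) * alpha
    true_pos = n_true * power
    return false_pos, false_pos / (false_pos + true_pos)

false_pos, fraction = expected_false_fraction(m=6000, alpha=0.05, n_true=100)
print(round(false_pos), round(fraction, 2))  # roughly 295 false positives, about 75% of rejections
```

Even in this best case, about three of every four rejected genes would be false targets.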

Page 21: Differentially expressed genes
Page 22: Differentially expressed genes
Page 23: Differentially expressed genes
Page 24: Differentially expressed genes

Bonferroni correction

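• Let m be the total number of tests. Reject each hypothesis at level α/m instead of α.

$$\mathrm{FWER} = \Pr(V \ge 1) \le \sum_{j=1}^{m} \Pr\left(P_j \le \frac{\alpha}{m}\right)$$

where V is the number of false positives.

• Strong control of the family-wise error rate (FWER).

• Often too conservative.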

Page 25: Differentially expressed genes

Adjusted p-value

• The adjusted p-value for a single hypothesis Hj is the nominal level of the entire test procedure at which Hj would just be rejected, given the values of all test statistics involved.

• Example: pi = 0.001. If rejecting all hypotheses with cutoff p < pi leads to FDR = 0.2, then the adjusted p-value is 0.2.

• The adjusted p-value is dependent on the specific test procedure.

Page 26: Differentially expressed genes

Adjusted p-value

The adjusted p-value for the Bonferroni correction is

$$\tilde{p}_j = \min(m\,p_j,\ 1)$$
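As a quick illustration, the Bonferroni-adjusted p-value min(m·p, 1) can be computed directly, and rejecting adjusted values at level α is the same as rejecting raw p-values at α/m. The function name and the example p-values are made up.

```python
def bonferroni_adjust(p_values):
    """Bonferroni-adjusted p-values: min(m * p, 1) for each of the m tests."""
    m = len(p_values)
    return [min(m * p, 1.0) for p in p_values]

raw = [0.001, 0.02, 0.04, 0.30]
adjusted = bonferroni_adjust(raw)
rejected = [j for j, p in enumerate(adjusted) if p <= 0.05]
print(adjusted, rejected)
```

Here only the first hypothesis survives at α = 0.05, although three of the four raw p-values are below 0.05.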

Page 27: Differentially expressed genes

False Discovery Rate

• Controlling the FWER aims at having no false positives at all. This is often too stringent in practice.

• The false discovery rate (FDR) was proposed by Benjamini and Hochberg (1995). The idea is to allow a few false positives while enhancing the power.

$$Q = \frac{V}{R} = \frac{\#\{\text{false positives}\}}{\#\{\text{rejected hypotheses}\}}, \qquad Q = 0 \text{ if } R = 0$$

$$\mathrm{FDR} = E(Q)$$

Page 28: Differentially expressed genes

Control of FDR, BH-procedure

• Find the ordered observed p-values:

$$p_1 \le p_2 \le \dots \le p_m$$

• Let k be the largest i for which

$$p_i \le \frac{i}{m}\,q^*$$

• Reject all H1, …, Hk.

Then $\mathrm{FDR} \le q^*$.

(Benjamini and Hochberg, 1995)

Page 29: Differentially expressed genes

Control of FDR, BH-procedure

• Find the ordered observed p-values:

$$p_1 \le p_2 \le \dots \le p_m$$

• Let k be the largest i for which

$$p_i \le \frac{i}{m}\,q^*$$

• Reject all H1, …, Hk.

Then $\mathrm{FDR} \le q^*$.

• Strongly controls FDR

• Also weakly controls FWER

(Benjamini and Hochberg, 1995)
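The step-up rule can be sketched in a few lines: sort the p-values, find the largest rank i with p_i ≤ i·q/m, and reject every hypothesis up to that rank. The function name `benjamini_hochberg` and the example p-values are illustrative.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return the (original) indices of hypotheses rejected by the BH procedure at level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices sorted by p-value
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k = rank  # keep the largest rank that satisfies the condition
    return sorted(order[:k])

p = [0.5, 0.012, 0.9, 0.028, 0.011]
rejected_bh = benjamini_hochberg(p, q=0.05)
rejected_bonf = [i for i, pv in enumerate(p) if pv <= 0.05 / len(p)]
print(rejected_bh, rejected_bonf)
```

In this example the smallest p-value (0.011) misses its own threshold of 0.01, but it is still rejected because the third-smallest (0.028) falls below 0.03. Bonferroni at the same level rejects nothing, which illustrates the gain in power.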

Page 30: Differentially expressed genes

Positive false discovery rate (pFDR)

• Better power than FDR procedure.

• Estimate

$$\mathrm{pFDR} = E(V/R \mid R > 0)$$

Page 31: Differentially expressed genes

Estimation of π0(t)

Under the null hypothesis, the p-value is uniformly distributed on [0, 1].

Page 32: Differentially expressed genes

Estimation of π0(t)

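Procedure:

Choose 0 < λ < 1.

Assume the p-values pi are uniformly distributed for p > λ.

Then estimate π0, the proportion of true null hypotheses, as

$$\hat{\pi}_0 = \frac{\#\{p_i > \lambda\}}{m\,(1 - \lambda)}$$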
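A sketch of the estimate that follows from these assumptions: if null p-values are uniform and essentially all p-values above a chosen cutoff come from true nulls, the expected number above the cutoff is m·π0·(1 − cutoff), so π0 can be estimated as #{p > cutoff} / (m·(1 − cutoff)). The function name `estimate_pi0`, the default cutoff 0.5, and the cap at 1 are assumptions for this example.

```python
def estimate_pi0(p_values, lam=0.5):
    """Estimate the proportion of true null hypotheses from the p-values above `lam`."""
    if not 0.0 < lam < 1.0:
        raise ValueError("lam must lie strictly between 0 and 1")
    m = len(p_values)
    n_above = sum(1 for p in p_values if p > lam)
    # Cap at 1 because a proportion cannot exceed 1 (an added safeguard).
    return min(1.0, n_above / (m * (1.0 - lam)))

p = [0.001, 0.002, 0.01, 0.03, 0.2, 0.45, 0.55, 0.7, 0.8, 0.95]
pi0_hat = estimate_pi0(p, lam=0.5)
print(pi0_hat)  # 4 of 10 p-values exceed 0.5, so the estimate is 4 / (10 * 0.5)
```

The cutoff trades bias against variance: a cutoff near 1 uses almost exclusively null p-values but leaves very few of them to count.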

Page 33: Differentially expressed genes
Page 34: Differentially expressed genes
Page 35: Differentially expressed genes
Page 36: Differentially expressed genes
Page 37: Differentially expressed genes
Page 38: Differentially expressed genes

(Streinsland)

Page 39: Differentially expressed genes

(Streinsland)

Page 40: Differentially expressed genes

SAM

• To test for statistical significance, the arrays are randomly permuted.

• For each permutation p, compute the statistics and rank them to obtain dp(i).

• Calculate the expected value over permutations:

$$d_E(i) = E\big(d_p(i)\big)$$

• The idea is that for truly differentially expressed genes, d(i) should be greater than dE(i).

• Select those d(i) that differ from dE(i) by more than a threshold level Δ.

Page 41: Differentially expressed genes

Estimation of FDR in SAM

• R ≈ #(genes called significant)

• V ≈ #(genes called significant in permutation tests)

• FDR ≈ V/R

• The power of SAM is better than that of fold-change criteria.
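A rough sketch of the estimate FDR ≈ V/R, where R counts genes called significant in the real data and V is the average number called significant across permuted data sets. For simplicity it uses one symmetric cutoff on |d| instead of the cutoffs SAM derives from its threshold level; the function name and all numbers are hypothetical.

```python
def estimate_fdr(observed_d, permuted_d, cutoff):
    """Estimate FDR as (average permutation calls) / (observed calls).

    observed_d: statistics d(i) from the real labeling.
    permuted_d: list of lists, one list of statistics per permutation.
    cutoff: genes with |d| >= cutoff are called significant (a simplifying assumption).
    Returns (fdr, r, v).
    """
    r = sum(1 for d in observed_d if abs(d) >= cutoff)
    calls = [sum(1 for d in perm if abs(d) >= cutoff) for perm in permuted_d]
    v = sum(calls) / len(permuted_d)
    fdr = min(1.0, v / r) if r > 0 else 0.0
    return fdr, r, v

observed = [5.2, -4.1, 3.3, 0.5, -0.2, 1.1]
permuted = [
    [3.1, 0.2, -0.4, 1.0, -1.2, 0.3],
    [0.9, -3.4, 0.1, 0.6, 2.2, -0.8],
]
fdr, r, v = estimate_fdr(observed, permuted, cutoff=3.0)
print(round(fdr, 3), r, v)
```

Here three genes are called in the real data and one per permutation on average, so the estimated FDR is about one third. Raising the cutoff lowers both V and R, and the FDR estimate shows how the two trade off.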

Page 42: Differentially expressed genes

Data: Apo AI experiment

• 8 mice in the treatment group (apo AI knockout); 8 mice in the control group (normal).

• 16 arrays: Cy5 – mRNA from treatment or control mice; Cy3 – mRNA from pooled control mice.

• 6356 genes.

• Goal: detect genes that are differentially expressed between treatment and control mice.

Page 43: Differentially expressed genes
Page 44: Differentially expressed genes

Cutoff value vs top genes

• Each metric can be viewed as a monotonic transformation of another.

• The only difference is in the cutoff values.

• Hypothesis testing methods based on the same test statistic are therefore equivalent in terms of selecting the top k genes, for a fixed k.