Transformations… what for? which one? Wolfgang Huber, Div. Molecular Genome Analysis, DKFZ Heidelberg


Page 1: Transformations…  what for? which one?

Transformations…

what for? which one?

Wolfgang Huber
Div. Molecular Genome Analysis
DKFZ Heidelberg

Page 2: Transformations…  what for? which one?

$\log \frac{f(x_i, \ldots)}{f(x_j, \ldots)}$

$\log \frac{x_i}{x_j}$

$\log \frac{x_i + \sqrt{x_i^2 + c_i^2}}{x_j + \sqrt{x_j^2 + c_j^2}}$

Log-ratio with/without background correction

Shrunken log-ratio (BHM)

Variance stabilized log-ratio(=generalized log-ratio, “glog”)

Microarray intensities x1,…,xn

How do you like to think about (interpret) it?

How do you estimate the parameters?

What comes out (the “bottom line”)?

Page 3: Transformations…  what for? which one?

ratios and fold changes

Fold changes are useful to describe continuous changes in expression

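[Figure: expression of a gene in conditions A, B, C with values 1000, 1500, 3000; fold changes relative to A: x1.5 and x3]

[Figure: expression of a gene in conditions A, B, C with values 0, 200, 3000; fold changes: ?, ?]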

But what if the gene is “off” (below detection limit) in one condition?

Page 4: Transformations…  what for? which one?

ratios and fold changes

Many interesting genes will be off in some of the conditions of interest

1. If you want the expression measure (“net normalized spot intensity”) to be an unbiased estimator of abundance: many values will be ≤ 0, so you need something more than the (log-)ratio.

2. If you let the expression measure be biased (always > 0): you can keep ratios. But how do you choose the bias?

Page 5: Transformations…  what for? which one?

ratios and fold changes

Ratios are scale-free: $f(y_i, y_j) = y_i / y_j$ or $\log(y_i / y_j)$

But there is (at least) one absolute scale in the data: $\sigma_{bg} = \mathrm{sd}(Y_i \mid E(Y_i) = 0)$

Can we use this to construct useful functions $f(y_i, y_j, \sigma_{bg})$?

Page 6: Transformations…  what for? which one?

In the following:

o How to compare microarray intensities with each other?

o How to incorporate measurement uncertainty (“variance”)?

o How to simultaneously and consistently deal with calibration (“normalization”)?

Page 7: Transformations…  what for? which one?

Sources of variation

o amount of RNA in the biopsy
o efficiencies of
  - RNA extraction
  - reverse transcription
  - labeling
  - fluorescent detection

o probe purity and length distribution
o spotting efficiency, spot size
o cross-/unspecific hybridization
o stray signal

Systematic (calibration)
o similar effect on many measurements
o corrections can be estimated from data

Stochastic (error model)
o too random to be explicitly accounted for
o remain as “noise”

Page 8: Transformations…  what for? which one?

modeling ansatz

$y_{ik} = a_{ik} + b_{ik} x_k$

$a_{ik} = a_i + \varepsilon_{ik}$
  $a_i$: per-sample offset
  $\varepsilon_{ik} \sim N(0, b_i^2 s_1^2)$: “additive noise”

$b_{ik} = b_i b_k \exp(\eta_{ik})$
  $b_i$: per-sample normalization factor
  $b_k$: sequence-wise probe efficiency
  $\eta_{ik} \sim N(0, s_2^2)$: “multiplicative noise”

measured intensity = offset + gain × true abundance
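The modeling ansatz (measured intensity = offset + gain × true abundance, with additive noise on the offset and multiplicative noise on the gain) can be explored with a small simulation. This is only a sketch: the parameter values (`a`, `b`, `s1`, `s2`) are hypothetical and not taken from the slides, and the per-sample factor and the probe efficiency are folded into a single gain `b`.

```python
import math
import random
import statistics


def simulate_intensity(x, a=50.0, b=1.0, s1=20.0, s2=0.2, rng=random):
    """Draw one measured intensity y = (a + eps) + b * exp(eta) * x.

    eps ~ N(0, (b*s1)^2) is the additive noise on the offset,
    eta ~ N(0, s2^2) is the multiplicative noise on the gain.
    All default parameter values are hypothetical.
    """
    eps = rng.gauss(0.0, b * s1)
    eta = rng.gauss(0.0, s2)
    return a + eps + b * math.exp(eta) * x


def mean_and_sd(x, n=5000, seed=1):
    """Mean and standard deviation of n simulated replicates at abundance x."""
    rng = random.Random(seed)
    ys = [simulate_intensity(x, rng=rng) for _ in range(n)]
    return statistics.mean(ys), statistics.stdev(ys)


# At zero abundance the spread is set by the additive noise (about b*s1);
# at high abundance the coefficient of variation approaches roughly s2.
mean_low, sd_low = mean_and_sd(0.0)
mean_high, sd_high = mean_and_sd(10000.0)
print("sd at x=0:", round(sd_low, 1))
print("CV at x=10000:", round(sd_high / mean_high, 3))
```

In this sketch the standard deviation is roughly constant for small x and roughly proportional to the mean for large x, which is the behaviour the two noise terms are meant to capture.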

Page 9: Transformations…  what for? which one?

The two-component model

[Figure: two panels (raw scale, log scale); annotations: “additive” noise, “multiplicative” noise]

B. Durbin, D. Rocke, JCB 2001

Page 10: Transformations…  what for? which one?

variance stabilizing transformations

$X_u$ a family of random variables with $E X_u = u$, $\mathrm{Var}\, X_u = v(u)$. Define

$f(x) = \int^x \frac{du}{\sqrt{v(u)}}$

$\Rightarrow \mathrm{var}\, f(X_u) \approx$ independent of $u$

derivation: linear approximation
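The linear approximation mentioned here can be spelled out as follows. This is the standard first-order (delta method) argument, written with the slide's symbols; the approximation is only expected to hold when $X_u$ is concentrated near $u$ relative to the curvature of $f$.

```latex
\begin{align*}
f(X_u) &\approx f(u) + f'(u)\,(X_u - u) \\
\operatorname{Var} f(X_u) &\approx f'(u)^2 \operatorname{Var} X_u = f'(u)^2\, v(u) \\
f'(u) = \frac{1}{\sqrt{v(u)}} \;&\Longrightarrow\; \operatorname{Var} f(X_u) \approx 1
\end{align*}
```

Requiring the approximate variance to be constant therefore gives $f'(u) \propto 1/\sqrt{v(u)}$, and integrating yields the definition of $f$.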

Page 11: Transformations…  what for? which one?

variance stabilizing transformations

[Figure: graph of f(x) against x; x-axis: raw scale, 0 to 60000; y-axis: transformed scale, 8.0 to 11.0]

Page 12: Transformations…  what for? which one?

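variance stabilizing transformations

$f(x) = \int^x \frac{du}{\sqrt{v(u)}}$

1.) constant variance (‘additive’): $v(u) = s^2 \Rightarrow f \propto u$

2.) constant CV (‘multiplicative’): $v(u) \propto u^2 \Rightarrow f \propto \log u$

3.) offset: $v(u) \propto (u + u_0)^2 \Rightarrow f \propto \log(u + u_0)$

4.) additive and multiplicative: $v(u) \propto (u + u_0)^2 + s^2 \Rightarrow f \propto \mathrm{arsinh}\frac{u + u_0}{s}$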
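The additive-and-multiplicative case can be checked numerically: integrating 1/sqrt(v(u)) for a variance function of the form (u + u0)^2 + s^2 should reproduce the arsinh expression up to an additive constant. The sketch below uses a plain midpoint rule; the values of `u0` and `s` are hypothetical.

```python
import math

U0 = 100.0  # hypothetical offset
S = 50.0    # hypothetical additive-noise scale


def variance(u):
    """Variance function v(u) = (u + u0)^2 + s^2 (additive and multiplicative)."""
    return (u + U0) ** 2 + S ** 2


def vst_numeric(x, lower=0.0, steps=20000):
    """Midpoint-rule approximation of the integral of 1/sqrt(v(u)) from lower to x."""
    h = (x - lower) / steps
    return sum(h / math.sqrt(variance(lower + (k + 0.5) * h)) for k in range(steps))


def vst_closed_form(x, lower=0.0):
    """Same integral in closed form: arsinh((x+u0)/s) - arsinh((lower+u0)/s)."""
    return math.asinh((x + U0) / S) - math.asinh((lower + U0) / S)


print(vst_numeric(1000.0), vst_closed_form(1000.0))
```

The two printed values should agree to many decimal places, since the integrand is smooth on the integration range.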

Page 13: Transformations…  what for? which one?

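the “glog” transformation

[Figure: two curves over intensity from -200 to 1000; dashed line: f(x) = log(x); solid line: h_s(x) = asinh(x/s)]

$\mathrm{arsinh}(x) = \log\left(x + \sqrt{x^2 + 1}\right)$

$\lim_{x \to \infty} \left(\mathrm{arsinh}\, x - \log x - \log 2\right) = 0$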

P. Munson, 2001

D. Rocke & B. Durbin, ISMB 2002
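The two properties of arsinh used for the “glog” (its logarithmic form, and its convergence to log x + log 2 for large x) are easy to verify numerically. The sketch below uses only the Python standard library; the function names are illustrative.

```python
import math


def glog(x, s=1.0):
    """Generalized log h_s(x) = arsinh(x / s); defined for zero and negative x."""
    return math.asinh(x / s)


def arsinh_via_log(x):
    """arsinh written as log(x + sqrt(x^2 + 1))."""
    return math.log(x + math.sqrt(x * x + 1.0))


def gap_to_log(x):
    """arsinh(x) - log(x) - log(2), which tends to 0 as x grows."""
    return math.asinh(x) - math.log(x) - math.log(2.0)


for x in (1.0, 10.0, 100.0, 1000.0):
    print(x, gap_to_log(x))

# Unlike log, the glog is finite at and below zero.
print(glog(0.0), glog(-200.0, s=50.0))
```

For large intensities the glog behaves like a shifted logarithm, so differences of glog values approximate log-ratios; near zero it is approximately linear.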

Page 14: Transformations…  what for? which one?

parameter estimation

$\mathrm{arsinh}\frac{Y_{ki} - a_i}{b_i} = \mu_k + \varepsilon_{ki}, \quad \varepsilon_{ki} \sim N(0, c^2)$

o maximum likelihood estimator: straightforward – but sensitive to deviations from normality

o model holds for genes that are unchanged; differentially transcribed genes act as outliers.

o robust variant of ML estimator, à la Least Trimmed Sum of Squares regression.

o works as long as <50% of genes are differentially transcribed

measured intensity = offset + gain * true abundance

$y_{ik} = a_{ik} + b_{ik} x_{ik}$

$a_{ik} = a_i + L_{ik} + \varepsilon_{ik}$
  $a_i$: per-sample offset
  $L_{ik}$: local background provided by image analysis
  $\varepsilon_{ik} \sim N(0, b_i^2 s_1^2)$: “additive noise”

$b_{ik} = b_i b_k \exp(\eta_{ik})$
  $b_i$: per-sample normalization factor
  $b_k$: sequence-wise labeling efficiency
  $\eta_{ik} \sim N(0, s_2^2)$: “multiplicative noise”

Page 15: Transformations…  what for? which one?

Least trimmed sum of squares regression

[Figure: plot of y against x (both axes 0 to 8) with two fits: least sum of squares and least trimmed sum of squares]

minimize $\sum_{i=1}^{n/2} \left( y_{(i)} - f(x_{(i)}) \right)^2$

P. Rousseeuw, 1980s
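The idea of least trimmed sum of squares can be illustrated for a straight-line fit: only the n/2 smallest squared residuals enter the objective, so up to half of the points may be outliers. The sketch below is not the estimator used for the arsinh model; it is a toy version that starts from every pair of points and refines each start by repeatedly refitting on the best-fitting half, which is feasible only for small n. The data are made up for illustration.

```python
from itertools import combinations


def ols_line(points):
    """Ordinary least-squares line y = a + b*x; returns (a, b) or None if x is constant."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    if sxx == 0:
        return None
    b = sum((x - mx) * (y - my) for x, y in points) / sxx
    return my - b * mx, b


def trimmed_objective(points, a, b, h):
    """Sum of the h smallest squared residuals."""
    squared = sorted((y - (a + b * x)) ** 2 for x, y in points)
    return sum(squared[:h])


def lts_line(points, n_steps=10):
    """Toy least-trimmed-squares line fit with h = n // 2 retained points."""
    h = len(points) // 2
    best = None
    for pair in combinations(points, 2):
        fit = ols_line(list(pair))
        if fit is None:
            continue
        for _ in range(n_steps):
            a, b = fit
            # Keep the h points that the current line fits best, then refit.
            kept = sorted(points, key=lambda p: (p[1] - (a + b * p[0])) ** 2)[:h]
            refit = ols_line(kept)
            if refit is None:
                break
            fit = refit
        a, b = fit
        objective = trimmed_objective(points, a, b, h)
        if best is None or objective < best[0]:
            best = (objective, a, b)
    return best[1], best[2]


# Made-up data: 14 points on y = 1 + 2x and 6 outliers at y = 0.
points = [(float(x), 1.0 + 2.0 * x) for x in range(14)]
points += [(float(x), 0.0) for x in range(14, 20)]

print("least squares:        ", ols_line(points))
print("least trimmed squares:", lts_line(points))
```

On these data the ordinary least-squares line is pulled toward the outliers, while the trimmed fit recovers the line through the majority of points. This mirrors the role of differentially transcribed genes as outliers in the parameter estimation.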

Page 16: Transformations…  what for? which one?

evaluation: effects of different data transformations

[Figure: difference red-green plotted against rank(average)]

Page 17: Transformations…  what for? which one?

Normality: QQ-plot

Page 18: Transformations…  what for? which one?

evaluation: sensitivity / specificity in detecting differential abundance


o Data: paired tumor/normal tissue from 19 kidney cancers, in color flip duplicates on 38 cDNA slides with 4000 genes each.

o 6 different strategies for normalization and quantification of differential abundance

o Calculate for each gene & each method: t-statistics, permutation-p

o For threshold α, compare the number of genes the different methods find, #{p_i | p_i ≤ α}
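The per-gene computation (a t-statistic, a permutation p-value, then counting genes below a threshold) might look like the following sketch. The slides do not specify the permutation scheme; sign-flipping of paired differences is assumed here as one common choice for paired tumor/normal designs, and the example values are made up. All 2^n sign patterns are enumerated, which is practical only for small n.

```python
import math
from itertools import product


def t_statistic(diffs):
    """One-sample t-statistic of paired differences."""
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    if var == 0.0:
        return math.copysign(math.inf, mean) if mean != 0.0 else 0.0
    return mean / math.sqrt(var / n)


def sign_flip_p_value(diffs):
    """Two-sided permutation p-value over all sign patterns (assumed scheme)."""
    t_obs = abs(t_statistic(diffs))
    count = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        t = abs(t_statistic([s * d for s, d in zip(signs, diffs)]))
        total += 1
        if t >= t_obs - 1e-12:  # small tolerance for floating-point ties
            count += 1
    return count / total


def n_called(p_values, alpha):
    """Number of genes with p_i <= alpha."""
    return sum(1 for p in p_values if p <= alpha)


# Made-up paired differences (e.g. tumor minus normal, on a transformed scale).
gene_changed = [1.2, 0.8, 1.5, 0.9, 1.1, 1.3, 0.7, 1.0]
gene_unchanged = [0.3, -0.5, 0.1, -0.2, 0.4, -0.1, 0.2, -0.3]

p_values = [sign_flip_p_value(gene_changed), sign_flip_p_value(gene_unchanged)]
print(p_values, n_called(p_values, alpha=0.05))
```

Running the same counting step on p-values obtained from differently transformed data is what allows the methods to be compared at a common threshold.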

Page 19: Transformations…  what for? which one?

evaluation: comparison of methods

more accurate quantification of differential expression → higher sensitivity / specificity

[Figure: two panels: one-sided test for up, one-sided test for down]

Page 20: Transformations…  what for? which one?

evaluation: a benchmark for Affymetrix genechip expression measures

o Data:
  Spike-in series: from Affymetrix, 59 x HGU95A, 16 genes, 14 concentrations, complex background
  Dilution series: from GeneLogic, 60 x HGU95Av2, liver & CNS cRNA in different proportions and amounts

o Benchmark: 15 quality measures regarding
  - reproducibility
  - sensitivity
  - specificity
  Put together by Rafael Irizarry (Johns Hopkins), http://affycomp.biostat.jhsph.edu

Page 21: Transformations…  what for? which one?

affycomp results (28 Sep 2003)

[Figure: affycomp results; scale labels: good, bad]

Page 22: Transformations…  what for? which one?

ROC curves

Page 23: Transformations…  what for? which one?

Stratification

collaboration with R. Irizarry

$\log Y = \log x + \sum_{i=1}^{25} w_i(s_i)$

[Figure: $w_i$ plotted against position $i$]

position- and sequence-specific effects $w_i(s)$: Naef et al., Phys Rev E 68 (2003)

Page 24: Transformations…  what for? which one?

glog versus "sliding z-score"

sliding z-score

Page 25: Transformations…  what for? which one?

Availability

o implementation in R

o open source package vsn on www.bioconductor.org

o Bioconductor is an international collaboration on open source software for bioinformatics and statistical omics

Page 26: Transformations…  what for? which one?

What to do with the gene lists: the functional genomics pipeline @ DKFZ

High-throughput transcriptome sequencing: clones with unannotated full length ORFs

functional characterization

Neoplastic diseases

association of mRNA profiles with
- genetic aberrations
- histopathology
- clinical behavior

Page 27: Transformations…  what for? which one?

HT functional assays (S. Wiemann, D. Arlt)

Library of "unknown" transcripts

expression clone

GFP-ORF-protein

BrdU incorporation

DAPI: identification
CFP: expression
BrdU: proliferation

automated microscope (Rainer Pepperkok, EMBL)

Image segmentation and quantification

proliferation: + activator, - inhibitor

Page 28: Transformations…  what for? which one?

DAPI ORF-YFP Anti-BrdU/Cy5

Detection of modulators of cell proliferation

overlay

YFP – Cy5

Dorit Arlt

YFP channel   Cy5 channel
      72.0         761.0
      71.0         684.1
     119.7         779.0
      87.3         820.2
     149.5         645.6
      70.2         536.1
      84.7         799.5
     103.1         912.8
      81.0         916.7
    2621.8         267.6
      74.1         766.2
     156.8         866.6
     169.0         819.8
     105.5         757.7
     156.0         367.8
      76.5         746.2
     135.2         731.2
      86.2         567.3
      77.7         896.3
      92.6        1095.4
     104.6         633.3
     481.2         567.7
     539.0         663.9
      95.0         726.2
     156.7         842.1

Measurement of fluorescence intensities by automated image analysis

68.5, 231.6, 80.9, -4.8

SMPCell

Page 29: Transformations…  what for? which one?

Statistical analysis of cellular assay data

[Figure: distributions of the brdU signal (x-axis: brdU, 0 to 250; y-axis: 0.000 to 0.012) for transfected cells and control cells]

detect transfection effect: inh (p = 10^-8)

Plate summary plot

[Figure: plate dorit6, rows A to H, columns 1 to 12; scale from -6 to 6, labelled activation / inhibition]

Page 30: Transformations…  what for? which one?

Cellular assays: challenges for statisticians

o Image analysis:

pattern recognition, classification

o Low-level analysis

what are good models for calibration, “normalization”, data transformation?

o High-level analysis

models for the dependence of cellular processes on over-/underexpression of genes

connect results from different assays, microarray data

Page 31: Transformations…  what for? which one?

Summary

o log-ratio: what about genes that are not expressed in some of the conditions of interest?

o generalized log-ratio h: a useful extrapolation
  - interpretability
  - sensitivity
  - specificity
  - computational convenience

o what to do with the gene lists? systematic (high throughput) functional assays

Page 32: Transformations…  what for? which one?

Acknowledgements

DKFZ Heidelberg, Molecular Genome Analysis:
Annemarie Poustka, Holger Sültmann, Andreas Buneß, Markus Ruschhaupt, Katharina Finis, Jörg Schneider, Klaus Steiner, Stefan Wiemann, Dorit Arlt

MPI Molekulare Genetik:
Anja von Heydebreck, Martin Vingron

Uni Heidelberg:
Günther Sawitzki

DFCI Harvard:
Robert Gentleman

UMC Leiden:
Judith Boer

RZPD:
Anke Schroth, Bernd Korn

EMBL:
Urban Liebel

...and many more!

Page 33: Transformations…  what for? which one?

Models are never correct, but some are useful

True relationship:

1 2 22

(0, 0.15 )y x x N

Model: linear dependence

Model: quadratic dependence

Page 34: Transformations…  what for? which one?

raw scale log glog

difference

log-ratio

generalized

log-ratio

variance: constant part, proportional part

variance stabilization

Page 35: Transformations…  what for? which one?

ratio compression

Yue et al. (Incyte Genomics), NAR (2001) 29, e41