Transformations… what for? which one?

Transformations…

what for?which one?

Wolfgang HuberDiv.Molecular Genome

AnalysisDKFZ Heidelberg

( ,...)log

( ,...)

x f xx f x

Log-ratio with/without background correction

Shrunken log-ratio (BHM)

Variance stabilized log-ratio(=generalized log-ratio, “glog”)

Microarray intensities x1,…,xn

How do you like to think about (interprete) it?

How do you estimate the parameters?

What comes out (the “bottom line”?)

ratios and fold changes

Fold changes are useful to describe continuous changes in expression

10001500

But what if the gene is “off” (below detection limit) in one condition?

Many interesting genes will be off in some of the conditions of interest

1.If you want expression measure (“net normalized spot intensity”) to be an unbiased estimator of abundance

many values 0 need something more than

(log)ratio

2. If you let expression measure be biased (always>0)

can keep ratios. how do you choose the bias?

Ratios are scale-free:

But there is (at least) one absolute scale in the data:

Can we use this to construct useful functions

( , ) / log( / )i j i j i jf y y y y or y y

( | ( ) 0)i ibg sd Y E Y

( , , ) ?i j bgf y y

How to compare microarray intensities with each other?

How to incorporate measurement uncertainty (“variance”)?

How to simultaneously and consistently deal with calibration (“normalization”)?

In the following:

Sources of variationamount of RNA in the biopsy efficiencies of-RNA extraction-reverse transcription -labeling-fluorescent detection

probe purity and length distributionspotting efficiency, spot sizecross-/unspecific hybridizationstray signal

Calibration Error model

Systematic o similar effect on many measurementso corrections can be estimated from data

Stochastico too random to be ex-plicitely accounted for o remain as “noise”

iik ika a

ai per-sample offset

ik ~ N(0, bi2s1

“additive noise”

bi per-sample normalization factor

bk sequence-wise probe efficiency

ik ~ N(0,s22)

“multiplicative noise”

exp( )iik k ikb b b

ik ik ik ky a b x

modeling ansatz

measured intensity = offset + gain true abundance

The two-component model

raw scale log scale

“additive” noise

“multiplicative” noise

B. Durbin, D. Rocke, JCB 2001

variance stabilizing transformations

Xu a family of random variables with

EXu=u, VarXu=v(u). Define

var f(Xu ) independent of u

f x duu

derivation: linear approximation

0 20000 40000 60000

raw scale

variance stabilizing transformations

f x duu

1.) constant variance (‘additive’)

2( ) sv u f u

2.) constant CV (‘multiplicative’)

2( ) logv u u f u

4.) additive and multiplicative

2 2 00( ) ( ) arsinh

u uv u u u s f

3.) offset2

0 0( ) ( ) log( )v u u u f u u

the “glog” transformation

intensity-200 0 200 400 600 800 1000

- - - f(x) = log(x)

——— hs(x) = asinh(x/s)

2arsinh( ) log 1

arsinh log log2 0limx

P. Munson, 2001

D. Rocke & B. Durbin, ISMB 2002

parameter estimationparameter estimation

2Yarsinh , (0, )iki

k ki kii

o maximum likelihood estimator: straightforward – but sensitive to deviations from normality

o model holds for genes that are unchanged; differentially transcribed genes act as outliers.

o robust variant of ML estimator, à la Least Trimmed Sum of Squares regression.

o works as long as <50% of genes are differentially transcribed

ii k i k i ka a L a i p e r - s a m p l e o ff s e t

L i k l o c a l b a c k g r o u n d p r o v i d e d b y i m a g e a n a l y s i s

i k ~ N ( 0 , b i2 s 1

“ a d d i t i v e n o i s e ”

b i p e r - s a m p l en o r m a l i z a t i o n f a c t o r

b k s e q u e n c e - w i s el a b e l i n g e ffi c i e n c y

i k ~ N ( 0 , s 22 )

“ m u l t i p l i c a t i v e n o i s e ”

e x p ( )ii k k i kb b b

i k i k i k i ky a b x

m e a s u r e d i n t e n s i t y = o ff s e t + g a i n * t r u e a b u n d a n c e

Least trimmed sum of squares regression

0 2 4 6 8

y 2n/2

( ) ( )i=1

( )i iy f x

minimize

- least sum of squares - least trimmed sum of squares

P. Rousseeuw, 1980s

evaluation: effects of different data transformations

rank(average)

Normality: QQ-plot

evaluation: sensitivity / specificity in detecting differential abundance

o Data: paired tumor/normal tissue from 19 kidney cancers, in color flip duplicates on 38 cDNA slides à 4000 genes.

o 6 different strategies for normalization and quantification of differential abundance

o Calculate for each gene & each method: t-statistics, permutation-p

o For threshold , compare the number of genes the different methods find, #{pi | pi}

evaluation: comparison of methodsevaluation: comparison of methods

more accurate quantification of differential expression higher sensitivity / specificity

one-sided test for up one-sided test for down

evaluation: a benchmark for Affymetrix genechip expression measures

o Data: Spike-in series: from Affymetrix 59 x HGU95A, 16 genes, 14 concentrations, complex backgroundDilution series: from GeneLogic 60 x HGU95Av2,liver & CNS cRNA in different proportions and amounts

o Benchmark: 15 quality measures regarding-reproducibility-sensitivity -specificity Put together by Rafael Irizarry (Johns Hopkins) http://affycomp.biostat.jhsph.edu

affycomp results (28 Sep 2003) good

ROC curves

Stratification

collaboration with R. Irizarry

log log ( )i ii

Y x w s

position- and sequence-specific effects wi(s):Naef et al., Phys Rev E 68 (2003)

glog versus "sliding z-score"

sliding z-score

Availability

o implementation in Ro open source package

vsn on www.bioconductor.org

o Bioconductor is an international collaboration on open source software for bioinformatics and statistical omics

What to do with the gene lists: the functional genomics pipeline @

High-throughput transcripto

me sequencing: clones with unannotated full length

functional characterizati

Neoplastic diseases

association of mRNA profiles with- genetic aberrations- histopathology- clinical behavior

HT functional assays (S. Wiemann, D. Arlt)

Library of "unknow

n" transcrip

ts expression cloneBrdU

incorporation

GFP-ORF- protein

DAPI: identification CFP: expression

BrdU: proliferation

Image segmentation

and quantification

proliferation

+activator -inhibitor

automated

microscopeRainer Pepperkok, EMBL

DAPI ORF-YFP Anti-BrdU/Cy5

Detection of modulators of cell proliferation

overlay

YFP – Cy5

Dorit Arlt

YFP channel Cy5 channel 72.0 761.0 71.0 684.1 119.7 779.0 87.3 820.2 149.5 645.6 70.2 536.1 84.7 799.5 103.1 912.8 81.0 916.7 2621.8 267.6 74.1 766.2 156.8 866.6 169.0 819.8 105.5 757.7 156.0 367.8 76.5 746.2 135.2 731.2 86.2 567.3 77.7 896.3 92.6 1095.4 104.6 633.3 481.2 567.7 539.0 663.9 95.0 726.2 156.7 842.1

Measurement of fluorescence intensities

231.6,

by automated image analysis

SMPCell

Statistical analysis of cellular assay data

0 50 100 150 200 250

transfected cells

control cells

1 2 3 4 5 6 7 8 9 10 11 12

dorit6

detect transfection

effect:

inh (p=10-8)

Plate summary plot

Cellular assays: challenges for statisticians

o Image analysis:

pattern recognition, classification

o Low-level analysis

what are good models for calibration, “normalization”, data transformation?

o High-level analysis

models for the dependence of cellular processes on over-/underexpression of genes

connect results from different assays, microarray data

Summaryo log-ratio log: what about genes that are not expressed in some of

the conditions of interest?

o generalized log-ratio h: a useful extrapolation- interpretability- sensitivity- specificity- computational convenience

o what to do with the gene lists? systematic (high throughput) functional assays

Acknowledgements

DKFZ HeidelbergMolecular Genome

Analysis

Annemarie PoustkaHolger SültmannAndreas BuneßMarkus RuschhauptKatharina FinisJörg SchneiderKlaus Steiner

Stefan WiemannDorit Arlt

MPI Molekulare GenetikAnja von HeydebreckMartin Vingron

Uni HeidelbergGünther Sawitzki

DFCI HarvardRobert Gentleman

UMC LeidenJudith Boer

RZPDAnke Schroth Bernd Korn

EMBLUrban Liebel

...and many more!

Models are never correct, but some are useful

True relationship:

1 2 22

(0, 0.15 )y x x N

Model: linear dependence

Model: quadratic dependence

raw scale log glog

difference

log-ratio

generalized

log-ratio

constant partvariance:

proportional part

variance stabilization

ratio compression

Yue et al.,

(Incyte Genomics

) NAR (2001) 29

Transformations… what for? which one?

Documents

Warm-Up: List the types of transformations and what each one means

A butterfly? What transformations can you see?. Skateboarding? What transformations can you see?

1 Student Objective To identify energy transformations To identify energy transformations Energy transfer is when the energy is passes on. *What is energy

10 Properties of Transformations - Glencoeglencoe.com/.../ls1_c3_explore_properties_of_transformations.pdf · Properties of Transformations ... 1. Describe what you ... 7. MAKE A

Colorado Teacher-Authored Instructional Unit Sample ... · PDF fileUnit Title: Energy Transformations in Living ... Energy Transformations in Living Things ... • What might be the

Computer Graphics 3D Transformations. 3D Translation Remembering 2D transformations -> 3x3 matrices, take a wild guess what happens to 3D transformations

1. Les transformations physiques Les transformations chimiques Les transformations nucléaires Les transformations physiques Les transformations chimiques

Introduction to Bidirectional Transformations · 2018. 4. 22. · book entail two unidirectional transformations: a parser, which reads the tex-tual representation and constructs

Today’s Lesson: What: transformations (dilations)... Why: To perform dilations of figures on the coordinate plane. What: transformations (dilations)

Embedded librarianship - What Trends and Transformations Have We Seen

Transformations - Center For Teaching & Learningcontent.njctl.org/courses/math/geometry-2015-16/transformations/... · Geometry – Transformations ~1~ NJCTL.org Transformations Transformations:

Lesson 12: What Are Similarity Transformations, and Why Do ... · GEOMETRY NYS COMMON CORE MATHEMATICS CURRICULUM Lesson 12 Lesson 12: What Are Similarity Transformations, and Why

WHICH TRANSFORMATIONS DO YOU KNOW? ROTATION WHICH TRANSFORMATIONS DO YOU KNOW? ROTATION

mrlamarmath.weebly.com€¦ · Web viewUnit 1. 8th Grade Math. UNIT 1. Transformations, Congruence, and Similarity. Unit Essential Questions. What are the transformations? How do

Transformations Worksheet Name: Date...Dec 06, 2016 · 21. Three transformations will be performed on triangle ABC. Which set of transformations will always produce a congruent triangle?

TRANSFORMATIONS AND THE COORDINATE€¦ · In this chapter, we will review what you know about the coordinate plane and you will study transformationsand symmetry, which are common

1. Les transformations de la matière. Les 3 types de transformations Les transformations physiques Les transformations chimiques Les transformations nucléaires

Aim: What are transformations?...Oct 15, 2019 · Geometry Transformation Intro.notebook 1 Oct 910:00 AM Aim: What are transformations? Do Now: Write down a list of commands for Ms

PHASE TRANSFORMATIONS in METALS and ALLOYS · Phase Transformation in Metals Development of microstructure in both single- and two-phase alloys involves phase transformations-which

Projective Transformations. Geometric Transformations