Robust estimation of the relationship between DNA copy number and gene expression

Pierre Neuvial

Laboratoire Statistique et Génome, Université d'Évry Val d'Essonne, UMR CNRS 8071 – USC INRA

Joint work with Antoine Chambaz and Mark van der Laan

June 8, 2011

Outline

1 Association between DNA copy number and gene expression

2 Targeted maximum likelihood estimation of association

3 Results

Association between DNA copy number and gene expression

Characteristics of tumor cells

Hanahan & Weinberg (2000)

self-sufficiency in growth factors

insensitivity to anti-growth signals

evasion of apoptosis

angiogenesis

limitless replication potential

tissue invasion and metastases

Enabled by genetic instability of tumor cells

Changes in cancer cells at the molecular level

Different levels of biological information

DNA copy number

gene expression

DNA methylation

Quantitative measurements can be obtained from DNA microarrays

Goal: find genes that drive tumorigenesis

to better understand cancer cells

to help find new treatments

What gene-level data look like

187 GBM (brain cancer) samples from The Cancer Genome Atlas (TCGA)

[Figure: pairwise scatter plots of DNA methylation, DNA copy number, and gene expression for the gene EGFR. Panel correlations, in reading order: cor=−0.57, cor=−0.54, cor=0.87.]

Which genes are drivers?

“Driver genes” are expected to show some association between DNA copy number and gene expression

⇒ Test for association, and quantify it

Methods for genome-wide scanning for gene-level associations

linear correlations

differential expression (t-tests) between copy number states

canonical correlation analyses

Issues with existing methods

they essentially identify genes that were already known to be involved

associations may be nonlinear

DNA methylation may down-regulate gene expression
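As a point of comparison, the first two baseline methods listed above can be sketched for a single gene as follows. This is a minimal illustration on made-up data, not the analysis from the talk: the simulated values, the 0.5 threshold that defines the "gain" state, and the use of Welch's variant of the t-test are all assumptions.

```python
import math
import random
import statistics


def pearson(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def welch_t(a, b):
    """Welch t statistic comparing the means of two groups."""
    va, vb = statistics.variance(a), statistics.variance(b)
    diff = statistics.fmean(a) - statistics.fmean(b)
    return diff / math.sqrt(va / len(a) + vb / len(b))


# Hypothetical data for one gene: copy number x (0 = copy neutral), expression y.
rng = random.Random(0)
x = [rng.choice([0.0, 0.0, 1.0]) + rng.gauss(0.0, 0.1) for _ in range(200)]
y = [0.8 * xi + rng.gauss(0.0, 0.5) for xi in x]

# Hypothetical threshold splitting samples into "neutral" and "gain" states.
gain = [yi for xi, yi in zip(x, y) if xi > 0.5]
neutral = [yi for xi, yi in zip(x, y) if xi <= 0.5]

r = pearson(x, y)
t = welch_t(gain, neutral)
print(f"correlation = {r:.2f}, Welch t (gain vs neutral) = {t:.2f}")
```

Neither statistic uses methylation, and both summarize the association through a linear or two-group contrast, which is where the issues listed above come from.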

Defining “gene-level data”

In the preceding plot:

DNA methylation (W): proportion of “methylated” signal at a CpG locus in the gene’s promoter region.

DNA copy number (X): smoothed normalized total copy number relative to a set of reference samples.

Expression (Y): “unified” gene expression level across 3 platforms.

Targeted maximum likelihood estimation of association

Definition of a parameter of interest

Observation $O = (W, X, Y) \sim P \in \mathcal{M}$ for a given gene:

$W$: DNA methylation

$X$: DNA copy number; $X = 0$: copy neutral state (2 copies)

$Y$: gene expression

$\mathcal{M}$: non-parametric set of all possible data-generating distributions of $O$

Parameter of interest (defined for all $P \in \mathcal{M}$):

\[
\Psi(P) = \arg\min_{\beta \in \mathbb{R}} E_P\left[ \left( E_P(Y \mid X, W) - E_P(Y \mid X = 0, W) - \beta X \right)^2 \right]
\]

In a semi-parametric model where $E_P(Y \mid X, W) = E_P(Y \mid X = 0, W) + \beta X$, we have $\Psi(P) = \beta$.

By contrast, $\Psi : \mathcal{M} \to \mathbb{R}$ is defined universally

$\Psi(P)$ is a non-parametric variable importance measure of the “effect” of $X$ (continuous) on $Y$ (continuous) accounting for $W$
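Because the criterion is quadratic in β, its minimizer has a closed form: Ψ(P) = E_P[X (E_P(Y|X,W) − E_P(Y|X=0,W))] / E_P[X²], provided E_P[X²] > 0. The sketch below evaluates the empirical version of that ratio for a known conditional mean. The function `psi_plugin`, the two example conditional means, and the simulated W and X are hypothetical and only illustrate the definition; they are not the estimator presented in the talk.

```python
import math
import random


def psi_plugin(theta, xs, ws):
    """Empirical Psi: sum_i X_i * (theta(X_i, W_i) - theta(0, W_i)) / sum_i X_i^2."""
    num = sum(x * (theta(x, w) - theta(0.0, w)) for x, w in zip(xs, ws))
    den = sum(x * x for x in xs)
    return num / den


rng = random.Random(1)
ws = [rng.uniform(0.0, 0.1) for _ in range(500)]  # hypothetical methylation values
xs = [rng.gauss(0.0, 1.0) for _ in range(500)]    # hypothetical copy numbers, 0 = neutral

beta = 0.7


def theta_linear(x, w):
    # Semi-parametric case: E(Y | X, W) = E(Y | X = 0, W) + beta * X
    return -5.0 * w + beta * x


def theta_nonlinear(x, w):
    # A conditional mean that is not linear in X
    return -5.0 * w + math.tanh(2.0 * x)


print(psi_plugin(theta_linear, xs, ws))     # equals beta up to floating-point rounding
print(psi_plugin(theta_nonlinear, xs, ws))  # slope of the best linear summary of the tanh effect
```

In the linear case the ratio returns β whatever the distribution of X, which matches the statement that Ψ(P) = β in the semi-parametric model; in the nonlinear case Ψ is still well defined.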

Comment on the parameter of interest

Let $\theta(P)(X, W) = E_P(Y \mid X, W)$, then

\[
\Psi(P) = \operatorname{corr}(X, r_P(X, W)) \sqrt{\frac{E_P[r_P(X, W)^2]}{E_P[X^2]}},
\]

where $r_P(X, W) = \theta(P)(X, W) - \theta(P)(0, W)$

Case where X is binary

If $X \in \{0, 1\}$, then

\[
\Psi(P) = E_P\left[ (\theta(P)(1, W) - \theta(P)(0, W)) \, h(W) \right]
\]

with weight $h(W) = P(X = 1 \mid W) / P(X = 1)$
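The binary case follows from the definition of Ψ in a few lines. The derivation below is a worked check, not taken from the slides; it only uses the first-order condition of the least-squares criterion and the facts that $X^2 = X$ and $r_P(0, W) = 0$ when $X \in \{0, 1\}$.

```latex
\begin{align*}
\Psi(P)
  &= \frac{E_P[X \, r_P(X, W)]}{E_P[X^2]}
  && \text{(first-order condition in } \beta\text{)} \\
  &= \frac{E_P[X \, (\theta(P)(1, W) - \theta(P)(0, W))]}{E_P[X]}
  && (X^2 = X,\ r_P(0, W) = 0) \\
  &= \frac{E_P[P(X = 1 \mid W) \, (\theta(P)(1, W) - \theta(P)(0, W))]}{P(X = 1)}
  && \text{(conditioning on } W\text{)} \\
  &= E_P\left[ (\theta(P)(1, W) - \theta(P)(0, W)) \, h(W) \right].
\end{align*}
```

The first line also shows that the correlation in the general formula has to be read as the non-centered one, $E_P[X r_P] / \sqrt{E_P[X^2] \, E_P[r_P^2]}$, for the identity to hold exactly.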

Targeted maximum likelihood methods: motivation

Goal: estimate a parameter $\Psi(P)$ from observations arising from a distribution $P$. $\Psi$ is known.

Naive strategy

1 Estimate $P$ using $\hat{P}$

2 Plug-in: $\Psi(\hat{P})$

Our target parameter is $\Psi(P)$, not $P$!

$\hat{P}$ aims at balancing bias and variance for the whole distribution

$\Psi(\hat{P})$ does not balance bias and variance for $\Psi(P)$

Targeted maximum likelihood estimation (TMLE)

From an initial estimate $P_n^0$:

1 Create a model $P_n^0(\varepsilon)$, parametrized by $\varepsilon \in \mathbb{R}$, whose score is the efficient influence curve of $\Psi$ at $P_n^0$

2 Estimate $\varepsilon$ using maximum likelihood: $\varepsilon_n^0$

3 Update accordingly: $P_n^1 = P_n^0(\varepsilon_n^0)$

Repeat as many times as necessary, hence the final estimate $P_n^*$
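The three steps can be made concrete on a simpler, textbook parameter. The sketch below is an assumption-laden illustration and not the procedure for the continuous-exposure parameter Ψ of this talk: it targets the average effect E[E(Y|A=1,W) − E(Y|A=0,W)] of a binary exposure A, uses a linear fluctuation fitted by least squares (maximum likelihood under a Gaussian working model), and runs on simulated data. All names (`q0`, `g_hat`, `clever`, `fluctuate`, `plug_in`) are hypothetical.

```python
import random
from statistics import fmean

rng = random.Random(2)
n = 4000
W = [1 if rng.random() < 0.5 else 0 for _ in range(n)]
A = [1 if rng.random() < (0.7 if w else 0.3) else 0 for w in W]
Y = [1.0 * a + 2.0 * w + rng.gauss(0.0, 1.0) for a, w in zip(A, W)]  # true effect: 1.0

# Initial estimate of E(Y | A, W): deliberately poor (a constant).
y_bar = fmean(Y)


def q0(a, w):
    return y_bar


# Estimate of P(A = 1 | W): empirical proportion within each level of W.
g_hat = {w: fmean([a for a, wi in zip(A, W) if wi == w]) for w in (0, 1)}


def clever(a, w):
    """Covariate that makes the score of the fluctuation match the influence curve."""
    g = g_hat[w]
    return a / g - (1 - a) / (1 - g)


def fluctuate(q):
    """Steps 1-3: build q + eps * clever, fit eps by least squares, return the update."""
    h = [clever(a, w) for a, w in zip(A, W)]
    resid = [y - q(a, w) for y, a, w in zip(Y, A, W)]
    eps = sum(hi * ri for hi, ri in zip(h, resid)) / sum(hi * hi for hi in h)
    return eps, (lambda a, w: q(a, w) + eps * clever(a, w))


def plug_in(q):
    """Plug-in value of the target parameter for a given fit q."""
    return fmean([q(1, w) - q(0, w) for w in W])


eps0, q1 = fluctuate(q0)
eps1, _ = fluctuate(q1)  # repeating the step changes nothing once eps is ~0

print("initial plug-in:    ", plug_in(q0))
print("targeted plug-in:   ", plug_in(q1))
print("second-step epsilon:", eps1)
```

With a linear fluctuation the second step returns an ε that is zero up to rounding, so one iteration suffices here; other fluctuation models or parameters may need several iterations, which is the "repeat as many times as necessary" of the slide.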

Statistical properties

$P_0$: true distribution of $O$

Consistency (double robustness)

TMLE is consistent if one of the following conditions holds:

$\theta(P_n^*)(0, \cdot)$ consistently estimates the true $\theta(P_0)(0, \cdot)$

$E_{P_n^*}(X \mid W)$ and $P_n^*(X = 0 \mid W)$ consistently estimate $E_{P_0}(X \mid W)$ and $P_0(X = 0 \mid W)$

Asymptotic normality

Under the same conditions, TMLE is asymptotically Gaussian

We can compute asymptotic p-values and thus rank genes
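Turning an asymptotically Gaussian estimate into a p-value and a ranking can be sketched as below. This assumes a two-sided test of the null hypothesis Ψ = 0 and an available standard error per gene; the gene names and numbers are invented for the example and are not results from the talk.

```python
import math


def two_sided_p(estimate, std_error):
    """Two-sided p-value for H0: Psi = 0 under a Gaussian approximation."""
    z = estimate / std_error
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for the standard normal cdf Phi
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical (estimate, standard error) pairs, one per gene.
genes = {"geneA": (0.80, 0.10), "geneB": (0.05, 0.10), "geneC": (-0.30, 0.12)}

pvals = {name: two_sided_p(est, se) for name, (est, se) in genes.items()}
ranking = sorted(pvals, key=pvals.get)  # smallest p-value first
print(ranking)
```

In a genome-wide scan these p-values would normally be adjusted for multiple testing before genes are called significant; the ranking itself is unaffected by monotone adjustments.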

Results

Simulation strategy

Assumptions:

up to 3 copy number classes: normal regions, and regions of copy number gains and losses

in normal regions, expression is negatively correlated with methylation

in regions of copy number alteration, copy number and expression are positively correlated

GBM data used as a baseline for simulation:

Sample name     Methylation   Copy number   Expression
TCGA-02-0001    0.05          2.72          -0.46
TCGA-02-0003    0.01          9.36          1.25
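One hypothetical way to encode the three assumptions above is sketched here. It is not the simulation scheme used in the talk, which starts from the GBM data: the class weights, effect sizes, noise levels, and the coding of copy number as 0 in normal regions are all illustrative choices.

```python
import math
import random


def corr(a, b):
    """Sample Pearson correlation."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


def simulate(n, rng):
    """Draw n hypothetical (methylation w, copy number x, expression y) triples."""
    data = []
    for _ in range(n):
        w = rng.uniform(0.0, 0.1)  # methylation proportion
        state = rng.choices(["loss", "normal", "gain"], weights=[0.2, 0.5, 0.3])[0]
        if state == "normal":
            x = 0.0                              # copy neutral
            y = -10.0 * w + rng.gauss(0.0, 0.3)  # expression decreases with methylation
        else:
            sign = 1.0 if state == "gain" else -1.0
            x = sign * rng.uniform(0.5, 2.0)
            y = 0.8 * x + rng.gauss(0.0, 0.3)    # expression increases with copy number
        data.append((w, x, y))
    return data


data = simulate(500, random.Random(3))
normal = [(w, y) for w, x, y in data if x == 0.0]
altered = [(x, y) for w, x, y in data if x != 0.0]
print("normal regions, cor(W, Y): ", corr(*zip(*normal)))
print("altered regions, cor(X, Y):", corr(*zip(*altered)))
```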

Simulated data set mimics real data set

[Figure: two pairwise scatter-plot matrices of DNA methylation, DNA copy number, and gene expression. Left: real data (GBM, n=187), gene EGFR; panel correlations, in reading order: cor=−0.57, cor=−0.54, cor=0.87. Right: simulated data (n=200); panel correlations, in reading order: −0.66, −0.70, 0.80.]

Simulated data: TMLE corrects initial estimation

Real data analysis: TCGA OV data set

[Figure: pairwise scatter plots of DNA methylation, DNA copy number, and gene expression for the gene STAT5A. Panel correlations, in reading order: cor=0.051, cor=−0.46, cor=0.23. Panel label: pcor=0.286.]

Thanks

Antoine Chambaz

Mark van der Laan

Terry Speed

The Cancer Genome Atlas Research Network
