Analysis of DNA Methylation and Gene Expression Data in Placenta Tissue to Predict Childhood Obesity
An Integrative Approach

Bhatnagar SR 1,2, Houde A 4,5, Voisin G 2, Bouchard L 4,5, Greenwood CMT 1,2,3

1 Department of Epidemiology, Biostatistics and Occupational Health, McGill University
2 Lady Davis Institute, Jewish General Hospital, Montréal, QC
3 Departments of Oncology and Human Genetics, McGill University
4 Department of Biochemistry, Université de Sherbrooke, QC
5 ECOGENE-21 and Lipid Clinic, Chicoutimi Hospital, QC

sahirbhatnagar.com/talks
Poster Session B, # 56


Motivating Question # 1

sahirbhatnagar.com Data Integration CHSGM 2015 2 / 25

Motivation

- 1 in 4 adult Canadians and 1 in 10 children are clinically obese.

- Events during pregnancy are suspected to play a role in childhood obesity, but the mechanisms involved are not known.

- Children born to women who had a pregnancy affected by gestational diabetes mellitus are more likely to be overweight and obese.

- Evidence suggests that epigenetic factors are an important piece of the puzzle.

Motivating Question # 2

[Figure: schematic of sample size (25 vs. 50) against genomic data type (Gene Expression, DNA Methylation), with the choice between the designs marked "?".]


The Data

Timeline: at birth → age 5

X (at birth): Placenta, n = 45
  Expression: HT-12 v4, p = 46,889
  Methylation: Illumina 450k, p = 375,561
  Gestational Diabetes: n = 45, GD = 29

Y (age 5): Child, n = 23, GD = 16
  7 Fat Measures

X → Y ?


Summarizing Expression, Methylation and Gestational Diabetes Phenotype in Placenta Tissue

Sparse Canonical Correlation Analysis (sCCA)

- CCA requires calculation of $(X^T X)^{-1}$ and $(Y^T Y)^{-1}$

- When $p + q \gg n$, these matrices are singular

- sCCA applies an L1 penalty to the canonical vectors to obtain sparse solutions (Witten et al., 2009; Parkhomenko et al., 2009)

- Assumes $X^T X = I_p$, $Y^T Y = I_q$

$$
\begin{aligned}
\underset{u,\,v}{\text{maximize}} \quad & u^T X^T Y v \\
\text{subject to} \quad & \|u\|_2^2 \le 1,\ \|v\|_2^2 \le 1 \\
\text{and} \quad & P_1(u) \le \lambda_1,\ P_2(v) \le \lambda_2
\end{aligned}
$$
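The optimization on this slide can be sketched as an alternating soft-thresholding iteration, in the spirit of the penalized matrix decomposition of Witten et al. (2009). This is a minimal illustration and not the code used for the talk. The function name `sparse_cca_rank1` and the parameters `frac_u` and `frac_v` are hypothetical, and as a simplifying assumption the L1 bounds $\lambda_1, \lambda_2$ are replaced by a threshold set to a fraction of the largest entry at each step. The data are simulated.

```python
import numpy as np


def soft_threshold(a, delta):
    """Elementwise soft-thresholding, the proximal operator of the L1 penalty."""
    return np.sign(a) * np.maximum(np.abs(a) - delta, 0.0)


def unit(a):
    """Scale a vector to unit L2 norm (a zero vector is returned unchanged)."""
    norm = np.linalg.norm(a)
    return a / norm if norm > 0 else a


def sparse_cca_rank1(X, Y, frac_u=0.5, frac_v=0.5, n_iter=200, tol=1e-10, seed=0):
    """First pair of sparse canonical vectors (u, v) for u^T X^T Y v.

    Assumption: thresholds are frac * max|entry| instead of explicit L1 bounds.
    X^T Y is never formed explicitly, which matters when p and q are large.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    rng = np.random.default_rng(seed)
    v = unit(rng.standard_normal(Y.shape[1]))
    # Warm start: a few unpenalized power iterations on (X^T Y)^T (X^T Y).
    for _ in range(20):
        v = unit(Y.T @ (X @ (X.T @ (Y @ v))))
    u = np.zeros(X.shape[1])
    for _ in range(n_iter):
        a = X.T @ (Y @ v)
        u_new = unit(soft_threshold(a, frac_u * np.max(np.abs(a))))
        b = Y.T @ (X @ u_new)
        v_new = unit(soft_threshold(b, frac_v * np.max(np.abs(b))))
        converged = np.linalg.norm(u_new - u) + np.linalg.norm(v_new - v) < tol
        u, v = u_new, v_new
        if converged:
            break
    return u, v


# Simulated example: the first 3 columns of X and of Y share one latent factor.
rng = np.random.default_rng(1)
n, p, q = 40, 30, 20
z = rng.standard_normal(n)
X = rng.standard_normal((n, p))
Y = rng.standard_normal((n, q))
X[:, :3] += 3.0 * z[:, None]
Y[:, :3] += 3.0 * z[:, None]
u, v = sparse_cca_rank1(X, Y)
print("non-zero entries of u:", np.flatnonzero(u))
print("non-zero entries of v:", np.flatnonzero(v))
```

Larger values of `frac_u` and `frac_v` give sparser vectors; in practice the amount of sparsity would be tuned, for example by cross-validation.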

Supervised Sparse CCA

Main idea:

1. The features that are most associated with the outcome Q are identified to form the reduced matrices X and Y

2. sCCA is performed on the reduced X and Y
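Step 1 of the main idea can be sketched as a univariate screen. The slides do not state which association measure was used, so the Welch t statistic against a binary outcome below is an assumption, and the names `welch_t` and `screen_features` are hypothetical. The group sizes mirror the study (29 and 16), but the data are simulated.

```python
import numpy as np


def welch_t(M, labels):
    """Welch two-sample t statistic for every column of M (rows are samples)."""
    g1 = M[labels == 1]
    g0 = M[labels == 0]
    diff = g1.mean(axis=0) - g0.mean(axis=0)
    se = np.sqrt(g1.var(axis=0, ddof=1) / len(g1) + g0.var(axis=0, ddof=1) / len(g0))
    return diff / se


def screen_features(M, labels, k):
    """Indices (sorted) of the k columns most associated with the outcome."""
    strength = np.abs(welch_t(M, labels))
    top = np.argsort(strength)[::-1][:k]
    return np.sort(top)


# Simulated example: 45 samples, 200 features, the first 5 shifted in one group.
rng = np.random.default_rng(0)
labels = np.array([1] * 29 + [0] * 16)
M = rng.standard_normal((45, 200))
M[labels == 1, :5] += 3.0
keep = screen_features(M, labels, k=5)
M_reduced = M[:, keep]  # step 2 would run sCCA on the reduced matrices
print(keep, M_reduced.shape)
```

The same screen would be applied separately to the expression and methylation matrices before running sCCA on the two reduced matrices.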

Importance of Gestational Diabetes Phenotype

[Figure: two panels showing the correlation (scale 0.88 to 0.98) as a function of the number of non-zero expression probes and the number of non-zero methylation probes. Panel titles: "Gestational Diabetes Status Used in Sparse CCA" and "Gestational Diabetes Status Not Used".]

GO Stat Analysis for Enrichment

- Enrichment analysis based on the non-zero vector of the 1st component from the supervised sCCA analysis

- Genes associated with inflammatory processes

Table: Top list of enriched GO terms

GOBPID   FDR       OR   E.Count  Count  Size  Term
0002376  < 10^-14  2.1  131.6    227    2178  immune system process
0006955  < 10^-13  2.3  78.7     153    1303  immune response
0002252  < 10^-9   2.7  34.1     80     565   immune effector process
0045087  < 10^-8   2.3  49.0     99     811   innate immune response
0002682  < 10^-8   2.1  66.56    122    1102  regulation of immune system process
0002684  < 10^-8   2.4  40.1     84     664   positive regulation of immune system process
0006952  < 10^-8   1.9  84.5     144    1399  defense response
0050776  < 10^-8   2.3  44.5     90     738   regulation of immune response
0050778  < 10^-7   2.6  28.5     65     473   positive regulation of immune response
0006950  < 10^-7   1.6  196.8    271    3258  response to stress
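The columns of the table can be read through a standard over-representation test. The sketch below assumes a plain hypergeometric test; the exact test variant behind the table is not stated on the slide, so the OR and FDR columns may have been computed differently. The function name `enrichment` and the toy counts are hypothetical. Under this reading, E.Count is (selected genes × Size) / universe, which agrees with the table: E.Count / Size is about 0.06 in every row.

```python
from math import comb


def enrichment(universe, term_size, selected, overlap):
    """Expected count, 2x2 odds ratio and upper-tail hypergeometric p-value.

    universe  : number of genes tested
    term_size : genes annotated to the GO term ("Size")
    selected  : genes in the selected list
    overlap   : selected genes annotated to the term ("Count")
    """
    expected = selected * term_size / universe
    upper = min(term_size, selected)
    tail = sum(
        comb(term_size, i) * comb(universe - term_size, selected - i)
        for i in range(overlap, upper + 1)
    )
    p_value = tail / comb(universe, selected)
    # 2x2 table: selected/not selected by annotated/not annotated.
    # The ratio is undefined if b or c is zero; this toy example avoids that.
    a = overlap
    b = selected - overlap
    c = term_size - overlap
    d = universe - term_size - selected + overlap
    odds_ratio = (a * d) / (b * c)
    return expected, odds_ratio, p_value


# Toy numbers, not taken from the study.
expected, odds_ratio, p_value = enrichment(universe=100, term_size=10, selected=20, overlap=5)
print(expected, odds_ratio, p_value)
```

With many GO terms tested, the p-values would then be adjusted for multiple testing, which is what the FDR column reports.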

Summarizing Bodyfat Measures


Cluster 6 Bodyfat measures in 2 groups

[Figure: heat map of the bodyfat measures (Zscore BMI, percent fat, subscapularis, bicep, tricep, iliacus) for the 23 children, with the children ordered by clustering. Colour key: value from −2 to 2.]

Circle of Correlations

[Figure: variables factor map (PCA) showing the correlation circle for Zscore BMI, percent fat, tricep, bicep, subscapularis and iliacus. Axes: Dim 1 (50.68%) and Dim 2 (15.41%), each from −1.0 to 1.0.]
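The quantities shown in a circle of correlations can be sketched with a singular value decomposition of the standardized measures. This is an illustration on simulated data, not the analysis behind the figure; the function name `pca_summary` is hypothetical. The coordinates of each variable on the circle are its correlations with the first two principal components, and the axis percentages are the explained-variance ratios.

```python
import numpy as np


def pca_summary(M):
    """PCA of the columns of M after standardization.

    Returns the component scores, the proportion of variance explained by
    each component, and the variable-by-component correlation matrix.
    """
    n = M.shape[0]
    Z = (M - M.mean(axis=0)) / M.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U * s                          # one row per sample
    explained = s**2 / np.sum(s**2)
    var_cor = Vt.T * s / np.sqrt(n - 1)     # cor(variable j, component k)
    return scores, explained, var_cor


# Simulated example: 23 children, 6 measures driven by one latent factor.
rng = np.random.default_rng(0)
latent = rng.standard_normal(23)
M = latent[:, None] + 0.5 * rng.standard_normal((23, 6))
scores, explained, var_cor = pca_summary(M)
first_pc = scores[:, 0]                     # one-number summary per child
print(np.round(100 * explained[:2], 2))
print(np.round(var_cor[:, :2], 2))
```

The first column of `scores` is the kind of single summary measure that can then serve as a continuous outcome.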

Combining Both Data


Regression via Elastic Net

[Figure: the data schematic repeated: placenta expression (HT-12 v4, p = 46,889) and methylation (Illumina 450k, p = 375,561) at birth (X, n = 45), and the 7 fat measures of the child at age 5 (Y, n = 23).]
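Elastic net regression with leave-one-out cross-validation (LOO CV), the error measure reported on the following slides, can be sketched as below. This is a small coordinate-descent illustration on simulated data, not the software used for the talk. The function names, the penalty parametrization (`lam`, `alpha`) and the chosen values are assumptions.

```python
import numpy as np


def elastic_net(X, y, lam, alpha=0.5, n_iter=200, tol=1e-8):
    """Coordinate descent for the objective
    (1/2n)||y - b0 - Xb||^2 + lam * (alpha*||b||_1 + (1-alpha)/2 * ||b||_2^2)."""
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean          # centring absorbs the intercept
    col_sq = (Xc**2).sum(axis=0) / n
    beta = np.zeros(p)
    resid = yc.copy()                        # residual for the current beta
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue                     # constant column stays at zero
            rho = Xc[:, j] @ resid / n + col_sq[j] * beta[j]
            new = np.sign(rho) * max(abs(rho) - lam * alpha, 0.0)
            new = new / (col_sq[j] + lam * (1.0 - alpha))
            if new != beta[j]:
                resid -= Xc[:, j] * (new - beta[j])
                max_change = max(max_change, abs(new - beta[j]))
                beta[j] = new
        if max_change < tol:
            break
    return y_mean - x_mean @ beta, beta


def loo_cv_mse(X, y, lam, alpha=0.5):
    """Leave-one-out cross-validated mean squared prediction error."""
    n = len(y)
    errors = []
    for i in range(n):
        train = np.arange(n) != i
        b0, b = elastic_net(X[train], y[train], lam, alpha)
        errors.append((y[i] - (b0 + X[i] @ b)) ** 2)
    return float(np.mean(errors))


# Simulated example with more features than samples (n = 23, p = 30).
rng = np.random.default_rng(0)
n, p = 23, 30
X = rng.standard_normal((n, p))
y = 2.0 * X[:, 0] + 0.1 * rng.standard_normal(n)
mse = loo_cv_mse(X, y, lam=0.2)
# Reference point: predict each held-out value by the mean of the others.
baseline = float(np.mean([(y[i] - np.delete(y, i).mean()) ** 2 for i in range(n)]))
print(mse, baseline)
```

Any feature filtering or sCCA step that uses the outcome would have to be repeated inside each left-out fold for the error estimate to remain honest.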

1st PC as Summary Bodyfat Measure

[Figure: bar chart of LOO CV mean squared error by the data used to predict the 1st PC of bodyfat measures. Legend (data.type): Canonical Variables; Expr+Methy non 0 CCA factors; Expr non 0 CCA factors; Methy non 0 CCA factors; Expr+Methy Filter; Expr Filter low means; Methy Filter low var; Expr+Methy Filter low+t.test; Expr Filter low+t.test; Methy Filter low+t.test; Expr+Methy Filter t.test; Expr Filter t.test; Methy Filter t.test.]

Ward Clustering Groups

[Figure: bar chart of LOO CV misclassification error (axis from 0.0 to 0.5) by the data used to predict the Ward clustering groups. Legend (data.type): Canonical Variables; Expr+Methy non 0 CCA factors; Expr non 0 CCA factors; Methy non 0 CCA factors; Expr+Methy Filter; Expr Filter low means; Methy Filter low var; Expr+Methy Filter low+t.test; Expr Filter low+t.test; Methy Filter low+t.test; Expr+Methy Filter t.test; Expr Filter t.test; Methy Filter t.test.]

Neuropeptide Y Receptor (NPY1R)

From OMIM:

- Neuropeptide Y is one of the most abundant neuropeptides in the mammalian nervous system

- It exhibits a diverse range of important physiologic activities, including effects on food intake

- Its receptors have been identified in a variety of tissues, including placenta (Herzog et al., 1992).

Motivating Question #2: My Answer

[Figure: schematic of sample size (25 vs. 50) against genomic data type (Gene Expression, DNA Methylation).]

Big Data

Data Integration

Machine Learning

Small n Data

Acknowledgements

- Celia Greenwood and Mathieu Blanchette

- Greg Voisin, Andree-Anne Houde, Luigi Bouchard

- All the mothers and children who took part in this study

- You

References

Principal component analysis plots and beamer template. URL http://gastonsanchez.com/.

Elena Parkhomenko, David Tritchler, and Joseph Beyene. Sparse canonical correlation analysis with application to genomic data integration. Statistical Applications in Genetics and Molecular Biology, 8(1):1–34, 2009.

Daniela M Witten and Robert J Tibshirani. Extensions of sparse canonical correlation analysis with applications to genomic data. Statistical Applications in Genetics and Molecular Biology, 8(1):1–27, 2009.

Daniela M Witten, Robert Tibshirani, and Trevor Hastie. A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics, page kxp008, 2009.