Analysis of DNA Methylation and Gene Expression Data in Placenta Tissue to Predict Childhood Obesity
An Integrative Approach
Bhatnagar SR (1,2), Houde A (4,5), Voisin G (2), Bouchard L (4,5), Greenwood CMT (1,2,3)
(1) Department of Epidemiology, Biostatistics and Occupational Health, McGill University
(2) Lady Davis Institute, Jewish General Hospital, Montreal, QC
(3) Departments of Oncology and Human Genetics, McGill University
(4) Department of Biochemistry, Universite de Sherbrooke, QC
(5) ECOGENE-21 and Lipid Clinic, Chicoutimi Hospital, QC
sahirbhatnagar.com/talks
Poster Session B, #56
Motivation
- 1 in 4 adult Canadians and 1 in 10 children are clinically obese.
- Events during pregnancy are suspected to play a role in childhood obesity, but the mechanisms involved are not known.
- Children born to women who had a gestational diabetes mellitus-affected pregnancy are more likely to be overweight and obese.
- Evidence suggests epigenetic factors are an important piece of the puzzle.
sahirbhatnagar.com Data Integration CHSGM 2015 3 / 25
Motivating Question
[Figure: schematic relating sample size (25 vs. 50) to genomic data type (Gene Expression, DNA Methylation), with question marks on the combinations.]
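The Data
[Diagram: timeline from "at birth" to "age 5".]
- Placenta (n = 45), at birth, labelled X:
  - Expression: HT-12 v4, p = 46,889
  - Methylation: Illumina 450k, p = 375,561
- Gestational diabetes: n = 45, GD = 29
- Child (n = 23, GD = 16), at age 5, labelled Y:
  - 7 fat measures
- The diagram marks the link from X to Y with a question mark.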
Summarizing Expression, Methylation and Gestational Diabetes Phenotype in Placenta Tissue
Sparse Canonical Correlation Analysis (sCCA)
- CCA requires calculation of $(X^T X)^{-1}$ and $(Y^T Y)^{-1}$
- When $p + q \gg n$, these matrices are singular
- sCCA applies an L1 penalty to the canonical vectors to obtain sparse solutions (Witten et al., 2009; Parkhomenko et al., 2009)
- Assumes $X^T X = I_p$, $Y^T Y = I_q$

$$
\underset{u,\,v}{\text{maximize}} \;\; u^T X^T Y v
\quad \text{subject to} \quad
\|u\|_2^2 \le 1,\;\; \|v\|_2^2 \le 1,\;\; P_1(u) \le \lambda_1,\;\; P_2(v) \le \lambda_2
$$
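The optimization above can be illustrated with a minimal Python sketch. It finds a single pair of sparse canonical vectors by alternating soft-thresholding, in the spirit of the penalized matrix decomposition (Witten et al., 2009). This is not the code behind the talk: the function names are hypothetical, and the sparsity level is set by a fixed soft-threshold expressed as a fraction of the largest entry (`frac_u`, `frac_v`, both assumed to lie in [0, 1)), a simplification of the constrained form with bounds λ1 and λ2.

```python
import numpy as np

def soft_threshold(a, delta):
    """Shrink each entry of a toward zero by delta (the L1 proximal step)."""
    return np.sign(a) * np.maximum(np.abs(a) - delta, 0.0)

def sparse_direction(a, frac):
    """Soft-threshold a at frac * max|a|, then rescale to unit L2 norm.

    Assumes a is not all zeros and 0 <= frac < 1, so at least one entry survives.
    """
    shrunk = soft_threshold(a, frac * np.max(np.abs(a)))
    return shrunk / np.linalg.norm(shrunk)

def sparse_cca_rank1(X, Y, frac_u=0.5, frac_v=0.5, n_iter=100, tol=1e-8):
    """One pair of sparse canonical vectors (u, v) for X (n x p) and Y (n x q).

    Hypothetical helper: treats the within-set covariances as identity, so
    only the cross-product matrix X^T Y enters the updates.
    """
    # Standardize columns (assumes no constant columns).
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    Ys = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    # p x q cross-product; for very large p and q one would avoid forming it
    # and compute Xs.T @ (Ys @ v) instead.
    K = Xs.T @ Ys
    # Start from the leading right singular vector of K.
    v = np.linalg.svd(K, full_matrices=False)[2][0]
    for _ in range(n_iter):
        u = sparse_direction(K @ v, frac_u)        # update u with v fixed
        v_new = sparse_direction(K.T @ u, frac_v)  # update v with u fixed
        converged = np.linalg.norm(v_new - v) < tol
        v = v_new
        if converged:
            break
    return u, v
```

The non-zero entries of `u` and `v` identify the selected expression and methylation probes; larger values of `frac_u` and `frac_v` give sparser vectors.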
Supervised Sparse CCA
Main idea:
1. The features that are most associated with the outcome Q are identified to form the reduced matrices X and Y
2. sCCA is performed on X and Y
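Step 1 can be sketched as a univariate screen. The slides do not state which association measure was used, so the sketch below assumes a binary outcome Q (such as gestational diabetes status coded 0/1) and ranks features by the absolute Welch t-statistic. The function names and the `n_keep` parameter are hypothetical.

```python
import numpy as np

def welch_t(M, labels):
    """Welch two-sample t-statistic for every column of M (samples x features).

    labels is a 0/1 array; assumes each group has at least two samples and
    no column is constant within both groups.
    """
    a, b = M[labels == 1], M[labels == 0]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return (a.mean(axis=0) - b.mean(axis=0)) / se

def screen_features(M, labels, n_keep):
    """Indices (sorted) of the n_keep columns most associated with the outcome."""
    score = np.abs(welch_t(M, labels))
    top = np.argsort(score)[::-1][:n_keep]
    return np.sort(top)
```

Applying `screen_features` separately to the expression and methylation matrices gives the reduced X and Y; step 2 then runs sparse CCA on those reduced matrices.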
Importance of Gestational Diabetes Phenotype
[Figure: two panels showing correlation (scale 0.88 to 0.98) as a function of the number of non-zero expression probes and the number of non-zero methylation probes. Panel titles: "Gestational Diabetes Status Used in Sparse CCA" and "Gestational Diabetes Status Not Used".]
GO Stat Analysis for Enrichment
- Enrichment analysis based on the non-zero vector of the 1st component from the supervised sCCA analysis
- Genes associated with inflammatory processes

Table: Top list of enriched GO terms

GOBPID   FDR       OR   E.Count  Count  Size  Term
0002376  < 10^-14  2.1  131.6    227    2178  immune system process
0006955  < 10^-13  2.3  78.7     153    1303  immune response
0002252  < 10^-9   2.7  34.1     80     565   immune effector process
0045087  < 10^-8   2.3  49.0     99     811   innate immune response
0002682  < 10^-8   2.1  66.56    122    1102  regulation of immune system process
0002684  < 10^-8   2.4  40.1     84     664   positive regulation of immune system process
0006952  < 10^-8   1.9  84.5     144    1399  defense response
0050776  < 10^-8   2.3  44.5     90     738   regulation of immune response
0050778  < 10^-7   2.6  28.5     65     473   positive regulation of immune response
0006950  < 10^-7   1.6  196.8    271    3258  response to stress
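The E.Count, OR and significance columns can be illustrated with a hypergeometric over-representation calculation, sketched below. The slides do not report the gene universe size or the number of selected genes, so the numbers in the usage example are hypothetical, and the odds ratio here is the plain sample odds ratio of the 2x2 table, which may differ from the estimate the original software reports. In the table itself, E.Count divided by Size is close to 0.06 in every row, consistent with an expected count of the form Size × (selected genes / universe).

```python
from math import comb

def enrichment(universe, selected, term_size, observed):
    """Over-representation of one GO term among the selected genes.

    universe  : number of annotated genes considered (hypothetical input)
    selected  : number of selected genes, e.g. genes with non-zero weights
    term_size : number of universe genes annotated to the term ("Size")
    observed  : number of selected genes annotated to the term ("Count")
    Assumes 0 <= observed < selected and observed < term_size.
    """
    # Expected count under no enrichment ("E.Count").
    expected = selected * term_size / universe
    # Sample odds ratio from the 2x2 table (selected or not) x (in term or not).
    a = observed
    b = selected - observed
    c = term_size - observed
    d = universe - selected - c
    odds_ratio = (a * d) / (b * c)
    # Upper-tail hypergeometric probability P(X >= observed).
    upper = min(selected, term_size)
    tail = sum(
        comb(term_size, k) * comb(universe - term_size, selected - k)
        for k in range(observed, upper + 1)
    )
    p_value = tail / comb(universe, selected)
    return expected, odds_ratio, p_value

# Hypothetical example: 60 of 1000 genes selected, term of size 100, 15 hits.
expected, odds_ratio, p_value = enrichment(1000, 60, 100, 15)
```

In practice the per-term p-values would be adjusted for multiple testing across all GO terms to obtain the FDR column.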
Cluster 6 Bodyfat measures in 2 groups
[Figure: heatmap of six bodyfat measures (Zscore BMI, percent fat, subscapularis, bicep, tricep, iliacus) across 23 children, in the sample order 34 14 8 16 7 6 38 30 20 25 13 3 12 11 17 21 39 31 19 37 28 32 18. Color key: value from −2 to 2.]
Circle of Correlations
[Figure: variables factor map (PCA), both axes from −1.0 to 1.0. Dim 1 (50.68%) vs. Dim 2 (15.41%). Variables shown: Zscore BMI, percent fat, tricep, bicep, subscapularis, iliacus.]
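A circle of correlations plots, for each variable, its correlation with the first two principal components. The sketch below shows one way to obtain those coordinates, the percentage of variance per dimension, and the first-PC score that can serve as a single summary of the bodyfat measures. It assumes a samples-by-measures matrix `M` with no constant columns; the function name is hypothetical and this is not the code used for the slides.

```python
import numpy as np

def pca_summary(M):
    """PCA on standardized columns of M (n samples x p measures).

    Returns
    scores     : n x k matrix of principal component scores
    explained  : fraction of total variance carried by each component
    var_coords : p x k matrix of correlations between variables and components
                 (the coordinates drawn on a circle of correlations)
    """
    n = M.shape[0]
    Z = (M - M.mean(axis=0)) / M.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U * s
    explained = s**2 / np.sum(s**2)
    # corr(z_j, score_k) = s_k * v_kj / sqrt(n - 1) for standardized z_j.
    var_coords = Vt.T * s / np.sqrt(n - 1)
    return scores, explained, var_coords
```

The first column of `scores` is the first-PC summary of the measures, and the first two columns of `var_coords` are the x and y coordinates of each variable on the factor map.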
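Regression via Elastic Net
[Diagram repeated from "The Data": placenta expression (HT-12 v4, p = 46,889) and methylation (Illumina 450k, p = 375,561) data X, gestational diabetes status (n = 45, GD = 29), and the 7 child fat measures Y at age 5 (n = 23, GD = 16), with a question mark on the link from X to Y.]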
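The slides do not say which software or tuning values were used for the elastic net, so the following is only a minimal sketch of the technique: coordinate descent for the objective $\frac{1}{2n}\|y - \beta_0 - X\beta\|_2^2 + \lambda\left[\alpha\|\beta\|_1 + \frac{1-\alpha}{2}\|\beta\|_2^2\right]$, together with a leave-one-out cross-validation (LOO CV) estimate of the mean squared error. The parameterization, the function names, and the defaults for `lam`, `alpha` and `n_iter` are assumptions made for illustration.

```python
import numpy as np

def soft_threshold(z, gamma):
    """Scalar soft-thresholding operator."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)

def elastic_net_fit(X, y, lam, alpha=0.5, n_iter=50):
    """Elastic net by cyclic coordinate descent on standardized columns."""
    n, p = X.shape
    mu, sd = X.mean(axis=0), X.std(axis=0)
    sd[sd == 0] = 1.0                    # constant columns get a zero coefficient
    Xs = (X - mu) / sd
    intercept = y.mean()
    beta = np.zeros(p)
    resid = y - intercept                # residual for beta = 0
    for _ in range(n_iter):
        for j in range(p):
            resid += Xs[:, j] * beta[j]  # partial residual without feature j
            rho = Xs[:, j] @ resid / n
            beta[j] = soft_threshold(rho, lam * alpha) / (1.0 + lam * (1.0 - alpha))
            resid -= Xs[:, j] * beta[j]  # put the updated contribution back
    return {"mu": mu, "sd": sd, "intercept": intercept, "beta": beta}

def elastic_net_predict(model, X):
    """Predict on new rows using the training-set standardization."""
    return model["intercept"] + ((X - model["mu"]) / model["sd"]) @ model["beta"]

def loocv_mse(X, y, lam, alpha=0.5):
    """Leave-one-out CV mean squared error: refit without each sample in turn."""
    n = len(y)
    errors = []
    for i in range(n):
        keep = np.arange(n) != i
        model = elastic_net_fit(X[keep], y[keep], lam, alpha)
        pred = elastic_net_predict(model, X[i:i + 1])[0]
        errors.append((y[i] - pred) ** 2)
    return float(np.mean(errors))
```

With only 23 children, any feature filtering or tuning of `lam` and `alpha` would need to be repeated inside each left-out fold for the error estimate to be honest.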
1st PC as Summary Bodyfat Measure
[Figure: LOO CV mean squared error (y-axis) for each type of data used to predict the 1st PC of the bodyfat measures (x-axis).]
Annotations in plot order (upper / lower): 3 / 8, 32 / 14294, 102 / 1443, 187 / 12853, 124 / 375563, 36 / 37513, 81 / 338052, 197 / 75115, 30 / 7505, 188 / 67612, 196 / 84503, 37 / 9380, 202 / 75125
data.type legend:
- Canonical Variables
- Expr+Methy non 0 CCA factors
- Expr non 0 CCA factors
- Methy non 0 CCA factors
- Expr+Methy Filter
- Expr Filter low means
- Methy Filter low var
- Expr+Methy Filter low+t.test
- Expr Filter low+t.test
- Methy Filter low+t.test
- Expr+Methy Filter t.test
- Expr Filter t.test
- Methy Filter t.test
Ward Clustering Groups
[Figure: LOO CV misclassification error (y-axis, 0.0 to 0.5) for each type of data used to predict the Ward clustering groups (x-axis).]
Annotations in plot order (upper / lower): 1 / 8, 22 / 14294, 1 / 1443, 20 / 12853, 331 / 375563, 1 / 37513, 54 / 338052, 6 / 75115, 1 / 7505, 6 / 67612, 7 / 84503, 1 / 9380, 30 / 75125
data.type legend:
- Canonical Variables
- Expr+Methy non 0 CCA factors
- Expr non 0 CCA factors
- Methy non 0 CCA factors
- Expr+Methy Filter
- Expr Filter low means
- Methy Filter low var
- Expr+Methy Filter low+t.test
- Expr Filter low+t.test
- Methy Filter low+t.test
- Expr+Methy Filter t.test
- Expr Filter t.test
- Methy Filter t.test
Neuropeptide Y Receptor (NPY1R)
From OMIM:
- Neuropeptide Y (NPY) is one of the most abundant neuropeptides in the mammalian nervous system
- NPY exhibits a diverse range of important physiologic activities, including effects on food intake
- NPY receptors have been identified in a variety of tissues, including placenta (Herzog et al., 1992).
Motivating Question #2: My Answer
[Figure: schematic relating sample size (25 vs. 50) to genomic data type (Gene Expression, DNA Methylation), shown without question marks.]
Acknowledgements
- Celia Greenwood and Mathieu Blanchette
- Greg Voisin, Andree-Anne Houde, Luigi Bouchard
- All the mothers and children that took part in this study
- You
References
Principal component analysis plots and beamer template. URL http://gastonsanchez.com/.
Elena Parkhomenko, David Tritchler, and Joseph Beyene. Sparse canonical correlation analysis with application to genomic data integration. Statistical Applications in Genetics and Molecular Biology, 8(1):1–34, 2009.
Daniela M Witten and Robert J Tibshirani. Extensions of sparse canonical correlation analysis with applications to genomic data. Statistical Applications in Genetics and Molecular Biology, 8(1):1–27, 2009.
Daniela M Witten, Robert Tibshirani, and Trevor Hastie. A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics, page kxp008, 2009.