Associations to Quantitative Trait Network and Analysis of Asthma Data
Seyoung Kim and Eric P. Xing
{sssykim, epxing}@cs.cmu.edu
Machine Learning Dept.
Carnegie Mellon University
10/30/2009
Genome Informatics 2009 @ Cold Spring Harbor Lab
Association Analysis of Single Trait
A univariate phenotype, e.g., disease/control status or gene expression level
Causal SNP
Association Analysis of Quantitative Trait Network
Genetic Association for Asthma Clinical Traits
[Figure: SNPs on a DNA sequence associated with an asthma trait network containing subnetworks for lung physiology and quality of life]
Expression QTL Mapping
[Figure: SNPs on a DNA sequence associated with a gene correlation network with gene modules, measured in microarray experiments]
Motivation: Multiple-Trait Association
Traditional approach: analyze one phenotype at a time
Our approach: consider multiple related phenotypes jointly and incorporate the correlation structure among the phenotypes
Graph-guided fused lasso (Kim & Xing, PLoS Genetics, 2009)
Multivariate Regression for Single-Trait Association Analysis
[Figure: genotype matrix X (one row of SNP alleles per individual), trait vector y (allergy symptom, example value 2.1), and association strengths β, related by y = Xβ]
Association strengths are estimated by least squares:
$\hat{\beta} = \arg\min_{\beta} \, (y - X\beta)'(y - X\beta)$
Many non-zero associations: how to pick the threshold?
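As a minimal sketch of the least-squares estimate on this slide, the code below solves the normal equations for a toy problem in plain Python. The data, the allele-count coding of the SNPs, and the function name `least_squares` are illustrative assumptions, not taken from the talk.

```python
def least_squares(X, y):
    """Solve the normal equations (X'X) beta = X'y by Gauss-Jordan elimination."""
    n, p = len(X), len(X[0])
    # Augmented system [X'X | X'y].
    A = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)]
         + [sum(X[i][a] * y[i] for i in range(n))]
         for a in range(p)]
    for col in range(p):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [v - factor * w for v, w in zip(A[r], A[col])]
    return [A[j][p] / A[j][j] for j in range(p)]


# Hypothetical toy data: 6 individuals, 3 SNPs coded as minor-allele counts.
X = [[0, 1, 2], [1, 0, 1], [2, 1, 0], [0, 2, 1], [1, 1, 1], [2, 0, 2]]
# Only the first SNP drives the trait (effect 2); the rest is small noise.
y = [0.1, 2.0, 4.1, -0.1, 2.1, 3.9]

beta = least_squares(X, y)
print(beta)  # SNPs 2 and 3 get small but non-zero estimates
```

The irrelevant SNPs receive small non-zero coefficients, which is the thresholding problem the slide raises.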
Lasso for Reducing False Positives (Tibshirani, 1996)
[Figure: genotype matrix X, trait vector y (allergy symptom: allergy for cats), and sparse association strengths β]
$\hat{\beta} = \arg\min_{\beta} \, (y - X\beta)'(y - X\beta) + \lambda \sum_{j=1}^{J} |\beta_j|$
The second term is the lasso penalty for sparsity.
Many zero associations (sparse results), but what if there are multiple related traits?
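The lasso objective above can be minimized in several ways; the sketch below uses cyclic coordinate descent with soft-thresholding, which is one common choice and not necessarily the solver used in the talk. The toy data and function names are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (y - Xb)'(y - Xb) + lam * sum_j |b_j| by cyclic updates."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Inner product of SNP j with the residual that excludes SNP j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j))
                for i in range(n)
            )
            z = sum(X[i][j] ** 2 for i in range(n))
            # Threshold is lam / 2 because the loss has no 1/2 factor in front.
            beta[j] = soft_threshold(rho, lam / 2.0) / z
    return beta


# Hypothetical toy data: 6 individuals, 3 SNPs; only SNP 1 affects the trait.
X = [[0, 1, 2], [1, 0, 1], [2, 1, 0], [0, 2, 1], [1, 1, 1], [2, 0, 2]]
y = [0.1, 2.0, 4.1, -0.1, 2.1, 3.9]

beta = lasso_coordinate_descent(X, y, lam=1.0)
print(beta)  # approximately [1.96, 0.0, 0.0]
```

With `lam=1.0` the two irrelevant SNPs are set exactly to zero, at the cost of slightly shrinking the true effect.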
Multivariate Regression for Multiple-Trait Association Analysis
[Figure: genotype matrix X; five traits in two groups, allergy (allergy for cats, allergy for roaches, allergy in spring) and lung physiology (FEV, FEF), with example trait values (3.4, 1.5, 2.1, 0.9, 1.8); association strengths β]
$\hat{\beta} = \arg\min_{\beta} \, (y - X\beta)'(y - X\beta) + \lambda \sum_{j=1}^{J} |\beta_j|$
How to combine information across multiple traits to increase the power?
Multivariate Regression for Multiple-Trait Association Analysis
[Figure: genotype matrix X; five traits in two groups, allergy (allergy for cats, allergy for roaches, allergy in spring) and lung physiology (FEV, FEF), with example trait values (3.4, 1.5, 2.1, 0.9, 1.8); association strengths β]
$\hat{\beta} = \arg\min_{\beta} \, (y - X\beta)'(y - X\beta) + \lambda \sum_{j=1}^{J} |\beta_j| + \text{(graph-guided fusion penalty)}$
We introduce a graph-guided fusion penalty as a third term.
Fusion Penalty
[Figure: SNP j on a DNA sequence, associated with Trait k and Trait m, which are connected in the trait network]
Association strength between SNP j and Trait k: $\beta_{jk}$
Association strength between SNP j and Trait m: $\beta_{jm}$
Fusion penalty: $|\beta_{jk} - \beta_{jm}|$
If two traits are correlated (connected in the trait network), they are likely to share a similar association strength.
Graph-Constrained Fused Lasso
[Figure: overall effect of the fusion penalty on SNP-trait associations across the trait network]
The fusion effect propagates to the entire network
Association between SNPs and subnetworks of traits
Graph-Weighted Fused Lasso
[Figure: overall effect of the weighted fusion penalty across the trait network]
Subnetwork structure is embedded as densely connected nodes with large edge weights
Edges with small weights are effectively ignored
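To make the difference between the two variants concrete, here is a small sketch that evaluates the fusion penalty |β_jk − β_jm| summed over SNPs and over edges of the trait network, once with equal edge weights and once with per-edge weights. The coefficient values, the edge weights, and the function name are assumptions for illustration; the slides do not state which weight function is used.

```python
def fusion_penalty(beta, edges, weights=None):
    """Sum of w_km * |beta[j][k] - beta[j][m]| over SNPs j and edges (k, m).

    beta[j][k] is the association strength between SNP j and trait k.
    With weights=None every edge counts equally (graph-constrained form);
    otherwise weights[(k, m)] scales each edge (graph-weighted form).
    """
    total = 0.0
    for (k, m) in edges:
        w = 1.0 if weights is None else weights[(k, m)]
        total += w * sum(abs(row[k] - row[m]) for row in beta)
    return total


# Hypothetical example: 2 SNPs, 3 traits.
beta = [[1.0, 0.8, 0.0],
        [0.0, 0.0, 0.5]]
edges = [(0, 1), (1, 2)]
# Assumed weights: traits 0-1 strongly correlated, traits 1-2 weakly correlated.
weights = {(0, 1): 0.9, (1, 2): 0.1}

unweighted = fusion_penalty(beta, edges)         # 0.2 + 1.3 = 1.5
weighted = fusion_penalty(beta, edges, weights)  # 0.9 * 0.2 + 0.1 * 1.3 = 0.31
print(unweighted, weighted)
```

In the weighted form the large coefficient difference across the weak edge (1, 2) contributes little, which matches the slide's remark that edges with small weights are effectively ignored.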
Previous Works vs. Our Approach
PCA-based approach (Weller et al., 1996; Mangin et al., 1998)
  Previous approach: implicit representation of trait correlations; hard to interpret the derived traits
  Our approach: explicit representation of trait correlations
Extension of module network for eQTL study (Lee et al., 2009)
  Previous approach: average traits within each trait cluster; loss of information
  Our approach: original data for traits are used
Network-based approach (Chen et al., 2008; Emilsson et al., 2008)
  Previous approach: separate association analysis for each trait (no information sharing); single-trait associations are combined in light of trait network modules
  Our approach: joint association analysis of multiple traits
Asthma Dataset
543 severe asthma patients from the Severe Asthma Research Program (SARP)
Genotypes: 34 SNPs in the IL-4R gene (40kb region of chromosome 16)
  Missing genotypes imputed with PHASE (Li and Stephens, 2003)
Traits: 53 asthma-related clinical traits
  Quality of life: emotion, environment, activity, symptom
  Family history: number of siblings with allergy, does the father have asthma?
  Asthma symptoms: chest tightness, wheeziness
Asthma Trait Network
[Figure: trait (phenotype) correlation structure, with traits reordered according to hierarchical clustering results]
[Figure: trait network obtained by thresholding the correlations at 0.7, with subnetworks for quality of life, lung physiology, and asthma symptoms]
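The trait network on this slide is built by thresholding trait correlations at 0.7. A minimal sketch of that step follows; the use of Pearson correlation, thresholding on the absolute value, the trait names, and the data are all assumptions made for illustration.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def trait_network(traits, threshold=0.7):
    """Return {(k, m): r} for trait pairs whose |correlation| exceeds threshold."""
    names = sorted(traits)
    edges = {}
    for i, k in enumerate(names):
        for m in names[i + 1:]:
            r = pearson(traits[k], traits[m])
            if abs(r) > threshold:
                edges[(k, m)] = r
    return edges


# Hypothetical measurements for 5 patients.
traits = {
    "FEV1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "FEF": [1.1, 2.0, 2.9, 4.2, 5.1],
    "QoL": [3.0, 1.0, 4.0, 1.0, 5.0],
}

edges = trait_network(traits)
print(edges)  # only the FEF-FEV1 pair passes the 0.7 threshold
```

The retained correlations can double as edge weights for the graph-weighted variant.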
Results from Single-SNP/Trait Test
[Figure panels (SNPs by phenotypes): trait correlation matrix and trait network; single-marker single-trait test with permutation-test thresholds at α = 0.05 and α = 0.01; lasso; graph-constrained fused lasso; graph-weighted fused lasso]
Lung physiology-related traits I
• Baseline FEV1 predicted value: MPVLung
• Pre FEF 25-75 predicted value
• Average nitric oxide value: online
• Body Mass Index
• Postbronchodilation FEV1, liters: Spirometry
• Baseline FEV1 % predicted: Spirometry
• Baseline predrug FEV1, % predicted
• Baseline predrug FEV1, % predicted
Q551R SNP
• Codes for amino-acid changes in the intracellular signaling portion of the receptor
• Exon 12
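The single-marker panels are thresholded with a permutation test at α = 0.05 and α = 0.01. The slides do not say how that test was set up, so the sketch below shows only the generic idea for one SNP-trait pair: shuffle the trait values, recompute the test statistic, and count how often it is at least as extreme as the observed one. The statistic (absolute Pearson correlation), the data, and the function names are assumptions.

```python
import random
from math import sqrt


def abs_correlation(a, b):
    """Absolute Pearson correlation, used here as the association statistic."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return abs(cov / sqrt(var_a * var_b))


def permutation_p_value(genotype, trait, n_perm=999, seed=0):
    """Estimate a p-value by permuting trait values across individuals."""
    rng = random.Random(seed)
    observed = abs_correlation(genotype, trait)
    shuffled = list(trait)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs_correlation(genotype, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


# Hypothetical data: 16 individuals, trait strongly driven by the genotype.
genotype = [0] * 6 + [1] * 6 + [2] * 4
trait = [g + 0.05 * ((7 * i) % 5 - 2) for i, g in enumerate(genotype)]

p_value = permutation_p_value(genotype, trait)
print(p_value)
```

An association would be declared at level α when `p_value` falls below α. Applying this across many SNP-trait pairs would also need a multiple-testing adjustment, which this sketch leaves out.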
Comparison of GFlasso with Others
[Figure: same panels, trait list (lung physiology-related traits I), and Q551R SNP annotation as on the "Results from Single-SNP/Trait Test" slide, with two "?" annotations]
Software for Genome-Phenome Association
[Screenshot: SNP file and gene expression file as inputs; view of chromosome 12, positions 10600000 to 10700000, with gene module 1]
Future Work: Correlated Genome-Transcriptome-Phenome Association Analysis
Genome structure: linkage disequilibrium, population structure
Transcriptome structure: gene modules
Phenome structure: clinical traits
Methods:
• Bi-clustering
• GFlasso
• Tree lasso
• Population lasso
Three-way association!
Thanks!
Software is available at http://sailing.cs.cmu.edu/gflasso
Acknowledgements: Ross Curtis, Kyung-Ah Sohn, Sally Wenzel
Funding:
References
Tibshirani R (1996) Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B 58:267–288.
Weller J, Wiggans G, Vanraden P, Ron M (1996) Application of a canonical transformation to detection of quantitative trait loci with the aid of genetic markers in a multi-trait experiment. Theoretical and Applied Genetics 92:998–1002.
Mangin B, Thoquet B, Grimsley N (1998) Pleiotropic QTL analysis. Biometrics 54:89–99.
Chen Y, Zhu J, Lum P, Yang X, Pinto S, et al. (2008) Variations in DNA elucidate molecular networks that cause disease. Nature 452:429–435.
Lee SI, Dudley A, Drubin D, Silver P, Krogan N, et al. (2009) Learning a prior on regulatory potential from eQTL data. PLoS Genetics 5:e1000358.
Emilsson V, Thorleifsson G, Zhang B, Leonardson A, Zink F, et al. (2008) Genetics of gene expression and its effect on disease. Nature 452:423–428.
Multiple-Trait Association: Dependencies in Phenome
Traditional approach: a causal SNP is associated with a univariate phenotype, e.g., disease/control status or gene expression level
Association with phenome: SNPs are associated with a multivariate complex syndrome (e.g., asthma, with age at onset and history of eczema) or a genome-wide expression profile
Multiple-Trait Association: Graph-Constrained Fused Lasso
Step 1: Thresholded correlation graph of phenotypes
Step 2: Graph-constrained fused lasso (lasso penalty + graph-constrained fusion penalty)
Multiple-Trait Association: Graph-Weighted Fused Lasso
Step 1: Thresholded correlation graph of phenotypes with weights
Step 2: Graph-weighted fused lasso (lasso penalty + graph-constrained fusion penalty with weighted fusion)
Estimating Parameters (Association Strength)
Quadratic programming formulation
• Graph-constrained fused lasso
• Graph-weighted fused lasso
Many publicly available software packages for solving convex optimization problems can be used
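The equations for this slide were not recovered, so the following is a sketch in assumed notation rather than the authors' exact formulation. Take K traits with response vectors y_k and coefficient vectors β_k, the edge set E of the trait network, a second tuning parameter γ, and non-negative edge weights w_km (equal to 1 in the graph-constrained case). The first line is the penalized objective built from the lasso and fusion penalties named on the slides. The second is the standard reformulation of its absolute-value terms with auxiliary variables u and v, which gives a quadratic objective with linear constraints:

```latex
\begin{align*}
\min_{B}\quad & \sum_{k=1}^{K} (y_k - X\beta_k)'(y_k - X\beta_k)
  + \lambda \sum_{k=1}^{K}\sum_{j=1}^{J} |\beta_{jk}|
  + \gamma \sum_{(k,m)\in E} w_{km} \sum_{j=1}^{J} |\beta_{jk} - \beta_{jm}| \\[4pt]
\Longleftrightarrow\quad
\min_{B,\,u,\,v}\quad & \sum_{k=1}^{K} (y_k - X\beta_k)'(y_k - X\beta_k)
  + \lambda \sum_{k=1}^{K}\sum_{j=1}^{J} u_{jk}
  + \gamma \sum_{(k,m)\in E} w_{km} \sum_{j=1}^{J} v_{jkm} \\
\text{s.t.}\quad & -u_{jk} \le \beta_{jk} \le u_{jk}, \qquad
  -v_{jkm} \le \beta_{jk} - \beta_{jm} \le v_{jkm}.
\end{align*}
```

At the optimum each u_jk equals |β_jk| and each v_jkm equals |β_jk − β_jm|, provided λ, γ, and the weights are non-negative, so the two problems have the same minimizers in B.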
Simulation Results
50 SNPs taken from HapMap chromosome 7, CEU population
10 traits
[Figure panels (SNPs by phenotypes): trait correlation matrix; thresholded trait correlation network; true regression coefficients; single SNP-single trait test; ridge regression; lasso; graph-constrained fused lasso; graph-weighted fused lasso]
Results from Association
[Figure panels (SNPs by phenotypes): trait correlation matrix and trait network; single-marker single-trait test; lasso; graph-constrained fused lasso; graph-weighted fused lasso; with two "?" annotations]
Lung physiology-related traits II
• Percent difference in FEV1: Spirometry
• Post FEF 25-75 value
• Postbronchodilation FEV1, % pred: Spirometry
• Baseline FEV1, liters: Spirometry
• Baseline predrug FEV1, liters
• Maximum FEV1, liters: MPVLung
• Baseline predrug FEV1, liters
Linkage Disequilibrium Structure in IL-4R gene
[Figure: linkage disequilibrium among SNP Q551R, SNP rs3024660, and SNP rs3024622, with pairwise r² values of 0.64 and 0.07]
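For readers unfamiliar with the r² values quoted for these SNPs, the sketch below shows the usual definition of r² between two biallelic SNPs, computed from phased haplotypes. The haplotype counts are hypothetical and are not meant to reproduce the values on the slide.

```python
def ld_r_squared(haplotypes):
    """r^2 between two biallelic SNPs from phased haplotypes.

    Each haplotype is a pair (a, b) with alleles coded 0 or 1.
    r^2 = D^2 / (pA (1 - pA) pB (1 - pB)), where D = pAB - pA * pB.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


# Hypothetical sample of 10 haplotypes.
haplotypes = [(1, 1)] * 4 + [(1, 0)] + [(0, 1)] + [(0, 0)] * 4

r2 = ld_r_squared(haplotypes)
print(r2)  # D = 0.4 - 0.25 = 0.15, so r^2 = 0.0225 / 0.0625 = 0.36
```

Values near 1 mean the two SNPs carry nearly the same information, which makes it hard to tell which of them is causal.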
Computation Time
Conclusions
Summary
• Dependencies in phenome: the graph-guided fused lasso framework incorporates correlation information among traits to detect pleiotropic effects of genotypic variations.
• Analysis of the asthma dataset suggests the effectiveness of the method.
Future Work
• Dependencies in genome?: Poster Q06 (this evening)
• Dependencies in both genome and phenome
• Learn the trait correlation network and association strengths jointly
Availability: http://www.sailing.cs.cmu.edu/