59
Machine Learning Machine Learning Application I: Application I: Computational Genomics Computational Genomics Computational Genomics Computational Genomics Eric Xing Eric Xing Lecture 18, August 16, 2010 Eric Xing © Eric Xing @ CMU, 2006-2010 1

Lecture18 xing

Embed Size (px)

Citation preview

Page 1: Lecture18 xing

Machine LearningMachine Learninggg

Application I: Application I: Computational GenomicsComputational GenomicsComputational GenomicsComputational Genomics

Eric XingEric Xing

Lecture 18, August 16, 2010

Eric Xing © Eric Xing @ CMU, 2006-2010 1

ectu e 8, ugust 6, 0 0

Page 2: Lecture18 xing

Biological Data AnalysisBiological Data Analysis Dynamic, noisy, heterogeneous, high-dimensional datay y g g

“High-resolution” inference Parsimonious

S l bilit Scalability Stability Sample complexity Sample complexity Confidence bound

Eric Xing © Eric Xing @ CMU, 2006-2010 2

Page 3: Lecture18 xing

Genetic Basis of DiseasesGenetic Basis of Diseases

HealthyACTCGTACGTAGACCTAGCATTTACGCAATAATGCGA

ACTCGAACCTAGACCTAGCATTTACGCAATAATGCGAACTCGAACCTAGACCTAGCATTTACGCAATAATGCGA

TCTCGTACGTAGACGTAGCATTTACGCAATTATCCGA

ACTCGAACCTAGACCTAGCATTTACGCAATTATCCGA

ACTCGTACGTAGACGTAGCATAACGCAATAATGCGA

SickTCTCGTACCTAGACGTAGCATAACGCAATAATCCGA

ACTCGAACCTAGACCTAGCATAACGCAATTATCCGA

Eric Xing © Eric Xing @ CMU, 2006-2010 3

Single nucleotide polymorphism (SNP)

Causal (or "associated") SNP

Page 4: Lecture18 xing

Genetic Association MappingData

Genetic Association Mapping

A . . . . T . . G . . . . . C . . . . . . T . . . . . A . . G .

Genotype Phenotype Standard Approach

A . . . . A . . C . . . . . C . . . . . . T . . . . . A . . G .

T . . . . T . . G . . . . . G . . . . . . T . . . . . T . . C .

causal SNPcausal SNP

A . . . . A . . C . . . . . C . . . . . . T . . . . . T . . C .

A . . . . T . . G . . . . . G . . . . . . A . . . . . A . . G .

T T C G A A C h th tT . . . . T . . C . . . . . G . . . . . . A . . . . . A . . C .

A . . . . A . . C . . . . . C . . . . . . A . . . . . T . . C .

• Cancer: Dunning et al. 2009.• Diabetes: Dupuis et al 2010

a univariate a univariate phenotypephenotype::e.g., disease/controle.g., disease/control

Eric Xing © Eric Xing @ CMU, 2006-2010 4

Diabetes: Dupuis et al. 2010.• Atopic dermatitis: Esparza-Gordillo et al. 2009.• Arthritis: Suzuki et al. 2008

Page 5: Lecture18 xing

Genetic Basis of Complex DiseasesGenetic Basis of Complex Diseases

HealthyACTCGTACGTAGACCTAGCATTACGCAATAATGCGA

ACTCGAACCTAGACCTAGCATTACGCAATAATGCGAACTCGAACCTAGACCTAGCATTACGCAATAATGCGA

TCTCGTACGTAGACGTAGCATTACGCAATTATCCGA

ACTCGAACCTAGACCTAGCATTACGCAATTATCCGA

ACTCGTACGTAGACGTAGCATAACGCAATAATGCGA

CancerTCTCGTACCTAGACGTAGCATAACGCAATAATCCGA

ACTCGAACCTAGACCTAGCATAACGCAATTATCCGA

Eric Xing © Eric Xing @ CMU, 2006-2010 5

Causal SNPs

Page 6: Lecture18 xing

Genetic Basis of Complex DiseasesGenetic Basis of Complex Diseases

HealthyHealthyACTCGTACGTAGACCTAGCATTACGCAATAATGCGA

ACTCGAACCTAGACCTAGCATTACGCAATAATGCGA

TCTCGTACGTAGACGTAGCATTACGCAATTATCCGA Biological TCTCGTACGTAGACGTAGCATTACGCAATTATCCGA

ACTCGAACCTAGACCTAGCATTACGCAATTATCCGA

ACTCGTACGTAGACGTAGCATAACGCAATAATGCGA

gmechanism

CancerTCTCGTACCTAGACGTAGCATAACGCAATAATCCGA

ACTCGAACCTAGACCTAGCATAACGCAATTATCCGA

Eric Xing © Eric Xing @ CMU, 2006-2010 6

Causal SNPs

Page 7: Lecture18 xing

Genetic Basis of Complex DiseasesIntermediate Phenotype

Genetic Basis of Complex DiseasesAssociation to intermediate phenotypes

Healthy

Gene expressionphenotypes

HealthyACTCGTACGTAGACCTAGCATTACGCAATAATGCGA

ACTCGAACCTAGACCTAGCATTACGCAATAATGCGA

TCTCGTACGTAGACGTAGCATTACGCAATTATCCGATCTCGTACGTAGACGTAGCATTACGCAATTATCCGA

ACTCGAACCTAGACCTAGCATTACGCAATTATCCGA

ACTCGTACGTAGACGTAGCATAACGCAATAATGCGA

CancerTCTCGTACCTAGACGTAGCATAACGCAATAATCCGA

ACTCGAACCTAGACCTAGCATAACGCAATTATCCGA Clinical records

Eric Xing © Eric Xing @ CMU, 2006-2010 7

Causal SNPs

Page 8: Lecture18 xing

Structured AssociationStructured Association

Association with Phenome

ACGTTTTACTGTACAATTACGTTTTACTGTACAATT

Traditional Approachcausal SNPcausal SNP

ACGTTTTACTGTACAATTACGTTTTACTGTACAATT

a univariate phenotype:a univariate phenotype:gene expression  levelgene expression  level

Multivariate complex syndrome (e.g., asthma)Multivariate complex syndrome (e.g., asthma)age at onset, history of eczemaage at onset, history of eczema

genomegenome‐‐wide expression profilewide expression profile

Eric Xing © Eric Xing @ CMU, 2006-2010 8

Page 9: Lecture18 xing

Structured Association : a New Paradigma New Paradigm

Considerone phenotype at a

Considermultiple correlated vs.

Standard Approach New Approach

one phenotype at a time phenotypes (phenome)

jointly

vs.

Phenotypes Phenome

Eric Xing © Eric Xing @ CMU, 2006-2010 9

Page 10: Lecture18 xing

Genetic Association for Asthma Clinical Traits

TCGACGTTTTACTGTACAATT

Asthma Clinical Traits

S b t k f

Statistical challengeHow to find

associations to a multivariate entity in a graph?Subnetworks for

lung physiology Subnetwork for quality of life

in a graph?

Eric Xing © Eric Xing @ CMU, 2006-2010 10

Page 11: Lecture18 xing

Gene Expression TCGACGTTTTACTGTACAATT

Trait AnalysisGenes

Samples

St ti ti l h llStatistical challengeHow to find

associations to a multivariate entity in a tree?in a tree?

Eric Xing © Eric Xing @ CMU, 2006-2010 11

Page 12: Lecture18 xing

Sparse AssociationsSparse Associations

Pleotropic effectsPleotropic effects

ffffEpistatic effectsEpistatic effects

Eric Xing © Eric Xing @ CMU, 2006-2010 12

Page 13: Lecture18 xing

Sparse LearningSparse Learning

Linear Model: Linear Model:

Lasso (Sparse Linear Regression) [R.Tibshirani 96]

Wh l ti ? Why sparse solution? penalizing

Eric Xing © Eric Xing @ CMU, 2006-2010 13

constraining

Page 14: Lecture18 xing

Multi Task ExtensionMulti-Task Extension Multi-Task Linear Model:

Input:Output:

Coefficients for k th task:

p

Coefficients for k-th task:Coefficient Matrix:

Coefficients for a variable (2nd)

Eric Xing © Eric Xing @ CMU, 2006-2010 14Coefficients for a task (2nd)

Page 15: Lecture18 xing

Outline

Background: Sparse multivariate regression for disease

Outline

g p gassociation studies

Structured association – a new paradigm Association to a graph-structured phenome

Graph-guided fused lasso (Kim & Xing PLoS Genetics 2009) Graph guided fused lasso (Kim & Xing, PLoS Genetics, 2009)

Association to a tree-structured phenome Tree-guided group lasso (Kim & Xing, ICML 2010) Tree guided group lasso (Kim & Xing, ICML 2010)

Eric Xing © Eric Xing @ CMU, 2006-2010 15

Page 16: Lecture18 xing

Multivariate Regression for Single-Trait Association Analysis

GenotypeTrait

Trait Association AnalysisAssociation Strength

A G

T A

2.1 x= ?C

C A

T G

A ?

T G

A A

C

Xy x= β

Eric Xing © Eric Xing @ CMU, 2006-2010 16

Page 17: Lecture18 xing

Multivariate Regression for Single-Trait Association Analysis

GenotypeTrait

Trait Association AnalysisAssociation Strength

A G

T A

2.1 x=

C C

A T

G A

T

G A

A C

Eric Xing © Eric Xing @ CMU, 2006-2010 17

Many non-zero associations: Which SNPs are truly significant?

Page 18: Lecture18 xing

Lasso for Reducing False Positives (Tib hi i 1996)

GenotypeTrait

Positives (Tibshirani, 1996)

Association Strength

x=2.1 A G

T A

C

C A

T G

A

T G

A A

C Lasso Penalty for sparsity

+

J

j 1 |βj |

Eric Xing © Eric Xing @ CMU, 2006-2010 18

Many zero associations (sparse results), but what if there are multiple related traits?

Page 19: Lecture18 xing

Multivariate Regression for Multiple-Trait Association Analysis

Genotype

Trait Association AnalysisAssociation Strength

(3.4, 1.5, 2.1, 0.9, 1.8) A G

T A

x=

Association strength between SNP j and Trait i: βj,i

LD

C C

A T

G A

Allergy Lung physiology

T G

A A

C Syntheticlethal

+ ji,

|βj,i|

Eric Xing © Eric Xing @ CMU, 2006-2010 19

How to combine information across multiple traits to increase the power?

Page 20: Lecture18 xing

Multivariate Regression for Multiple-Trait Association Analysis

GenotypeTrait

Trait Association AnalysisAssociation Strength

(3.4, 1.5, 2.1, 0.9, 1.8) A G

T A

x=

Association strength between SNP j and Trait i: βj,i

C C

A T

G A

Allergy Lung physiology

T G

A A

C

+ ji,

|βj,i|

Eric Xing © Eric Xing @ CMU, 2006-2010 20

+We introducegraph-guided fusion penalty

Page 21: Lecture18 xing

Multiple-trait Association:Graph-Constrained Fused LassoGraph-Constrained Fused Lasso

Step 1: Thresholded correlation graph of phenotypes

ACGTTTTACTGTACAATT

Step 2: Graph‐constrained fused lasso

ACGTTTTACTGTACAATT

F iFusion

Lasso Graph-constrained

Eric Xing © Eric Xing @ CMU, 2006-2010 21

Penalty fusion penalty

Page 22: Lecture18 xing

Fusion PenaltyFusion PenaltySNP j

ACGTTTTACTGTACAATT

Association strength β

Association strength between SNP j and Trait m β

Trait m

between SNP j and Trait k: βjkSNP j and Trait m: βjm

Trait k

Fusion Penalty: | βjk - βjm |

Eric Xing © Eric Xing @ CMU, 2006-2010 22

For two correlated traits (connected in the network), the association strengths may have similar values.

Page 23: Lecture18 xing

Graph-Constrained Fused Lasso

Overall effect

Graph-Constrained Fused Lasso

ACGTTTTACTGTACAATT

Fusion effect propagates to the entire network

Eric Xing © Eric Xing @ CMU, 2006-2010 23

Association between SNPs and subnetworks of traits

Page 24: Lecture18 xing

Multiple-trait Association: Graph-Weighted Fused LassoGraph-Weighted Fused Lasso

Overall effect

ACGTTTTACTGTACAATT

Overall effect

Subnetwork structure is embedded as a densely connected nodes with large edge weights

Eric Xing © Eric Xing @ CMU, 2006-2010 24

Edges with small weights are effectively ignored

Page 25: Lecture18 xing

Estimating ParametersEstimating Parameters

Quadratic programming formulationQ p g g Graph-constrained fused lasso

Graph-weighted fused lasso

Many publicly available software packages for solving convex optimization problems can be used

Eric Xing © Eric Xing @ CMU, 2006-2010 25

convex optimization problems can be used

Page 26: Lecture18 xing

Improving ScalabilityImproving ScalabilityOriginal problem

Equivalentlyq y

Using a variational formulationUsing a variational formulation

Iterative optimization• Update βk• Update djk’s, djml’s

Eric Xing © Eric Xing @ CMU, 2006-2010 26

Page 27: Lecture18 xing

Previous Works vs Our ApproachPrevious Works vs. Our Approach

Previous approach Our approachPrevious approach Our approachPCA-based approach (Weller et al., 1996, Mangin et al., 1998)

Implicit representation of trait correlations

Hard to interpret the derived traits

Explicit representation of trait correlations

A i i hi h iExtension of module network for eQTL study (Lee et al., 2009)

Average traits within each trait cluster

Loss of information

Original data for traits are used

Network-based approach Separate association Joint association analysis (Chen et al., 2008, Emilsson et al., 2008)

analysis for each trait (no information sharing)

Single-trait association are

of multiple traits

combined in light of trait network modules

Eric Xing © Eric Xing @ CMU, 2006-2010 27

Page 28: Lecture18 xing

Simulation

• 50 SNPs t k f

Trait Correlation Matrix

Thresholded Trait Correlation Network

Results

taken from HapMap chromosome

PhenotypesS

NP

s

7, CEU population

• 10 traits• 10 traits

Highi ti

No

association

Eric Xing © Eric Xing @ CMU, 2006-2010 28

True Regression Coefficients

Single SNP-Single Trait Test

Significant at α = 0.01

Lasso Graph-guided Fused Lasso

association

Page 29: Lecture18 xing

Simulation ResultsSimulation Results

Eric Xing © Eric Xing @ CMU, 2006-2010 29

Page 30: Lecture18 xing

Asthma Association StudyAsthma Association Study

543 severe asthma patients from the Severe Asthma543 severe asthma patients from the Severe Asthma Research Program (SARP)

Genotypes : 34 SNPs in IL4R gene 40kb region of chromosome 16

I t i i t ith PHASE (Li d St h 2003) Impute missing genotypes with PHASE (Li and Stephens, 2003)

Traits : 53 asthma-related clinical traits Quality of Life: emotion, environment, activity, symptom Family history: number of siblings with allergy, does the father has asthma? Asthma symptoms: chest tightness, wheeziness

Eric Xing © Eric Xing @ CMU, 2006-2010 30

y p g

Page 31: Lecture18 xing

Asthma and IL4R GeneAsthma and IL4R Gene

In normal individuals: allergen ignored by the immune system

Allergen (ragweed, grass, etc.)

T cell

Eric Xing © Eric Xing @ CMU, 2006-2010 31

Page 32: Lecture18 xing

Asthma and IL4R GeneAsthma and IL4R Gene

In asthma patientsIn asthma patientsAllergen (ragweed, grass, etc.)

Th2 cell B cellT cell

ImmunoglobulinE (IgE) antibody

Inflammation!

Interleukin 4 Receptor

• Airway in lung narrows

Interleukin-4 Receptorregulates IgE antibody production

Eric Xing © Eric Xing @ CMU, 2006-2010 32

• Difficult to breathe• Asthma symptoms

Page 33: Lecture18 xing

IL4R GeneIL4R Gene

Chr16: 27,325,251-27,376,098exonsintrons

IL4RSNPs in intronSNPs in intron

SNPs in exon

SNPs in promotor region

Eric Xing © Eric Xing @ CMU, 2006-2010 33

Page 34: Lecture18 xing

Asthma Trait Network

Subnetwork for Asthma symptoms

Phenotype Correlation Network

Subnetwork for quality of life

Subnetwork for lung physiology

Eric Xing © Eric Xing @ CMU, 2006-2010 34

Page 35: Lecture18 xing

Results from Single-SNP/Trait TestgPhenotypes

pes

Lung physiology‐related traits I• Baseline FEV1 predicted value: MPVLung • Pre FEF 25-75 predicted value • Average nitric oxide value: online

Phe

noty

p

• Body Mass Index • Postbronchodilation FEV1, liters: Spirometry • Baseline FEV1 % predicted: Spirometry • Baseline predrug FEV1, % predicted • Baseline predrug FEV1, % predicted

Q551R SNP• Codes for amino‐acid changes in the intracellular signaling portion of the receptor

NP

s Highassociation

Trait Network receptor• Exon 11 

S

Noassociation

Eric Xing © Eric Xing @ CMU, 2006-2010 35

Single‐Marker Single‐Trait Test

Permutation test α = 0.05

Permutation test α = 0.01

Page 36: Lecture18 xing

Comparison of Gflasso with OtherspPhenotypes

pes

Lung physiology‐related traits I• Baseline FEV1 predicted value: MPVLung • Pre FEF 25-75 predicted value • Average nitric oxide value: online

Phe

noty

p

• Body Mass Index • Postbronchodilation FEV1, liters: Spirometry • Baseline FEV1 % predicted: Spirometry • Baseline predrug FEV1, % predicted • Baseline predrug FEV1, % predicted

Q551R SNP• Codes for amino‐acid changes in the intracellular signaling portion of the receptorTrait Network

NP

s

?

receptor• Exon 11 

Highassociation

S ?Noassociation

Eric Xing © Eric Xing @ CMU, 2006-2010 36

Lasso Graph‐guided Fused Lasso

Single‐Marker Single‐Trait Test

Page 37: Lecture18 xing

Linkage Disequilibrium Structure in IL-4R genein IL-4R gene

SNP rs3024660

SNP rs3024622

r2 =0.07

SNP Q551R

r2 =0.64

Eric Xing © Eric Xing @ CMU, 2006-2010 37

Page 38: Lecture18 xing

IL4R GeneIL4R Gene

Chr16: 27,325,251-27,376,098exonsintrons

IL4RSNPs in intronSNPs in intron

SNPs in exon

SNPs in promotor regionQ551R

rs3024660rs3024622

Eric Xing © Eric Xing @ CMU, 2006-2010 38

Page 39: Lecture18 xing

Association to a T t t d

TCGACGTTTTACTGTACAATT

Tree-structured Phenome

Eric Xing © Eric Xing @ CMU, 2006-2010 39

Page 40: Lecture18 xing

TCGACGTTTTACTGTACAATT

Tree-guided Group LassoTree-guided Group Lasso

Why tree?Why tree? Tree represents a clustering structure

Scalability to a very large number of phenotypes Graph : O(|V|2) edges Tree : O(|V|) edges(| |) g

Expression quantitative trait mapping (eQTL)Agglomerative hierarchical clustering is a popular tool Agglomerative hierarchical clustering is a popular tool

Eric Xing © Eric Xing @ CMU, 2006-2010 40

Page 41: Lecture18 xing

Tree-Guided Group LassoTree-Guided Group Lasso

In a simple case of two genesp g

h

h

• Low height • Large height• Tight correlation• Joint selection

g g• Weak correlation• Separate selection

type

s

ypes

Gen

ot

Gen

oty

Eric Xing © Eric Xing @ CMU, 2006-2010 41

Page 42: Lecture18 xing

Tree-Guided Group LassoTree-Guided Group Lasso

In a simple case of two genes

h Select the child nodes jointly or

p g

j yseparately?

Tree-guided group lassoTree guided group lasso

L1 penalty• Lasso penalty

S t l ti

L2 penalty • Group lasso

Joint selection

Eric Xing © Eric Xing @ CMU, 2006-2010 42

• Separate selection • Joint selection

Elastic net

Page 43: Lecture18 xing

Tree-Guided Group LassoTree-Guided Group Lasso

For a general tree

h2 Select the child nodes jointly or

g

h1

separately?

Tree-guided group lasso

Joint selection

Separate selection

Eric Xing © Eric Xing @ CMU, 2006-2010 43

Page 44: Lecture18 xing

Tree-Guided Group LassoTree-Guided Group Lasso

For a general tree

h2 Select the child nodes jointly or

g

h1

separately?

Tree-guided group lasso

Eric Xing © Eric Xing @ CMU, 2006-2010 44

Joint selection

Separate selection

Page 45: Lecture18 xing

Balanced ShrinkageBalanced Shrinkage

Previously, in Jenatton, Audibert & Bach, 2009

Eric Xing © Eric Xing @ CMU, 2006-2010 45

Page 46: Lecture18 xing

Estimating ParametersEstimating Parameters

Second-order cone programp g

Many publicly available software packages for solving convex optimization problems can be used

Also, variational formulation

Eric Xing © Eric Xing @ CMU, 2006-2010 46

Page 47: Lecture18 xing

Illustration with Simulated DataIllustration with Simulated Datas

Phenotypes

SN

Ps

Highi ti

No

association

Eric Xing © Eric Xing @ CMU, 2006-2010 47

True association strengths 

Lasso Tree‐guided group lasso 

association

Page 48: Lecture18 xing

Yeast eQTL AnalysisYeast eQTL Analysis

HierarchicalHierarchical clustering tree

Ps

Phenotypes

SN

P

Highassociation

Tree‐guided group lasso

Single‐Marker Single Trait Test

Noassociation

Eric Xing © Eric Xing @ CMU, 2006-2010 48

group lasso Single‐Trait Test

Page 49: Lecture18 xing

Ph SG St t Phenome StructureGraph

Genome StructureLinkage Disequilibrium

Structured A i ti

Graph-guided fused lasso (Kim & Xing, PLoS Genetics, 2009)

Stochastic block regression (Kim & Xing, UAI, 2008)

Association TreePopulation Structure

Tree-guided fused lasso (Kim & Xing, Submitted)

Multi-population group lasso (Puniyani, Kim, Xing, Submitted)

Epistasis Dynamic TraitEpistasisACGTTTTACTGTACAATT

Eric Xing © Eric Xing @ CMU, 2006-2010 49

Temporally smoothed lasso (Kim, Howrylak, Xing, Submitted)

Group lasso with networks(Lee, Kim, Xing, Submitted)

Page 50: Lecture18 xing

Ph SG St t Phenome StructureGraph

Genome StructureLinkage Disequilibrium

Structured A i ti

Graph-guided fused lasso (Kim & Xing, PLoS Genetics, 2009)

Stochastic block regression (Kim & Xing, UAI, 2008)

Association TreePopulation Structure

Tree-guided fused lasso (Kim & Xing, Submitted)

Multi-population group lasso (Puniyani, Kim, Xing, Submitted)

Epistasis Dynamic TraitEpistasisACGTTTTACTGTACAATT

Eric Xing © Eric Xing @ CMU, 2006-2010 50

Temporally smoothed lasso (Kim, Howrylak, Xing, Submitted)

Group lasso with networks(Lee, Kim, Xing, Submitted)

Page 51: Lecture18 xing

Computation TimeComputation Time

Eric Xing © Eric Xing @ CMU, 2006-2010 51

Page 52: Lecture18 xing

Proximal Gradient DescentProximal Gradient DescentOriginal gProblem:

Approximation Problem:

Gradient of theApproximation:

Eric Xing © Eric Xing @ CMU, 2006-2010 52

Page 53: Lecture18 xing

Geometric InterpretationGeometric Interpretation Smooth approximation

Uppermost LineNonsmooth

U tUppermost LineSmooth

Eric Xing © Eric Xing @ CMU, 2006-2010 53

Page 54: Lecture18 xing

Convergence RateConvergence RateTheorem: If we require and set , the number of iterations is upper bounded by:

Remarks: state of the art IPM method for for SOCP converges at a rate

Eric Xing © Eric Xing @ CMU, 2006-2010 54

Page 55: Lecture18 xing

Multi Task Time ComplexityMulti-Task Time Complexity Pre-compute:

Per-iteration Complexity (computing gradient)

Tree: IPM for SOCP

Proximal-Gradient

Graph:

Proximal Gradient

IPM for SOCP

P i l G di tProximal-Gradient

P i l G di t I d d t f S l Si

Eric Xing © Eric Xing @ CMU, 2006-2010 55

Proximal-Gradient: Independent of Sample Size Linear in #.of Tasks

Page 56: Lecture18 xing

ExperimentsExperiments

Multi-task Graph Structured Sparse Learning G p S Sp g(GFlasso)

Eric Xing © Eric Xing @ CMU, 2006-2010 56

Page 57: Lecture18 xing

ExperimentsExperiments

Multi-task Tree-Structured Sparse Learning (TreeLasso)

Eric Xing © Eric Xing @ CMU, 2006-2010 57

Page 58: Lecture18 xing

ConclusionsConclusions

Novel statistical methods for joint association analysis to j ycorrelated phenotypes Graph-structured phenome : graph-guided fused lasso Tree-structured phenome : tree-guided group lasso Tree structured phenome : tree guided group lasso

Advantages Greater power to detect weak association signals Fewer false positives Joint association to multiple correlated phenotypes

Other structures In phenotypes: dynamic trait

Eric Xing © Eric Xing @ CMU, 2006-2010 58

p yp y In genotypes: linkage disequilibrium, population structure, epistasis

58

Page 59: Lecture18 xing

Reference• Tibshirani R (1996) Regression shrinkage and selection via the lasso. Journal of Royal Statistical Society, Series B

58:267–288.• Weller J, Wiggans G, Vanraden P, Ron M (1996) Application of a canonical transformation to detection of quantitative trait

loci with the aid of genetic markers in a multi-trait experiment. Theoretical and Applied Genetics 92:998–1002.• Mangin B, Thoquet B, Grimsley N (1998) Pleiotropic QTL analysis. Biometrics 54:89–99.g y ( ) y• Chen Y, Zhu J, Lum P, Yang X, Pinto S, et al. (2008) Variations in DNA elucidate molecular networks that cause disease.

Nature 452:429–35.• Lee SI, Dudley A, Drubin D, Silver P, Krogan N, et al. (2009) Learning a prior on regulatory potential from eQTL data.

PLoS Genetics 5:e1000358.• Emilsson V, Thorleifsson G, Zhang B, Leonardson A, Zink F, et al. (2008) Genetics of gene expression and its effect onEmilsson V, Thorleifsson G, Zhang B, Leonardson A, Zink F, et al. (2008) Genetics of gene expression and its effect on

disease. Nature 452:423–28.• Kim S, Xing EP (2009) Statistical estimation of correlated genome associations to a quantitative trait network. PLoS

Genetics 5(8): e1000587.• Kim S, Xing EP (2008) Sparse feature learning in high-dimensional space via block regularized regression. In Proceedings

of the 24th International Conference on Uncertainty in Artificial Intelligence (UAI) pages 325-332 AUAI Pressof the 24th International Conference on Uncertainty in Artificial Intelligence (UAI), pages 325 332. AUAI Press.• Kim S, Xing EP (2010) Exploiting a hierarchical clustering tree of gene-expression traits in eQTL analysis. Submitted.• Kim S, Howrylak J, Xing EP (2010) Dynamic-trait association analysis via temporally-smoothed lasso. Submitted.• Puniyani K, Kim S, Xing EP (2010) Multi-population GWA mapping via multi-task regularized regression. Submitted.• Lee S, Kim S, Xing EP (2010) Leveraging genetic interaction networks and regulatory pathways for joint mapping of

i t ti d i l QTL S b itt depistatic and marginal eQTLs. Submitted.• Dunning AM et al. (2009) Association of ESR1 gene tagging SNPs with breast cancer risk. Hum Mol Genet. 18(6):1131-9. • Esparza-Gordillo J et al. (2009) A common variant on chromosome 11q13 is associated with atopic dermatitis. Nature

Genetics 41:596-601.• Suzuki A et al. (2008) Functional SNPs in CD244 increase the risk of rheumatoid arthritis in a Japanese population. Nature

Eric Xing © Eric Xing @ CMU, 2006-2010 59

Genetics 40:1224-1229.• Dupuis J et al. (2010) New genetic loci implicated in fasting glucose homeostasis and their impact on type 2 diabetes risk.

Nature Genetics 42:105-116.

59