
GSEA BMApathway RSS Height

Bayesian variant-based gene set enrichment analysis using GWAS summary statistics

Xiang Zhu
University of Chicago

March 3, 2016

Xiang Zhu RSS & GSEA March 3, 2016 1 / 15


What is Gene Set Enrichment Analysis (GSEA)?


Similar ideas can be adopted in GWAS.

Wang et al. Nature Reviews Genetics 2010; 11; 843-854


GSEA illustrates the importance of combining information.

GWAS + Pathway (this talk)
Two reviews: Wang et al (2010); Mooney et al (2014)

GWAS + Functional Annotation
Pickrell (2014); Finucane et al (2015) ...

GWAS + eQTL
Nicolae et al (2010); He et al (2013) ...

eQTL + Functional Annotation
Veyrieras et al (2008); Wen et al (2015) ...

...


Most pathway approaches to GWAS do not tell us which genes are relevant.

Real examples

Name             Source      Database         # of Genes
Disease          Reactome    Pathway Commons  1206
Gene Expression  Reactome    NCBI BioSystems  1213
Homo sapiens     miRTarBase  Pathway Commons  11343

Two-stage process in Carbonetto and Stephens (2013):
1. Identify pathway enrichment
2. Prioritize variants within the enriched pathways

Software: BMApathway, https://github.com/pcarbo/bmapathway


Carbonetto and Stephens (2013) propose an integrated approach.

Likelihood

Continuous (e.g. height):

$y_i \mid x_i, \beta, \tau \sim N(\beta_0 + x_i^\top \beta, \tau^{-1})$

Binary (e.g. diabetes):

$y_i \mid x_i, \beta \sim \mathrm{Bernoulli}(\eta(x_i, \beta)), \quad \mathrm{logit}(\eta(x_i, \beta)) = \beta_0 + x_i^\top \beta$

Prior

effect size distribution:

$\beta_j \sim (1 - \pi_j)\delta_0 + \pi_j N(0, \sigma_B^2)$

probability of being “causal”:

$\mathrm{logit}_{10}(\pi_j) = \theta_0 + a_j \theta$

where $a_j := 1\{\text{SNP } j \text{ is in the pathway}\}$
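To make the prior concrete, here is a minimal Python sketch that inverts the base-10 logit to obtain the inclusion probabilities and then draws effects from the spike-and-slab mixture. The function names, the seed, and the example values of theta0, theta and sigma_b are illustrative assumptions; this is not code from BMApathway.

```python
import numpy as np

def inclusion_prob(theta0, theta, a):
    """Invert logit10(pi_j) = theta0 + a_j * theta, where logit10 is the log10 odds."""
    log10_odds = theta0 + np.asarray(a, dtype=float) * theta
    odds = 10.0 ** log10_odds
    return odds / (1.0 + odds)

def sample_effects(pi, sigma_b, rng):
    """Draw beta_j from (1 - pi_j) * delta_0 + pi_j * N(0, sigma_b^2)."""
    pi = np.asarray(pi, dtype=float)
    causal = rng.random(pi.shape) < pi          # gamma_j = 1 with probability pi_j
    slab = rng.normal(0.0, sigma_b, size=pi.shape)
    beta = np.where(causal, slab, 0.0)          # non-causal SNPs get exactly zero
    return beta, causal

# Hypothetical example: two SNPs outside the pathway, two inside.
a = np.array([0, 0, 1, 1])
pi = inclusion_prob(theta0=-2.0, theta=0.75, a=a)
rng = np.random.default_rng(0)
beta, causal = sample_effects(pi, sigma_b=0.1, rng=rng)
```

With theta0 = -2, a SNP outside the pathway has prior odds of 1:100 of being causal, and each unit of theta multiplies the odds for pathway SNPs by 10.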


BMApathway is complicated by the need for access to full GWAS data.

GWAS individual-level data can be hard to obtain.

GWAS summary statistics are widely available.


Can the integrated method be applied to GWAS summary data?

The method in Carbonetto & Stephens (2013):

p(Param|Individual Data) ∝ p(Individual Data|Param)·p(Param)

A possible modification?

p(Param|Summary Data) ∝ p(Summary Data|Param)·p(Param)


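Regression with Summary Statistics (RSS) provides a solution.

$\hat\beta \mid S, R, \beta \sim N(S R S^{-1} \beta, S R S)$

single-SNP data: $\hat\beta := (\hat\beta_1, \ldots, \hat\beta_p)^\top$

multiple-SNP parameter: $\beta := (\beta_1, \ldots, \beta_p)^\top$

plug in $\{\hat S, \hat R\}$ for $\{S, R\}$:
$\hat S := \mathrm{diag}(\hat s)$, $\hat s := (\hat s_1, \ldots, \hat s_p)^\top$, $\hat s_j \approx \mathrm{se}(\hat\beta_j)$;
$\hat R$: the estimated LD matrix [Wen & Stephens (2010)]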
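The RSS likelihood above can be evaluated directly from the single-SNP estimates, their standard errors, and an LD matrix. The sketch below is a plain NumPy illustration for a small number of SNPs, with a hypothetical function name and made-up inputs; it is not the implementation in the RSS software and it ignores the computational tricks needed at genome scale.

```python
import numpy as np

def rss_loglik(betahat, se, R, beta):
    """Log density of betahat ~ N(S R S^{-1} beta, S R S), with S = diag(se)."""
    betahat = np.asarray(betahat, dtype=float)
    se = np.asarray(se, dtype=float)
    beta = np.asarray(beta, dtype=float)
    R = np.asarray(R, dtype=float)
    mean = se * (R @ (beta / se))        # S R S^{-1} beta
    cov = R * np.outer(se, se)           # S R S
    resid = betahat - mean
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("S R S must be positive definite")
    quad = resid @ np.linalg.solve(cov, resid)
    p = betahat.size
    return -0.5 * (p * np.log(2.0 * np.pi) + logdet + quad)

# Hypothetical two-SNP example with LD correlation 0.3.
R = np.array([[1.0, 0.3], [0.3, 1.0]])
ll = rss_loglik([0.10, -0.20], [0.05, 0.10], R, [0.08, 0.0])
```

When R is the identity matrix, the likelihood factorizes into independent terms N(beta_j, s_j^2), which is a convenient sanity check.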


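Regression with Summary Statistics (RSS) provides a solution.

Likelihood:

$\hat\beta \mid S, R, \beta \sim N(S R S^{-1} \beta, S R S)$

Prior

effect size distribution:

$\beta_j \mid S, \theta_0, \theta, h \sim (1 - \pi_j)\delta_0 + \pi_j N(0, \sigma_B^2)$

probability of being “causal”:

$\mathrm{logit}_{10}(\pi_j) := \theta_0 + a_j \theta$

variance of causal effect:

$\sigma_B^2 := h \cdot \left( \sum_{j=1}^{p} \pi_j n^{-1} s_j^{-2} \right)^{-1}$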
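The slab variance follows mechanically from h, the inclusion probabilities, the sample size n, and the standard errors. A minimal sketch, with a hypothetical function name and toy inputs:

```python
import numpy as np

def slab_variance(h, pi, n, s):
    """sigma_B^2 = h * ( sum_j pi_j / (n * s_j^2) )^{-1}."""
    pi = np.asarray(pi, dtype=float)
    s = np.asarray(s, dtype=float)
    return h / np.sum(pi / (n * s ** 2))

# Toy inputs (assumed for illustration): 4 SNPs with identical pi_j and s_j.
sigma2_b = slab_variance(h=0.5, pi=np.full(4, 0.1), n=1000, s=np.full(4, 0.01))
```

By construction, the sum over j of pi_j * sigma_B^2 / (n * s_j^2) equals h, so h fixes the total of the expected per-SNP contributions on that scale.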


RSS retains the integrated characteristics of BMApathway.

Test the pathway enrichment:

$\mathrm{BF}(a) := p(\hat\beta \mid S, R, a, \theta > 0) \,/\, p(\hat\beta \mid S, R, a, \theta = 0)$

Estimate the enrichment level:

$p(\theta \mid \hat\beta, S, R, a)$

Estimate the effect of SNP $j$ given the enrichment:

$p(\beta_j \mid \hat\beta, S, R, a)$

where $a := (a_1, \ldots, a_p)^\top$, $a_j := 1\{\text{SNP } j \text{ is in the pathway}\}$.
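One way to approximate the enrichment Bayes factor is to average the marginal likelihood (or a surrogate for it) over a grid of hyperparameter values under each hypothesis. The sketch below assumes uniform grid priors and takes precomputed log-likelihood values as input; the function names and the numbers are hypothetical, and the talk does not state that BF(a) is computed exactly this way.

```python
import numpy as np

def log_mean_exp(x):
    """Numerically stable log of the average of exp(x)."""
    x = np.asarray(x, dtype=float).ravel()
    m = np.max(x)
    return m + np.log(np.mean(np.exp(x - m)))

def log10_bayes_factor(loglik_alt, loglik_null):
    """log10 BF from natural-log likelihoods on grids with uniform prior weights.

    loglik_alt:  values on the grid with theta > 0
    loglik_null: values on the grid with theta = 0
    """
    return (log_mean_exp(loglik_alt) - log_mean_exp(loglik_null)) / np.log(10.0)

# Made-up log-likelihood values, for illustration only.
lbf = log10_bayes_factor([-990.0, -995.0, -1001.0], [-1000.0, -1002.0])
```

Working on the log scale avoids underflow, since the raw likelihood values are far too small to exponentiate directly.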


We apply RSS to a GWAS of adult human height.

GWAS summary statistics (Wood et al, 2014)
1,064,575 SNPs, 253,288 EUR individuals

Reference genotypes (1000 Genomes Project, 2010)
phased haplotypes from 379 EUR individuals

18,732 protein-coding genes on autosomes

3,823 curated biological pathways


Example 1: Top enrichments

ID    Pathway Name                  log10(BF)  theta        theta0
3208  Endochondral ossification     64.5       0.75 (0.00)  -2.00 (0.00)
3859  Cellular responses to stress  47.2       0.50 (0.00)  -2.00 (0.00)
3856  Metabolism of RNA             44.9       0.50 (0.00)  -2.00 (0.00)
3725  TGF-beta receptor signaling   44.7       0.51 (0.04)  -2.00 (0.00)
3783  Cellular senescence           44.1       0.75 (0.03)  -2.00 (0.00)
3839  Chromatin modifying enzymes   42.8       0.50 (0.00)  -2.00 (0.00)


What is Endochondral Ossification?

Source: http://www.wellcome-matrix.org/research_groups/ray-boot-handford.html


Example 2: Hedgehog Signaling

ID    Pathway Name                                      Pathway Database      log10(BF)  theta        theta0
3208  Endochondral ossification                         Wiki, BioSystems      64.5       0.75 (0.00)  -2.00 (0.00)
3398  Hedgehog 'on' state                               Reactome, BioSystems  34.5       0.75 (0.01)  -2.00 (0.00)
3677  Signaling by Hedgehog                             Reactome, BioSystems  32.3       0.50 (0.00)  -2.00 (0.00)
2978  Hedgehog signaling                                KEGG, BioSystems      30.6       0.75 (0.00)  -2.00 (0.00)
2067  Signaling events mediated by the Hedgehog family  PID, PC               23.8       0.75 (0.03)  -2.00 (0.00)
2045  Signaling events mediated by the Hedgehog family  PID, BioSystems       23.7       0.75 (0.03)  -2.00 (0.00)
3598  Signaling by Hedgehog                             Reactome, PC          22.3       0.50 (0.00)  -2.00 (0.00)

The Hedgehog signaling pathway in bone formation.
Yang et al. IJOS. 2015


Example 3: GTEx eQTL Genes

Deeb et al, 2007

Tissue           log10(BF)  theta        theta0
Liver            10.9       0.50 (0.00)  -2.00 (0.00)
Whole Blood      10.9       0.25 (0.00)  -2.00 (0.00)
Muscle Skeletal  10.3       0.50 (0.02)  -2.00 (0.00)
Thyroid          8.8        0.25 (0.00)  -2.00 (0.00)

GTEx tissue-specific eQTL gene lists courtesy of S. Urbut.

Enrichment = Tissue + eQTL ?


Additional associations can be informed by enriched pathways.


Acknowledgements

those who helped me conceive the study:
Matthew Stephens, Peter Carbonetto

those who teach me genetics/biology:
Xin He, Nicholas Knoblauch, Siming Zhao, Sarah Urbut

those who make their data public:
GIANT, 1000 Genomes, Pathway Commons, BioSystems, GTEx

University of Chicago Research Computing Center


Variational Bayes RSS Applications Simulations

Appendix

Appendix A: Variational Bayes algorithms

Appendix B: Other applications of RSS

Appendix C: Simulations


Appendix A: Variational Bayes algorithms

References:
Bishop, C. Pattern Recognition and Machine Learning, Ch. 10
Ormerod, J.T. & Wand, M.P. Am. Stat. 2010


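Our posterior calculation exploits a variational approximation.

$\gamma := (\gamma_1, \ldots, \gamma_p)^\top$, where $\gamma_j = 1$ if SNP $j$ is causal

$D := \{\hat\beta, S, R, a\}$

Posterior distribution:

$p(\beta, \gamma \mid D) = \int p(\beta, \gamma \mid D, \theta_0, \theta, h)\, p(\theta_0, \theta, h \mid D)\, d\theta_0\, d\theta\, dh$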

1. Estimate p(β, γ|D, θ0, θ,h)

2. Estimate p(θ0, θ,h|D)


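Step 1: Estimate $p(\beta, \gamma \mid D, \theta_0, \theta, h)$

Aim: approximate $p(\beta, \gamma \mid D, \theta_0, \theta, h)$ by $q^\star(\beta, \gamma)$

Decomposition of marginal likelihood:

$\log p(D \mid \theta_0, \theta, h) = \underbrace{E_q \log\left\{ \frac{q(\beta, \gamma)}{p(\beta, \gamma \mid D, \theta_0, \theta, h)} \right\}}_{\text{Kullback-Leibler (KL) divergence}} + \underbrace{E_q \log\left\{ \frac{p(D, \beta, \gamma \mid \theta_0, \theta, h)}{q(\beta, \gamma)} \right\}}_{\text{Evidence lower bound (LB)}}$

Optimization over distributions:

$q^\star = \arg\min_q \mathrm{KL}(q; \theta_0, \theta, h) = \arg\max_q \mathrm{LB}(q; \theta_0, \theta, h)$

Mean field approximation:

$q(\beta, \gamma) = \prod_{j=1}^{p} q_j(\beta_j, \gamma_j)$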
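The decomposition of the log marginal likelihood into a KL term and a lower bound can be checked numerically on a toy model. The sketch below replaces (beta, gamma) with a single discrete latent variable z, which is an assumption made purely to keep the example small; the names and numbers are hypothetical.

```python
import numpy as np

def kl_and_lower_bound(log_joint, q):
    """Return (log evidence, KL(q || posterior), lower bound) for a discrete latent z.

    log_joint[k] = log p(D, z = k); q[k] = q(z = k), all entries strictly positive.
    """
    log_joint = np.asarray(log_joint, dtype=float)
    q = np.asarray(q, dtype=float)
    m = log_joint.max()
    log_evidence = m + np.log(np.sum(np.exp(log_joint - m)))   # log p(D)
    log_post = log_joint - log_evidence                        # log p(z | D)
    kl = np.sum(q * (np.log(q) - log_post))                    # E_q log(q / posterior)
    lb = np.sum(q * (log_joint - np.log(q)))                   # E_q log(joint / q)
    return log_evidence, kl, lb

# Made-up joint probabilities p(D, z = k) and an arbitrary approximation q.
log_joint = np.log([0.02, 0.05, 0.01])
q = np.array([0.3, 0.5, 0.2])
log_evidence, kl, lb = kl_and_lower_bound(log_joint, q)
```

Because the KL term is non-negative, the lower bound never exceeds the log evidence, and the two coincide only when q equals the exact posterior. This is why minimizing KL and maximizing LB select the same q.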


Step 2: Estimate p(θ0, θ,h|D)

Put uniform (grid) prior on {θ0, θ,h}:

p(θ0) ∝ 1, p(θ) ∝ 1, p(h) ∝ 1

Approximate p(θ0, θ,h|D):

p(θ0, θ,h|D) ∝ exp{LB(θ0, θ,h)}

What about the exact calculation?

p(θ0, θ,h|D) ∝ p(D|θ0, θ,h)

Caution: the estimate is discrete.
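Under a uniform grid prior, the approximate posterior over the hyperparameters is obtained by normalizing exp(LB) across grid points. The sketch below does this for a one-dimensional grid over theta and summarizes the result by a posterior mean and standard deviation. The grid spacing and the lower-bound values are made up for illustration, and the function name is hypothetical.

```python
import numpy as np

def grid_posterior(lower_bound):
    """Normalize exp(LB) over grid points into posterior weights (uniform grid prior)."""
    lb = np.asarray(lower_bound, dtype=float)
    w = np.exp(lb - lb.max())      # subtract the max to avoid underflow
    return w / w.sum()

# Hypothetical grid over theta and made-up lower-bound values.
theta_grid = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
lb = np.array([-1010.0, -1004.0, -1001.0, -1000.0, -1003.0])
w = grid_posterior(lb)
theta_mean = np.sum(w * theta_grid)
theta_sd = np.sqrt(np.sum(w * (theta_grid - theta_mean) ** 2))
```

Since the weights live only on grid points, the resulting estimate is discrete: when one grid point dominates, the posterior mean sits on that point and the reported standard deviation is close to zero.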


Appendix B: Other applications of RSS

Other features:
Estimate SNP heritability (PVE) [ready to use]
Detect genetic association [ready to use]
Predict phenotype [in progress]
https://github.com/xiangzhu/dscr_blm

For more details:
Manuscript: will be posted on bioRxiv soon
Software: https://github.com/stephenslab/rss


RSS can estimate SNP heritability.

(a) Individual-level data + GCTA (Yang et al, 2011)

(b) Summary statistics + RSS

RSS on summary data: 52.1%, [50.3%, 53.9%]

GCTA on subsets of full data: 49.8% (4.4%)


RSS can detect genetic association.

(a) BIMBAM vs RSS (b) GEMMA vs RSS (c) GEMMA vs RSS

BIMBAM: multiple-SNP fine mapping of genes
http://www.haplotype.org/bimbam.html

GEMMA-BVSR: genome-wide multiple-SNP analysis
https://github.com/xiangzhou/GEMMA


Appendix C: Simulations

We compare RSS with BMApathway through simulations based on real genotype data (WTCCC, 2007).

Null dataset:
each SNP is equally likely to be causal

Enriched dataset:
SNPs in the Signal Transduction pathway are more likely to be associated with the phenotypes
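The null and enriched designs can be sketched as follows. This is an illustrative simulation only: it uses randomly generated genotypes rather than the real WTCCC genotypes, unit residual variance, and hypothetical parameter values and function names. It produces the single-SNP summary statistics that an RSS-type analysis takes as input.

```python
import numpy as np

def simulate_summary_stats(X, a, theta0, theta, sigma_b, rng):
    """Simulate a phenotype and return single-SNP estimates, standard errors, causal flags."""
    n, p = X.shape
    odds = 10.0 ** (theta0 + a * theta)          # logit10(pi_j) = theta0 + a_j * theta
    pi = odds / (1.0 + odds)
    causal = rng.random(p) < pi
    beta = np.where(causal, rng.normal(0.0, sigma_b, size=p), 0.0)
    y = X @ beta + rng.normal(0.0, 1.0, size=n)  # assumed residual variance of 1

    # Simple linear regression of y on each SNP separately (with intercept).
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    xtx = np.sum(Xc ** 2, axis=0)
    betahat = (Xc.T @ yc) / xtx
    resid_var = (np.sum(yc ** 2) - betahat ** 2 * xtx) / (n - 2)
    se = np.sqrt(resid_var / xtx)
    return betahat, se, causal

rng = np.random.default_rng(1)
X = rng.binomial(2, 0.3, size=(500, 200)).astype(float)  # synthetic genotypes (assumption)
a = np.zeros(200)
a[:50] = 1.0                                             # first 50 SNPs form the "pathway"

# Null: theta = 0, so every SNP is equally likely to be causal.
null_stats = simulate_summary_stats(X, a, theta0=-2.0, theta=0.0, sigma_b=0.2, rng=rng)
# Enriched: theta > 0, so pathway SNPs are more likely to be causal.
enriched_stats = simulate_summary_stats(X, a, theta0=-2.0, theta=1.0, sigma_b=0.2, rng=rng)
```

Independent synthetic genotypes carry no LD, so this toy setup is much easier than the real-genotype simulations described above.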


Type I Error


Power


Enrichment Estimation
