
The role of environmental heterogeneity in meta-analysis of gene-environment interaction

Bhramar Mukherjee

Department of Biostatistics, University of Michigan

E-mail: [email protected]

Workshop on Emerging Statistical Challenges and Methods for Analysis of Massive Genomic Data in Complex Human Disease Studies, Banff

Summary Data: BIRS 14w5011

June 26, 2014 Meta-analysis of G x E 1 / 43


Outline

I saw a brown bear!


It was magical!


Gene-Environment-Wide interaction studies (GEWIS), or GE-Whiz according to Duncan Thomas!

Statistical Challenges

Data harmonization across cohorts.

Misclassification/measurement error in E.

Multiple testing, computational time.

Optimal search strategies for discovery and replication.

Timing of exposure measurement.

Model misspecification in main effects of E and interaction term.

Prohibitive sample size requirement.

Scale dependence of interaction.


G x E Search Strategies for Case-Control Data

Single Step Methods

Case-Control, Case-Only, Empirical Bayes, Bayesian Model Averaging.

Two-Step/Hybrid Methods

Use screening step followed by subset or weighted p-value testing.

Screening test independent of final testing step.

Use marginal genetic association or GE-correlation or both as screen.

Kooperberg-Leblanc, Two-step, Hybrid, Cocktail, EDGxE.

Gene-discovery methods that use interaction

Joint 2 df tests for main effects and G x E interaction.


Meta Analysis of G x E


In press


Undeniable importance in the scientific literature


Meta-analysis of marginal genetic associations


Full efficiency results for fixed effect(s) models


Limited statistical literature on meta-analysis of GEI


Lots of interesting problems, thus...

Time for some creative statistics


Methods

Meta analysis of GEI

Possible choices:

Individual Patient Data (IPD) or mega analysis.

Meta-analysis: study level summary statistics (estimates, standard errors of parameters).

Meta-regression: use study level covariates to explain the heterogeneity among study-specific effects.


Rich literature on meta-analysis in clinical trials


Notations

Y: quantitative trait

E: continuous environmental exposure

G: SNP with genotypes AA, Aa and aa (A: minor allele)

    dominant: G = 1 if AA or Aa; G = 0 if aa
    additive: G = 2 if AA; G = 1 if Aa; G = 0 if aa
    co-dominant: G1 = 1 if Aa and 0 otherwise; G2 = 1 if AA and 0 otherwise

Z: covariates/confounders

K: number of independent studies

$n_k$: number of participants in the k-th study, k = 1, ..., K

$N = \sum_{k=1}^{K} n_k$: total number of participants

Subscript i, k: index for participant i in study k, i = 1, ..., $n_k$, k = 1, ..., K
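The three genotype codings in the notation can be sketched in a few lines of Python. This is an illustration only: the function name `code_genotype` and the string labels for the models are assumptions made for the example, not something taken from the talk.

```python
def code_genotype(genotype, model):
    """Map a genotype string ("AA", "Aa" or "aa") to its numeric coding.

    "A" denotes the minor allele, as in the notation slide.
    The function name and model labels are hypothetical.
    """
    minor_count = genotype.count("A")  # 0, 1 or 2 copies of the minor allele
    if model == "dominant":
        return 1 if minor_count >= 1 else 0
    if model == "additive":
        return minor_count
    if model == "co-dominant":
        # (G1, G2) = (indicator of Aa, indicator of AA)
        return (1 if minor_count == 1 else 0, 1 if minor_count == 2 else 0)
    raise ValueError(f"unknown model: {model}")
```

The co-dominant coding returns two indicators, so an interaction model under that coding would carry two interaction terms rather than one.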


• A set of studies investigating Type 2 Diabetes (T2D): 8 European cohorts

                                              Quantitative        Case-control study
Cohort      Country   Study                   traits available b  Case   Control  Total
DIAGEN      Germany   cohort                  1510                421    622      1043
D2D2007     Finland   cohort                  2693                287    1043     1330
DPS         Finland   randomized trial        433                 -      -        -
FUSION-FS   Finland   subset of FUSION        172                 -      -        -
FUSION-S2   Finland   matched case-control    2730                624    794      1418
METSIM      Finland   male cohort             1456                632    603      1235
HUNT        Norway    case-control            1324                511    721      1232
TROMSO      Norway    matched case-control    1411                644    693      1337
In total                                      11729               3119   4476     7595

b Quantitative traits: high-density lipoprotein cholesterol (HDL), LDL, total cholesterol and many other T2D related traits are available on most genotyped participants.

Today's example

SNPs in FTO: associated with T2D and BMI (Voight et al. 2010)

Age, gender, BMI: associated with T2D and HDL (Kim et al. 2013)

Do SNPs in FTO (G) modify the effect of age or BMI (E) on log HDL (Y) after adjusting for gender, T2D status (Z)?


T2D study: Descriptive summary

                        E                           Z            G           Y
                        Age (year)    BMI (kg/m2)   Gender       rs1121980   HDL (mmol/l)
Study          N        mean (SD a)   mean (SD)     female (%)   MAF a       mean (SD)
D2D2007        2693     59.9 (8.4)    27.5 (4.8)    52           0.41        1.44 (0.35)
DIAGEN         1510     63.3 (14.3)   27.9 (5.2)    55           0.46        1.45 (0.47)
DPS            433      55.1 (7.1)    31.3 (4.6)    68           0.44        1.22 (0.29)
FUSION-FS      172      38.6 (10.9)   26.2 (4.9)    55           0.43        1.29 (0.32)
FUSION-S2      2730     57.2 (8.4)    27.9 (5.1)    44           0.40        1.45 (0.41)
HUNT           1324     67.2 (13.1)   28.0 (4.4)    48           0.47        1.26 (0.38)
METSIM         1456     56.3 (6.6)    27.9 (4.7)    0.0          0.44        1.42 (0.40)
TROMSO         1411     59.9 (12.5)   27.6 (4.7)    50           0.49        1.43 (0.42)
Entire study   11729    59.6 (11.1)   27.9 (4.9)    44           0.43        1.41 (0.40)

a SD: standard deviation; MAF: minor allele frequency.


1. Underlying model and IPD analysis

• Fixed-effect model (FEM)

$Y_{ki} = \alpha_k + \beta_G G_{ki} + \beta_E E_{ki} + \delta G_{ki} E_{ki} + \varepsilon_{ki}$ (1)

$\alpha_k$: study-specific intercept
$\beta_G$ and $\beta_E$: main effects of G and E
$\delta$: GEI
$\varepsilon_{ki} \sim N(0, \sigma_k^2)$

• Interpretation of GEI parameter $\delta$: effect of E in subgroups defined by G, e.g. dominant
G = 0: $\beta_E$
G = 1: $\beta_E + \delta$

• IPD analysis: fit model (1) using individual level data
Advantage: efficiency; 'gold standard'
Disadvantage: practical challenges with sharing raw data
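A minimal sketch of the IPD fit of model (1), assuming individual-level arrays are available. It uses ordinary least squares with one intercept column per study, which treats all observations equally and so ignores that $\sigma_k^2$ may differ across studies. The function name `fit_ipd` and the simulated data are hypothetical and serve only to exercise the code.

```python
import numpy as np

def fit_ipd(y, g, e, study):
    """Least-squares fit of model (1) with one intercept per study.

    Returns (beta_G, beta_E, delta). Equal weighting of observations is a
    simplification: study-specific error variances are not modelled here.
    """
    y, g, e = (np.asarray(a, dtype=float) for a in (y, g, e))
    labels, idx = np.unique(np.asarray(study), return_inverse=True)
    intercepts = np.eye(len(labels))[idx]       # one dummy column per study
    X = np.column_stack([intercepts, g, e, g * e])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    beta_g, beta_e, delta = coef[-3:]
    return beta_g, beta_e, delta

# Hypothetical simulated data (not from the talk).
rng = np.random.default_rng(0)
K, n_per_study = 4, 500
study = np.repeat(np.arange(K), n_per_study)
g = rng.binomial(1, 0.4, size=study.size).astype(float)  # dominant coding
e = rng.normal(loc=2.0 * study, scale=1.0)               # study means of E differ
alpha = np.array([1.0, 1.5, 0.5, 2.0])[study]            # study-specific intercepts
noise = rng.normal(scale=0.5, size=study.size)
y = alpha + 0.3 * g + 0.2 * e + 0.1 * g * e + noise

beta_g_hat, beta_e_hat, delta_hat = fit_ipd(y, g, e, study)
```

In the dominant coding, `beta_e_hat` estimates the effect of E when G = 0 and `beta_e_hat + delta_hat` the effect when G = 1.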


Revisiting the efficiency results of Lin and Zeng

$Y_{ki} = \alpha_k + \beta_G G_{ki} + \beta_E E_{ki} + \delta G_{ki} E_{ki} + \varepsilon_{ki}$, $\varepsilon_{ki} \sim N(0, \sigma_k^2)$ (2)

($\alpha_k$, $\sigma_k^2$): study-specific nuisance parameters

($\beta_G$, $\beta_E$): common nuisance parameters

$\delta$: common parameter of interest

To retain the full efficiency of the IPD analysis, pool the study-specific estimates of the common parameters ($\beta_G$, $\beta_E$, $\delta$) as a vector, using the estimated multivariate (3×3) information matrices as weights.

Pooling only a subset of the common parameters, say just the univariate inverse-variance weighted estimate of $\delta$, loses efficiency except in some special cases.


2a. Meta-analysis: UIVW estimator

• Univariate inverse-variance weighted (UIVW) estimator

Collect $\hat\delta_k$ and $v(\hat\delta_k)$ from each study k

Fixed-effect meta-analysis: $\hat\delta_k \sim N(\delta, v(\hat\delta_k))$

$v(\hat\delta_k)$: asymptotic model-based variance of $\hat\delta_k$

'Standard condition': $v(\hat\delta_k)$ can be estimated by $\hat v(\hat\delta_k)$ with negligible error (e.g. DerSimonian and Laird 1986, Whitehead and Whitehead 1991, Lin and Zeng 2010)

• $\hat\delta_{UIVW} = \{\sum_k v(\hat\delta_k)^{-1}\}^{-1} \sum_k v(\hat\delta_k)^{-1} \hat\delta_k$ and $v(\hat\delta_{UIVW}) = \{\sum_k v(\hat\delta_k)^{-1}\}^{-1}$

Advantage: only need study level estimates ($\hat\delta_k$ and $v(\hat\delta_k)$)
Disadvantage: potential efficiency loss
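The UIVW formulas translate directly into code. The sketch below assumes each study reports an interaction estimate and its estimated variance; the function name `uivw` and the two-study input values are hypothetical.

```python
import numpy as np

def uivw(delta_hat, var_hat):
    """Univariate inverse-variance weighted pooling.

    delta_hat: study-specific interaction estimates
    var_hat:   their estimated variances
    Returns (pooled estimate, variance of the pooled estimate).
    """
    delta_hat = np.asarray(delta_hat, dtype=float)
    weights = 1.0 / np.asarray(var_hat, dtype=float)
    var_pooled = 1.0 / weights.sum()
    delta_pooled = var_pooled * np.sum(weights * delta_hat)
    return delta_pooled, var_pooled

# Hypothetical summary statistics from two studies.
delta_pooled, var_pooled = uivw([0.10, 0.20], [0.01, 0.04])
```

Because the weights are inverse variances, the more precise first study dominates the pooled estimate.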


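2b. Meta-analysis: MIVW estimator (Lin and Zeng, 2010)

• Multivariate inverse-variance weighted (MIVW) estimator

Consider the vector: $\boldsymbol\beta = (\beta_G, \beta_E, \delta)$

Collect $\hat{\boldsymbol\beta}_k = (\hat\beta_{Gk}, \hat\beta_{Ek}, \hat\delta_k)$ and its estimated 3×3 covariance matrix $v(\hat{\boldsymbol\beta}_k)$

• $\hat{\boldsymbol\beta}_{MIVW} = \{\sum_k v(\hat{\boldsymbol\beta}_k)^{-1}\}^{-1} \sum_k v(\hat{\boldsymbol\beta}_k)^{-1} \hat{\boldsymbol\beta}_k$ and $v(\hat{\boldsymbol\beta}_{MIVW}) = \{\sum_k v(\hat{\boldsymbol\beta}_k)^{-1}\}^{-1}$

Obtain $\hat\delta_{MIVW}$ and $v(\hat\delta_{MIVW})$ from the corresponding elements of $\hat{\boldsymbol\beta}_{MIVW}$ and $v(\hat{\boldsymbol\beta}_{MIVW})$.

Advantage: asymptotically fully efficient as $\hat\delta_{IPD}$

Disadvantage: estimated multivariate covariance matrix $v(\hat{\boldsymbol\beta}_k)$ is rarely available in published results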
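A minimal sketch of the MIVW pooling, assuming every study shares its full coefficient vector in the order ($\beta_G$, $\beta_E$, $\delta$) together with the 3×3 covariance matrix. The function name `mivw` and the input values are hypothetical.

```python
import numpy as np

def mivw(beta_hats, cov_hats):
    """Multivariate inverse-variance weighted pooling.

    beta_hats: list of length-3 vectors (beta_G, beta_E, delta), one per study
    cov_hats:  list of the matching 3x3 estimated covariance matrices
    Returns (pooled vector, covariance matrix of the pooled vector).
    """
    precisions = [np.linalg.inv(np.asarray(V, dtype=float)) for V in cov_hats]
    cov_pooled = np.linalg.inv(sum(precisions))
    weighted_sum = sum(P @ np.asarray(b, dtype=float)
                       for P, b in zip(precisions, beta_hats))
    return cov_pooled @ weighted_sum, cov_pooled

# Hypothetical inputs with diagonal covariances, chosen for easy checking.
beta_hats = [[0.1, 0.2, 0.10], [0.3, 0.4, 0.20]]
cov_hats = [np.diag([0.01, 0.01, 0.01]), np.diag([0.01, 0.01, 0.04])]
beta_pooled, cov_pooled = mivw(beta_hats, cov_hats)

delta_mivw = beta_pooled[2]        # interaction element
var_delta_mivw = cov_pooled[2, 2]  # its variance
```

With diagonal covariance matrices, as in this toy input, the interaction element coincides with the univariate inverse-variance weighted estimate; the two differ once the off-diagonal terms are nonzero.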


3. Meta-regression

• Intuition: interaction model (1) implies Y-G association depends linearly on E

• Two-stage meta-regression

Stage 1: Estimating marginal effect of G to obtain $\hat\lambda_k$ and $v(\hat\lambda_k)$

$Y_{ki} = \lambda_{0k} + \lambda_k G_{ki} + \eta_{ki}$, i = 1, ..., $n_k$

Stage 2: Examine whether marginal effect estimates ($\hat\lambda_k$) depend on study level means of E ($m_k = \sum_i E_{ki}/n_k$)

$\hat\lambda_k = \gamma_0 + \gamma m_k + \varepsilon_k$, $\varepsilon_k \sim N(0, v(\hat\lambda_k))$, k = 1, ..., K

• $\hat\delta_{MR}$: weighted least squares estimator of $\gamma$

Advantage: can identify interaction with $\hat\lambda_k$, $v(\hat\lambda_k)$ and $m_k$
Disadvantage: huge potential for ecological bias and hard to implement with few studies
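Stage 2 of the meta-regression is a weighted least squares fit of the marginal G effects on the study means of E, with weights $1/v(\hat\lambda_k)$. The sketch below assumes Stage 1 has already been run in each study; the function name `meta_regression` and the four-study inputs are hypothetical.

```python
import numpy as np

def meta_regression(lambda_hat, var_hat, m):
    """Weighted least squares slope of lambda_hat on study means of E.

    lambda_hat: marginal G effect estimates, one per study
    var_hat:    their estimated variances (used as known weights)
    m:          study-level means of E
    Returns (slope estimate, variance of the slope estimate).
    """
    lambda_hat, var_hat, m = (np.asarray(a, dtype=float)
                              for a in (lambda_hat, var_hat, m))
    X = np.column_stack([np.ones_like(m), m])   # intercept gamma_0 and slope gamma
    W = np.diag(1.0 / var_hat)
    cov = np.linalg.inv(X.T @ W @ X)
    gamma0_hat, gamma_hat = cov @ X.T @ W @ lambda_hat
    return gamma_hat, cov[1, 1]

# Hypothetical study-level inputs.
m_k = np.array([50.0, 55.0, 60.0, 65.0])
lambda_k = 0.5 + 0.02 * m_k                      # exactly linear, for illustration
v_lambda_k = np.array([0.01, 0.02, 0.01, 0.02])
delta_mr, var_delta_mr = meta_regression(lambda_k, v_lambda_k, m_k)
```

The slope is identified only through between-study differences in $m_k$, which is why few studies or similar study means make this estimator imprecise.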


4. Adaptively Weighted Estimator (AWE)

• Motivation: A combined estimator that uses only univariate summary statistics.

• Proposal: $\hat\delta_{AWE}(w) = w\,\hat\delta_{UIVW} + (1-w)\,\hat\delta_{MR}$, $0 \le w \le 1$

Find w that minimizes $v(\hat\delta_{AWE}(w))$

Need to know $\mathrm{cov}(\hat\delta_{UIVW}, \hat\delta_{MR})$


Weights

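LEMMA.

For k = 1, ..., K, $\mathrm{cov}(\hat\delta_k, \hat\lambda_k) = 0$ ⇒ $\mathrm{cov}(\hat\delta_{UIVW}, \hat\delta_{MR}) = 0$

RESULT.

$v(\hat\delta_{AWE}(w))$ attains its minimum, $[v(\hat\delta_{UIVW})^{-1} + v(\hat\delta_{MR})^{-1}]^{-1}$, if and only if the weight on $\hat\delta_{UIVW}$ is $w = \{v(\hat\delta_{UIVW})^{-1} + v(\hat\delta_{MR})^{-1}\}^{-1} v(\hat\delta_{UIVW})^{-1}$.

• $\hat\delta_{AWE} = \{v(\hat\delta_{UIVW})^{-1} + v(\hat\delta_{MR})^{-1}\}^{-1} \{v(\hat\delta_{UIVW})^{-1} \hat\delta_{UIVW} + v(\hat\delta_{MR})^{-1} \hat\delta_{MR}\}$

$\hat\delta_{AWE}$: inverse-variance weighted estimator of $\hat\delta_{UIVW}$ and $\hat\delta_{MR}$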
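Given the lemma's zero covariance, the AWE is a plain inverse-variance combination of the two univariate estimators. A minimal sketch, with the function name `awe` and the input values being hypothetical:

```python
def awe(delta_uivw, var_uivw, delta_mr, var_mr):
    """Combine the UIVW and MR estimators with inverse-variance weights.

    Relies on cov(delta_UIVW, delta_MR) = 0, as stated in the lemma.
    Returns (combined estimate, its variance, weight w on delta_UIVW).
    """
    precision = 1.0 / var_uivw + 1.0 / var_mr
    w = (1.0 / var_uivw) / precision
    delta_awe = w * delta_uivw + (1.0 - w) * delta_mr
    return delta_awe, 1.0 / precision, w

# Hypothetical inputs: UIVW is three times as precise as MR.
delta_awe, var_awe, w = awe(0.12, 0.008, 0.20, 0.024)
```

The combined variance is never larger than either input variance, since the precisions add.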


Notation: heterogeneity in E

• Classical ANOVA partitioning: TSS(E) = BSS(E) + WSS(E)

$TSS = N s_E^2$, total sum of squares of E

$BSS = \sum_k n_k (m_k - m)^2$, between-study sum of squares of E

$WSS = \sum_k n_k s_{Ek}^2$, within-study sum of squares of E

where

$m_k = \sum_i E_{ki}/n_k$, $s_{Ek}^2 = \frac{1}{n_k} \sum_i (E_{ki} - m_k)^2$, for the k-th study

$m = \sum_{k,i} E_{ki}/N$, $s_E^2 = \frac{1}{N} \sum_{k,i} (E_{ki} - m)^2$, combining all studies

• $TSS/N \xrightarrow{p} tss$, $WSS/N \xrightarrow{p} wss$, $BSS/N \xrightarrow{p} bss$, as $N \to \infty$

assume $n_k/N \to r_k \in (0,1)$ as $N \to \infty$

$tss = \sigma_E^2$, $wss = \sum_k r_k \sigma_{Ek}^2$ and $bss = \sum_k r_k (\mu_k - \mu)^2$
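The partition of the variation in E can be checked numerically from individual-level exposure values. The sketch below follows the definitions above, with divisor $n_k$ rather than $n_k - 1$ in the variances; the function name `partition_e` and the toy data are hypothetical.

```python
import numpy as np

def partition_e(e, study):
    """Return (TSS, BSS, WSS) of exposure e across the given study labels."""
    e = np.asarray(e, dtype=float)
    study = np.asarray(study)
    m = e.mean()                                    # overall mean of E
    tss = np.sum((e - m) ** 2)
    bss = 0.0
    wss = 0.0
    for k in np.unique(study):
        e_k = e[study == k]
        bss += e_k.size * (e_k.mean() - m) ** 2     # n_k (m_k - m)^2
        wss += np.sum((e_k - e_k.mean()) ** 2)      # n_k s_Ek^2
    return tss, bss, wss

# Toy data: two studies of three participants each.
tss, bss, wss = partition_e([1, 2, 3, 5, 6, 7], [0, 0, 0, 1, 1, 1])
```

In this toy input most of the variation in E lies between the two studies, the setting in which study-level means carry much of the information about the interaction.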


Summary of the methods

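Methods   Data shared                                                                Bias                                                            ARE a
IPD       individual level data                                                      unbiased                                                        1
UIVW      $\hat\delta_k$, $v(\hat\delta_k)$                                          unbiased                                                        wss/tss
MIVW      $\hat{\boldsymbol\beta}_k$, $v(\hat{\boldsymbol\beta}_k)$                  unbiased                                                        1
MR        $\hat\lambda_k$, $v(\hat\lambda_k)$ and $m_k$                              unbiased under G-E independence, ecological bias in general     bss/tss
AWE       $\hat\delta_k$, $v(\hat\delta_k)$, $\hat\lambda_k$, $v(\hat\lambda_k)$ and $m_k$   unbiased under G-E independence, bias adaptively controlled     1

a ARE: asymptotic relative efficiency relative to $\hat\delta_{IPD}$ under certain assumptions

Under certain assumptions

$\hat\delta_{AWE} \approx \frac{WSS}{BSS + WSS}\,\hat\delta_{UIVW} + \frac{BSS}{BSS + WSS}\,\hat\delta_{MR}$.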
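The approximate AWE weights depend only on how the variation in E splits between and within studies, so they can be sketched from study-level sample sizes, means and standard deviations. The example below plugs in the age columns of the T2D descriptive summary purely as an illustration. It treats the reported SDs as if they used divisor $n_k$, which is an assumption, and the function name `approx_awe_weights` is hypothetical.

```python
import numpy as np

def approx_awe_weights(n, mean, sd):
    """Approximate weights WSS/(BSS+WSS) and BSS/(BSS+WSS) from summary data.

    n, mean, sd: per-study sample size, mean of E and SD of E.
    Returns (weight on delta_UIVW, weight on delta_MR).
    """
    n, mean, sd = (np.asarray(a, dtype=float) for a in (n, mean, sd))
    m = np.sum(n * mean) / n.sum()          # pooled mean of E
    bss = np.sum(n * (mean - m) ** 2)       # between-study sum of squares
    wss = np.sum(n * sd ** 2)               # within-study sum of squares
    w_uivw = wss / (bss + wss)
    return w_uivw, 1.0 - w_uivw

# Age (E) by study, taken from the T2D descriptive summary:
# D2D2007, DIAGEN, DPS, FUSION-FS, FUSION-S2, HUNT, METSIM, TROMSO.
n_k = [2693, 1510, 433, 172, 2730, 1324, 1456, 1411]
age_mean = [59.9, 63.3, 55.1, 38.6, 57.2, 67.2, 56.3, 59.9]
age_sd = [8.4, 14.3, 7.1, 10.9, 8.4, 13.1, 6.6, 12.5]
w_uivw_age, w_mr_age = approx_awe_weights(n_k, age_mean, age_sd)
```

The two weights sum to one by construction; the weight on the meta-regression estimator grows as the study means of E spread further apart.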


Similar in spirit to the recommendation of Simmonds and Higgins (2007)


Adaptive control of bias in AWE

[Figure: absolute relative bias of MR and AWE plotted against BSS/TSS (0.0 to 1.0), in two panels: additive model and dominant model.]


Simulation study

• Relative performances of IPD, UIVW, MIVW, MR and AWE

• Simulation scenarios:

1. Lack of common set of confounders to adjust across studies

$Y_{ki} = \alpha_k + \beta_G G_{ki} + \beta_E E_{ki} + \delta G_{ki} E_{ki} + \boldsymbol\beta_{Z_k}^T Z_k + \varepsilon_{ki}$

$Z_k$      gender   race   smoke
study 1    √        √      √
study 2    √        √      NA
study 3    √        NA
...        ...      ...    ...

$\hat\delta_{NIPD}$: Naive IPD analysis based on subset of Z that is commonly available to ALL studies

2. Non-linear interaction

• K = 20 studies, N = 10,000 participants, $n_k$ ranging 200-2000


Simulation study

� Relative performances of IPD, UIVW, MIVW, MR and AWE

� Simulation scenarios:1 Lack of common set of confounders to adjust across studies

Yki = αk +βGGki +βEEki +δGkiEki

+βTZk

Zk + εki

Zk gender race smokestudy 1

√ √ √

study 2√ √

NAstudy 3

√NA

......

......

δ NIPD: Naive IPD analysis based on subset of Z that is commonlyavailable to ALL studies

2 non-linear interaction

� K = 20 studies, N = 10,000 participants, nk ranging 200-2000

June 26, 2014 Meta-analysis of G x E 29 / 43

Page 54: The role of environmental heterogeneity in meta-analysis ... · Workshop on Emerging Statistical Challenges and Methods for Analysis of Massive Genomic Data in Complex Human Disease

Methods

Simulation study

� Relative performances of IPD, UIVW, MIVW, MR and AWE

� Simulation scenarios:1 Lack of common set of confounders to adjust across studies

Yki = αk +βGGki +βEEki +δGkiEki

+βTZk

Zk + εki

Zk gender race smokestudy 1

√ √ √

study 2√ √

NAstudy 3

√NA

......

......

δ NIPD: Naive IPD analysis based on subset of Z that is commonlyavailable to ALL studies

2 non-linear interaction

� K = 20 studies, N = 10,000 participants, nk ranging 200-2000

June 26, 2014 Meta-analysis of G x E 29 / 43

Page 55: The role of environmental heterogeneity in meta-analysis ... · Workshop on Emerging Statistical Challenges and Methods for Analysis of Massive Genomic Data in Complex Human Disease

Methods

Simulation results: typical power graph

[Figure: Comparison of the proposed methods in terms of empirical power. x-axis: Type III R² of G × E interaction (%), 0.0 to 0.2; y-axis: power, 0.0 to 1.0; panels: dominant, additive and co-dominant models, each under WSS < BSS and WSS > BSS; methods shown: IPD, UIVW, REM, MIVW, MR, AWE, TS.]


Methods

Power: lack of common set of confounders to adjust

[Figure: Power comparison under lack of a common set of confounders to adjust for. x-axis: Type III R² of G × E interaction (%), 0.0 to 0.2; y-axis: power, 0.0 to 1.0; panels: dominant, additive and co-dominant models, each under WSS < BSS and WSS > BSS; methods shown: IPD, NIPD, UIVW, REM, MIVW, MR, AWE, TS.]

Performance regarding estimation and testing of GEI was relatively unchanged (also noted in VanderWeele et al. 2013)


Methods

Non-linear interaction

Figure: Non-linear G-E interaction model and the covariate heterogeneity of E (sigmoid curve: true association $\beta_G(E) = 2\exp(E-50)/\{1+\exp(E-50)\} + 2$)

[Figure: boxplots of E in each study (covariate heterogeneity of E across studies), overlaid with the curve of the true non-linear relationship β_G(E); x-axis: value of E, 40 to 60; curve axis: 2 to 4.]

Study   1     2     3     4     5     6     7     8     9     10    11    12    13    14    15    16    17    18    19    20
µ_k     45.6  46.1  46.5  46.9  47.6  47.9  48.4  48.9  49.6  50.1  50.8  50.9  51.6  52.0  52.7  53.0  53.5  53.9  54.6  55.0
σ_Ek    1.8   2.0   1.5   2.0   3.5   1.5   1.7   1.4   2.1   3.5   3.5   1.7   1.9   2.1   3.5   1.9   2.0   1.6   1.7   1.3
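The sigmoid in the figure caption can be evaluated directly, which helps to see where the genetic effect changes fastest. The sketch below is illustrative only: the function names are hypothetical, and reading the derivative of β_G(E) as the local strength of interaction is an assumption rather than something stated on the slide.

```python
import math


def beta_g(e):
    """True genetic effect at exposure e: 2*exp(e-50)/(1+exp(e-50)) + 2."""
    # 2*exp(x)/(1+exp(x)) is rewritten as 2/(1+exp(-x)), an algebraically equal form.
    return 2.0 / (1.0 + math.exp(-(e - 50.0))) + 2.0


def local_interaction(e):
    """Derivative of beta_g with respect to e (assumed measure of local G x E)."""
    s = 1.0 / (1.0 + math.exp(-(e - 50.0)))
    return 2.0 * s * (1.0 - s)


# Study-specific means of E as listed on the slide.
study_means = [45.6, 46.1, 46.5, 46.9, 47.6, 47.9, 48.4, 48.9, 49.6, 50.1,
               50.8, 50.9, 51.6, 52.0, 52.7, 53.0, 53.5, 53.9, 54.6, 55.0]
for k, mu in enumerate(study_means, start=1):
    print(k, round(beta_g(mu), 3), round(local_interaction(mu), 3))
```

The derivative peaks at E = 50 and decays toward both ends of the range, so studies whose exposure distribution sits far from 50 see an almost constant genetic effect.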


Methods

Non-linear interaction: power

[Figure: bar chart of the power to detect GEI in each study, overlaid with the curve of the true value of GEI; x-axis: value of E, 45 to 54; both vertical axes: 0.0 to 0.5. Bars: power. Green curve: value of the true non-linear interaction. Red bars: four cohorts with relatively larger variance of E.]

k       1     2     3     4     5     6     7     8     9     10    11    12    13    14    15    16    17    18    19    20
n_k     800   400   500   200   200   200   400   500   500   2000  500   500   200   400   1000  400   400   500   200   200
σ_Ek    1.8   2.0   1.5   2.0   3.5   1.5   1.7   1.4   2.1   3.5   3.5   1.7   1.9   2.1   3.5   1.9   2.0   1.6   1.7   1.3

Method   Power
IPD      0.96
UIVW     0.68
MIVW     0.96
MR       0.82
AWE      0.96
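To get a feel for how much of the variation in E lies between studies in this design, the sketch below computes a between-study and a within-study sum of squares from the study sizes, means and standard deviations listed on these slides. It assumes the usual ANOVA-style decomposition and treats the listed µ_k and σ_Ek as if they were observed sample summaries; the talk's exact definitions of BSS and WSS are not restated in this part of the deck, so the function `heterogeneity_of_e` is a hypothetical illustration.

```python
def heterogeneity_of_e(n, mu, sigma):
    """Return (BSS, WSS) of E from study sizes, means and standard deviations."""
    total_n = sum(n)
    grand_mean = sum(nk * mk for nk, mk in zip(n, mu)) / total_n
    # Between-study sum of squares: spread of study means around the grand mean.
    bss = sum(nk * (mk - grand_mean) ** 2 for nk, mk in zip(n, mu))
    # Within-study sum of squares: pooled spread of E inside each study.
    wss = sum((nk - 1) * sk ** 2 for nk, sk in zip(n, sigma))
    return bss, wss


n_k = [800, 400, 500, 200, 200, 200, 400, 500, 500, 2000,
       500, 500, 200, 400, 1000, 400, 400, 500, 200, 200]
mu_k = [45.6, 46.1, 46.5, 46.9, 47.6, 47.9, 48.4, 48.9, 49.6, 50.1,
        50.8, 50.9, 51.6, 52.0, 52.7, 53.0, 53.5, 53.9, 54.6, 55.0]
sigma_k = [1.8, 2.0, 1.5, 2.0, 3.5, 1.5, 1.7, 1.4, 2.1, 3.5,
           3.5, 1.7, 1.9, 2.1, 3.5, 1.9, 2.0, 1.6, 1.7, 1.3]

bss, wss = heterogeneity_of_e(n_k, mu_k, sigma_k)
print("BSS / (BSS + WSS) =", round(bss / (bss + wss), 3))
```

A ratio near 1 means most of the information about the interaction comes from differences between study means, while a ratio near 0 means it comes mostly from within-study variation.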


Methods

T2D study: meta-analysis

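Age: $\log(\mathrm{HDL}_{ki}) = \alpha_k + \beta_G \mathrm{SNP}_{ki} + \beta_E \mathrm{Age}_{ki} + \delta\, \mathrm{SNP}_{ki} \times \mathrm{Age}_{ki} + \beta_{z1} \mathrm{BMI}_{ki} + \beta_{z2} \mathrm{Gender}_{ki} + \beta_{z3} \mathrm{T2D} + \varepsilon_{ki}$

BMI: $\log(\mathrm{HDL}_{ki}) = \alpha_k + \beta_G \mathrm{SNP}_{ki} + \beta_E \mathrm{BMI}_{ki} + \delta\, \mathrm{SNP}_{ki} \times \mathrm{BMI}_{ki} + \beta_{z1} \mathrm{Age}_{ki} + \beta_{z2} \mathrm{Gender}_{ki} + \beta_{z3} \mathrm{T2D} + \varepsilon_{ki}$

[Figure: forest plots of the rs1121980 × age interaction (axis −0.010 to 0.010) and the rs1121980 × BMI interaction (axis −0.005 to 0.015). Studies: D2D2007, DIAGEN, DPS, FUSION_FS, FUSION_S2, HUNT, METSIM, TROMSO. Combined estimates: UIVW, REM, MIVW2', MIVW, AWE, AWE2', IPD.]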
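Study-specific interaction estimates like those in the forest plots are typically combined by inverse-variance weighting. The sketch below shows the standard fixed-effect version, assuming that UIVW refers to this kind of pooling of the study-specific estimates of δ; the function name and the numbers are hypothetical and are not the T2D study results.

```python
import math


def inverse_variance_pool(estimates, std_errors):
    """Fixed-effect inverse-variance weighted average and its standard error."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * d for w, d in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Hypothetical study-specific interaction estimates and standard errors.
deltas = [0.0021, 0.0015, 0.0030]
ses = [0.0010, 0.0020, 0.0015]
pooled, pooled_se = inverse_variance_pool(deltas, ses)
print(pooled, pooled_se)
```

More precise studies receive larger weights, so the pooled estimate is pulled toward the studies with the smallest standard errors.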


Methods

T2D study: meta-regression

[Figure: marginal SNP effect (−0.025 to 0.005) plotted against study-specific mean age (years, 40 to 65; labeled study: FUSION_FS) and against study-specific mean BMI (kg/m², 27 to 31; labeled study: DPS).]
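The meta-regression idea pictured here can be sketched as a weighted least-squares fit of study-specific marginal SNP effects on study-specific means of E, with the slope read as the interaction estimate. This is a minimal sketch under the assumption that MR is a fixed-effect meta-regression with inverse-variance weights; the function name and the data are hypothetical.

```python
def meta_regression_slope(marginal_effects, std_errors, mean_e):
    """Weighted least-squares slope of marginal effects on study means of E."""
    w = [1.0 / se ** 2 for se in std_errors]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, mean_e)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, marginal_effects)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, mean_e))
    if sxx == 0:
        raise ValueError("study means of E must differ across studies")
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar)
              for wi, xi, yi in zip(w, mean_e, marginal_effects))
    slope = sxy / sxx
    slope_se = (1.0 / sxx) ** 0.5  # treats the study variances as known
    return slope, slope_se


# Hypothetical marginal SNP effects at three study-specific mean ages.
slope, slope_se = meta_regression_slope(
    marginal_effects=[0.9, 1.1, 1.3], std_errors=[1.0, 1.0, 1.0],
    mean_e=[40.0, 50.0, 60.0])
print(slope, slope_se)
```

The standard error shrinks as the weighted spread of the study means grows, which is why this approach is only informative when E varies substantially between studies.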


Methods

T2D study

• rs1121980 × BMI interaction on log(HDL)

rs1121980 × BMI                                        P-value*
Method   Estimate^a   SE^a    95% CI^a            additive   co-dominant
IPD      1.880        0.679   (0.548, 3.212)      0.006∗     0.013∗
UIVW     1.936        0.700   (0.565, 3.308)      0.006∗     0.018∗
MIVW     1.957        0.698   (0.589, 3.325)      0.005∗     0.017∗
MR       -0.062       3.484   (-6.890, 6.767)     0.986      0.630
AWE      1.859        0.686   (0.514, 3.204)      0.007∗     0.016∗

a SE: standard error (estimates and SEs multiplied by 1000); CI: confidence interval.

• Combining β_E (BMI) with δ_BMI (BMI × SNP): for a 1 kg/m² increase in BMI, HDL level will decrease by (in terms of percentage change $(y_{x+1} - y_x)/y_x \times 100\%$):

    1.73 (95% CI: (1.57, 1.90)) for GG
    1.54 (95% CI: (1.44, 1.64)) for AG or GA
    1.35 (95% CI: (1.17, 1.53)) for AA (minor allele: A)

• Presence of A attenuated the (negative) association between BMI and HDL
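Because the outcome is log(HDL), a one-unit increase in BMI multiplies HDL by exp(β_E + δ·g), where g is taken to be the number of A alleles under additive coding. The sketch below works through that arithmetic. It uses the IPD interaction estimate from the table (1.880 × 10⁻³); the value of β_E is not reported on the slide, so it is backed out from the GG figure purely for illustration, and the function name is hypothetical.

```python
import math


def percent_decrease_in_hdl(beta_e, delta, n_minor_alleles):
    """Percentage decrease in HDL per 1 kg/m^2 BMI increase under a log-linear model."""
    slope = beta_e + delta * n_minor_alleles
    # (y_{x+1} - y_x) / y_x * 100% equals (exp(slope) - 1) * 100%; negate to report a decrease.
    return -(math.exp(slope) - 1.0) * 100.0


delta = 1.880e-3                 # IPD estimate from the table (shown there multiplied by 1000)
beta_e = math.log(1.0 - 0.0173)  # assumption: implied by the 1.73% decrease for GG

for genotype, a_count in (("GG", 0), ("AG or GA", 1), ("AA", 2)):
    print(genotype, round(percent_decrease_in_hdl(beta_e, delta, a_count), 2))
```

Each additional A allele shifts the BMI slope toward zero by δ, which is the attenuation described in the last bullet; the values for AG and AA come out close to, though not exactly equal to, the reported 1.54 and 1.35.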


Conclusion

Summary

• Adaptively weighted estimator:

    Instead of a discrete choice, a weighted combination of UIVW and MR

    Captures precision trade-offs in terms of covariate heterogeneity in E

    Uses only univariate summary statistics

    Test based on AWE valid under fixed effects model

    Efficiency comes at a cost of potential bias, but well-controlled trade-off.


Conclusion

Big Picture Question: Where have all the GEI gone?

When will we ever learn?

I will keep exploring in my yellow submarine

Thank you for listening!


Conclusion

Heterogeneity in E exists!

Example courtesy: Nilanjan Chatterjee, NCI

Obesity, 2012


Conclusion

India Health Study

Trivandrum

New Delhi


Conclusion

Participant Characteristics by Region

Characteristic                           New Delhi       Trivandrum
Total (n=1,313)                          n=619           n=694
Age, years (mean, SD)                    47.4 ± 10.0     48.8 ± 9.2
Household monthly income, %
  <5,000 rupees                          7.1             71.9
  >10,000 rupees                         76.7            3.1
Household items, %
  Car                                    25              7
  Refrigerator                           87              58
  Washing machine                        79              14
Total physical activity, MET-hr/wk       42.5 ± 43.8     147.3 ± 85.2
Vigorous physical activity, MET-hr/wk    0.6 ± 6.8       26.2 ± 51.4
Sitting, hr/day                          10.4 ± 2.0      5.0 ± 2.3
Centrally obese, %                       82.1            60.2


Conclusion

Association of FTO rs3751812 with Waist Circumference

Characteristic            N       Effect size per T allele (95% CI)   Ptrend
Overall                   1,209   +1.61 cm (0.67, 2.55)               0.0008
New Delhi
  Overall                 578     +2.53 cm (1.08, 3.97)               0.0006
  By PA
    < 91 MET-hrs/wk       517     +2.36 cm (0.82, 3.89)               0.003
    92-151 MET-hrs/wk     32      +6.39 cm (1.94, 10.85)              0.005
    152-217 MET-hrs/wk    24      -0.95 cm (-7.33, 5.42)              0.77
    218+ MET-hrs/wk       5       N/A                                 N/A
Trivandrum
  Overall                 574     +0.87 cm (-0.35, 2.08)              0.16
  By PA
    < 91 MET-hrs/wk       170     +3.50 cm (0.90, 6.10)               0.008
    92-151 MET-hrs/wk     132     +1.13 cm (-1.08, 3.33)              0.32
    152-217 MET-hrs/wk    141     +1.04 cm (-1.63, 3.70)              0.45
    218+ MET-hrs/wk       131     -2.32 cm (-4.82, 0.18)              0.07

Interaction by PA (P-values, in the order extracted): 0.009, 0.59, 0.004


Conclusion

PLoS Medicine, 2011

PLoS Genetics, 2013

Note the much larger sample sizes
