A model for interpretable high dimensional interactions

A Model for Interpretable High Dimensional

Interactions

Sahir Rai Bhatnagar

Joint work with Yi Yang, Mathieu Blanchette and Celia Greenwood

Poster Number 67

Motivation

one predictor variable at a time

Predictor Variable Phenotype

Test 1

Test 2

Test 3

Test 4

Test 5

one predictor variable at a time

Test 1

Test 2

Test 3

Test 4

Test 5

a network based view

Test 1

system level changes due to environment

Predictor Variable PhenotypeEnvironment

Test 1

system level changes due to environment

Predictor Variable PhenotypeEnvironment

Test 1

Motivating Dataset: Newborn epigenetic adaptations to gesta-

tional diabetes exposure (Luigi Bouchard, Sherbrooke)

Environment

Gestational

Diabetes

Large Data

Child’s epigenome

(p ≈ 450k)

Phenotype

Obesity measures

Differential Correlation between environments

(a) Gestational diabetes affected pregnancy (b) Controls

formal statement of initial problem

• n: number of subjects

• p: number of predictor variables

• Xn×p: high dimensional data set (p >> n)

• Yn×1: phenotype

• En×1: environmental factor that has widespread effect on X and can

modify the relation between X and Y

Objective

• Which elements of X that are associated with Y , depend on E?

Objective

Methods

ECLUST - our proposed method: 3 phases

Original Data

1) Gene Similarity

2) Cluster

Representation

n × 1 n × 1

3) Penalized

Regression

Yn×1∼ + ×E

Original Data

1) Gene Similarity

2) Cluster

Representation

n × 1 n × 1

3) Penalized

Regression

Yn×1∼ + ×E

Original Data

1) Gene Similarity

2) Cluster

Representation

n × 1 n × 1

3) Penalized

Regression

Yn×1∼ + ×E

Original Data

1) Gene Similarity

2) Cluster

Representation

n × 1 n × 1

3) Penalized

Regression

Yn×1∼ + ×E

Original Data

1) Gene Similarity

2) Cluster

Representation

n × 1 n × 1

3) Penalized

Regression

Yn×1∼ + ×E

Original Data

1) Gene Similarity

2) Cluster

Representation

n × 1 n × 1

3) Penalized

Regression

Yn×1∼ + ×E

the objective of statistical

methods is the reduction of data.

A quantity of data . . . is to be

replaced by relatively few quantities

which shall adequately represent

. . . the relevant information

contained in the original data.

- Sir R. A. Fisher, 1922

g(µ) =β0 + β1X1 + · · ·+ βpXp + βEE︸︷︷︸main effects

+α1E (X1E ) + · · ·+ αpE (XpE )︸︷︷︸interactions

Reparametrization1: αjE = γjEβjβE .

Strong heredity principle2:

αjE 6= 0 ⇒ βj 6= 0 and βE 6= 0

1Choi et al. 2010, JASA2Chipman 1996, Canadian Journal of Statistics

αjE 6= 0 ⇒ βj 6= 0 and βE 6= 0

Strong Heredity Model with Penalization

arg minβ0,β,γ

2‖Y − g(µ)‖2 +

λβ (w1β1 + · · ·+ wqβq + wEβE ) +

λγ (w1Eγ1E + · · ·+ wqEγqE )

∣∣∣∣∣ 1

∣∣∣∣∣ , wjE =

∣∣∣∣∣ βj βEαjE

∣∣∣∣∣

Results

Simulation Study: Jaccard Index and test set MSE

Open source software

• Software implementation in R: http://sahirbhatnagar.com/eclust/

• Allows user specified interaction terms

• Automatically determines the optimal tuning parameters through

cross validation

• Can also be applied to genetic data

Conclusions

Conclusions and Contributions

• Large system-wide changes are observed in many environments

• This assumption can possibly be exploited to aid analysis of large

• We develop and implement a multivariate penalization procedure for

predicting a continuous or binary disease outcome while detecting

interactions between high dimensional data (p >> n) and an

environmental factor.

• Dimension reduction is achieved through leveraging the

environmental-class-conditional correlations

• Also, we develop and implement a strong heredity framework

within the penalized model

• R software: http://sahirbhatnagar.com/eclust/

Limitations

• There must be a high-dimensional signature of the exposure

• Clustering is unsupervised

• Two tuning parameters

• Need more samples . . . Got data? (Poster 67)

Limitations

acknowledgements

• Dr. Celia Greenwood

• Dr. Blanchette and Dr. Yang

• Dr. Luigi Bouchard, Andre Anne

• Dr. Steele, Dr. Kramer,

Dr. Abrahamowicz

• Maxime Turgeon, Kevin

McGregor, Lauren Mokry,

Dr. Forest

• Greg Voisin, Dr. Forgetta,

Dr. Klein

• Mothers and children from the

A model for interpretable high dimensional interactions

Science

Exploring atom-photon interactions in one dimensional ... · Exploring atom-photon interactions in one dimensional waveguide and its applications Bong Kok Wei Supervised by Professor

Modular assembly of low-dimensional coordination ... · interactions between the adsorbates there can be strong and irreversible interactions with the surface, e.g. surface alloying

Interpretable Linear Modeling - Piazza

Interpretable machine learning

High Dimensional Forecasting via Interpretable Vector ... · High Dimensional Forecasting via Interpretable Vector Autoregression William B. Nicholsony, Ines Wilms z, Jacob Bien x,

Interpretable Biomanufacturing Process Risk and

Interpretable and Globally Optimal Prediction for Textual ...papers.nips.cc/paper/6787-interpretable-and... · Interpretable and Globally Optimal Prediction for Textual Grounding

GIF: Generative Interpretable Faces

Interpretable Intuitive Physics Model - openaccess.thecvf.comopenaccess.thecvf.com/content_ECCV_2018/papers/Tian_Ye_Interpretable... · Interpretable Intuitive Physics Model 3 unseen

Analysis of Evolutionary Algorithms on the One-Dimensional Spin Glass with Power-Law Interactions

High Dimensional Forecasting via Interpretable Vector ... · approaches, such as the lasso in VAR estimation. Traditional approaches address overpa-rameterization by selecting a low

Two-dimensional conductors with interactions and disorder ... · PDF filearXiv:1709.07005v1 [cond-mat.str-el] 20 Sep 2017 Two-dimensional conductors with interactions and disorder

Logic Regression and Interactions in High Dimensional ...iruczins/presentations/ruczinski.03.03.jhu... · Logic Regression and Interactions in High Dimensional Genomic Data ... STAGE

Seeking Interpretable Models for High Dimensional Data

Towards Interpretable Face Recognition

High-Dimensional Interactions Detection with Sparse ... · There also has been active development on methods for detecting and estimating the interactions in high-dimensional regression

orthogonal non-covalent interactions Three-dimensional ... · Supporting Information of Three-dimensional protein assembly directed by orthogonal non-covalent interactions Guang Yanga,

Seeking Interpretable Models for High Dimensional Data Bin Yu Statistics Department, EECS Department University of California, Berkeley binyu

Methods for High Dimensional Interactions

Collective phenomena in a quasi-two-dimensional system of ...cmt.harvard.edu/demler/PUBLICATIONS/ref154.pdfphysics of quasi-two-dimensional fermions with anisotropic interactions