
Conditional Dependence Estimation via Shannon Capacity: Axioms, Estimators and Applications

Pramod Viswanath

University of Illinois at Urbana-Champaign

09/23/2016


Motivation

A measure $\tau(P_{X,Y})$ captures how $X$ influences $Y$.

Choices of $\tau(P_{X,Y})$:

- Linear: Pearson correlation coefficient $\rho_{X,Y} = \dfrac{\mathrm{Cov}(X,Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}$.

- Nonlinear: mutual information $I(X;Y) = \mathbb{E}\left[\log \dfrac{p_{XY}}{p_X\, p_Y}\right]$.

Both can fail: Pearson correlation misses nonlinear dependence, and mutual information can understate a strong but rarely exercised influence.

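To make the failure modes concrete, here is a minimal sketch (my own illustration, not from the slides; it assumes numpy is available): $Y = X^2$ is a deterministic function of $X$, yet the Pearson correlation is essentially zero, while even a crude histogram plug-in estimate of the mutual information is clearly positive.

```python
import numpy as np

# A deterministic, hence maximally dependent, relationship that Pearson
# correlation misses entirely: Y = X^2 with X symmetric about zero.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100_000)
y = x ** 2

rho = np.corrcoef(x, y)[0, 1]
print(f"Pearson correlation: {rho:.4f}")   # ~0, since Cov(X, X^2) = E[X^3] = 0

# Crude plug-in mutual information estimate from a 2-D histogram:
pxy, _, _ = np.histogram2d(x, y, bins=30)
pxy /= pxy.sum()
px = pxy.sum(axis=1, keepdims=True)        # marginal of X
py = pxy.sum(axis=0, keepdims=True)        # marginal of Y
nz = pxy > 0                               # skip 0 * log 0 terms
mi = np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz]))
print(f"Plug-in MI estimate: {mi:.3f} nats")  # clearly positive
```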

Motivation

Let $X, Y$ be binary random variables, with $X$ denoting smoking and $Y$ denoting lung cancer; $X \to Y$.

Hypothetical city: $P_X(1)$ is small (few smokers), but $P_{Y|X}(1 \mid 1)$ is large (smoking strongly induces cancer).

The mutual information $I(X;Y)$ is then very small.

A measure that depends on the joint distribution $P_{X,Y}$ is a factual influence measure.

A measure that depends only on the conditional distribution $P_{Y|X}$ is a potential influence measure.

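A quick numeric check of this slide's point, using hypothetical numbers of my own choosing (the slides give none): the channel $P_{Y|X}$ is strong, but because $P_X(1)$ is tiny the mutual information under the factual input is nearly zero, whereas the same channel driven by a uniform input carries substantial information.

```python
import numpy as np

def binary_entropy(p):
    """Entropy (nats) of a Bernoulli(p) variable."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

def mi_binary(px1, py1_x0, py1_x1):
    """I(X;Y) = H(Y) - H(Y|X) for binary X, Y, with
    px1 = P(X=1), py1_x0 = P(Y=1|X=0), py1_x1 = P(Y=1|X=1)."""
    py1 = (1 - px1) * py1_x0 + px1 * py1_x1
    h_y_given_x = (1 - px1) * binary_entropy(py1_x0) + px1 * binary_entropy(py1_x1)
    return binary_entropy(py1) - h_y_given_x

# Hypothetical city: smoking is rare, but the channel P_{Y|X} is strong.
print(mi_binary(0.001, 0.05, 0.90))  # factual MI: ~0.002 nats, nearly zero
# The same channel driven by a uniform input (this is exactly UMI):
print(mi_binary(0.5,   0.05, 0.90))  # ~0.43 nats: large potential influence
```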

Potential Influence Measures

Focus of this talk: Potential influence measures.

Let the channel from $X$ to $Y$ be given as $P_{Y|X}$.

A potential influence measure is a map on the space of conditional distributions: $\tau : P_{Y|X} \mapsto \mathbb{R}_+$.


UMI: Uniform Mutual Information

$\mathrm{UMI} \triangleq I(U_X P_{Y|X})$.

$X \sim U_X \longrightarrow P_{Y|X} \longrightarrow Y$.

Pros: Potential influence measure.

Cons: Requires support of X to be discrete or compact.


CMI: Capacitated Mutual Information

Shannon capacity: $\mathrm{CMI} \triangleq \max_{Q_X} I(Q_X P_{Y|X})$.

$X \sim Q_X \longrightarrow P_{Y|X} \longrightarrow Y$.

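When the input and output alphabets are finite and $P_{Y|X}$ is known exactly, the capacity in this definition can be computed with the classical Blahut-Arimoto iterations. The sketch below is a textbook implementation of that standard algorithm, not an estimator from this talk:

```python
import numpy as np

def blahut_arimoto(P, n_iter=200):
    """Capacity (in nats) of a discrete memoryless channel given as a
    row-stochastic matrix P[x, y] = P(Y=y | X=x), via Blahut-Arimoto."""
    n_x = P.shape[0]
    q = np.full(n_x, 1.0 / n_x)               # start from the uniform input
    for _ in range(n_iter):
        p_y = q @ P                            # output marginal under current q
        ratio = np.where(P > 0, P / p_y, 1.0)  # avoid log(0); 0 log 0 := 0
        d = np.sum(P * np.log(ratio), axis=1)  # D(P(.|x) || P_Y) per input x
        q *= np.exp(d)                         # multiplicative update
        q /= q.sum()                           # renormalize onto the simplex
    p_y = q @ P
    ratio = np.where(P > 0, P / p_y, 1.0)
    return float(np.sum(q * np.sum(P * np.log(ratio), axis=1)))

# Channel with the hypothetical smoking numbers used earlier:
P = np.array([[0.95, 0.05],   # P(Y=. | X=0)
              [0.10, 0.90]])  # P(Y=. | X=1)
print(blahut_arimoto(P))      # large, even though I(X;Y) under P_X was tiny
```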

Estimators

Question: How do we estimate UMI and CMI from samples of $(X, Y)$?

Real-valued, high-dimensional $(X, Y)$.

Key issue: samples are drawn from $P_X P_{Y|X}$, but we need to estimate MI under $U_X P_{Y|X}$.

For CMI: need to do optimization and estimation jointly.


Possible Approaches

Joint kernel density estimation for $P_{X,Y}$:

- Needs numerical integration.

- May be overkill.

Discretization-based algorithms:

- May be sensitive to the choice of discretization.


Mutual Information Estimators

$I(X;Y) = H(X) + H(Y) - H(X,Y)$.

Pros: We have good entropy estimators.

- Nearest-neighbor approach.

Cons: Not adaptable to UMI and CMI.

We want MI estimators that can be adapted to UMI and CMI.

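For reference, a minimal sketch of the 3H approach with a Kozachenko-Leonenko-style nearest-neighbor entropy estimator (my own illustrative implementation, assuming scipy is available; the talk does not prescribe this exact form):

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma

def kl_entropy(x, k=5):
    """Kozachenko-Leonenko differential entropy estimate (nats) from
    samples x of shape (N, d), using k-nearest-neighbor l_inf distances."""
    n, d = x.shape
    # query returns the point itself as its own nearest neighbor, hence k + 1
    r = cKDTree(x).query(x, k=k + 1, p=np.inf)[0][:, -1]
    # For the l_inf norm a ball of radius r has volume (2r)^d.
    return digamma(n) - digamma(k) + d * np.mean(np.log(2 * r + 1e-15))

def mi_3h(x, y, k=5):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), each entropy estimated separately."""
    return kl_entropy(x, k) + kl_entropy(y, k) - kl_entropy(np.hstack([x, y]), k)

rng = np.random.default_rng(1)
x = rng.normal(size=(5000, 1))
y = x + 0.5 * rng.normal(size=(5000, 1))
print(mi_3h(x, y))   # true value: 0.5 * log(5) ~ 0.80 nats
```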

Adapting to UMI and CMI

Inspired by the work of [Kraskov, Stögbauer and Grassberger, 2004]:

$$\hat I_{\mathrm{KSG}}(X;Y) = \log\Big(k - \frac{1}{2}\Big) + \log(N) - \frac{1}{N}\sum_{i=1}^{N}\Big(\log(n_{x,i}) + \log(n_{y,i})\Big).$$

$n_{x,i}$: number of samples within $\ell_\infty$ distance $\rho_{k,i}$ (the distance from sample $i$ to its $k$-th nearest neighbor) in the $X$-dimension.

Importance sampling:

- Compute a functional of $Q_X P_{Y|X}$ while the samples come from $P_X P_{Y|X}$.

- Weights: $w_i = Q_X(X_i)/P_X(X_i)$.


Adapting to UMI and CMI

Inspired by the work of [Kraskov, Stögbauer and Grassberger, 2004]:

$$\hat I^{(w)}_{\mathrm{KSG}}(X;Y) = \log\Big(k - \frac{1}{2}\Big) + \log(N) - \frac{1}{N}\sum_{i=1}^{N} w_i \Big(\log(n_{x,i}) + \log(n_{y,i})\Big).$$

$n_{x,i}$: number of samples within $\ell_\infty$ distance $\rho_{k,i}$ (the distance from sample $i$ to its $k$-th nearest neighbor) in the $X$-dimension.

Importance sampling:

- Compute a functional of $Q_X P_{Y|X}$ while the samples come from $P_X P_{Y|X}$.

- Weights: $w_i = Q_X(X_i)/P_X(X_i)$.


Adapting to UMI and CMI

Inspired by the work of [Kraskov, Stögbauer and Grassberger, 2004]:

$$\hat I^{(w)}_{\mathrm{KSG}}(X;Y) = \log\Big(k - \frac{1}{2}\Big) + \log(N) - \frac{1}{N}\sum_{i=1}^{N} w_i \Big(\log(n_{x,i}) + \log(n_{y,i})\Big).$$

$n_{x,i}$: weighted number of samples within $\ell_\infty$ distance $\rho_{k,i}$ of sample $i$ in the $X$-dimension.

Importance sampling:

- Compute a functional of $Q_X P_{Y|X}$ while the samples come from $P_X P_{Y|X}$.

- Weights: $w_i = Q_X(X_i)/P_X(X_i)$.

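Putting the last few slides together, here is a rough sketch of the weighted KSG formula as displayed above (an illustration under my own reading of the slides: it uses the plain counts of the earlier slide, not this slide's weighted counts, and the paper's actual estimator may differ in bias corrections):

```python
import numpy as np
from scipy.spatial import cKDTree

def weighted_ksg(x, y, w, k=5):
    """Sketch of the importance-weighted KSG estimate
        log(k - 1/2) + log(N) - (1/N) * sum_i w_i (log n_x,i + log n_y,i)
    for x, y of shape (N, d_x), (N, d_y) and weights w_i = Q_X(X_i)/P_X(X_i).
    With w = np.ones(N) this is the plain (unweighted) version."""
    n = len(x)
    xy = np.hstack([x, y])
    # rho_i: l_inf distance from sample i to its k-th nearest neighbor in (X,Y).
    rho = cKDTree(xy).query(xy, k=k + 1, p=np.inf)[0][:, -1]
    tx, ty = cKDTree(x), cKDTree(y)
    # n_x,i / n_y,i: samples within rho_i of sample i in each marginal space
    # (minus one to exclude the point itself).
    n_x = np.array([len(tx.query_ball_point(x[i], rho[i], p=np.inf)) - 1
                    for i in range(n)])
    n_y = np.array([len(ty.query_ball_point(y[i], rho[i], p=np.inf)) - 1
                    for i in range(n)])
    return (np.log(k - 0.5) + np.log(n)
            - np.mean(w * (np.log(n_x) + np.log(n_y))))
```

With $w_i \equiv 1$ this reduces to the unweighted formula; for UMI one would take $w_i \propto 1/\hat p_X(X_i)$ from a density estimate of $P_X$, as the next slide notes.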

Adapting to UMI and CMI

UMI: let $Q_X = U_X$; the weights are then inversely proportional to a density estimate of $P_X$.

Optimize over weights for CMI:

$$\hat I_{\mathrm{CMI}}(X;Y) = \max_{w:\ \sum_{i=1}^{N} w_i = 1} \hat I^{(w)}_{\mathrm{KSG}}(X;Y).$$

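One subtlety: for fixed counts $n_{x,i}$, $n_{y,i}$, the objective above is linear in $w$, so maximizing over the bare simplex would put all weight on a single sample; in the full estimator the previous slide makes the counts themselves weighted, so the true objective is not linear. Purely to illustrate the "optimize and estimate jointly" idea, the sketch below treats the counts as fixed and adds an entropy regularizer (my own assumption, not from the slides) that keeps $w$ spread across samples:

```python
import numpy as np

def optimize_weights(c, lam=0.05, n_iter=2000, lr=0.5):
    """Maximize -(1/N) * (w @ c) over the simplex {w >= 0, sum_i w_i = 1},
    where c is an np.array of per-sample terms c_i = log(n_x,i) + log(n_y,i).
    The entropy term lam * H(w) is an illustrative assumption (not from the
    slides) that avoids the degenerate vertex solution of a linear objective."""
    n = len(c)
    theta = np.zeros(n)                         # w = softmax(theta): on the simplex
    for _ in range(n_iter):
        w = np.exp(theta - theta.max())
        w /= w.sum()
        g_w = -c / n - lam * (np.log(w) + 1.0)  # d/dw of objective + lam * H(w)
        g_theta = w * (g_w - w @ g_w)           # chain rule through the softmax
        theta += lr * g_theta                   # plain gradient ascent
    w = np.exp(theta - theta.max())
    return w / w.sum()
```

Here $c_i = \log(n_{x,i}) + \log(n_{y,i})$ comes from the weighted-KSG sketch above, and the optimized $w$ is plugged back into $\hat I^{(w)}_{\mathrm{KSG}}$.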

Application

Single Cell Flow Cytometry [Krishnaswamy et al., Science 2014].


Single Cell Cytometry Application

Single Cell Flow Cytometry:

Succeeds if $\mathrm{Peak}(X \to Y) \le \mathrm{Peak}(Y \to Z) \le \mathrm{Peak}(Z \to W)$.


Single Cell Cytometry Application

Significant reduction in sample complexity.

Figure: Figure 6(a,b) of [Krishnaswamy et al., Science 2014].


Axiomatic View of potential influence measure

Why choose UMI and CMI as potential influence measures?

(0) Independence.

(1) Data Processing.

(2) Additivity.

(3) Monotonicity: $\tau$ is a monotonic function of the range, where $\mathrm{Range} = \mathrm{ConvexHull}\big(\{P_{Y|X=x}\}_x\big)$.

[Diagram: the channel $P_{Y|X}$ maps $S_X$, the simplex of distributions on $X$, into $S_Y$, the simplex of distributions on $Y$, carrying $P_X$ to $P_Y$.]


Axiomatic View of influence measure τ

UMI satisfies the independence, data-processing, and additivity axioms.

UMI is NOT monotonic.

CMI satisfies all the axioms.

Other measures?


Summary

Novel potential measures of the influence of $X$ on $Y$:

- UMI, CMI.

Estimators of UMI and CMI from samples:

- Nearest-neighbor methods.

- KSG mutual information estimator.

- Importance sampling.


Collaborators

Weihao Gao, Sreeram Kannan, Sewoong Oh
