92
Learning Statistical Property Testers Sreeram Kannan University of Washington Seattle

Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

Learning Statistical Property Testers

Sreeram KannanUniversity of Washington Seattle

Page 2: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

Collaborators

ArmanRahimzamani

HimanshuAsnani

University of Washington, Seattle

Sudipto Mukherjee

Rajat Sen

KarthikeyanShanmugan

UT, Austin IBM Research

Page 3: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

Statistical Property Testing

✤ Closeness testing

✤ Independence testing

✤ Conditional Independence testing

✤ Information estimation

Page 4: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

255075

100

Testing Total Variation Distance

255075

100

QP

Page 5: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

255075

100

Testing Total Variation Distance

255075

100

QP

n samples n samples

Page 6: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

255075

100

255075

100

QP

n samples n samples

Estimate DTV (P,Q) ?

Testing Total Variation Distance

Page 7: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

255075

100

255075

100

QP

n samples n samples

Search beyond Traditional Density Estimation Methods

Estimate DTV (P,Q) ?

Testing Total Variation Distance

P and Q can be arbitrary.

Page 8: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

Testing Total Variation: Prior Art

✤ Lots of work in CS theory on DTV testing

✤ Based on closeness testing between P and Q

✤ Sample complexity = O(na), where n = alphabet size

✤ Curse of dimensionality if n = 2d Complexity is O(2ad)

* Chan et al, Optimal Algorithms for testing closeness of discrete distributions, SODA 2014

* Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions, NIPS 2009

Page 9: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

Classifiers beat curse-of-dimensionality

✤ Deep NN and boosted random forests achieve state-of-the-art performance

✤ Works very well even in practice when X is high dimensional.

✤ Exploits generic inductive bias:

✤ Invariance

✤ Hierarchical Structure

✤ Symmetry

Theoretical guarantees lag severely behind practice!

Page 10: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

255075

100

Distance Estimation via Classification

255075

100

n samples ⇠ P n samples ⇠ Q

Page 11: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

255075

100

Distance Estimation via Classification

255075

100

n samples ⇠ P n samples ⇠ Q

(Label 0) (Label 1)

Classifier

Page 12: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

255075

100

Distance Estimation via Classification

255075

100

n samples ⇠ P n samples ⇠ Q

(Label 0) (Label 1)

Classifier

12 � 1

2DTV(P,Q).Classification Error of Optimal Bayes

Classifier=

Page 13: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

255075

100

Distance Estimation via Classification

255075

100

n samples ⇠ P n samples ⇠ Q

(Label 0) (Label 1)

Deep NN, Boosted Trees etc.

12 � 1

2DTV(P,Q).Classification Error of

Optimal Classifier =

* Lopez-Paz et al, Revisiting Classifier two-sample tests, ICLR 2017

* Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions, NIPS 2009

Page 14: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

255075

100

Distance Estimation via Classification

255075

100

n samples ⇠ P n samples ⇠ Q

(Label 0) (Label 1)

Deep NN, Boosted Trees etc.

12 � 1

2DTV(P,Q).Classification Error of

Any Classifier >=

* Lopez-Paz et al, Revisiting Classifier two-sample tests, ICLR 2017

* Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions, NIPS 2009

Can getP-valuecontrol

Page 15: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

Independence Testing

n samples {xi, yi}ni=1

* Lopez-Paz et al, Revisiting Classifier two-sample tests, ICLR 2017

* Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions, NIPS 2009

Page 16: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

Independence Testing

n samples {xi, yi}ni=1

nH0 : X || Y (PCI)

H1 : X 6?? Y (P)

Page 17: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

Independence Testing

n samples {xi, yi}ni=1

nH0 : X || Y (PCI)

H1 : X 6?? Y (P)

Classify

P(p(x, y))

PCI(p(x)p(y))

Page 18: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

Independence Testing

n samples {xi, yi}ni=1

nH0 : X || Y (PCI)

H1 : X 6?? Y (P)

Classify

P(p(x, y))

PCI(p(x)p(y))PCI(p(x)p(y))

Page 19: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

Independence Testing

n samples {xi, yi}ni=1

nH0 : X || Y (PCI)

H1 : X 6?? Y (P)

Classify

P(p(x, y))

PCI(p(x)p(y))

Permutation

PCI(p(x)p(y))

Page 20: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

Independence Testingn samples {xi, yi}ni=1

Split Equally

Page 21: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

Independence Testingn samples {xi, yi}ni=1

P(p(x, y))

Split Equally

Page 22: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

Independence Testingn samples {xi, yi}ni=1

P(p(x, y))

Split Equally

Label 0

Page 23: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

Independence Testingn samples {xi, yi}ni=1

P(p(x, y))

Split Equally

Label 0 yi’s are permuted

Page 24: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

Independence Testingn samples {xi, yi}ni=1

P(p(x, y))

Split Equally

Label 0 yi’s are permuted

PCI(p(x)p(y))

Page 25: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

Independence Testingn samples {xi, yi}ni=1

P(p(x, y))

Split Equally

Label 0 yi’s are permuted

PCI(p(x)p(y))

Label 1

Page 26: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

Independence Testingn samples {xi, yi}ni=1

P(p(x, y))

Split Equally

Label 0 yi’s are permuted

PCI(p(x)p(y))

Label 1

*Lopez-Paz et al, Revisiting Classifier two-sample tests, ICLR 2017

* Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions, NIPS 2009

P-value control

Page 27: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

Conditional Independence Testing

n samples {xi, yi, zi}ni=1

n H0 : X || Y |Z (PCI)

H1 : X 6?? Y |Z (P)

vs

Page 28: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

Conditional Independence Testing

n samples {xi, yi, zi}ni=1

n H0 : X || Y |Z (PCI)

H1 : X 6?? Y |Z (P)

Classify

vs

P(p(x, y, z))

PCI(p(z)p(x|z)p(y|z))

Page 29: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

Conditional Independence Testing

n samples {xi, yi, zi}ni=1

n H0 : X || Y |Z (PCI)

H1 : X 6?? Y |Z (P)

Classify

vs

P(p(x, y, z))

PCI(p(z)p(x|z)p(y|z))How to get PCI(p(z)p(x|z)p(y|z)?

Page 30: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

Conditional Independence Testing

n samples {xi, yi, zi}ni=1

n H0 : X || Y |Z (PCI)

H1 : X 6?? Y |Z (P)

Classify

vs

P(p(x, y, z))

PCI(p(z)p(x|z)p(y|z))Given samples ⇠ p(x, z)How to emulate p(y|z)?

Page 31: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

Conditional Independence Testing

n samples {xi, yi, zi}ni=1

n H0 : X || Y |Z (PCI)

H1 : X 6?? Y |Z (P)

Classify

vs

✤ KNN Based Methods

✤ Kernel Methods

P(p(x, y, z))

PCI(p(z)p(x|z)p(y|z))

Emulate p(y|z) as q(y|z)

Page 32: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

Conditional Independence Testing

n samples {xi, yi, zi}ni=1

n H0 : X || Y |Z (PCI)

H1 : X 6?? Y |Z (P)

Classify

vs

P(p(x, y, z))

PCI(p(z)p(x|z)q(y|z))

PCI(p(z)p(x|z)q(y|z))

Emulate p(y|z) as q(y|z)✤ KNN Based

Methods

✤ Kernel Methods

Page 33: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

Conditional Independence Testing

n samples {xi, yi, zi}ni=1

n H0 : X || Y |Z (PCI)

H1 : X 6?? Y |Z (P)

Classify

vs

P(p(x, y, z))

PCI(p(z)p(x|z)q(y|z))

PCI(p(z)p(x|z)q(y|z))

Emulate p(y|z) as q(y|z)✤ KNN Based

Methods

✤ Kernel Methods

✤ [KCIT] Gretton et al, Kernel-based conditional independence test and application in causal discovery, NIPS 2008

✤ [KCIPT] Doran et al, A permutation-based kernel conditional independence test, UAI 2014

✤ [CCIT] Sen et al, Model-Powered Conditional Independence Test, NIPS 2017

✤ [RCIT] Strobl et al, Approximate Kernel-based Conditional Independence Tests for Fast Non-Parametric Causal Discovery, arXiv

Page 34: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

Conditional Independence Testing

n samples {xi, yi, zi}ni=1

n H0 : X || Y |Z (PCI)

H1 : X 6?? Y |Z (P)

Classify

vs

P(p(x, y, z))

PCI(p(z)p(x|z)q(y|z))

PCI(p(z)p(x|z)q(y|z))

Emulate p(y|z) as q(y|z)✤ KNN Based

Methods

✤ Kernel Methods

✤ Limited to low-dimensional Z.

In practice, Z is often high dimensional.

(Eg. In graphical model, conditioning set can be entire graph.)

Page 35: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

Generative Models beat curse-of-dimensionality

Generatorz xLow-dimensional

Latent SpaceHigh-dimensional

data Space

Page 36: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

Generative Models beat curse-of-dimensionality

Generatorz x

✤ Trained Real Samples of x

✤ Can generate any number of new samples

Low-dimensionalLatent Space

High-dimensionaldata Space

Page 37: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

Generative Models beat curse-of-dimensionality

Generatorz x

✤ Trained Real Samples of x

✤ Can generate any number of new samples

Low-dimensionalLatent Space

High-dimensionaldata Space

Page 38: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

How loose can the estimate be for PCI or q(y|z)?

Page 39: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

As long as the density function q(y|z) > 0 whenever p(y, z) > 0.

How loose can the estimate be for PCI or q(y|z)?

Mimic-and-Classify works

Page 40: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

As long as the density function q(y|z) > 0 whenever p(y, z) > 0.

Mimic Functions : GANs, Regressors etc.

How loose can the estimate be for PCI or q(y|z)?

Novel Bias Cancellation Method in Mimic-and-Classify works

Page 41: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

Mimic and ClassifyMimic Step

Classify Step

Page 42: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

Mimic and Classify50

100Mimic Step

Classify Step

D ⇠ p(x, y, z)

Page 43: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

Mimic and Classify50

100

50100

50100

Mimic Step

Classify Step

D1 ⇠ p(x, y, z)

D2 ⇠ p(x, y, z)D ⇠ p(x, y, z)

Page 44: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

Mimic and Classify50

100

50100

50100

Mimic Step

Classify Step

D1 ⇠ p(x, y, z)

D2 ⇠ p(x, y, z)D ⇠ p(x, y, z)

Dataset D2

(xi, yi, zi) zi y’i (xi, y’i, zi)

Dataset D’MIMIC

Page 45: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

Mimic and Classify50

100

MIMIC

50100

50100

50100

Mimic Step

Classify Step

D1 ⇠ p(x, y, z)

D2 ⇠ p(x, y, z)D ⇠ p(x, y, z)

D

0 ⇠ p(z)p(x|z)q(y|z)

Page 46: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

Mimic and Classify50

100

MIMIC

50100

50100

50100

(Label 0) (Label 1)

Mimic Step

Classify Step

D1 ⇠ p(x, y, z)

D2 ⇠ p(x, y, z)D ⇠ p(x, y, z)

D

0 ⇠ p(z)p(x|z)q(y|z)

Page 47: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

Mimic and Classify50

100

MIMIC

50100

50100

50100

(Label 0) (Label 1)D = D1 [D0

Mimic Step

Classify Step

D1 ⇠ p(x, y, z)

D2 ⇠ p(x, y, z)D ⇠ p(x, y, z)

D

0 ⇠ p(z)p(x|z)q(y|z)

Page 48: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

Mimic and Classify50

100

MIMIC

50100

50100

50100

(Label 0) (Label 1)D = D1 [D0

D

Mimic Step

Classify Step

D1 ⇠ p(x, y, z)

D2 ⇠ p(x, y, z)D ⇠ p(x, y, z)

D

0 ⇠ p(z)p(x|z)q(y|z)

Page 49: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

Mimic and Classify50

100

MIMIC

50100

50100

50100

(Label 0) (Label 1)D = D1 [D0

D

Mimic Step

Classify Step

D1 ⇠ p(x, y, z)

D2 ⇠ p(x, y, z)D ⇠ p(x, y, z)

D

0 ⇠ p(z)p(x|z)q(y|z)

Classification Error : Exyz

Page 50: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

Mimic and Classify50

100

MIMIC

50100

50100

50100

(Label 0) (Label 1)D = D1 [D0

D D�x

Drop x

Mimic Step

Classify Step

D1 ⇠ p(x, y, z)

D2 ⇠ p(x, y, z)D ⇠ p(x, y, z)

D

0 ⇠ p(z)p(x|z)q(y|z)

Classification Error : Exyz

Page 51: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

Mimic and Classify50

100

MIMIC

50100

50100

50100

(Label 0) (Label 1)D = D1 [D0

D D�x

Drop x

Mimic Step

Classify Step

D1 ⇠ p(x, y, z)

D2 ⇠ p(x, y, z)D ⇠ p(x, y, z)

D

0 ⇠ p(z)p(x|z)q(y|z)

Classification Error : Classification Error : Exyz

Eyz

Page 52: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

Mimic and Classify50

100

MIMIC

50100

50100

50100

(Label 0) (Label 1)D = D1 [D0

D D�x

Drop x

Mimic Step

Classify Step

D1 ⇠ p(x, y, z)

D2 ⇠ p(x, y, z)D ⇠ p(x, y, z)

D

0 ⇠ p(z)p(x|z)q(y|z)

Classification Error : Classification Error : Exyz

EyzD(p(xyz)|p(xz)q(y|z)) D(p(yz)|p(z)q(y|z))

Page 53: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

Mimic and Classify50

100

MIMIC

50100

50100

50100

(Label 0) (Label 1)D = D1 [D0

D D�x

Drop x

Mimic Step

Classify Step

D1 ⇠ p(x, y, z)

D2 ⇠ p(x, y, z)D ⇠ p(x, y, z)

D

0 ⇠ p(z)p(x|z)q(y|z)

Classification Error : Classification Error : Exyz

EyzD(p(xyz)|p(xz)q(y|z)) D(p(yz)|p(z)q(y|z))

Statistic = Exyz-Eyz Cancels bias due to q(y|z)

Page 54: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

Mimic and ClassifyMimic Step

As long as the density function q(y|z) > 0 whenever p(y, z) > 0.

Classify Step

Page 55: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

Mimic and ClassifyMimic Step

*The errors here are the corresponding optimal Bayes classifier errors.

As long as the density function q(y|z) > 0 whenever p(y, z) > 0.

Classify Step |E

D

[Exyz

]�ED

[Eyz

]| = 0 $ H0 is true

2|ED[e]� ED[e�x)]|= DTV(p(z,x,y), p(z)q(y|z)p(x|z))�DTV(p(y, z), p(z)q(y|z))2|E

D

[Exyz

]�ED

[Eyz

]|

Page 56: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

Mimic and Classify (Theory)

2|ED[e]� ED[e�x)]|= DTV(p(z,x,y), p(z)q(y|z)p(x|z))�DTV(p(y, z), p(z)q(y|z))2|E

D

[Exyz

]�ED

[Eyz

]|

Page 57: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

Mimic and Classify (Theory)

�Z

y,z

min(p(z)q(y|z), p(z)p(y|z))(1� ✏(y, z))d(y, z)

2|ED[e]� ED[e�x)]|= DTV(p(z,x,y), p(z)q(y|z)p(x|z))�DTV(p(y, z), p(z)q(y|z))2|E

D

[Exyz

]�ED

[Eyz

]|

Page 58: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

Mimic and Classify (Theory)

�Z

y,z

min(p(z)q(y|z), p(z)p(y|z))(1� ✏(y, z))d(y, z)

Where: ✏(y, z) = max

⇡2⇧(p(x|z),p(x0|y,z))E⇡[1{x=x

0}|y, z]

Conditional dependence $ ✏(y, z) < 1 with non-zero probability

2|ED[e]� ED[e�x)]|= DTV(p(z,x,y), p(z)q(y|z)p(x|z))�DTV(p(y, z), p(z)q(y|z))2|E

D

[Exyz

]�ED

[Eyz

]|

Page 59: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

Mimic and Classify (Theory)

2|ED[e]� ED[e�x)]|= DTV(p(z,x,y), p(z)q(y|z)p(x|z))�DTV(p(y, z), p(z)q(y|z))

�Z

y,z

min(p(z)q(y|z), p(z)p(y|z))(1� ✏(y, z))d(y, z)

Where: ✏(y, z) = max

⇡2⇧(p(x|z),p(x0|y,z))E⇡[1{x=x

0}|y, z]

Conditional dependence $ ✏(y, z) < 1 with non-zero probability

As long as the density function q(y|z) > 0 whenever p(y, z) > 0,

then conditional dependence implies that 2|ED[e]� ED[e�x

)]| > 0.

Theorem 1

2|ED

[Exyz

]�ED

[Eyz

]|

As long as the density function q(y|z) > 0 whenever p(y, z) > 0,

then conditional dependence implies that 2|ED[e]� ED[e�x

)]| > 0.

2|ED

[Exyz

]�ED

[Eyz

]| > 0

Page 60: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

Conditional independence implies p(x,y, z) = p(z)p(y|z)p(x|z).

DTV(p(z)p(y|z), p(z)q(y|z)) = DTV(p(x|z)p(z)p(y|z), p(x|z)p(z)q(y|z))

Mimic and Classify (Theory)

Page 61: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

Conditional independence implies p(x,y, z) = p(z)p(y|z)p(x|z).

DTV(p(z)p(y|z), p(z)q(y|z)) = DTV(p(x|z)p(z)p(y|z), p(x|z)p(z)q(y|z))

Mimic and Classify (Theory)

2|ED[e]� ED[e�x)]|= DTV(p(z,x,y), p(z)q(y|z)p(x|z))�DTV(p(y, z), p(z)q(y|z))2|E

D

[Exyz

]�ED

[Eyz

]|

Page 62: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

Conditional independence implies p(x,y, z) = p(z)p(y|z)p(x|z).

DTV(p(z)p(y|z), p(z)q(y|z)) = DTV(p(x|z)p(z)p(y|z), p(x|z)p(z)q(y|z))

= DTV(p(x|z)p(z)p(y|z), p(x|z)p(z)q(y|z))

Mimic and Classify (Theory)

2|ED[e]� ED[e�x)]|= DTV(p(z,x,y), p(z)q(y|z)p(x|z))�DTV(p(y, z), p(z)q(y|z))2|E

D

[Exyz

]�ED

[Eyz

]|

Page 63: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

Conditional independence implies p(x,y, z) = p(z)p(y|z)p(x|z).

DTV(p(z)p(y|z), p(z)q(y|z)) = DTV(p(x|z)p(z)p(y|z), p(x|z)p(z)q(y|z))

= DTV(p(x|z)p(z)p(y|z), p(x|z)p(z)q(y|z))= DTV(p(x,y, z), p(x|z)p(z)q(y|z))

Mimic and Classify (Theory)

2|ED[e]� ED[e�x)]|= DTV(p(z,x,y), p(z)q(y|z)p(x|z))�DTV(p(y, z), p(z)q(y|z))2|E

D

[Exyz

]�ED

[Eyz

]|

Page 64: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

Conditional independence implies that 2|ED[e]� ED[e�x

)]| = 0

Conditional independence implies p(x,y, z) = p(z)p(y|z)p(x|z).

DTV(p(z)p(y|z), p(z)q(y|z)) = DTV(p(x|z)p(z)p(y|z), p(x|z)p(z)q(y|z))

2|ED[e]� ED[e�x)]|= DTV(p(z,x,y), p(z)q(y|z)p(x|z))�DTV(p(y, z), p(z)q(y|z))

= DTV(p(x|z)p(z)p(y|z), p(x|z)p(z)q(y|z))= DTV(p(x,y, z), p(x|z)p(z)q(y|z))

Mimic and Classify (Theory)

Theorem 2

2|ED

[Exyz

]�ED

[Eyz

]| = 0

2|ED

[Exyz

]�ED

[Eyz

]|

Page 65: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

Combining Theorem 1 and Theorem 2

Mimic and Classify (Theory)

Theorem 3

As long as the density function q(y|z) > 0 when p(y, z) > 0

|ED

[Exyz

]�ED

[Eyz

]| = 0 $ H0 is true

Page 66: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

MIMIFY - CGAN

Deep Learning based MIMIC Functions

GeneratorG(z,s)

s

z

DiscriminatorD(y,z)

(y,z)

[0,1]Gaussian Latent Space

Page 67: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

MIMIFY - CGAN

Deep Learning based MIMIC Functions

GeneratorG(z,s)

s

z

DiscriminatorD(y,z)

(y,z)

[0,1]

⇠ q(y|z)

Gaussian Latent Space

Page 68: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

MIMIFY - CGAN

Deep Learning based MIMIC Functions

MIMIFY - REG

GeneratorG(z,s)

s

z

DiscriminatorD(y,z)

(y,z)

[0,1]

⇠ q(y|z)

Gaussian Latent Space

Regress to estimate r(z) = E[Y |Z = z]

Page 69: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

MIMIFY - CGAN

Deep Learning based MIMIC Functions

MIMIFY - REG

GeneratorG(z,s)

s

z

DiscriminatorD(y,z)

(y,z)

[0,1]

⇠ q(y|z)

Gaussian Latent Space

Regress to estimate r(z) = E[Y |Z = z]

y = r(z)+ Gaussian Noise ⇠ q(y|z)

Page 70: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

MIMIFY - CGAN

Deep Learning based MIMIC Functions

MIMIFY - REG

GeneratorG(z,s)

s

z

DiscriminatorD(y,z)

(y,z)

[0,1]

⇠ q(y|z)

Gaussian Latent Space

Regress to estimate r(z) = E[Y |Z = z]

y = r(z)+ Gaussian Noise ⇠ q(y|z)(or, laplacian noise)

Page 71: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

Post-Nonlinear Noise Synthetic Experiments: AUROC

Experiments

Page 72: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

Flow-cytometry Data

Experiments

Page 73: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

Gene Regulatory Network Inference (DREAM)

Experiments

Page 74: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

Estimating Information Measures

Page 75: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

255075

100

255075

100

QP

n samples n samples

Estimating Kullback-Leibler Distance

Estimate DKL(P k Q) ?

Page 76: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

255075

100

255075

100

QP

n samples n samples

Estimating Kullback-Leibler Distance

Estimate DKL(P k Q) ?

Curse of dimensionality: Sample complexity O(n/log n) with n = 2d

Page 77: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

255075

100

Neural Network Approximation

255075

100

n samples ⇠ P n samples ⇠ Q

Page 78: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

255075

100

Neural Network Approximation

255075

100

n samples ⇠ P n samples ⇠ Q

Donsker-Varadhan Dual Representation:

DKL(P k Q) = supT EP [T ]� log(EQ[eT ])

Page 79: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

255075

100

MINE: Neural Network Approximation

255075

100

n samples ⇠ P n samples ⇠ Q

Donsker-Varadhan Dual Representation:

DKL(P k Q) = supT EP [T ]� log(EQ[eT ])

• T Rich NN class

• E Sample Averages

• supT Obtained via Stochastic Gradient search

* *Benghazi et al, MINE : Mutual Information * Neural Estimation, ICML 2018

Page 80: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

255075

100

MINE is Unstable to Train

255075

100

n samples ⇠ P n samples ⇠ Q

* *Benghazi et al, MINE : Mutual Information * Neural Estimation, ICML 2018

(x100) (x100) (x100)

True !(#; %)

Page 81: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

Divergence estimation via Classification

§"#$(& ' = sup,∈ℱ

/0∼2 0 [4 5 ] − log(/0∼; 0 [<, 0 ])

4∗ 5 = ?@A & 5' 5 BC@&DBEF? (RN derivative)

Classifiers can estimate f*

!"

!#

$ = 1

$ = 0

(), (+ ∼ - (), (+

(), (+ ∼ . (), (+

-(0 = )|(), (+)-(0 = 3|(), (+)

Label 0 for q(x)

Label 1 for p(x)

f* = p(l = 1 |x)p(l = 0 |x)

Plug in to DV-bound

Highly stable training! True lower bound

Page 82: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

Classifiers require calibration

Require classifiers that are calibrated => Require p(l=1|x)

Can get well-calibrated neural networks

Lakshminarayan et al, Simple and scalable predictive uncertainty estimation using deep ensembles, NeurIPS 17

Page 83: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

Mutual Information Estimation

I(X;Y ) = DKL(PXY k PXPY )

Permutated samples

Samples

Classifier-MI: use classifier to estimate I(X;Y)

Theorem-1: NeuralNet-Classifier-MI is consistent

Theorem-2: Classifier-MI is a “true” lower-bound on MI

Page 84: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

Performance: MI Estimation

!", $" ∼ & 0, 1 )) 1 , * = !,, !-, …!0 , 1 = $,, $-, …$0 ,!"⊥ $3(5 ≠ 7)

Page 85: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

Performance: Conditional MI Estimation

§ModularEstimation

§/ 0; 2 3 = / 0; 2, 3 − /(0; 3)

CGAN/CVAE/1-NN

9:, ;: :<=> 9:′, ;: :<=

> ∼ A(9|;)

Classifier-MIC:, 9:, ;: :<=

>DEF A C, 9, ; A C, ; A(9|;))C:, 9:′, ;: :<=

>

Estimate I(X;Y|Z) = DKL (pXYZ|pXZpY|Z)

Page 86: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

Performance: CMI

Model – I (Linear)

! ∼ # 0, 1 , ( ∼ ) −0.5, 0.5 -., / ∼ # (0, 0.01

1 = ! + /

Model – II (Linear)

! ∼ # 0, 1 , ( ∼ # 0, 1 -., / ∼ # 45(, 0.01

1 = ! + /

Non-Linear models :( ∼ # 6, 7-. , ! = 80 90 1 = 8: ;<=! + ;>=( + 9:

80, 8: ∼ {@ABℎ, DEF, exp(−| ⋅ |)}, 90, 9: ∼ #(0, 0.1)

Page 87: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

Performance: CMI

"# = %& ' = %&, &&&

Model-1 (linear)

Page 88: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

Performance: CMI

Model-2 (linear)

"# = %& ' = %&, &&&

Page 89: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

Performance: CMI

Model-3 (Non-linear)

Page 90: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

Performance: Conditional Indep Testing

§! ∼ # $, &'( , ) = cos ./! + 12

3 = 5cos 67! + 18 9:) ⊥ 3|!cos =) + 67! + 18 9:) ⊥ 3|!

./, 67 ∼ > 0, 1 '(, ||.|| = 1, 6 = 1,= ∼ > 0,2 , 1B∼ # 0, 0.25 .

Page 91: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

Performance: Real Data (Flow Cytometry)

§ 50 CI and 50 NI relations. § Examples –pkc ⊥ akt|raf, mek, p38, pka, jnkpip3 ⊥ pip2|p38, raf, pkc, jnk, erk, akt

§ CCMI (Mean AuROC) = 0.75CCIT (Mean AuROC) = 0.66

Page 92: Learning Statistical Property Testersvakilian/TTI-slides/kannan.pdf · 2019-08-21 · * Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions,

Open Problems

✤ Closeness testing problems✤ Beyond DTV: Distance measure estimation using classifiers?✤ Stable trainability✤ Time-series data (Directed information estimation and testing)

✤ Statistical property testing✤ Uniformity testing