
ICML 2016: The Information Sieve


Page 1: ICML 2016: The Information Sieve

The Information Sieve
Greg Ver Steeg and Aram Galstyan

Soup = data

“Main ingredient” extracted at each layer

Page 2: ICML 2016: The Information Sieve

Factorial code

• Carry the recipe instead of the soup
• Missing ingredients?
• Make more soup

• Compression
• Prediction
• Generative model

Recipe: Ingredient 1, Ingredient 2, …
An invertible transform that makes the components independent

Finding such a transform is generally intractable. We use a sequence of transforms that incrementally removes dependence.
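In symbols (a reconstruction; this notation is assumed rather than copied from the slide), a factorial code is an invertible transform f whose output components are statistically independent:

$$ Y = f(X), \quad f \text{ invertible}, \qquad p(y_1, \dots, y_m) = \prod_{j=1}^{m} p(y_j). $$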

Page 3: ICML 2016: The Information Sieve

Two Steps
1. Find the most informative function, Y_k, of the input data
2. Transform the data to remove the information in Y_k, and then repeat (sketched below)

(Diagram: the main ingredient is sifted out of the soup at each layer.)
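A minimal runnable sketch of this two-step loop, with illustrative stand-ins rather than the authors' method: step 1 is approximated here by a binarized first principal component, and step 2 by XOR-ing each variable with its most likely value given Y_k (which is invertible given Y_k). The helper names are hypothetical.

```python
# Sketch of the two-step sieve loop on binary data. The helpers below are
# crude stand-ins for illustration, not the authors' optimization.
import numpy as np

def learn_latent_factor(X):
    # Step 1 stand-in: a binarized first principal component as a
    # crude "most informative" binary function of the inputs.
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return (Xc @ vt[0] > 0).astype(int)

def compute_remainder(X, y):
    # Step 2 stand-in: replace each x_i with x_i XOR (most likely x_i
    # given y). Given y, the transform is invertible.
    X_bar = X.copy()
    for val in (0, 1):
        rows = (y == val)
        if rows.any():
            pred = (X[rows].mean(axis=0) > 0.5).astype(int)
            X_bar[rows] = X[rows] ^ pred
    return X_bar

def sieve(X, n_layers):
    # One pass of the loop = one layer of the sieve.
    remainder, factors = X, []
    for _ in range(n_layers):
        y = learn_latent_factor(remainder)             # step 1
        remainder = compute_remainder(remainder, y)    # step 2
        factors.append(y)
    return factors, remainder

X = (np.random.rand(100, 20) > 0.5).astype(int)
factors, rem = sieve(X, n_layers=3)
```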

Page 4: ICML 2016: The Information Sieve

The main ingredient: multivariate information
• Multivariate mutual information, or total correlation (Watanabe, 1960)

• TC(X|Y) = 0 if and only if Y “explains” all the dependence in X
• So we search for the Y that minimizes TC(X|Y)
• Equivalently, we define the total correlation explained by Y as:
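The formulas on this slide were rendered as images; reconstructed here from the standard definitions used in the CorEx papers (notation assumed):

$$ TC(X) = \sum_i H(X_i) - H(X), \qquad TC(X \mid Y) = \sum_i H(X_i \mid Y) - H(X \mid Y), $$

and the total correlation explained by Y is

$$ TC(X; Y) = TC(X) - TC(X \mid Y) = \sum_i I(X_i; Y) - I(X; Y). $$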

Page 5: ICML 2016: The Information Sieve

The main ingredient:Total Correlation Explanation (CorEx)

• Optimize over all probabilistic functions
• The solution has a special form that makes it tractable (shown below)
• Computational complexity is linear in the number of variables
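The optimization appeared as an image; a reconstruction under the definitions above (the fixed-point form follows the CorEx line of work, so treat the exact expression as an assumption):

$$ \max_{p(y \mid x)} TC(X; Y), $$

whose stationary points satisfy a self-consistent form that depends on X only through the marginals p(y | x_i),

$$ p(y \mid x) = \frac{1}{Z(x)}\, p(y) \prod_{i=1}^{n} \frac{p(y \mid x_i)}{p(y)}, $$

which is why the updates scale linearly in the number of variables.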

Page 6: ICML 2016: The Information Sieve

Sift out the main ingredient: remainder info

The remainder is a transformation of the inputs with two properties:
1. The remainder contains no information about Y
2. The transformation is invertible
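In symbols (a reconstruction; notation assumed): after extracting Y, each input X_i is mapped to a remainder component satisfying

$$ I(\bar{X}_i; Y) = 0 \ \text{ for each } i, \qquad X_i \text{ recoverable from } (\bar{X}_i, Y). $$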

Page 7: ICML 2016: The Information Sieve

Iterative sifting as: decomposition of information
• Multivariate mutual information in the data (total correlation)
• Contribution from each layer of the sieve (optimized)
• Remainder (at layer r)
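The decomposition itself was an image on the slide; a reconstruction following the paper, with X^(0) = X and X^(k) the remainder after layer k (notation assumed):

$$ TC(X) = \sum_{k=1}^{r} TC\big(X^{(k-1)}; Y_k\big) + TC\big(X^{(r)}\big). $$

The sum collects the optimized per-layer contributions; the last term is the dependence left in the remainder.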

Page 8: ICML 2016: The Information Sieve

Iterative sifting as: extracting dependence

The dependence remaining at each layer of the sieve (the dependence at layer r) decreases until it reaches zero, i.e., complete independence.
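Rearranged (same assumptions as above), the remaining dependence is what the sieve has not yet extracted:

$$ TC\big(X^{(r)}\big) = TC(X) - \sum_{k=1}^{r} TC\big(X^{(k-1)}; Y_k\big) \ \ge\ 0, $$

so each optimized layer removes a nonnegative amount of dependence, and reaching zero yields the factorial code.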

Page 9: ICML 2016: The Information Sieve

Recover spatial clusters from fMRI data

(Figure panels: ground truth, ICA, sieve.)

Example of recovering spatial clusters in brain data from temporal activation patterns

Page 10: ICML 2016: The Information Sieve

Lossy compression and in-painting
• Sieve representation with 12 layers/bits/binary latent factors on MNIST digits

We can use the sieve for standard prediction and generative model tasks
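A toy sketch of the decoding side of this idea, not the authors' decoder: each digit is summarized by a short binary code y (the "12 bits"), decoding fills every pixel with its most likely value given y as estimated from training data, and in-painting keeps the observed pixels. All names here are hypothetical.

```python
# Toy sketch: decode a short binary code back to pixels, then in-paint.
# A hypothetical simplification of using the sieve generatively.
import numpy as np

def fit_decoder(Y_train, X_train):
    # For each distinct code y, store the per-pixel majority vote.
    buckets = {}
    for y, x in zip(map(tuple, Y_train), X_train):
        buckets.setdefault(y, []).append(x)
    return {y: (np.mean(xs, axis=0) > 0.5).astype(int)
            for y, xs in buckets.items()}

def inpaint(y, x_observed, observed_mask, decoder):
    # Start from the code's decoding, then keep the pixels we saw.
    x_hat = decoder[tuple(y)].copy()
    x_hat[observed_mask] = x_observed[observed_mask]
    return x_hat
```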

Page 11: ICML 2016: The Information Sieve

Lossless compression (on MNIST)
• Same-size codebooks for the random and sieve-based codes
• (gzip is sequence-based, shown for reference)

A proof of principle for lossless compression, though specialized compression techniques do better on MNIST.

Method:          Naive   gzip   Random codebook   Sieve codebook
Bits per digit:   784    328         267               243

Page 12: ICML 2016: The Information Sieve

Conclusion

• Incrementally decomposing multivariate information is useful, practical, and delicious
• Could improve with joint optimization and better transformations for the remainder info

Link to all papers and code: http://bit.ly/corex_info

Contact: [email protected], [email protected]

• The extension to continuous random variables is nontrivial but more practical and demonstrates connections to “common information”: “Sifting Common Information from Many Variables”, arXiv:1606.02307.