
Page 1: Multi-Label Prediction via Compressed Sensing

Multi-Label Prediction via Compressed Sensing

By

Daniel Hsu, Sham M. Kakade, John Langford, Tong Zhang

(NIPS 2009)

Presented by: Lingbo Li, ECE, Duke University

01-22-2010

* Some notes are directly copied from the original paper.

Page 2: Multi-Label Prediction via Compressed Sensing

Outline

• Introduction

• Preliminaries

• Learning Reduction

• Compression and Reconstruction

• Empirical Results

• Conclusion

Page 3: Multi-Label Prediction via Compressed Sensing

Introduction

• Large database of images;

• Goal: predict who or what is in a given image

Samples: images $x$ with corresponding labels $y = (y_1, y_2, \ldots, y_d) \in \{0,1\}^d$, where $d$ is the total number of entities in the whole database.

• One-against-all algorithm:

Learn a binary predictor for each label (class).

Computation is expensive when $d$ is large, e.g. $d = 10^3$ or $10^4$.

• Assume the output vector $y$ is sparse.

Page 4: Multi-Label Prediction via Compressed Sensing

Introduction

[Figure: an image $x$ is mapped to a sparse label vector $y \in \{0,1\}^d$ over the entity set {Mike, James, Julie, Nick, Joe, Linda, ...}; only a few entries, e.g. $y_5$, $y_8$, $y_{17}$, $y_{31}$, $y_{56}$, $y_{97}$, equal 1.]

Main idea: “Learn to predict compressed label vectors, and then use sparse reconstruction algorithm to recover uncompressed labels from these predictions”

Compressed sensing: for any sparse vector $y \in \mathbb{R}^d$, it is highly likely that $y$ can be compressed to a dimension logarithmic in $d$ with perfect reconstruction of $y$.
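As a minimal sketch of this claim (the dimensions, the Gaussian compression matrix, and the use of scikit-learn's OMP solver are illustrative assumptions, not taken from the slides):

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

d, k = 1000, 4        # ambient dimension and sparsity level
m = 80                # compressed dimension, on the order of k * log(d)

rng = np.random.default_rng(0)
y = np.zeros(d)
y[rng.choice(d, size=k, replace=False)] = 1.0     # k-sparse 0/1 label vector

A = rng.normal(size=(m, d)) / np.sqrt(m)          # random Gaussian compression matrix
z = A @ y                                         # m compressed measurements

# Recover y from the m measurements; exact with high probability.
omp = OrthogonalMatchingPursuit(n_nonzero_coefs=k, fit_intercept=False)
omp.fit(A, z)
print(np.allclose(omp.coef_, y, atol=1e-6))       # True
```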

Page 5: Multi-Label Prediction via Compressed Sensing

Preliminaries

• $\mathcal{X}$: input space;

• $\mathcal{Y} \subset \mathbb{R}^d$: output (label) space;

• Training data: $(x_1, y_1), \ldots, (x_n, y_n) \in \mathcal{X} \times \mathcal{Y}$;

• Goal: to learn a predictor $F: \mathcal{X} \to \mathcal{Y}$ with low mean-squared error $\mathbb{E}_x \, \| F(x) - \mathbb{E}[y \mid x] \|_2^2$.

Assume:

• $d$ is very large;

• The expected value $\mathbb{E}[y \mid x]$ is sparse, with only a few non-zero entries.

Page 6: Multi-Label Prediction via Compressed Sensing

Learning reduction

• Linear compression function $A: \mathbb{R}^d \to \mathbb{R}^m$, where $m \ll d$;

• Goal: to learn a predictor $H: \mathcal{X} \to \mathbb{R}^m$ of the compressed labels.

Original problem: predict the label $y$ with the predictor $F$, from samples $(x, y)$, to minimize $\mathbb{E}_x \, \| F(x) - \mathbb{E}[y \mid x] \|_2^2$.

Reduced problem: predict the compressed label $Ay$ with the predictor $H$, from compressed samples $(x, Ay)$, to minimize $\mathbb{E}_x \, \| H(x) - \mathbb{E}[Ay \mid x] \|_2^2$.
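A sketch of the training side of this reduction, assuming plain least-squares regression for each compressed coordinate (the reduction allows any regression method; the function and variable names here are hypothetical):

```python
import numpy as np

def train_reduction(X, Y, m, rng):
    """X: (n, p) feature matrix; Y: (n, d) 0/1 label matrix.
    Returns a compression matrix A (m, d) and weights W (p, m)
    such that H(x) = x @ W predicts the compressed label Ay."""
    d = Y.shape[1]
    A = rng.normal(size=(m, d)) / np.sqrt(m)    # linear compression A: R^d -> R^m
    Z = Y @ A.T                                 # compressed labels, shape (n, m)
    W, *_ = np.linalg.lstsq(X, Z, rcond=None)   # m least-squares regressors in one solve
    return A, W
```

At test time, the prediction $H(x)$ is passed to a reconstruction algorithm $R$ (next slides) to produce the final sparse label vector.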

Page 7: Multi-Label Prediction via Compressed Sensing

Reduction-training and prediction

Reconstruction Algorithm R:

If $H(x)$ is close to $\mathbb{E}[Ay \mid x]$, then $R(H(x))$ should be close to $\mathbb{E}[y \mid x]$.

Page 8: Multi-Label Prediction via Compressed Sensing

Compression Functions

Examples of valid compression functions: random matrices satisfying a restricted-isometry-style property, e.g., i.i.d. Gaussian matrices, random Bernoulli ($\pm 1$) matrices, and random rows of the Hadamard or Fourier matrix (the last is used in the experiments below).
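A sketch of these three standard constructions (the $1/\sqrt{m}$ scaling is one common convention; scipy's hadamard requires the dimension to be a power of 2):

```python
import numpy as np
from scipy.linalg import hadamard

def gaussian_matrix(m, d, rng):
    # i.i.d. N(0, 1/m) entries
    return rng.normal(size=(m, d)) / np.sqrt(m)

def bernoulli_matrix(m, d, rng):
    # i.i.d. +/- 1/sqrt(m) entries
    return rng.choice([-1.0, 1.0], size=(m, d)) / np.sqrt(m)

def hadamard_rows(m, d, rng):
    # m random rows of the d x d Hadamard matrix (d must be a power of 2),
    # the construction used in the experiments
    H = hadamard(d).astype(float)
    rows = rng.choice(d, size=m, replace=False)
    return H[rows] / np.sqrt(m)
```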

Page 9: Multi-Label Prediction via Compressed Sensing

Reconstruction Algorithms

Examples of valid reconstruction algorithms: iterative and greedy algorithms

• Orthogonal Matching Pursuit (OMP)

• Forward-Backward Greedy (FoBa)

• Compressive Sampling Matching Pursuit (CoSaMP)
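A from-scratch sketch of the first of these, OMP: greedily pick the column of A most correlated with the current residual, then re-fit by least squares on the selected support (a bare-bones version, not the paper's exact implementation):

```python
import numpy as np

def omp(A, z, k):
    """Recover a k-sparse vector y_hat with z ~ A @ y_hat (A: m x d)."""
    m, d = A.shape
    support, residual = [], z.copy()
    coef = np.zeros(0)
    for _ in range(k):
        j = int(np.argmax(np.abs(A.T @ residual)))    # column most correlated with residual
        support.append(j)
        coef, *_ = np.linalg.lstsq(A[:, support], z, rcond=None)
        residual = z - A[:, support] @ coef           # re-fit on support, update residual
    y_hat = np.zeros(d)
    y_hat[support] = coef
    return y_hat
```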

Page 10: Multi-Label Prediction via Compressed Sensing

General Robustness Guarantees

What if the reduction creates a problem that is harder to solve than the original one?

The sparsity error is defined as $\mathrm{sperr}_k(y) = \| y - y_{(k)} \|_2^2$, where $y_{(k)}$ is the best $k$-sparse approximation of $y$ (the vector obtained by keeping only the $k$ largest-magnitude entries of $y$).
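A direct sketch of this quantity, assuming the standard definition of the best $k$-sparse approximation:

```python
import numpy as np

def sparsity_error(y, k):
    """Squared l2 distance from y to its best k-sparse approximation."""
    y = np.asarray(y, dtype=float)
    keep = np.argsort(np.abs(y))[-k:]     # indices of the k largest-magnitude entries
    y_k = np.zeros_like(y)
    y_k[keep] = y[keep]                   # best k-sparse approximation of y
    return float(np.sum((y - y_k) ** 2))
```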

Page 11: Multi-Label Prediction via Compressed Sensing

Linear Prediction

• If there is a perfect linear predictor of $y$, i.e. $\mathbb{E}[y \mid x] = Mx$ for some matrix $M$, then there is a perfect linear predictor of $Ay$: $\mathbb{E}[Ay \mid x] = (AM)x$.

Page 12: Multi-Label Prediction via Compressed Sensing

Experimental Results

• Experiment 1: Image data (collected by the ESP Game)

65k images, 22k unique labels; the 1k most frequent labels are kept;

the least frequent occurs 39 times while the most frequent occurs about 12k times, 4 labels on average per image;

Half of the data as training and half as testing.

• Experiment 2: Text data (collected from http://delicious.com/)

16k labeled web page, 983 unique labels;

the least frequent occurs 21 times, the most frequent occurs about 6500 times, 19 labels on average per web page;

Half of the data as training and half as testing.

• Compression function A: select m random rows of the 1024 × 1024 Hadamard matrix.

• Test greedy and iterative reconstruction algorithms: OMP, FoBa, CoSaMP, and Lasso.

• Use correlation decoding (CD) as a baseline method for comparisons.


Page 13: Multi-Label Prediction via Compressed Sensing

Experimental Results

Performance is measured by precision and by squared $\ell_2$ distance $\| \hat{y} - y \|_2^2$.

[Figure: precision and squared $\ell_2$ distance curves. Top two panels: image data; bottom: text data.]
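For concreteness, one way these two measures can be computed (the exact precision convention used in the plots is not recoverable from this transcript; top-k precision is assumed here):

```python
import numpy as np

def precision_at_k(y_true, y_score, k):
    """Fraction of the k highest-scoring predicted labels that are true labels."""
    y_true, y_score = np.asarray(y_true), np.asarray(y_score)
    top = np.argsort(y_score)[-k:]
    return float(np.mean(y_true[top] > 0))

def squared_l2_distance(y_true, y_pred):
    """Squared l2 distance between the true and predicted label vectors."""
    return float(np.sum((np.asarray(y_true) - np.asarray(y_pred)) ** 2))
```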

Page 14: Multi-Label Prediction via Compressed Sensing

Conclusion

• Application of compressed sensing to the multi-label prediction problem with output sparsity;

• An efficient reduction: the number of regression problems to solve is logarithmic in the number of original labels;

• Robustness guarantees from the compressed problem to the original problem, and vice versa in the linear prediction setting.