
An Efficient Projection for l1,∞ Regularization

Ariadna Quattoni, Xavier Carreras, Michael Collins, Trevor Darrell

MIT CSAIL

Goal: Efficient training of jointly sparse models in high-dimensional spaces.

Joint Sparsity

Why? Learn from fewer examples. Build more efficient classifiers. Interpretability.

[Figure: a matrix of coefficients w for four tasks (Church, Airport, Grocery Store, Flower-Shop), with one column per task and one row per feature.]

[Figure: the same matrix with some coefficients zeroed out independently for each task.]

[Figure: the same matrix with entire rows zeroed out, so all tasks share the same small set of active features (joint sparsity).]

l1,∞ Regularization

How do we promote joint (i.e. row) sparsity?

[Diagram: the coefficient matrix, with rows as features and columns as tasks; row 2 holds the coefficients for feature 2, column 2 the coefficients for task 2.]

An l1 norm on the maximum absolute values of the coefficients across tasks promotes row sparsity: use few features.

The l∞ norm on each row does not penalize additional nonzero entries once a row is selected: share parameters across tasks.
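For an n × m matrix W with one row per feature and one column per task, the norm described above can be written as (notation assumed here, consistent with the slides):

\|W\|_{1,\infty} \;=\; \sum_{i=1}^{n} \max_{1 \le j \le m} |W_{ij}|

The outer l1 sum drives whole rows to zero, while the inner max lets the surviving rows be dense across tasks at no extra cost.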

Contributions

An efficient projected gradient method for l1,∞ regularization.

Our projection runs in O(n log n) time, the same cost as the l1 projection.

Experiments on multitask image classification problems:

We can discover jointly sparse solutions.

l1,∞ regularization leads to better performance than l2 and l1 regularization.

Multitask Application

Collection of tasks: D = {D_1, D_2, ..., D_m}

Joint Sparse Approximation

[Diagram: datasets D_1, D_2, ..., D_m feeding into a single joint sparse approximation.]
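Concretely, training couples the m tasks only through the norm constraint. A plausible rendering of the formulation (with W_k the k-th column of W, i.e. the parameters of task k, and L a convex loss; the exact loss is not specified in these slides):

\min_{W} \; \sum_{k=1}^{m} \; \sum_{(x,y) \in D_k} L\big(y,\, W_k^{\top} x\big) \quad \text{s.t.} \quad \|W\|_{1,\infty} \le C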

l1,∞ Regularization: Constrained Convex Optimization Formulation

min_W f(W) (a convex function) subject to ||W||_{1,∞} ≤ C (convex constraints)

We use a Projected SubGradient method. Main advantages: simple, scalable, guaranteed convergence rates.
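As a minimal sketch of such a method (not the authors' code; subgrad_f, project, and the step-size schedule are illustrative choices):

```python
import numpy as np

def projected_subgradient(subgrad_f, W0, C, project, n_iters=1000, lr=0.1):
    """Minimize a convex f over the l1,inf ball of radius C.

    subgrad_f(W) returns any subgradient of f at W;
    project(W, C) returns the Euclidean projection onto the ball
    (e.g. the l1,inf projection sketched later in these slides).
    """
    W = W0.copy()
    for t in range(1, n_iters + 1):
        G = subgrad_f(W)
        W = W - (lr / np.sqrt(t)) * G  # diminishing steps give guaranteed convergence
        W = project(W, C)              # re-enter the feasible set after every step
    return W
```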

Projected subgradient methods have recently been proposed for:

l2 regularization, i.e. SVM [Shalev-Shwartz et al. 2007]

l1 regularization [Duchi et al. 2008]

Computing the subgradients is trivial, so the question is whether the projection can be computed efficiently.


Euclidean Projection onto the l1,∞ Ball
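Written out (following the setup above, with A the matrix to be projected):

\min_{B} \; \frac{1}{2} \sum_{i,j} \big(A_{ij} - B_{ij}\big)^2 \quad \text{s.t.} \quad \sum_{i} \max_{j} |B_{ij}| \le C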


Characterization of the solution

[Figure: bar charts of the entries of Features I–IV (values 0–10), before and after projection: in each row, entries above the row's optimal maximum μ_i are truncated to μ_i, and smaller entries are left unchanged.]

Mapping to a simpler problem

We can map the projection problem to the following reduced problem, which finds the optimal row maxima μ:
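A plausible reconstruction of the reduced problem (assuming A has been replaced by its absolute values, with signs restored afterwards):

\min_{\mu \ge 0} \; \sum_{i} \; \sum_{j \,:\, A_{ij} > \mu_i} \big(A_{ij} - \mu_i\big)^2 \quad \text{s.t.} \quad \sum_{i} \mu_i = C

Given the optimal μ, the projection is recovered entrywise as B_{ij} = min(A_{ij}, μ_i).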

Efficient Algorithm

[Figure, two slides: bar-chart walkthrough on the four example features (values 0–10) showing how the algorithm sweeps the sorted entries to locate the optimal maxima μ.]


Complexity

The total cost of the algorithm is dominated by sorting the entries of A.

The total cost is on the order of O(n log n), where n is the number of entries of A (the same cost as an l1 projection).
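For illustration only (this is not the paper's exact sorting-based procedure), the reduced problem can also be solved numerically: at the optimum, every row with μ_i > 0 gives up the same l1 mass θ, so one can bisect on θ. All names below (project_l1_inf, mu_for_theta) are invented for this sketch:

```python
import numpy as np

def mu_for_theta(row, csum, theta):
    # Given one row sorted in descending order (with cumulative sums csum),
    # return mu >= 0 such that sum_j max(row[j] - mu, 0) == theta.
    if theta <= 0:
        return row[0]                    # remove nothing: mu is the row max
    if csum[-1] <= theta:
        return 0.0                       # even mu = 0 removes at most theta
    for k in range(1, len(row) + 1):     # k = number of entries above mu
        mu = (csum[k - 1] - theta) / k
        lower = row[k] if k < len(row) else 0.0
        if mu >= lower:                  # mu falls in the k-th segment
            return mu
    return 0.0                           # unreachable

def project_l1_inf(A, C, tol=1e-9):
    # Euclidean projection of A onto {B : sum_i max_j |B_ij| <= C}.
    S = np.abs(A)
    if S.max(axis=1).sum() <= C:
        return A.copy()                  # already inside the ball
    rows = [np.sort(r)[::-1] for r in S]
    csums = [np.cumsum(r) for r in rows]
    # Each row with mu_i > 0 gives up the same l1 mass theta; bisect on theta.
    lo, hi = 0.0, max(c[-1] for c in csums)
    while hi - lo > tol:
        theta = 0.5 * (lo + hi)
        total = sum(mu_for_theta(r, c, theta) for r, c in zip(rows, csums))
        if total > C:
            lo = theta                   # need to remove more mass
        else:
            hi = theta
    mu = np.array([mu_for_theta(r, c, 0.5 * (lo + hi)) for r, c in zip(rows, csums)])
    return np.sign(A) * np.minimum(S, mu[:, None])
```

A quick sanity check after calling project_l1_inf(A, C): whenever A starts outside the ball, the sum of the row maxima of the result should equal C up to the tolerance.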


Synthetic Experiments

Generate a jointly sparse parameter matrix W:

For every task we generate labeled training pairs (x, y) using the task's parameters in W.

We compared three different types of regularization: l1,∞ projection, l1 projection, and l2 projection.


Synthetic Experiments

[Figure: results of the synthetic experiments.]

Dataset: Image Annotation

40 top content words. Raw image representation: Vocabulary Tree (Nister and Stewenius 2006), 11,000 dimensions.

[Figure: example images for the content words president, actress, and team.]

Results

[Figure: annotation performance under the three regularizers.] Most of the differences are statistically significant.

Results

[Figure: further results on the image annotation task.]

Dataset: Indoor Scene Recognition

67 indoor scenes. Raw image representation: similarities to a set of unlabeled images, 2,000 dimensions.

[Figure: example images of the scene classes bakery, bar, and train station.]

Results

[Figure: results on the indoor scene recognition task.]

Conclusions

We proposed an efficient global optimization algorithm for l1,∞ regularization.

We presented experiments on image classification tasks and showed that our method can recover jointly sparse solutions.

A simple and efficient tool for implementing an l1,∞ penalty, analogous to the standard l1 and l2 penalties.