RegML 2016 — Class 4: Regularization for multi-task learning
Lorenzo Rosasco, UNIGE-MIT-IIT
June 28, 2016



Page 1 — slides: lcsl.mit.edu/courses/regml/regml2016/slides/lec4.pdf

RegML 2016 — Class 4

Regularization for multi-task learning

Lorenzo Rosasco, UNIGE-MIT-IIT

June 28, 2016

Page 2

Supervised learning so far

- Regression: $f : X \to Y \subseteq \mathbb{R}$
- Classification: $f : X \to Y = \{-1, 1\}$

What next?

- Vector-valued: $f : X \to Y \subseteq \mathbb{R}^T$
- Multiclass: $f : X \to Y = \{1, 2, \ldots, T\}$
- ...

L.Rosasco, RegML 2016 2

Page 3

Multitask learning

Given
$$S_1 = (x_i^1, y_i^1)_{i=1}^{n_1}, \quad \ldots, \quad S_T = (x_i^T, y_i^T)_{i=1}^{n_T}$$
find
$$f_1 : X_1 \to Y_1, \quad \ldots, \quad f_T : X_T \to Y_T$$

- vector-valued regression: $S_n = (x_i, y_i)_{i=1}^n$, $x_i \in X$, $y_i \in \mathbb{R}^T$. MTL with equal inputs! Output coordinates are the "tasks"
- multiclass: $S_n = (x_i, y_i)_{i=1}^n$, $x_i \in X$, $y_i \in \{1, \ldots, T\}$


Page 5

Why MTL?

[Figure: two related regression tasks, Task 1 and Task 2, plotted over the same input space X with outputs Y.]

Page 6

Why MTL?

[Figure: several panels of predictions over Time (hours), comparing Multi-task and Single-task estimates.]

Real data!

Page 7

Why MTL?

Related problems:

- conjoint analysis
- transfer learning
- collaborative filtering
- co-kriging

Examples of applications:

- geophysics
- music recommendation (Dinuzzo 08)
- pharmacological data (Pillonetto et al. 08)
- binding data (Jacob et al. 08)
- movie recommendation (Abernethy et al. 08)
- HIV therapy screening (Bickel et al. 08)

Page 8

Why MTL?

VVR, e.g. vector field estimation

Figure 3. Divergence-free synthetic data. Top: samples and our div-free solution; Bottom: div of scalar SVR and associated field.

Figure 4. Fluid simulation data. Top: samples and our div-free solution; Bottom: div of scalar SVR and associated vector field.

Figure 5. Measurement PIV data. Top: samples and our div-free solution; Bottom: div of scalar SVR and associated vector field.

Figure 6. Curl-free synthetic data. Top: samples and our curl-free solution; Bottom: curl of scalar SVR and associated vector field.

Page 9

Why MTL?

[Figure: the two output components, Component 1 and Component 2, of a vector-valued function over input space X.]

Page 10

Penalized regularization for MTL

$$\mathrm{err}(w_1, \ldots, w_T) + \mathrm{pen}(w_1, \ldots, w_T)$$

We start with linear models:
$$f_1(x) = w_1^\top x, \quad \ldots, \quad f_T(x) = w_T^\top x$$

Page 11

Empirical error

$$\mathcal{E}(w_1, \ldots, w_T) = \sum_{i=1}^{T} \frac{1}{n_i} \sum_{j=1}^{n_i} (y_j^i - w_i^\top x_j^i)^2$$

- could consider other losses
- could try to "couple" the errors

Page 12

Least squares error

We focus on vector-valued regression (VVR):
$$S_n = (x_i, y_i)_{i=1}^{n}, \quad x_i \in X, \quad y_i \in \mathbb{R}^T$$
$$\frac{1}{n} \sum_{t=1}^{T} \sum_{i=1}^{n} (y_i^t - w_t^\top x_i)^2 = \frac{1}{n} \Big\| \underbrace{X}_{n \times d}\, \underbrace{W}_{d \times T} - \underbrace{Y}_{n \times T} \Big\|_F^2,$$
where $\|W\|_F^2 = \mathrm{Tr}(W^\top W)$, $W = (w_1, \ldots, w_T)$, and $Y_{it} = y_i^t$ for $i = 1, \ldots, n$, $t = 1, \ldots, T$.
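The identity between the task-by-task sum of squared errors and the Frobenius norm of the residual matrix can be checked numerically; the sketch below uses synthetic data, and all variable names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, T = 50, 8, 4
X = rng.standard_normal((n, d))   # inputs, one row per example
W = rng.standard_normal((d, T))   # task weight vectors as columns
Y = rng.standard_normal((n, T))   # outputs, one column per task

# Left-hand side: average the squared residuals task by task
lhs = sum((Y[i, t] - W[:, t] @ X[i]) ** 2
          for t in range(T) for i in range(n)) / n

# Right-hand side: a single Frobenius norm of the residual matrix
rhs = np.linalg.norm(X @ W - Y, "fro") ** 2 / n

assert np.isclose(lhs, rhs)
```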


Page 14

MTL by regularization

$$\mathrm{pen}(w_1, \ldots, w_T)$$

- coupling task solutions by regularization
- borrowing strength
- exploiting structure

Page 15

Regularizations for MTL

$$\mathrm{pen}(w_1, \ldots, w_T) = \sum_{t=1}^{T} \|w_t\|^2$$

Single-task regularization! The problem decouples:

$$\min_{w_1, \ldots, w_T} \frac{1}{n} \sum_{t=1}^{T} \sum_{i=1}^{n} (y_i^t - w_t^\top x_i)^2 + \lambda \sum_{t=1}^{T} \|w_t\|^2 = \sum_{t=1}^{T} \left( \min_{w_t} \frac{1}{n} \sum_{i=1}^{n} (y_i^t - w_t^\top x_i)^2 + \lambda \|w_t\|^2 \right)$$
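The decoupling can be seen directly in the least-squares case: the joint ridge solution and the per-task ridge solutions coincide. A minimal NumPy sketch (synthetic data, illustrative names):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, T, lam = 40, 6, 3, 0.5
X = rng.standard_normal((n, d))
Y = rng.standard_normal((n, T))

# Joint minimizer of (1/n)||XW - Y||_F^2 + lam * sum_t ||w_t||^2
W_joint = np.linalg.solve(X.T @ X + n * lam * np.eye(d), X.T @ Y)

# Per-task ridge: each column solves its own scalar-output problem
W_sep = np.column_stack([
    np.linalg.solve(X.T @ X + n * lam * np.eye(d), X.T @ Y[:, t])
    for t in range(T)
])

assert np.allclose(W_joint, W_sep)
```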


Page 17

Regularizations for MTL

- Isotropic coupling:
$$(1 - \alpha) \sum_{j=1}^{T} \|w_j\|^2 + \alpha \sum_{j=1}^{T} \Big\| w_j - \frac{1}{T} \sum_{i=1}^{T} w_i \Big\|^2$$

- Graph coupling. Let $M \in \mathbb{R}^{T \times T}$ be an adjacency matrix with $M_{ts} \geq 0$:
$$\sum_{t=1}^{T} \sum_{s=1}^{T} M_{ts} \|w_t - w_s\|^2 + \gamma \sum_{t=1}^{T} \|w_t\|^2$$
special case: outputs divided into clusters


Page 19

A general form of regularization

All the regularizers so far are of the form
$$\sum_{t=1}^{T} \sum_{s=1}^{T} A_{ts}\, w_t^\top w_s$$
for a suitable positive definite matrix $A$.

Page 20

MTL regularization revisited

- Single tasks: $\sum_{j=1}^{T} \|w_j\|^2 \;\Longrightarrow\; A = I$

- Isotropic coupling:
$$(1 - \alpha) \sum_{j=1}^{T} \|w_j\|^2 + \alpha \sum_{j=1}^{T} \Big\| w_j - \frac{1}{T} \sum_{i=1}^{T} w_i \Big\|^2 \;\Longrightarrow\; A = I - \frac{\alpha}{T}\,\mathbf{1},$$
with $\mathbf{1}$ the $T \times T$ all-ones matrix

- Graph coupling:
$$\sum_{t=1}^{T} \sum_{s=1}^{T} M_{ts} \|w_t - w_s\|^2 + \gamma \sum_{t=1}^{T} \|w_t\|^2 \;\Longrightarrow\; A = L + \gamma I,$$
where $L$ is the graph Laplacian of $M$:
$$L = D - M, \qquad D = \mathrm{diag}\Big( \sum_j M_{1j}, \ldots, \sum_j M_{Tj} \Big)$$
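The Laplacian construction for graph coupling can be sketched in a few lines; the snippet below builds $A = L + \gamma I$ from a hypothetical symmetric adjacency matrix and checks the penalty against its trace form (for symmetric $M$ the pairwise term equals $2\,\mathrm{Tr}(WLW^\top)$; the factor 2 can be absorbed into the regularization parameter):

```python
import numpy as np

rng = np.random.default_rng(2)
T, d, gamma = 5, 7, 0.1

# Symmetric adjacency matrix with nonnegative weights (illustrative example)
M = rng.random((T, T))
M = (M + M.T) / 2
np.fill_diagonal(M, 0.0)

D = np.diag(M.sum(axis=1))       # degree matrix
Lap = D - M                      # graph Laplacian of M
A = Lap + gamma * np.eye(T)      # coupling matrix A = L + gamma*I

W = rng.standard_normal((d, T))  # columns w_1, ..., w_T

# Graph penalty written task by task
pen = sum(M[t, s] * np.sum((W[:, t] - W[:, s]) ** 2)
          for t in range(T) for s in range(T)) \
      + gamma * np.sum(W ** 2)

# For symmetric M: sum_{t,s} M_ts ||w_t - w_s||^2 = 2 Tr(W L W^T)
assert np.isclose(pen, 2 * np.trace(W @ Lap @ W.T) + gamma * np.sum(W ** 2))
```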


Page 23

A general form of regularization

Let $W = (w_1, \ldots, w_T)$, $A \in \mathbb{R}^{T \times T}$. Note that
$$\sum_{t=1}^{T} \sum_{s=1}^{T} A_{ts}\, w_t^\top w_s = \mathrm{Tr}(WAW^\top).$$
Indeed, writing $W_i$ for the $i$-th row of $W$,
$$\mathrm{Tr}(WAW^\top) = \sum_{i=1}^{d} W_i^\top A W_i = \sum_{i=1}^{d} \sum_{t,s=1}^{T} A_{ts} W_{it} W_{is} = \sum_{t,s=1}^{T} A_{ts} \sum_{i=1}^{d} W_{it} W_{is} = \sum_{t,s=1}^{T} A_{ts}\, w_t^\top w_s.$$
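The trace identity above is quick to verify numerically (a minimal sketch with random matrices; names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
d, T = 6, 4
W = rng.standard_normal((d, T))   # columns are the task vectors w_t
A = rng.standard_normal((T, T))
A = A @ A.T                       # make A symmetric positive semidefinite

# Entrywise double sum over task pairs
lhs = sum(A[t, s] * (W[:, t] @ W[:, s]) for t in range(T) for s in range(T))
# Compact trace form
rhs = np.trace(W @ A @ W.T)

assert np.isclose(lhs, rhs)
```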


Page 25

Computations

$$\frac{1}{n}\|XW - Y\|_F^2 + \lambda\,\mathrm{Tr}(WAW^\top)$$

Consider the SVD $A = U \Sigma U^\top$, $\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_T)$, and let
$$\tilde{W} = W U, \qquad \tilde{Y} = Y U.$$
Then the problem can be rewritten as
$$\frac{1}{n}\|X \tilde{W} - \tilde{Y}\|_F^2 + \lambda\,\mathrm{Tr}(\tilde{W} \Sigma \tilde{W}^\top).$$


Page 28

Computations (cont.)

Finally, rewrite
$$\frac{1}{n}\|X \tilde{W} - \tilde{Y}\|_F^2 + \lambda\,\mathrm{Tr}(\tilde{W} \Sigma \tilde{W}^\top)$$
as
$$\sum_{t=1}^{T} \left( \frac{1}{n} \sum_{i=1}^{n} (\tilde{y}_i^t - \tilde{w}_t^\top x_i)^2 + \lambda \sigma_t \|\tilde{w}_t\|^2 \right),$$
and recover $W = \tilde{W} U^\top$.

Compare to single-task regularization.
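The whole computation can be sketched end to end: rotate the outputs by the eigenvectors of $A$, solve $T$ independent ridge problems with per-task strength $\lambda\sigma_t$, and rotate back. The snippet below checks the result against the stationarity condition of the original objective (synthetic data; the particular $A$ is an illustrative example):

```python
import numpy as np

rng = np.random.default_rng(4)
n, d, T, lam = 30, 5, 4, 0.3
X = rng.standard_normal((n, d))
Y = rng.standard_normal((n, T))

# A symmetric positive definite coupling matrix (illustrative)
B = rng.standard_normal((T, T))
A = B @ B.T + T * np.eye(T)

# Diagonalize A and rotate the outputs
sig, U = np.linalg.eigh(A)        # A = U diag(sig) U^T
Y_rot = Y @ U                     # Y_tilde = Y U

# In rotated coordinates each column is an independent ridge problem
# with its own regularization strength lam * sigma_t
G = X.T @ X
W_rot = np.column_stack([
    np.linalg.solve(G + n * lam * sig[t] * np.eye(d), X.T @ Y_rot[:, t])
    for t in range(T)
])
W = W_rot @ U.T                   # rotate back: W = W_tilde U^T

# Stationarity of the original objective:
# (2/n) X^T (XW - Y) + 2*lam*W*A = 0
grad = (2 / n) * X.T @ (X @ W - Y) + 2 * lam * W @ A
assert np.allclose(grad, 0, atol=1e-8)
```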

Page 29

Computations (cont.)

$$\mathcal{E}_\lambda(W) = \frac{1}{n}\|XW - Y\|_F^2 + \lambda\,\mathrm{Tr}(WAW^\top)$$

Alternatively, use gradient descent:
$$\nabla \mathcal{E}_\lambda(W) = \frac{2}{n} X^\top (XW - Y) + 2\lambda W A, \qquad W_{t+1} = W_t - \gamma \nabla \mathcal{E}_\lambda(W_t).$$

This trivially extends to other loss functions.
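A minimal sketch of this gradient iteration (with the simplest coupling $A = I$, so the fixed point can be compared against plain ridge regression; step size and iteration count are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(5)
n, d, T, lam = 30, 5, 3, 0.2
X = rng.standard_normal((n, d))
Y = rng.standard_normal((n, T))
A = np.eye(T)   # simplest coupling: independent tasks

def grad(W):
    # Gradient of (1/n)||XW - Y||_F^2 + lam * Tr(W A W^T)
    return (2 / n) * X.T @ (X @ W - Y) + 2 * lam * W @ A

# Constant step size below 1/L, with L a bound on the gradient's
# Lipschitz constant
step = 1.0 / (2 * (np.linalg.norm(X.T @ X, 2) / n
                   + lam * np.linalg.norm(A, 2)))

W = np.zeros((d, T))
for _ in range(5000):
    W = W - step * grad(W)

# With A = I the minimizer is plain ridge regression, column by column
W_star = np.linalg.solve(X.T @ X + n * lam * np.eye(d), X.T @ Y)
assert np.allclose(W, W_star, atol=1e-6)
```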

Page 30

Beyond Linearity

$$f_t(x) = w_t^\top \Phi(x), \qquad \Phi(x) = (\varphi_1(x), \ldots, \varphi_p(x))$$
$$\mathcal{E}_\lambda(W) = \frac{1}{n}\|\Phi W - Y\|_F^2 + \lambda\,\mathrm{Tr}(WAW^\top),$$
with $\Phi$ the $n \times p$ matrix with rows $\Phi(x_1), \ldots, \Phi(x_n)$.

Page 31

Nonparametrics and kernels

$$f_t(x) = \sum_{i=1}^{n} K(x, x_i)\, C_{it}$$
with
$$C_{\ell+1} = C_\ell - \gamma \left( \frac{2}{n} (K C_\ell - Y) + 2\lambda C_\ell A \right)$$

- $C_\ell \in \mathbb{R}^{n \times T}$
- $K \in \mathbb{R}^{n \times n}$, $K_{ij} = K(x_i, x_j)$
- $Y \in \mathbb{R}^{n \times T}$, $Y_{ij} = y_i^j$
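A sketch of the kernel iteration above, using a Gaussian kernel and the single-task coupling $A = I$; the kernel choice, step size, and iteration count are illustrative assumptions, not part of the slides:

```python
import numpy as np

rng = np.random.default_rng(6)
n, T, lam, step = 25, 3, 0.1, 0.5
X = rng.standard_normal((n, 2))
Y = rng.standard_normal((n, T))
A = np.eye(T)   # single-task coupling, for simplicity

def gauss_kernel(X1, X2, width=1.0):
    sq = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * width ** 2))

K = gauss_kernel(X, X)   # K_ij = K(x_i, x_j)

C = np.zeros((n, T))
for _ in range(2000):
    C = C - step * ((2 / n) * (K @ C - Y) + 2 * lam * C @ A)

# With A = I the iteration's fixed point satisfies (K + n*lam*I) C = Y
C_star = np.linalg.solve(K + n * lam * np.eye(n), Y)
assert np.allclose(C, C_star, atol=1e-6)
```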

Page 32

Spectral filtering for MTL

Beyond penalization:
$$\min_W \frac{1}{n}\|XW - Y\|_F^2 + \lambda\,\mathrm{Tr}(WAW^\top),$$
other forms of regularization can be considered:

- projection
- early stopping

Page 33

Multiclass and MTL

$$Y = \{1, \ldots, T\}$$

Page 34

From Multiclass to MTL

Encoding: for $j = 1, \ldots, T$,
$$j \mapsto e_j,$$
the $j$-th canonical basis vector of $\mathbb{R}^T$; the problem reduces to vector-valued regression.

Decoding: for $f(x) \in \mathbb{R}^T$,
$$f(x) \mapsto \operatorname*{argmax}_{t=1,\ldots,T} e_t^\top f(x) = \operatorname*{argmax}_{t=1,\ldots,T} f_t(x)$$
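The encoding/decoding pair above is a few lines of code (this sketch uses 0-based class indices, whereas the slides use $1, \ldots, T$):

```python
import numpy as np

def encode(labels, T):
    """Map class j in {0, ..., T-1} to the canonical basis vector e_j."""
    Y = np.zeros((len(labels), T))
    Y[np.arange(len(labels)), labels] = 1.0
    return Y

def decode(F):
    """Map a vector-valued prediction f(x) to argmax_t f_t(x)."""
    return np.argmax(F, axis=1)

labels = np.array([0, 2, 1, 2])
Y = encode(labels, T=3)
assert np.array_equal(decode(Y), labels)   # decoding inverts the encoding
```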

Page 35

Single MTL and OVA

Write
$$\min_W \frac{1}{n}\|XW - Y\|_F^2 + \lambda\,\mathrm{Tr}(WW^\top)$$
as
$$\sum_{t=1}^{T} \min_{w_t} \frac{1}{n} \sum_{i=1}^{n} (w_t^\top x_i - y_i^t)^2 + \lambda \|w_t\|^2.$$

This is known as one versus all (OVA).

Page 36

Beyond OVA

Consider
$$\min_W \frac{1}{n}\|XW - Y\|_F^2 + \lambda\,\mathrm{Tr}(WAW^\top),$$
that is, after rotating by the eigenvectors of $A$,
$$\sum_{t=1}^{T} \min_{\tilde{w}_t} \left( \frac{1}{n} \sum_{i=1}^{n} (\tilde{y}_i^t - \tilde{w}_t^\top x_i)^2 + \lambda \sigma_t \|\tilde{w}_t\|^2 \right)$$

Class relatedness is encoded in $A$.

Page 37

Back to MTL

$$\sum_{t=1}^{T} \frac{1}{n_t} \sum_{j=1}^{n_t} (y_j^t - w_t^\top x_j^t)^2$$
can be written (up to the per-task weights $1/n_t$, which can be absorbed into the mask) as
$$\Big\| \big( \underbrace{X}_{n \times d}\, \underbrace{W}_{d \times T} - \underbrace{Y}_{n \times T} \big) \odot \underbrace{M}_{n \times T} \Big\|_F^2, \qquad n = \sum_{t=1}^{T} n_t$$

- $\odot$ is the Hadamard (entrywise) product
- $M$ is a mask matrix
- $Y$ has one non-zero value in each row
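The masked Frobenius form can be checked against the task-by-task sum; the sketch below uses a binary mask with one observed task per example and omits the $1/n_t$ normalization, matching the displayed formula (names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
n, d, T = 20, 4, 3
X = rng.standard_normal((n, d))
W = rng.standard_normal((d, T))
Y = rng.standard_normal((n, T))

# Mask: each example belongs to exactly one task (one nonzero per row)
tasks = rng.integers(0, T, size=n)
M = np.zeros((n, T))
M[np.arange(n), tasks] = 1.0
Y = Y * M            # only the observed entry of each row is kept

# Masked Frobenius loss
loss_matrix = np.linalg.norm((X @ W - Y) * M, "fro") ** 2

# Same loss written task by task over each task's own samples
loss_tasks = sum(
    np.sum((Y[tasks == t, t] - X[tasks == t] @ W[:, t]) ** 2)
    for t in range(T)
)
assert np.isclose(loss_matrix, loss_tasks)
```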

Page 38

Computations

$$\min_W \|(XW - Y) \odot M\|_F^2 + \lambda\,\mathrm{Tr}(WAW^\top)$$

- can be rewritten using tensor calculus
- the computations for vector-valued regression extend easily
- the sparsity of $M$ can be exploited

Page 39

From MTL to matrix completion

Special case: take $d = n$ and $X = I$. Then
$$\|(XW - Y) \odot M\|_F^2 = \sum_{t=1}^{T} \sum_{i=1}^{n} (W_{it} - Y_{it})^2 M_{it}$$

Page 40

Summary so far

A regularization framework for

- VVR
- multiclass
- MTL
- matrix completion

if the structure of the "tasks" is known.

What if it is not?

Page 41

The structure of MTL

Consider
$$\min_W \frac{1}{n}\|XW - Y\|_F^2 + \lambda\,\mathrm{Tr}(WAW^\top),$$
where the matrix $A$ encodes structure.

Can we learn it?

Page 42

Learning structure of MTL

Consider
$$\min_{W,A} \frac{1}{n}\|XW - Y\|_F^2 + \lambda\,\mathrm{Tr}(WAW^\top) + \gamma\,\mathrm{pen}(A)$$
Estimate a positive definite matrix $A$ using a regularizer $\mathrm{pen}(A)$.

Page 43

Regularizers for MTL

For example, consider
$$\min_{W,A} \frac{1}{n}\|XW - Y\|_F^2 + \lambda\,\mathrm{Tr}(WAW^\top) + \gamma\,\mathrm{Tr}(A^{-2}).$$
Using the same change of coordinates as before, this becomes
$$\min_{\tilde{w}_1, \ldots, \tilde{w}_T,\; \sigma_1, \ldots, \sigma_T} \sum_{t=1}^{T} \left( \frac{1}{n} \sum_{i=1}^{n} (\tilde{y}_i^t - \tilde{w}_t^\top x_i)^2 + \lambda \sigma_t \|\tilde{w}_t\|^2 \right) + \gamma \sum_{t=1}^{T} \frac{1}{\sigma_t^2}.$$
The $\mathrm{Tr}(A^{-2})$ term prevents any task from being given too little weight.

Page 44

Alternating minimization

Solving
$$\min_{W,A} \frac{1}{n}\|XW - Y\|_F^2 + \lambda\,\mathrm{Tr}(WAW^\top) + \gamma\,\mathrm{pen}(A)$$

- Fix $A = A_0$
- Compute $W_1$ by solving $\min_W \frac{1}{n}\|XW - Y\|_F^2 + \lambda\,\mathrm{Tr}(W A_0 W^\top)$
- Compute $A_1$ by solving $\min_A \lambda\,\mathrm{Tr}(W_1 A W_1^\top) + \gamma\,\mathrm{pen}(A)$
- Repeat...
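A minimal sketch of the alternating scheme, using $\mathrm{pen}(A) = \mathrm{Tr}(A^{-2})$ as on the previous slide. The closed-form $A$-step below is an assumption of this sketch: it comes from setting the gradient $\lambda W^\top W - 2\gamma A^{-3}$ to zero, which requires $W$ to have full column rank. Since each step is an exact minimization, the objective never increases:

```python
import numpy as np

rng = np.random.default_rng(8)
n, d, T, lam, gam = 30, 6, 3, 0.1, 0.1
X = rng.standard_normal((n, d))
Y = rng.standard_normal((n, T))

def solve_W(A):
    # Rotate by the eigenvectors of A, solve T ridge problems, rotate back
    sig, U = np.linalg.eigh(A)
    Yr = Y @ U
    G = X.T @ X
    Wr = np.column_stack([
        np.linalg.solve(G + n * lam * sig[t] * np.eye(d), X.T @ Yr[:, t])
        for t in range(T)
    ])
    return Wr @ U.T

def solve_A(W):
    # Stationarity: lam * W^T W = 2*gam * A^{-3}
    # => A = (2*gam/lam)^(1/3) * (W^T W)^(-1/3)
    e, V = np.linalg.eigh(W.T @ W)
    return (2 * gam / lam) ** (1 / 3) * (V * e ** (-1 / 3)) @ V.T

def objective(W, A):
    return (np.linalg.norm(X @ W - Y, "fro") ** 2 / n
            + lam * np.trace(W @ A @ W.T)
            + gam * np.trace(np.linalg.matrix_power(A, -2)))

A = np.eye(T)
vals = []
for _ in range(10):
    W = solve_W(A)
    A = solve_A(W)
    vals.append(objective(W, A))

# Exact alternating minimization: the objective is non-increasing
assert all(v2 <= v1 + 1e-10 for v1, v2 in zip(vals, vals[1:]))
```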


Page 46

This class

- Why MTL?
- Regularization for MTL to exploit structure
- MTL and other problems
- Learning tasks AND their structure

Page 47

Next class

Sparsity!
