
Support Vector Machines

S.V.M.

Special session

Bernhard Schölkopf & Stéphane Canu

GMD-FIRST I.N.S.A. - P.S.I.

http://svm.first.gmd.de/ http://psichaud.insa-rouen.fr/~scanu/


ESANN'99: Special session 7 on Support Vector Machines, Thursday 22nd April 1999

radial SVM

[Figure: decision boundary of a radial SVM; both axes from -2.5 to 2.5]


Road map

• linear discrimination: the separable case
• linear discrimination: the NON separable case
• quadratic discrimination
• radial SVM
  – principle
  – 3 regularization hyperparameters
  – some benchmark results (glass data)
• SVM for regression


What's new with SVM

Artificial Neural Networks → Support Vector Machines

From biology to machine learning:
– it works! Some reasons:
– a formalization of learning: statistical learning theory, learning from data

From maths to machine learning = minimization:
– universality (learn everything): the kernel trick
– complexity control (but not anything): the margin

minimization + constraints


Functional space

Mercer's theorem

Let $K(x,y)$ be a positive definite bi-variate function:

$$\forall f \in L^2,\quad \iint K(x,y)\, f(x)\, f(y)\, dx\, dy \ge 0$$

Then there exist $\lambda_k \ge 0$ and functions $\varphi_k$ (orthogonal case) such that

$$K(x,y) = \sum_{k=1}^{\infty} \lambda_k\, \varphi_k(x)\, \varphi_k(y), \qquad \int K(x,y)\, \varphi_k(x)\, dx = \lambda_k\, \varphi_k(y)$$

and for $f = \sum_k c_k \varphi_k$ with $\|f\|_F^2 = \sum_k c_k^2 / \lambda_k < \infty$, the space $F$ is a reproducing Hilbert space:

$$\big\langle K(\cdot, y),\, f \big\rangle_F = f(y)$$

Kernel’s trick
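Mercer's positivity condition can be checked numerically on a sample. A minimal sketch (assumptions: numpy, and the Gaussian kernel used later in the talk as the illustrative kernel), building the Gram matrix and verifying its eigenvalues are non-negative:

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # K(x, y) = exp(-||x - y||^2 / sigma^2)
    return np.exp(-np.sum((x - y) ** 2) / sigma ** 2)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))          # 20 random points in R^2
K = np.array([[gaussian_kernel(xi, xj) for xj in X] for xi in X])

# Mercer / positive definiteness: the Gram matrix of a valid kernel
# has only non-negative eigenvalues (up to numerical precision).
eigvals = np.linalg.eigvalsh(K)
print(eigvals.min() >= -1e-10)   # True
```

The same check (eigenvalues of the Gram matrix) fails for an invalid "kernel", which is one quick way to spot a function that does not satisfy Mercer's condition.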


Minimization with constraints

$$\min_x f(x), \quad f \text{ convex:} \qquad \text{find } x^* \text{ such that } f'(x^*) = 0$$

$\min_x f(x)$ with the constraints $g(x) = 0$:

$$\max_\lambda \; \min_x \; L(x,\lambda) = f(x) + \lambda'\, g(x)$$

find the couple $(x^*, \lambda^*)$ such that

$$\nabla_x L(x^*, \lambda^*) = 0, \qquad \nabla_\lambda L(x^*, \lambda^*) = 0$$

$\min_x f(x)$ with the constraints $g(x) \le 0$:

$$\max_{\lambda \ge 0} \; \min_x \; L(x,\lambda) = f(x) + \lambda'\, g(x), \qquad \text{with } \lambda \ge 0 \text{ and } g(x) \le 0$$

(either $\lambda_i = 0$ or $g_i(x^*) = 0$)

$L(x,\lambda)$: the Lagrangian (Lagrange, 1788)


Minimization with constraints: dual formulation

$$\min_x f(x), \; f \text{ convex, with the constraints } g(x) \le 0$$

$$\max_{\lambda \ge 0} \; \min_x \; L(x,\lambda) = f(x) + \lambda'\, g(x) \qquad \text{(either } \lambda_i = 0 \text{ or } g_i(x^*) = 0\text{)}$$

Phase 1: for fixed $\lambda$, solve $\nabla_x L = 0$, i.e. $f'(x) = -\lambda'\, g'(x)$, giving $x(\lambda)$

Phase 2: maximize the dual function

$$\max_\lambda \; f(x(\lambda)) + \lambda'\, g(x(\lambda)) \qquad \text{with } \lambda \ge 0 \text{ and } g(x(\lambda)) \le 0$$
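The two phases can be walked through on a tiny example (a hypothetical toy problem, not from the slides): minimize f(x) = (x - 2)^2 under g(x) = x - 1 <= 0.

```python
import numpy as np

# Phase 1: for fixed lambda, grad_x L = 2(x - 2) + lambda = 0  =>  x(lambda) = 2 - lambda/2.
# Phase 2: maximize the dual w(lambda) = f(x(lambda)) + lambda * g(x(lambda))
#          = lambda - lambda^2 / 4, over lambda >= 0 (done here on a grid).
lam = np.linspace(0.0, 4.0, 4001)
w = lam - lam ** 2 / 4.0
lam_star = lam[np.argmax(w)]      # dual maximizer
x_star = 2.0 - lam_star / 2.0     # primal solution recovered from Phase 1

# Complementary slackness: lambda* = 2 > 0, so the constraint is active: g(x*) = 0.
print(round(float(lam_star), 3), round(float(x_star), 3))   # 2.0 1.0
```

Note how the dual optimum w(2) = 1 equals the primal optimum f(1) = 1: no duality gap, as expected for a convex problem.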


Linear discrimination: the separable case

[Figure: two classes of points separated by the hyperplane wx + b = 0]

Well classify all examples


Linear discrimination: the separable case

[Figure: the two classes separated by wx + b = 0, with the margin drawn on each side]

Well classify all examples, with the largest MARGIN


Linear discrimination: the separable case

[Figure: one-dimensional example; targets y = +1 and y = -1 plotted against x]


Linear discrimination: the separable case

[Figure: one-dimensional example with output y = wx; the closest examples satisfy wx = ±1, so the MARGIN is 1/‖w‖ on each side]


Linear discrimination: the separable case

[Figure: separating hyperplane wx + b = 0 with the largest margin]

Well classify all examples, with the largest MARGIN:

$$\max \frac{1}{\|w\|^2} \;\Leftrightarrow\; \min \|w\|^2$$


Linear classification: the separable case

$$d(x) = \operatorname{sign}(w' x + b), \qquad (x_i, y_i)_{i=1,n}, \; x \in \mathbb{R}^d, \; y \in \{-1, 1\}$$

well classify the whole learning set and minimize $\|w\|^2$:

$$\min_w \|w\|^2 \quad \text{with the constraints} \quad y_i (w' x_i + b) \ge 1, \; i = 1, n$$

Kuhn and Tucker:

$$\max_{\alpha \ge 0} \; \min_{w, b} \; L(w, b, \alpha) = \tfrac{1}{2} \|w\|^2 - \sum_{i=1}^{n} \alpha_i \big[ y_i (w' x_i + b) - 1 \big]$$


Equality constraint integration

$$\min_\alpha \; \max_\beta \; L(\alpha, \beta) = \tfrac{1}{2} \alpha' H \alpha - c'\alpha + \beta \, y'\alpha$$

$$\frac{\partial L}{\partial \alpha} = H\alpha - c + \beta y = 0 \;\Rightarrow\; H\alpha + \beta y = c, \qquad \frac{\partial L}{\partial \beta} = y'\alpha = 0$$

i.e. one linear system:

$$\begin{pmatrix} H & y \\ y' & 0 \end{pmatrix} \begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \begin{pmatrix} c \\ 0 \end{pmatrix}$$
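The stacked optimality conditions form a single linear system that numpy can solve directly. A toy numeric check (assumptions: two 1-d points chosen so both are support vectors, and c = 1 as in the hard-margin dual):

```python
import numpy as np

# Two 1-d training points, one per class; both end up as support vectors.
X = np.array([[-1.0], [1.0]])
y = np.array([-1.0, 1.0])
n = len(y)

H = (y[:, None] * y[None, :]) * (X @ X.T)   # H_ij = y_i y_j x_i' x_j
c = np.ones(n)

# Stack the optimality conditions:
# [ H   y ] [alpha]   [c]
# [ y'  0 ] [beta ] = [0]
M = np.block([[H, y[:, None]], [y[None, :], np.zeros((1, 1))]])
sol = np.linalg.solve(M, np.append(c, 0.0))
alpha, beta = sol[:n], sol[n]

w = (alpha * y) @ X        # w = sum_i alpha_i y_i x_i
b = y[0] - w @ X[0]        # from the active constraint y_1 (w' x_1 + b) = 1
print(alpha, w, b)         # alpha = [0.5 0.5], w = [1.], b = 0.0
```

The recovered hyperplane x = 0 separates the two points with margins y_i(w x_i + b) = 1, as the constraints require.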


Inequality constraint integration

$$\min_\alpha \; \max_{\beta, \, \mu \ge 0} \; L(\alpha, \beta, \mu) = \tfrac{1}{2} \alpha' H \alpha - c'\alpha + \beta \, y'\alpha - \mu'\alpha$$

Optimality conditions: $\alpha^* \ge 0$ (completed system solution), $\mu \ge 0$ (multipliers have to be positive)

Active set algorithm:

While (α, μ) do not verify the optimality conditions:
– solve α = M⁻¹ b on the current active set, and μ = −Hα + c + βy
– if some α_i < 0, a constraint is blocked (α_i = 0: an active variable is eliminated)
– else if some μ_i < 0, a constraint is relaxed

Cost: O(n³) (QP)


Linear classification: the non-separable case

relax the constraints: $y_i (w' x_i + b) \ge 1 - \xi_i, \; i = 1, n$

with $\xi_i \ge 0$; when $\xi_i > 1$: classification error

$$\min_{w, \xi} \; \tfrac{1}{2} \|w\|^2 + C \sum_{i=1}^{n} \xi_i \quad \text{with the constraints} \quad y_i (w' x_i + b) \ge 1 - \xi_i, \; i = 1, n$$

dual:

$$\min_\alpha \; \tfrac{1}{2} \alpha' H \alpha - c'\alpha \quad \text{with} \quad 0 \le \alpha \le C \; \text{ and } \; y'\alpha = 0$$

$\xi_i$: error variables
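A minimal numeric sketch of the box-constrained dual (assumptions: a simplified no-bias variant so the equality constraint y'α = 0 drops out, and plain projected gradient ascent standing in for a real QP solver):

```python
import numpy as np

# Toy 1-d training set; no-bias SVM (b = 0) keeps the dual a pure box problem.
X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y = np.array([-1.0, -1.0, 1.0, 1.0])
C = 1.0

H = (y[:, None] * y[None, :]) * (X @ X.T)   # H_ij = y_i y_j x_i' x_j
alpha = np.zeros(len(y))

# Projected gradient ascent on W(alpha) = sum(alpha) - 1/2 alpha' H alpha,
# projecting back into the box [0, C] after every step.
for _ in range(2000):
    alpha = np.clip(alpha + 0.01 * (1.0 - H @ alpha), 0.0, C)

w = (alpha * y) @ X
print(np.sign(X @ w))   # predictions match y
```

Note the sparsity the slides rely on: the easy points at ±2 end with α = 0, and only the two examples nearest the boundary keep α > 0.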


quadratic SVM

[Figure: decision boundary of a quadratic SVM; x axis from -2 to 2, y axis from -3 to 3]


polynomial classification

$$d(x) = \operatorname{sign}\big( w' \, (x_1, \; x_2, \; x_1^2, \; x_2^2, \; x_1 x_2)' + b \big), \qquad (x_i, y_i)_{i=1,n}, \; x \in \mathbb{R}^d, \; y \in \{-1, 1\}$$

well classify the training set and minimize $\|w\|^2$:

$$\min_w \|w\|^2 \quad \text{under constraints} \quad y_i \big( w' \Phi(x_{1i}, x_{2i}) + b \big) \ge 1, \; i = 1, n$$

with the feature map $\Phi(x_1, x_2) = (x_1, \; x_2, \; x_1^2, \; x_2^2, \; x_1 x_2)'$

$$H = \operatorname{diag}(y) \, \Phi \, \Phi' \, \operatorname{diag}(y), \qquad \Phi \in \mathbb{R}^{n \times 5}$$

Rang(H) = 5 → regularization needed
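The rank deficiency is easy to see numerically (a sketch assuming numpy; random points and labels are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = rng.choice([-1.0, 1.0], size=n)

# Explicit degree-2 feature map: n examples mapped into R^5.
Phi = np.column_stack([x1, x2, x1**2, x2**2, x1 * x2])

H = np.diag(y) @ Phi @ Phi.T @ np.diag(y)   # n x n, but built from a rank-5 factor
print(np.linalg.matrix_rank(H))             # 5
```

H is 50 x 50 yet has rank at most 5, so the linear system from the previous slides is singular, hence the need for regularization.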


Gaussian Kernel based S.V.M.

$$\hat D(x) = \operatorname{sign}\big( w' \, \Phi(x) + b \big), \qquad w = \sum_{k=1}^{m} \alpha_k \, y_k \, \Phi(x_k)$$

$$H_{ij} = y_i \, y_j \; \Phi(x_i)' \, \Phi(x_j)$$

$$\hat D(x) = \operatorname{sign}\Big( \sum_{k=1}^{m} \alpha_k \, y_k \, \Phi(x_k)' \, \Phi(x) + b \Big)$$

Mercer's theorem: $\Phi(x)' \, \Phi(x_s) = K(x, x_s)$

forget about $\Phi$:

$$\hat D(x) = \operatorname{sign}\Big( \sum_{k=1}^{m} \alpha_k \, y_k \, K(x, x_k) + b \Big)$$
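The "forget about Φ" step can be checked exactly for a kernel with a finite-dimensional map (a degree-2 polynomial kernel is used here for illustration, since the Gaussian kernel's Φ is infinite-dimensional):

```python
import numpy as np

def phi(x):
    # Explicit feature map for K(u, v) = (u'v)^2 in 2-d.
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

rng = np.random.default_rng(1)
u, v = rng.normal(size=2), rng.normal(size=2)

lhs = phi(u) @ phi(v)        # inner product in feature space
rhs = (u @ v) ** 2           # kernel evaluation in input space
print(np.isclose(lhs, rhs))  # True: Phi(u)' Phi(v) = K(u, v)
```

The decision function only ever needs the right-hand side, which is why the explicit map never has to be computed.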


1 d example

[Figure: SVM in 1 d; the training set is shown in green and the circled points are the support vectors. Class 1: mixture of 2 Gaussians; Class 2: Gaussian. Shown: the training set, the output of the SVM for the test set, the margin, and the support vectors; x axis from -2 to 2, y axis from -1 to 1]


3 regularization parameters

• C: the upper bound on the α_i

• σ: the kernel bandwidth: $K(x, y) = \exp\big( -\|x - y\|^2 / \sigma^2 \big)$

• ε: the linear system regularization: Hα = b ⇒ (H + εI)α = b

$$\min_{R, w} \; R^2 \, \|w\|^2, \qquad R^2 = \max_i \big( K(x_i, x_i) + K(a, a) - 2 K(x_i, a) \big)$$

(R: radius of the smallest sphere, centered at a, containing the data in feature space)
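The roles of σ and ε can be illustrated numerically (a sketch assuming numpy; the bandwidth values and ε = 1e-3 are hypothetical): a very large bandwidth drives the Gram matrix toward the all-ones matrix and near-singularity, and the ridge term (H + εI) restores a well-conditioned system.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # squared pairwise distances

eps = 1e-3
for sigma in (0.1, 100.0):
    K = np.exp(-d2 / sigma ** 2)
    # Condition number before and after the ridge regularization (H + eps*I).
    print(sigma,
          np.linalg.cond(K),
          np.linalg.cond(K + eps * np.eye(30)))
```

For the large bandwidth the raw condition number explodes, while the regularized one is bounded by roughly (λmax + ε)/ε.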


Small bandwidth and large C

[Figure: resulting decision boundary; both axes from -3 to 3]


Large bandwidth and large C

[Figure: resulting decision boundary; both axes from -3 to 3]


Large bandwidth and small C

[Figure: resulting decision boundary; both axes from -2.5 to 2.5]


SVM for regression

ε-insensitive cost:

$$C(x, y, f) = \begin{cases} |y - f(x)| - \varepsilon & \text{if } |y - f(x)| \ge \varepsilon \\ 0 & \text{else} \end{cases}$$

$$\min_{w, \xi, \xi^*} \; \tfrac{1}{2} \|w\|^2 + C \sum_{i=1}^{n} (\xi_i + \xi_i^*)$$

with the constraints

$$y_i - w' x_i - b \le \varepsilon + \xi_i, \qquad w' x_i + b - y_i \le \varepsilon + \xi_i^*, \qquad \xi_i, \xi_i^* \ge 0$$

dual:

$$\max_{\alpha, \alpha^*} \; -\tfrac{1}{2} \sum_{i,j} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*) \, x_i' x_j \; - \; \varepsilon \sum_{i} (\alpha_i + \alpha_i^*) \; + \; \sum_{i} y_i (\alpha_i - \alpha_i^*)$$

$$\text{with} \quad 0 \le \alpha_i, \alpha_i^* \le C \quad \text{and} \quad \sum_i (\alpha_i - \alpha_i^*) = 0$$
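The ε-insensitive cost above is a one-liner (a numpy sketch; the ε value and sample numbers are illustrative):

```python
import numpy as np

def eps_insensitive(y, f, eps=0.5):
    # C(x, y, f) = max(0, |y - f(x)| - eps): zero inside the eps-tube,
    # linear (not quadratic) outside it.
    return np.maximum(0.0, np.abs(y - f) - eps)

print(eps_insensitive(np.array([1.0, 1.0, 1.0]),
                      np.array([1.2, 1.5, 0.2])))   # [0.  0.  0.3]
```

Residuals inside the tube cost nothing, which is what makes the regression solution sparse in support vectors.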


Example...

[Figure: Support Vector Machine Regression; x axis from 0 to 3.5, y axis from 8.5 to 11.5]


small and also

[Figure: Support Vector Machine Regression; x axis from 0 to 3.5, y axis from 8.5 to 11.5]


Geostatistics


Another way to see things (Girosi, 97)

$$\min_f \; \tfrac{1}{2} \sum_{i=1}^{n} \big( y_i - f(x_i) \big)^2 + \lambda \, \|f\|_F^2$$

with $f(x) = \sum_{i=1}^{n} c_i \, K(x, x_i)$ and $\|f\|_F^2 = \sum_{i,j} c_i c_j \, K(x_i, x_j)$, this becomes

$$\min_c \; \tfrac{1}{2} \sum_{i=1}^{n} \Big( y_i - \sum_{j=1}^{n} c_j \, K(x_i, x_j) \Big)^2 + \lambda \sum_{i,j} c_i c_j \, K(x_i, x_j)$$

and the solution is $r(x) = \sum_{i=1}^{n} c_i \, K(x, x_i)$
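With the square loss, this view gives a closed-form solution: the coefficients solve the regularized linear system (K + λI)c = y, the standard regularization-network solution. A numpy sketch (the Gaussian kernel, λ value, and toy data are illustrative):

```python
import numpy as np

def K(a, b, sigma=1.0):
    # Gaussian kernel matrix between two 1-d point sets.
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / sigma ** 2)

# Toy 1-d regression data.
x = np.linspace(0.0, 3.0, 15)
y = np.sin(2.0 * x)

lam = 1e-3
Kxx = K(x, x)
c = np.linalg.solve(Kxx + lam * np.eye(len(x)), y)   # (K + lam I) c = y

r = Kxx @ c   # r(x_i) = sum_j c_j K(x_i, x_j)
# Exact identity: (K + lam I) c = y  =>  r - y = -lam * c,
# so the fit approaches interpolation as lam -> 0.
print(np.allclose(r - y, -lam * c))   # True
```

The only difference from the SVM regression of the previous slides is the loss: square loss gives a dense linear-system solution, while the ε-insensitive loss yields a sparse QP solution.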


SVM history and trends

Vapnik, V.; Lerner, A. 1963: statistical learning theory
Mangasarian, O. 1965, 1968: optimization
Kimeldorf, G.; Wahba, G. 1971: non-parametric regression: splines

Boser, B.; Guyon, I.; Vapnik, V. 1992
Bennett, K.; Mangasarian, O. 1992

Learning theory: Cortes, C. 1995
• soft margin classifier
• effective VC-dimensions
• other formalisms, ...

The pioneers

The 2nd start : ANN, learning & computers...

Trends...

Applications:
• on-line handwritten character recognition
• face recognition
• text mining
• ...

Optimization:
• Vapnik
• Osuna, E. & Girosi, F.
• John C. Platt
• Linda Kaufman
• Thorsten Joachims


Optimization issues: QP with constraints

$$\min_\alpha \; \tfrac{1}{2} \alpha' H \alpha - c'\alpha \quad \text{with} \quad 0 \le \alpha_i \le C, \; i = 1, n \; \text{ and } \; y'\alpha = 0$$

• Box constraints

• H is positive semidefinite (beware of commercial solvers)

• Size of H! But a lot of the α_i are 0 or C:
  – active constraint set, starting with α = 0
  – do not compute (store) the whole H
  – chunk

• the multiclass issue!


Optimization issues

Solve the whole problem:
• commercial: LOQO (primal-dual approach), MINOS, Matlab!
• Vapnik: Moré and Toraldo (1991)

Decompose the problem:
• chunking (Vapnik, 82, 92)
• Osuna & Girosi (implemented in SVMlight by Thorsten Joachims, 98)
• Sequential Minimal Optimization (SMO), John C. Platt, 98

No H: start from α = 0, active set technique (Linda Kaufman, 98)
• minimize the cost function
  – 2nd order: Newton
  – conjugate gradient, projected conjugate gradient (PCG, Burges, 98)
• select the relevant constraints

Interior point methods: Moré, 91, Z. Dostal, 97 and others...


Some benchmark considerations (Platt 98)

• Osuna's decomposition technique permits the solution of SVMs via fixed-size QP subproblems

• Using two-variable QP subproblems (SMO) does not require a QP library

• SMO trades off QP time for kernel evaluation time

• Optimizations can dramatically reduce kernel time
  – linear SVMs (useful for text categorization)
  – sparse dot products
  – kernel caching (good for smaller problems, Thorsten Joachims, 98)

• SMO can be much faster than other techniques for some problems

• What about active set and interior point techniques?
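The "sparse dot products" optimization is easy to sketch (a hypothetical dict-of-features representation, as used for bag-of-words text categorization; real implementations use sorted index arrays):

```python
def sparse_dot(u, v):
    # u, v: sparse vectors as {feature_index: value} dicts.
    # Iterate over the smaller vector only: cost is O(min(nnz(u), nnz(v)))
    # instead of O(d) over the full vocabulary dimension d.
    if len(u) > len(v):
        u, v = v, u
    return sum(val * v.get(idx, 0.0) for idx, val in u.items())

doc = {3: 1.0, 17: 2.0, 905: 1.0}   # a document: few non-zero term weights
w = {3: 0.5, 42: -1.0, 905: 0.25}   # a weight vector, kept sparse here for illustration
print(sparse_dot(doc, w))           # 0.75
```

For a linear SVM over text this dot product is the dominant cost, which is why the sparse variant pays off.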


open issues

• VC entropy for margin classifiers: learning bounds
• other margin classifiers: boosting
• non-"L2" (quadratic) cost functions: sparse coding (Drezet & Harrison)
• curse of dimensionality: local vs global
• kernel influence (Tsuda)
• applications:
  – classification (Weston & Watkins)
  – ... to regression (Pontil & al.)
  – face detection (Fernandez & Viennet)
• algorithms (Cristianini & Campbell)
• making bridges - other formalisms:
  – Bayesian (Kwok)
  – statistical mechanics (Buhot & Gordon)
  – logic (Sebag), ...


Books in Support Vector Research

V. Vapnik, The Nature of Statistical Learning Theory. Springer-Verlag, 1995; Statistical Learning Theory. Wiley, 1998.

Introductory SVM chapters in:
• S. Haykin, Neural Networks, a Comprehensive Foundation. Macmillan, New York, NY, 1998 (2nd ed.).
• V. Cherkassky and F. Mulier, Learning from Data: Concepts, Theory, and Methods. Wiley, 1998.

C.J.C. Burges, 1998. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, Vol. 2, No. 2.

Schölkopf, B.; 1997. Support Vector Learning. PhD Thesis. Published by: R. Oldenbourg Verlag, Munich, 1997. ISBN 3-486-24632-1.

Smola, A. J.; 1998. Learning with Kernels. PhD Thesis. Published by: GMD, Birlinghoven, 1999

NIPS'97 workshop book: B. Schölkopf, C. Burges, A. Smola. Advances in Kernel Methods: Support Vector Machines. MIT Press, Cambridge, MA; December 1998.

NIPS'98 workshop book on large margin classifiers... is coming


Events in Support Vector Research

ACAI'99 workshop: Support Vector Machine Theory and Applications
Workshop on Support Vector Machines, IJCAI'99, August 2, 1999, Stockholm, Sweden
EUROCOLT'99 workshop on Kernel Methods, March 27, 1999, Nordkirchen Castle, Germany


Conclusion

SVMs select relevant patterns in a robust way

- svm.cs.rhbnc.ac.uk

Matlab code available upon request

- [email protected]

Multi-class problems
Small error