Page 1:

Support Vector Classification (Linearly Separable Case, Primal)

The hyperplane that solves the minimization problem

$$\min_{(w,b)\in\mathbb{R}^{n+1}} \; \tfrac{1}{2}\|w\|_2^2 \quad \text{subject to} \quad D(Aw + eb) \ge e$$

realizes the maximal margin hyperplane with geometric margin $\gamma = \dfrac{1}{\|w\|_2}$. (Here the rows of $A \in \mathbb{R}^{\ell\times n}$ are the training points $x_i'$, $D = \mathrm{diag}(y_1,\dots,y_\ell)$ holds the $\pm 1$ labels, and $e$ is the vector of ones.)
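Below is a minimal numpy/scipy sketch of this primal QP (not from the slides): the toy 2-D data and the choice of scipy's SLSQP solver are assumptions, but the matrices follow the $A$, $D$, $e$ notation above.

```python
# Minimal sketch (toy 2-D data and scipy's SLSQP solver are assumptions):
# solve  min (1/2)||w||^2  s.t.  D(Aw + eb) >= e  for a linearly separable sample.
import numpy as np
from scipy.optimize import minimize

A = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])  # rows are the x_i
y = np.array([1.0, 1.0, -1.0, -1.0])                                # +/-1 labels
D, e, n = np.diag(y), np.ones(len(y)), A.shape[1]

def objective(v):                          # v packs (w, b)
    w = v[:n]
    return 0.5 * w @ w                     # (1/2)||w||^2

def margin_constraints(v):                 # D(Aw + eb) - e >= 0, componentwise
    w, b = v[:n], v[n]
    return D @ (A @ w + e * b) - e

sol = minimize(objective, np.zeros(n + 1), method="SLSQP",
               constraints=[{"type": "ineq", "fun": margin_constraints}])
w, b = sol.x[:n], sol.x[n]
print("w =", w.round(3), "b =", round(b, 3), "margin =", round(1 / np.linalg.norm(w), 3))
```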

Page 2:

Support Vector Classification (Linearly Separable Case, Dual Form)

The dual problem of the previous mathematical program:

$$\max_{\alpha\in\mathbb{R}^{\ell}} \; e'\alpha - \tfrac{1}{2}\,\alpha' D A A' D \alpha \quad \text{subject to} \quad e'D\alpha = 0, \; \alpha \ge 0.$$

Applying the KKT optimality conditions, we have $w = A'D\alpha$. But where is $b$?

Don't forget: $0 \le \alpha \;\perp\; D(Aw + eb) - e \ge 0$.
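A companion sketch (same assumed toy data and solver as above) that solves this dual and recovers $w = A'D\alpha$ from the KKT condition:

```python
# Companion sketch (same assumed toy data and solver): solve the dual
#   max e'a - (1/2) a' D A A' D a   s.t.  e'Da = 0, a >= 0
# as a minimization of its negative, then recover w = A'Da.
import numpy as np
from scipy.optimize import minimize

A = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
D, e, l = np.diag(y), np.ones(len(y)), len(y)

Q = D @ A @ A.T @ D                                   # Q = D A A' D
sol = minimize(lambda a: 0.5 * a @ Q @ a - e @ a,     # negated dual objective
               np.zeros(l), method="SLSQP",
               bounds=[(0, None)] * l,                # alpha >= 0
               constraints=[{"type": "eq", "fun": lambda a: e @ (D @ a)}])  # e'Da = 0
alpha = sol.x
w = A.T @ D @ alpha                                   # KKT: w = A'Da
print("alpha* =", alpha.round(4), " w =", w.round(3),
      " support vectors:", np.where(alpha > 1e-6)[0])
```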

Page 3:

Dual Representation of SVM (Key of Kernel Methods)

The hypothesis is determined by $(\alpha^*, b^*)$:

$$h(x) = \operatorname{sgn}\big(\langle x, A'D\alpha^* \rangle + b^*\big) = \operatorname{sgn}\Big(\sum_{i=1}^{\ell} y_i \alpha_i^* \langle x_i, x\rangle + b^*\Big) = \operatorname{sgn}\Big(\sum_{\alpha_i^* > 0} y_i \alpha_i^* \langle x_i, x\rangle + b^*\Big)$$

$$w = A'D\alpha^* = \sum_{i=1}^{\ell} y_i \alpha_i^* A_i'$$

Remember: $A_i' = x_i$ (the $i$-th training point).
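A small sketch of this dual-form classifier; the $\alpha^*$, $b^*$ values below are the ones obtained for the toy data in the earlier sketches and are shown only for illustration.

```python
# Small sketch of the dual-form classifier h(x) = sgn( sum_{a*_i>0} y_i a*_i <x_i, x> + b* ).
import numpy as np

def h(x, A, y, alpha, b, tol=1e-6):
    sv = alpha > tol                               # only support vectors (a*_i > 0) contribute
    return np.sign(np.sum(y[sv] * alpha[sv] * (A[sv] @ x)) + b)

A = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
alpha, b = np.array([0.0625, 0.0, 0.0625, 0.0]), 0.0   # values from the toy dual sketch
print(h(np.array([1.0, 2.0]), A, y, alpha, b))         # -> 1.0
```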

Page 4:

Compute the Geometric Margin via the Dual Solution

The geometric margin is $\gamma = \dfrac{1}{\|w^*\|_2}$ and $\langle w^*, w^*\rangle = (\alpha^*)'DAA'D\alpha^*$, hence we can compute $\gamma$ by using $\alpha^*$. Use the KKT conditions again (in the dual)!

$$0 \le \alpha^* \;\perp\; D(AA'D\alpha^* + b^*e) - e \ge 0, \qquad \text{and don't forget } e'D\alpha^* = 0.$$

Multiplying the complementarity condition on the left by $(\alpha^*)'$ and using $e'D\alpha^* = 0$ gives $\|w^*\|_2^2 = e'\alpha^*$, so

$$\gamma = (e'\alpha^*)^{-\frac{1}{2}} = \Big(\sum_{\alpha_i^* > 0} \alpha_i^*\Big)^{-\frac{1}{2}}$$
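A short sketch (same toy solution assumed) checking that $(e'\alpha^*)^{-1/2}$ and $1/\|w^*\|_2$ agree:

```python
# Short sketch: the geometric margin from the dual, (e'a*)^(-1/2),
# compared with 1/||w*||_2 computed from w* = A'Da*.
import numpy as np

A = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
alpha = np.array([0.0625, 0.0, 0.0625, 0.0])       # a* from the dual sketch

w = A.T @ np.diag(y) @ alpha
print(np.sum(alpha) ** -0.5, 1.0 / np.linalg.norm(w))   # both are about 2.828
```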

Page 5:

Soft Margin SVM (Nonseparable Case)

If the data are not linearly separable, the primal problem is infeasible and the dual problem is unbounded above.

Introduce a slack variable $\xi_i$ for each training point:

$$y_i(w'x_i + b) \ge 1 - \xi_i, \qquad \xi_i \ge 0 \quad \forall\, i$$

The inequality system is now always feasible, e.g., $w = 0$, $b = 0$ and $\xi = e$.
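A tiny sketch (toy non-separable data assumed) verifying the feasibility claim for $w = 0$, $b = 0$, $\xi = e$:

```python
# Tiny sketch: with slacks, y_i(w'x_i + b) >= 1 - xi_i, xi_i >= 0 is feasible
# for any data, e.g. at w = 0, b = 0, xi = e.
import numpy as np

A = np.array([[1.0, 1.0], [1.2, 0.8], [-1.0, -1.0], [0.9, 1.1]])  # not linearly separable
y = np.array([1.0, 1.0, -1.0, -1.0])

w, b, xi = np.zeros(2), 0.0, np.ones(len(y))       # the candidate point (0, 0, e)
lhs = y * (A @ w + b)                              # y_i(w'x_i + b)
print(np.all(lhs >= 1 - xi) and np.all(xi >= 0))   # True: every constraint holds
```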

Page 6:

[Figure: two classes of training points ("x" and "o") on either side of the separating hyperplane, with the margin $\gamma$ marked on both sides and slack variables $\xi_i$, $\xi_j$ for points that violate the margin.]

Page 7:

Two Different Measures of Training Error

2-Norm Soft Margin:

$$\min_{(w,b,\xi)\in\mathbb{R}^{n+1+\ell}} \; \tfrac{1}{2}\|w\|_2^2 + \tfrac{C}{2}\|\xi\|_2^2 \quad \text{subject to} \quad D(Aw + eb) + \xi \ge e$$

1-Norm Soft Margin:

$$\min_{(w,b,\xi)\in\mathbb{R}^{n+1+\ell}} \; \tfrac{1}{2}\|w\|_2^2 + C\,e'\xi \quad \text{subject to} \quad D(Aw + eb) + \xi \ge e, \; \xi \ge 0$$
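A small sketch writing the two objectives as plain functions; the numeric values of $(w, \xi, C)$ are assumed toy inputs for illustration only.

```python
# Small sketch: the two training-error measures above as plain functions of (w, xi, C).
import numpy as np

def two_norm_objective(w, xi, C):
    return 0.5 * w @ w + 0.5 * C * xi @ xi     # (1/2)||w||^2 + (C/2)||xi||^2

def one_norm_objective(w, xi, C):
    return 0.5 * w @ w + C * np.sum(xi)        # (1/2)||w||^2 + C e'xi

w, xi, C = np.array([0.5, -0.25]), np.array([0.0, 0.3, 0.1]), 10.0
print(two_norm_objective(w, xi, C), one_norm_objective(w, xi, C))
```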

Page 8:

2-Norm Soft Margin Dual Formulation

The Lagrangian for the 2-norm soft margin is

$$L(w,b,\xi;\alpha) = \tfrac{1}{2}w'w + \tfrac{C}{2}\xi'\xi + \alpha'\big[e - D(Aw + eb) - \xi\big], \qquad \text{where } \alpha \ge 0.$$

Setting the partial derivatives with respect to the primal variables to zero:

$$\frac{\partial L}{\partial w} = w - A'D\alpha = 0, \qquad \frac{\partial L}{\partial b} = -e'D\alpha = 0, \qquad \frac{\partial L}{\partial \xi} = C\xi - \alpha = 0$$
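A minimal numeric check (toy values assumed) that the analytic gradients above match finite differences of the Lagrangian:

```python
# Minimal sketch: compare analytic gradients of L(w, b, xi; alpha) with central
# finite differences at an arbitrary (toy) point.
import numpy as np

A = np.array([[1.0, 0.5], [-0.5, -1.0], [0.3, -0.2]])
y = np.array([1.0, -1.0, -1.0])
D, e, C = np.diag(y), np.ones(3), 2.0
w, b = np.array([0.3, -0.1]), 0.2
xi, alpha = np.array([0.1, 0.4, 0.2]), np.array([0.5, 0.1, 0.3])

def L(w, b, xi):
    return 0.5 * w @ w + 0.5 * C * xi @ xi + alpha @ (e - D @ (A @ w + e * b) - xi)

eps = 1e-6
dL_db = (L(w, b + eps, xi) - L(w, b - eps, xi)) / (2 * eps)
print(np.isclose(dL_db, -e @ (D @ alpha)))                 # dL/db = -e'Da
grad_w = np.array([(L(w + eps * np.eye(2)[j], b, xi) -
                    L(w - eps * np.eye(2)[j], b, xi)) / (2 * eps) for j in range(2)])
print(np.allclose(grad_w, w - A.T @ D @ alpha))            # dL/dw = w - A'Da
```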

Page 9:

Dual Maximization Problem for the 2-Norm Soft Margin

Dual:

$$\max_{\alpha\in\mathbb{R}^{\ell}} \; e'\alpha - \tfrac{1}{2}\,\alpha' D\Big(AA' + \tfrac{1}{C}I\Big)D\alpha \quad \text{subject to} \quad e'D\alpha = 0, \; \alpha \ge 0$$

The corresponding KKT complementarity condition:

$$0 \le \alpha \;\perp\; D(Aw + eb) + \xi - e \ge 0$$

Use the above conditions to find $b^*$.
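A sketch (toy overlapping-class data and scipy's SLSQP assumed) of solving this dual and then recovering $b^*$ from the complementarity condition together with $\xi = \alpha/C$:

```python
# Sketch: the 2-norm soft-margin dual replaces AA' with AA' + I/C; solve it,
# then recover b* from y_i(w'x_i + b) = 1 - xi_i with xi = alpha/C and alpha_i > 0.
import numpy as np
from scipy.optimize import minimize

A = np.array([[1.0, 1.0], [1.2, 0.8], [-1.0, -1.0], [0.9, 1.1]])  # overlapping classes
y = np.array([1.0, 1.0, -1.0, -1.0])
D, e, l, C = np.diag(y), np.ones(4), 4, 10.0

Q = D @ (A @ A.T + np.eye(l) / C) @ D
sol = minimize(lambda a: 0.5 * a @ Q @ a - e @ a, np.zeros(l), method="SLSQP",
               bounds=[(0, None)] * l,
               constraints=[{"type": "eq", "fun": lambda a: e @ (D @ a)}])
alpha = sol.x
w = A.T @ D @ alpha
i = int(np.argmax(alpha))                     # any index with alpha_i > 0 works
b = (1 - alpha[i] / C) / y[i] - A[i] @ w      # from y_i(w'x_i + b) = 1 - alpha_i/C
print("w =", w.round(3), "b =", round(b, 3))
```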

Page 10:

Linear Machine in Feature Space

Let $\phi: X \to F$ be a nonlinear map from the input space to some feature space.

The classifier will be of the form (primal):

$$f(x) = \Big(\sum_{i} w_i \phi_i(x)\Big) + b$$

Writing it in the dual form:

$$f(x) = \Big(\sum_{i=1}^{\ell} \alpha_i y_i \langle \phi(x_i)\cdot\phi(x)\rangle\Big) + b$$
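A small sketch contrasting the two forms with an explicit toy feature map; the map $\phi$ and the $\alpha$, $b$ values are hypothetical, chosen only to show that the evaluations coincide when $w = \sum_i \alpha_i y_i \phi(x_i)$.

```python
# Small sketch (hypothetical feature map and alpha, b values): primal evaluation
# f(x) = <w, phi(x)> + b equals the dual evaluation sum_i alpha_i y_i <phi(x_i), phi(x)> + b.
import numpy as np

def phi(x):                                      # a toy nonlinear map X -> F
    return np.array([x[0], x[1], x[0] * x[1], x[0] ** 2, x[1] ** 2])

A = np.array([[1.0, 2.0], [0.5, -1.0], [-1.0, 1.0]])
y = np.array([1.0, -1.0, -1.0])
alpha, b = np.array([0.4, 0.1, 0.3]), 0.2        # hypothetical values, not a trained SVM

Phi = np.array([phi(x) for x in A])              # training points mapped into F
w = Phi.T @ (y * alpha)                          # w = sum_i alpha_i y_i phi(x_i)

x = np.array([2.0, -0.5])
print(w @ phi(x) + b)                            # primal evaluation in feature space
print(np.sum(alpha * y * (Phi @ phi(x))) + b)    # dual evaluation via inner products
```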

Page 11:

Kernel: Represent the Inner Product in Feature Space

Definition: A kernel is a function $K: X \times X \to \mathbb{R}$ such that for all $x, z \in X$,

$$K(x,z) = \langle \phi(x)\cdot\phi(z)\rangle, \qquad \text{where } \phi: X \to F.$$

The classifier will become:

$$f(x) = \Big(\sum_{i=1}^{\ell} \alpha_i y_i K(x_i, x)\Big) + b$$
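A small sketch illustrating the definition (example not from the slides): the degree-2 polynomial kernel $K(x,z) = (\langle x,z\rangle + 1)^2$ equals $\langle\phi(x)\cdot\phi(z)\rangle$ for an explicit feature map $\phi$.

```python
# Small sketch: a degree-2 polynomial kernel equals the inner product of the
# explicit map phi(x) = (x1^2, x2^2, sqrt(2) x1 x2, sqrt(2) x1, sqrt(2) x2, 1).
import numpy as np

def phi(x):
    x1, x2 = x
    return np.array([x1**2, x2**2, np.sqrt(2)*x1*x2,
                     np.sqrt(2)*x1, np.sqrt(2)*x2, 1.0])

def K(x, z):
    return (x @ z + 1.0) ** 2

x, z = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print(K(x, z), phi(x) @ phi(z))    # both print 4.0
```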

Page 12:

Introduce the Kernel into the Dual Formulation

Let $S = \{(x_1,y_1), (x_2,y_2), \ldots, (x_\ell,y_\ell)\}$ be a training sample that is linearly separable in the feature space implicitly defined by the kernel $K(x,z)$. The SV classifier is determined by the $\alpha^*$ that solves

$$\max_{\alpha\in\mathbb{R}^{\ell}} \; e'\alpha - \tfrac{1}{2}\,\alpha' D\,K(A,A')\,D\alpha \quad \text{subject to} \quad e'D\alpha = 0, \; \alpha \ge 0.$$
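A sketch of this kernelized dual on XOR-like toy data; the Gaussian kernel, its parameter, and the solver are assumptions, since the slides leave the kernel unspecified.

```python
# Sketch: build K(A, A'), form D K D, and solve the kernelized hard-margin dual.
import numpy as np
from scipy.optimize import minimize

A = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])   # not separable in input space
y = np.array([1.0, 1.0, -1.0, -1.0])
D, e, l = np.diag(y), np.ones(4), 4

def gram(X, Z, gamma=1.0):                        # K(x,z) = exp(-gamma ||x - z||^2)
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

Q = D @ gram(A, A) @ D                            # D K(A, A') D
sol = minimize(lambda a: 0.5 * a @ Q @ a - e @ a, np.zeros(l), method="SLSQP",
               bounds=[(0, None)] * l,            # alpha >= 0
               constraints=[{"type": "eq", "fun": lambda a: e @ (D @ a)}])  # e'Da = 0
print("alpha* =", sol.x.round(3))
```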

Page 13:

Kernel Technique: Based on Mercer's Condition (1909)

The value of the kernel function represents the inner product in the feature space.

Kernel functions merge two steps:
1. map the input data from the input space to the feature space (which might be infinite-dimensional);
2. compute the inner product in the feature space.

Page 14:

Mercer's Condition Guarantees the Convexity of the QP

Let $X = \{x_1, x_2, \ldots, x_n\}$ be a finite space and $k(x,z)$ a symmetric function on $X$. Then $k(x,z)$ is a kernel function if and only if the matrix $K \in \mathbb{R}^{n\times n}$ with $K_{ij} = k(x_i, x_j)$ is positive semi-definite.
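A small sketch of this finite-space criterion: build the Gram matrix for a candidate $k(x,z)$ on a toy set $X$ and test symmetry and positive semi-definiteness via its eigenvalues (the example kernels are assumptions).

```python
# Small sketch: on a finite X, Mercer's condition is just "the Gram matrix
# K_ij = k(x_i, x_j) is symmetric positive semi-definite"; check it numerically.
import numpy as np

X = np.array([[0.0, 0.0], [1.0, 0.5], [2.0, -1.0], [0.5, 2.0]])

def is_psd_gram(k, X, tol=1e-10):
    K = np.array([[k(x, z) for z in X] for x in X])
    return bool(np.allclose(K, K.T) and np.all(np.linalg.eigvalsh(K) >= -tol))

print(is_psd_gram(lambda x, z: (x @ z + 1.0) ** 2, X))     # True: a valid kernel
print(is_psd_gram(lambda x, z: -np.sum((x - z) ** 2), X))  # False: not a kernel
```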

Page 15:

Introduce the Kernel into the Dual Formulation for the 2-Norm Soft Margin

Consider the feature space implicitly defined by $k(x,z)$. Suppose $\alpha^*$ solves the QP problem:

$$\max_{\alpha\in\mathbb{R}^{\ell}} \; e'\alpha - \tfrac{1}{2}\,\alpha' D\Big(K(A,A') + \tfrac{1}{C}I\Big)D\alpha \quad \text{subject to} \quad e'D\alpha = 0, \; \alpha \ge 0$$

Then the decision rule is defined by

$$h(x) = \operatorname{sgn}\big(K(x, A')D\alpha^* + b^*\big)$$

Use the above conditions to find $b^*$ (next slide).

Page 16:

Introduce the Kernel into the Dual Formulation for the 2-Norm Soft Margin (continued)

$b^*$ is chosen so that

$$y_i\big[K(A_i', A')D\alpha^* + b^*\big] = 1 - \tfrac{\alpha_i^*}{C} \qquad \text{for any } i \text{ with } \alpha_i^* \ne 0.$$

Because:

$$0 \le \alpha^* \;\perp\; D\big(K(A,A')D\alpha^* + eb^*\big) + \xi^* - e \ge 0 \qquad \text{and} \qquad \alpha^* = C\xi^*.$$
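A minimal helper sketch of this rule (the kernel matrix $K(A,A')$ and the dual solution $\alpha^*$ are assumed to be already computed, e.g. by the QP sketches above):

```python
# Minimal sketch: recover b* from y_i [K(A_i', A') D a* + b*] = 1 - a*_i / C
# for some i with a*_i > 0.  K is the precomputed Gram matrix K(A, A'),
# alpha the dual solution, y the +/-1 labels, C the penalty parameter.
import numpy as np

def recover_b(K, y, alpha, C, tol=1e-8):
    i = int(np.argmax(alpha))                     # pick an index with the largest a*_i
    assert alpha[i] > tol, "need at least one a*_i > 0"
    return (1.0 - alpha[i] / C) / y[i] - K[i] @ (np.diag(y) @ alpha)
```

Any index with $\alpha_i^* > 0$ gives the same $b^*$ up to solver tolerance, which is a useful consistency check.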

Page 17:

Geometric Margin in Feature Space for the 2-Norm Soft Margin

The geometric margin in the feature space is defined by

$$\gamma = \frac{1}{\|w^*\|_2} = \Big(e'\alpha^* - \tfrac{1}{C}\|\alpha^*\|_2^2\Big)^{-\frac{1}{2}}$$

since

$$\|w^*\|_2^2 = (\alpha^*)'D\,K(A,A')\,D\alpha^* = \cdots = e'\alpha^* - \tfrac{1}{C}\|\alpha^*\|_2^2$$

Why is $e'\xi^* \ge \|\xi^*\|_2^2$?
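A small sketch of the two equivalent margin computations; it assumes a dual solution $\alpha^*$ and the Gram matrix $K = K(A,A')$ are already available:

```python
# Minimal sketch: the geometric margin from the dual solution,
#   gamma = (e'a* - ||a*||^2 / C)^(-1/2),
# which at the optimum agrees with 1/||w*|| from ||w*||^2 = (a*)' D K(A,A') D a*.
import numpy as np

def margin_from_dual(alpha, C):
    return (np.sum(alpha) - (alpha @ alpha) / C) ** -0.5

def margin_from_w(alpha, y, K):
    Dalpha = np.diag(y) @ alpha                   # D a*
    return (Dalpha @ K @ Dalpha) ** -0.5          # 1 / ||w*||_2 in feature space
```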

Page 18:

Discussion About C for the 2-Norm Soft Margin

The only difference between the "hard margin" and the 2-norm soft margin is the objective function in the (dual) optimization problem: compare $K(A,A')$ with $K(A,A') + \tfrac{1}{C}I$.

A larger C gives a smaller margin in the feature space.

A smaller C gives a better numerical condition for the dual QP.
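A tiny sketch of that comparison (toy Gram matrix assumed): the condition number of $K(A,A') + \tfrac{1}{C}I$ improves as $C$ shrinks, even when $K(A,A')$ itself is singular.

```python
# Tiny sketch: the ridge term I/C makes the dual QP matrix better conditioned as
# C decreases; a large C leaves it close to K(A, A').
import numpy as np

A = np.array([[0.0, 0.0], [1.0, 1.0], [1.0, 1.001], [2.0, 0.5]])   # nearly dependent rows
K = A @ A.T                                                          # linear-kernel Gram matrix

for C in (0.1, 10.0, 1e6):
    print(C, np.linalg.cond(K + np.eye(len(A)) / C))
```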