
Page 1:

ETC3250: Support Vector Machines
Semester 1, 2020

Professor Di Cook

Econometrics and Business Statistics Monash University

Week 7 (b)

Page 2:

Separating hyperplanes

In a p-dimensional space, a hyperplane is a flat affine subspace of dimension p − 1.

(ISLR: Fig 9.1)

2 / 36

Page 3:

Separating hyperplanes

The equation of a p-dimensional hyperplane is given by

β0 + β1X1 + ⋯ + βpXp = 0

If xi ∈ Rp and yi ∈ {−1, 1} for i = 1, …, n, then

β0 + β1xi1 + ⋯ + βpxip > 0 if yi = 1,
β0 + β1xi1 + ⋯ + βpxip < 0 if yi = −1

Equivalently,

yi(β0 + β1xi1 + ⋯ + βpxip) > 0

3 / 36

Page 4:

Separating hyperplanes

A new observation x∗ is assigned a class depending on which side of the hyperplane it is located. Classify the test observation based on the sign of

s(x∗) = β0 + β1x∗1 + ⋯ + βpx∗p

If s(x∗) > 0, class 1, and if s(x∗) < 0, class −1, i.e. h(x∗) = sign(s(x∗)).
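As a minimal sketch of this rule in R (the coefficient values below are made up for illustration; in practice they come from a fitted classifier):

    # hypothetical hyperplane coefficients for p = 2 predictors
    beta0 <- -1
    beta  <- c(0.8, 0.6)

    x_new <- c(2, 1)                    # a new observation x*
    s_new <- beta0 + sum(beta * x_new)  # s(x*) = beta0 + beta1 x1* + ... + betap xp*
    h_new <- sign(s_new)                # class 1 if s(x*) > 0, class -1 if s(x*) < 0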

4 / 36

Page 5:

Separating hyperplanes

What about the magnitude of s(x∗)?

s(x∗) far from zero → x∗ lies far from the hyperplane → more confident about our classification
s(x∗) close to zero → x∗ lies near the hyperplane → less confident about our classification

5 / 36

Page 6:

Separating hyperplane classifiers

We will explore three different types of hyperplane classifiers, with each method generalising the one before it:

- Maximal margin classifier, for when the data is perfectly separated by a hyperplane;
- Support vector classifier (soft margin classifier), for when the data is NOT perfectly separated by a hyperplane but still has a linear decision boundary; and
- Support vector machines, for when the data has non-linear decision boundaries.

In practice, SVM is used to refer to all three methods; however, we will distinguish between the three notions in this lecture.

6 / 36

Page 7:

Maximal margin classifier

All are separating hyperplanes. Which is best?

7 / 36

Page 8:

Maximal margin classifier

Source: Machine Learning Memes for Convolutional Teens

8 / 36

Page 9:

From LDA to SVM

Linear discriminant analysis uses the difference between means to set the separating hyperplane. Support vector machines use gaps between points on the outer edge of clusters to set the separating hyperplane.

9 / 36

Page 10:

SVM vs Logistic Regression

- If our data can be perfectly separated using a hyperplane, then there will in fact exist an infinite number of such hyperplanes.
- We compute the (perpendicular) distance from each training observation to a given separating hyperplane. The smallest such distance is known as the margin.
- The optimal separating hyperplane (or maximal margin hyperplane) is the separating hyperplane for which the margin is largest.
- We can then classify a test observation based on which side of the maximal margin hyperplane it lies. This is known as the maximal margin classifier.

10 / 36

Page 11:

Support vectors

11 / 36

Page 12:

Support vectors

The support vectors are equidistant from the maximal margin hyperplane and lie along the dashed lines indicating the width of the margin. They support the maximal margin hyperplane in the sense that if these points were moved slightly then the maximal margin hyperplane would move as well.

The maximal margin hyperplane depends directly on the support vectors, but not on the other observations.

12 / 36

Page 13:

Support vectors

13 / 36

Page 14:

Example: Support vectors (and slack vectors)

14 / 36

Page 15:

Maximal margin classifier

If xi ∈ Rp and yi ∈ {−1, 1} for i = 1, …, n, the separating hyperplane is defined as

{x : β0 + xᵀβ = 0}

where β = ∑_{i=1}^{s} (αi yi) xi and s is the number of support vectors. Then the maximal margin hyperplane is found by

maximising M, subject to ∑_{j=1}^{p} βj² = 1, and

yi(xiᵀβ + β0) ≥ M, for i = 1, …, n.
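This optimisation is handled by software. A hedged sketch using the e1071 package, assuming a data frame df with numeric predictors and a two-level factor y: on separable data, a linear support vector classifier with a very large cost approximates the maximal margin hyperplane.

    library(e1071)

    # near-zero tolerance for margin violations ~ hard margin on separable data
    fit <- svm(y ~ ., data = df, kernel = "linear", cost = 1e5, scale = FALSE)
    fit$index   # row numbers of the observations chosen as support vectors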

15 / 36

Page 16:

Non-separable case

The maximal margin classifier only works when we have perfect separability in our data.

What do we do if data is not perfectly separable by a hyperplane?

The support vector classifier allows points to either lie on the wrong side of the margin, or on the wrong side of the hyperplane altogether.

Right: ISLR Fig 9.6

16 / 36

Page 17:

Support vector classifier - optimisation

Maximise M, subject to ∑_{j=1}^{p} βj² = 1,

yi(xiᵀβ + β0) ≥ M(1 − εi), for i = 1, …, n,

AND εi ≥ 0, ∑_{i=1}^{n} εi ≤ C.

εi tells us where the ith observation is located, and C is a nonnegative tuning parameter.

εi = 0: correct side of the margin,
εi > 0: wrong side of the margin (violation of the margin),
εi > 1: wrong side of the hyperplane.
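A hedged sketch of the soft margin fit with e1071 (same hypothetical df as before). Note that svm()'s cost argument penalises margin violations, so it plays the reverse role of the budget C above: a small cost tolerates many violations (wide margin), a large cost tolerates few (narrow margin).

    library(e1071)

    fit_wide   <- svm(y ~ ., data = df, kernel = "linear", cost = 0.01)
    fit_narrow <- svm(y ~ ., data = df, kernel = "linear", cost = 100)

    # wider margins typically recruit more support vectors
    c(wide = fit_wide$tot.nSV, narrow = fit_narrow$tot.nSV)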

17 / 36

Page 18:

Non-separable case

Tuning parameter: decreasing the value of C

18 / 36

Page 19:

Non-linear boundaries

The support vector classifier doesn't work well for non-linear boundaries. What solution do we have?

19 / 36

Page 20:

Enlarging the feature space

Consider the following 2D non-linear classification problem. We can transform this to a linear problem separated by a maximal margin hyperplane by introducing an additional third dimension.

Source: Grace Zhang @zxr.nju
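A small simulated illustration of this idea (the circular boundary below is an assumption, chosen to mimic the figure): adding x1² + x2² as a third feature turns the circular boundary into a flat plane.

    set.seed(2020)
    x1 <- runif(200, -1, 1)
    x2 <- runif(200, -1, 1)
    y  <- ifelse(x1^2 + x2^2 < 0.5, "inner", "outer")  # non-linear (circular) boundary in 2D

    x3 <- x1^2 + x2^2        # the enlarged feature space is (x1, x2, x3)
    table(y, x3 < 0.5)       # in 3D the classes are split by the plane x3 = 0.5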

20 / 36

Page 21:

The inner product

Consider two p-vectors

x = (x1, x2, …, xp) ∈ Rp and y = (y1, y2, …, yp) ∈ Rp.

The inner product is defined as

⟨x, y⟩ = x1y1 + x2y2 + ⋯ + xpyp = ∑_{j=1}^{p} xjyj

It is a linear measure of similarity, and allows geometric constructions such as the maximal margin hyperplane.

21 / 36

Page 22:

Kernel functions

A kernel function is an inner product of vectors mapped to a (higher dimensional) feature space H = Rd, d > p:

K(x, y) = ⟨ψ(x), ψ(y)⟩, where ψ : Rp → H.

It is a non-linear measure of similarity, and allows geometric constructions in high dimensional space.

22 / 36

Page 23:

Examples of kernels

Standard kernels include:

Linear: K(x, y) = ⟨x, y⟩

Polynomial: K(x, y) = (⟨x, y⟩ + 1)^d

Radial: K(x, y) = exp(−γ‖x − y‖²)
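These are straightforward to write as R functions (a sketch; d and gamma are user-chosen tuning parameters):

    linear_kernel <- function(x, y) sum(x * y)                               # <x, y>
    poly_kernel   <- function(x, y, d = 2) (sum(x * y) + 1)^d                # (<x, y> + 1)^d
    radial_kernel <- function(x, y, gamma = 1) exp(-gamma * sum((x - y)^2))  # exp(-gamma ||x - y||^2)

    x <- c(1, 2); y <- c(3, -1)
    c(linear = linear_kernel(x, y), poly = poly_kernel(x, y), radial = radial_kernel(x, y))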

23 / 36

Page 24:

Support Vector Machines

The kernel trick

The linear support vector classifier can be represented as follows:

f(x) = β0 + ∑_{i∈S} αi ⟨x, xi⟩.

We can generalise this by replacing the inner product with the kernel function as follows:

f(x) = β0 + ∑_{i∈S} αi K(x, xi).
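A sketch of evaluating this representation directly, with made-up support vectors and coefficients and the radial kernel from the previous slide:

    radial_kernel <- function(x, y, gamma = 1) exp(-gamma * sum((x - y)^2))

    SV    <- rbind(c(0, 1), c(1, 0), c(-1, -1))  # hypothetical support vectors (one per row)
    alpha <- c(0.7, -0.4, -0.3)                  # hypothetical coefficients alpha_i
    beta0 <- 0.1

    f <- function(x) beta0 + sum(sapply(seq_len(nrow(SV)),
                                        function(i) alpha[i] * radial_kernel(x, SV[i, ])))
    sign(f(c(0.5, 0.5)))   # predicted class for a new observation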

24 / 36

Page 25:

Your turn

Let x and y be vectors in R2. By expanding

K(x, y) = (1 + ⟨x, y⟩)²

show that this is equivalent to an inner product in H = R6.

Remember: ⟨x, y⟩ = ∑_{j=1}^{p} xjyj.

25 / 36

Page 26:

Solution

K(x, y) = (1 + ⟨x, y⟩)²
        = (1 + ∑_{j=1}^{2} xjyj)²
        = (1 + x1y1 + x2y2)²
        = 1 + x1²y1² + x2²y2² + 2x1y1 + 2x2y2 + 2x1x2y1y2
        = ⟨ψ(x), ψ(y)⟩

where ψ(x) = (1, x1², x2², √2 x1, √2 x2, √2 x1x2).
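A quick numerical check of this identity in R, for arbitrary vectors in R2:

    psi <- function(x) c(1, x[1]^2, x[2]^2, sqrt(2)*x[1], sqrt(2)*x[2], sqrt(2)*x[1]*x[2])

    x <- c(0.3, -1.2); y <- c(2.0, 0.7)
    (1 + sum(x * y))^2     # kernel evaluated in R^2
    sum(psi(x) * psi(y))   # inner product in R^6, the same value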

26 / 36

Page 27:

The kernel trick - why is it a trick?

We do not need to know what the high dimensional enlarged feature space H really looks like.

We just need to know which kernel function is most appropriate as a measure of similarity.

The Support Vector Machine (SVM) is a maximal margin hyperplane in H built by using a kernel function in the low dimensional feature space Rp.

27 / 36

Page 28:

Non-linear boundaries

Polynomial and radial kernel SVMs

28 / 36

Page 29:

Non-linear boundaries

Italian olive oils: Regions 2, 3 (North and Sardinia)

29 / 36

Page 30:

SVM in high dimensions

Examining misclassifications and which points are selected to be support vectors

30 / 36

Page 31:

SVM in high dimensions

Examining boundaries

31 / 36

Page 32:

SVM in high dimensions

Boundaries of a radial kernel in 3D

32 / 36

Page 33:

SVM in high dimensions

Boundaries of a polynomial kernel in 5D

33 / 36

Page 34:

Comparing decision boundaries

34 / 36

Page 35:

Increasing the value of cost in svm
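One way to choose cost is by cross-validation; a hedged sketch with e1071's tune(), again assuming a data frame df with a factor response y:

    library(e1071)

    cv <- tune(svm, y ~ ., data = df, kernel = "radial",
               ranges = list(cost = 10^(-1:3)))
    summary(cv)          # cross-validation error for each cost value
    cv$best.parameters   # the cost with the smallest CV error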

35 / 36

Page 36:

Made by a human with a computer. Slides at https://iml.numbat.space.

Code and data at https://github.com/numbats/iml.

Created using R Markdown with flair by xaringan, and kunoichi (female ninja) style.

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

36 / 36
