
ETC3250: Support Vector Machines
Semester 1, 2020

Professor Di Cook

Econometrics and Business Statistics, Monash University

Week 7 (b)

Separating hyperplanes

In a $p$-dimensional space, a hyperplane is a flat affine subspace of dimension $p - 1$.

(ISLR: Fig 9.1)

2 / 36

Separating hyperplanes

The equation of a $p$-dimensional hyperplane is given by

$$\beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p = 0$$

If $x_i \in \mathbb{R}^p$ and $y_i \in \{-1, 1\}$ for $i = 1, \ldots, n$, then

$$\beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} > 0 \text{ if } y_i = 1,$$
$$\beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} < 0 \text{ if } y_i = -1.$$

Equivalently,

$$y_i(\beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}) > 0.$$

3 / 36

Separating hyperplanes

A new observation $x^*$ is assigned a class depending on which side of the hyperplane it is located. Classify the test observation based on the sign of

$$s(x^*) = \beta_0 + \beta_1 x^*_1 + \cdots + \beta_p x^*_p$$

If $s(x^*) > 0$, class $1$, and if $s(x^*) < 0$, class $-1$, i.e. $h(x^*) = \text{sign}(s(x^*))$.
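As a small illustration of this rule, the R sketch below computes $s(x^*)$ and the class label for made-up coefficients and a made-up test point (none of these values come from the slides):

```r
# Hypothetical hyperplane coefficients and a hypothetical test observation
beta0  <- -1
beta   <- c(2, -3)          # beta_1, ..., beta_p
x_star <- c(1.5, 0.5)

# Score s(x*) = beta0 + beta_1 x*_1 + ... + beta_p x*_p
s <- beta0 + sum(beta * x_star)

# Classify by the sign of the score
h <- sign(s)
s   # the magnitude relates to how far x* is from the hyperplane
h   # predicted class: +1 or -1
```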

4 / 36

Separating hyperplanes

What about the magnitude of $s(x^*)$?

- $s(x^*)$ far from zero $\rightarrow$ $x^*$ lies far from the hyperplane, so we are more confident about our classification.
- $s(x^*)$ close to zero $\rightarrow$ $x^*$ lies near the hyperplane, so we are less confident about our classification.

5 / 36

Separating hyperplane classifiers

We will explore three different types of hyperplane classifiers, with each method generalising the one before it.

- Maximal margin classifier: for when the data is perfectly separated by a hyperplane,
- Support vector classifier (soft margin classifier): for when the data is NOT perfectly separated by a hyperplane but still has a linear decision boundary, and
- Support vector machines: used when the data has non-linear decision boundaries.

In practice, SVM is used to refer to all three methods; however, we will distinguish between the three notions in this lecture.

6 / 36

Maximal margin classifier

All are separating hyperplanes. Which is best?

7 / 36

Maximal margin classifier

Source: Machine Learning Memes for Convolutional Teens

8 / 36

From LDA to SVM

- Linear discriminant analysis uses the difference between means to set the separating hyperplane.
- Support vector machines use gaps between points on the outer edge of clusters to set the separating hyperplane.

9 / 36

SVM vs Logistic Regression

- If our data can be perfectly separated using a hyperplane, then there will in fact exist an infinite number of such hyperplanes.
- We compute the (perpendicular) distance from each training observation to a given separating hyperplane. The smallest such distance is known as the margin.
- The optimal separating hyperplane (or maximal margin hyperplane) is the separating hyperplane for which the margin is largest.
- We can then classify a test observation based on which side of the maximal margin hyperplane it lies. This is known as the maximal margin classifier.

10 / 36

Support vectors

11 / 36

Support vectors

The support vectors are equidistant from the maximal margin hyperplane and lie along the dashed lines indicating the width of the margin. They support the maximal margin hyperplane in the sense that if these points were moved slightly then the maximal margin hyperplane would move as well.

The maximal margin hyperplane depends directly on the support vectors, but not on the other observations.

12 / 36

Support vectors

13 / 36

Example: Support vectors (and slack vectors)

14 / 36

Maximal margin classifier

If $x_i \in \mathbb{R}^p$ and $y_i \in \{-1, 1\}$ for $i = 1, \ldots, n$, the separating hyperplane is defined as

$$\{x : \beta_0 + x^T \beta = 0\}$$

where $\beta = \sum_{i=1}^{s} (\alpha_i y_i) x_i$ and $s$ is the number of support vectors. Then the maximal margin hyperplane is found by

maximising $M$, subject to $\sum_{j=1}^{p} \beta_j^2 = 1$, and

$$y_i(x_i^T \beta + \beta_0) \geq M, \quad i = 1, \ldots, n.$$
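To connect this representation to software, here is a minimal R sketch using the e1071 package (an assumption for illustration; the unit's own code may use a different package). It fits a linear SVM to toy data and reconstructs $\beta = \sum_{i=1}^{s} (\alpha_i y_i) x_i$ from the fitted support vectors:

```r
library(e1071)

# Toy two-class data, made up for illustration only
set.seed(1)
x <- rbind(matrix(rnorm(20, mean = 2), ncol = 2),
           matrix(rnorm(20, mean = -2), ncol = 2))
y <- factor(c(rep(1, 10), rep(-1, 10)))

fit <- svm(x, y, kernel = "linear", cost = 10, scale = FALSE)

# fit$coefs holds alpha_i * y_i for the support vectors, fit$SV the support vectors x_i,
# so beta = sum_i (alpha_i y_i) x_i; the intercept is beta0 = -fit$rho
beta  <- t(fit$SV) %*% fit$coefs
beta0 <- -fit$rho
beta; beta0
```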

15 / 36

Non-separable case

The maximal margin classifier only works when we have perfect separability in our data.

What do we do if data is not perfectly separable by a hyperplane?

The support vector classifier allows points to either lie on the wrong side of the margin, or on the wrong side of the hyperplane altogether.

Right: ISLR Fig 9.6

16 / 36

Support vector classi�er - optimisation

Maximise $M$, subject to $\sum_{j=1}^{p} \beta_j^2 = 1$,

$$y_i(x_i^T \beta + \beta_0) \geq M(1 - \epsilon_i), \quad i = 1, \ldots, n, \quad \epsilon_i \geq 0,$$

AND $\sum_{i=1}^{n} \epsilon_i \leq C$.

$\epsilon_i$ tells us where the $i$th observation is located and $C$ is a nonnegative tuning parameter.

- $\epsilon_i = 0$: correct side of the margin,
- $\epsilon_i > 0$: wrong side of the margin (violation of the margin),
- $\epsilon_i > 1$: wrong side of the hyperplane.
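A hedged note on software: in the e1071 package (assumed here for illustration), the tuning parameter is exposed as a cost argument that penalises margin violations, which plays the opposite role to the budget $C$ above, so a large cost corresponds to tolerating few violations. A minimal sketch on made-up data:

```r
library(e1071)

# Hypothetical two-class data frame, for illustration only
set.seed(1)
df <- data.frame(x1 = rnorm(40), x2 = rnorm(40),
                 class = factor(rep(c(-1, 1), each = 20)))

# Small cost: wide margin, many violations tolerated.
# Large cost: narrow margin, few violations tolerated.
fit_small <- svm(class ~ ., data = df, kernel = "linear", cost = 0.01)
fit_large <- svm(class ~ ., data = df, kernel = "linear", cost = 100)

# Compare the number of support vectors selected under each setting
c(small_cost = length(fit_small$index), large_cost = length(fit_large$index))
```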

17 / 36

Non-separable case

Tuning parameter: decreasing the value of C

18 / 36

Non-linear boundaries

The support vector classifier doesn't work well for non-linear boundaries. What solution do we have?

19 / 36

Enlarging the feature space

Consider the following 2D non-linear classification problem. We can transform this to a linear problem separated by a maximal margin hyperplane by introducing an additional third dimension.

Source: Grace Zhang @zxr.nju
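A minimal R sketch of this idea, using the common concentric-classes example and adding $x_1^2 + x_2^2$ as the third feature (illustrative data, not the exact example in the figure):

```r
set.seed(1)
n  <- 100
x1 <- runif(n, -1, 1); x2 <- runif(n, -1, 1)
# Class depends on distance from the origin: a circular (non-linear) boundary in 2D
class <- factor(ifelse(x1^2 + x2^2 < 0.5, 1, -1))

# Add a third feature x3 = x1^2 + x2^2; in (x1, x2, x3) the classes are
# separated by the flat plane x3 = 0.5, i.e. a linear boundary
x3 <- x1^2 + x2^2
df <- data.frame(x1, x2, x3, class)
head(df)
```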

20 / 36

The inner product

Consider two $p$-vectors

$$x = (x_1, x_2, \ldots, x_p) \in \mathbb{R}^p \quad \text{and} \quad y = (y_1, y_2, \ldots, y_p) \in \mathbb{R}^p.$$

The inner product is defined as

$$\langle x, y \rangle = x_1 y_1 + x_2 y_2 + \cdots + x_p y_p = \sum_{j=1}^{p} x_j y_j$$

A linear measure of similarity, which allows geometric constructions such as the maximal margin hyperplane.

21 / 36

Kernel functions

A kernel function is an inner product of vectors mapped to a (higher dimensional) feature space $\mathcal{H} = \mathbb{R}^d$, $d > p$:

$$K(x, y) = \langle \psi(x), \psi(y) \rangle, \quad \psi : \mathbb{R}^p \rightarrow \mathcal{H}$$

A non-linear measure of similarity, which allows geometric constructions in high dimensional space.

22 / 36

Examples of kernels

Standard kernels include:

- Linear: $K(x, y) = \langle x, y \rangle$
- Polynomial: $K(x, y) = (\langle x, y \rangle + 1)^d$
- Radial: $K(x, y) = \exp(-\gamma \|x - y\|^2)$
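These three kernels are simple to write directly in R; the sketch below uses illustrative parameter names d and gamma:

```r
# Linear kernel: the ordinary inner product
k_linear <- function(x, y) sum(x * y)

# Polynomial kernel of degree d
k_poly <- function(x, y, d = 2) (sum(x * y) + 1)^d

# Radial (RBF) kernel with rate parameter gamma
k_radial <- function(x, y, gamma = 1) exp(-gamma * sum((x - y)^2))

x <- c(1, 2); y <- c(3, -1)
c(linear = k_linear(x, y), poly = k_poly(x, y), radial = k_radial(x, y))
```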

23 / 36

Support Vector Machines

The kernel trick

The linear support vector classifier can be represented as follows:

$$f(x) = \beta_0 + \sum_{i \in \mathcal{S}} \alpha_i \langle x, x_i \rangle.$$

We can generalise this by replacing the inner product with the kernel function as follows:

$$f(x) = \beta_0 + \sum_{i \in \mathcal{S}} \alpha_i K(x, x_i).$$
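A minimal sketch of evaluating this decision function in R, with the radial kernel and made-up support vectors, coefficients and intercept (SV, alpha and beta0 are hypothetical, not from a fitted model):

```r
# Hypothetical support vectors (rows), coefficients alpha_i, and intercept beta0
SV    <- rbind(c(1, 1), c(-1, -0.5), c(0.5, -1))
alpha <- c(0.8, -0.6, -0.2)
beta0 <- 0.1

k_radial <- function(x, y, gamma = 1) exp(-gamma * sum((x - y)^2))

# f(x) = beta0 + sum_i alpha_i K(x, x_i), summed over the support set
f <- function(x) beta0 + sum(sapply(seq_len(nrow(SV)),
                                    function(i) alpha[i] * k_radial(x, SV[i, ])))

f(c(0.9, 0.8))        # score for a new observation
sign(f(c(0.9, 0.8)))  # predicted class
```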

24 / 36

Your turn

Let $x$ and $y$ be vectors in $\mathbb{R}^2$. By expanding

$$K(x, y) = (1 + \langle x, y \rangle)^2$$

show that this is equivalent to an inner product in $\mathcal{H} = \mathbb{R}^6$.

Remember: $\langle x, y \rangle = \sum_{j=1}^{p} x_j y_j$.

25 / 36

Solution

$$\begin{aligned}
K(x, y) &= (1 + \langle x, y \rangle)^2 \\
&= \Big(1 + \sum_{j=1}^{2} x_j y_j\Big)^2 \\
&= (1 + x_1 y_1 + x_2 y_2)^2 \\
&= 1 + x_1^2 y_1^2 + x_2^2 y_2^2 + 2 x_1 y_1 + 2 x_2 y_2 + 2 x_1 x_2 y_1 y_2 \\
&= \langle \psi(x), \psi(y) \rangle
\end{aligned}$$

where $\psi(x) = (1, x_1^2, x_2^2, \sqrt{2} x_1, \sqrt{2} x_2, \sqrt{2} x_1 x_2)$.
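A quick numerical check of this identity in R, with arbitrary example vectors:

```r
x <- c(0.5, 2); y <- c(-1, 3)

# Kernel form: (1 + <x, y>)^2
k <- (1 + sum(x * y))^2

# Explicit feature map psi into R^6
psi <- function(v) c(1, v[1]^2, v[2]^2, sqrt(2) * v[1], sqrt(2) * v[2], sqrt(2) * v[1] * v[2])

# The inner product in the enlarged feature space matches the kernel value
c(kernel = k, feature_space = sum(psi(x) * psi(y)))
```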

26 / 36

The kernel trick - why is it a trick?

We do not need to know what the high dimensional enlarged feature space $\mathcal{H}$ really looks like.

We just need to know which kernel function is most appropriate as a measure of similarity.

The Support Vector Machine (SVM) is a maximal margin hyperplane in $\mathcal{H}$, built by using a kernel function in the low dimensional feature space $\mathbb{R}^p$.

27 / 36

Non-linear boundaries

Polynomial and radial kernel SVMs
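A sketch of fitting both kernels with the e1071 package (assumed here for illustration), on made-up data with a circular boundary rather than the data behind the figures:

```r
library(e1071)

# Illustrative two-class data with a circular decision boundary
set.seed(1)
df <- data.frame(x1 = runif(200, -1, 1), x2 = runif(200, -1, 1))
df$class <- factor(ifelse(df$x1^2 + df$x2^2 < 0.5, 1, -1))

# Polynomial kernel SVM (degree controls the polynomial order)
fit_poly   <- svm(class ~ x1 + x2, data = df, kernel = "polynomial", degree = 2, cost = 1)

# Radial kernel SVM (gamma controls how local the similarity is)
fit_radial <- svm(class ~ x1 + x2, data = df, kernel = "radial", gamma = 1, cost = 1)

# Training misclassification rates for a rough comparison
mean(predict(fit_poly, df)   != df$class)
mean(predict(fit_radial, df) != df$class)
```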

28 / 36

Non-linear boundaries

Italian olive oils: Regions 2, 3 (North and Sardinia)

29 / 36

SVM in high dimensions

Examining misclassifications and which points are selected to be support vectors

30 / 36

SVM in high dimensions

Examining boundaries

31 / 36

SVM in high dimensions

Boundaries of a radial kernel in 3D

32 / 36

SVM in high dimensions

Boundaries of a polynomial kernel in 5D

33 / 36

Comparing decision boundaries

34 / 36

Increasing the value of cost in svm

35 / 36

Made by a human with a computer. Slides at https://iml.numbat.space.

Code and data at https://github.com/numbats/iml.

Created using R Markdown with flair by xaringan, and kunoichi (female ninja) style.

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

36 / 36
