
CSE 881: Data Mining
Lecture 10: Support Vector Machines

(C) Vipin Kumar, Parallel Issues in Data Mining, VECPAR 2002

Support Vector Machines
Find a linear hyperplane (decision boundary) that will separate the data.


Support Vector Machines
One possible solution: decision boundary B1.

Support Vector Machines
Another possible solution: decision boundary B2.

Support Vector Machines
Other possible solutions.

Support Vector Machines
Which one is better, B1 or B2? How do you define "better"?

Support Vector Machines
Find the hyperplane that maximizes the margin, so B1 is better than B2.
(Figure: boundary B1 with margin hyperplanes b11 and b12, boundary B2 with margin hyperplanes b21 and b22; the margin is the distance between a boundary's two margin hyperplanes.)
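To make the notion of margin concrete (a sketch, not part of the slides): for a hyperplane w·x + b = 0 in canonical form, the margin between the hyperplanes w·x + b = +1 and w·x + b = -1 is 2/||w||, and a point's distance to the boundary is |w·x + b|/||w||. The values of w and b below are made up for illustration.

```python
import math

def norm(w):
    return math.sqrt(sum(wi * wi for wi in w))

def distance_to_hyperplane(x, w, b):
    # Perpendicular distance from point x to the hyperplane w.x + b = 0.
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm(w)

def margin(w):
    # Width of the band between w.x + b = +1 and w.x + b = -1.
    return 2.0 / norm(w)

# Made-up hyperplane: w = (0.5, 0.5), b = 0.
w, b = (0.5, 0.5), 0.0
print(margin(w))                              # 2 / sqrt(0.5) ≈ 2.828
print(distance_to_hyperplane((1, 1), w, b))   # 1 / sqrt(0.5) ≈ 1.414
```

A larger ||w|| means a narrower margin, which is why minimizing ||w|| (next slides) maximizes the margin.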

SVM vs. Perceptron
Perceptron: minimizes least-squares error (gradient descent).
SVM: maximizes the margin.
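For contrast, here is a minimal classic perceptron training loop (an illustrative sketch with made-up toy data, not code from the slides). Note that the mistake-driven update below stops at the first separating hyperplane it finds; unlike an SVM, it makes no attempt to maximize the margin.

```python
# Classic perceptron: nudge (w, b) toward each misclassified point until
# every training point is on the correct side of the hyperplane.
def perceptron_train(points, labels, lr=1.0, epochs=100):
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(points, labels):
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                mistakes += 1
        if mistakes == 0:   # converged: all points correctly classified
            break
    return w, b

# Toy linearly separable data (made up): class +1 roughly above x1 + x2 = 0.
X = [(2, 1), (1, 2), (-1, -2), (-2, -1)]
y = [1, 1, -1, -1]
w, b = perceptron_train(X, y)
# Every point ends up strictly on its own side of the learned hyperplane.
assert all(yi * (sum(wi * xi for wi, xi in zip(w, x)) + b) > 0
           for x, yi in zip(X, y))
```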


Support Vector Machines
(Figure: decision boundary B1 with its margin hyperplanes b11 and b12.)

Learning Linear SVM
We want to maximize the margin: 2 / ||w||.
This is equivalent to minimizing: L(w) = ||w||^2 / 2.
But subject to the following constraints: w·x_i + b >= 1 if y_i = 1, and w·x_i + b <= -1 if y_i = -1,
or, equivalently: y_i (w·x_i + b) >= 1 for i = 1, ..., N.
This is a constrained optimization problem. Solve it using the Lagrange multiplier method.


Unconstrained Optimization
min E(w1, w2) = w1^2 + w2^2
w1 = 0, w2 = 0 minimizes E(w1, w2); the minimum value is E(0, 0) = 0.


Constrained Optimization
min E(w1, w2) = w1^2 + w2^2, subject to 2 w1 + w2 = 4 (equality constraint).
(Figure: contours of E(w1, w2) in the (w1, w2) plane together with the constraint line.)
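Working this small example through (a sketch; the closed-form steps are not on the slides): the Lagrangian is L = w1^2 + w2^2 - λ(2 w1 + w2 - 4). Setting the partial derivatives to zero gives 2 w1 = 2λ and 2 w2 = λ, so w1 = λ and w2 = λ/2; substituting into the constraint gives λ = 8/5, hence w1 = 1.6, w2 = 0.8. A quick numeric cross-check by eliminating the constraint:

```python
# Lagrange multiplier solution of: min E(w1, w2) = w1^2 + w2^2  s.t.  2*w1 + w2 = 4.
# Stationarity: dE/dw1 = 2*w1 = 2*lam,  dE/dw2 = 2*w2 = lam  =>  w1 = lam, w2 = lam/2.
# Constraint:   2*lam + lam/2 = 4  =>  lam = 8/5.
lam = 8 / 5
w1, w2 = lam, lam / 2            # (1.6, 0.8)
E = w1 ** 2 + w2 ** 2            # minimum value: 3.2

# Independent check: eliminate the constraint (w2 = 4 - 2*w1) and grid-search w1.
grid = [i / 1000.0 for i in range(0, 4001)]
E_grid, w1_grid = min((g ** 2 + (4 - 2 * g) ** 2, g) for g in grid)
assert abs(w1_grid - w1) < 1e-3 and abs(E_grid - E) < 1e-3
print(w1, w2, E)   # 1.6 0.8 3.2
```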


Learning Linear SVM
Lagrange multiplier formulation:
  L = ||w||^2 / 2 - Σ_i λ_i [ y_i (w·x_i + b) - 1 ]
Take the derivative with respect to w and b and set to zero:
  w = Σ_i λ_i y_i x_i,   Σ_i λ_i y_i = 0
Additional constraints:
  λ_i >= 0,   λ_i [ y_i (w·x_i + b) - 1 ] = 0
Dual problem:
  max Σ_i λ_i - (1/2) Σ_i Σ_j λ_i λ_j y_i y_j (x_i · x_j),  subject to λ_i >= 0 and Σ_i λ_i y_i = 0


Learning Linear SVM
Bigger picture: the learning algorithm needs to find w and b.
To solve for w:  w = Σ_i λ_i y_i x_i
But λ_i is zero for points that do not reside on the margin hyperplanes y_i (w·x_i + b) = 1.
Data points whose λ_i are not zero are called support vectors.


Example of Linear SVM
(Figure: linearly separable data with the support vectors highlighted.)
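To make the relation w = Σ_i λ_i y_i x_i concrete, here is a tiny hand-solvable case (made-up data, not the slides' example): two support vectors, x1 = (1, 1) with y1 = +1 and x2 = (-1, -1) with y2 = -1. By symmetry and the constraint Σ_i λ_i y_i = 0, both multipliers are equal; λ1 = λ2 = 0.25 gives w = (0.5, 0.5) and b = 0.

```python
# Reconstruct w and b from support vectors via w = sum_i lam_i * y_i * x_i.
# Made-up example: support vectors (1, 1) labeled +1 and (-1, -1) labeled -1,
# each with Lagrange multiplier 0.25.
support_vectors = [((1.0, 1.0), 1, 0.25), ((-1.0, -1.0), -1, 0.25)]  # (x, y, lambda)

w = [0.0, 0.0]
for x, y, lam in support_vectors:
    w = [wj + lam * y * xj for wj, xj in zip(w, x)]

# b from any support vector: y_i * (w . x_i + b) = 1 and y_i = +/-1
# imply b = y_i - w . x_i.
x1, y1, _ = support_vectors[0]
b = y1 - sum(wj * xj for wj, xj in zip(w, x1))

print(w, b)   # [0.5, 0.5] 0.0
# Both support vectors sit exactly on their margin hyperplanes:
for x, y, _ in support_vectors:
    assert abs(y * (sum(wj * xj for wj, xj in zip(w, x)) + b) - 1.0) < 1e-12
```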


Learning Linear SVM
Bigger picture: the decision boundary depends only on the support vectors. If you have another data set with the same support vectors, the decision boundary will not change.
How to classify using an SVM once w and b are found? Given a test record x_i, predict class +1 if w·x_i + b > 0 and class -1 if w·x_i + b < 0.
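The classification rule as code (a sketch; the model parameters and test points below are made up):

```python
def svm_predict(x, w, b):
    # Linear SVM decision rule: the sign of w . x + b.
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score > 0 else -1

# Made-up trained model: w = (0.5, 0.5), b = 0.
w, b = (0.5, 0.5), 0.0
print(svm_predict((2.0, 1.0), w, b))     # 1
print(svm_predict((-1.0, -3.0), w, b))   # -1
```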


Support Vector Machines
What if the problem is not linearly separable?


Support Vector Machines
What if the problem is not linearly separable? Introduce slack variables ξ_i.
We now need to minimize:  L(w) = ||w||^2 / 2 + C ( Σ_i ξ_i )^k
Subject to:  y_i (w·x_i + b) >= 1 - ξ_i,  with ξ_i >= 0.
If k is 1 or 2, this leads to the same objective function as the linear SVM, but with different constraints (see textbook).
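The slack variables have a closed form once w and b are fixed: ξ_i = max(0, 1 - y_i (w·x_i + b)), which is 0 for points outside the margin, between 0 and 1 for points inside it, and above 1 for misclassified points. A small sketch with made-up w, b, C, and data:

```python
# Soft-margin bookkeeping: slack per point and the k = 1 objective
# ||w||^2 / 2 + C * sum(xi_i). All numbers below are made up.
def slack(x, y, w, b):
    return max(0.0, 1.0 - y * (sum(wi * xi for wi, xi in zip(w, x)) + b))

w, b, C = (0.5, 0.5), 0.0, 1.0
data = [((3.0, 3.0), 1),    # well outside the margin: slack 0
        ((1.0, 0.5), 1),    # inside the margin:      slack 0.25
        ((1.0, 1.0), -1)]   # misclassified:          slack 2.0

slacks = [slack(x, y, w, b) for x, y in data]
objective = sum(wi * wi for wi in w) / 2 + C * sum(slacks)
print(slacks, objective)   # [0.0, 0.25, 2.0] 2.5
```

The constant C trades margin width against training error: a large C punishes slack heavily, a small C tolerates more margin violations.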


Nonlinear Support Vector Machines
What if the decision boundary is not linear?


Nonlinear Support Vector Machines
Trick: transform the data into a higher-dimensional space via a mapping Φ.
Decision boundary:  w·Φ(x) + b = 0


Learning Nonlinear SVM
Optimization problem:
  max Σ_i λ_i - (1/2) Σ_i Σ_j λ_i λ_j y_i y_j Φ(x_i)·Φ(x_j),  subject to λ_i >= 0 and Σ_i λ_i y_i = 0
This leads to the same set of equations as before, but involving Φ(x) instead of x.


Learning Nonlinear SVM
Issues:
What type of mapping function Φ should be used?
How to do the computation in the high-dimensional space? Most computations involve the dot product Φ(x_i)·Φ(x_j). Curse of dimensionality?


Learning Nonlinear SVM
Kernel trick:  Φ(x_i)·Φ(x_j) = K(x_i, x_j)
K(x_i, x_j) is a kernel function, expressed in terms of the coordinates in the original space.
Examples: the polynomial kernel K(x, y) = (x·y + 1)^p, the radial basis function (RBF) kernel K(x, y) = exp(-||x - y||^2 / (2σ^2)), and the sigmoid kernel K(x, y) = tanh(k x·y - δ).
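The kernel trick can be verified directly for a small case (a sketch, not from the slides): in 2-D, the homogeneous degree-2 polynomial kernel K(x, z) = (x·z)^2 equals Φ(x)·Φ(z) for Φ(x) = (x1^2, √2·x1·x2, x2^2), so the 3-D dot product is computed without ever forming Φ explicitly.

```python
import math

def phi(x):
    # Explicit degree-2 feature map for 2-D input.
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def kernel(x, z):
    # Homogeneous degree-2 polynomial kernel: (x . z)^2.
    return (x[0] * z[0] + x[1] * z[1]) ** 2

x, z = (1.0, 2.0), (3.0, -1.0)
lhs = kernel(x, z)                                   # (3 - 2)^2 = 1
rhs = sum(a * b for a, b in zip(phi(x), phi(z)))     # same value via phi
assert abs(lhs - rhs) < 1e-9
print(lhs, rhs)
```

Here the feature space is only 3-dimensional, but for degree-p kernels on d-dimensional data the explicit feature space grows combinatorially, while the kernel evaluation stays O(d).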


Example of Nonlinear SVM
(Figure: decision boundary of an SVM with a polynomial kernel of degree 2.)


Learning Nonlinear SVM
Advantages of using a kernel:
Don't have to know the mapping function Φ.
Computing the dot product Φ(x_i)·Φ(x_j) in the original space avoids the curse of dimensionality.
Not all functions can be kernels: must make sure there is a corresponding Φ in some high-dimensional space. See Mercer's theorem (in the textbook).
