# CSE 881: Data Mining


• CSE 881: Data Mining, Lecture 10: Support Vector Machines

(C) Vipin Kumar, Parallel Issues in Data Mining, VECPAR 2002

• Support Vector Machines: Find a linear hyperplane (decision boundary) that separates the data.

• Support Vector Machines: One possible solution (hyperplane B1).

• Support Vector Machines: Another possible solution (hyperplane B2).

• Support Vector Machines: Other possible solutions.

• Support Vector Machines: Which one is better, B1 or B2? How do you define "better"?

• Support Vector Machines: Find the hyperplane that maximizes the margin, so B1 is better than B2. (Figure: boundaries B1 and B2 with their margin hyperplanes b11, b12 and b21, b22; the margin is the distance between a boundary's two margin hyperplanes.)

• SVM vs. Perceptron: a perceptron minimizes the least-squares error (via gradient descent), while an SVM maximizes the margin.

• Support Vector Machines: The decision boundary B1 is w·x + b = 0, with margin hyperplanes b11: w·x + b = 1 and b12: w·x + b = −1. A record is classified as +1 if w·x + b ≥ 1 and as −1 if w·x + b ≤ −1; the margin is 2/‖w‖².

• Learning Linear SVM: We want to maximize the margin 2/‖w‖², which is equivalent to minimizing L(w) = ‖w‖²/2, but subject to the following constraints: w·xᵢ + b ≥ 1 if yᵢ = 1, and w·xᵢ + b ≤ −1 if yᵢ = −1, or, more compactly, yᵢ(w·xᵢ + b) ≥ 1. This is a constrained optimization problem; solve it using the Lagrange multiplier method.

• Unconstrained Optimization: min E(w₁, w₂) = w₁² + w₂². Setting w₁ = 0, w₂ = 0 minimizes E(w₁, w₂); the minimum value is E(0, 0) = 0.
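The slide only states where the minimum lies; as a small illustration of how an unconstrained minimum can be found numerically, here is plain gradient descent on E(w₁, w₂) = w₁² + w₂² (the starting point and learning rate are arbitrary choices, not from the slides):

```python
import numpy as np

def E(w):
    # E(w1, w2) = w1^2 + w2^2
    return w[0] ** 2 + w[1] ** 2

def grad_E(w):
    # Gradient of E: (2*w1, 2*w2)
    return np.array([2 * w[0], 2 * w[1]])

# Plain gradient descent from an arbitrary starting point (assumed values)
w = np.array([3.0, -2.0])
lr = 0.1
for _ in range(200):
    w = w - lr * grad_E(w)

print(w, E(w))  # converges toward the minimum at (0, 0), where E = 0
```

Each step shrinks both coordinates by a constant factor, so the iterates decay geometrically toward the origin.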

• Constrained Optimization: min E(w₁, w₂) = w₁² + w₂², subject to 2w₁ + w₂ = 4 (equality constraint). Introduce a Lagrange multiplier λ for the constraint and find the stationary point of the Lagrangian; the dual problem replaces the constrained minimization over (w₁, w₂) with an optimization over λ. (Figure: contour plot of E(w₁, w₂) in the (w₁, w₂) plane with the constraint line.)

• Learning Linear SVM: Lagrange multiplier formulation: L_P = ‖w‖²/2 − Σᵢ λᵢ [yᵢ(w·xᵢ + b) − 1], with λᵢ ≥ 0. Taking derivatives w.r.t. w and b and setting them to zero gives w = Σᵢ λᵢ yᵢ xᵢ and Σᵢ λᵢ yᵢ = 0. Substituting these back yields the dual problem: maximize L_D = Σᵢ λᵢ − ½ Σᵢ Σⱼ λᵢ λⱼ yᵢ yⱼ xᵢ·xⱼ, subject to λᵢ ≥ 0 and Σᵢ λᵢ yᵢ = 0.

• Learning Linear SVM: Bigger picture: the learning algorithm needs to find w and b. To solve for w: w = Σᵢ λᵢ yᵢ xᵢ. But λᵢ is zero for points that do not reside on the margin hyperplanes w·x + b = ±1. Data points whose λᵢ are not zero are called support vectors.
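The slides derive the exact solution through the dual Lagrangian; as a rough numerical sketch only, here is a different but related approach, subgradient descent on the soft-margin hinge-loss objective, on a tiny hand-made separable data set. The data, learning rate, and regularization constant are all illustrative assumptions, and this is not the dual QP solver the lecture describes:

```python
import numpy as np

# Tiny linearly separable toy set (illustrative, not from the slides)
X = np.array([[2.0, 2.0], [3.0, 3.0], [2.5, 1.5],
              [0.0, 0.0], [-1.0, 0.5], [0.5, -1.0]])
y = np.array([1, 1, 1, -1, -1, -1])

w = np.zeros(2)
b = 0.0
lam = 0.01   # regularization strength (assumed)
lr = 0.1     # learning rate (assumed)

for _ in range(1000):
    for xi, yi in zip(X, y):
        if yi * (w @ xi + b) < 1:          # point violates the margin
            w += lr * (yi * xi - lam * w)  # hinge-loss subgradient step
            b += lr * yi
        else:
            w -= lr * lam * w              # only the regularizer acts on w

pred = np.sign(X @ w + b)
print(pred)  # matches y on this separable toy set
```

Points that keep triggering the margin condition during training play the role of the support vectors: only they push on w and b.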

• Example of Linear SVM: (Figure: a two-class data set with the support vectors highlighted on the margin hyperplanes.)

• Learning Linear SVM: Bigger picture: the decision boundary depends only on the support vectors; if you have a data set with the same support vectors, the decision boundary will not change. How to classify using SVM once w and b are found? Given a test record xᵢ, compute f(xᵢ) = sign(w·xᵢ + b) and assign class +1 if it is positive, −1 if it is negative.
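The classification rule above is a one-liner in practice. A minimal sketch, where the values of w and b are made-up stand-ins for a trained model:

```python
import numpy as np

def classify(x, w, b):
    # f(x) = sign(w . x + b): +1 on one side of the hyperplane, -1 on the other
    return 1 if np.dot(w, x) + b >= 0 else -1

# Illustrative trained parameters (assumed, not from the slides)
w = np.array([1.0, 1.0])
b = -3.0

print(classify(np.array([2.0, 2.0]), w, b))  # 2 + 2 - 3 = 1  -> +1
print(classify(np.array([0.5, 0.5]), w, b))  # 0.5 + 0.5 - 3 = -2 -> -1
```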

• Support Vector Machines: What if the problem is not linearly separable?

• Support Vector Machines: What if the problem is not linearly separable? Introduce slack variables ξᵢ ≥ 0 and minimize L(w) = ‖w‖²/2 + C (Σᵢ ξᵢ)ᵏ, subject to yᵢ(w·xᵢ + b) ≥ 1 − ξᵢ. If k is 1 or 2, this leads to the same objective function as linear SVM but with different constraints (see textbook).

• Nonlinear Support Vector Machines: What if the decision boundary is not linear?

• Nonlinear Support Vector Machines: Trick: transform the data into a higher-dimensional space via a mapping Φ. The decision boundary becomes w·Φ(x) + b = 0.

• Learning Nonlinear SVM: Optimization problem: maximize L_D = Σᵢ λᵢ − ½ Σᵢ Σⱼ λᵢ λⱼ yᵢ yⱼ Φ(xᵢ)·Φ(xⱼ), which leads to the same set of equations as before, but involving Φ(x) instead of x.

• Learning Nonlinear SVM: Issues: What type of mapping function Φ should be used? How do we do the computation in the high-dimensional space? Most computations involve the dot product Φ(xᵢ)·Φ(xⱼ); doesn't this run into the curse of dimensionality?

• Learning Nonlinear SVM: Kernel trick: Φ(xᵢ)·Φ(xⱼ) = K(xᵢ, xⱼ), where K(xᵢ, xⱼ) is a kernel function expressed in terms of the coordinates in the original space. Examples: K(x, y) = (x·y + 1)ᵖ, K(x, y) = exp(−‖x − y‖² / (2σ²)), and K(x, y) = tanh(k x·y − δ).
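The kernel trick can be checked concretely for the degree-2 polynomial kernel in two dimensions, where the feature map Φ is small enough to write out. A sketch (the specific test vectors are arbitrary):

```python
import numpy as np

def poly_kernel(x, y, p=2):
    # Polynomial kernel K(x, y) = (x . y + 1)^p, computed in the ORIGINAL space
    return (np.dot(x, y) + 1) ** p

def phi(x):
    # Explicit feature map for the degree-2 polynomial kernel in 2-D:
    # phi(x) = (1, sqrt(2)x1, sqrt(2)x2, x1^2, sqrt(2)x1x2, x2^2)
    x1, x2 = x
    s = np.sqrt(2)
    return np.array([1.0, s * x1, s * x2, x1**2, s * x1 * x2, x2**2])

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

# The dot product in the 6-D feature space equals the kernel value
# computed directly in the 2-D input space.
print(poly_kernel(x, y))   # (1*3 + 2*(-1) + 1)^2 = 4
print(phi(x) @ phi(y))     # 4.0 as well
```

This is exactly why the mapping never has to be carried out explicitly: the 6-dimensional dot product is recovered from a single dot product in the original 2-dimensional space.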

• Example of Nonlinear SVM: (Figure: an SVM with a polynomial degree-2 kernel producing a nonlinear decision boundary.)

• Learning Nonlinear SVM: Advantages of using a kernel: we don't have to know the mapping function Φ explicitly, and computing the dot product Φ(xᵢ)·Φ(xⱼ) via the kernel in the original space avoids the curse of dimensionality. Not all functions can be kernels: we must make sure there is a corresponding Φ in some high-dimensional space; Mercer's theorem gives the condition (see textbook).

(C) Vipin Kumar, Parallel Issues in Data Mining, VECPAR 2002
