
Page 1

MACHINE LEARNING 09/10

Support Vector Machines: Hyperplane Classifiers

Alexandre Bernardino, [email protected], Machine Learning 2009/2010

Support Vector Machines

• Supervised learning technique with:

1. Improved generalization ability

• Use low-complexity functions (hyperplanes).

• Minimize bounds on the true risk.

2. Global solution

• Convex quadratic programming formulation

3. Can cope with non-linear problems through kernels

• Transform the original data into a higher-dimensional space (the feature space).

• Perform the optimization in the high-dimensional space, where linear methods can be employed (see the sketch below).
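As a minimal illustration of the feature-space idea (not from the original slides; the toy mapping φ(x) = (x, x²) and the data are assumptions), the following Python sketch shows 1-D data that no threshold can separate becoming linearly separable after the mapping:

```python
import numpy as np

# Toy 1-D data: class +1 inside [-1, 1], class -1 outside.
# No single threshold on the line separates the classes.
x = np.array([-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0])
y = np.array([-1, -1, 1, 1, 1, -1, -1])

# Assumed feature map phi(x) = (x, x^2): lifts the data to 2-D.
phi = np.column_stack([x, x ** 2])

# In feature space the hyperplane with w = (0, -1), b = 2 (i.e. the
# line z2 = 2) separates the classes: x^2 < 2 for class +1.
w, b = np.array([0.0, -1.0]), 2.0
pred = np.sign(phi @ w + b)
print(np.all(pred == y))  # True: the lifted data is linearly separable
```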

Page 2

Motivation

• Animations at http://www.youtube.com/watch?v=3liCbRZPrZA


Applications

• Pattern Recognition / Classification

• xi ∈ R^d

• yi ∈ {-1, 1}

[Figure: two-class data in the (x1, x2) plane separated by a line]

Page 3

Applications

• Regression

• xi ∈ R^d

• yi ∈ R

[Figure: regression fit y(x)]

Historical Background

• Based on the Generalized Portrait algorithm (60's, Russia: Vapnik, Lerner, Chervonenkis)

• Developed in the 90's at AT&T Bell Labs (Vapnik, Boser, Guyon, Cortes, Schölkopf)

• Initial industrial context: OCR (mid 90's)

• Excellent performance found in regression and time-series prediction (late 90's)

Page 4

Why “Support Vector” Machine (SVM)?

• Supervised learning: collect data from real experiments {(xi, yi)}, i = 1, …, n

[Diagram: xi → f → yi]

• Use training data to estimate an approximation of f.

• The SVM selectively chooses, from the input vectors, the ones that are “important” (the SUPPORT VECTORS); all the others are disregarded.

[Diagram: (xi, yi) → SVM → f’]

Distinguishing Features

• Sound theoretical formulation (statistical learning theory)

• Bounds on performance

• Addresses the generalization problem (structural risk minimization)

Page 5

The Generalization Problem

[Figure: a fitted curve y(x) with training samples and test samples]

• Lessons from NN:

• Too few units (parameters): high training error and high test error

• Too many units (parameters): low training error and high test error

Statistical Learning Theory Framework

• Machine to learn the map xi → yi = f(xi, α)

• Data drawn i.i.d. from P(x, y)

• Actual risk: R(α) = ∫ (1/2)|y − f(x, α)| dP(x, y)

• Empirical risk: R_emp(α) = (1/(2n)) Σi |yi − f(xi, α)|
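Not part of the original slides: a small Python sketch (with an assumed toy distribution P(x, y)) contrasting the empirical risk on a few training samples with a Monte Carlo estimate of the actual risk:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x, a):
    """A simple threshold machine: label +1 iff x > a."""
    return np.where(x > a, 1, -1)

def true_label(x):
    # Assumed toy P(x, y): x ~ Uniform(-1, 1), y = sign(x) with 10% label noise.
    y = np.where(x > 0, 1, -1)
    flip = rng.random(x.shape) < 0.1
    return np.where(flip, -y, y)

def risk(x, y, a):
    # (1/(2n)) sum |y_i - f(x_i, a)|, the 0/1 loss written as in the slides
    return np.mean(0.5 * np.abs(y - f(x, a)))

x_train = rng.uniform(-1, 1, 20)
y_train = true_label(x_train)
x_big = rng.uniform(-1, 1, 200_000)   # large Monte Carlo sample for the actual risk
y_big = true_label(x_big)

a = 0.3                               # some fixed parameter value
print("empirical risk :", risk(x_train, y_train, a))
print("actual risk(MC):", risk(x_big, y_big, a))
```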

Page 6

Bound on Generalization Performance

• 2-class pattern recognition problem: yi ∈ {-1, 1}

• Choose η ∈ [0, 1]. With probability 1 − η the following bound holds:

R(α) ≤ R_emp(α) + √( ( h (ln(2n/h) + 1) − ln(η/4) ) / n )

• h is the Vapnik-Chervonenkis (VC) dimension and is a measure of the “capacity” of the machine.

• Within a set of learning machines, the best is the one that minimizes the right-hand side.

VC Confidence

• The second term on the right-hand side (the square root) is the VC confidence.
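A small Python sketch (not in the original slides) evaluating the VC confidence term above for a few capacities h and sample sizes n:

```python
import numpy as np

def vc_confidence(h, n, eta=0.05):
    """VC confidence term: sqrt((h*(ln(2n/h) + 1) - ln(eta/4)) / n)."""
    return np.sqrt((h * (np.log(2 * n / h) + 1) - np.log(eta / 4)) / n)

for n in (100, 1000, 10000):
    for h in (5, 50, 500):
        if h <= n:  # the bound is only informative for h <= n
            print(f"n={n:6d}  h={h:4d}  confidence={vc_confidence(h, n):.3f}")
```

For fixed n, the confidence grows with h: a higher-capacity machine needs a lower empirical risk to achieve the same guaranteed bound.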

Page 7

VC Dimension

• Property of a set of functions F = {f(x, α)}

• A set N of n points can be labeled in 2^n possible ways.

• If, for each labeling, a function of the set F can correctly assign those labels, then N is shattered by F.

• The VC dimension of F is the maximum number of training points that can be shattered by F.

Shattering with Oriented Hyperplanes in R^2

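Not from the original slides: a brute-force Python check (using scipy's linear-programming feasibility test, assumed available) that 3 points in general position in R^2 are shattered by oriented lines, while 4 points in an XOR layout are not:

```python
import itertools
import numpy as np
from scipy.optimize import linprog

def separable(X, y):
    """Feasibility of y_i (w.x_i + b) >= 1 via an LP over (w1, w2, b)."""
    n = len(y)
    # constraints: -y_i * (w.x_i + b) <= -1
    A = -y[:, None] * np.hstack([X, np.ones((n, 1))])
    res = linprog(c=np.zeros(3), A_ub=A, b_ub=-np.ones(n),
                  bounds=[(None, None)] * 3, method="highs")
    return res.success

def shattered(X):
    n = len(X)
    return all(separable(X, np.array(labels))
               for labels in itertools.product([-1.0, 1.0], repeat=n))

three = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])             # general position
xor4 = np.array([[0.0, 0.0], [1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])  # XOR layout

print(shattered(three))  # True: lines shatter 3 points (VC dim of lines in R^2 is 3)
print(shattered(xor4))   # False: no line realizes the XOR labeling
```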

Page 8

Hyperplane VC dimension

• The VC dimension of a hyperplane in R^d is d + 1.

VC Dimension and the number of parameters

• VC dimension ≠ number of parameters

• Striking example: a one-parameter function that shatters infinitely many points:

• f(x, α) := sin(αx), x, α ∈ R
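A Python sketch (not in the original slides) of the standard construction behind this example: place the points at xi = 10^-i; then for any labeling a single α makes sgn(sin(αx)) reproduce it:

```python
import itertools
import numpy as np

def alpha_for(labels):
    """Vapnik's construction: alpha = pi * (1 + sum_i (1 - y_i) * 10^i / 2)."""
    return np.pi * (1 + sum((1 - y) * 10 ** (i + 1) / 2
                            for i, y in enumerate(labels)))

n = 4
x = np.array([10.0 ** -(i + 1) for i in range(n)])  # x_i = 10^-i, i = 1..n

ok = True
for labels in itertools.product([-1, 1], repeat=n):
    y = np.array(labels)
    a = alpha_for(y)
    ok &= bool(np.all(np.sign(np.sin(a * x)) == y))
print(ok)  # True: one parameter realizes all 2^n labelings of these points
```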

Page 9

Linearly Separable SVMs

• Separating hyperplane H: w.x + b = 0

• Margin: shortest distance from H to the closest positive (negative) sample.

• The SVM computes the H with the largest symmetric margin.

[Figure: separating hyperplane H with margin boundaries H1 and H2; the support vectors lie on H1 and H2]

H: w.x + b = 0

H1: w.x + b = 1

H2: w.x + b = -1

Margin: m = 2/||w||

The optimization problem

• Maximize the margin ⇒ minimize ||w||^2

• Constraints: yi (w.xi + b) ≥ 1, i = 1, …, n
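To make this concrete, here is a short sketch (not from the slides) that fits a nearly hard-margin linear SVM on assumed toy data with scikit-learn and inspects w, b, the margin, and the support vectors:

```python
import numpy as np
from sklearn.svm import SVC

# Toy linearly separable data
X = np.array([[2.0, 2.0], [2.0, 3.0], [3.0, 3.0],
              [0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
y = np.array([1, 1, 1, -1, -1, -1])

# A very large C approximates the hard-margin problem:
# minimize ||w||^2 subject to y_i (w.x_i + b) >= 1
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
print("w =", w, " b =", b)
print("margin m = 2/||w|| =", 2 / np.linalg.norm(w))
print("support vectors:\n", clf.support_vectors_)
```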

Page 10

Primal Lagrangian

L_P = (1/2)||w||^2 − Σi αi [ yi (w.xi + b) − 1 ]

• Convex quadratic programming problem

Wolfe dual formulation

• Easier solution and better extension to the non-linear case.

• The primal problem is equivalent to: maximize L_P w.r.t. α,

subject to: αi ≥ 0 and ∇_{w,b} L_P = 0

• Eliminating w and b via ∇_{w,b} L_P = 0 (i.e. w = Σi αi yi xi and Σi αi yi = 0) gives the dual objective:

L_D = Σi αi − (1/2) Σi Σj αi αj yi yj (xi.xj)
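Not part of the original slides: a sketch solving this dual QP directly with the cvxopt package (an assumed dependency; Gunn's MATLAB toolbox used in the lab performs the analogous interior-point solve). The tiny ridge on K is an assumed numerical-stability tweak:

```python
import numpy as np
from cvxopt import matrix, solvers

# Toy separable 2-D data
X = np.array([[2.0, 2.0], [2.0, 3.0], [0.0, 0.0], [1.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
n = len(y)

# Dual: maximize sum(a) - 1/2 a' K a  with  K_ij = y_i y_j (x_i . x_j).
# cvxopt minimizes 1/2 a' P a + q' a  s.t.  G a <= h,  A a = b.
K = (y[:, None] * X) @ (y[:, None] * X).T
K += 1e-8 * np.eye(n)                       # tiny ridge for numerical stability

sol = solvers.qp(P=matrix(K), q=matrix(-np.ones(n)),
                 G=matrix(-np.eye(n)), h=matrix(np.zeros(n)),      # alpha_i >= 0
                 A=matrix(y.reshape(1, -1)), b=matrix(np.zeros(1)))  # sum alpha_i y_i = 0
alpha = np.ravel(sol["x"])
print("alpha =", alpha.round(4))  # nonzero entries mark the support vectors
```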

Page 11

Primal vs Dual Problems

• There are two equivalent ways to compute the solution:

• Maximize the distance between two parallel supporting planes (primal).

• Find the pair of closest points in the convex hulls of the two classes (dual).

Alexandre Bernardino, [email protected] Machine Learning, 2009/2010

• Ref: Duality and Geometry in SVM Classifiers, Bennett and Bredensteiner, 2000.

Karush-Kuhn-Tucker Conditions

• Solve the dual problem ⇒ obtain αi

• Use the Karush-Kuhn-Tucker conditions (necessary and sufficient in the SVM problem):

∇w L_P = 0 ⇒ w = Σi αi yi xi

∂L_P/∂b = 0 ⇒ Σi αi yi = 0

yi (w.xi + b) − 1 ≥ 0

αi ≥ 0

αi [ yi (w.xi + b) − 1 ] = 0

Page 12

Computing the Solution

• To compute the solution we must rely on numerical methods.

• There are several numerical packages that solve the quadratic programming problem.

• In the lab we will use the Support Vector Machine Toolbox by Steve Gunn, which uses interior-point optimization.

Complementarity Condition

• KKT complementarity condition: αi [ yi (w.xi + b) − 1 ] = 0

• It can be satisfied in two ways:

• αi = 0 → the data vector is not on the margin boundary and is irrelevant to the solution.

• yi (w.xi + b) − 1 = 0 → αi > 0 is allowed; these data vectors lie on the margin boundary: they are the support vectors.

• Support vectors: S = { xi : αi > 0 }, with yi f(xi) = 1 for all xi ∈ S

Page 13

Primal Solution

• After solving the dual problem for α, the primal problem solution is:

w = Σ_{i ∈ S} αi yi xi,    b = yi − w.xi (for any support vector xi ∈ S)
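Continuing the earlier cvxopt sketch (alpha, X, y as computed there; the 1e-6 threshold is an assumed numerical tolerance for "αi > 0"):

```python
# Recover the primal solution from the dual variables
S = alpha > 1e-6                 # boolean mask of the support vectors
w = (alpha * y)[S] @ X[S]        # w = sum_{i in S} alpha_i y_i x_i
b = np.mean(y[S] - X[S] @ w)     # average b over the support vectors
print("w =", w, " b =", b)
```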

Test Phase

[Diagram: the test point x is compared with each support vector xi via dot products; the weighted sum with weights αi yi plus the bias b passes through sgn to give the label]

y = sgn( Σi αi yi (xi.x) + b )
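The same computation as a short Python function (continuing the sketch above; alpha, X, y, b, S carried over from it):

```python
def predict(x_new):
    """Test phase: y = sgn( sum_i alpha_i y_i (x_i . x_new) + b )."""
    return np.sign(np.sum(alpha[S] * y[S] * (X[S] @ x_new)) + b)

print(predict(np.array([2.5, 2.5])))   # expected +1 for this toy data
print(predict(np.array([0.5, 0.0])))   # expected -1
```

Note that only the support vectors enter the sum; in the kernel case the dot product xi.x is simply replaced by a kernel evaluation.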

Page 14

Example

[Figure: decision boundaries found by a Support Vector Machine vs. a Perceptron on the same data]

Mechanical Analogy

• Each support vector exerts a force Fi = αi yi (w/||w||) on the separating hyperplane.

• The resulting force and torque are zero!

[Figure: hyperplane H with margin boundaries H1 and H2, and the forces exerted by the support vectors]

Forces: Σi αi yi = 0 ⇒ Σi Fi = Σi αi yi (w/||w||) = 0

Torques: w = Σi αi yi xi ⇒ Σi xi × Fi = Σi xi × (αi yi w/||w||) = ( Σi αi yi xi ) × (w/||w||) = w × w/||w|| = 0
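A quick numeric check of the analogy (continuing the sketch; for 2-D vectors np.cross returns the scalar z-component of the torque):

```python
# Forces exerted by the support vectors on the hyperplane
F = (alpha[S] * y[S])[:, None] * (w / np.linalg.norm(w))

print("net force :", F.sum(axis=0))            # ~ [0, 0]
print("net torque:", np.cross(X[S], F).sum())  # ~ 0 (scalar in 2-D)
```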