
Page 1: Announcements - Homework

• Homework 1 is graded, please collect at end of lecture

• Homework 2 due today

• Homework 3 out soon (watch email)

• Question 1 – midterm review

Page 2: HW1 score distribution

[Histogram: HW1 total score, bins 0–10 through 100–110; counts from 0 to 40]

Page 3: Announcements - Midterm

• When: Wednesday, 10/20

• Where: In Class

• What: You, your pencil, your textbook, your notes, course slides, your calculator, your good mood :)

• What NOT: No computers, iPhones, or anything else that has an internet connection.

• Material: Everything from the beginning of the semester, up to and including SVMs and the kernel trick


Page 4: Recitation Tomorrow!

• Boosting, SVM (convex optimization), midterm review!

• Strongly recommended!!

• Place: NSH 3305 (Note: change from last time)

• Time: 5–6 pm

Rob

Page 5: Support Vector Machines

Aarti Singh

Machine Learning 10-701/15-781, Oct 13, 2010

Page 6: At Pittsburgh G-20 summit …

Page 7: Linear classifiers – which line is better?

Page 8: Pick the one with the largest margin!

Page 9: Parameterizing the decision boundary

[Figure: linear decision boundary with w.x + b > 0 on one side and w.x + b < 0 on the other]

Data: examples (xi, yi), i = 1, 2, …, n

w.x = Σj w(j) x(j)
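For concreteness, a minimal NumPy sketch of this decision rule (not from the slides; the w, b, and X values are placeholders):

    import numpy as np

    def predict(w, b, X):
        # Sign of w.x + b: +1 on one side of the boundary, -1 on the other
        return np.sign(X @ w + b)

    w, b = np.array([1.0, 1.0]), -1.0          # boundary x1 + x2 = 1
    X = np.array([[2.0, 2.0], [0.0, 0.0]])
    print(predict(w, b, X))                    # [ 1. -1.]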

Page 10: Parameterizing the decision boundary

[Figure: the same decision boundary, with w.x + b > 0 on one side and w.x + b < 0 on the other]

Page 11: Maximizing the margin

[Figure: hyperplane with margin γ marked on both sides; w.x + b > 0 on one side, w.x + b < 0 on the other]

margin = γ = 2a/ǁwǁ

Distance of closest examples from the line/hyperplane

Page 12: Maximizing the margin

[Figure: hyperplane with margin γ marked on both sides; w.x + b > 0 on one side, w.x + b < 0 on the other]

max_{w,b} γ = 2a/ǁwǁ

s.t. (w.xj + b) yj ≥ a ∀j

margin = γ = 2a/ǁwǁ: distance of closest examples from the line/hyperplane

Note: ‘a’ is arbitrary (can normalize equations by a)
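The 2a/ǁwǁ expression follows from standard point-to-hyperplane geometry; a short derivation in LaTeX (the intermediate step is not spelled out on the slide):

    \text{dist}\big(x, \{x : w \cdot x + b = 0\}\big) = \frac{|w \cdot x + b|}{\|w\|}

    % the closest examples on each side satisfy |w \cdot x + b| = a, so
    \gamma = \frac{a}{\|w\|} + \frac{a}{\|w\|} = \frac{2a}{\|w\|}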

Page 13: Support Vector Machines

[Figure: maximum-margin hyperplane with margin γ on each side; w.x + b > 0 on one side, w.x + b < 0 on the other]

min_{w,b} w.w

s.t. (w.xj + b) yj ≥ 1 ∀j

Solve efficiently by quadratic programming (QP)

– Well-studied solution algorithms

Linear hyperplane defined by “support vectors”
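A minimal sketch of this QP using the cvxpy modeling library (an assumption here; any QP solver would do), for data X with labels y in {-1, +1}:

    import cvxpy as cp
    import numpy as np

    # Toy linearly separable data
    X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
    y = np.array([1.0, 1.0, -1.0, -1.0])

    w, b = cp.Variable(2), cp.Variable()
    # min w.w  s.t.  (w.xj + b) yj >= 1 for all j
    prob = cp.Problem(cp.Minimize(cp.sum_squares(w)),
                      [cp.multiply(y, X @ w + b) >= 1])
    prob.solve()
    print(w.value, b.value)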

Page 14: Support Vectors

[Figure: hyperplane with w.x + b > 0 on one side and w.x + b < 0 on the other; the points on the margin are the support vectors]

Linear hyperplane defined by “support vectors”

Moving other points a little doesn’t affect the decision boundary

Only need to store the support vectors to predict labels of new points

How many support vectors in linearly separable case?

≤ m+1
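A quick sketch of reading off the support vectors with scikit-learn (an illustration; a large C approximates the hard-margin case):

    import numpy as np
    from sklearn.svm import SVC

    X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
    y = np.array([1, 1, -1, -1])

    clf = SVC(kernel="linear", C=1e6).fit(X, y)
    print(clf.support_vectors_)      # only these points pin down the boundary
    print(clf.predict([[1.0, 1.0]])) # new points need only the stored model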

Page 15: What if data is not linearly separable?

Use features of features of features of features….

x1², x2², x1x2, …, exp(x1)

But run risk of overfitting!
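A minimal sketch of building such features by hand, using the slide’s examples (any linear classifier can then run on the expanded matrix):

    import numpy as np

    def expand(X):
        x1, x2 = X[:, 0], X[:, 1]
        # Raw coordinates plus x1^2, x2^2, x1*x2, exp(x1)
        return np.column_stack([x1, x2, x1**2, x2**2, x1 * x2, np.exp(x1)])

    X = np.array([[1.0, 2.0], [0.5, -1.0]])
    print(expand(X).shape)  # (2, 6)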

Page 16: What if data is still not linearly separable?

Allow “error” in classification:

min_{w,b} w.w + C (#mistakes)

s.t. (w.xj + b) yj ≥ 1 ∀j

Maximize margin and minimize # mistakes on training data

C – tradeoff parameter

Not QP: #mistakes is a 0/1 loss (doesn’t distinguish between a near miss and a bad mistake)

Page 17: What if data is still not linearly separable?

min_{w,b} w.w + C Σj ξj

s.t. (w.xj + b) yj ≥ 1 − ξj ∀j

ξj ≥ 0 ∀j

Allow “error” in classification

ξj – “slack” variables (ξj > 1 if xj is misclassified); pay a linear penalty for each mistake

C – tradeoff parameter (chosen by cross-validation)

Still QP

Soft margin approach
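A minimal sketch of choosing C by cross-validation, here with scikit-learn’s soft-margin SVC on synthetic data (illustration only):

    import numpy as np
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2))
    y = np.where(X[:, 0] + X[:, 1] + 0.3 * rng.normal(size=100) > 0, 1, -1)

    # Larger C: fewer training mistakes; smaller C: larger margin
    search = GridSearchCV(SVC(kernel="linear"),
                          {"C": [0.01, 0.1, 1.0, 10.0, 100.0]}, cv=5)
    search.fit(X, y)
    print(search.best_params_)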

Page 18: Slack variables – Hinge loss

min_{w,b} w.w + C Σj ξj

s.t. (w.xj + b) yj ≥ 1 − ξj ∀j

ξj ≥ 0 ∀j

Complexity penalization: the w.w term. Hinge loss: the Σj ξj term; at the optimum, ξj = max(0, 1 − (w.xj + b) yj).

[Figure: 0-1 loss and hinge loss plotted against the margin (w.xj + b) yj, with ticks at 0 and 1]
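A tiny NumPy sketch contrasting the two losses as functions of the margin m = (w.xj + b) yj (illustration only):

    import numpy as np

    def hinge_loss(m):        # linear penalty once m < 1
        return np.maximum(0.0, 1.0 - m)

    def zero_one_loss(m):     # 1 for any mistake, however bad
        return (m < 0).astype(float)

    m = np.array([-2.0, -0.5, 0.5, 2.0])
    print(hinge_loss(m))      # [3.  1.5 0.5 0. ]
    print(zero_one_loss(m))   # [1. 1. 0. 0.]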

Page 19: SVM vs. Logistic Regression

SVM: hinge loss

Logistic regression: log loss (negative log conditional likelihood)

[Figure: 0-1 loss, hinge loss, and log loss plotted against the margin, with ticks at 0 and 1]
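Extending the sketch above with the log loss log(1 + exp(−m)); unlike the hinge it is smooth and never exactly zero:

    import numpy as np

    m = np.array([-2.0, -0.5, 0.5, 2.0])
    log_loss = np.log(1.0 + np.exp(-m))   # logistic regression penalty
    hinge = np.maximum(0.0, 1.0 - m)      # SVM penalty
    print(np.round(log_loss, 3))          # [2.127 0.974 0.474 0.127]
    print(hinge)                          # [3.  1.5 0.5 0. ]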

Page 20: What about multiple classes?

Page 21: One against all

Learn 3 classifiers separately: class k vs. rest, giving (wk, bk) for k = 1, 2, 3

y = arg maxk wk.x + bk

But the wk’s may not be on the same scale. Note: (aw).x + (ab) is also a solution.
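A sketch of the one-against-all prediction rule (the weights are hypothetical; each row of W would come from training class k vs. rest):

    import numpy as np

    W = np.array([[1.0, 0.0],    # (w1, b1) from class 1 vs. rest, etc.
                  [0.0, 1.0],
                  [-1.0, -1.0]])
    b = np.array([0.0, -0.5, 0.2])

    def predict(x):
        # y = arg max_k  wk.x + bk  (caveat from the slide: the
        # three scores may not be on comparable scales)
        return int(np.argmax(W @ x + b)) + 1   # classes numbered 1..3

    print(predict(np.array([2.0, 1.0])))  # 1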

Page 22: Learn 1 classifier: Multi-class SVM

Simultaneously learn 3 sets of weights

y = arg maxk w(k).x + b(k)

Margin - gap between correct class and nearest other class

Page 23: Learn 1 classifier: Multi-class SVM

Simultaneously learn 3 sets of weights

y = arg maxk w(k).x + b(k)

Joint optimization: the w(k)’s have the same scale.
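One standard way to write the joint problem, sketched in LaTeX (the constraints enforce the gap between the correct class and every other class):

    \min_{\{w^{(k)},\, b^{(k)}\}} \ \sum_{k} w^{(k)} \cdot w^{(k)}
    \quad \text{s.t.} \quad
    w^{(y_j)} \cdot x_j + b^{(y_j)} \ \ge\ w^{(k)} \cdot x_j + b^{(k)} + 1
    \qquad \forall j,\ \forall k \ne y_j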

Page 24: What you need to know

• Maximizing margin

• Derivation of SVM formulation

• Slack variables and hinge loss

• Relationship between SVMs and logistic regression

– 0/1 loss

– Hinge loss

– Log loss

• Tackling multiple classes

– One against all

– Multiclass SVMs


Page 25: SVMs reminder

min_{w,b} w.w + C Σj ξj

s.t. (w.xj + b) yj ≥ 1 − ξj ∀j

ξj ≥ 0 ∀j

Regularization: the w.w term. Hinge loss: the Σj ξj term.

Soft margin approach

Page 26: Today’s Lecture

• Learn one of the most interesting and exciting recent advancements in machine learning

– The “kernel trick”

– High dimensional feature spaces at no extra cost!

• But first, a detour

– Constrained optimization!


Page 27: Constrained Optimization

Page 28: Lagrange Multiplier – Dual Variables

Moving the constraint to the objective function – the Lagrangian:

Solve:

Constraint is tight when α > 0
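In generic form (a standard construction, assuming the problem is min_x f(x) subject to g(x) ≥ 0):

    L(x, \alpha) = f(x) - \alpha\, g(x), \qquad \alpha \ge 0

    \text{Solve: } \min_x \, \max_{\alpha \ge 0} \, L(x, \alpha)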

Page 29: Duality

Primal problem: min over x of the max over the dual variable. Dual problem: swap the order – max over the dual variable of the min over x.

Weak duality: the dual objective lower-bounds the primal objective for all feasible points

Strong duality: the primal and dual optima coincide (holds under KKT conditions)
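In symbols (a standard statement; p* and d* denote the primal and dual optimal values):

    p^* = \min_x \max_{\alpha \ge 0} L(x, \alpha), \qquad
    d^* = \max_{\alpha \ge 0} \min_x L(x, \alpha)

    d^* \le p^* \ \text{(weak duality)}, \qquad
    d^* = p^* \ \text{(strong duality, under KKT conditions)}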

Page 30: Lagrange Multiplier – Dual Variables

Solving: [figure shows the two cases, b negative and b positive]

When α > 0, the constraint is tight