
CS434a/541a: Pattern Recognition
Prof. Olga Veksler

Lecture 4


Outline

- Normal Random Variable
- Properties
- Discriminant functions

Why Normal Random Variables?

- Analytically tractable
- Works well when the observation comes from a corrupted single prototype (µ)
- Many classifiers used in practice are optimal when the data is normally distributed

The Univariate Normal Density

The univariate normal density is

    p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[ -\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2 \right]

where:
- µ = mean (or expected value) of x
- σ² = variance
- x is a scalar (has dimension 1)
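A minimal numpy sketch of this density; the parameter values below are arbitrary illustrations.

```python
import numpy as np

def univariate_normal_pdf(x, mu, sigma):
    """p(x) = 1/(sqrt(2*pi)*sigma) * exp(-0.5*((x - mu)/sigma)**2)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)

# Illustrative values: the density peaks at x = mu and decays away from it.
print(univariate_normal_pdf(0.0, mu=0.0, sigma=1.0))   # ~0.3989
print(univariate_normal_pdf(2.0, mu=0.0, sigma=1.0))   # ~0.0540
```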


Several Features

- What if we have several features x1, x2, …, xd?
  - each normally distributed
  - may have different means
  - may have different variances
  - may be dependent or independent of each other
- How do we model their joint distribution?

The Multivariate Normal Density

The multivariate normal density in d dimensions is

    p(x) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left[ -\frac{1}{2} (x-\mu)^t \Sigma^{-1} (x-\mu) \right]

where

    x = [x_1, x_2, \dots, x_d]^t,  \quad  \mu = [\mu_1, \mu_2, \dots, \mu_d]^t

and Σ is the d-by-d covariance matrix

    \Sigma = \begin{bmatrix} \sigma_1^2 & \cdots & \sigma_{1d} \\ \vdots & \ddots & \vdots \\ \sigma_{d1} & \cdots & \sigma_d^2 \end{bmatrix}

Here σ_1d is the covariance of x1 and xd, Σ^{-1} is the inverse of Σ, and |Σ| is the determinant of Σ.

- Each xi is N(µi, σi²)
  - to prove this, integrate out all other features from the joint density
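A sketch of evaluating this density directly from the formula, cross-checked against scipy.stats.multivariate_normal; the values of µ, Σ, and x are arbitrary illustrations.

```python
import numpy as np
from scipy.stats import multivariate_normal

def mvn_pdf(x, mu, Sigma):
    """Evaluate the d-dimensional normal density at x."""
    d = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.inv(Sigma) @ diff          # (x-mu)^t Sigma^{-1} (x-mu)
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm

mu = np.array([0.0, 0.0])                  # illustrative parameters
Sigma = np.array([[1.0, 0.5],
                  [0.5, 1.0]])
x = np.array([1.0, -1.0])

print(mvn_pdf(x, mu, Sigma))                           # manual formula
print(multivariate_normal.pdf(x, mean=mu, cov=Sigma))  # scipy cross-check
```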


More on Σ

- Σ plays a role similar to the role that σ² plays in one dimension
- From Σ we can find out:
  1. The individual variances of the features x1, x2, …, xd (the diagonal entries σi²)
  2. Whether features xi and xj are
     - independent: σij = 0
     - positively correlated: σij > 0
     - negatively correlated: σij < 0
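A small numpy sketch of reading this information off an estimated covariance matrix; the synthetic data and the 0.8 coupling factor are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 2-feature data with a positive dependence between the features.
x1 = rng.normal(0.0, 1.0, size=1000)
x2 = 0.8 * x1 + rng.normal(0.0, 0.5, size=1000)

Sigma = np.cov(np.stack([x1, x2]))   # 2x2 covariance estimate; rows are features
print(Sigma[0, 0], Sigma[1, 1])      # individual variances sigma_1^2, sigma_2^2
print(Sigma[0, 1])                   # sigma_12 > 0 -> positive correlation
```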


The Multivariate Normal Density

- If Σ is diagonal, then the features x1, …, xd are independent, and

    p(x) = \prod_{i=1}^{d} \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\left[ -\frac{(x_i-\mu_i)^2}{2\sigma_i^2} \right]

- Example of a diagonal covariance matrix:

    \Sigma = \begin{bmatrix} \sigma_1^2 & 0 & 0 \\ 0 & \sigma_2^2 & 0 \\ 0 & 0 & \sigma_3^2 \end{bmatrix}
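A quick numerical check, with arbitrary illustrative numbers, that for a diagonal Σ the joint density equals the product of the univariate densities.

```python
import numpy as np

mu = np.array([1.0, -2.0, 0.5])
sigmas = np.array([1.0, 2.0, 0.5])       # standard deviations sigma_i
Sigma = np.diag(sigmas ** 2)             # diagonal covariance matrix
x = np.array([0.0, -1.0, 1.0])

# Joint density from the multivariate formula.
diff = x - mu
quad = diff @ np.linalg.inv(Sigma) @ diff
joint = np.exp(-0.5 * quad) / ((2 * np.pi) ** (3 / 2) * np.sqrt(np.linalg.det(Sigma)))

# Product of the univariate densities p(x_i).
prod = np.prod(np.exp(-0.5 * ((x - mu) / sigmas) ** 2) / (np.sqrt(2 * np.pi) * sigmas))

print(np.allclose(joint, prod))   # True: the two agree for diagonal Sigma
```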


The Multivariate Normal Density

    p(x) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left[ -\frac{1}{2} (x-\mu)^t \Sigma^{-1} (x-\mu) \right]

- The leading factor 1/((2π)^{d/2}|Σ|^{1/2}) is a normalizing constant
- The exponent (x−µ)^t Σ^{-1} (x−µ) is a scalar (a single number); the closer it is to 0, the larger p(x) is

Written out for d = 3:

    p(x) = c \cdot \exp\left[ -\frac{1}{2} \begin{bmatrix} x_1-\mu_1 & x_2-\mu_2 & x_3-\mu_3 \end{bmatrix} \begin{bmatrix} \sigma_1^2 & \sigma_{12} & \sigma_{13} \\ \sigma_{12} & \sigma_2^2 & \sigma_{23} \\ \sigma_{13} & \sigma_{23} & \sigma_3^2 \end{bmatrix}^{-1} \begin{bmatrix} x_1-\mu_1 \\ x_2-\mu_2 \\ x_3-\mu_3 \end{bmatrix} \right]

- Thus p(x) is larger for smaller (x−µ)^t Σ^{-1} (x−µ)

- Σ is positive semidefinite (x^t Σ x ≥ 0 for all x)
- If x^t Σ x = 0 for some nonzero x, then det(Σ) = 0. This case is not interesting: p(x) is not defined. It happens when
  1. one feature is a constant (has zero variance), or
  2. two features are multiples of each other
- So we will assume Σ is positive definite (x^t Σ x > 0 for all nonzero x)
- If Σ is positive definite, then so is Σ^{-1}

Consider the quadratic form (x−µ)^t Σ^{-1} (x−µ).

- A positive definite d-by-d matrix has d positive real eigenvalues, and its d eigenvectors can be chosen orthonormal
- Thus if Φ is the matrix whose columns are the normalized eigenvectors of Σ, then Φ^{-1} = Φ^t
- ΣΦ = ΦΛ, where Λ is the diagonal matrix with the corresponding eigenvalues on the diagonal
- Thus Σ = ΦΛΦ^{-1} and Σ^{-1} = ΦΛ^{-1}Φ^{-1} = ΦΛ^{-1}Φ^t
- Let Λ^{-1/2} denote the diagonal matrix such that Λ^{-1/2} Λ^{-1/2} = Λ^{-1}
- Thus

    \Sigma^{-1} = \Phi \Lambda^{-1/2} \Lambda^{-1/2} \Phi^t = \left( \Lambda^{-1/2}\Phi^t \right)^t \left( \Lambda^{-1/2}\Phi^t \right) = M^t M, \quad M = \Lambda^{-1/2}\Phi^t

  where Φ^t acts as a rotation matrix and Λ^{-1/2} as a scaling matrix
- Thus

    (x-\mu)^t \Sigma^{-1} (x-\mu) = (x-\mu)^t M^t M (x-\mu) = \left( M(x-\mu) \right)^t \left( M(x-\mu) \right) = \| M(x-\mu) \|^2

- Points x which satisfy ||M(x−µ)||² = const lie on an ellipse

Euclidean vs. Mahalanobis distance (figure):

- (x−µ)^t (x−µ) is the usual (Euclidean) squared distance between x and µ; points x at equal Euclidean distance from µ lie on a circle, and p(x) decreases at the same rate in every direction
- (x−µ)^t Σ^{-1} (x−µ) is the squared Mahalanobis distance between x and µ; points x at equal Mahalanobis distance from µ lie on an ellipse: Σ stretches circles into ellipses, with axes along the eigenvectors of Σ, so p(x) decreases slowly along one eigenvector direction and fast along the other
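A sketch of this decomposition with an arbitrary positive definite Σ: the eigendecomposition gives M = Λ^{-1/2}Φ^t, and ||M(x−µ)||² matches (x−µ)^t Σ^{-1} (x−µ).

```python
import numpy as np

Sigma = np.array([[2.0, 0.9],
                  [0.9, 1.0]])            # illustrative positive definite covariance
mu = np.array([1.0, -1.0])
x = np.array([3.0, 0.5])

eigvals, Phi = np.linalg.eigh(Sigma)      # Sigma = Phi Lambda Phi^t, Phi orthonormal
M = np.diag(eigvals ** -0.5) @ Phi.T      # M = Lambda^{-1/2} Phi^t

maha_direct = (x - mu) @ np.linalg.inv(Sigma) @ (x - mu)
maha_via_M = np.sum((M @ (x - mu)) ** 2)  # ||M(x - mu)||^2

print(np.allclose(maha_direct, maha_via_M))   # True
```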

2-d Multivariate Normal Density

- Can you see much in this graph?
- At most you can see that the mean is around [0,0], but can’t really tell if x1 and x2 are correlated

2-d Multivariate Normal Density

- How about this graph?

2-d Multivariate Normal Density

- Level curves graph
  - p(x) is constant along each contour
  - a topological map of the 3-d surface
- Now we can see much more
  - x1 and x2 are independent
  - σ1² and σ2² are equal

2-d Multivariate Normal Density

Contour plots for different parameters (figure):

    \Sigma = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \mu = [0,0]^t; \quad
    \Sigma = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \mu = [1,1]^t; \quad
    \Sigma = \begin{bmatrix} 1 & 0 \\ 0 & 4 \end{bmatrix}, \mu = [0,0]^t; \quad
    \Sigma = \begin{bmatrix} 4 & 0 \\ 0 & 1 \end{bmatrix}, \mu = [0,0]^t

2-d Multivariate Normal Density

Contour plots for correlated features (figure), with µ = [0,0]^t:

    \Sigma = \begin{bmatrix} 1 & 0.5 \\ 0.5 & 1 \end{bmatrix}, \quad
    \begin{bmatrix} 1 & 0.9 \\ 0.9 & 1 \end{bmatrix}, \quad
    \begin{bmatrix} 1 & 0.9 \\ 0.9 & 4 \end{bmatrix}

    \Sigma = \begin{bmatrix} 1 & -0.5 \\ -0.5 & 1 \end{bmatrix}, \quad
    \begin{bmatrix} 1 & -0.9 \\ -0.9 & 1 \end{bmatrix}, \quad
    \begin{bmatrix} 1 & -0.9 \\ -0.9 & 4 \end{bmatrix}

The Multivariate Normal Density

- If X has density N(µ, Σ), then A^t X has density N(A^t µ, A^t Σ A)
- Thus X can be transformed into a spherical normal variable (the covariance of a spherical density is the identity matrix I) with the whitening transform

    A_w = \Phi \Lambda^{-1/2}

- Example (figure): data with Σ = [[1, 0.9],[0.9, 1]] maps to data with Σ = I under X → A_w^t X
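A sketch of the whitening transform; the covariance mirrors the example above, while the sample size and random seed are arbitrary.

```python
import numpy as np

Sigma = np.array([[1.0, 0.9],
                  [0.9, 1.0]])
rng = np.random.default_rng(1)
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=Sigma, size=5000)   # rows are samples

eigvals, Phi = np.linalg.eigh(Sigma)
A_w = Phi @ np.diag(eigvals ** -0.5)      # whitening transform A_w = Phi Lambda^{-1/2}

Y = X @ A_w                               # y = A_w^t x, applied row-wise
print(np.cov(Y, rowvar=False))            # approximately the identity matrix
```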

Discriminant Functions

- A classifier can be viewed as a network which computes m discriminant functions g1(x), g2(x), …, gm(x) from the features x1, x2, …, xd and selects the category corresponding to the largest discriminant (figure: feature inputs, discriminant functions, select class giving the maximum)
- gi(x) can be replaced with f(gi(x)) for any monotonically increasing function f; the results will be unchanged

Discriminant Functions

- The minimum error-rate classification is achieved by the discriminant function

    g_i(x) = P(c_i \mid x) = \frac{P(x \mid c_i)\,P(c_i)}{P(x)}

- Since P(x) is the same for every class, an equivalent discriminant function is

    g_i(x) = P(x \mid c_i)\,P(c_i)

- For the normal density it is convenient to take logarithms. Since the logarithm is a monotonically increasing function, an equivalent discriminant function is

    g_i(x) = \ln P(x \mid c_i) + \ln P(c_i)

Discriminant Functions for the Normal Density

- Suppose that for class ci the class-conditional density p(x|ci) is N(µi, Σi):

    p(x \mid c_i) = \frac{1}{(2\pi)^{d/2} |\Sigma_i|^{1/2}} \exp\left[ -\frac{1}{2} (x-\mu_i)^t \Sigma_i^{-1} (x-\mu_i) \right]

- Discriminant function: gi(x) = ln P(x|ci) + ln P(ci)
- Plug in p(x|ci) and P(ci) to get

    g_i(x) = -\frac{1}{2}(x-\mu_i)^t \Sigma_i^{-1} (x-\mu_i) - \frac{d}{2}\ln 2\pi - \frac{1}{2}\ln|\Sigma_i| + \ln P(c_i)

- The term (d/2) ln 2π is constant for all i, so it can be dropped:

    g_i(x) = -\frac{1}{2}(x-\mu_i)^t \Sigma_i^{-1} (x-\mu_i) - \frac{1}{2}\ln|\Sigma_i| + \ln P(c_i)
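A direct sketch of this discriminant function; the two classes' parameters below are arbitrary illustrations.

```python
import numpy as np

def g(x, mu, Sigma, prior):
    """g_i(x) = -1/2 (x-mu)^t Sigma^{-1} (x-mu) - 1/2 ln|Sigma| + ln P(c_i)."""
    diff = x - mu
    return (-0.5 * diff @ np.linalg.inv(Sigma) @ diff
            - 0.5 * np.log(np.linalg.det(Sigma))
            + np.log(prior))

# Two illustrative classes; classify x by the larger discriminant.
x = np.array([0.5, 1.0])
g1 = g(x, np.array([0.0, 0.0]), np.array([[1.0, 0.0], [0.0, 1.0]]), prior=0.5)
g2 = g(x, np.array([2.0, 2.0]), np.array([[2.0, 0.5], [0.5, 1.0]]), prior=0.5)
print("class 1" if g1 > g2 else "class 2")
```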

Case Σi = σ²I

- That is,

    \Sigma_i = \begin{bmatrix} \sigma^2 & 0 & 0 \\ 0 & \sigma^2 & 0 \\ 0 & 0 & \sigma^2 \end{bmatrix} = \sigma^2 \cdot \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}

- In this case, the features x1, x2, …, xd are independent with different means and equal variances σ² (figure: circular level curves centered at µ)

Case Σi = σ²I

- Discriminant function:

    g_i(x) = -\frac{1}{2}(x-\mu_i)^t \Sigma_i^{-1} (x-\mu_i) - \frac{1}{2}\ln|\Sigma_i| + \ln P(c_i)

- Here det(Σi) = σ^{2d} and Σi^{-1} = (1/σ²) I:

    \Sigma_i^{-1} = \begin{bmatrix} 1/\sigma^2 & 0 & 0 \\ 0 & 1/\sigma^2 & 0 \\ 0 & 0 & 1/\sigma^2 \end{bmatrix}

- Can simplify the discriminant function:

    g_i(x) = -\frac{1}{2}(x-\mu_i)^t \frac{I}{\sigma^2} (x-\mu_i) - \frac{1}{2}\ln \sigma^{2d} + \ln P(c_i)

  The term -(1/2) ln σ^{2d} is constant for all i, so drop it:

    g_i(x) = -\frac{1}{2\sigma^2}(x-\mu_i)^t (x-\mu_i) + \ln P(c_i) = -\frac{1}{2\sigma^2}\|x-\mu_i\|^2 + \ln P(c_i)

Case Σi = σ²I: Geometric Interpretation

- If ln P(ci) = ln P(cj) for all i, j (equal priors), then effectively

    g_i(x) = -\|x-\mu_i\|^2

  so x is assigned to the class of the nearest mean. The decision regions form a Voronoi diagram: points in each cell are closer to the mean in that cell than to any other mean (figure: decision regions for c1, c2, c3 around the means µ1, µ2, µ3).

- If ln P(ci) ≠ ln P(cj), then

    g_i(x) = -\frac{1}{2\sigma^2}\|x-\mu_i\|^2 + \ln P(c_i)

  and the boundaries shift toward the means of the less probable classes (figure: a point x near µ1 now falls in the decision region for c3).

Case Σi = σ²I

    g_i(x) = -\frac{1}{2\sigma^2}(x-\mu_i)^t(x-\mu_i) + \ln P(c_i)
           = -\frac{1}{2\sigma^2}\left( x^t x - 2\mu_i^t x + \mu_i^t \mu_i \right) + \ln P(c_i)

- The term x^t x is constant for all classes, so drop it:

    g_i(x) = \frac{1}{\sigma^2}\mu_i^t x - \frac{1}{2\sigma^2}\mu_i^t \mu_i + \ln P(c_i)

- That is, gi(x) = wi^t x + wi0 with

    w_i = \frac{\mu_i}{\sigma^2}, \qquad w_{i0} = -\frac{\mu_i^t \mu_i}{2\sigma^2} + \ln P(c_i)

- So the discriminant function is linear

Case Σi = σ²I

    g_i(x) = w_i^t x + w_{i0}

- w_i^t x = \sum_{k=1}^{d} w_{ik} x_k is linear in x, and wi0 is constant in x
- Thus the discriminant function is linear, and therefore the decision boundaries gi(x) = gj(x) are linear:
  - lines if x has dimension 2
  - planes if x has dimension 3
  - hyperplanes if x has dimension larger than 3

Case Σi = σ²I: Example

- 3 classes, each a 2-dimensional Gaussian with

    \mu_1 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \quad \mu_2 = \begin{bmatrix} 4 \\ 6 \end{bmatrix}, \quad \mu_3 = \begin{bmatrix} -2 \\ 4 \end{bmatrix}, \quad \Sigma_1 = \Sigma_2 = \Sigma_3 = \begin{bmatrix} 3 & 0 \\ 0 & 3 \end{bmatrix}

- Priors: P(c1) = P(c2) = 1/4 and P(c3) = 1/2
- Discriminant function:

    g_i(x) = \frac{\mu_i^t}{\sigma^2} x - \frac{\mu_i^t \mu_i}{2\sigma^2} + \ln P(c_i)

- Plug in the parameters for each class:

    g_1(x) = \frac{1}{3}\begin{bmatrix} 1 & 2 \end{bmatrix} x - \left( \frac{5}{6} + 1.38 \right)

    g_2(x) = \frac{1}{3}\begin{bmatrix} 4 & 6 \end{bmatrix} x - \left( \frac{52}{6} + 1.38 \right)

    g_3(x) = \frac{1}{3}\begin{bmatrix} -2 & 4 \end{bmatrix} x - \left( \frac{20}{6} + 0.69 \right)

Case Σi = σ²I: Example

- Need to find out where gi(x) > gj(x) for i, j = 1, 2, 3
- Can be done by solving gi(x) = gj(x) for i, j = 1, 2, 3
- Let's take g1(x) = g2(x) first:

    \frac{1}{3}\begin{bmatrix} 1 & 2 \end{bmatrix} x - \left( \frac{5}{6} + 1.38 \right) = \frac{1}{3}\begin{bmatrix} 4 & 6 \end{bmatrix} x - \left( \frac{52}{6} + 1.38 \right)

- Simplifying:

    \frac{1}{3}\begin{bmatrix} -3 & -4 \end{bmatrix} x = -\frac{47}{6}, \quad \text{i.e.} \quad -x_1 - \frac{4}{3}x_2 = -\frac{47}{6} \quad \text{(a line equation)}

Case Σi = σ²I: Example

- Next solve g2(x) = g3(x):

    2x_1 + \frac{2}{3}x_2 = 6.02

- Then solve g1(x) = g3(x):

    x_1 - \frac{2}{3}x_2 = -1.81

- And finally solve g1(x) = g2(x) = g3(x): the three boundaries meet at

    x_1 = 1.4 \quad \text{and} \quad x_2 = 4.82

Case Σi = σ²I: Example

- Priors: P(c1) = P(c2) = 1/4 and P(c3) = 1/2
- Figure: the three linear boundaries g1(x) = g2(x), g2(x) = g3(x), g1(x) = g3(x) partition the plane into the decision regions for c1, c2, c3
- The lines connecting the means are perpendicular to the decision boundaries
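A sketch that plugs the example's parameters into the linear discriminant and checks that the three discriminants agree (up to rounding of the coordinates) at the reported intersection point x = (1.4, 4.82).

```python
import numpy as np

sigma2 = 3.0
mus = [np.array([1.0, 2.0]), np.array([4.0, 6.0]), np.array([-2.0, 4.0])]
priors = [0.25, 0.25, 0.5]

def g(i, x):
    # Linear discriminant for Sigma_i = sigma^2 I.
    return (mus[i] @ x) / sigma2 - (mus[i] @ mus[i]) / (2 * sigma2) + np.log(priors[i])

x_star = np.array([1.4, 4.82])     # claimed intersection of all three boundaries
print([round(g(i, x_star), 2) for i in range(3)])   # three nearly equal values
```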

Case Σi = Σ

- Covariance matrices are equal but arbitrary
- In this case, the features x1, x2, …, xd are not necessarily independent, for example

    \Sigma = \begin{bmatrix} 1 & 0.5 \\ 0.5 & 1 \end{bmatrix}

Case Σi = Σ

- Discriminant function:

    g_i(x) = -\frac{1}{2}(x-\mu_i)^t \Sigma^{-1} (x-\mu_i) - \frac{1}{2}\ln|\Sigma| + \ln P(c_i)

- The term (1/2) ln|Σ| is constant for all classes, so the discriminant function becomes

    g_i(x) = -\frac{1}{2}(x-\mu_i)^t \Sigma^{-1} (x-\mu_i) + \ln P(c_i)

  where (x−µi)^t Σ^{-1} (x−µi) is the squared Mahalanobis distance from x to µi

- Mahalanobis distance:

    \|x-y\|^2_{\Sigma^{-1}} = (x-y)^t \Sigma^{-1} (x-y)

- If Σ = I, the Mahalanobis distance becomes the usual Euclidean distance:

    \|x-y\|^2_{I} = (x-y)^t (x-y)

Euclidean vs. Mahalanobis Distances

- Euclidean: ||x−µ||² = (x−µ)^t (x−µ); points x at equal Euclidean distance from µ lie on a circle
- Mahalanobis: ||x−µ||²_{Σ^{-1}} = (x−µ)^t Σ^{-1} (x−µ); points x at equal Mahalanobis distance from µ lie on an ellipse: Σ stretches circles to ellipses, with axes along the eigenvectors of Σ

Case Σi = Σ: Geometric Interpretation

- If ln P(ci) = ln P(cj) for all i, j (equal priors), then effectively

    g_i(x) = -\|x-\mu_i\|^2_{\Sigma^{-1}}

  so x is assigned to the class of the closest mean under the Mahalanobis distance; points in each decision region are closer to the mean in that region than to any other mean under Mahalanobis distance (figure: decision regions for c1, c2, c3)

- If ln P(ci) ≠ ln P(cj), then

    g_i(x) = -\frac{1}{2}\|x-\mu_i\|^2_{\Sigma^{-1}} + \ln P(c_i)

  and the decision boundaries shift accordingly (figure: decision regions for c1, c2, c3 with unequal priors)

Case Σi = Σ

- Can simplify the discriminant function:

    g_i(x) = -\frac{1}{2}(x-\mu_i)^t \Sigma^{-1} (x-\mu_i) + \ln P(c_i)
           = -\frac{1}{2}\left( x^t \Sigma^{-1} x - 2\mu_i^t \Sigma^{-1} x + \mu_i^t \Sigma^{-1} \mu_i \right) + \ln P(c_i)

- The term x^t Σ^{-1} x is constant for all classes, so drop it:

    g_i(x) = \mu_i^t \Sigma^{-1} x - \frac{1}{2}\mu_i^t \Sigma^{-1} \mu_i + \ln P(c_i)

- Thus in this case the discriminant is also linear:

    g_i(x) = w_i^t x + w_{i0}, \qquad w_i^t = \mu_i^t \Sigma^{-1}, \qquad w_{i0} = -\frac{1}{2}\mu_i^t \Sigma^{-1} \mu_i + \ln P(c_i)

Case Σi = Σ: Example

- 3 classes, each a 2-dimensional Gaussian with

    \mu_1 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \quad \mu_2 = \begin{bmatrix} -1 \\ 5 \end{bmatrix}, \quad \mu_3 = \begin{bmatrix} -2 \\ 4 \end{bmatrix}, \quad \Sigma_1 = \Sigma_2 = \Sigma_3 = \begin{bmatrix} 1 & -1.5 \\ -1.5 & 4 \end{bmatrix}

- Priors: P(c1) = P(c2) = 1/4 and P(c3) = 1/2
- Again the boundaries can be found by solving gi(x) = gj(x) for i, j = 1, 2, 3, with gi(x) = wi^t x + wi0 (a row vector times x plus a scalar)

Case Σi = Σ: Example

- Let's solve gj(x) = gi(x) in general first:

    \mu_j^t \Sigma^{-1} x - \frac{1}{2}\mu_j^t \Sigma^{-1} \mu_j + \ln P(c_j) = \mu_i^t \Sigma^{-1} x - \frac{1}{2}\mu_i^t \Sigma^{-1} \mu_i + \ln P(c_i)

- Let's regroup the terms:

    \left( \mu_j - \mu_i \right)^t \Sigma^{-1} x = \ln\frac{P(c_i)}{P(c_j)} + \frac{1}{2}\left( \mu_j^t \Sigma^{-1} \mu_j - \mu_i^t \Sigma^{-1} \mu_i \right)

- This gives the line where gj(x) = gi(x)

Case Σi = Σ: Example

- Now substitute for i, j = 1, 2:

    \begin{bmatrix} 2 & 0 \end{bmatrix} x = 0, \quad \text{i.e.} \quad x_1 = 0

- Now substitute for i, j = 2, 3:

    3.14\,x_1 + 1.43\,x_2 = 2.41

- Now substitute for i, j = 1, 3:

    5.14\,x_1 + 1.43\,x_2 = 2.41

Case Σi = Σ: Example

- Priors: P(c1) = P(c2) = 1/4 and P(c3) = 1/2
- Figure: the three linear boundaries partition the plane into the decision regions for c1, c2, c3
- The lines connecting the means are not, in general, perpendicular to the decision boundaries

General Case: Σi are arbitrary

- Covariance matrices for each class are arbitrary
- In this case, the features x1, x2, …, xd are not necessarily independent, and the covariances may differ from class to class, for example

    \Sigma_i = \begin{bmatrix} 1 & 0.5 \\ 0.5 & 1 \end{bmatrix}, \qquad \Sigma_j = \begin{bmatrix} 1 & -0.9 \\ -0.9 & 4 \end{bmatrix}

General Case: Σi are arbitrary

- From the previous discussion,

    g_i(x) = -\frac{1}{2}(x-\mu_i)^t \Sigma_i^{-1} (x-\mu_i) - \frac{1}{2}\ln|\Sigma_i| + \ln P(c_i)

- This can't be simplified further, but we can rearrange it:

    g_i(x) = -\frac{1}{2}\left( x^t \Sigma_i^{-1} x - 2\mu_i^t \Sigma_i^{-1} x + \mu_i^t \Sigma_i^{-1} \mu_i \right) - \frac{1}{2}\ln|\Sigma_i| + \ln P(c_i)

           = x^t \left( -\frac{1}{2}\Sigma_i^{-1} \right) x + \left( \Sigma_i^{-1}\mu_i \right)^t x + \left( -\frac{1}{2}\mu_i^t \Sigma_i^{-1} \mu_i - \frac{1}{2}\ln|\Sigma_i| + \ln P(c_i) \right)

- That is,

    g_i(x) = x^t W_i x + w_i^t x + w_{i0}

General Case: Σi are arbitrary

    g_i(x) = x^t W_i x + w_i^t x + w_{i0}

- x^t W_i x = \sum_{k=1}^{d}\sum_{l=1}^{d} w_{kl}\, x_k x_l is quadratic in x, wi^t x is linear in x, and wi0 is constant in x
- Thus the discriminant function is quadratic, and therefore the decision boundaries gi(x) = gj(x) are quadratic (for example ellipses and paraboloids)
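A sketch that builds Wi, wi, wi0 for one class with arbitrary illustrative parameters and checks that the quadratic form matches the direct formula gi(x) = −(1/2)(x−µi)^tΣi^{-1}(x−µi) − (1/2)ln|Σi| + ln P(ci).

```python
import numpy as np

mu = np.array([0.0, 1.0])                      # illustrative class parameters
Sigma = np.array([[2.0, 0.3],
                  [0.3, 1.0]])
prior = 0.3
Sigma_inv = np.linalg.inv(Sigma)

W = -0.5 * Sigma_inv                           # W_i
w = Sigma_inv @ mu                             # w_i
w0 = (-0.5 * mu @ Sigma_inv @ mu
      - 0.5 * np.log(np.linalg.det(Sigma)) + np.log(prior))   # w_i0

x = np.array([1.5, -0.5])
quadratic_form = x @ W @ x + w @ x + w0
direct = (-0.5 * (x - mu) @ Sigma_inv @ (x - mu)
          - 0.5 * np.log(np.linalg.det(Sigma)) + np.log(prior))
print(np.allclose(quadratic_form, direct))     # True
```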

General Case: Σi are arbitrary: Example

- 3 classes, each a 2-dimensional Gaussian with

    \mu_1 = \begin{bmatrix} -1 \\ 3 \end{bmatrix}, \quad \mu_2 = \begin{bmatrix} 0 \\ 6 \end{bmatrix}, \quad \mu_3 = \begin{bmatrix} -2 \\ 4 \end{bmatrix}

    \Sigma_1 = \begin{bmatrix} 1 & -0.5 \\ -0.5 & 2 \end{bmatrix}, \quad \Sigma_2 = \begin{bmatrix} 2 & -2 \\ -2 & 7 \end{bmatrix}, \quad \Sigma_3 = \begin{bmatrix} 1 & 1.5 \\ 1.5 & 3 \end{bmatrix}

- Priors: P(c1) = P(c2) = 1/4 and P(c3) = 1/2
- Again the boundaries can be found by solving gi(x) = gj(x) for i, j = 1, 2, 3, with

    g_i(x) = x^t \left( -\frac{1}{2}\Sigma_i^{-1} \right) x + \left( \Sigma_i^{-1}\mu_i \right)^t x - \frac{1}{2}\mu_i^t \Sigma_i^{-1} \mu_i - \frac{1}{2}\ln|\Sigma_i| + \ln P(c_i)

- Need to solve a bunch of quadratic inequalities in 2 variables

General Case: Σi are arbitrary: Example

- Priors: P(c1) = P(c2) = 1/4 and P(c3) = 1/2
- Figure: the quadratic decision boundaries and the decision regions for c1, c2, c3; the region labeled c1 appears in two separate pieces

Important Points

- The Bayes classifier for normally distributed classes is in general quadratic
- If the covariance matrices are equal and proportional to the identity matrix, the Bayes classifier is linear
  - If, in addition, the priors on the classes are equal, the Bayes classifier is the minimum Euclidean distance classifier
- If the covariance matrices are equal, the Bayes classifier is linear
  - If, in addition, the priors on the classes are equal, the Bayes classifier is the minimum Mahalanobis distance classifier
- Popular classifiers (minimum Euclidean and Mahalanobis distance) are optimal only if the distribution of the data is appropriate (normal)