CS434a/541a: Pattern Recognition. Prof. Olga Veksler. Lecture 4


Page 1

CS434a/541a: Pattern Recognition
Prof. Olga Veksler

Lecture 4

Page 2: Outline

- Normal random variable
- Properties
- Discriminant functions

Page 3: Why Normal Random Variables?

- Analytically tractable
- Works well when the observation comes from a single prototype (µ) corrupted by noise
- Is an optimal distribution of the data for many classifiers used in practice

Page 4: The Univariate Normal Density

$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}\right]$$

where:
- µ = mean (or expected value) of x
- σ² = variance

- x is a scalar (has dimension 1)
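A small numerical sketch of this density (not part of the original slides), assuming NumPy; it evaluates the formula at a grid of points and checks that the density integrates to roughly 1:

```python
import numpy as np

def univariate_normal_pdf(x, mu, sigma):
    """Evaluate p(x) = 1/(sqrt(2*pi)*sigma) * exp(-0.5*((x-mu)/sigma)**2)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)

# Sanity check: the density should integrate to ~1 over a wide interval.
xs = np.linspace(-10, 10, 10001)
print(np.trapz(univariate_normal_pdf(xs, mu=0.0, sigma=1.5), xs))  # ~1.0
```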

Page 5

(figure slide; no text content recovered)

Page 6: Several Features

- What if we have several features x1, x2, …, xd
  - each normally distributed
  - may have different means
  - may have different variances
  - may be dependent or independent of each other
- How do we model their joint distribution?

Page 7: The Multivariate Normal Density

- The multivariate normal density in d dimensions is:

$$p(x) = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}}\exp\left[-\frac{1}{2}(x-\mu)^{t}\,\Sigma^{-1}(x-\mu)\right]$$

where

$$x = [x_1, x_2, \dots, x_d]^{t}, \qquad \mu = [\mu_1, \mu_2, \dots, \mu_d]^{t}, \qquad
\Sigma = \begin{bmatrix} \sigma_1^2 & \cdots & \sigma_{1d} \\ \vdots & \ddots & \vdots \\ \sigma_{d1} & \cdots & \sigma_d^2 \end{bmatrix}$$

  (σ1d is the covariance of x1 and xd; Σ^{-1} is the inverse of Σ; |Σ| is the determinant of Σ)

- Each xi is N(µi, σi²)
  - to prove this, integrate out all other features from the joint density
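A hedged NumPy sketch of evaluating the multivariate density above (illustrative values, not from the slides):

```python
import numpy as np

def multivariate_normal_pdf(x, mu, Sigma):
    """Evaluate the d-dimensional normal density at x using the formula above."""
    d = len(mu)
    diff = x - mu
    norm_const = 1.0 / ((2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma)))
    exponent = -0.5 * diff @ np.linalg.inv(Sigma) @ diff
    return norm_const * np.exp(exponent)

mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.5],
                  [0.5, 1.0]])
print(multivariate_normal_pdf(np.array([0.2, -0.1]), mu, Sigma))
```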

Page 8: More on Σ

- Σ plays a role similar to the role that σ² plays in one dimension
- From Σ we can find out:
  1. The individual variances of the features x1, x2, …, xd (the diagonal entries σi²)
  2. Whether features xi and xj are
     - independent: σij = 0
     - positively correlated: σij > 0
     - negatively correlated: σij < 0

Page 9: The Multivariate Normal Density

- If Σ is diagonal, then the features x1, …, xd are independent, and the joint density factors into a product of univariate densities:

$$p(x) = \prod_{i=1}^{d}\frac{1}{\sqrt{2\pi}\,\sigma_i}\exp\left[-\frac{(x_i-\mu_i)^{2}}{2\sigma_i^{2}}\right],
\qquad \text{e.g.}\quad \Sigma = \begin{bmatrix} \sigma_1^2 & 0 & 0 \\ 0 & \sigma_2^2 & 0 \\ 0 & 0 & \sigma_3^2 \end{bmatrix}$$
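A quick numerical check of this factorization (an illustrative addition, not part of the lecture), comparing the joint formula against the product of univariate densities for a diagonal Σ:

```python
import numpy as np

# Numerical check of the factorization for a diagonal Sigma.
mu = np.array([1.0, -2.0, 0.5])
sigmas = np.array([1.0, 2.0, 0.5])
Sigma = np.diag(sigmas ** 2)
x = np.array([0.3, -1.0, 1.2])

diff = x - mu
joint = np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff) / (
    (2 * np.pi) ** (len(x) / 2) * np.sqrt(np.linalg.det(Sigma)))
product = np.prod(np.exp(-0.5 * (diff / sigmas) ** 2) / (np.sqrt(2 * np.pi) * sigmas))
print(joint, product)  # the two values agree when Sigma is diagonal
```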

Page 10: The Multivariate Normal Density

$$p(x) = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}}\exp\left[-\frac{1}{2}(x-\mu)^{t}\,\Sigma^{-1}(x-\mu)\right]$$

- The factor 1/((2π)^{d/2} |Σ|^{1/2}) is a normalizing constant
- The exponent -(1/2)(x - µ)^t Σ^{-1} (x - µ) is a scalar s (a single number); the closer s is to 0, the larger p(x) is
- Written out for d = 3:

$$p(x) = c\cdot\exp\left(-\frac{1}{2}\begin{bmatrix}x_1-\mu_1 & x_2-\mu_2 & x_3-\mu_3\end{bmatrix}\begin{bmatrix}\sigma_1^2 & \sigma_{12} & \sigma_{13}\\ \sigma_{12} & \sigma_2^2 & \sigma_{23}\\ \sigma_{13} & \sigma_{23} & \sigma_3^2\end{bmatrix}^{-1}\begin{bmatrix}x_1-\mu_1\\ x_2-\mu_2\\ x_3-\mu_3\end{bmatrix}\right)$$

- Thus p(x) is larger for smaller (x - µ)^t Σ^{-1} (x - µ)

Page 11: The quadratic form (x - µ)^t Σ^{-1} (x - µ)

- Σ is positive semidefinite (x^t Σ x >= 0 for all x)
- If x^t Σ x = 0 for some nonzero x, then det(Σ) = 0 and p(x) is not defined; this case is not interesting:
  1. one feature is constant (has zero variance), or
  2. two components of the feature vector are multiples of each other
- So we will assume Σ is positive definite (x^t Σ x > 0 for all nonzero x)
- If Σ is positive definite, then so is Σ^{-1}

Page 12: The quadratic form (x - µ)^t Σ^{-1} (x - µ)

- A symmetric positive definite d x d matrix has d real positive eigenvalues, and its d eigenvectors can be chosen orthogonal
- Thus if Φ is the matrix whose columns are the normalized eigenvectors of Σ, then Φ^{-1} = Φ^t
- ΣΦ = ΦΛ, where Λ is the diagonal matrix with the corresponding eigenvalues on the diagonal
- Thus Σ = ΦΛΦ^t and Σ^{-1} = ΦΛ^{-1}Φ^t
- If Λ^{-1/2} denotes the diagonal matrix such that Λ^{-1/2} Λ^{-1/2} = Λ^{-1}, then

$$\Sigma^{-1} = \Phi\Lambda^{-1/2}\Lambda^{-1/2}\Phi^{t} = \left(\Lambda^{-1/2}\Phi^{t}\right)^{t}\left(\Lambda^{-1/2}\Phi^{t}\right) = M^{t}M$$

Page 13: The quadratic form (x - µ)^t Σ^{-1} (x - µ)

- Thus

$$(x-\mu)^{t}\Sigma^{-1}(x-\mu) = (x-\mu)^{t}M^{t}M(x-\mu) = \big(M(x-\mu)\big)^{t}\big(M(x-\mu)\big) = \lVert M(x-\mu)\rVert^{2}$$

  where M = Λ^{-1/2} Φ^t (Φ^t acts as a rotation matrix, Λ^{-1/2} as a scaling matrix)

- Points x which satisfy ||M(x - µ)||² = const lie on an ellipse
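An illustrative NumPy sketch (the numbers are made up, not from the slides) that builds M from the eigendecomposition of Σ and confirms the identity above:

```python
import numpy as np

Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
mu = np.array([1.0, -1.0])
x = np.array([2.5, 0.3])

eigvals, Phi = np.linalg.eigh(Sigma)        # columns of Phi are orthonormal eigenvectors
M = np.diag(eigvals ** -0.5) @ Phi.T        # M = Lambda^{-1/2} Phi^t

quad_form = (x - mu) @ np.linalg.inv(Sigma) @ (x - mu)
mapped = M @ (x - mu)
print(quad_form, mapped @ mapped)           # the two numbers should match
```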

Page 14: Euclidean vs. Mahalanobis distance

(figure: two contour plots around µ)
- (x - µ)^t (x - µ) is the usual (Euclidean) squared distance between x and µ; points x at equal Euclidean distance from µ lie on a circle
- (x - µ)^t Σ^{-1} (x - µ) is the squared Mahalanobis distance between x and µ; points x at equal Mahalanobis distance from µ lie on an ellipse: Σ stretches circles into ellipses
- The axes of the ellipse are the eigenvectors of Σ; p(x) decreases slowly along one eigenvector direction and fast along the other

Page 15: 2-d Multivariate Normal Density

(figure)
- Can you see much in this graph?
- At most you can see that the mean is around [0, 0], but you can't really tell if x1 and x2 are correlated

Page 16: 2-d Multivariate Normal Density

(figure)
- How about this graph?

Page 17: 2-d Multivariate Normal Density

(figure: level curves, with the mean µ marked)
- Level curves graph
  - p(x) is constant along each contour
  - like a topographic map of the 3-d surface
- Now we can see much more
  - x1 and x2 are independent
  - σ1² and σ2² are equal

Page 18: 2-d Multivariate Normal Density

(figure: four level-curve plots)

$$\mu=[0,0],\ \Sigma=\begin{bmatrix}1&0\\0&1\end{bmatrix} \qquad
\mu=[1,1],\ \Sigma=\begin{bmatrix}1&0\\0&1\end{bmatrix} \qquad
\mu=[0,0],\ \Sigma=\begin{bmatrix}1&0\\0&4\end{bmatrix} \qquad
\mu=[0,0],\ \Sigma=\begin{bmatrix}4&0\\0&1\end{bmatrix}$$

Page 19: 2-d Multivariate Normal Density

(figure: level-curve plots with µ = [0, 0])

$$\Sigma=\begin{bmatrix}1&0.5\\0.5&1\end{bmatrix} \quad
\Sigma=\begin{bmatrix}1&0.9\\0.9&1\end{bmatrix} \quad
\Sigma=\begin{bmatrix}1&0.9\\0.9&4\end{bmatrix} \quad
\Sigma=\begin{bmatrix}1&-0.5\\-0.5&1\end{bmatrix} \quad
\Sigma=\begin{bmatrix}1&-0.9\\-0.9&1\end{bmatrix} \quad
\Sigma=\begin{bmatrix}1&-0.9\\-0.9&4\end{bmatrix}$$

Page 20: The Multivariate Normal Density

- If X has density N(µ, Σ), then A^t X has density N(A^t µ, A^t Σ A)
- Thus X can be transformed into a spherical normal variable (the covariance of a spherical density is the identity matrix I) with the whitening transform

$$A_w = \Phi\Lambda^{-1/2}$$

(figure: X with Σ = [[1, 0.9], [0.9, 1]] is mapped by the whitening transform to a variable with Σ = I)
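A short sketch of the whitening transform with NumPy, using the Σ shown in the figure; the variable names are mine, not the lecture's:

```python
import numpy as np

Sigma = np.array([[1.0, 0.9],
                  [0.9, 1.0]])

eigvals, Phi = np.linalg.eigh(Sigma)
A_w = Phi @ np.diag(eigvals ** -0.5)   # whitening transform A_w = Phi Lambda^{-1/2}

# After the transform the covariance becomes the identity matrix.
print(A_w.T @ Sigma @ A_w)             # ~ [[1, 0], [0, 1]]
```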

Page 21: Discriminant Functions

- A classifier can be viewed as a network which computes m discriminant functions and selects the category corresponding to the largest discriminant

(figure: features x1, x2, x3, …, xd feed into discriminant functions g1(x), g2(x), …, gm(x); the class giving the maximum is selected)

- gi(x) can be replaced with f(gi(x)) for any monotonically increasing function f; the results will be unchanged

Page 22: Discriminant Functions

- The minimum error-rate classification is achieved by the discriminant function

  gi(x) = P(ci | x) = P(x | ci) P(ci) / P(x)

- Since P(x) does not depend on the class, an equivalent discriminant function is

  gi(x) = P(x | ci) P(ci)

- For normal densities it is convenient to take logarithms. Since the logarithm is a monotonically increasing function, an equivalent discriminant function is

  gi(x) = ln P(x | ci) + ln P(ci)

Page 23: Discriminant Functions for the Normal Density

- Suppose that for class ci the class-conditional density p(x | ci) is N(µi, Σi):

$$p(x\mid c_i) = \frac{1}{(2\pi)^{d/2}\,|\Sigma_i|^{1/2}}\exp\left[-\frac{1}{2}(x-\mu_i)^{t}\,\Sigma_i^{-1}(x-\mu_i)\right]$$

- Discriminant function: gi(x) = ln P(x | ci) + ln P(ci)
- Plug in p(x | ci) and P(ci) to get

$$g_i(x) = -\frac{1}{2}(x-\mu_i)^{t}\Sigma_i^{-1}(x-\mu_i) - \frac{d}{2}\ln 2\pi - \frac{1}{2}\ln|\Sigma_i| + \ln P(c_i)$$

- The term (d/2) ln 2π is constant for all i, so it can be dropped:

$$g_i(x) = -\frac{1}{2}(x-\mu_i)^{t}\Sigma_i^{-1}(x-\mu_i) - \frac{1}{2}\ln|\Sigma_i| + \ln P(c_i)$$
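A minimal sketch (with assumed class parameters, not from the slides) of evaluating this discriminant for each class and picking the class with the largest value:

```python
import numpy as np

# g_i(x) = -0.5 (x-mu)^t Sigma^{-1} (x-mu) - 0.5 ln|Sigma| + ln P(c_i)
def g(x, mu, Sigma, prior):
    diff = x - mu
    return (-0.5 * diff @ np.linalg.inv(Sigma) @ diff
            - 0.5 * np.log(np.linalg.det(Sigma))
            + np.log(prior))

# (mu, Sigma, prior) per class -- made-up numbers for illustration.
params = [
    (np.array([0.0, 0.0]), np.eye(2), 0.5),
    (np.array([3.0, 3.0]), np.array([[2.0, 0.5], [0.5, 1.0]]), 0.5),
]
x = np.array([2.0, 2.5])
print(np.argmax([g(x, m, S, p) for m, S, p in params]))  # index of the predicted class
```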

Page 24: Case Σi = σ²I

- That is

$$\Sigma_i = \begin{bmatrix} \sigma^2 & 0 & 0 \\ 0 & \sigma^2 & 0 \\ 0 & 0 & \sigma^2 \end{bmatrix}
= \sigma^{2}\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

- In this case the features x1, x2, …, xd are independent, with different means and equal variances σ²

(figure: circular level curves around µ)

Page 25: Case Σi = σ²I

- Discriminant function:

$$g_i(x) = -\frac{1}{2}(x-\mu_i)^{t}\Sigma_i^{-1}(x-\mu_i) - \frac{1}{2}\ln|\Sigma_i| + \ln P(c_i)$$

- det(Σi) = σ^{2d} and Σi^{-1} = (1/σ²) I:

$$\Sigma_i^{-1} = \begin{bmatrix} 1/\sigma^2 & 0 & 0 \\ 0 & 1/\sigma^2 & 0 \\ 0 & 0 & 1/\sigma^2 \end{bmatrix}$$

- Can simplify the discriminant function:

$$g_i(x) = -\frac{1}{2}(x-\mu_i)^{t}\,\frac{I}{\sigma^{2}}\,(x-\mu_i) - \frac{1}{2}\ln\sigma^{2d} + \ln P(c_i)$$

  The term -(1/2) ln σ^{2d} is constant for all i, so drop it:

$$g_i(x) = -\frac{1}{2\sigma^{2}}(x-\mu_i)^{t}(x-\mu_i) + \ln P(c_i) = -\frac{1}{2\sigma^{2}}\lVert x-\mu_i\rVert^{2} + \ln P(c_i)$$

Page 26: Case Σi = σ²I, Geometric Interpretation

- If ln P(ci) = ln P(cj) for all i, j, then effectively

  gi(x) = -||x - µi||²

  and x is classified to the nearest mean; the decision regions form a Voronoi diagram: points in each cell are closer to the mean in that cell than to any other mean

- If ln P(ci) is not equal to ln P(cj), then

  gi(x) = -(1/(2σ²)) ||x - µi||² + ln P(ci)

  and the boundaries shift away from the means of the more probable classes

(figure: decision regions for c1, c2, c3 around the means µ1, µ2, µ3; a sample x falls in the region of c3)

Page 27: Case Σi = σ²I

$$g_i(x) = -\frac{1}{2\sigma^{2}}(x-\mu_i)^{t}(x-\mu_i) + \ln P(c_i)
= -\frac{1}{2\sigma^{2}}\left(x^{t}x - 2\mu_i^{t}x + \mu_i^{t}\mu_i\right) + \ln P(c_i)$$

- The term x^t x is constant for all classes, so it can be dropped:

$$g_i(x) = -\frac{1}{2\sigma^{2}}\left(-2\mu_i^{t}x + \mu_i^{t}\mu_i\right) + \ln P(c_i)
= \underbrace{\frac{\mu_i^{t}}{\sigma^{2}}}_{w_i^{t}}\,x + \underbrace{\left(-\frac{\mu_i^{t}\mu_i}{2\sigma^{2}} + \ln P(c_i)\right)}_{w_{i0}}
= w_i^{t}x + w_{i0}$$

- The discriminant function is linear

Page 28: Case Σi = σ²I

$$g_i(x) = \underbrace{w_i^{t}x}_{\text{linear in } x,\ \ w_i^{t}x=\sum_{j=1}^{d}w_{ij}x_j} + \underbrace{w_{i0}}_{\text{constant in } x}$$

- Thus the discriminant function is linear
- Therefore the decision boundaries gi(x) = gj(x) are linear:
  - lines if x has dimension 2
  - planes if x has dimension 3
  - hyperplanes if x has dimension larger than 3
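A small sketch of building such a linear discriminant for the Σi = σ²I case; the numbers below are placeholders, not from the slides:

```python
import numpy as np

# For Sigma_i = sigma^2 I the discriminant reduces to g_i(x) = w_i^t x + w_i0.
def linear_discriminant(mu, sigma_sq, prior):
    w = mu / sigma_sq
    w0 = -mu @ mu / (2 * sigma_sq) + np.log(prior)
    return w, w0

w, w0 = linear_discriminant(np.array([1.0, 2.0]), sigma_sq=3.0, prior=0.25)
x = np.array([0.5, 0.5])
print(w @ x + w0)  # value of g_i(x)
```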

Page 29: Case Σi = σ²I: Example

- 3 classes, each a 2-dimensional Gaussian with

$$\mu_1=\begin{bmatrix}1\\2\end{bmatrix},\quad \mu_2=\begin{bmatrix}4\\6\end{bmatrix},\quad \mu_3=\begin{bmatrix}-2\\4\end{bmatrix},\qquad
\Sigma_1=\Sigma_2=\Sigma_3=\begin{bmatrix}3&0\\0&3\end{bmatrix}$$

- Priors P(c1) = P(c2) = 1/4 and P(c3) = 1/2
- The discriminant function is

$$g_i(x) = \frac{\mu_i^{t}}{\sigma^{2}}\,x + \left(-\frac{\mu_i^{t}\mu_i}{2\sigma^{2}} + \ln P(c_i)\right)$$

- Plug in the parameters for each class:

$$g_1(x) = \tfrac{1}{3}\begin{bmatrix}1&2\end{bmatrix}x + \left(-\tfrac{5}{6} - 1.38\right)$$

$$g_2(x) = \tfrac{1}{3}\begin{bmatrix}4&6\end{bmatrix}x + \left(-\tfrac{52}{6} - 1.38\right)$$

$$g_3(x) = \tfrac{1}{3}\begin{bmatrix}-2&4\end{bmatrix}x + \left(-\tfrac{20}{6} - 0.69\right)$$

Page 30: Case Σi = σ²I: Example

- Need to find out where each gi(x) is the largest, for i = 1, 2, 3
- The boundaries can be found by solving gi(x) = gj(x) for i, j = 1, 2, 3
- Let's take g1(x) = g2(x) first:

$$\tfrac{1}{3}\begin{bmatrix}1&2\end{bmatrix}x + \left(-\tfrac{5}{6}-1.38\right) = \tfrac{1}{3}\begin{bmatrix}4&6\end{bmatrix}x + \left(-\tfrac{52}{6}-1.38\right)$$

- Simplifying:

$$\begin{bmatrix}-1&-\tfrac{4}{3}\end{bmatrix}\begin{bmatrix}x_1\\x_2\end{bmatrix} = -\tfrac{47}{6}
\qquad\Longleftrightarrow\qquad -x_1 - \tfrac{4}{3}x_2 = -\tfrac{47}{6} \quad \text{(a line equation)}$$

Page 31: Case Σi = σ²I: Example

- Next solve g2(x) = g3(x):

$$2x_1 + \tfrac{2}{3}x_2 = 6.02$$

- Then solve g1(x) = g3(x):

$$x_1 - \tfrac{2}{3}x_2 = -1.81$$

- And finally solve g1(x) = g2(x) = g3(x); the three boundary lines meet at

$$x_1 = 1.4 \quad\text{and}\quad x_2 = 4.82$$
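An illustrative check of this example (not in the original slides): evaluating the three discriminants at the reported intersection point; the values should agree up to the rounding used above:

```python
import numpy as np

mus = [np.array([1.0, 2.0]), np.array([4.0, 6.0]), np.array([-2.0, 4.0])]
priors = [0.25, 0.25, 0.5]
sigma_sq = 3.0

def g(x, mu, prior):
    return mu @ x / sigma_sq - mu @ mu / (2 * sigma_sq) + np.log(prior)

x = np.array([1.4, 4.82])
print([round(g(x, m, p), 2) for m, p in zip(mus, priors)])
# all three values are approximately equal at the intersection point
```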

Page 32: Case Σi = σ²I: Example

- Priors P(c1) = P(c2) = 1/4 and P(c3) = 1/2

(figure: the three decision regions for c1, c2, c3, bounded by the lines g1(x) = g2(x), g2(x) = g3(x), and g1(x) = g3(x))

- The lines connecting the means are perpendicular to the decision boundaries

Page 33: Case Σi = Σ

- Covariance matrices are equal but otherwise arbitrary, for example

$$\Sigma_i = \begin{bmatrix}1&0.5\\0.5&1\end{bmatrix}$$

- In this case the features x1, x2, …, xd are not necessarily independent

Page 34: Case Σi = Σ

- Discriminant function:

$$g_i(x) = -\frac{1}{2}(x-\mu_i)^{t}\Sigma^{-1}(x-\mu_i) - \frac{1}{2}\ln|\Sigma| + \ln P(c_i)$$

- The term -(1/2) ln|Σ| is constant for all classes, so the discriminant function becomes

$$g_i(x) = -\frac{1}{2}(x-\mu_i)^{t}\Sigma^{-1}(x-\mu_i) + \ln P(c_i)$$

  where (x - µi)^t Σ^{-1} (x - µi) is the squared Mahalanobis distance between x and µi

- Squared Mahalanobis distance:

$$\lVert x-y\rVert^{2}_{\Sigma^{-1}} = (x-y)^{t}\Sigma^{-1}(x-y)$$

- If Σ = I, the Mahalanobis distance becomes the usual Euclidean distance:

$$\lVert x-y\rVert^{2}_{I} = (x-y)^{t}(x-y)$$

Page 35: Euclidean vs. Mahalanobis Distances

(figure)
- ||x - µ||² = (x - µ)^t (x - µ): points x at equal Euclidean distance from µ lie on a circle
- ||x - µ||²_{Σ^{-1}} = (x - µ)^t Σ^{-1} (x - µ): points x at equal Mahalanobis distance from µ lie on an ellipse; Σ stretches circles into ellipses, and the ellipse axes are the eigenvectors of Σ

Page 36: Case Σi = Σ, Geometric Interpretation

- If ln P(ci) = ln P(cj) for all i, j, then effectively

  gi(x) = -||x - µi||²_{Σ^{-1}}

  and x is classified to the nearest mean under the Mahalanobis distance: points in each cell are closer to the mean in that cell than to any other mean under Mahalanobis distance

- If ln P(ci) is not equal to ln P(cj), then

  gi(x) = -(1/2)(x - µi)^t Σ^{-1} (x - µi) + ln P(ci)

  and the boundaries shift away from the means of the more probable classes

(figure: decision regions for c1, c2, c3 around the means µ1, µ2, µ3, for equal and unequal priors)

Page 37: Case Σi = Σ

- Can simplify the discriminant function:

$$g_i(x) = -\frac{1}{2}(x-\mu_i)^{t}\Sigma^{-1}(x-\mu_i) + \ln P(c_i)$$

$$= -\frac{1}{2}\left(x^{t}\Sigma^{-1}x - x^{t}\Sigma^{-1}\mu_i - \mu_i^{t}\Sigma^{-1}x + \mu_i^{t}\Sigma^{-1}\mu_i\right) + \ln P(c_i)$$

$$= -\frac{1}{2}\left(x^{t}\Sigma^{-1}x - 2\mu_i^{t}\Sigma^{-1}x + \mu_i^{t}\Sigma^{-1}\mu_i\right) + \ln P(c_i)$$

- The term x^t Σ^{-1} x is constant for all classes, so drop it:

$$g_i(x) = \mu_i^{t}\Sigma^{-1}x + \left(\ln P(c_i) - \frac{1}{2}\mu_i^{t}\Sigma^{-1}\mu_i\right) = w_i^{t}x + w_{i0}$$

- Thus in this case the discriminant is also linear

Page 38: Case Σi = Σ: Example

- 3 classes, each a 2-dimensional Gaussian with

$$\mu_1=\begin{bmatrix}1\\2\end{bmatrix},\quad \mu_2=\begin{bmatrix}-1\\5\end{bmatrix},\quad \mu_3=\begin{bmatrix}-2\\4\end{bmatrix},\qquad
\Sigma_1=\Sigma_2=\Sigma_3=\begin{bmatrix}1&-1.5\\-1.5&4\end{bmatrix}$$

- Priors P(c1) = P(c2) = 1/4 and P(c3) = 1/2
- Again the boundaries can be found by solving gi(x) = gj(x) for i, j = 1, 2, 3

Page 39: Case Σi = Σ: Example

- Let's solve gj(x) = gi(x) in general first:

$$\mu_j^{t}\Sigma^{-1}x + \left(\ln P(c_j) - \frac{1}{2}\mu_j^{t}\Sigma^{-1}\mu_j\right)
= \mu_i^{t}\Sigma^{-1}x + \left(\ln P(c_i) - \frac{1}{2}\mu_i^{t}\Sigma^{-1}\mu_i\right)$$

- Regrouping the terms:

$$\left(\mu_j^{t}\Sigma^{-1} - \mu_i^{t}\Sigma^{-1}\right)x
= \ln\frac{P(c_i)}{P(c_j)} + \frac{1}{2}\left(\mu_j^{t}\Sigma^{-1}\mu_j - \mu_i^{t}\Sigma^{-1}\mu_i\right)$$

  (the left-hand side is a row vector times x; the right-hand side is a scalar)

- We get the line where gj(x) = gi(x)

Page 40: Case Σi = Σ: Example

$$\left(\mu_j^{t}\Sigma^{-1} - \mu_i^{t}\Sigma^{-1}\right)x
= \ln\frac{P(c_i)}{P(c_j)} + \frac{1}{2}\left(\mu_j^{t}\Sigma^{-1}\mu_j - \mu_i^{t}\Sigma^{-1}\mu_i\right)$$

- Now substitute for i, j = 1, 2:

$$\begin{bmatrix}-2&0\end{bmatrix}x = 0 \qquad\Longrightarrow\qquad x_1 = 0$$

- Now substitute for i, j = 2, 3:

$$\begin{bmatrix}-3.14&-1.43\end{bmatrix}x = -2.41 \qquad\Longrightarrow\qquad 3.14\,x_1 + 1.43\,x_2 = 2.41$$

- Now substitute for i, j = 1, 3:

$$\begin{bmatrix}-5.14&-1.43\end{bmatrix}x = -2.41 \qquad\Longrightarrow\qquad 5.14\,x_1 + 1.43\,x_2 = 2.41$$
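A sketch (not from the slides) that recomputes these boundary lines directly from the example's parameters, using the general formula from Page 39:

```python
import numpy as np

mus = [np.array([1.0, 2.0]), np.array([-1.0, 5.0]), np.array([-2.0, 4.0])]
priors = [0.25, 0.25, 0.5]
Sigma_inv = np.linalg.inv(np.array([[1.0, -1.5],
                                    [-1.5, 4.0]]))

# Boundary g_i(x) = g_j(x):
#   (mu_j - mu_i)^t Sigma^{-1} x = ln(P_i/P_j) + 0.5 (mu_j^t S^{-1} mu_j - mu_i^t S^{-1} mu_i)
def boundary(i, j):
    coeff = (mus[j] - mus[i]) @ Sigma_inv
    rhs = (np.log(priors[i] / priors[j])
           + 0.5 * (mus[j] @ Sigma_inv @ mus[j] - mus[i] @ Sigma_inv @ mus[i]))
    return coeff, rhs

for i, j in [(0, 1), (1, 2), (0, 2)]:
    print(boundary(i, j))
# ~([-2, 0], 0), ([-3.14, -1.43], -2.41), ([-5.14, -1.43], -2.41)
```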

Page 41: Case Σi = Σ: Example

- Priors P(c1) = P(c2) = 1/4 and P(c3) = 1/2

(figure: decision regions for c1, c2, c3)

- The lines connecting the means are not in general perpendicular to the decision boundaries

Page 42: General Case: Σi are arbitrary

- Covariance matrices for each class are arbitrary, for example

$$\Sigma_i = \begin{bmatrix}1&0.5\\0.5&1\end{bmatrix}, \qquad \Sigma_j = \begin{bmatrix}1&-0.9\\-0.9&4\end{bmatrix}$$

- In this case the features x1, x2, …, xd are not necessarily independent

Page 43: General Case: Σi are arbitrary

- From the previous discussion,

$$g_i(x) = -\frac{1}{2}(x-\mu_i)^{t}\Sigma_i^{-1}(x-\mu_i) - \frac{1}{2}\ln|\Sigma_i| + \ln P(c_i)$$

- This can't be simplified (no term is constant across all classes), but we can rearrange it:

$$g_i(x) = -\frac{1}{2}\left(x^{t}\Sigma_i^{-1}x - 2\mu_i^{t}\Sigma_i^{-1}x + \mu_i^{t}\Sigma_i^{-1}\mu_i\right) - \frac{1}{2}\ln|\Sigma_i| + \ln P(c_i)$$

$$g_i(x) = x^{t}\underbrace{\left(-\frac{1}{2}\Sigma_i^{-1}\right)}_{W_i}x
+ \underbrace{\left(\Sigma_i^{-1}\mu_i\right)^{t}}_{w_i^{t}}x
+ \underbrace{\left(-\frac{1}{2}\mu_i^{t}\Sigma_i^{-1}\mu_i - \frac{1}{2}\ln|\Sigma_i| + \ln P(c_i)\right)}_{w_{i0}}$$

$$g_i(x) = x^{t}W_i x + w_i^{t}x + w_{i0}$$

Page 44: General Case: Σi are arbitrary

$$g_i(x) = \underbrace{x^{t}W_i x}_{\text{quadratic in } x,\ \ x^{t}Wx=\sum_{k=1}^{d}\sum_{l=1}^{d}w_{kl}x_k x_l}
+ \underbrace{w_i^{t}x}_{\text{linear in } x}
+ \underbrace{w_{i0}}_{\text{constant in } x}$$

- Thus the discriminant function is quadratic
- Therefore the decision boundaries are quadratic (ellipses and paraboloids)
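A compact sketch of the quadratic discriminant gi(x) = x^t Wi x + wi^t x + wi0; the class parameters below are placeholders, not the example on the next slide:

```python
import numpy as np

def quadratic_discriminant(mu, Sigma, prior):
    """Return g_i as a callable built from W_i, w_i, and w_i0."""
    Sigma_inv = np.linalg.inv(Sigma)
    W = -0.5 * Sigma_inv
    w = Sigma_inv @ mu
    w0 = (-0.5 * mu @ Sigma_inv @ mu
          - 0.5 * np.log(np.linalg.det(Sigma))
          + np.log(prior))
    return lambda x: x @ W @ x + w @ x + w0

g1 = quadratic_discriminant(np.array([0.0, 0.0]),
                            np.array([[1.0, 0.5], [0.5, 1.0]]), prior=0.5)
print(g1(np.array([1.0, -1.0])))
```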

Page 45: General Case: Σi are arbitrary: Example

- 3 classes, each a 2-dimensional Gaussian with

$$\mu_1=\begin{bmatrix}-1\\3\end{bmatrix},\quad \mu_2=\begin{bmatrix}0\\6\end{bmatrix},\quad \mu_3=\begin{bmatrix}-2\\4\end{bmatrix}$$

$$\Sigma_1=\begin{bmatrix}1&-0.5\\-0.5&2\end{bmatrix},\quad \Sigma_2=\begin{bmatrix}2&-2\\-2&7\end{bmatrix},\quad \Sigma_3=\begin{bmatrix}1&1.5\\1.5&3\end{bmatrix}$$

- Priors: P(c1) = P(c2) = 1/4 and P(c3) = 1/2
- Again the boundaries can be found by solving gi(x) = gj(x) for i, j = 1, 2, 3, with

$$g_i(x) = x^{t}\left(-\frac{1}{2}\Sigma_i^{-1}\right)x + \left(\Sigma_i^{-1}\mu_i\right)^{t}x
+ \left(-\frac{1}{2}\mu_i^{t}\Sigma_i^{-1}\mu_i - \frac{1}{2}\ln|\Sigma_i| + \ln P(c_i)\right)$$

- This requires solving a set of quadratic inequalities in 2 variables

Page 46: General Case: Σi are arbitrary: Example

(figure: quadratic decision boundaries and regions for c1, c2, c3 for the parameters above; the region for c1 appears in two places)

Page 47: Important Points

- The Bayes classifier for normally distributed classes is in general quadratic
- If the covariance matrices are equal and proportional to the identity matrix, the Bayes classifier is linear
  - If, in addition, the priors on the classes are equal, the Bayes classifier is the minimum Euclidean distance classifier
- If the covariance matrices are equal, the Bayes classifier is linear
  - If, in addition, the priors on the classes are equal, the Bayes classifier is the minimum Mahalanobis distance classifier
- Popular classifiers (minimum Euclidean and minimum Mahalanobis distance) are optimal only if the distribution of the data is appropriate (normal)