# Applied Multivariate Statistics Spring 2012 · PDF file Multidimensional Scaling Applied Multivariate Statistics – Spring 2012 TexPoint fonts used in EMF. Read the TexPoint manual

• View
5

0

Embed Size (px)

### Text of Applied Multivariate Statistics Spring 2012 · PDF file Multidimensional Scaling Applied...

• Multidimensional Scaling

Applied Multivariate Statistics – Spring 2012

TexPoint fonts used in EMF.

Read the TexPoint manual before you delete this box.: AAAAAAAAA

• Outline

 Fundamental Idea

 Classical Multidimensional Scaling

 Non-metric Multidimensional Scaling

2 Appl. Multivariate Statistics - Spring 2012

• Basic Idea

3 Appl. Multivariate Statistics - Spring 2012

How to represent in two dimensions?

• Idea 1: Projection

4 Appl. Multivariate Statistics - Spring 2012

• Idea 2: Squeeze on table

5 Appl. Multivariate Statistics - Spring 2012

Close points stay close

• Which idea is better?

6 Appl. Multivariate Statistics - Spring 2012

• Idea of MDS

 Represent high-dimensional point cloud in few (usually 2)

dimensions keeping distances between points similar

 Classical/Metric MDS: Use a clever projection

R: cmdscale

 Non-metric MDS: Squeeze data on table

R: isoMDS

7 Appl. Multivariate Statistics - Spring 2012

• Classical MDS

 Problem: Given euclidean distances among points, recover

the position of the points!

 Example: Road distance between 21 European cities

(almost euclidean, but not quite)

8 Appl. Multivariate Statistics - Spring 2012

• Classical MDS

 First try:

9 Appl. Multivariate Statistics - Spring 2012

• Classical MDS

 Flip axes:

10 Appl. Multivariate Statistics - Spring 2012

Can identify points up to

- shift

- rotation

- reflection

• Classical MDS

 Another example: Airpollution in US cities

 Range of manu and popul is much bigger than range of

wind

 Need to standardize to give every variable equal weight

11 Appl. Multivariate Statistics - Spring 2012

• Classical MDS

12 Appl. Multivariate Statistics - Spring 2012

• Classical MDS: Theory

 Input: Euclidean distances between n objects in p

dimensions

 Output: Position of points up to rotation, reflection, shift

 Two steps:

- Compute inner products matrix B from distance

- Compute positions from B

13 Appl. Multivariate Statistics - Spring 2012

• Classical MDS: Theory – Step 1

 Inner products matrix B = XXT

 Connect to distance:

 Center points to avoid shift invariance

 Invert realtionship:

“doubly centered”

14 Appl. Multivariate Statistics - Spring 2012

d2ij = bii + bjj ¡ 2bij

bij =¡12(d 2 ij ¡ d2i: ¡ d2:j + d2::)

• Classical MDS: Theory – Step 2

 Since B = XXT, we need the “square root” of B

 B is a symmetric and positive definite n*n matrix

 Thus, B can be diagonalized:

D is a diagonal matrix with on diagonal

(“eigenvalues”)

V contains as columns normalized eigenvectors

 Some eigenvalues will be zero; drop them:

 Take “square root”:

15 Appl. Multivariate Statistics - Spring 2012

B = V¤V T

¸1 ¸ ¸2 ¸ ::: ¸ ¸n

X = V1¤ ¡ 1 2

1

B = V1¤1V T 1

• Classical MDS: Low-dim representation

 Keep only few (e.g. 2) largest eigenvalues and

corresponding eigenvectors

 The resulting X will be the low-dimensional representation

we were looking for

 Goodness of fit (GOF) if we reduce to m dimensions:

(should be at least 0.8)

 Finds “optimal” low-dim representation: Minimizes

16 Appl. Multivariate Statistics - Spring 2012

GOF =

P m

i=1 ¸iP

n

i=1 ¸i

S = Pn

i=1

Pn j=1

³ d2ij ¡ (d

(m) ij )

2 ´

• Classical MDS: Pros and Cons

+ Optimal for euclidean input data

+ Still optimal, if B has non-negative eigenvalues

(pos. semidefinite)

+ Very fast

- No guarantees if B has negative eigenvalues

However, in practice, it is still used then. New measures for

Goodness of fit:

17 Appl. Multivariate Statistics - Spring 2012

GOF =

P m

i=1 j¸ijP

n

i=1 j¸ij

GOF =

P m

i=1 ¸2iP

n

i=1 ¸2 i

GOF =

P m

i=1 max(0;¸i)P

n

i=1 max(0;¸i)

Used in R function “cmdscale”

• Non-metric MDS: Idea

 Sometimes, there is no strict metric on original points

 Example: How much do you like the portraits?

(1: Not at all, 10: Very much)

18 Appl. Multivariate Statistics - Spring 2012

2 6 9

OR 1 5 10 ??

• Non-metric MDS: Idea

 Absolute values are not

that meaningful

 Ranking is important

 Non-metric MDS finds a low-dimensional

representation, which

respects the ranking of distances

19 Appl. Multivariate Statistics - Spring 2012

>

>

• Non-metric MDS: Theory

 is the true dissimilarity, dij is the distance of representation

 Minimize STRESS ( is an increasing function):

 Optimize over both position of points and µ

 is called “disparity”

 Solved numerically (isotonic regression);

Classical MDS as starting value;

very time consuming

20 Appl. Multivariate Statistics - Spring 2012

S =

P i

• Non-metric MDS: Example for intuition (only)

21 Appl. Multivariate Statistics - Spring 2012

True points in

high dimensional space

3

2

5

B A

C

dAB < dBC < dAC

STRESS = 19.7

Compute best

representation

• Non-metric MDS: Example for intuition (only)

22 Appl. Multivariate Statistics - Spring 2012

True points in

high dimensional space

2.7

2

4.8

B A

C

dAB < dBC < dAC

STRESS = 20.1

Compute best

representation

• Non-metric MDS: Example for intuition (only)

23 Appl. Multivariate Statistics - Spring 2012

True points in

high dimensional space

2.9

2

5.2

B A

C

dAB < dBC < dAC

STRESS = 18.9

Stop if minimal STRESS is found.

We will finally represent the distances

dAB = 2, dBC = 2.9, dAC = 5.2

Compute best

representation

• Non-metric MDS: Pros and Cons

+ Fulfills a clear objective without many assumptions

(minimize STRESS)

+ Results don’t change with rescaling or monotonic variable

transformation

+ Works even if you only have rank information

- Slow in large problems

- Usually only local (not global) optimum found

- Only gets ranks of distances right

24 Appl. Multivariate Statistics - Spring 2012

• Non-metric MDS: Example

 Do people in the same party vote alike?

 Agreement of 15 congressman in 19 votes

25 Appl. Multivariate Statistics - Spring 2012

• Non-metric MDS: Example

26 Appl. Multivariate Statistics - Spring 2012

• Concepts to know

 Classical MDS:

- Finds low-dim projection that respects distances

- Optimal for euclidean distances

- No clear guarantees for other distances

- fast

 Non-metric MDS:

- Squeezes data points on table

- respects only rankings of distances

- (locally) solves clear objective

- slow

27 Appl. Multivariate Statistics - Spring 2012

• R commands to know

 cmdscale included in standard R distribution

 isoMDS from package “MASS”

28 Appl. Multivariate Statistics - Spring 2012 ##### MULTIVARIATE SPATIAL STATISTICS - sct.uab. SPATIAL STATISTICS ... using multivariate geostatistics ... in Geostatistics Wollongong â€96 (eds) E. Baafi
Documents Documents