Page 1: A Plot for Visualizing Multivariate Data Rida E. A. Moustafa George Mason University ADM Group,AAL rmoustaf@galaxy.gmu.edu rmustafa@aalcpas.com

A Plot for Visualizing Multivariate Data

Rida E. A. Moustafa

George Mason University
ADM Group, AAL

rmoustaf@galaxy.gmu.edu
rmustafa@aalcpas.com

Talk Outline

• The theory of the MV-plot.

• Detecting linear structures with the MV-plot.

• Detecting nonlinear structures with the MV-plot.

• Comparisons with other methods and application to real data.

MV-Plot Theory

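Given an observation $x = (x_1, x_2, \dots, x_d)$, we define $m$ and $v$ as follows:

$$m = f(x) = \frac{1}{d}\sum_{j=1}^{d} |x_j|$$

$$v = g(x, f(x)) = \sqrt{\frac{1}{d}\sum_{j=1}^{d} |x_j - f(x)|^2}$$

Computing m and v for every observation produces a vector of m values and a vector of v values.

What is the relationship between m and v?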
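A minimal sketch of the transform, assuming m is the mean of the absolute coordinates and v is the root-mean-square deviation of the coordinates from m. The function names `mv_point` and `mv_transform` are hypothetical, chosen for illustration only.

```python
import math


def mv_point(x):
    """Return (m, v) for one observation x, a sequence of d numbers."""
    d = len(x)
    # m: mean of the absolute coordinate values
    m = sum(abs(xj) for xj in x) / d
    # v: root-mean-square deviation of the coordinates from m (assumed form)
    v = math.sqrt(sum((xj - m) ** 2 for xj in x) / d)
    return m, v


def mv_transform(data):
    """Map every observation (row) of a data set to its (m, v) pair."""
    return [mv_point(row) for row in data]


if __name__ == "__main__":
    data = [[0.2, 0.4, 0.6], [0.5, 0.5, 0.5], [0.1, 0.9, 0.5]]
    for row, (m, v) in zip(data, mv_transform(data)):
        print(row, "->", round(m, 4), round(v, 4))
```

Plotting the v values against the m values gives the MV-plot: every d-dimensional observation becomes one point in the plane.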

MV-Relationship in 2-d

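$$m_i = \frac{1}{2}\sum_{j=1}^{2} |x_{ij}| = \frac{1}{2}\left(|x_{i1}| + |x_{i2}|\right)$$

$$v_i = \sqrt{\frac{1}{2}\sum_{j=1}^{2} |x_{ij} - m_i|^2} = \frac{1}{2}\,|x_{i1} - x_{i2}|$$

• Normalizing the data to the range (0,1) avoids the absolute value when computing m.

• In 2-d the result is close to the principal components (PC).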
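The 2-d relationship can be checked numerically. The sketch below assumes data normalized to (0,1), so that m reduces to (x1 + x2)/2, and assumes v is the root-mean-square deviation from m; under those assumptions v should equal |x1 - x2|/2. All function names are hypothetical.

```python
import math
import random


def mv_point(x):
    """(m, v) for one observation: mean of |x_j| and RMS deviation from it."""
    d = len(x)
    m = sum(abs(xj) for xj in x) / d
    v = math.sqrt(sum((xj - m) ** 2 for xj in x) / d)
    return m, v


def mv_2d_closed_form(x1, x2):
    """Closed form for d = 2, valid for non-negative x1 and x2."""
    return (x1 + x2) / 2, abs(x1 - x2) / 2


def max_discrepancy(n_points=1000, seed=0):
    """Largest gap between the general formula and the 2-d closed form."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(n_points):
        x1, x2 = rng.random(), rng.random()  # data in [0, 1)
        m, v = mv_point((x1, x2))
        m_cf, v_cf = mv_2d_closed_form(x1, x2)
        worst = max(worst, abs(m - m_cf), abs(v - v_cf))
    return worst


if __name__ == "__main__":
    print("max discrepancy:", max_discrepancy())
```

Up to the factor 1/2, m and v are the sum and the absolute difference of the two coordinates, which is a 45-degree rotation of the axes folded along the line x1 = x2.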

MV-plot detects linear structure(s)

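If $x_{i2} = w_1 x_{i1} + w_0$, then

$$m_i = \frac{(1 + w_1)\,x_{i1} + w_0}{2}; \qquad v_i = \frac{\left|(1 - w_1)\,x_{i1} - w_0\right|}{2}$$

that is, for data normalized to (0,1) and wherever $(1 - w_1)\,x_{i1} - w_0$ keeps one sign,

$$m_i = a_1 x_{i1} + a_0; \qquad v_i = a'_1 x_{i1} + a'_0$$

with $a_1 = (1 + w_1)/2$, $a_0 = w_0/2$, $a'_1 = \pm(1 - w_1)/2$ and $a'_0 = \mp w_0/2$.

If the data is linear in the original space, it will be linear in the MV-space!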
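As an illustration of this claim in 2-d, the sketch below maps points on a line x2 = w1*x1 + w0 into MV-space and tests whether the images are collinear. It assumes non-negative data, v taken as the root-mean-square deviation from m, and hypothetical helper names (`line_to_mv`, `is_collinear`); the values of w1 and w0 are arbitrary.

```python
import math


def mv_point(x):
    """(m, v) for one observation: mean of |x_j| and RMS deviation from it."""
    d = len(x)
    m = sum(abs(xj) for xj in x) / d
    v = math.sqrt(sum((xj - m) ** 2 for xj in x) / d)
    return m, v


def line_to_mv(w1, w0, xs):
    """Map the points (x, w1*x + w0) for x in xs into MV-space."""
    return [mv_point((x, w1 * x + w0)) for x in xs]


def is_collinear(points, tol=1e-9):
    """True if every point lies on the line through the first and last point."""
    (m0, v0), (m1, v1) = points[0], points[-1]
    return all(
        abs((m1 - m0) * (v - v0) - (v1 - v0) * (m - m0)) <= tol
        for m, v in points
    )


if __name__ == "__main__":
    xs = [0.07 * k for k in range(11)]   # x1 from 0 to 0.7
    pts = line_to_mv(0.5, 0.4, xs)       # here x1 - x2 stays negative
    print("collinear in MV-space:", is_collinear(pts))
```

When x1 - x2 changes sign along the line (for example w1 = 0.5, w0 = 0.1 with x1 running from 0 to 1), the image is two straight segments joined at v = 0 rather than a single line.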

MV-plot detects linear structure(s)

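In $d$ dimensions, if $x_{id} = \sum_{j=1}^{d-1} w_j x_{ij} + w_0$, then

$$m_i = \frac{1}{d}\left[\sum_{j=1}^{d-1} (1 + w_j)\,x_{ij} + w_0\right]$$

and both coordinates take the linear form

$$m_i = \sum_{j=1}^{d-1} a_j x_{ij} + a_0; \qquad v_i = \sum_{j=1}^{d-1} a'_j x_{ij} + a'_0$$

where the coefficients $a_j$ and $a'_j$ depend on the $w_j$ and on $d$.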

Detecting Linear structure(s) Example I

Detecting Linear structure(s) Example II

Detecting Linear structure(s) Example III

Detecting nonlinear data with MV-plot

The MV-plot can detect nonlinear structure in a data set without any change to the equations.

Detecting nonlinear structure

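$$(x, \sin x):\quad m = x + \sin x, \quad v = |x - \sin x|$$

$$(x, \cos x):\quad m = x + \cos x, \quad v = |x - \cos x|$$

Here the common factor $\tfrac{1}{2}$ of the 2-d formulas is omitted, and $x$, $\sin x$ and $\cos x$ are taken to be non-negative.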
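A short sketch of the nonlinear case, assuming the 2-d definitions with the factor 1/2 kept (m = (|x1| + |x2|)/2, v as the root-mean-square deviation from m) and x restricted to the interval where sin(x) is non-negative. The helper names `curve_trace` and `cross` are hypothetical.

```python
import math


def mv_point(x):
    """(m, v) for one observation: mean of |x_j| and RMS deviation from it."""
    d = len(x)
    m = sum(abs(xj) for xj in x) / d
    v = math.sqrt(sum((xj - m) ** 2 for xj in x) / d)
    return m, v


def curve_trace(f, xs):
    """Map the curve points (x, f(x)) for x in xs into MV-space."""
    return [mv_point((x, f(x))) for x in xs]


def cross(p, q, r):
    """Twice the signed area of triangle pqr; zero only if collinear."""
    return (r[0] - p[0]) * (q[1] - p[1]) - (r[1] - p[1]) * (q[0] - p[0])


if __name__ == "__main__":
    trace = curve_trace(math.sin, [0.5, 1.0, 1.5])
    for (m, v) in trace:
        print(round(m, 4), round(v, 4))
    print("signed area term:", cross(*trace))
```

The nonzero area term shows that the sine curve stays curved in MV-space, which is how a nonlinear structure is told apart from a linear one in the plot.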

Detecting Sphere(s)

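Case I:

• The sphere has radius R.

• The sphere center is the origin.

With $\sum_{j=1}^{d} x_{ij}^2 = R^2$ and $m_i = \frac{1}{d}\sum_{j=1}^{d} x_{ij}$ (no absolute values):

$$v_i^2 = \frac{1}{d}\sum_{j=1}^{d} (x_{ij} - m_i)^2 = \frac{1}{d}\sum_{j=1}^{d} x_{ij}^2 - m_i^2$$

$$v_i^2 + m_i^2 = \frac{R^2}{d}.$$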
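For a sphere centered at the origin, the points should fall on a circle of radius R/sqrt(d) in MV-space, that is, v^2 + m^2 = R^2/d. The sketch below checks this numerically under two assumptions: m is the plain mean of the coordinates (no absolute values) and v is the root-mean-square deviation from m. The function names are hypothetical.

```python
import math
import random


def random_point_on_sphere(d, radius, rng):
    """Uniform random point on the sphere of given radius in d dimensions."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(t * t for t in g))
    return [radius * t / norm for t in g]


def mean_and_rms_dev(x):
    """Plain mean m of the coordinates and RMS deviation v from that mean."""
    d = len(x)
    m = sum(x) / d
    v = math.sqrt(sum((xj - m) ** 2 for xj in x) / d)
    return m, v


def circle_residual(x, radius):
    """m^2 + v^2 - R^2/d, which should vanish for points on the sphere."""
    m, v = mean_and_rms_dev(x)
    return m * m + v * v - radius * radius / len(x)


if __name__ == "__main__":
    rng = random.Random(1)
    worst = max(
        abs(circle_residual(random_point_on_sphere(5, 2.0, rng), 2.0))
        for _ in range(100)
    )
    print("largest residual:", worst)
```

Since v is never negative, the sphere appears as the upper half of that circle in the plot.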

Detecting Sphere(s)

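Case II:

• The sphere has radius R.

• The sphere center $x^c = (x^c_1, \dots, x^c_d)$ is not the origin.

With $\sum_{j=1}^{d} (x_{ij} - x^c_j)^2 = R^2$ and $m_i$ computed from the centered coordinates $x_{ij} - x^c_j$:

$$v_i^2 = \frac{1}{d}\sum_{j=1}^{d} (x_{ij} - x^c_j - m_i)^2 = \frac{1}{d}\sum_{j=1}^{d} (x_{ij} - x^c_j)^2 - m_i^2$$

$$v_i^2 + m_i^2 = \frac{R^2}{d}.$$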

Detecting Sphere(s)

Application to Real Data

• Fisher’s Iris data (150 × 4): 3 classes of 50 points each.

• Process control data (600 × 60): 6 classes of 100 points each.

• Pollen data (3,848 × 5) (Wegman’s data): 2 classes (linear and nonlinear).

Related Dimension Reduction Methods

• Multidimensional Scaling

• Fisher Discriminant Analysis

• Principal Component Analysis

Iris (R. A. Fisher) dataset: 150 cases in 4 dimensions

Time series dataset: 600 cases in 60 dimensions

Pollen dataset: 3,848 points in 5 dimensions

Other methods:

Require more storage and more computing time.

Even if they work, we expect poor results on this particular data set.

(Wegman 2002)

Pollen dataset

Linear and Nonlinear mixed structures.

The linear structure in the Pollen data set

17 + 16 + 18 + 17 + 14 + 16 = 98 linear points; the remaining 3,750 points are nonlinear.

Summary

The MV-algorithm can discover linear and nonlinear patterns at the same time.

The MV-algorithm can discover symmetric structure in data, such as spheres.

The MV-algorithm handles large multivariate data sets.