A Plot for Visualizing Multivariate Data
Rida E. A. Moustafa
George Mason University; ADM Group, AAL
[email protected]@aalcpas.com
Talk Outline
• The theory of the MV-plot.
• Detecting linear structures with the MV-plot.
• Detecting nonlinear structures with the MV-plot.
• Comparisons with other methods and application to real data.
MV-Plot Theory
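Given an observation x = (x_1, x_2, …, x_d), we define m and v as follows:

$$m = f(x) = \frac{1}{d} \sum_{j=1}^{d} |x_j|$$

$$v = g(x, f(x)) = \sqrt{\frac{1}{d} \sum_{j=1}^{d} |x_j - f(x)|^2}$$

Computing m and v for every observation produces a vector of m values and a vector of v values.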
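As a minimal sketch of this step, the snippet below maps each observation to an (m, v) pair. It assumes that v is the root-mean-square deviation of the coordinates from m; the function names `mv_point` and `mv_transform` and the sample data are hypothetical and are not taken from the talk.

```python
import math

def mv_point(x):
    """Return (m, v) for one observation x (a sequence of d numbers)."""
    d = len(x)
    # m: mean of the absolute coordinate values
    m = sum(abs(xj) for xj in x) / d
    # v: root-mean-square deviation of the coordinates from m (assumed form)
    v = math.sqrt(sum((xj - m) ** 2 for xj in x) / d)
    return m, v

def mv_transform(data):
    """Map every observation (row) of the data set to its (m, v) pair."""
    return [mv_point(row) for row in data]

# Hypothetical data: three observations in d = 3 dimensions
data = [
    [0.2, 0.4, 0.6],
    [0.5, 0.5, 0.5],
    [0.1, 0.9, 0.5],
]
points = mv_transform(data)
```

Plotting the v values against the m values of `points` gives the MV-plot; an observation whose coordinates are all equal, such as the second row, lands on the m axis with v = 0.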
What is the relationship between m and v?
MV-Relationship in 2-d
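$$m_i = \frac{1}{2} \sum_{j=1}^{2} |x_{ij}| = \frac{1}{2}\left(|x_{i1}| + |x_{i2}|\right)$$

$$v_i = \sqrt{\frac{1}{2} \sum_{j=1}^{2} |x_{ij} - m_i|^2} = \frac{1}{2}\,|x_{i1} - x_{i2}|$$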
• Normalizing the data to the range (0, 1) removes the need for the absolute value when computing m.
• In 2-d, m and v are close to the principal components (PC).
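The 2-d relationship can be checked numerically. The sketch below assumes data in (0, 1) and the root-mean-square form of v; `mv_point` is a hypothetical helper name. It compares the computed (m, v) with the closed forms (x1 + x2)/2 and |x1 - x2|/2 on random points.

```python
import math
import random

def mv_point(x):
    """Return (m, v): mean of |x_j| and RMS deviation from m (assumed form)."""
    d = len(x)
    m = sum(abs(xj) for xj in x) / d
    v = math.sqrt(sum((xj - m) ** 2 for xj in x) / d)
    return m, v

random.seed(0)  # fixed seed so the check is repeatable
max_err_m = 0.0
max_err_v = 0.0
for _ in range(1000):
    x1, x2 = random.random(), random.random()  # both coordinates in [0, 1)
    m, v = mv_point([x1, x2])
    # Largest gap between the computed values and the closed forms
    max_err_m = max(max_err_m, abs(m - (x1 + x2) / 2))
    max_err_v = max(max_err_v, abs(v - abs(x1 - x2) / 2))
```

Up to a constant factor, (x1 + x2) and (x1 - x2) are the original axes rotated by 45 degrees, which is one way to read the remark that the MV coordinates are close to the principal components in 2-d.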
MV- detects linear structure(s)
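$$x_{i2} = w_0 + w_1 x_{i1} \;\Rightarrow\; m_i = \frac{1}{2}\left[w_0 + (1 + w_1)\,x_{i1}\right]; \quad v_i = \frac{1}{2}\left[w_0 - (1 - w_1)\,x_{i1}\right] \ \text{if } x_{i2} \ge x_{i1}$$

$$a_0 = \frac{w_0}{2}; \quad a_1 = \frac{1 + w_1}{2}; \quad a'_1 = -\frac{1 - w_1}{2}$$

$$m_i = a_0 + a_1 x_{i1}; \quad v_i = a_0 + a'_1 x_{i1}$$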
If the data are linear in the original space, they are also linear in the MV-space!
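A small numerical sketch of this claim follows, under stated assumptions: the line coefficients `w0` and `w1` are made-up values, `mv_point` is a hypothetical helper, and v is taken as the root-mean-square deviation from m. The sampled range is chosen so that x1 stays larger than x2, which keeps all points on one side of the absolute-value fold.

```python
import math

def mv_point(x):
    """Return (m, v): mean of |x_j| and RMS deviation from m (assumed form)."""
    d = len(x)
    m = sum(abs(xj) for xj in x) / d
    v = math.sqrt(sum((xj - m) ** 2 for xj in x) / d)
    return m, v

# Hypothetical line x2 = w0 + w1 * x1 in the original 2-d space
w0, w1 = 0.1, 0.5
xs = [0.3 + 0.1 * k for k in range(7)]  # x1 from 0.3 to 0.9, where x1 > x2

mv = [mv_point([x1, w0 + w1 * x1]) for x1 in xs]

# Slope between consecutive MV points; a constant slope means a straight line
slopes = [(v2 - v1) / (m2 - m1) for (m1, v1), (m2, v2) in zip(mv, mv[1:])]
expected_slope = (1 - w1) / (1 + w1)
```

All entries of `slopes` agree with `expected_slope`, so the image of the line is again a line in the MV-plane. If the sampled range crossed the point where x1 = x2, the image would fold at v = 0 into two straight segments.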
MV- detects linear structure(s)
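For data on a hyperplane in d dimensions, $x_{id} = w_0 + \sum_{j=1}^{d-1} w_j x_{ij}$:

$$m_i = \frac{1}{d}\left[w_0 + \sum_{j=1}^{d-1} (1 + w_j)\,x_{ij}\right]$$

$$v_i = \frac{1}{d}\left[(d-1)\,w_0 + \sum_{j=1}^{d-1} \left((d-1)\,w_j - 1\right) x_{ij}\right]$$

so both coordinates are linear in the data:

$$m_i = a_0 + \sum_{j=1}^{d-1} a_j x_{ij}$$

$$v_i = a'_0 + \sum_{j=1}^{d-1} a'_j x_{ij}$$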
Detecting Linear structure(s) Example I
Detecting Linear structure(s) Example II
Detecting Linear structure(s) Example III
Detecting nonlinear data with MV-plot
The MV-plot can detect nonlinear structure in a data set without any change to the equations.
Detecting nonlinear structure
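$$(x, \sin(x)): \quad m = \frac{1}{2}\left(x + \sin(x)\right), \quad v = \frac{1}{2}\,|x - \sin(x)|$$

$$(x, \cos(x)): \quad m = \frac{1}{2}\left(x + \cos(x)\right), \quad v = \frac{1}{2}\,|x - \cos(x)|$$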
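To illustrate how a nonlinear curve is carried into the MV-plane, the sketch below samples the curves (t, sin t) and (t, cos t) and computes their MV images with the same, unchanged definitions. The helper names `mv_point` and `curve_image`, the sampling range, and the root-mean-square form of v are assumptions made for this example.

```python
import math

def mv_point(x):
    """Return (m, v): mean of |x_j| and RMS deviation from m (assumed form)."""
    d = len(x)
    m = sum(abs(xj) for xj in x) / d
    v = math.sqrt(sum((xj - m) ** 2 for xj in x) / d)
    return m, v

def curve_image(f, ts):
    """MV image of the planar curve (t, f(t)) sampled at the values ts."""
    return [mv_point([t, f(t)]) for t in ts]

# Sample t in [0, pi/2], where t, sin(t) and cos(t) are non-negative
ts = [k * (math.pi / 2) / 50 for k in range(51)]
sin_image = curve_image(math.sin, ts)
cos_image = curve_image(math.cos, ts)
```

Each image point equals ((t + f(t))/2, |t - f(t)|/2), so the curve stays a curve in the MV-plane instead of collapsing onto a line.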
Detecting Sphere(s)
Case I:
• The sphere has radius R
• The sphere center is the origin

$$v_i^2 = \frac{1}{d} \sum_{j=1}^{d} (x_{ij} - m_i)^2 = \frac{1}{d} \sum_{j=1}^{d} x_{ij}^2 - m_i^2$$

$$v_i^2 + m_i^2 = \frac{R^2}{d}.$$
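For a sphere centered at the origin, the points should fall on the circle v² + m² = R²/d in the MV-plane. The sketch below checks this on random points. It assumes m is the plain coordinate mean (no absolute value, since coordinates on such a sphere can be negative) and v is the root-mean-square deviation from m; the helper names, dimension, and radius are hypothetical.

```python
import math
import random

def mv_signed(x):
    """(m, v) with m the plain mean (no absolute value) and v the RMS deviation."""
    d = len(x)
    m = sum(x) / d
    v = math.sqrt(sum((xj - m) ** 2 for xj in x) / d)
    return m, v

def random_sphere_point(d, radius, rng):
    """Point on the origin-centered sphere, drawn by normalizing a Gaussian vector."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(gj * gj for gj in g))
    return [radius * gj / norm for gj in g]

rng = random.Random(1)  # fixed seed so the check is repeatable
d, R = 5, 2.0           # hypothetical dimension and radius
residuals = []
for _ in range(500):
    m, v = mv_signed(random_sphere_point(d, R, rng))
    # Distance of (m, v) from the circle v^2 + m^2 = R^2 / d
    residuals.append(abs(v * v + m * m - R * R / d))
max_residual = max(residuals)
```

`max_residual` stays at floating-point noise level, so under these assumptions every sampled point of the sphere lies on one circle of radius R/√d in the MV-plane.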
Detecting Sphere(s)
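Case II:
• The sphere has radius R
• The sphere center x^c = (x_1^c, …, x_d^c) is not the origin

With m_i and v_i computed from the centered coordinates x_{ij} - x_j^c:

$$v_i^2 = \frac{1}{d} \sum_{j=1}^{d} (x_{ij} - x_j^c - m_i)^2 = \frac{1}{d} \sum_{j=1}^{d} (x_{ij} - x_j^c)^2 - m_i^2$$

$$v_i^2 + m_i^2 = \frac{R^2}{d}.$$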
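When the center is not the origin, one way to recover the same circle is to subtract the center before computing m and v. The sketch below does that; it assumes the center is known, that m is the plain mean of the centered coordinates, and that v is their root-mean-square deviation. The function name `mv_centered`, the center, and the radius are hypothetical.

```python
import math
import random

def mv_centered(x, center):
    """(m, v) of the coordinates after subtracting the sphere center."""
    d = len(x)
    y = [xj - cj for xj, cj in zip(x, center)]
    m = sum(y) / d
    v = math.sqrt(sum((yj - m) ** 2 for yj in y) / d)
    return m, v

rng = random.Random(2)          # fixed seed so the check is repeatable
center = [1.0, -2.0, 0.5, 3.0]  # hypothetical center, not the origin
R = 1.5                         # hypothetical radius
d = len(center)

max_residual = 0.0
for _ in range(500):
    # Random direction from a normalized Gaussian vector, scaled to radius R
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(gj * gj for gj in g))
    x = [cj + R * gj / norm for cj, gj in zip(center, g)]
    m, v = mv_centered(x, center)
    # Distance of (m, v) from the circle v^2 + m^2 = R^2 / d
    max_residual = max(max_residual, abs(v * v + m * m - R * R / d))
```

After centering, the residual is again at floating-point noise level, matching the origin-centered case.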
Application on Real data
• Fisher’s IRIS data (150 x 4): 3 classes of 50 points each
• Process control data (600 x 60): 6 classes of 100 points each
• Pollen data (3,848 x 5) (Wegman’s data): 2 classes (linear and nonlinear)
Related Dimensional Reduction Methods
• Multidimensional Scaling
• Fisher Discriminant Analysis
• Principal Component Analysis
IRIS (R. A. Fisher) dataset: 150 cases in 4-dim
Time series dataset: 600 cases in 60-dim
Pollen dataset: 3,848 points in 5-dim
Other methods:
Require more storage and computing time.
Even if they work, we expect poor results on this particular data.
(Wegman 2002)
Pollen dataset
Linear and Nonlinear mixed structures.
The linear structure in the Pollen data set
17 + 16 + 18 + 17 + 14 + 16 = 98 linear points, 3,750 nonlinear points
Summary
The MV-algorithm can discover linear and nonlinear patterns at the same time.
The MV-algorithm can detect symmetric structure in the data.
The MV-algorithm handles large multivariate data sets.