Click here to load reader
View
240
Download
2
Embed Size (px)
Multivariate Data Analysis Using
Statgraphics Centurion: Part 2
Dr. Neil W. Polhemus
Statpoint Technologies, Inc.
1
Multivariate Statistical Methods
The simultaneous observation and analysis of more than one response variable.
*Primary Uses
1. Data reduction or structural simplification
2. Sorting and grouping
3. Investigation of the dependence among variables
4. Prediction
5. Hypothesis construction and testing
*Johnson and Wichern, Applied Multivariate Statistical Analysis
2
Data set: U.S. Demographics
3
Variables
Education: percent completing college
Income per Capita: in 2011 dollars
Urban Percentage: percent living in urban areas
Population Density: people per square mile
Non-English Language: percent that dont speak English
Median Age: in years
Unemployment Rate: for 2012
Percent Female: of general population
Crime Rate: violent crimes per 100,000
Manufacturing: % of GDP
Infant Mortality: 2006
Poverty Rate: percentage below poverty level
Election 2012: results of 2012 presidential election (Obama or Romney)
4
Urban Percentage
5
Percent Female
6
Poverty Rate
7
Star Glyphs
8
Data Input
9
10
Cluster Analysis
Method for assigning objects to groups so that objects in the same group are more similar to each other than they are to objects in other groups.
Some questions:
1. How many natural groups are there?
2. Which objects belong to each group?
11
Data Input
12
Analysis Options
13
Dendrogram Nearest Neighbor
14
Dendrogram Furthest Neighbor
15
Agglomeration Distance Plot
16
Dendrogram - 4 Clusters
17
Save Cluster Numbers
18
Map of Cluster Numbers
19
Method of k-Means
1. Starts with a set of typical objects (seeds) for each cluster.
2. Each object is assigned to group with nearest seed.
3. Centroids of groups are computed.
4. Objects are assigned to different groups if nearer to a different centroid.
5. Steps 3 and 4 are repeated until no change.
20
Method of k-Means
21
Seeds
22
Row 5 = California Row 32 = New York Row 34 = North Dakota Row 43 = Texas
Map for k-Means
23
Classification
The classification problem deals with the assignment of objects to known groups. Goals include:
1. Understanding what differentiates the groups.
2. Predicting group membership for unclassified objects.
24
Methods
Discriminant Analysis (on the Centurion Relate menu) based on Fishers linear discriminant functions.
Neural Network Classifier (on the Centurion Relate menu) based on a Bayesian classifier with priors and costs.
Uniwin (companion program to Statgraphics) includes methods for dealing with qualitative factors.
25
(Linear) Discriminant Analysis
Constructs linear functions of the standardized classification variables:
These functions are derived so as to maximize the separation of the groups. If there are g groups, there are g-1 discriminant functions.
26
pjpjjj ZdZdZdD ...2211
Example
27
Data Input
28
Analysis Options
29
All Variables
30
Discriminant Function Plot
31
Backward selection
32
Classification Functions
These functions create a score for each group:
Objects are assigned to whatever group has the largest value of (Cj * priorj) where priorj is the prior probability of belonging to group j.
33
02211 ... jpjpjjj cXcXcXcC
Classification Functions
34
Classification Summary
35
Classification Table
36
Validation Set
May leave some of the objects out of the training set and use them to validate the results.
37
Correctly classified 8 out of 10 omitted states.
Neural Network Classifier
Implements a nonparametric method for classifying objects based on the product of 3 quantities:
1. The estimated density function in the neighborhood of the object
(given a specified value of s).
2. The prior probabilities of belonging to each group.
3. The costs of misclassifying cases that belong to a given group.
38
Bivariate Density (s = 0.5)
39
Biv ariate Density
3555
7595
115Urban Percentage
8
12
16
20
24
Pov erty rate0123
45
(X 0.001)
de
ns
ity
Romney
Biv ariate Density
3555
7595
115Urban Percentage
8
12
16
20
24
Pov erty rate0123
45
(X 0.001)
de
ns
ity
Romney
Biv ariate Density
3555
7595
115Urban Percentage
8
12
16
20
24
Pov erty rate0123
45
(X 0.001)
de
ns
ity
Romney
Biv ariate Density
3555
7595
115Urban Percentage
8
12
16
20
24
Pov erty rate0123
45
(X 0.001)
de
ns
ity
Romney
Biv ariate Density
3555
7595
115Urban Percentage
8
12
16
20
24
Pov erty rate0123
45
(X 0.001)
de
ns
ity
Romney
Biv ariate Density
3555
7595
115Urban Percentage
8
12
16
20
24
Pov erty rate0123
45
(X 0.001)
de
ns
ity
Romney
Biv ariate Density
3555
7595
115Urban Percentage
8
12
16
20
24
Pov erty rate0123
45
(X 0.001)
de
ns
ity
RomneyObama
Biv ariate Density
3555
7595
115Urban Percentage
8
12
16
20
24
Pov erty rate0
1
2
3
4(X 0.001)
de
ns
ity
Obama
Biv ariate Density
3555
7595
115Urban Percentage
8
12
16
20
24
Pov erty rate0
1
2
3
4(X 0.001)
de
ns
ity
Obama
Biv ariate Density
3555
7595
115Urban Percentage
8
12
16
20
24
Pov erty rate0
1
2
3
4(X 0.001)
de
ns
ity
Obama
Biv ariate Density
3555
7595
115Urban Percentage
8
12
16
20
24
Pov erty rate0
1
2
3
4(X 0.001)
de
ns
ity
Obama
Biv ariate Density
3555
7595
115Urban Percentage
8
12
16
20
24
Pov erty rate0
1
2
3
4(X 0.001)
de
ns
ity
Obama
Biv ariate Density
3555
7595
115Urban Percentage
8
12
16
20
24
Pov erty rate0
1
2
3
4(X 0.001)
de
ns
ity
Obama
Biv ariate Density
3555
7595
115Urban Percentage
8
12
16
20
24
Pov erty rate0
1
2
3
4(X 0.001)
de
ns
ity
Schematic Diagram
40
Data Input
41
Analysis Options
42
Classification Summary
43
Reduced Set of Variables
44
Some Improvement
45
Classification Plot
46
Classification Plot
47
UNIWIN Plus from Sigma Plus
Software package written by our longtime colleague Christian Charles at Sigma Plus in France.
Contains additional features for multivariate analysis.
Reads Statgraphics data files.
48
Uniwin Cluster and Classification
Handle qualitative variables.
49
New Variables
50
Uniwin Cluster Analysis
51
Dendrogram
52
Uniwin Clusters
53
Uniwin Classification
54
Classification Results
55
Wrongly classified only Nevada (only cattle state to vote for Obama).
More Information
56
Statgraphics Centurion: www.statgraphics.com Uniwin Plus: www.statgraphics.fr or www.sigmaplus.fr Or send e-mail to [email protected]
http://www.statgraphics.com/http://www.statgraphics.fr/http://www.sigmaplus.fr/mailto:[email protected]
Join the Statgraphics Community on:
Follow us on
https://www.facebook.com/pages/STATGRAPHICS/79706768276http://www.linkedin.com/groups?gid=1796550https://twitter.com/STATGRAPHICS