# Multivariate Statistical Analysis Using Statgraphics ... · PDF fileMultivariate Data Analysis Using Statgraphics Centurion: Part 2 Dr. Neil W. Polhemus Statpoint Technologies, Inc

• View
240

2

Embed Size (px)

### Text of Multivariate Statistical Analysis Using Statgraphics ... · PDF fileMultivariate Data Analysis...

• Multivariate Data Analysis Using

Statgraphics Centurion: Part 2

Dr. Neil W. Polhemus

Statpoint Technologies, Inc.

1

• Multivariate Statistical Methods

The simultaneous observation and analysis of more than one response variable.

*Primary Uses

1. Data reduction or structural simplification

2. Sorting and grouping

3. Investigation of the dependence among variables

4. Prediction

5. Hypothesis construction and testing

*Johnson and Wichern, Applied Multivariate Statistical Analysis

2

• Data set: U.S. Demographics

3

• Variables

Education: percent completing college

Income per Capita: in 2011 dollars

Urban Percentage: percent living in urban areas

Population Density: people per square mile

Non-English Language: percent that dont speak English

Median Age: in years

Unemployment Rate: for 2012

Percent Female: of general population

Crime Rate: violent crimes per 100,000

Manufacturing: % of GDP

Infant Mortality: 2006

Poverty Rate: percentage below poverty level

Election 2012: results of 2012 presidential election (Obama or Romney)

4

• Urban Percentage

5

• Percent Female

6

• Poverty Rate

7

• Star Glyphs

8

• Data Input

9

• 10

• Cluster Analysis

Method for assigning objects to groups so that objects in the same group are more similar to each other than they are to objects in other groups.

Some questions:

1. How many natural groups are there?

2. Which objects belong to each group?

11

• Data Input

12

• Analysis Options

13

• Dendrogram Nearest Neighbor

14

• Dendrogram Furthest Neighbor

15

• Agglomeration Distance Plot

16

• Dendrogram - 4 Clusters

17

• Save Cluster Numbers

18

• Map of Cluster Numbers

19

• Method of k-Means

1. Starts with a set of typical objects (seeds) for each cluster.

2. Each object is assigned to group with nearest seed.

3. Centroids of groups are computed.

4. Objects are assigned to different groups if nearer to a different centroid.

5. Steps 3 and 4 are repeated until no change.

20

• Method of k-Means

21

• Seeds

22

Row 5 = California Row 32 = New York Row 34 = North Dakota Row 43 = Texas

• Map for k-Means

23

• Classification

The classification problem deals with the assignment of objects to known groups. Goals include:

1. Understanding what differentiates the groups.

2. Predicting group membership for unclassified objects.

24

• Methods

Discriminant Analysis (on the Centurion Relate menu) based on Fishers linear discriminant functions.

Neural Network Classifier (on the Centurion Relate menu) based on a Bayesian classifier with priors and costs.

Uniwin (companion program to Statgraphics) includes methods for dealing with qualitative factors.

25

• (Linear) Discriminant Analysis

Constructs linear functions of the standardized classification variables:

These functions are derived so as to maximize the separation of the groups. If there are g groups, there are g-1 discriminant functions.

26

pjpjjj ZdZdZdD ...2211

• Example

27

• Data Input

28

• Analysis Options

29

• All Variables

30

• Discriminant Function Plot

31

• Backward selection

32

• Classification Functions

These functions create a score for each group:

Objects are assigned to whatever group has the largest value of (Cj * priorj) where priorj is the prior probability of belonging to group j.

33

02211 ... jpjpjjj cXcXcXcC

• Classification Functions

34

• Classification Summary

35

• Classification Table

36

• Validation Set

May leave some of the objects out of the training set and use them to validate the results.

37

Correctly classified 8 out of 10 omitted states.

• Neural Network Classifier

Implements a nonparametric method for classifying objects based on the product of 3 quantities:

1. The estimated density function in the neighborhood of the object

(given a specified value of s).

2. The prior probabilities of belonging to each group.

3. The costs of misclassifying cases that belong to a given group.

38

• Bivariate Density (s = 0.5)

39

Biv ariate Density

3555

7595

115Urban Percentage

8

12

16

20

24

Pov erty rate0123

45

(X 0.001)

de

ns

ity

Romney

Biv ariate Density

3555

7595

115Urban Percentage

8

12

16

20

24

Pov erty rate0123

45

(X 0.001)

de

ns

ity

Romney

Biv ariate Density

3555

7595

115Urban Percentage

8

12

16

20

24

Pov erty rate0123

45

(X 0.001)

de

ns

ity

Romney

Biv ariate Density

3555

7595

115Urban Percentage

8

12

16

20

24

Pov erty rate0123

45

(X 0.001)

de

ns

ity

Romney

Biv ariate Density

3555

7595

115Urban Percentage

8

12

16

20

24

Pov erty rate0123

45

(X 0.001)

de

ns

ity

Romney

Biv ariate Density

3555

7595

115Urban Percentage

8

12

16

20

24

Pov erty rate0123

45

(X 0.001)

de

ns

ity

Romney

Biv ariate Density

3555

7595

115Urban Percentage

8

12

16

20

24

Pov erty rate0123

45

(X 0.001)

de

ns

ity

RomneyObama

Biv ariate Density

3555

7595

115Urban Percentage

8

12

16

20

24

Pov erty rate0

1

2

3

4(X 0.001)

de

ns

ity

Obama

Biv ariate Density

3555

7595

115Urban Percentage

8

12

16

20

24

Pov erty rate0

1

2

3

4(X 0.001)

de

ns

ity

Obama

Biv ariate Density

3555

7595

115Urban Percentage

8

12

16

20

24

Pov erty rate0

1

2

3

4(X 0.001)

de

ns

ity

Obama

Biv ariate Density

3555

7595

115Urban Percentage

8

12

16

20

24

Pov erty rate0

1

2

3

4(X 0.001)

de

ns

ity

Obama

Biv ariate Density

3555

7595

115Urban Percentage

8

12

16

20

24

Pov erty rate0

1

2

3

4(X 0.001)

de

ns

ity

Obama

Biv ariate Density

3555

7595

115Urban Percentage

8

12

16

20

24

Pov erty rate0

1

2

3

4(X 0.001)

de

ns

ity

Obama

Biv ariate Density

3555

7595

115Urban Percentage

8

12

16

20

24

Pov erty rate0

1

2

3

4(X 0.001)

de

ns

ity

• Schematic Diagram

40

• Data Input

41

• Analysis Options

42

• Classification Summary

43

• Reduced Set of Variables

44

• Some Improvement

45

• Classification Plot

46

• Classification Plot

47

• UNIWIN Plus from Sigma Plus

Software package written by our longtime colleague Christian Charles at Sigma Plus in France.

Contains additional features for multivariate analysis.

48

• Uniwin Cluster and Classification

Handle qualitative variables.

49

• New Variables

50

• Uniwin Cluster Analysis

51

• Dendrogram

52

• Uniwin Clusters

53

• Uniwin Classification

54

• Classification Results

55

Wrongly classified only Nevada (only cattle state to vote for Obama).

56

Statgraphics Centurion: www.statgraphics.com Uniwin Plus: www.statgraphics.fr or www.sigmaplus.fr Or send e-mail to [email protected]

http://www.statgraphics.com/http://www.statgraphics.fr/http://www.sigmaplus.fr/mailto:[email protected]

• Join the Statgraphics Community on:

Documents
Documents
Documents
Documents
Documents
Documents
Documents
Documents
Documents
Documents
Documents
Documents
Documents
Documents
Documents
Documents
Documents
Documents
Documents
Documents
Documents
Documents
Documents
Documents
Documents
Documents
Documents
Documents
Documents
Documents