
SPAD7 Data Miner Guide


8/18/2019 SPAD7 Data Miner Guide

http://slidepdf.com/reader/full/spad7-data-miner-guide 1/176

 22 quai gallieni - 92150 Suresnes - France

 Tél : +33 1 57 32 60 60 - Fax : +33 1 57 32 62 00 - spad@coheris.com – www.coheris.com - Siret : 399 467 927 00105 - APE : 5829C - Register number training: 11-92-1522492

DATA MINER

GUIDE 

 Descriptive Statistics - Factorial Analyses - Clustering

 Linear Models – Discriminant Analyses –

Scoring – Decision Trees


Data Miner Guide

© Copyright 1996, 2008 SPAD. All rights reserved.

For any further information about the SPAD software, training and consulting activities, please visit us at www.coheris.com or contact us by email:

About              E-mail
SPAD Software      [email protected]
SPAD Hot line      [email protected]
Training           [email protected]
Consulting         [email protected]
Books              [email protected]

For further information about the COHERIS Group offer (CRM, BI, Data Mining, Data Quality Management, Merchandising, SFA), visit us at www.coheris.com


Table of contents

DESCRIPTIVE STATISTICS WITH SPAD

STATS  -  MARGINAL DISTRIBUTIONS, HISTOGRAMS
DEMOD  -  AUTOMATIC CHARACTERIZATION OF A QUALITATIVE VARIABLE
DESCO  -  AUTOMATIC CHARACTERIZATION OF A CONTINUOUS VARIABLE
TABLE  -  CROSS TABLES
BIVAR  -  BIVARIATE ANALYSIS

FACTORIAL ANALYSES WITH SPAD

PCA  -  PRINCIPAL COMPONENT ANALYSIS
SCA  -  SIMPLE CORRESPONDENCE ANALYSIS
MCA  -  MULTIPLE CORRESPONDENCE ANALYSIS

CLUSTERING WITH SPAD

RECIP / SEMIS  -  CLUSTERING ON FACTOR SCORES
PARTI - DECLA  -  CUTTING THE TREE AND DESCRIBING THE CLUSTERS
CLASS - MINER  -  CLUSTER DESCRIPTION
ESCAL  -  STORING THE FACTORIAL AXES AND THE PARTITIONS

THE LINEAR MODEL AND ITS APPLICATIONS

REGRESSION AND ANALYSIS OF VARIANCE, GENERAL LINEAR MODEL
SEARCH FOR OPTIMAL REGRESSIONS
LOGISTIC REGRESSION

DISCRIMINANT ANALYSIS AND ITS METHODS

FUWILD  -  OPTIMAL DISCRIMINANT ANALYSIS
DIS2GD  -  LINEAR DISCRIMINANT ANALYSIS BASED ON CONTINUOUS VARIABLES
DIS2GFP  -  LINEAR DISCRIMINANT ANALYSIS BASED ON PRINCIPAL FACTORS
DISCO  -  DISCRIMINANT ANALYSIS BASED ON QUALITATIVE VARIABLES
SCORE  -  SCORING FUNCTION
IDT 1  -  INTERACTIVE DECISION TREE 1
IDT 2  -  INTERACTIVE DECISION TREE 2


DESCRIPTIVE STATISTICS WITH SPAD

STATS : marginal distributions, histograms, matrix plot, box plot

DEMOD : automatic characterization of a qualitative variable

DESCO : automatic characterization of a continuous variable

TABLE : Cross tables

BIVAR : Bivariate analysis


STATS  -  MARGINAL DISTRIBUTIONS, HISTOGRAMS

This procedure supplies a rapid and automatic description of your nominal and continuous variables.

The Survey.sba base is an opinion survey file, which will be used for this example. The file is

supplied with the application and installed automatically on your PC.

SET THE PARAMETERS FOR A METHOD 

Before it can be executed, a method must have its parameters set.

To access the parameter settings of a method, right-click on the method and choose the “Set the method” command, or double-click on the method icon.

The calculation rules and parameter settings of each method are available online.

The Cases, Weighting and Parameters tabs are available for almost all SPAD methods.

Cases: the Cases tab lets you select the cases used for the method

Weighting: the weighting tab allows you to adjust the distribution of the cases in the sample

Parameters: options and settings of the method


The Cases tab

The Cases tab lets you select the cases with one of the following methods:

•  All the available cases

•  One or more logical filters (selection criteria combined with AND/OR)

•  A name list of cases
•  A selection made in one or more intervals

•  Random draw

 Apply a logical filter

Click on Logical filter, select the chosen variable, click on the operator, then on the operand, and click on Validate. The global definition of the filter is displayed as you build it.

In case of error, you can delete an expression from the filter by selecting the expression to discard and clicking on Delete.

The cases satisfying the filter are considered as active, while the others are supplementary.

 Select the individuals from a list

Select List as the selection method, choose your cases in the Available list, use the transfer buttons to select them, and select the status of the cases.


Select cases by interval

Select Interval as the selection method, define the interval as a function of its rank in the SPAD base, then click on the arrow button to move your choice to the cases status window and select the status of the cases.

You can save the definition of the selection by clicking on the Save button. This allows you to re-use it later.

 Do a Random Draw

This selection lets you apply the method to a sample before applying it to the entire SPAD base. It also lets you test the stability of the results by executing the same method several times, changing the number of preliminary requests each time.

Click on the Yes radio button to run a random draw, then click on Define to set its parameters. Indicate the number of preliminary requests for the random draw: on another execution of the selection, you do not need to change this value unless you want to generate a different draw. Enter the percentage to draw at random, or the sample size after the draw, and click on OK.
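The number of preliminary requests behaves like a random seed: keeping it fixed reproduces the same draw, changing it generates a different one. A minimal Python sketch of this behaviour (the function and its arguments are illustrative, not SPAD's API):

```python
import random

def random_draw(cases, preliminary_requests, percentage=None, size=None):
    """Draw a random sample of cases, reproducibly.

    `preliminary_requests` plays the role of a seed: re-running the
    selection with the same value yields the same draw; change it to
    generate a different draw and test the stability of the results.
    """
    rng = random.Random(preliminary_requests)
    n = size if size is not None else round(len(cases) * percentage / 100)
    return sorted(rng.sample(cases, n))

# Draw 10% of 315 cases; the same preliminary_requests value
# always selects the same cases.
active = random_draw(list(range(1, 316)), preliminary_requests=5, percentage=10)
print(len(active))  # → 32
```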


The Weighting tab

The weighting tab allows you to adjust the distribution of the cases in the sample:

•  According to a Weighting variable already in the file.

•  As a function of one or more theoretical percentages (calculation by adjustment).

Enter the theoretical percentage for each category and click on OK.

You can repeat this operation for another variable. In this way you get an adjustment as a function

of several variables with a simple weighting variable. This requires a calculation by successive

approximations, as shown in the window below:

Click on the options in the first window to access the options window for the weighting system, and select the weighting type. In the case of calculation by adjustment, choose in the available variables window the variable serving to correct, and click on the Define button. For each category, enter the theoretical percentage and hit Enter. You can use the default options, or change the fitting options.
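The calculation by successive approximations is essentially iterative proportional fitting (raking): each pass rescales the weights so that the weighted distribution of one variable matches its theoretical percentages, cycling over the variables until the margins stabilize. A minimal sketch under that assumption (illustrative code, not SPAD's implementation; cases are represented as dicts mapping each adjustment variable to its category):

```python
def rake(weights, cases, targets, iterations=50):
    """Adjust case weights so that the weighted category shares match
    the theoretical percentages of every adjustment variable."""
    w = list(weights)
    total = sum(w)  # the overall weight is kept fixed
    for _ in range(iterations):
        for var, target in targets.items():
            # current weighted total of each category of this variable
            current = {}
            for wi, case in zip(w, cases):
                current[case[var]] = current.get(case[var], 0.0) + wi
            # rescale every case so its category reaches the target share
            w = [wi * target[case[var]] * total / current[case[var]]
                 for wi, case in zip(w, cases)]
    return w

# One man and two women, equal starting weights, adjusted to a
# theoretical 50/50 split:
cases = [{"sex": "m"}, {"sex": "f"}, {"sex": "f"}]
w = rake([1.0, 1.0, 1.0], cases, {"sex": {"m": 0.5, "f": 0.5}})
print(w)  # → [1.5, 0.75, 0.75]
```

With several adjustment variables, the loop over `targets` is what produces the successive approximations mentioned above.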


 Attention: The weighting calculation in the weighting tab page for a method is temporary (the weighting variable is not saved). This approach lets you make quick tests and measure the influence of the weighting on the results of the method. When a satisfactory weighting variable has been obtained, it is preferable to create a permanent weighting variable with the menu Tools – Weighting of the main menu (Data Management Manual, paragraph 4.3).

Then, in the weighting tab of a method, we select this variable as the weight variable.

The « Marginal distributions » tab

We select the categorical variables in the list below.

The “Parameters” button lets you choose whether to display the categories without any respondent, and whether to display the missing data as a new category.

The “Statistics” button displays summary statistics on the selected variables. For example, select the Region where the respondent lives (V1), then click on the Statistics button. A window opens with statistics on the variable:

For the categorical variables, this statistics window shows the count and the percentage associated with each category. For the continuous variables, it shows the count, the mean, the standard deviation, as well as the minimum and maximum.


The « Histograms - Categorization » tab

This tab allows you to select continuous variables both for histograms/summary statistics and for categorization (marginal distributions of the variables' values).

The “Parameters” button allows you to set global or specific parameters for the histogram characteristics, such as the number of classes, the min and max bounds, and the histogram bar width.

You can also select continuous variables for categorization. As a result, each distinct value is displayed with its frequency. This is a preliminary step before splitting the continuous variable into classes.

A variable cannot be selected for both histograms and categorization at the same time.
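The result of a categorization can be pictured as a frequency table of the distinct values; a small illustrative sketch (not SPAD output):

```python
from collections import Counter

def categorize(values):
    """Return each distinct value with its count and percentage —
    the preliminary step before splitting a variable into classes."""
    n = len(values)
    return [(v, c, round(100.0 * c / n, 2))
            for v, c in sorted(Counter(values).items())]

print(categorize([1, 2, 2, 3, 3, 3, 5]))
# → [(1, 1, 14.29), (2, 2, 28.57), (3, 3, 42.86), (5, 1, 14.29)]
```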


The « Marginal distributions by categories » tab

This tab is useful for variables that are based on the same categories. The categories of these variables must have the same labels and must be ranked in the same order (this can be checked with the “Marginal distributions” tab).

The « Parameters » tab

This tab allows you to choose whether to export the results to Excel.


Once you have specified your request, validate the method by clicking on the “OK” button.

RESULTS 

Results are accessible in the Execution view or by right-clicking on the method and choosing the “Results” command. Then, depending on the method, different choices are available between the results editor, the Graphics gallery and the Excel results.

The results editor

The Result Editor opens up in a new window.

The information list has a tree structure.

  By clicking on the expand icon you open a branch of the tree, and by clicking on the collapse icon you close a branch of the tree. You can use the mouse to navigate through the tree.

  By double-clicking on a title, you display the relevant results in the new window.

The Layout option of the File menu allows you to customize the results display on the screen. The results can be printed or copied into your word processor, but they cannot be changed in this editor.


THE RESULTS OF THE STATS METHOD 

SUMMARY STATISTICS OF THE VARIABLES

MARGINAL DISTRIBUTIONS OF CATEGORICAL VARIABLES

                                -------- COUNTS --------
                                ACTUAL   %/TOTAL  %/EXPR.
---------------------------------------------------------
1 . Region where the respondent lives
Rég1 - Paris region                 56     17.78    17.78
Rég2 - Paris Basin                  51     16.19    16.19
Rég3 - north                        24      7.62     7.62
Rég4 - east                         29      9.21     9.21
Rég5 - west                         45     14.29    14.29
Rég6 - south-west                   38     12.06    12.06
Rég7 - center east                  36     11.43    11.43
Rég8 - mediterranean                36     11.43    11.43
OVERALL                            315    100.00   100.00
---------------------------------------------------------
2 . Urban area size (number of inhabitants)
Agg1 - less than 2000               84     26.67    26.67
Agg2 - 2001 to 5000                 18      5.71     5.71
Agg3 - 5001 to 10000                18      5.71     5.71
Agg4 - 10001 to 20000               12      3.81     3.81
Agg5 - 20001 to 50000               23      7.30     7.30
Agg6 - 50001 to 100000              18      5.71     5.71
Agg7 - 100001 to 200000             28      8.89     8.89
Agg8 - more than 200000             68     21.59    21.59
Agg9 - paris, paris. agglo          46     14.60    14.60
OVERALL                            315    100.00   100.00
---------------------------------------------------------
3 . Sex of respondent
Sex1 - male                        138     43.81    43.81
Sex2 - female                      177     56.19    56.19
OVERALL                            315    100.00   100.00
---------------------------------------------------------

MARGINAL DISTRIBUTIONS OF CATEGORIZED VARIABLES

            ----------- COUNTS -----------
            ACTUAL   %/TOTAL  %/EXPR.   % CUM.
----------------------------------------------
26 . Number of persons in a housing
1.000           38     12.06    12.06    12.06
2.000           90     28.57    28.57    40.63
3.000           69     21.90    21.90    62.54
4.000           71     22.54    22.54    85.08
5.000           34     10.79    10.79    95.87
6.000            7      2.22     2.22    98.10
7.000            4      1.27     1.27    99.37
8.000            2      0.63     0.63   100.00
OVERALL        315    100.00   100.00
----------------------------------------------
28 . Number of children
0.000           70     22.22    22.22    22.22
1.000           67     21.27    21.27    43.49
2.000           94     29.84    29.84    73.33
3.000           54     17.14    17.14    90.48
4.000            9      2.86     2.86    93.33
5.000           11      3.49     3.49    96.83
6.000            2      0.63     0.63    97.46
7.000            2      0.63     0.63    98.10
8.000            2      0.63     0.63    98.73
9.000            4      1.27     1.27   100.00
OVERALL        315    100.00   100.00
----------------------------------------------

SUMMARY STATISTICS OF CONTINUOUS VARIABLES

TOTAL COUNT : 315     TOTAL WEIGHT : 315.00
+--------------------------------+-------+--------+----------+----------+---------+-----------+---------+-----------+
| NUM . LABEL                    | COUNT | WEIGHT |     MEAN | STD.DEV. | MINIMUM |   MAXIMUM |   MIN.2 |     MAX.2 |
+--------------------------------+-------+--------+----------+----------+---------+-----------+---------+-----------+
|  4 . Age of respondent         |  315  | 315.00 |   43.756 |   16.581 |  18.000 |    86.000 |  19.000 |    83.000 |
| 41 . Family, children : i      |  315  | 315.00 |    6.651 |    1.062 |   1.000 |     7.000 |   2.000 |     6.000 |
| 42 . Work, profession : i      |  315  | 315.00 |    5.956 |    1.544 |   1.000 |     7.000 |   2.000 |     6.000 |
| 43 . Free time, relax : im     |  315  | 315.00 |    5.295 |    1.454 |   0.000 |     7.000 |   1.000 |     6.000 |
| 44 . Friends, acquaintanc      |  315  | 315.00 |    5.190 |    1.424 |   1.000 |     7.000 |   2.000 |     6.000 |
| 45 . Relatives, brothers,      |  315  | 315.00 |    5.629 |    1.436 |   1.000 |     7.000 |   2.000 |     6.000 |
| 46 . Religion : importanc      |  315  | 315.00 |    3.241 |    2.022 |   0.000 |     7.000 |   1.000 |     6.000 |
| 47 . Politic, political l      |  315  | 315.00 |    3.111 |    1.770 |   0.000 |     7.000 |   1.000 |     6.000 |
| 50 . State benefits : ave      |  283  | 283.00 |  533.795 |  926.899 |   0.000 |  5100.000 |  15.000 |  4980.000 |
| 51 . Salary of the respon      |  267  | 267.00 | 4408.547 | 4575.339 |   0.000 | 40000.000 | 300.000 | 24000.000 |
+--------------------------------+-------+--------+----------+----------+---------+-----------+---------+-----------+

HISTOGRAMS OF CONTINUOUS VARIABLES

VARIABLE 4 : Age of respondent

LOW.LIMIT |  MEAN | WEIGHT | HISTOGRAM (BETWEEN 16.00 INCLUDED AND 88.00 EXCLUDED, BAR INTERVAL WIDTH = 2.00)
----------+-------+--------+---------------------------------------------------------------------
    16.00 | 20.93 |     28 | XXXXXXXXXXXXXX
    24.00 | 27.85 |     68 | XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    32.00 | 35.31 |     58 | XXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    40.00 | 43.35 |     37 | XXXXXXXXXXXXXXXXXX
    48.00 | 52.08 |     39 | XXXXXXXXXXXXXXXXXXX
    56.00 | 59.06 |     33 | XXXXXXXXXXXXXXXX
    64.00 | 67.09 |     33 | XXXXXXXXXXXXXXXX
    72.00 | 74.71 |     14 | XXXXXXX
    80.00 | 82.20 |      5 | XX

+------------+-------------------------+-------------------------+
|            | OVERALL                 | HISTOGRAM               |
|            | (FROM 18.00 TO 86.00)   | (FROM 16.00 TO 88.00)   |
+------------+-------------------------+-------------------------+
| WEIGHT     | 315.00                  | 315.00                  |
| MEAN       | 43.756                  | 43.756                  |
| STD. DEV.  | 16.581                  | 16.440                  |
+------------+-------------------------+-------------------------+
WEIGHTS OF REMAINING CASES : STRICTLY LESS THAN 16.00 : 0.00
                             GREATER THAN OR EQUAL TO 88.00 : 0.00


MARGINAL DISTRIBUTIONS OF GROUPED VARIABLES

COMMAND NUMBER 1

                                      -------- COUNTS --------
                                      ACTUAL   %/TOTAL  %/EXPR.
DISTRIBUTION OF ANSWER : yes
FOR VARIABLES
Have you recently been nervous        155.00     49.21    49.21
Have you recently had backaches       149.00     47.30    47.30
Have you recently had headaches       115.00     36.51    36.51
Have you recently been depressed       50.00     15.87    15.87
DISTRIBUTION OF ANSWER : no
FOR VARIABLES
Have you recently been depressed      265.00     84.13    84.13
Have you recently had headaches       200.00     63.49    63.49
Have you recently had backaches       166.00     52.70    52.70
Have you recently been nervous        160.00     50.79    50.79


DEMOD – AUTOMATIC CHARACTERIZATION OF A QUALITATIVE VARIABLE

This extremely powerful procedure provides the automatic characterization of any categorical variable. This is the IDEAL procedure to find out everything about a variable in one question. The well-structured outputs form comprehensive study reports.

One can characterize either each category of a variable, or the variable itself globally. All the available elements (active and illustrative) may participate in the characterization: the categories of the categorical variables, the categorical variables themselves, and the continuous variables.

The following table summarizes all the capabilities of the DEMOD procedure:

Elements to characterize:

•  Groups of cases (defined by the categories of the variable to characterize): each category is described with all its significant characterizing elements.

•  The categorical variable to characterize itself: the variable is crossed with all the characterizing elements, and only the elements that are dependent on the variable to characterize are displayed.

In both cases, the characterizing elements can be: categories, categorical variables, continuous variables.

A group of cases is defined by a category of the variable to characterize. There are as many groups of cases as there are categories of the variable to characterize.

Double-click on the DEMOD icon to access the settings of the method.


THE « VARIABLES » TAB 

The scrolling menu allows you to select the variables to characterize and the characterizingelements.

In this example, the variable to characterize is V8 « The family is the only place where you feel well ». All the other variables, whether categorical or continuous, are selected as characterizing elements.


THE « PARAMETERS » TAB 

This tab allows you to modify the default parameters for the DEMOD method.

Once you have set the parameters, validate the method by clicking on the “OK” button and run the chain.


THE DEMOD RESULTS 

THE DEMOD-5 EXCEL SHEET 

% of category in group:
Frequency of the category in the group divided by the frequency of the group.

% of category in set:
Frequency of the category in the whole population.

% of group in category:
Frequency of the group in the category divided by the frequency of the category.

Test-value:
When the test-value is greater than zero, the category is over-represented in the group; it is under-represented if the test-value is negative. By default, SPAD displays only the characterizing elements with a test-value greater than or equal to 1.96 (i.e. a probability of 0.025 for a one-sided test).

Probability:
The probability evaluates the scale of the difference between the percentage of the category in the group and the percentage of the category in the population. The lower the probability, the more significant the difference, and the greater the test-value related to this probability (the test-value is the quantile of the normal distribution that corresponds to the same probability).

Weight:
Weight of the cases in the category.
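For reference, the test-value for a category can be approximated by standardizing the count of the category inside the group, which under the null hypothesis follows a hypergeometric law. A rough sketch using the normal approximation (SPAD works from the exact hypergeometric probability, so its values differ slightly):

```python
from math import sqrt

def category_test_value(n, n_group, n_category, n_cat_in_group):
    """Approximate test-value: standardized gap between the observed
    count of the category in the group and its expected count under
    random allocation (hypergeometric null, normal approximation)."""
    expected = n_group * n_category / n
    variance = expected * (1 - n_category / n) * (n - n_group) / (n - 1)
    return (n_cat_in_group - expected) / sqrt(variance)
```

A positive value flags over-representation of the category in the group, a negative value under-representation.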

Characterisation by categories of groups of
The family is the only place where you feel well

Group: Yes (Count: 230 - Percentage: 73.02)

Variable label | Characteristic category | % of category in group | % of category in set | % of group in category | Test-value | Probability | Weight

Marital status married   78,26 70,79 80,72 4,55 0,000 223

Do you watch TV every day 62,61 55,87 81,82 3,83 0,000 176

Opinion about marriage indissoluble 31,30 25,71 88,89 3,79 0,000 81

Are you worried about the risk of a nuclear plant accident a lot 32,61 28,25 84,27 2,76 0,003 89

Do you have children yes 81,30 77,14 76,95 2,68 0,004 243

Are you worried about the risk of a road accident a lot 40,87 36,51 81,74 2,55 0,005 115

Educational level of the respondent primary school 20,43 17,14 87,04 2,50 0,006 54

Current situation of the respondent retired people 20,43 17,14 87,04 2,50 0,006 54

Are you worried about the risk of a mugging a lot 33,04 29,21 82,61 2,38 0,009 92

Do you think the society needs to change I do not know 11,30 9,21 89,66 2,01 0,022 29

Current situation of the respondent unemployed person 5,22 7,30 52,17 -2,02 0,022 23

Are you worried about the risk of a mugging not at all 23,04 26,35 63,86 -2,02 0,022 83

Current situation of the respondent student 2,17 3,81 41,67 -2,06 0,020 12
Educational level of the respondent technical and GCSE 3,48 5,40 47,06 -2,10 0,018 17

Marital status cohabitation 3,04 5,08 43,75 -2,30 0,011 16

Do you have work-personal life problems yes 20,43 24,13 61,84 -2,33 0,010 76

Urban area size (number of inhabitants) more than 200000 17,83 21,59 60,29 -2,46 0,007 68

Your opinion on the life conditions in the future improving a lot 3,91 6,67 42,86 -2,81 0,002 21

Do you watch TV quite often 19,57 24,13 59,21 -2,90 0,002 76

Marital status single 9,57 13,33 52,38 -2,93 0,002 42

Do you have children no 17,39 21,90 57,97 -2,96 0,002 69

Opinion about marriage dissolved if agreem 30,87 36,19 62,28 -3,07 0,001 114

Are you worried about the risk of a road accident a little 15,65 20,32 56,25 -3,13 0,001 64

Educational level of the respondent more high school 9,13 13,65 48,84 -3,49 0,000 43


THE DEMOD-13 EXCEL SHEET 

Category mean:
Weighted mean of the variable in the category.

Overall mean:
Weighted mean of the variable in the overall population.

Interpretation:
One can see that « Age of respondent » is the continuous variable that best characterizes the group who answered « yes » to the question « The family is the only place where you feel well ». This group is significantly older than the average respondent, with an average age of 46 years old compared to 43.75 years old for the overall population.

Characterisation by continuous variables of categories of
The family is the only place where you feel well

Yes (Weight = 230.00 - Count = 230)

Characteristic variables | Category mean | Overall mean | Category Std. deviation | Overall Std. deviation | Test-value | Probability

Age of respondent 46,100 43,756 16,752 16,581 4,12 0,000

Religion : importance given 3,383 3,241 2,081 2,022 2,04 0,021
Relatives, brothers, sisters ... : importance given 5,726 5,629 1,380 1,436 1,98 0,024

Salary of the respondent 4044,990 4408,550 3690,140 4575,340 -2,09 0,018

No (Weight = 83.00 Count = 83 )

Characteristic variables | Category mean | Overall mean | Category Std. deviation | Overall Std. deviation | Test-value | Probability

Salary of the respondent 5377,780 4408,550 6311,000 4575,340 2,10 0,018

 Number of children 1,542 1,860 1,772 1,671 -2,02 0,022

Age of respondent 36,855 43,756 13,971 16,581 -4,41 0,000
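For continuous variables, the test-value compares the category mean with the overall mean, scaled by the standard error of the mean of a random subgroup of the same size drawn without replacement from the whole sample. A short sketch; with the figures of the sheet above it reproduces the reported test-values for the age of the respondent:

```python
from math import sqrt

def continuous_test_value(category_mean, overall_mean, overall_std, n_group, n_total):
    """Test-value for a continuous variable: gap between category mean
    and overall mean, divided by the standard error of the mean of a
    random group of n_group cases drawn without replacement from the
    n_total cases."""
    se = sqrt((overall_std ** 2 / n_group) * (n_total - n_group) / (n_total - 1))
    return (category_mean - overall_mean) / se

# Age of respondent, group « Yes » (values from the DEMOD-13 sheet)
print(round(continuous_test_value(46.100, 43.756, 16.581, 230, 315), 2))  # → 4.12
```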


DESCO  -  AUTOMATIC CHARACTERIZATION OF A CONTINUOUS VARIABLE

This procedure provides the statistical characterization of one or more continuous variables by:

•  the other continuous variables, with the support of correlations;
•  the categories of the categorical variables, by comparison of means;
•  the categorical variables themselves, with the help of Fisher's statistic.
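Fisher's statistic here is the one-way ANOVA F ratio, with k − 1 and n − k degrees of freedom for a characterizing variable with k categories observed on n cases (this matches the denominator degrees of freedom shown later in the DESCO output). A self-contained sketch:

```python
def fisher_f(groups):
    """One-way ANOVA F statistic: between-group variance over
    within-group variance, with k-1 and n-k degrees of freedom,
    where the continuous variable is split by the k categories."""
    n = sum(len(g) for g in groups)
    k = len(groups)
    grand_mean = sum(x for g in groups for x in g) / n
    means = [sum(g) / len(g) for g in groups]
    between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (between / (k - 1)) / (within / (n - k))

print(fisher_f([[1.0, 2.0], [3.0, 4.0]]))  # → 8.0
```

The larger the F value, the stronger the dependence between the categorical variable and the continuous variable to characterize.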

THE « VARIABLES » TAB 

A continuous variable can be characterized with the other variables, whether categorical or continuous, called characterizing variables.

The scrolling menu allows you to select the variables to characterize and the characterizingelements.


THE « PARAMETERS » TAB 

The parameter « Minimum relative weight of characterizing elements » is useful if you do not want to display characterizing categories whose frequency in the population is lower than 2% (the default threshold).

Only the categories whose related probabilities are lower than or equal to 0.025 are displayed; this corresponds to a test-value of 1.96.


THE DESCO RESULTS 

CHARACTERISATION OF CONTINUOUS VARIABLES

DESCRIPTION OF : Salary of the respondent

DESCRIPTION BY CATEGORIES OF CONTINUOUS VARIABLE : Salary of the respondent

ON 267.0 ACTIVE CASES     MEAN = 4408.547     STD.DEV. = 4575.339

| TEST   PROB. |    MEAN  STD.DEV. | CATEGORIES          | VARIABLE LABEL                                             | WEIGHT |
|  8.16  0.000 | 7060.53  4921.82 | yes, full time      | At the moment, do you have a professional activity          | 114.00 |
|  7.58  0.000 | 6496.32  4736.16 | employed            | Current situation of the respondent                         | 136.00 |
|  7.28  0.000 | 6617.07  4883.30 | no                  | Have you been unemployed during the last twelve months      | 123.00 |
|  6.69  0.000 | 6533.19  5486.12 | male                | Sex of respondent                                           | 117.00 |
|  4.60  0.000 | 6452.63  5414.05 | no                  | Do you have work-personal life problems                     |  76.00 |
|  4.25  0.000 | 6698.25  6784.83 | quite often         | Do you watch TV                                             |  57.00 |
|  3.73  0.000 | 6331.15  3880.83 | yes                 | Do you have work-personal life problems                     |  61.00 |
|  3.47  0.000 | 6797.37  6049.03 | more high school    | Educational level of the respondent                         |  38.00 |
|  3.35  0.000 | 4860.06  4834.30 | no                  | Have you recently been depressed                            | 217.00 |
|  3.18  0.001 | 5291.85  5418.67 | no                  | Have you recently been nervous                              | 135.00 |
|  3.10  0.001 | 6950.00  5579.71 | yes                 | Do you have a piano                                         |  28.00 |
|  2.89  0.002 | 6529.41  5935.61 | yes                 | Do you have a second house                                  |  34.00 |
|  2.88  0.002 | 6330.00  7536.22 | yes                 | Do you have a video-tape                                    |  40.00 |
|  2.65  0.004 | 5937.26  6786.27 | Paris region        | Region where the respondent lives                           |  51.00 |
|  2.43  0.008 | 5179.34  5246.40 | a lot               | Has the respondent been interested by the survey            | 117.00 |
|  2.17  0.015 | 6906.67  4638.46 | a lot better        | Your opinion on the evolution of the daily personal life    |  15.00 |
|  2.10  0.018 | 5377.78  6311.00 | No                  | The family is the only place where you feel well            |  72.00 |
| -2.01  0.022 | 3301.51  2735.77 | quite agree         | Persons like me often feel alone                            |  55.00 |
| -2.09  0.018 | 4044.99  3690.14 | Yes                 | The family is the only place where you feel well            | 193.00 |
| -2.14  0.016 | 3769.06  3573.01 | a lot               | Are you worried about the risk of having a serious illness  | 125.00 |
| -2.23  0.013 | 3196.12  3440.69 | a lot worse         | Your opinion on the evolution of French people life level   |  56.00 |
| -2.47  0.007 | 3319.48  2735.76 | a lot               | Are you worried about the risk of a nuclear plant accident  |  77.00 |
| -2.54  0.006 | 1971.43  1864.75 | unemployed person   | Current situation of the respondent                         |  21.00 |
| -2.57  0.005 |  760.00  1356.61 | student             | Current situation of the respondent                         |  10.00 |
| -2.66  0.004 | 2606.41  3255.77 | a lot worse         | Your opinion on the evolution of the daily personal life    |  39.00 |
| -2.86  0.002 | 3726.34  3277.03 | every day           | Do you watch TV                                             | 155.00 |
| -2.88  0.002 | 4069.97  3721.48 | no                  | Do you have a video-tape                                    | 227.00 |
| -2.89  0.002 | 4099.07  4253.85 | no                  | Do you have a second house                                  | 233.00 |
| -3.10  0.001 | 4110.81  4346.66 | no                  | Do you have a piano                                         | 239.00 |
| -3.18  0.001 | 3505.18  3271.07 | yes                 | Have you recently been nervous                              | 132.00 |
| -3.35  0.000 | 2449.00  2373.53 | yes                 | Have you recently been depressed                            |  50.00 |
| -3.49  0.000 | 2263.04  2043.80 | no qualifications   | Educational level of the respondent                         |  46.00 |
| -4.36  0.000 |  832.14  1563.89 | I have never worked | At the moment, do you have a professional activity          |  28.00 |
| -4.85  0.000 | 2691.10  3397.40 | no                  | At the moment, do you have a professional activity          | 103.00 |
| -6.54  0.000 |  488.54  1396.02 | housewife w/o prof. | Current situation of the respondent                         |  48.00 |
| -6.69  0.000 | 2751.33  2742.02 | female              | Sex of respondent                                           | 150.00 |
| -7.28  0.000 | 2311.41  3196.29 | missing category    | Do you have work-personal life problems                     | 130.00 |
| -7.28  0.000 | 2311.41  3196.29 | missing category    | Have you been unemployed during the last twelve months      | 130.00 |
|              | 4408.55  4575.34 | OVERALL             |                                                             | 267.00 |

DESCRIPTION BY CATEGORICAL VARIABLES

OF VARIABLE: Salary of the respondent

+------------+--------+----------------------------------------------------------------+-----------------+--------+
| TEST-VALUE | PROBA. | NUM . VARIABLE LABEL                                           | DEN. DEG. FREE. | FISHER |
+------------+--------+----------------------------------------------------------------+-----------------+--------+
|    8.56    | 0.000  |  5 . Current situation of the respondent                       |       261       | 21.44  |
|    8.48    | 0.000  | 18 . At the moment, do you have a professional activity        |       263       | 31.95  |
|    7.50    | 0.000  | 20 . Have you been unemployed during the last twelve months    |       264       | 35.01  |
|    7.28    | 0.000  | 19 . Do you have work-personal life problems                   |       264       | 32.89  |
|    6.98    | 0.000  |  3 . Sex of respondent                                         |       265       | 53.58  |
|    3.48    | 0.000  |  7 . Educational level of the respondent                       |       258       |  3.87  |
|    3.47    | 0.000  | 33 . Do you watch TV                                           |       263       |  6.57  |
|    3.38    | 0.001  | 24 . Have you recently been depressed                          |       265       | 11.69  |
|    3.21    | 0.001  | 23 . Have you recently been nervous                            |       265       | 10.50  |
|    3.12    | 0.002  | 16 . Do you have a piano                                       |       265       |  9.94  |
|    2.90    | 0.004  | 17 . Do you have a second house                                |       265       |  8.58  |
|    2.89    | 0.004  | 15 . Do you have a video-tape                                  |       265       |  8.50  |
|    2.04    | 0.021  | 52 . Has the respondent been interested by the survey          |       264       |  3.92  |
|    1.92    | 0.054  | 21 . Have you recently had headaches                           |       265       |  3.74  |
|    1.77    | 0.039  | 30 . Your opinion on the evolution of the daily personal life  |       261       |  2.38  |
|    1.56    | 0.059  | 25 . Are you satisfied of your health                          |       263       |  2.51  |
|    1.33    | 0.092  | 40 . Are you worried about the risk of a nuclear plant accident|       263       |  2.16  |
|    1.31    | 0.189  | 29 . Do you regularly impose restrictions                      |       265       |  1.73  |
|    1.24    | 0.107  |  8 . The family is the only place where you feel well          |       264       |  2.24  |
|    1.12    | 0.132  |  1 . Region where the respondent lives                         |       259       |  1.61  |
|    1.07    | 0.143  | 39 . Are you worried about the risk of unemployment            |       263       |  1.82  |
|    1.03    | 0.151  | 35 . The computer science diffusion is...                      |       263       |  1.78  |
|    1.02    | 0.154  | 34 . Do you think the society needs to change                  |       264       |  1.86  |
|    0.92    | 0.179  | 49 . Persons like me often feel alone                          |       263       |  1.64  |
|    0.89    | 0.186  | 31 . Your opinion on the evolution of French people life level |       260       |  1.48  |
|    0.86    | 0.194  | 36 . Are you worried about the risk of having a serious illness|       263       |  1.58  |
|    0.79    | 0.428  | 22 . Have you recently had backaches                           |       265       |  0.63  |
|    0.78    | 0.217  | 11 . Are you satisfied of your housing                         |       263       |  1.49  |
|    0.65    | 0.257  | 37 . Are you worried about the risk of a mugging               |       263       |  1.35  |
|    0.45    | 0.327  | 13 . Occupation status of housing                              |       262       |  1.16  |
|    0.22    | 0.412  | 27 . Do you have children                                      |       264       |  0.88  |
|    0.13    | 0.446  | 38 . Are you worried about the risk of a road accident         |       263       |  0.89  |
|    0.10    | 0.459  |  6 . Marital status                                            |       262       |  0.91  |
|    0.08    | 0.469  |  9 . Opinion about marriage                                    |       263       |  0.85  |
|   -0.15    | 0.561  | 32 . Your opinion on the life conditions in the future         |       261       |  0.79  |
|   -0.21    | 0.585  | 12 . Are you satisfied of your daily life                      |       263       |  0.65  |
|   -0.23    | 0.591  | 14 . The housing expenses are for you                          |       260       |  0.77  |
|   -0.53    | 0.702  | 10 . Housekeeping works, take care of children...              |       263       |  0.47  |
|   -0.59    | 0.724  |  2 . Urban area size (number of inhabitants)                   |       258       |  0.66  |
|   -0.64    | 0.740  | 48 . Your opinion on the justice running in 1986               |       261       |  0.55  |
+------------+--------+----------------------------------------------------------------+-----------------+--------+

 

SUMMARY STATISTICS OF CONTINUOUS VARIABLES

TOTAL COUNT: 315     TOTAL WEIGHT: 315.00
+---------------------------------------------------+------------------------+----------------------+
| NUM . IDEN - LABEL              COUNT     WEIGHT  |    MEAN     STD.DEV.   |  MINIMUM    MAXIMUM  |
+---------------------------------------------------+------------------------+----------------------+
|  4 . Age  - Age of respondent     267     267.00  |   43.61       16.88    |    18.00      83.00  |
| 26 . Nbpr - Number of persons in  267     267.00  |    3.04        1.43    |     1.00       8.00  |
| 28 . Nbef - Number of children    267     267.00  |    1.85        1.69    |     0.00       9.00  |
| 41 . Fami - Family, children : i  267     267.00  |    6.65        1.07    |     1.00       7.00  |
| 42 . Trav - Work, profession : i  267     267.00  |    5.90        1.57    |     1.00       7.00  |
| 43 . Lois - Free time, relax: im  267     267.00  |    5.30        1.43    |     0.00       7.00  |
| 44 . Amis - Friends, acquaintanc  267     267.00  |    5.18        1.41    |     1.00       7.00  |
| 45 . Part - Relatives, brothers,  267     267.00  |    5.63        1.44    |     1.00       7.00  |
| 46 . Reli - Religion : importanc  267     267.00  |    3.15        1.96    |     1.00       7.00  |
| 47 . Poli - Politic, political l  267     267.00  |    3.15        1.79    |     1.00       7.00  |
| 50 . PrFm - State benefits : ave  244     244.00  |  583.10      966.04    |     0.00    5100.00  |
| 51 . Salr - Salary of the respon  267     267.00  | 4408.55     4575.34    |     0.00   40000.00  |
+---------------------------------------------------+------------------------+----------------------+

CORRELATIONS WITH CONTINUOUS VARIABLES

OF VARIABLE: Salary of the respondent

+------------+-------+-------------+---------------------------------------------------+---------+
| TEST-VALUE | PROB. | CORRELATION | NUM . VARIABLE LABEL                              | WEIGHT  |
+------------+-------+-------------+---------------------------------------------------+---------+
|   99.90    | 0.000 |    1.000    | 51 . Salary of the respondent                     | 267.000 |
|   -2.53    | 0.006 |   -0.162    | 50 . State benefits : average monthly amount      | 244.000 |
+------------+-------+-------------+---------------------------------------------------+---------+
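A common construction for the test value attached to a correlation is Fisher's z transform scaled by the square root of n − 3, which is approximately standard normal under independence. Assuming this convention (SPAD's exact formula is not documented here), it reproduces the printed value to within rounding:

```python
import math

def correlation_test_value(r, n):
    """Test value for a correlation coefficient via Fisher's z transform."""
    z = 0.5 * math.log((1 + r) / (1 - r))  # Fisher's z
    return z * math.sqrt(n - 3)

# State benefits vs. salary: r = -0.162 on 244 weighted cases
tv = correlation_test_value(-0.162, 244)
print(round(tv, 1))  # about -2.5, close to the -2.53 printed above
```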


TABLE  -  CROSS TABLES 

With this procedure, you can obtain in one go an unlimited number of cross tables of counts, means or frequencies.

THE « TABLES » TAB 

This tab allows you to define the cross tables to create.

The table’s cells can display weights, row percentages, column percentages, averages and standard deviations, depending on the parameters and settings.

The scrolling menu allows you to define the cross tables you want to display, with or without supplementary information such as a mean or a frequency related to another variable.

If a variable appears in the “Means” column, each cell of the cross table will display the weighted average corresponding to the cases of the cell.

If a variable appears in the « Frequencies » column, each cell of the cross table will display the weighted sum of the values of the variable for the cases of the cell.

By clicking on “local filter”, you can define a specific filter for each command.
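The cell contents described above (counts, row and column percentages, and per-cell means of another variable) can be illustrated outside SPAD. A minimal pure-Python sketch on made-up data; the category and variable names are hypothetical:

```python
from collections import defaultdict

# Hypothetical cases: (row category, column category, value used for the "Means" column)
cases = [("yes", "male", 30), ("yes", "female", 40), ("no", "male", 20),
         ("no", "male", 25), ("yes", "female", 35)]

counts = defaultdict(int)    # cell counts
sums = defaultdict(float)    # per-cell sums of the "Means" variable
row_tot = defaultdict(int)
col_tot = defaultdict(int)

for r, c, v in cases:
    counts[(r, c)] += 1
    sums[(r, c)] += v
    row_tot[r] += 1
    col_tot[c] += 1

cell = ("yes", "female")
print(counts[cell])                           # cell count: 2
print(100 * counts[cell] / row_tot["yes"])    # row percentage of the cell
print(100 * counts[cell] / col_tot["female"]) # column percentage of the cell
print(sums[cell] / counts[cell])              # cell mean: 37.5
```

In SPAD the same quantities are computed with the case weights rather than raw counts.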


THE « PARAMETERS » TAB 


THE TABLE RESULTS 

CROSS-TABS

LIST OF COMMANDS

COMMAND 1
 TABLE 1   BY ROW    :  9 . Opinion about marriage
           BY COLUMN :  3 . Sex of respondent
COMMAND 2
 TABLE 2   BY ROW    :  9 . Opinion about marriage
           BY COLUMN :  3 . Sex of respondent
           MEANS OF  :  4 . Age of respondent

LIST OF CROSS-TABS

TABLE 1   BY ROW    : Opinion about marriage          TOTAL WEIGHT: 315.
          BY COLUMN : Sex of respondent

WEIGHT               |     male     |    female    |   OVERALL
COLUMN PERC.         |              |              |
ROW PERC.            |              |              |
---------------------+--------------+--------------+--------------
                     |      41      |      40      |      81
indissoluble         |    29.71     |    22.60     |    25.71
                     |    50.62     |    49.38     |   100.00
---------------------+--------------+--------------+--------------
                     |      39      |      69      |     108
dissolved serious pb |    28.26     |    38.98     |    34.29
                     |    36.11     |    63.89     |   100.00
---------------------+--------------+--------------+--------------
                     |      50      |      64      |     114
dissolved if agreem  |    36.23     |    36.16     |    36.19
                     |    43.86     |    56.14     |   100.00
---------------------+--------------+--------------+--------------
                     |       8      |       4      |      12
I do not know        |     5.80     |     2.26     |     3.81
                     |    66.67     |    33.33     |   100.00
---------------------+--------------+--------------+--------------
                     |     138      |     177      |     315
OVERALL              |   100.00     |   100.00     |   100.00
                     |    43.81     |    56.19     |   100.00
------------------------------------------------------------------------
KHI2 = 6.67 / 3 DEGREES OF FREEDOM / 0 EXPECTED FREQUENCIES LESS THAN 5
PROB. ( KHI2 > 6.67 ) = 0.083 / TEST-VALUE = 1.38
------------------------------------------------------------------------

TABLE 2   BY ROW    : Opinion about marriage          TOTAL WEIGHT: 315.
          BY COLUMN : Sex of respondent
          MEANS OF  : Age of respondent

WEIGHT               |     male     |    female    |   OVERALL
MEAN                 |              |              |
STD. DEV.            |              |              |
---------------------+--------------+--------------+--------------
                     |      41      |      40      |      81
indissoluble         |    45.829    |    48.325    |    47.062
                     |    17.234    |    17.084    |    17.206
---------------------+--------------+--------------+--------------
                     |      39      |      69      |     108
dissolved serious pb |    43.000    |    46.362    |    45.148
                     |    14.739    |    18.260    |    17.148
---------------------+--------------+--------------+--------------
                     |      50      |      64      |     114
dissolved if agreem  |    41.300    |    38.484    |    39.719
                     |    15.442    |    14.330    |    14.893
---------------------+--------------+--------------+--------------
                     |       8      |       4      |      12
I do not know        |    50.250    |    41.250    |    47.250
                     |    15.618    |     8.842    |    14.377
---------------------+--------------+--------------+--------------
                     |     138      |     177      |     315
OVERALL              |    43.645    |    43.842    |    43.756
                     |    16.007    |    17.015    |    16.581
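The KHI2 statistic printed under Table 1 can be recomputed directly from the cell counts, using the usual chi-square formula with expected counts derived from the margins. A pure-Python sketch:

```python
counts = {  # rows: opinion about marriage; columns: sex (male, female)
    "indissoluble": (41, 40),
    "dissolved serious pb": (39, 69),
    "dissolved if agreem": (50, 64),
    "I do not know": (8, 4),
}

col_tot = [sum(v[j] for v in counts.values()) for j in (0, 1)]  # 138, 177
n = sum(col_tot)                                                # 315

chi2 = 0.0
for male, female in counts.values():
    row_tot = male + female
    for j, obs in enumerate((male, female)):
        expected = row_tot * col_tot[j] / n
        chi2 += (obs - expected) ** 2 / expected

# (4 - 1) * (2 - 1) = 3 degrees of freedom, as in the output
print(round(chi2, 2))  # 6.67
```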


BIVAR  -  BIVARIATE ANALYSIS 

The BIVAR procedure lets you characterize a sample from the viewpoint of two particular
continuous variables (AXES variables or base variables). The sample can be described by categorical variables and by other continuous variables.

THE « VARIABLES » TAB 

With this tab, the SPAD user selects the two continuous variables for the bivariate analysis.

It is possible to include in the analysis some supplementary variables (whether continuous or categorical).

The graph editor of the BIVAR method is the same as the one used for the factorial analyses. The capabilities of the graph editor will be described in the section “Factorial analyses”.


FACTORIAL ANALYSES WITH SPAD

PCA : Principal Component Analysis (PCA)

SCA : Simple Correspondence Analysis (SCA)

MCA : Multiple Correspondence Analysis (MCA)

DEFAC : Factors description

SPAD provides the main techniques in multidimensional exploratory analysis, combined with procedures for clustering. One area of application concerns the processing of large-scale surveys in market research and socio-economic research.

The main applications of factorial analyses are: (1) to reduce the number of dimensions and (2) to detect structure in the relationships between variables. Therefore, factor analysis is applied as a data reduction or structure detection method.


VOCABULARY 

Active variables           Variables used to perform the factorial analysis.

Supplementary variables    Variables that are not used to perform the original analysis but used to illustrate the main results of the analysis.

Contribution               Criterion that measures the contribution of an element (category, variable, frequency or case) to the inertia (total inertia, dimension’s inertia…).

Cosines²                   Criterion that measures the quality of representation of an element (category, variable, case or frequency) on each dimension.

Axes, factors, dimensions  These terms correspond to the factors computed or extracted by the analysis. Consecutive factors are uncorrelated, or orthogonal to each other. Factors are extracted successively, each one maximizing the variability remaining in the active data.


PCA  -  PRINCIPAL COMPONENT ANALYSIS 

This method performs the principal component analysis of a sample of cases described with continuous variables. The analysis can be performed on the original variables or on normed variables (centered and standardized), depending on whether the active variables are on the same scale or not. It is possible to introduce supplementary elements such as cases, other continuous variables or categorical variables.

•  Import the Sba dataset Cars.sba.

•  Drag and drop the PCA method on the Cars dataset as follows.

The two goals of the analysis are:

  Capture the main interrelationships between correlated variables in a small number of summary characteristics: dimension reduction

  Identify automobile models with similar attributes: a useful step for developing a clustering or classification model

The dataset contains measurements on 6 variables for 24 models: cubic capacity, power,speed, weight, length and width.

Due to strong differences in measurement scales, we will perform a PCA on normed variables.

KIDEN                 Cubic capacity   Power   Speed   Weight   Length   Width

Honda civic 1396 90 174 850 369 166

Peugeot 205 Rallye 1294 103 189 805 370 157

Seat Ibiza SX I 1461 100 181 925 363 161

Citroën AX Sport 1294 95 184 730 350 160

Renault 19 1721 92 180 965 415 169

Fiat Tipo 1580 83 170 970 395 170

Peugeot 405 1769 90 180 1080 440 169

Renault 21 2068 88 180 1135 446 170

Citroën BX 1769 90 182 1060 424 168

Opel Omega 1998 122 190 1255 473 177

Peugeot 405 Break 1905 125 194 1120 439 171

Ford Sierra 1993 115 185 1190 451 172


Renault Espace 1995 120 177 1265 436 177

Nissan Vanette 1952 87 144 1430 436 169

VW Caravelle 2109 112 149 1320 457 184

 Audi 90 Quattro 1994 160 214 1220 439 169

BMW 530i 2986 188 226 1510 472 175

Rover 827i 2675 177 222 1365 469 175

Renault 25 2548 182 226 1350 471 180

BMW 325iX 2494 171 208 1600 432 164

Ford Scorpio 2933 150 200 1345 466 176

Fiat Uno 1116 58 145 780 364 155

Peugeot 205 1580 80 159 880 370 156

Ford Fiesta 1117 50 135 810 371 162

The matrix plot, performed with the STATS method, gives a good overview of the pairwise relationships between variables.


The SETTING OPTIONS 

THE « VARIABLES » TAB 

This tab allows the SPAD user to define the following elements:

  Active continuous variables

  Supplementary continuous variables

  Supplementary categorical variables

In our example, we select all the available continuous variables as active. We do not have any more variables available for supplementary information.


THE « CASES » TAB 

The Cases tab allows you to define the role of the cases in the analysis.

The cases retained are the ACTIVE cases; those not retained are called ILLUSTRATIVE or SUPPLEMENTARY. By using the selections by list or interval, we can also define the ABANDONED cases (which are neither active nor illustrative).

All the calculations that lead to the factorial planes, to the hierarchical classification tree and to the final partitions are carried out only on the active cases. The illustrative cases may be projected onto the factorial planes constructed and, when the partition into classes is built, re-assigned to the class to which they are closest, or grouped in a ‘’missing data’’ class.

The abandoned cases are completely ignored in the calculations and automatically assigned to a missing data class in the partitions.

If you conduct many analyses on a particular sub-population, it may be preferable to create a BASE corresponding to it. To do this, use the Recoding chain in the Tools menu.

In the Cars example, we select all the cases as active.

THE « PARAMETERS » TAB 

NORMED PCA AND NOT NORMED PCA

Case coordinates are not displayed by default.


Normed PCA means that all the active variables are previously centered and standardized by SPAD. The consequence is that all the variables are assigned the same contribution to the overall inertia. When the PCA is not normed (only centered), the distance between the variable and the origin is equal to the variance of the variable.

Most of the time, it is advised to perform a normed analysis in order to assign the same importance to each active variable. It is particularly recommended when the measurement scales are different.

In our example, we can see that the measurement scales are strongly different. Thus, we will perform a normed PCA.
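The statement that every variable contributes equally to the overall inertia in a normed PCA can be checked directly: after standardization each variable has variance 1, so the total inertia (the sum of the eigenvalues of the correlation matrix) equals the number of active variables. A small numpy sketch on made-up data (the dimensions merely mirror the Cars example, 24 cases and 6 variables):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(24, 6)) @ rng.normal(size=(6, 6))  # 24 cases, 6 correlated variables

# Normed PCA: center and standardize, then diagonalize the correlation matrix
Z = (X - X.mean(axis=0)) / X.std(axis=0)
corr = Z.T @ Z / Z.shape[0]
eigenvalues = np.linalg.eigvalsh(corr)[::-1]  # sorted, largest first

print(round(eigenvalues.sum(), 6))  # 6.0: one unit of inertia per variable
```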

RETAINED COORDINATES 

The number of retained coordinates is useful for the methods that follow the PCA in the chain. These methods can be DEFAC (factors description) and RECIP/SEMIS (clustering).


of variables. The third column contains the cumulative variance extracted. The variances extracted by the factors are called the eigenvalues. This name derives from the computational issues involved.

Eigenvalues and the Number-of-Factors Problem

Now that we have a measure of how much variance each successive factor extracts, we can return to the question of how many factors to retain. By its nature this is an arbitrary decision. However, there are some guidelines that are commonly used and that, in practice, seem to yield the best results.

The Kaiser criterion. First, we can retain only factors with eigenvalues greater than 1. In essence, this is like saying that unless a factor extracts at least as much as the equivalent of one original variable, we drop it. This criterion was proposed by Kaiser (1960) and is probably the one most widely used. In our example above, using this criterion, we would retain 1 factor (principal component).

The scree test. A graphical method is the scree test, first proposed by Cattell (1966). We can plot the eigenvalues shown above in a simple line plot.

[Scree plot: eigenvalues (0.0 to 5.0) plotted against the factor number (1 to 6)]

Cattell suggests finding the place where the smooth decrease of eigenvalues appears to level off to the right of the plot. To the right of this point, presumably, one finds only "factorial scree" ("scree" is the geological term referring to the debris which collects on the lower part of a rocky slope). According to this criterion, we would probably retain 1 or 2 factors in our example.
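Applied to the eigenvalues printed for the cars example, the two rules give the counts quoted above. A short sketch using the five eigenvalues listed in the output:

```python
eigenvalues = [4.6173, 0.8788, 0.3035, 0.1055, 0.0732]  # from the PCA output above

# Kaiser criterion: keep factors extracting more inertia than one original variable
kaiser = [ev for ev in eigenvalues if ev > 1.0]
print(len(kaiser))  # 1 factor retained

# Scree-style look at the successive drops: the first drop dwarfs the rest,
# which is why the scree test suggests 1 or 2 factors
drops = [a - b for a, b in zip(eigenvalues, eigenvalues[1:])]
print([round(d, 2) for d in drops])
```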

RESEARCH OF IRREGULARITIES (THIRD DIFFERENCES)

+--------------+--------------+
| IRREGULARITY | IRREGULARITY |
| BETWEEN      | VALUE        |
+--------------+--------------+
| 1  --  2     |     -2785.86 |
+--------------+--------------+

RESEARCH OF IRREGULARITIES (SECOND DIFFERENCES)

+--------------+--------------+
| IRREGULARITY | IRREGULARITY |
| BETWEEN      | VALUE        |
+--------------+--------------+
| 1  --  2     |      3163.20 |
| 2  --  3     |       377.34 |
+--------------+--------------+


ANDERSON'S LAPLACE INTERVALS

WITH 0.95 THRESHOLD

+--------+---------------------------------------------------+
| NUMBER | LOWER LIMIT      EIGENVALUE      UPPER LIMIT      |
+--------+---------------------------------------------------+
|   1    |   1.9486           4.6173          7.2860         |
|   2    |   0.3709           0.8788          1.3868         |
|   3    |   0.1281           0.3035          0.4789         |
|   4    |   0.0445           0.1055          0.1665         |
|   5    |   0.0309           0.0732          0.1154         |
+--------+---------------------------------------------------+

[ASCII chart: length and relative position of the five confidence intervals]

Third and second differences, as well as Anderson's Laplace intervals, are other guidelines to help the SPAD user choose the number of dimensions to retain for further analyses.

LOADINGS OF VARIABLES ON AXES 1 TO 5

ACTIVE VARIABLES
----------------------+------------------------------+------------------------------+------------------------------
 VARIABLES            | LOADINGS                     | VARIABLE-FACTOR CORRELATIONS | NORMED EIGENVECTORS
 IDEN - SHORT LABEL   |    1     2     3     4     5 |    1     2     3     4     5 |    1     2     3     4     5
----------------------+------------------------------+------------------------------+------------------------------
 CYLI - Cubic capacity| 0.96  0.01 -0.15  0.04 -0.23 | 0.96  0.01 -0.15  0.04 -0.23 | 0.45  0.01 -0.27  0.11 -0.84
 PUIS - Power         | 0.90  0.38 -0.02 -0.16  0.04 | 0.90  0.38 -0.02 -0.16  0.04 | 0.42  0.41 -0.03 -0.49  0.15
 VITE - Speed         | 0.75  0.62  0.20  0.08  0.04 | 0.75  0.62  0.20  0.08  0.04 | 0.35  0.66  0.37  0.26  0.13
 POID - Weight        | 0.91 -0.18 -0.35 -0.06  0.11 | 0.91 -0.18 -0.35 -0.06  0.11 | 0.42 -0.19 -0.63 -0.18  0.42
 LONG - Length        | 0.92 -0.30  0.05  0.22  0.07 | 0.92 -0.30  0.05  0.22  0.07 | 0.43 -0.32  0.10  0.69  0.26
 LARG - Width         | 0.80 -0.48  0.34 -0.14 -0.02 | 0.80 -0.48  0.34 -0.14 -0.02 | 0.37 -0.51  0.62 -0.42 -0.06
----------------------+------------------------------+------------------------------+------------------------------

For a normed PCA, the correlations (variable-factor) and the loadings are equivalent. The first factor is generally more highly correlated with the variables than the second factor. This is to be expected because, as previously described, these factors are extracted successively and will account for less and less variance overall.

Normed eigenvectors are the coefficients that describe the linear relationship between the active normed variables and the factors; in this example, we have:

Factor 1 = 0.45 × (CYLI − Mean(CYLI)) / StDev(CYLI)
         + 0.42 × (PUIS − Mean(PUIS)) / StDev(PUIS)
         + 0.35 × (VITE − Mean(VITE)) / StDev(VITE) + ...

Note: SPAD prints neither the contributions nor the squared cosines for the active variables. However, it is possible to calculate them this way:

Cos²(j, α) = Loading²(j, α)       for a normed PCA

Cos²(j, α) = Correlation²(j, α)   for both normed and not normed PCA

and

Contribution(j, α) = NormedEigenvector²(j, α)
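For instance, for the Speed variable on axis 1 (loading 0.75 and normed eigenvector coefficient 0.35 in the table above), the formulas of the note give:

```python
loading = 0.75       # Speed on axis 1 (normed PCA, so loading = correlation)
eigvec_coef = 0.35   # normed eigenvector coefficient of Speed on axis 1

cos2 = loading ** 2              # quality of representation of Speed on axis 1
contribution = eigvec_coef ** 2  # share of the axis-1 inertia due to Speed

print(round(cos2, 2))          # 0.56
print(round(contribution, 2))  # about 0.12, i.e. roughly 12% of the axis
```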


FACTOR SCORES, CONTRIBUTIONS AND SQUARED COSINES OF CASES

AXES 1 TO 5
+----------------------------------+-------------------------------+---------------------------+--------------------------+
| CASES                            | FACTOR SCORES                 | CONTRIBUTIONS             | SQUARED COSINES          |
| IDENTIFIER        REL.WT.  DISTO |     1     2     3     4     5 |    1    2    3    4    5  |    1    2    3    4    5 |
+----------------------------------+-------------------------------+---------------------------+--------------------------+
| Honda civic          4.17   4.59 | -2.01  0.32  0.50 -0.44 -0.10 |  3.6  0.5  3.4  7.6  0.6  | 0.88 0.02 0.05 0.04 0.00 |
| Peugeot 205 Rallye   4.17   7.37 | -2.25  1.49  0.14  0.09  0.19 |  4.6 10.6  0.3  0.3  2.1  | 0.69 0.30 0.00 0.00 0.00 |
| Seat Ibiza SX I      4.17   4.73 | -1.92  0.94 -0.06 -0.36  0.00 |  3.3  4.2  0.1  5.0  0.0  | 0.78 0.19 0.00 0.03 0.00 |
| Citroën AX Sport     4.17   8.78 | -2.60  1.29  0.47 -0.32 -0.15 |  6.1  7.9  3.0  4.0  1.2  | 0.77 0.19 0.02 0.01 0.00 |
| Renault 19           4.17   0.92 | -0.78 -0.16  0.48  0.20 -0.12 |  0.6  0.1  3.1  1.6  0.8  | 0.66 0.03 0.25 0.04 0.01 |
| Fiat Tipo            4.17   2.18 | -1.30 -0.43  0.43 -0.22 -0.10 |  1.5  0.9  2.5  2.0  0.6  | 0.77 0.09 0.08 0.02 0.00 |
| Peugeot 405          4.17   0.71 | -0.30 -0.46  0.21  0.58  0.16 |  0.1  1.0  0.6 13.1  1.4  | 0.12 0.30 0.06 0.47 0.04 |
| Renault 21           4.17   0.96 |  0.15 -0.64  0.01  0.67 -0.21 |  0.0  1.9  0.0 17.8  2.5  | 0.02 0.42 0.00 0.47 0.05 |
| Citroën BX           4.17   0.54 | -0.52 -0.20  0.17  0.40  0.04 |  0.2  0.2  0.4  6.2  0.1  | 0.50 0.07 0.06 0.29 0.00 |
| Opel Omega           4.17   3.25 |  1.45 -0.79  0.51  0.31  0.42 |  1.9  3.0  3.5  3.7 10.0  | 0.64 0.19 0.08 0.03 0.05 |
| Peugeot 405 Break    4.17   0.55 |  0.57  0.13  0.39  0.15  0.19 |  0.3  0.1  2.0  0.9  2.1  | 0.58 0.03 0.27 0.04 0.07 |
| Ford Sierra          4.17   0.82 |  0.70 -0.43  0.14  0.30  0.16 |  0.4  0.9  0.3  3.5  1.4  | 0.60 0.23 0.02 0.11 0.03 |
| Renault Espace       4.17   1.77 |  0.86 -0.87  0.20 -0.44  0.13 |  0.7  3.6  0.5  7.7  0.9  | 0.42 0.43 0.02 0.11 0.01 |
| Nissan Vanette       4.17   4.73 | -0.11 -1.69 -1.33 -0.05  0.24 |  0.0 13.6 24.4  0.1  3.3  | 0.00 0.61 0.38 0.00 0.01 |
| VW Caravelle         4.17   7.58 |  1.14 -2.39  0.21 -0.69 -0.06 |  1.2 27.1  0.6 18.7  0.2  | 0.17 0.75 0.01 0.06 0.00 |
| Audi 90 Quattro      4.17   3.43 |  1.39  1.10  0.19 -0.03  0.48 |  1.7  5.7  0.5  0.0 13.0  | 0.56 0.35 0.01 0.00 0.07 |
| BMW 530i             4.17  15.98 |  3.88  0.85 -0.35 -0.04 -0.30 | 13.6  3.4  1.7  0.1  5.1  | 0.94 0.04 0.01 0.00 0.01 |
| Rover 827i           4.17  10.52 |  3.15  0.75  0.13  0.05 -0.13 |  8.9  2.7  0.2  0.1  0.9  | 0.94 0.05 0.00 0.00 0.00 |
| Renault 25           4.17  12.39 |  3.39  0.57  0.71 -0.23  0.07 | 10.4  1.5  6.9  2.1  0.3  | 0.93 0.03 0.04 0.00 0.00 |
| BMW 325iX            4.17   8.92 |  2.20  1.17 -1.59 -0.24  0.32 |  4.4  6.5 34.6  2.3  6.0  | 0.54 0.15 0.28 0.01 0.01 |
| Ford Scorpio         4.17   8.28 |  2.74 -0.15 -0.19  0.13 -0.83 |  6.8  0.1  0.5  0.6 39.1  | 0.91 0.00 0.00 0.00 0.08 |
| Fiat Uno             4.17  14.29 | -3.73  0.03 -0.50  0.19  0.01 | 12.6  0.0  3.5  1.4  0.0  | 0.97 0.00 0.02 0.00 0.00 |
| Peugeot 205          4.17   7.70 | -2.60  0.46 -0.72  0.12 -0.39 |  6.1  1.0  7.1  0.6  8.4  | 0.88 0.03 0.07 0.00 0.02 |
| Ford Fiesta          4.17  12.99 | -3.49 -0.87 -0.13 -0.11 -0.03 | 11.0  3.6  0.2  0.5  0.1  | 0.94 0.06 0.00 0.00 0.00 |
+----------------------------------+-------------------------------+---------------------------+--------------------------+

DISTO: the squared distance between the case and the center of gravity of the overall sample. It helps to identify the "average" cars (close to the center of gravity) and the more specific ones, far from the center of gravity.
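As a didactic sketch (not SPAD's code), DISTO can be reproduced for a standardized PCA: the center of gravity is the origin, so DISTO is the sum of squared standardized values, which also equals the sum of squared factor scores over all axes. The data below are random stand-ins for the car table.

```python
import numpy as np

# Random stand-in data: 24 cases x 6 active variables (not the real car table)
rng = np.random.default_rng(0)
X = rng.normal(size=(24, 6))
Z = (X - X.mean(axis=0)) / X.std(axis=0)     # standardization: centroid -> origin

disto = (Z ** 2).sum(axis=1)                 # squared distance to the centroid

# Cross-check: DISTO equals the sum of squared factor scores over ALL axes
U, s, _ = np.linalg.svd(Z / np.sqrt(len(Z)), full_matrices=False)
F = np.sqrt(len(Z)) * U * s                  # PCA factor scores
assert np.allclose(disto, (F ** 2).sum(axis=1))
```

On the table above this identity can be checked by hand: for the Honda civic, (−2.01)² + 0.32² + 0.50² + (−0.44)² + (−0.10)² ≈ 4.59, its printed DISTO.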


Factorial Analyses with SPAD


THE FACTORIAL GRAPH EDITOR 

To access the factorial graph editor, click on the dedicated icon.

To create a new factorial graph, select « Graph » – « New »; the following window appears:

The preselection step allows you to select the different elements to display in the graph:

- Active or supplementary cases
- Active or supplementary variables
- …

If you forget to select an element, you have to create a new graph and redo the preselection.

THE TOOL BAR OF THE GRAPH EDITOR 

The buttons of the first row (captions from the original figure) are:

- Points selection
- Total unselection
- Delete the labels
- Cancel the ghosts
- Factors selection
- Framing selection
- Write the labels
- Set as ghost


PCA - Principal Component Analysis


The remaining buttons are:

- Information on points
- Vertical symmetric view
- Correlation circle
- Refresh
- Horizontal symmetric view

SAVE A GRAPH

Internal save is dependent on the chain. If the chain is re-executed, or if the user deletes the results of the chain, these internal saves are deleted. This type of save uses the commands Save and Save as – Internal save of the Graphics menu.

When you save in internal format, you give a TITLE to the saved graphic. Later you can reload this save with the command Open – Internal save of the Graphics menu.

The advantage of the internal format is that all the annotation functions and the properties of the factorial planes remain available.

The save in archive format is a save which is independent of the chain.

This type of save is made using the command Save as – Save archive of the Graphics menu. When saving in archive format, you give a NAME to the saved graphic, with the mandatory extension .GFA.

Later, you can recover this save with the command Open – Save archive in the Graphics menu. This save is independent of the chain. Some formattings are no longer possible in this type of save, in particular the formatting of cases.

The editor for the factorial planes also lets you save the graphics in .BMP or .PCX format. These images can then be inserted into a word-processor document. The EMF Metafile format gives the best image quality. This type of save is made with the command Save as – Screen Image BMP/PCX.



GENERAL PRINCIPLES 

The construction of a graphic after an analysis follows these general principles:

Go to the New Graphics menu, which opens the pre-selections dialogue box. For a single analysis, you can open several graphics at once through the Graphics menu and make different pre-selections. All the graphics you create can be saved in internal or archive format.

To modify your graph, apply the following rule:

- Select the points with the tool bar or the selection menu
- Format them with the Format menu
- Deselect to see the effect of the embellishments.

IMPORTANT
To manipulate (move, change, etc.) the labels and the texts on a graphic, or to enlarge the frame, you have to be in standard mode, that is: no selection-mode button is highlighted and the status bar is empty.





SCA  -  SIMPLE CORRESPONDENCE ANALYSIS 

This procedure performs a simple correspondence analysis (SCA) on a contingency table, or on any table of non-negative numbers.

Simple correspondence analysis is a powerful statistical tool for the graphical analysis of contingency tables.

The result of a simple correspondence analysis is a two-dimensional graphical representation of the association between the rows and the columns of the table. The plot contains a point for each row and each column of the table: rows with similar patterns of counts produce points that are close together, and columns with similar patterns of counts produce points that are close together.

Simple correspondence analysis analyzes a contingency table made up of one or more column variables and one or more row variables.

To illustrate this method, consider the following dataset, a typical two-way contingency table. The data deal with the perception of different kinds of alcohol.

Select the SPAD dataset « ALCOOL.SBA » and import it.

                                     PASTIS  WHISKY  MARTINI  SUZE  VODKA  GIN  MALIBU  BEER
Like the taste                           49      50       42    18     25   23      25    59
With friends                             83      83       76    60     69   68      69    74
To relax oneself                         61      61       51    32     38   39      39    72
Become expensive                         60      88       42    41     75   70      61    19
Refreshing                               78      22       18    19     17   19      14    80
Not elegant                              26      11       13    17     13   11      13    29
Friendly product                         64      64       56    34     45   42      46    68
Good before meals                        88      79       85    64     45   46      37    41
Good during the day                      24      21       12    10     13   12      13    85
Good during evening                       7      61       12    11     53   50      48    54
For all year long                        83      87       85    79     83   82      80    90
Liked by youngs                          45      77       36    16     65   69      76    89
Good for guests                          88      92       87    60     70   67      67    81
Oldy, not trendy                         12       4       13    38      5    6       8     7
As well for men as for women             50      62       69    43     49   51      61    60
Close to me                              38      41       27    11     16   18      17    49
By habits                                36      30       24    16     19   19      17    40
Make snobish                              3      35        9     8     28   25      21     4
We can mix it                            43      87       29    32     82   80      43    40
For night life / bars / nightclubs       12      91       27    16     84   81      72    67
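The SCA computation itself can be sketched with the classical singular value decomposition of the standardized residuals (a didactic sketch, not SPAD's implementation). The matrix below is a 4×4 excerpt of the table above (rows Like the taste, With friends, To relax oneself, Refreshing; columns PASTIS, WHISKY, MARTINI, SUZE).

```python
import numpy as np

# 4x4 excerpt of the ALCOOL counts, for illustration only
N = np.array([[49, 50, 42, 18],
              [83, 83, 76, 60],
              [61, 61, 51, 32],
              [78, 22, 18, 19]], dtype=float)

P = N / N.sum()                  # correspondence matrix
r = P.sum(axis=1)                # row masses (the REL. WT column)
c = P.sum(axis=0)                # column masses
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))  # standardized residuals

U, sv, Vt = np.linalg.svd(S, full_matrices=False)
eig = sv ** 2                    # eigenvalues; their sum is the trace (total inertia)
pct = 100 * eig / eig.sum()      # the PERCENTAGE column of the eigenvalue histogram

row_coord = (U * sv) / np.sqrt(r)[:, None]     # principal row coordinates
col_coord = (Vt.T * sv) / np.sqrt(c)[:, None]  # principal column coordinates
```

The sum of the eigenvalues equals the squared Frobenius norm of S, which is why the output prints the same value for "TRACE BEFORE DIAGONALISATION" and "SUM OF EIGENVALUES".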



The SETTING OPTIONS 

THE « COLUMNS » TAB 

Active frequencies: all

THE « ROWS » TAB 

This tab is identical to the « Cases » tab available for the descriptive statistics methods.



THE « PARAMETERS » TAB 

In order to display the rows results in Excel sheets, click on the « Options » button and select Yes.



THE SCA RESULTS 

SIMPLE CORRESPONDENCE ANALYSIS

EIGENVALUES
COMPUTATIONS PRECISION SUMMARY : TRACE BEFORE DIAGONALISATION.. 0.1345
                                 SUM OF EIGENVALUES............ 0.1345

HISTOGRAM OF THE FIRST 7 EIGENVALUES

+--------+------------+------------+------------+------------------------------------------+
| NUMBER | EIGENVALUE | PERCENTAGE | CUMULATED  |                                          |
|        |            |            | PERCENTAGE |                                          |
+--------+------------+------------+------------+------------------------------------------+
|   1    |   0.0664   |   49.37    |   49.37    | **************************************** |
|   2    |   0.0449   |   33.34    |   82.72    | ***************************              |
|   3    |   0.0124   |    9.24    |   91.96    | *******                                  |
|   4    |   0.0069   |    5.14    |   97.09    | ****                                     |
|   5    |   0.0029   |    2.18    |   99.27    | **                                       |
|   6    |   0.0008   |    0.63    |   99.90    | *                                        |
|   7    |   0.0001   |    0.10    |  100.00    | *                                        |
+--------+------------+------------+------------+------------------------------------------+

COORDINATES, CONTRIBUTIONS OF FREQUENCIES ON AXES 1 TO 5

ACTIVE FREQUENCIES
+----------------------------------+-------------------------------+---------------------------+---------------------------+
| FREQUENCIES                      | COORDINATES                   | CONTRIBUTIONS             | SQUARED COSINES           |
| IDEN - LABEL    REL.WT.   DISTO  |    1     2     3     4     5  |    1    2    3    4    5  |    1    2    3    4    5  |
+----------------------------------+-------------------------------+---------------------------+---------------------------+
| PAST - PASTIS    13.12     0.17  | -0.36 -0.05  0.16  0.11 -0.04 | 26.3  0.6 26.5 23.5  8.3  | 0.76 0.01 0.14 0.07 0.01  |
| WHIS - WHISKY    15.83     0.05  |  0.19  0.02  0.09 -0.02  0.09 |  8.4  0.1  9.7  0.6 39.5  | 0.67 0.01 0.15 0.00 0.14  |
| MART - MARTINI   11.23     0.11  | -0.17 -0.21  0.09 -0.17  0.00 |  4.9 10.5  7.2 49.7  0.0  | 0.26 0.38 0.07 0.28 0.00  |
| SUZE - SUZE       8.63     0.30  | -0.22 -0.43 -0.24  0.05  0.04 |  6.3 35.6 40.7  3.2  3.9  | 0.16 0.62 0.20 0.01 0.00  |
| VODK - VODKA     12.35     0.10  |  0.30  0.00 -0.01  0.06  0.00 | 16.8  0.0  0.0  7.2  0.0  | 0.94 0.00 0.00 0.04 0.00  |
| GIN  - GIN       12.13     0.08  |  0.28  0.00 -0.01  0.06 -0.01 | 14.3  0.0  0.1  5.9  0.7  | 0.94 0.00 0.00 0.04 0.00  |
| MALI - MALIBU    11.42     0.07  |  0.21  0.02 -0.06 -0.07 -0.11 |  7.9  0.1  3.0  8.7 45.9  | 0.67 0.00 0.05 0.08 0.17  |
| BIER - BEER      15.30     0.23  | -0.26  0.39 -0.10 -0.02  0.02 | 15.2 53.1 12.7  1.1  1.7  | 0.28 0.67 0.04 0.00 0.00  |
+----------------------------------+-------------------------------+---------------------------+---------------------------+

COORDINATES, CONTRIBUTIONS AND SQUARED COSINES OF CASES

AXES 1 TO 5
+-----------------------------------+-------------------------------+---------------------------+---------------------------+
| CASES                             | COORDINATES                   | CONTRIBUTIONS             | SQUARED COSINES           |
| IDENTIFIER        REL.WT.  DISTO  |    1     2     3     4     5  |    1    2    3    4    5  |    1    2    3    4    5  |
+-----------------------------------+-------------------------------+---------------------------+---------------------------+
| Like the taste       4.02   0.08  | -0.21  0.10  0.12 -0.08  0.06 |  2.6  0.9  4.4  3.9  5.2  | 0.55 0.13 0.18 0.09 0.05  |
| With friends         8.04   0.01  | -0.04 -0.10  0.00 -0.01 -0.04 |  0.1  1.9  0.0  0.2  3.9  | 0.09 0.79 0.00 0.01 0.10  |
| To relax oneself     5.43   0.03  | -0.14  0.04  0.04 -0.04  0.02 |  1.6  0.2  0.7  1.1  0.7  | 0.79 0.07 0.06 0.06 0.02  |
| Become expensive     6.30   0.12  |  0.25 -0.19  0.09  0.11 -0.03 |  5.7  5.0  4.2 10.1  1.8  | 0.51 0.30 0.07 0.09 0.01  |
| Refreshing           3.69   0.48  | -0.56  0.30  0.07  0.25 -0.07 | 17.5  7.3  1.5 33.0  6.9  | 0.66 0.19 0.01 0.13 0.01  |
| Not elegant          1.84   0.14  | -0.32  0.03 -0.12  0.11 -0.08 |  2.9  0.0  2.0  3.0  3.9  | 0.76 0.01 0.10 0.08 0.05  |
| Friendly product     5.79   0.01  | -0.10  0.00  0.05 -0.04 -0.01 |  0.8  0.0  1.2  1.6  0.2  | 0.67 0.00 0.18 0.13 0.01  |
| Good before meals    6.70   0.14  | -0.18 -0.30  0.11 -0.03  0.06 |  3.1 13.0  6.7  0.8  8.5  | 0.23 0.64 0.09 0.01 0.03  |
| Good during the day  2.62   0.69  | -0.43  0.66 -0.25 -0.04  0.11 |  7.2 25.1 13.0  0.5 10.6  | 0.26 0.63 0.09 0.00 0.02  |
| Good during evening  4.09   0.25  |  0.40  0.26 -0.12 -0.01  0.03 | 10.0  6.0  5.1  0.0  0.9  | 0.66 0.27 0.06 0.00 0.00  |
| For all year long    9.24   0.02  | -0.02 -0.11 -0.08 -0.01 -0.03 |  0.1  2.7  4.2  0.3  3.7  | 0.02 0.60 0.26 0.01 0.05  |
| Liked by youngs      6.53   0.09  |  0.17  0.22 -0.02 -0.03 -0.09 |  2.8  7.0  0.2  0.7 17.5  | 0.33 0.55 0.01 0.01 0.09  |
| Good for guests      8.45   0.02  | -0.06 -0.10  0.03 -0.04 -0.01 |  0.5  1.7  0.7  2.2  0.2  | 0.23 0.57 0.07 0.11 0.00  |
| Oldy, not trendy     1.28   1.41  | -0.46 -0.84 -0.68  0.11  0.08 |  4.1 20.2 47.5  2.3  2.9  | 0.15 0.50 0.33 0.01 0.00  |
| As well for men as for w
|                      6.15   0.03  | -0.01 -0.09 -0.02 -0.14 -0.06 |  0.0  1.2  0.3 16.3  6.6  | 0.00 0.28 0.02 0.59 0.10  |
| Close to me          3.00   0.11  | -0.22  0.19  0.13 -0.05  0.10 |  2.2  2.3  4.1  1.0  9.6  | 0.42 0.30 0.15 0.02 0.08  |
| By habits            2.78   0.05  | -0.21  0.08  0.06  0.02  0.03 |  1.8  0.4  0.8  0.2  0.6  | 0.80 0.11 0.06 0.01 0.01  |
| Make snobish         1.84   0.40  |  0.61 -0.09  0.03  0.02  0.09 | 10.4  0.4  0.1  0.1  4.6  | 0.95 0.02 0.00 0.00 0.02  |
| We can mix it        6.02   0.13  |  0.31 -0.03  0.03  0.16  0.07 |  8.6  0.1  0.5 22.3 11.4  | 0.72 0.01 0.01 0.19 0.04  |
| For night life / bars /
|                      6.21   0.23  |  0.44  0.18 -0.07 -0.02  0.01 | 17.9  4.4  2.7  0.3  0.1  | 0.84 0.14 0.02 0.00 0.00  |
+-----------------------------------+-------------------------------+---------------------------+---------------------------+




The following graph has been designed with the SPAD AMADO procedure. Using the SCA results, rows and columns are ranked by decreasing first-factor coordinates. It gives a visual structure to the table. The width of a column is proportional to its frequency.

[Figure: AMADO display of the ALCOOL table. Drinks, ordered by decreasing first-factor coordinate: VODKA, GIN, MALIBU, WHISKY, MARTINI, SUZE, BEER, PASTIS. Statements, in the same order: Make snobish; For night life / bars / nightclubs; Good during evening; We can mix it; Become expensive; Liked by youngs; As well for men as for women; For all year long; With friends; Good for guests; Friendly product; To relax oneself; Good before meals; Like the taste; By habits; Close to me; Not elegant; Good during the day; Oldy, not trendy; Refreshing. Cell values are the counts of the table, and column widths are proportional to frequencies.]



MCA  -  MULTIPLE CORRESPONDENCE ANALYSIS 

Multiple correspondence analysis extends the simple correspondence analysis properties to n-way tables. The procedure requires more than 2 active categorical variables, observed on a set of cases. As for the other factorial analyses, it is possible to add supplementary elements such as illustrative cases and illustrative continuous or categorical variables.
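A minimal sketch of the idea (textbook definition, not SPAD's code): MCA is a correspondence analysis of the complete disjunctive (indicator) table, with one 0/1 column per category. The total inertia of such an analysis is K/Q − 1 for K categories and Q questions; in the run below, the printed trace of 2.8571 matches 27/7 − 1, consistent with 27 categories effectively defining the axes once the "RAND. ASSIGN." category is broken down (this reading is our inference, not stated by the guide). The data and codes here are toy stand-ins.

```python
import numpy as np

# Toy data: 4 cases x 3 categorical questions, categories coded 0..k-1
cases = np.array([[0, 1, 2],
                  [1, 0, 2],
                  [0, 1, 1],
                  [1, 1, 0]])
n_cat = [2, 2, 3]                            # categories per question

# Complete disjunctive table: one 0/1 column per category
Z = np.hstack([np.eye(k)[cases[:, j]] for j, k in enumerate(n_cat)])
assert (Z.sum(axis=1) == len(n_cat)).all()   # one box ticked per question per case

K, Q = sum(n_cat), len(n_cat)
trace = K / Q - 1                            # total inertia of the MCA
print(trace)                                 # 7/3 - 1
```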

We will perform the MCA on the ASPI1000.SBA dataset.

VARIABLES DESCRIPTION OF THE ASPI1000.SBA DATASET 

ACTIVE CATEGORICAL VARIABLES - 7 VARIABLES - 28 CATEGORIES

11 . Gender (2 categories)
29 . Do you own securities ? (2 categories)
39 . Urban area size (number of inhabitants) (5 categories)
49 . Job category (5 categories)
51 . Diploma in 5 categories (5 categories)
52 . Occupation status of housing in 4 categories (4 categories)
53 . Age in 5 categories (5 categories)

SUPPLEMENTARY CATEGORICAL VARIABLES - 35 VARIABLES - 152 CATEGORIES

All available categorical variables

SUPPLEMENTARY CONTINUOUS VARIABLES - 8 VARIABLES

All available continuous variables



The SETTING OPTIONS 

THE « VARIABLES » TAB 



THE « PARAMETERS » TAB 

Random assignment of active categories inferior to (in %)

To ensure the robustness of the analysis, it may be useful to take into account, for the definition of the axes, only the categories of a sufficient weight.

For each question, the cases belonging to a category with a weak total weight will be assigned at random to one of the other categories of the same question that has a sufficient weight. This cleaning operation lets the data table keep its completely disjunctive property.

The parameter PCMIN fixes the percentage of the total weight of the active cases below which a category is considered to have too weak a weight. If all the cases have weight 1, PCMIN is the percentage of the number of active cases below which a category will be broken down.

If all the categories of a question (or all except one) have too weak a weight, the question itself is made illustrative for the calculation of the axes. The default value (2%) is suitable for most analyses. If the parameter is set to 0.0, only the categories with a null weight are eliminated.
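The PCMIN cleaning rule can be sketched as follows (a hedged illustration of the rule described above — names and structure are ours, not SPAD's; all cases are assumed to have weight 1).

```python
import numpy as np

def clean_question(codes, pcmin=2.0, rng=np.random.default_rng(0)):
    """Break down categories holding less than pcmin % of the cases."""
    codes = codes.copy()
    n = len(codes)
    cats, counts = np.unique(codes, return_counts=True)
    weak = cats[100 * counts / n < pcmin]
    strong = cats[100 * counts / n >= pcmin]
    for cat in weak:                       # reassign each weak category's cases
        idx = np.where(codes == cat)[0]    # at random among the strong ones
        codes[idx] = rng.choice(strong, size=len(idx))
    return codes

codes = np.array([0] * 60 + [1] * 39 + [2] * 1)   # category 2 holds 1 % of cases
cleaned = clean_question(codes)
assert 2 not in cleaned                            # weak category broken down
```

The case where all (or all but one) categories are weak is deliberately not handled here; as the text explains, SPAD then makes the whole question illustrative.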

Retained coordinates

The number of retained coordinates is used by the methods that follow the MCA in the chain, such as DEFAC (factors description) and RECIP/SEMIS (clustering).

By default, the cases coordinates are not displayed.



THE MCA RESULTS 

MULTIPLE CORRESPONDENCE ANALYSIS

ELIMINATION OF ACTIVE CATEGORIES WITH SMALL WEIGHTS

THRESHOLD (PCMIN) : 2.00 %    WEIGHT: 20.00
BEFORE CLEANING : 7 ACTIVE QUESTIONS   28 ASSOCIATE CATEGORIES
AFTER CLEANING  : 7 ACTIVE QUESTIONS   28 ASSOCIATE CATEGORIES
TOTAL WEIGHT OF ACTIVE CASES : 1000.00

MARGINAL DISTRIBUTIONS OF ACTIVE QUESTIONS
------------------------------+-----------------+-------------------------------------------------------------
 CATEGORIES                   | BEFORE CLEANING | AFTER CLEANING
 IDENT - LABEL                | COUNT   WEIGHT  | COUNT   WEIGHT    HISTOGRAM OF RELATIVE WEIGHTS
------------------------------+-----------------+-------------------------------------------------------------
 11 . Gender
 masc - male                  |  469    469.00  |  469    469.00    ***********************
 fémi - female                |  531    531.00  |  531    531.00    ***************************
------------------------------+-----------------+-------------------------------------------------------------
 29 . Do you own some securities ?
 vmo1 - Yes                   |  121    121.00  |  121    121.00    ******
 vmo2 - No                    |  879    879.00  |  879    879.00    ********************************************
------------------------------+-----------------+-------------------------------------------------------------
 39 . Urban area size (number of inhabitants)
 agg1 - Lower than 2.000      |   83     83.00  |   83     83.00    ****
 agg2 - 2.000 - 20.000        |   87     87.00  |   87     87.00    ****
 agg3 - 20.000 - 100.000      |  175    175.00  |  175    175.00    *********
 agg4 - greater than 100.000  |  329    329.00  |  329    329.00    ****************
 agg5 - Paris                 |  326    326.00  |  326    326.00    ****************
------------------------------+-----------------+-------------------------------------------------------------
 49 . Job category
 emp1 - Worker                |  263    263.00  |  263    263.00    *************
 emp2 - Employee              |  335    335.00  |  335    335.00    *****************
 emp3 - Manager               |  229    229.00  |  229    229.00    ***********
 emp4 - Other                 |   48     48.00  |   48     48.00    ==RAND. ASSIGN.==
 49_  - missing category      |  125    125.00  |  125    125.00    ******
------------------------------+-----------------+-------------------------------------------------------------
 51 . Diploma in 5 categories
 die1 - No one                |  189    189.00  |  189    189.00    *********
 die2 - CEP                   |  321    321.00  |  321    321.00    ****************
 die3 - BEPC-BE-BEPS          |  158    158.00  |  158    158.00    ********
 die4 - Bac - Brevet sup.     |  182    182.00  |  182    182.00    *********
 die5 - University            |  150    150.00  |  150    150.00    ********
------------------------------+-----------------+-------------------------------------------------------------
 52 . Occupation status of housing in 4 categories
 slo1 - homeowner             |  120    120.00  |  120    120.00    ******
 slo2 - owner                 |  290    290.00  |  290    290.00    **************
 slo3 - tenant                |  523    523.00  |  523    523.00    **************************
 slo4 - free housing, other   |   67     67.00  |   67     67.00    ***
------------------------------+-----------------+-------------------------------------------------------------
 53 . Age in 5 categories
 agc1 - Lower than 25 yo      |  150    150.00  |  150    150.00    ********
 agc2 - 25 to 34 yo           |  284    284.00  |  284    284.00    **************
 agc3 - 35 to 49 yo           |  209    209.00  |  209    209.00    **********
 agc4 - 50 to 64 yo           |  188    188.00  |  188    188.00    *********
 agc5 - 65 yo and more        |  169    169.00  |  169    169.00    ********
------------------------------+-----------------+-------------------------------------------------------------



EIGENVALUES
COMPUTATIONS PRECISION SUMMARY : TRACE BEFORE DIAGONALISATION.. 2.8571
                                 SUM OF EIGENVALUES............ 2.8571

HISTOGRAM OF THE FIRST 20 EIGENVALUES
+--------+------------+------------+------------+------------------------------------------+
| NUMBER | EIGENVALUE | PERCENTAGE | CUMULATED  |                                          |
|        |            |            | PERCENTAGE |                                          |
+--------+------------+------------+------------+------------------------------------------+
|   1    |   0.2703   |    9.46    |    9.46    | **************************************** |
|   2    |   0.2369   |    8.29    |   17.75    | ***********************************      |
|   3    |   0.2084   |    7.29    |   25.05    | *******************************          |
|   4    |   0.1922   |    6.73    |   31.77    | ****************************             |
|   5    |   0.1846   |    6.46    |   38.23    | ***************************              |
|   6    |   0.1578   |    5.52    |   43.76    | ***********************                  |
|   7    |   0.1534   |    5.37    |   49.13    | ***********************                  |
|   8    |   0.1493   |    5.23    |   54.35    | **********************                   |
|   9    |   0.1441   |    5.04    |   59.40    | *********************                    |
|  10    |   0.1398   |    4.89    |   64.29    | *********************                    |
|  11    |   0.1326   |    4.64    |   68.93    | ********************                     |
|  12    |   0.1300   |    4.55    |   73.48    | *******************                      |
|  13    |   0.1284   |    4.49    |   77.97    | *******************                      |
|  14    |   0.1222   |    4.28    |   82.25    | ******************                       |
|  15    |   0.1070   |    3.74    |   86.00    | ****************                         |
|  16    |   0.1015   |    3.55    |   89.55    | ***************                          |
|  17    |   0.0954   |    3.34    |   92.89    | **************                           |
|  18    |   0.0821   |    2.87    |   95.76    | ************                             |
|  19    |   0.0748   |    2.62    |   98.38    | ***********                              |
|  20    |   0.0462   |    1.62    |  100.00    | *******                                  |
+--------+------------+------------+------------+------------------------------------------+
RESEARCH OF IRREGULARITIES (THIRD DIFFERENCES)
+--------------+--------------+------------------------------------------+
| IRREGULARITY | IRREGULARITY |                                          |
|   BETWEEN    |    VALUE     |                                          |
+--------------+--------------+------------------------------------------+
|    5 --  6   |    -27.77    | **************************************** |
|   14 -- 15   |    -10.42    | ***************                          |
|   17 -- 18   |     -6.67    | **********                               |
|   13 -- 14   |     -5.44    | ********                                 |
|   10 -- 11   |     -3.77    | *****                                    |
|    2 --  3   |     -3.66    | *****                                    |
|    8 --  9   |     -1.53    | **                                       |
+--------------+--------------+------------------------------------------+

RESEARCH OF IRREGULARITIES (SECOND DIFFERENCES)
+--------------+--------------+------------------------------------------+
| IRREGULARITY | IRREGULARITY |                                          |
|   BETWEEN    |    VALUE     |                                          |
+--------------+--------------+------------------------------------------+
|    5 --  6   |     22.31    | **************************************** |
|    2 --  3   |     12.28    | **********************                   |
|   14 -- 15   |      9.83    | ******************                       |
|    3 --  4   |      8.62    | ***************                          |
|    1 --  2   |      4.94    | *********                                |
|   10 -- 11   |      4.67    | ********                                 |
|   11 -- 12   |      0.90    | **                                       |
|    8 --  9   |      0.81    | *                                        |
|    6 --  7   |      0.40    | *                                        |
+--------------+--------------+------------------------------------------+

Irregularity 2nd diff between 5 and 6 = [ (  λ7 –  λ6 ) – (  λ6 –  λ5 ) ] * 1000

The two tables above are the equivalent of the scree test (or Cattell test). This procedure detects the main irregularities in the eigenvalues diagram and ranks them by decreasing importance.
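The second-difference search can be reproduced directly from the 20 eigenvalues printed above (a sketch of the formula, not SPAD's code): the irregularity between ranks k and k+1 is [(λ(k+2) − λ(k+1)) − (λ(k+1) − λ(k))] × 1000.

```python
import numpy as np

# The 20 eigenvalues from the MCA histogram above
lam = np.array([0.2703, 0.2369, 0.2084, 0.1922, 0.1846, 0.1578, 0.1534,
                0.1493, 0.1441, 0.1398, 0.1326, 0.1300, 0.1284, 0.1222,
                0.1070, 0.1015, 0.0954, 0.0821, 0.0748, 0.0462])

second = 1000 * np.diff(lam, n=2)   # second[k] = irregularity between axes k+1 and k+2
k = int(np.argmax(second))
# The largest irregularity falls between axes 5 and 6, as in the table above
print(k + 1, k + 2, round(second[k], 2))
```

The small discrepancies with the printed values (e.g. 22.4 versus 22.31) come from using the rounded eigenvalues of the histogram rather than SPAD's full-precision ones.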



LOADINGS, CONTRIBUTIONS AND SQUARED COSINES OF ACTIVE CATEGORIES

AXES 1 TO 5
+------------------------------------------+-------------------------------+---------------------------+---------------------------+
| CATEGORIES                               | LOADINGS                      | CONTRIBUTIONS             | SQUARED COSINES           |
| IDEN - LABEL            REL.WT.  DISTO   |    1     2     3     4     5  |    1    2    3    4    5  |    1    2    3    4    5  |
+------------------------------------------+-------------------------------+---------------------------+---------------------------+
| 11 . Gender                              |                               |                           |                           |
| masc - male                6.70    1.13  | -0.29  0.08  0.43 -0.47 -0.25 |  2.1  0.2  6.0  7.6  2.3  | 0.07 0.01 0.16 0.19 0.06  |
| fémi - female              7.59    0.88  |  0.26 -0.07 -0.38  0.41  0.22 |  1.8  0.2  5.3  6.7  2.0  | 0.07 0.01 0.16 0.19 0.06  |
|                  CUMULATED CONTRIBUTION  =                                  3.9  0.3 11.2 14.4  4.3  |                           |
| 29 . Do you own some securities ?        |                               |                           |                           |
| vmo1 - Yes                 1.73    7.26  |  0.69  1.46 -0.25 -0.23  0.06 |  3.1 15.5  0.5  0.5  0.0  | 0.07 0.29 0.01 0.01 0.00  |
| vmo2 - No                 12.56    0.14  | -0.10 -0.20  0.03  0.03 -0.01 |  0.4  2.1  0.1  0.1  0.0  | 0.07 0.29 0.01 0.01 0.00  |
|                  CUMULATED CONTRIBUTION  =                                  3.5 17.6  0.6  0.6  0.0  |                           |
| 39 . Urban area size (number of inhabitants)                              |                           |                           |
| agg1 - Lower than 2.000    1.19   11.05  | -1.06  0.83 -1.06  0.75 -0.06 |  5.0  3.4  6.4  3.5  0.0  | 0.10 0.06 0.10 0.05 0.00  |
| agg2 - 2.000 - 20.000      1.24   10.49  | -0.55  0.26  0.28  0.80 -0.61 |  1.4  0.3  0.5  4.2  2.5  | 0.03 0.01 0.01 0.06 0.04  |
| agg3 - 20.000 - 100.000    2.50    4.71  | -0.27  0.07 -0.17  0.07 -0.12 |  0.7  0.1  0.3  0.1  0.2  | 0.02 0.00 0.01 0.00 0.00  |
| agg4 - greater than 100.000 4.70   2.04  | -0.04 -0.40  0.05 -0.22 -0.27 |  0.0  3.2  0.0  1.2  1.9  | 0.00 0.08 0.00 0.02 0.04  |
| agg5 - Paris               4.66    2.07  |  0.60  0.08  0.24 -0.22  0.52 |  6.2  0.1  1.3  1.2  6.7  | 0.18 0.00 0.03 0.02 0.13  |
|                  CUMULATED CONTRIBUTION  =                                 13.3  7.1  8.5 10.1 11.3  |                           |
| 49 . Job category                        |                               |                           |                           |
| emp1 - Worker              3.94    2.62  | -0.88 -0.47  0.54 -0.66 -0.20 | 11.2  3.6  5.6  8.9  0.8  | 0.29 0.08 0.11 0.17 0.01  |
| emp2 - Employee            4.91    1.91  | -0.19 -0.20 -0.38  0.67  0.63 |  0.6  0.8  3.5 11.4 10.5  | 0.02 0.02 0.08 0.23 0.21  |
| emp3 - Manager             3.44    3.15  |  0.80  0.89  0.74  0.02 -0.14 |  8.2 11.4  9.0  0.0  0.4  | 0.21 0.25 0.17 0.00 0.01  |
| 49_  - missing category    1.99    6.19  |  0.80 -0.12 -1.41 -0.38 -0.91 |  4.7  0.1 18.9  1.5  9.0  | 0.10 0.00 0.32 0.02 0.13  |
|                  CUMULATED CONTRIBUTION  =                                 24.8 16.0 36.9 21.8 20.6  |                           |
| 51 . Diploma in 5 categories             |                               |                           |                           |
| die1 - No one              2.70    4.29  | -0.70 -0.23 -0.23 -0.93  0.34 |  5.0  0.6  0.7 12.1  1.7  | 0.12 0.01 0.01 0.20 0.03  |
| die2 - CEP                 4.59    2.12  | -0.80  0.08  0.05  0.29 -0.07 | 10.9  0.1  0.1  2.0  0.1  | 0.30 0.00 0.00 0.04 0.00  |
| die3 - BEPC-BE-BEPS        2.26    5.33  |  0.23 -0.62 -0.17  0.47  0.56 |  0.4  3.7  0.3  2.6  3.8  | 0.01 0.07 0.01 0.04 0.06  |
| die4 - Bac - Brevet sup.   2.60    4.49  |  0.93 -0.06 -0.32  0.26 -0.95 |  8.3  0.0  1.3  0.9 12.6  | 0.19 0.00 0.02 0.01 0.20  |
| die5 - University          2.14    5.67  |  1.23  0.84  0.73 -0.26  0.27 | 12.1  6.4  5.5  0.8  0.8  | 0.27 0.13 0.10 0.01 0.01  |
|                  CUMULATED CONTRIBUTION  =                                 36.6 10.9  7.9 18.4 19.2  |                           |
| 52 . Occupation status of housing in 4 categories                         |                           |                           |
| slo1 - homeowner           1.71    7.33  | -0.31 -0.06  0.85  1.02 -1.30 |  0.6  0.0  5.9  9.2 15.7  | 0.01 0.00 0.10 0.14 0.23  |
| slo2 - owner               4.14    2.45  | -0.44  1.00 -0.51 -0.07 -0.01 |  3.0 17.6  5.2  0.1  0.0  | 0.08 0.41 0.11 0.00 0.00  |
| slo3 - tenant              7.47    0.91  |  0.27 -0.51  0.15 -0.15  0.33 |  2.0  8.2  0.8  0.9  4.4  | 0.08 0.28 0.03 0.03 0.12  |
| slo4 - free housing, other 0.96   13.93  |  0.34 -0.25 -0.50 -0.33 -0.20 |  0.4  0.3  1.2  0.6  0.2  | 0.01 0.00 0.02 0.01 0.00  |
|                  CUMULATED CONTRIBUTION  =                                  6.0 26.0 13.0 10.8 20.3  |                           |
| 53 . Age in 5 categories                 |                               |                           |                           |
| agc1 - Lower than 25 yo    2.14    5.67  |  0.81 -0.98 -0.89 -0.68 -0.80 |  5.2  8.7  8.2  5.2  7.4  | 0.12 0.17 0.14 0.08 0.11  |
| agc2 - 25 to 34 yo         4.06    2.52  |  0.35 -0.45  0.63  0.47  0.41 |  1.9  3.4  7.8  4.8  3.7  | 0.05 0.08 0.16 0.09 0.07  |
| agc3 - 35 to 49 yo         2.99    3.78  | -0.33  0.36  0.41  0.41 -0.69 |  1.2  1.6  2.5  2.6  7.6  | 0.03 0.03 0.05 0.04 0.12  |
| agc4 - 50 to 64 yo         2.69    4.32  | -0.51  0.30 -0.42  0.21  0.25 |  2.6  1.0  2.3  0.6  0.9  | 0.06 0.02 0.04 0.01 0.01  |
| agc5 - 65 yo and more      2.41    4.92  | -0.34  0.84 -0.32 -0.93  0.59 |  1.0  7.2  1.2 10.8  4.6  | 0.02 0.14 0.02 0.17 0.07  |
|                  CUMULATED CONTRIBUTION  =                                 11.8 22.0 21.9 23.9 24.2  |                           |
+------------------------------------------+-------------------------------+---------------------------+---------------------------+

P.REL : relative weight of the category (the REL.WT column of the table above).
P.REL = ( nq * 100 ) / ( n * Q ), where nq is the weight of the category, n the overall weight
and Q the number of active variables.
For example, for the "male" category, P.REL = ( 469 * 100 ) / ( 1000 * 7 ) = 6.70.

DISTO : squared distance between the category and the center of gravity. This criterion
depends only on the weight of the category. The formula is the following:

d² (j,G) = ( n / nj ) - 1, where nj is the weight of category j and n the overall weight.
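The two definitions above can be checked directly against the table. Here is a minimal sketch (plain Python, not SPAD code; the function names are ours) that recomputes the relative weight and the distance to the center of gravity for the survey's 1000 respondents and 7 active questions:

```python
# Minimal sketch (not SPAD code): recompute P.REL and DISTO for an MCA
# category from its count nq, the overall weight n, and the number of
# active variables Q. Function names are ours, chosen for illustration.

def rel_weight(nq, n, Q):
    """Relative weight of a category: (nq * 100) / (n * Q)."""
    return nq * 100.0 / (n * Q)

def disto(nq, n):
    """Squared distance of a category to the center of gravity: (n / nq) - 1."""
    return n / nq - 1.0

n, Q = 1000, 7  # 1000 respondents, 7 active questions

print(round(rel_weight(469, n, Q), 2))  # "male", 469 respondents -> 6.7
print(round(disto(469, n), 2))          # "male"                  -> 1.13
print(round(disto(121, n), 2))          # "vmo1 - Yes", 121 resp. -> 7.26
```

Note that rare categories lie far from the center of gravity: a missing category chosen by a single respondent has DISTO = 1000/1 - 1 = 999.00, which is why such categories show extreme distances in the listings.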

LOADINGS AND TEST-VALUES OF CATEGORIES

AXES 1 TO 5

 CATEGORIES                              |          TEST-VALUES          |           LOADINGS            | DISTO.
 IDEN - LABEL               COUNT ABS.WT |     1     2     3     4     5 |     1     2     3     4     5 |
-----------------------------------------+-------------------------------+-------------------------------+--------
 11 . Gender
 masc - male                  469 469.00 |  -8.6   2.3  12.8 -13.9  -7.5 | -0.29  0.08  0.43 -0.47 -0.25 |   1.13
 fémi - female                531 531.00 |   8.6  -2.3 -12.8  13.9   7.5 |  0.26 -0.07 -0.38  0.41  0.22 |   0.88
 29 . Do you own some securities ?
 vmo1 - Yes                   121 121.00 |   8.1  17.1  -2.9  -2.7   0.7 |  0.69  1.46 -0.25 -0.23  0.06 |   7.26
 vmo2 - No                    879 879.00 |  -8.1 -17.1   2.9   2.7  -0.7 | -0.10 -0.20  0.03  0.03 -0.01 |   0.14
 39 . Urban area size (number of inhabitants)
 agg1 - Lower than 2.000       83  83.00 | -10.1   7.9 -10.1   7.2  -0.6 | -1.06  0.83 -1.06  0.75 -0.06 |  11.05
 agg2 - 2.000 - 20.000         87  87.00 |  -5.4   2.5   2.7   7.8  -5.9 | -0.55  0.26  0.28  0.80 -0.61 |  10.49
 agg3 - 20.000 - 100.000      175 175.00 |  -3.9   1.1  -2.4   1.0  -1.7 | -0.27  0.07 -0.17  0.07 -0.12 |   4.71
 agg4 - greater than 100.000  329 329.00 |  -0.9  -8.8   1.0  -4.8  -6.0 | -0.04 -0.40  0.05 -0.22 -0.27 |   2.04
 agg5 - Paris                 326 326.00 |  13.2   1.8   5.2  -4.9  11.3 |  0.60  0.08  0.24 -0.22  0.52 |   2.07
 49 . Job category
 emp1 - Worker                263 263.00 | -16.1  -9.7  10.7 -12.6  -3.5 | -0.86 -0.51  0.57 -0.67 -0.18 |   2.80
 emp2 - Employee              335 335.00 |  -3.6  -5.0  -8.5  15.2  14.2 | -0.16 -0.22 -0.38  0.68  0.63 |   1.99
 emp3 - Manager               229 229.00 |  14.6  14.9  13.2   0.2  -2.1 |  0.85  0.86  0.77  0.01 -0.12 |   3.37
 emp4 - Other                  48  48.00 |  -5.2   5.3  -3.5  -0.2  -3.3 | -0.73  0.75 -0.50 -0.03 -0.47 |  19.83
 49_  - missing category      125 125.00 |  11.4  -2.4 -16.6  -5.0 -10.9 |  0.96 -0.20 -1.39 -0.42 -0.91 |   7.00


MCA - Multiple Correspondence Analysis


LOADINGS AND TEST-VALUES OF CATEGORIES (continued)

 IDEN - LABEL               COUNT ABS.WT |    TEST-VALUES, axes 1 to 5   |     LOADINGS, axes 1 to 5     | DISTO.
-----------------------------------------+-------------------------------+-------------------------------+--------
 51 . Diploma in 5 categories
 die1 - No one                189 189.00 | -10.8  -3.5  -3.5 -14.2   5.3 | -0.70 -0.23 -0.23 -0.93  0.34 |   4.29
 die2 - CEP                   321 321.00 | -17.4   1.8   1.2   6.3  -1.5 | -0.80  0.08  0.05  0.29 -0.07 |   2.12
 die3 - BEPC-BE-BEPS          158 158.00 |   3.1  -8.5  -2.3   6.5   7.7 |  0.23 -0.62 -0.17  0.47  0.56 |   5.33
 die4 - Bac - Brevet sup.     182 182.00 |  13.9  -0.9  -4.8   3.8 -14.1 |  0.93 -0.06 -0.32  0.26 -0.95 |   4.49
 die5 - University            150 150.00 |  16.4  11.2   9.7  -3.5   3.6 |  1.23  0.84  0.73 -0.26  0.27 |   5.67
 52 . Occupation status of housing in 4 categories
 slo1 - homeowner             120 120.00 |  -3.6  -0.7   9.9  11.9 -15.2 | -0.31 -0.06  0.85  1.02 -1.30 |   7.33
 slo2 - owner                 290 290.00 |  -8.9  20.2 -10.3  -1.4  -0.2 | -0.44  1.00 -0.51 -0.07 -0.01 |   2.45
 slo3 - tenant                523 523.00 |   9.0 -16.9   5.1  -5.0  10.9 |  0.27 -0.51  0.15 -0.15  0.33 |   0.91
 slo4 - free housing, other    67  67.00 |   2.8  -2.1  -4.3  -2.8  -1.7 |  0.34 -0.25 -0.50 -0.33 -0.20 |  13.93
 53 . Age in 5 categories
 agc1 - Lower than 25 yo      150 150.00 |  10.7 -13.0 -11.8  -9.1 -10.6 |  0.81 -0.98 -0.89 -0.68 -0.80 |   5.67
 agc2 - 25 to 34 yo           284 284.00 |   7.0  -8.9  12.6   9.5   8.1 |  0.35 -0.45  0.63  0.47  0.41 |   2.52
 agc3 - 35 to 49 yo           209 209.00 |  -5.3   5.9   6.7   6.6 -11.2 | -0.33  0.36  0.41  0.41 -0.69 |   3.78
 agc4 - 50 to 64 yo           188 188.00 |  -7.7   4.6  -6.4   3.1   3.9 | -0.51  0.30 -0.42  0.21  0.25 |   4.32
 agc5 - 65 yo and more        169 169.00 |  -4.8  12.0  -4.5 -13.2   8.4 | -0.34  0.84 -0.32 -0.93  0.59 |   4.92
 1 . The family is the only place where you feel well
 fbi1 - Yes                   561 561.00 | -14.5   4.4  -3.6   0.6   0.5 | -0.40  0.12 -0.10  0.02  0.02 |   0.78
 fbi2 - No                    431 431.00 |  14.6  -4.5   3.6  -0.4  -0.8 |  0.53 -0.16  0.13 -0.02 -0.03 |   1.32
 1_   - missing category        8   8.00 |  -0.3   0.6  -0.4  -1.0   1.5 | -0.11  0.20 -0.13 -0.34  0.54 | 124.00
 2 . Opinion about wedding
 Mar1 - indissoluble          231 231.00 |  -7.9   4.1  -3.3  -3.2   0.3 | -0.46  0.23 -0.19 -0.19  0.02 |   3.33
 Mar2 - dissolved serious pb  342 342.00 |  -1.8   3.4  -1.8   3.2  -0.7 | -0.08  0.15 -0.08  0.14 -0.03 |   1.92
 Mar3 - dissolved if agreem   387 387.00 |   8.7  -6.4   4.7  -0.2   0.2 |  0.35 -0.25  0.19 -0.01  0.01 |   1.58
 Mar4 - I do not know          39  39.00 |  -0.3  -1.3   0.0  -0.4   0.5 | -0.05 -0.21  0.00 -0.06  0.08 |  24.64
 2_   - missing category        1   1.00 |   0.8   0.8  -0.8   0.1  -0.4 |  0.79  0.77 -0.81  0.09 -0.42 | 999.00
 3 . Housekeeping works, take care of children...
 Mén1 - only women do it       42  42.00 |  -3.5  -0.6  -0.9  -0.9  -0.4 | -0.52 -0.08 -0.14 -0.14 -0.06 |  22.81
 Mén2 - usually the women     336 336.00 |  -2.4   4.9  -1.4  -2.3   2.0 | -0.11  0.22 -0.06 -0.10  0.09 |   1.98
 Mén3 - men and women         599 599.00 |   3.6  -4.3   2.1   2.9  -2.1 |  0.09 -0.11  0.05  0.07 -0.05 |   0.67
 Mén4 - I do not know          19  19.00 |   0.7  -0.3  -2.1  -0.7   0.1 |  0.15 -0.07 -0.47 -0.15  0.02 |  51.63
 3_   - missing category        4   4.00 |   0.2  -1.3   1.2  -0.9   1.9 |  0.11 -0.64  0.62 -0.43  0.93 | 249.00
 4 . Are you satisfied of your daily life
 Cad1 - a lot                 259 259.00 |  -0.8   5.3  -3.4   1.7  -0.9 | -0.04  0.28 -0.18  0.09 -0.05 |   2.86
 Cad2 - enough                549 549.00 |  -0.9   0.1   1.2   0.1   0.2 | -0.03  0.00  0.03  0.00  0.00 |   0.82
 Cad3 - a little              145 145.00 |   1.9  -4.8   1.3  -1.3   1.1 |  0.14 -0.37  0.10 -0.10  0.08 |   5.90
 Cad4 - not at all             46  46.00 |   0.6  -3.3   2.0  -1.6  -0.3 |  0.08 -0.47  0.29 -0.23 -0.04 |  20.74
 4_   - missing category        1   1.00 |   0.4   1.5   0.7  -0.6  -0.1 |  0.35  1.52  0.72 -0.56 -0.12 | 999.00
 5 . The environmental protection and maintenance is...
 env1 - very important        657 657.00 |   8.0   0.0   1.8   0.8  -1.4 |  0.18  0.00  0.04  0.02 -0.03 |   0.52
 env2 - quite important       298 298.00 |  -7.1  -0.1  -0.7   0.3   0.6 | -0.34  0.00 -0.04  0.02  0.03 |   2.36
 env3 - not important          36  36.00 |  -3.0  -0.1  -2.8  -1.7   2.4 | -0.49 -0.01 -0.46 -0.27  0.39 |  26.78
 env4 - not at all important    7   7.00 |  -0.4   0.1  -0.1  -2.5  -0.3 | -0.16  0.05 -0.02 -0.94 -0.13 | 141.86
 5_   - missing category        2   2.00 |   0.3   0.8  -0.1  -0.3  -0.5 |  0.20  0.59 -0.10 -0.24 -0.37 | 499.00
 6 . Do scientific discoveries ameliorate the quality of life ?
 sci1 - Yes, a little         509 509.00 |  -1.9  -0.2   0.3   0.5  -0.6 | -0.06  0.00  0.01  0.02 -0.02 |   0.96
 sci2 - Yes, a lot            383 383.00 |   3.1   1.8  -1.2   0.8  -0.3 |  0.12  0.07 -0.05  0.03 -0.01 |   1.61
 sci3 - Not at all            105 105.00 |  -1.6  -2.3   1.3  -2.0   1.6 | -0.15 -0.22  0.12 -0.19  0.15 |   8.52
 6_   - missing category        3   3.00 |  -1.1  -1.5   0.1  -0.8  -0.6 | -0.65 -0.89  0.07 -0.49 -0.36 | 332.33
 7 . Are you satisfied of your health
 Snt1 - a lot                 267 267.00 |   3.8   0.3   0.4  -1.1  -2.1 |  0.20  0.02  0.02 -0.06 -0.11 |   2.75
 Snt2 - satisfied             600 600.00 |  -2.7   0.4   0.4   2.0   2.3 | -0.07  0.01  0.01  0.05  0.06 |   0.67
 Snt3 - a little              115 115.00 |  -0.6  -0.8  -1.1  -1.3  -1.2 | -0.05 -0.07 -0.09 -0.12 -0.10 |   7.70
 Snt4 - not at all             18  18.00 |  -1.1  -0.5  -0.2  -0.3   1.3 | -0.25 -0.11 -0.05 -0.06  0.30 |  54.56
 8 . Evolution of your daily life for the last 10 years
 Ftr1 - improving a lot       102 102.00 |   1.7   0.9   0.5   1.8  -0.8 |  0.16  0.08  0.04  0.17 -0.08 |   8.80
 Ftr2 - improving a little    316 316.00 |  -1.2  -1.5   1.8   4.2  -0.9 | -0.05 -0.07  0.08  0.20 -0.04 |   2.16
 Ftr3 - the same              250 250.00 |   0.8   2.3  -2.6  -3.0  -2.1 |  0.05  0.12 -0.14 -0.16 -0.11 |   3.00
 Ftr4 - a little worse        190 190.00 |  -2.2   0.3   0.8  -2.1   3.7 | -0.14  0.02  0.05 -0.14  0.24 |   4.26
 Ftr5 - a lot worse           114 114.00 |   0.3  -0.1   1.3  -0.1   1.6 |  0.03 -0.01  0.12 -0.01  0.14 |   7.77
 Ftr6 - I do not know          26  26.00 |   2.9  -4.0  -3.2  -1.9  -2.3 |  0.55 -0.78 -0.61 -0.36 -0.45 |  37.46
 8_   - missing category        2   2.00 |  -0.7  -0.3  -1.2  -1.8  -1.0 | -0.47 -0.23 -0.83 -1.30 -0.73 | 499.00
 9 . Your opinion on the justice running in 1986
 Jus1 - very well              13  13.00 |   0.0   1.7  -2.2  -2.1   0.2 |  0.01  0.47 -0.60 -0.57  0.06 |  75.92
 Jus2 - quite well            243 243.00 |  -0.8   3.4  -0.1  -0.2  -0.8 | -0.05  0.19 -0.01 -0.01 -0.04 |   3.12
 Jus3 - quite bad             398 398.00 |   0.6  -1.0  -1.7   1.2  -1.8 |  0.02 -0.04 -0.06  0.05 -0.07 |   1.51
 Jus4 - very bad              256 256.00 |   1.3  -2.9   3.9  -1.3   1.1 |  0.07 -0.16  0.21 -0.07  0.06 |   2.91
 Jus5 - I do not know          65  65.00 |  -3.3   0.5  -2.1   0.0   0.6 | -0.40  0.05 -0.26  0.00  0.07 |  14.38
 Jus6 - do not answer          25  25.00 |   2.2  -0.1  -0.4   1.9   3.4 |  0.43 -0.02 -0.09  0.37  0.68 |  39.00
 10 . Do you think the society needs to change
 Soc1 - yes                   759 759.00 |   1.8  -4.8   3.1  -0.3  -0.4 |  0.03 -0.08  0.05 -0.01 -0.01 |   0.32
 Soc2 - no                    170 170.00 |  -0.6   4.4  -2.3   0.9  -0.6 | -0.04  0.31 -0.16  0.06 -0.04 |   4.88
 Soc3 - I do not know          71  71.00 |  -2.1   1.5  -1.7  -0.8   1.7 | -0.24  0.17 -0.20 -0.09  0.19 |  13.08
 12 . Educational level of the respondent
 dip1 - No one                189 189.00 | -10.8  -3.5  -3.5 -14.2   5.3 | -0.70 -0.23 -0.23 -0.93  0.34 |   4.29
 dip2 - CEP                   321 321.00 | -17.4   1.8   1.2   6.3  -1.5 | -0.80  0.08  0.05  0.29 -0.07 |   2.12
 dip3 - BEPC-BE-BEPS          158 158.00 |   3.1  -8.5  -2.3   6.5   7.7 |  0.23 -0.62 -0.17  0.47  0.56 |   5.33
 dip4 - Bac                   162 162.00 |  13.2  -1.7  -5.1   3.7 -14.0 |  0.95 -0.12 -0.37  0.26 -1.01 |   5.17
 dip5 - brevet sup.            20  20.00 |   3.4   2.2   0.2   0.9  -2.1 |  0.76  0.48  0.05  0.19 -0.46 |  49.00
 dip6 - University            142 142.00 |  15.8  10.9   9.8  -3.5   3.3 |  1.23  0.85  0.76 -0.27  0.26 |   6.04
 dip7 - other                   8   8.00 |   3.7   2.2   0.6  -0.2   1.4 |  1.31  0.77  0.21 -0.07  0.50 | 124.00
 13 . What do you think about public nurseries
 cre1 - very satisfying       139 139.00 |   1.6  -3.6   1.5   1.8   2.3 |  0.13 -0.28  0.12  0.14  0.18 |   6.19
 cre2 - quite satisfying      386 386.00 |   1.8   2.8   1.8   0.7  -0.7 |  0.07  0.11  0.07  0.03 -0.03 |   1.59
 cre3 - not very satisfying   242 242.00 |   1.6   0.4  -0.4  -1.3  -1.5 |  0.09  0.02 -0.02 -0.08 -0.08 |   3.13
 cre4 - not at all satisf.     92  92.00 |  -0.9  -1.8  -0.8  -1.1   0.8 | -0.09 -0.18 -0.08 -0.11  0.08 |   9.87
 cre5 - does not know         139 139.00 |  -5.8   0.6  -2.7  -0.1   0.0 | -0.45  0.05 -0.21 -0.01  0.00 |   6.19
 13_  - missing category        2   2.00 |   2.0   0.5  -1.6  -0.9  -1.4 |  1.40  0.37 -1.11 -0.66 -0.98 | 499.00


Factorial Analyses with SPAD


LOADINGS AND TEST-VALUES OF CATEGORIES (continued)

 IDEN - LABEL               COUNT ABS.WT |    TEST-VALUES, axes 1 to 5   |     LOADINGS, axes 1 to 5     | DISTO.
-----------------------------------------+-------------------------------+-------------------------------+--------
 14 . What do you think about at-home mothers
 cre1 - very satisfying       786 786.00 |  -6.8   2.9  -4.5   3.5  -0.3 | -0.11  0.05 -0.07  0.06 -0.01 |   0.27
 cre2 - quite satisfying      129 129.00 |   6.0  -1.7   1.9  -1.8  -0.7 |  0.50 -0.14  0.16 -0.14 -0.06 |   6.75
 cre3 - not very satisfying    35  35.00 |   2.7  -1.5   2.8  -1.5   1.3 |  0.45 -0.25  0.47 -0.25  0.21 |  27.57
 cre4 - not at all satisf.     20  20.00 |   2.8  -1.0   2.4  -1.3   1.0 |  0.63 -0.22  0.53 -0.29  0.21 |  49.00
 cre5 - does not know          29  29.00 |  -1.0  -1.5   2.1  -2.3   0.2 | -0.19 -0.27  0.38 -0.43  0.03 |  33.48
 14_  - missing category        1   1.00 |   0.8   0.8  -0.8   0.1  -0.4 |  0.79  0.77 -0.81  0.09 -0.42 | 999.00
 16 . Do you like your landscape view
 Log1 - a lot                 516 516.00 |  -4.8   4.7  -4.1   2.2   0.0 | -0.15  0.14 -0.13  0.07  0.00 |   0.94
 Log2 - enough                296 296.00 |   3.2  -0.4   3.7  -0.3  -0.5 |  0.16 -0.02  0.18 -0.01 -0.02 |   2.38
 Log3 - a little               82  82.00 |   1.3  -2.6   1.8  -1.0   1.5 |  0.14 -0.27  0.19 -0.10  0.16 |  11.20
 Log4 - not at all            104 104.00 |   1.7  -4.7  -0.4  -2.0  -0.5 |  0.15 -0.44 -0.03 -0.19 -0.05 |   8.62
 16_  - missing category        2   2.00 |   1.0   0.3   0.1  -2.4   0.1 |  0.69  0.23  0.05 -1.68  0.05 | 499.00
 17 . Do you own a dish washing machine ?
 lav1 - Yes                   211 211.00 |   4.6   7.4   1.0   2.9  -6.0 |  0.28  0.45  0.06  0.18 -0.37 |   3.74
 lav2 - No                    789 789.00 |  -4.6  -7.4  -1.0  -2.9   6.0 | -0.07 -0.12 -0.02 -0.05  0.10 |   0.27
 18 . Do you own a color TV ?
 tco1 - Yes                   373 373.00 |  -2.5   3.8  -0.6   0.4   0.2 | -0.10  0.16 -0.02  0.02  0.01 |   1.68
 tco2 - No                    624 624.00 |   2.6  -3.7   0.5  -0.4  -0.4 |  0.06 -0.09  0.01 -0.01 -0.01 |   0.60
 18_  - missing category        3   3.00 |  -1.0  -0.3   0.8   0.1   1.0 | -0.59 -0.17  0.45  0.08  0.59 | 332.33
 20 . Occupation status of housing
 Occ1 - homeowner             120 120.00 |  -3.6  -0.7   9.9  11.9 -15.2 | -0.31 -0.06  0.85  1.02 -1.30 |   7.33
 Occ2 - owner                 290 290.00 |  -8.9  20.2 -10.3  -1.4  -0.2 | -0.44  1.00 -0.51 -0.07 -0.01 |   2.45
 Occ3 - tenant                523 523.00 |   9.0 -16.9   5.1  -5.0  10.9 |  0.27 -0.51  0.15 -0.15  0.33 |   0.91
 Occ4 - free housing           58  58.00 |   2.5  -2.2  -3.3  -2.6  -0.7 |  0.32 -0.28 -0.42 -0.33 -0.09 |  16.24
 Occ5 - other                   9   9.00 |   1.3  -0.2  -3.2  -1.1  -2.9 |  0.44 -0.06 -1.05 -0.38 -0.96 | 110.11
 21 . The housing expenses are for you
 Dép1 - unimportant           113 113.00 |   0.1   2.8  -4.0  -2.1   0.9 |  0.01  0.24 -0.36 -0.19  0.08 |   7.85
 Dép2 - without big problem   444 444.00 |  -2.0   2.6   1.9  -0.4  -1.6 | -0.07  0.09  0.07 -0.01 -0.06 |   1.25
 Dép3 - a big problem         352 352.00 |   1.1  -2.9   1.9   2.8   1.9 |  0.05 -0.12  0.08  0.12  0.08 |   1.84
 Dép4 - a very big problem     55  55.00 |   0.2  -3.0   1.1   0.1   0.6 |  0.03 -0.39  0.14  0.01  0.07 |  17.18
 Dép5 - do not face with        6   6.00 |  -0.2   1.2   0.8  -0.8  -0.1 | -0.10  0.47  0.32 -0.32 -0.03 | 165.67
 Dép6 - I do not know          22  22.00 |   2.0  -1.2  -4.1  -1.6  -2.3 |  0.42 -0.25 -0.86 -0.34 -0.48 |  44.45
 21_  - missing category        8   8.00 |   0.9  -0.1  -3.2  -1.9  -1.8 |  0.33 -0.05 -1.13 -0.66 -0.62 | 124.00
 22 . Are you embarrassed with the noise ?
 bru1 - a little              196 196.00 |   1.7   0.4   0.6  -0.7  -0.2 |  0.11  0.03  0.04 -0.05 -0.02 |   4.10
 bru2 - a lot                 197 197.00 |   2.8  -3.3   0.9  -3.8   1.8 |  0.18 -0.21  0.06 -0.25  0.11 |   4.08
 bru3 - not at all            606 606.00 |  -3.6   2.3  -1.2   3.7  -1.3 | -0.09  0.06 -0.03  0.09 -0.03 |   0.65
 22_  - missing category        1   1.00 |  -1.0   0.9  -1.1   0.4   1.0 | -0.99  0.92 -1.15  0.36  0.95 | 999.00
 23 . Do you participate to the environmental protection ?
 déf1 - Yes                   126 126.00 |   6.5   0.9   1.9  -1.3  -3.6 |  0.54  0.07  0.16 -0.11 -0.30 |   6.94
 déf2 - No                    874 874.00 |  -6.5  -0.9  -1.9   1.3   3.6 | -0.08 -0.01 -0.02  0.02  0.04 |   0.14
 24 . Last job
 cs01 - unskilled laborer      13  13.00 |  -3.1  -1.6   1.7  -1.6  -0.9 | -0.87 -0.44  0.46 -0.43 -0.26 |  75.92
 cs02 - semi-skilled worker    98  98.00 |  -9.1  -5.6   5.6  -6.0  -1.6 | -0.87 -0.54  0.53 -0.58 -0.16 |   9.20
 cs03 - skilled worker        152 152.00 | -11.3  -6.7   8.0 -10.0  -2.6 | -0.85 -0.50  0.60 -0.75 -0.19 |   5.58
 cs04 - sales employee         39  39.00 |  -0.6  -2.6  -2.0   5.5   4.7 | -0.09 -0.41 -0.31  0.86  0.74 |  24.64
 cs05 - other skilled empl.    68  68.00 |   2.3  -3.9  -2.5   7.2   5.1 |  0.27 -0.45 -0.30  0.84  0.59 |  13.71
 cs06 - other unskilled empl.  91  91.00 |  -1.5  -2.4  -3.8   6.3   6.5 | -0.15 -0.24 -0.38  0.63  0.65 |   9.99
 cs07 - service staff          70  70.00 |  -2.8  -1.9  -4.6   5.7   5.6 | -0.32 -0.22 -0.53  0.65  0.65 |  13.29
 cs08 - foreman                14  14.00 |  -1.9   1.5  -0.9   0.2   0.7 | -0.49  0.39 -0.24  0.04  0.20 |  70.43
 cs09 - craftsman              18  18.00 |  -2.3  -0.4  -1.2   3.0   3.3 | -0.55 -0.09 -0.28  0.70  0.78 |  54.56
 cs10 - small shopkeeper       35  35.00 |  -2.8   1.1  -2.6   3.5   3.7 | -0.47  0.19 -0.43  0.58  0.62 |  27.57
 cs11 - middle manager        135 135.00 |   9.1   6.6   7.9   2.3  -3.5 |  0.73  0.53  0.64  0.19 -0.28 |   6.41
 cs12 - business owner         10  10.00 |   2.2   4.4   1.8  -1.9   0.7 |  0.70  1.38  0.57 -0.61  0.22 |  99.00
 cs13 - liberal profession     15  15.00 |   3.9   6.7   3.0  -0.5   0.5 |  0.99  1.72  0.76 -0.14  0.12 |  65.67
 cs14 - senior manager         69  69.00 |   9.2  10.8   9.0  -1.8   0.8 |  1.07  1.25  1.05 -0.21  0.09 |  13.49
 cs15 - farmer                 32  32.00 |  -7.2   5.5  -4.6   0.8  -2.5 | -1.25  0.95 -0.80  0.13 -0.44 |  30.25
 cs16 - farm worker             0   0.00 |   0.0   0.0   0.0   0.0   0.0 |  0.00  0.00  0.00  0.00  0.00 |   0.00
 cs17 - other occupation       13  13.00 |   1.2   1.2   1.2  -1.0  -1.9 |  0.34  0.32  0.34 -0.27 -0.51 |  75.92
 cs99 - unknown                 3   3.00 |   0.2   0.7  -1.7  -1.4  -0.9 |  0.12  0.43 -0.97 -0.78 -0.50 | 332.33
 24_  - missing category      125 125.00 |  11.4  -2.4 -16.6  -5.0 -10.9 |  0.96 -0.20 -1.39 -0.42 -0.91 |   7.00
 25 . Does your job expose you to health risk ?
 tra1 - Lots of risks         108 108.00 |  -4.7  -2.1   4.5  -0.9  -3.7 | -0.43 -0.19  0.40 -0.08 -0.33 |   8.26
 tra2 - Few risks             192 192.00 |  -2.1  -1.5   7.1   0.5  -1.5 | -0.14 -0.09  0.46  0.03 -0.10 |   4.21
 tra3 - No risk               276 276.00 |   2.5  -2.5   5.1   6.8   3.9 |  0.13 -0.13  0.26  0.35  0.20 |   2.62
 25_  - missing category      424 424.00 |   2.4   4.7 -13.1  -6.0   0.0 |  0.09  0.17 -0.48 -0.22  0.00 |   1.36
 26 . Do you have work-personal life problems
 con1 - yes                   229 229.00 |   2.8  -1.6   7.7   3.2   1.1 |  0.16 -0.09  0.45  0.18  0.06 |   3.37
 con2 - no                    338 338.00 |  -5.0  -3.4   6.7   3.3  -0.9 | -0.22 -0.15  0.29  0.15 -0.04 |   1.96
 26_  - missing category      433 433.00 |   2.4   4.6 -12.9  -5.8  -0.1 |  0.09  0.17 -0.47 -0.21  0.00 |   1.31
 27 . Have you recently been nervous
 ner1 - Yes                   273 273.00 |   2.6  -1.7   0.6   1.7   0.8 |  0.13 -0.09  0.03  0.09  0.04 |   2.66
 ner2 - No                    726 726.00 |  -2.6   1.7  -0.6  -1.8  -0.8 | -0.05  0.03 -0.01 -0.04 -0.02 |   0.38
 27_  - missing category        1   1.00 |   0.4   0.3  -0.5   1.2   0.8 |  0.35  0.30 -0.53  1.23  0.77 | 999.00
 28 . Have you recently been depressed
 éta1 - Yes                   122 122.00 |   2.5  -1.9   0.1   0.6   0.7 |  0.21 -0.16  0.01  0.05  0.06 |   7.20
 éta2 - No                    874 874.00 |  -2.1   1.8  -0.2  -0.4  -0.6 | -0.03  0.02  0.00 -0.01 -0.01 |   0.14
 28_  - missing category        4   4.00 |  -1.7   0.4   0.5  -0.9  -0.1 | -0.87  0.18  0.27 -0.45 -0.05 | 249.00
 30 . Do you own real estate properties ?
 vim1 - Yes                    81  81.00 |   2.5   8.6  -3.8  -0.3  -1.6 |  0.27  0.92 -0.41 -0.03 -0.18 |  11.35
 vim2 - No                    918 918.00 |  -2.6  -8.6   3.6   0.2   1.9 | -0.02 -0.08  0.03  0.00  0.02 |   0.09
 30_  - missing category        1   1.00 |   0.7   0.7   1.6   0.8  -2.2 |  0.69  0.66  1.63  0.78 -2.18 | 999.00
 31 . Do you regularly impose restrictions
 rst1 - Yes                   569 569.00 |   1.5  -6.3   1.3   0.8   1.6 |  0.04 -0.17  0.04  0.02  0.04 |   0.76
 rst2 - No                    414 414.00 |  -1.3   6.4  -1.4  -1.0  -1.6 | -0.05  0.24 -0.05 -0.04 -0.06 |   1.42
 31_  - missing category       17  17.00 |  -0.6  -0.1   0.6   0.6   0.1 | -0.15 -0.01  0.15  0.13  0.02 |  57.82


LOADINGS AND TEST-VALUES OF CATEGORIES (continued)

 IDEN - LABEL               COUNT ABS.WT |    TEST-VALUES, axes 1 to 5   |     LOADINGS, axes 1 to 5     | DISTO.
-----------------------------------------+-------------------------------+-------------------------------+--------
 32 . Your opinion on the evolution of French people life level
 Frç1 - a lot better           78  78.00 |  -1.6   4.0  -1.6   0.4   0.3 | -0.17  0.43 -0.17  0.05  0.03 |  11.82
 Frç2 - a little better       321 321.00 |  -0.1   3.9  -2.4   1.7  -3.8 |  0.00  0.18 -0.11  0.08 -0.18 |   2.12
 Frç3 - it is the same        159 159.00 |  -1.6  -1.9   0.7   0.4   0.7 | -0.11 -0.14  0.05  0.03  0.05 |   5.29
 Frç4 - a little worse        276 276.00 |   0.1  -1.7   2.0  -1.3   1.3 |  0.00 -0.09  0.10 -0.07  0.07 |   2.62
 Frç5 - a lot worse           108 108.00 |   3.2  -3.1   3.4  -0.8   2.6 |  0.29 -0.29  0.31 -0.08  0.24 |   8.26
 Frç6 - I do not know          57  57.00 |  -0.1  -1.8  -2.7  -1.0   0.3 | -0.02 -0.23 -0.35 -0.12  0.04 |  16.54
 32_  - missing category        1   1.00 |   0.9  -1.4  -0.7   0.5   0.3 |  0.94 -1.43 -0.75  0.48  0.25 | 999.00
 33 . Do you invite some friends for dinner ?
 bou1 - Often                 606 606.00 |   8.1  -0.6   2.3   4.5  -2.9 |  0.21 -0.02  0.06  0.11 -0.07 |   0.65
 bou2 - Rarely                274 274.00 |  -5.0   0.2   0.5  -1.9   1.1 | -0.26  0.01  0.02 -0.10  0.06 |   2.65
 bou3 - Never                 120 120.00 |  -5.4   0.6  -4.0  -4.1   2.9 | -0.46  0.05 -0.35 -0.35  0.25 |   7.33
 34 . Are you a member of a religious association ?
 asc1 - yes                    69  69.00 |   1.8   6.2  -1.2   0.6  -1.4 |  0.21  0.72 -0.14  0.07 -0.16 |  13.49
 asc2 - no                    931 931.00 |  -1.8  -6.2   1.2  -0.6   1.4 | -0.02 -0.05  0.01 -0.01  0.01 |   0.07
 35 . Do you watch TV
 Tél1 - every day             419 419.00 | -12.1   2.6  -5.0  -1.7   1.3 | -0.45  0.10 -0.19 -0.06  0.05 |   1.39
 Tél2 - quite often           226 226.00 |   1.5   0.2   1.9   0.6   0.5 |  0.09  0.01  0.11  0.04  0.03 |   3.42
 Tél3 - not very often        231 231.00 |   7.4  -1.7   3.2   1.8  -3.2 |  0.43 -0.10  0.18  0.11 -0.19 |   3.33
 Tél4 - never                 124 124.00 |   6.9  -2.0   1.0  -0.6   1.5 |  0.58 -0.17  0.09 -0.05  0.13 |   7.06
 36 . In order to change the society, do we need...
 cha1 - Progressive reforms   490 490.00 |  -0.5  -0.4  -1.7   1.5  -2.4 | -0.01 -0.01 -0.06  0.05 -0.08 |   1.04
 cha2 - Radical changes       258 258.00 |   2.4  -4.2   5.5  -1.4   2.0 |  0.13 -0.23  0.30 -0.07  0.11 |   2.88
 cha3 - does not know          29  29.00 |  -0.4  -0.5  -1.8  -2.4   1.7 | -0.07 -0.09 -0.34 -0.44  0.31 |  33.48
 36_  - missing category      223 223.00 |  -1.9   5.1  -3.0   0.6   0.1 | -0.11  0.30 -0.18  0.03  0.01 |   3.48
 38 . Are you a member of at least one association ?
 ass1 - yes                   536 536.00 |   3.3   4.6   1.2   1.9  -5.8 |  0.10  0.14  0.03  0.06 -0.17 |   0.87
 ass2 - no                    464 464.00 |  -3.3  -4.6  -1.2  -1.9   5.8 | -0.11 -0.16 -0.04 -0.06  0.20 |   1.16
 40 . Age and gender

| enq1 - mal e LE t han 38 yo 101 101. 00 | 4. 1 - 1. 2 0. 4 1. 3 - 1. 4 | 0. 39 - 0. 11 0. 04 0. 12 - 0. 13 | 8. 90 || enq2 - mal e GT 38 yo 35 35. 00 | - 3.1 0.0 - 0.6 2.4 - 3.3 | - 0.52 0.00 - 0.10 0.39 - 0.55 | 27. 57 || enq3 - f emal e LE 38 yo 526 526. 00 | 7. 0 1. 3 3. 9 - 2. 6 6. 3 | 0. 21 0. 04 0. 12 - 0. 08 0. 19 | 0. 90 || enq4 - f emal e GT 38 yo 338 338.00 | - 8.8 -0. 6 - 4.1 1.0 -4. 5 | - 0.39 - 0.03 - 0.18 0.04 - 0.20 | 1.96 || enq* - unknown 0 0. 00 | 0. 0 0. 0 0. 0 0. 0 0. 0 | 0. 00 0. 00 0. 00 0. 00 0. 00 | 0. 00 |+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - +| 41 . Usual bedti me ? || dod1 - 21h. or bef ore 73 73. 00 | - 6.4 2.1 - 2.9 - 3.5 2.6 | - 0.72 0.24 - 0.32 - 0.39 0.29 | 12. 70 || dod2 - bet ween 21h - 22h. 270 270.00 | - 7.5 0.6 -0. 8 0.7 0.3 | - 0.39 0.03 - 0.04 0.04 0.02 | 2.70 || dod3 - bet ween 22h - 23h. 443 443.00 | 1.2 - 1.3 - 0.5 3.2 - 1.1 | 0.04 - 0.05 - 0.02 0.11 - 0.04 | 1.26 || dod4 - bet ween 23h - 24h. 134 134.00 | 8.1 0.7 3.0 -1. 2 - 0.5 | 0.65 0.06 0.24 - 0.10 - 0.04 | 6.46 || dod5 - aft er mi dni ght 63 63. 00 | 6.1 - 1.8 0.7 - 2.0 - 0.7 | 0.74 - 0.22 0.08 - 0.24 - 0.08 | 14. 87 || dod6 - var i abl e 11 11. 00 | 0.1 - 1.3 2.4 0. 3 0. 3 | 0.02 - 0.38 0. 72 0.10 0. 09 | 89. 91 || 41_ - mi ssi ng category 6 6.00 | 1.9 2.1 - 0.8 - 1.9 0.3 | 0.75 0.87 - 0.31 - 0.76 0.13 | 165.67 |+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - +| 43 . Has t he r espondent been i nter ested by t he survey || I nt 1 - a l ot 332 332. 00 | - 2. 8 1.9 - 3. 3 3.3 - 4.9 | - 0. 13 0.08 - 0. 15 0.15 - 0. 22 | 2. 
01 || I nt2 - enough 542 542.00 | 1.0 - 1.7 1. 1 - 1.6 2. 2 | 0.03 - 0.05 0. 03 - 0.05 0. 06 | 0.85 || I nt 3 - a l i tt l e or not 124 124. 00 | 2. 3 0.0 3. 1 - 2. 2 3.8 | 0. 19 0. 00 0. 26 - 0. 18 0. 32 | 7. 06 || 43_ - mi ssi ng category 2 2.00 | 2.1 - 1.1 - 0.4 - 0.4 - 1.4 | 1.46 - 0.76 - 0.29 - 0.29 - 0.96 | 499.00 |+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - +| 44 . I deal number of chi l dren ? |

| enf 0 - No one 51 51. 00 | 0.4 - 0.4 1. 5 - 1.9 0. 4 | 0.06 - 0.06 0. 20 - 0.26 0. 06 | 18. 61 || enf 1 - one 39 39. 00 | 0. 5 - 1.3 1. 1 - 0.1 2. 5 | 0. 07 - 0.20 0.17 - 0.02 0.39 | 24. 64 || enf 2 - t wo 431 431.00 | - 2.4 - 3.1 1. 3 0.9 1. 9 | - 0.09 - 0.11 0.05 0.03 0.07 | 1.32 || enf 3 - t hree 393 393.00 | 0.1 2.4 - 2.0 1.3 - 1.5 | 0.00 0. 10 - 0.08 0. 05 - 0.06 | 1.54 || enf 4 - f or and more 86 86. 00 | 3.5 2.4 - 0.8 - 2.3 - 2.7 | 0.36 0.25 - 0.08 - 0.24 - 0.28 | 10. 63 |+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - +| 54 . J ob i n 7 categori es || csp1 - ex. agr. - art - commer 95 95. 00 | - 6.4 5.3 - 4.3 3.4 2.5 | - 0.62 0.52 - 0.42 0.33 0.25 | 9.53 || csp2 - prof . l i b. - cad. sup. 84 84. 00 | 10. 1 12.8 9. 5 - 1. 9 0.9 | 1. 06 1. 34 1. 00 - 0. 20 0. 09 | 10. 90 || csp3 - ouvri ers 263 263.00 | - 16. 1 - 9.7 10. 7 - 12. 6 - 3.5 | - 0.86 - 0.51 0.57 - 0.67 - 0.18 | 2.80 || csp4 - empl oyés 198 198.00 | 0.1 - 5.5 - 5.3 11. 7 10. 2 | 0.01 - 0.35 - 0.34 0. 74 0.65 | 4.05 || csp5 - cont remaî - cad. moy. 149 149.00 | 8.2 6.8 7.3 2.3 - 3.1 | 0.62 0.52 0.55 0.17 - 0.24 | 5.71 || csp6 - per sonnel de servi ce 70 70. 00 | - 2.8 -1. 9 - 4.6 5.7 5.6 | - 0.32 - 0.22 - 0.53 0.65 0.65 | 13. 29 || csp7 - aut res 16 16. 00 | 1.2 1.4 0. 4 - 1.5 - 2.1 | 0.30 0. 34 0.09 - 0.37 - 0.51 | 61. 50 || 54_ - mi ssi ng category 125 125.00 | 11. 4 - 2.4 - 16. 6 - 5.0 - 10. 9 | 0.96 - 0.20 - 1.39 - 0.42 - 0.91 | 7.00 |+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - +

CORRELATIONS BETWEEN CONTINUOUS VARIABLES AND FACTORS

AXES 1 TO 5
+-----------------------------------+--------------------------------------+------------------------------------+
| VARIABLES                         | SUMMARY STATISTICS                   | CORRELATIONS                       |
|-----------------------------------+--------------------------------------+------------------------------------|
| NUM . (IDEN) SHORT LABEL          | COUNT  ABS.WT    MEAN     ST.DEV.    |    1     2     3     4     5       |
+-----------------------------------+--------------------------------------+------------------------------------+
|  15 . (ring) Engineer annual sala |  806   806.00   8478.73   3668.95    | -0.04  0.06  0.04 -0.01  0.05      |
|  19 . (rmed) Doctor annual salary |  713   713.00  19383.85  12608.83    | -0.05  0.12  0.02  0.03 -0.06      |
|  37 . (âge ) Age                  | 1000  1000.00     42.68     17.50    | -0.40  0.55 -0.14 -0.21  0.28      |
|  42 . (nrep) Number of missing va | 1000  1000.00      4.05      4.19    | -0.20  0.12 -0.20 -0.08  0.12      |
|  45 . (finé) End of study age     |  997   997.00     17.29      3.88    |  0.69  0.13  0.24  0.05 -0.11      |
|  46 . (rsou) Salary wishes        |  915   915.00   7244.48   4756.78    |  0.26  0.21  0.15  0.03 -0.09      |
|  47 . (rmin) Resources estimation |  897   897.00   5561.89   2423.40    |  0.19 -0.01  0.14 -0.08  0.14      |
|  48 . (vaca) Summer holidays in n | 1000  1000.00     18.31     19.37    |  0.38  0.02  0.03 -0.06 -0.07      |
|  50 . (POND) Weight               | 1000  1000.00      1.00      0.09    | -0.47  0.27 -0.11  0.25 -0.22      |
+-----------------------------------+--------------------------------------+------------------------------------+

Factorial Analyses with SPAD

DEFAC  - Factors description 

This procedure provides help with the interpretation of the factors extracted from a factorial analysis. A factor can be described quickly and clearly by its most significant elements. These may be cases, categorical variables, continuous variables or frequencies, used as active or illustrative elements in the preceding analysis.

THE « DESCRIPTION COMMAND » TAB 


THE « PARAMETERS » TAB 

THE DEFAC RESULTS 

INTERPRETATION TOOLS FOR FACTORIAL AXES

PRINTOUT ON FACTOR 1

BY ACTIVE CATEGORIES

+-------+---------+----------------------+-----------------------------------------------+--------+--------+
| IDEN. | T.VALUE | CATEGORY LABEL       | VARIABLE LABEL                                | WEIGHT | NUMBER |
+-------+---------+----------------------+-----------------------------------------------+--------+--------+
| die2  |  -17.40 | CEP                  | Diploma in 5 categories                       | 321.00 |    1   |
| emp1  |  -16.14 | Worker               | Job category                                  | 263.00 |    2   |
| die1  |  -10.75 | No one               | Diploma in 5 categories                       | 189.00 |    3   |
| agg1  |  -10.11 | Lower than 2.000     | Urban area size (number of inhabitants)       |  83.00 |    4   |
| slo2  |   -8.88 | owner                | Occupation status of housing in 4 categories  | 290.00 |    5   |
| masc  |   -8.62 | male                 | Gender                                        | 469.00 |    6   |
| vmo2  |   -8.14 | No                   | Do you own some securities ?                  | 879.00 |    7   |
|                                        M I D D L E   A R E A                                             |
| slo3  |    8.98 | tenant               | Occupation status of housing in 4 categories  | 523.00 |   22   |
| agc1  |   10.73 | Lower than 25 yo     | Age in 5 categories                           | 150.00 |   23   |
| 49_   |   11.44 | missing category     | Job category                                  | 125.00 |   24   |
| agg5  |   13.22 | Paris                | Urban area size (number of inhabitants)       | 326.00 |   25   |
| die4  |   13.86 | Bac - Brevet sup.    | Diploma in 5 categories                       | 182.00 |   26   |
| emp3  |   14.64 | Manager              | Job category                                  | 229.00 |   27   |
| die5  |   16.38 | University           | Diploma in 5 categories                       | 150.00 |   28   |
+-------+---------+----------------------+-----------------------------------------------+--------+--------+

PRINTOUT ON FACTOR 2

BY ACTIVE CATEGORIES

+-------+---------+----------------------+-----------------------------------------------+--------+--------+
| IDEN. | T.VALUE | CATEGORY LABEL       | VARIABLE LABEL                                | WEIGHT | NUMBER |
+-------+---------+----------------------+-----------------------------------------------+--------+--------+
| vmo2  |  -17.08 | No                   | Do you own some securities ?                  | 879.00 |    1   |
| slo3  |  -16.86 | tenant               | Occupation status of housing in 4 categories  | 523.00 |    2   |
| agc1  |  -13.04 | Lower than 25 yo     | Age in 5 categories                           | 150.00 |    3   |
| emp1  |   -9.65 | Worker               | Job category                                  | 263.00 |    4   |
| agc2  |   -8.90 | 25 to 34 yo          | Age in 5 categories                           | 284.00 |    5   |
| agg4  |   -8.83 | greater than 100.000 | Urban area size (number of inhabitants)       | 329.00 |    6   |
| die3  |   -8.54 | BEPC-BE-BEPS         | Diploma in 5 categories                       | 158.00 |    7   |
|                                        M I D D L E   A R E A                                             |
| agc3  |    5.86 | 35 to 49 yo          | Age in 5 categories                           | 209.00 |   22   |
| agg1  |    7.86 | Lower than 2.000     | Urban area size (number of inhabitants)       |  83.00 |   23   |
| die5  |   11.21 | University           | Diploma in 5 categories                       | 150.00 |   24   |
| agc5  |   11.97 | 65 yo and more       | Age in 5 categories                           | 169.00 |   25   |
| emp3  |   14.87 | Manager              | Job category                                  | 229.00 |   26   |
| vmo1  |   17.08 | Yes                  | Do you own some securities ?                  | 121.00 |   27   |
| slo2  |   20.24 | owner                | Occupation status of housing in 4 categories  | 290.00 |   28   |
+-------+---------+----------------------+-----------------------------------------------+--------+--------+

PRINTOUT ON FACTOR 3

BY ACTIVE CATEGORIES


+-------+---------+----------------------+-----------------------------------------------+--------+--------+
| IDEN. | T.VALUE | CATEGORY LABEL       | VARIABLE LABEL                                | WEIGHT | NUMBER |
+-------+---------+----------------------+-----------------------------------------------+--------+--------+
| 49_   |  -16.60 | missing category     | Job category                                  | 125.00 |    1   |
| fémi  |  -12.80 | female               | Gender                                        | 531.00 |    2   |
| agc1  |  -11.84 | Lower than 25 yo     | Age in 5 categories                           | 150.00 |    3   |
| slo2  |  -10.29 | owner                | Occupation status of housing in 4 categories  | 290.00 |    4   |
| agg1  |  -10.05 | Lower than 2.000     | Urban area size (number of inhabitants)       |  83.00 |    5   |
| emp2  |   -8.50 | Employee             | Job category                                  | 335.00 |    6   |
| agc4  |   -6.40 | 50 to 64 yo          | Age in 5 categories                           | 188.00 |    7   |
|                                        M I D D L E   A R E A                                             |
| agc3  |    6.74 | 35 to 49 yo          | Age in 5 categories                           | 209.00 |   22   |
| die5  |    9.75 | University           | Diploma in 5 categories                       | 150.00 |   23   |
| slo1  |    9.87 | homeowner            | Occupation status of housing in 4 categories  | 120.00 |   24   |
| emp1  |   10.73 | Worker               | Job category                                  | 263.00 |   25   |
| agc2  |   12.62 | 25 to 34 yo          | Age in 5 categories                           | 284.00 |   26   |
| masc  |   12.80 | male                 | Gender                                        | 469.00 |   27   |
| emp3  |   13.18 | Manager              | Job category                                  | 229.00 |   28   |
+-------+---------+----------------------+-----------------------------------------------+--------+--------+

 


CLUSTERING WITH SPAD

HAC / MIXED : clustering on factor scores

PARTI - DECLA : tree cut and cluster description

CLASS - MINER : clusters characterization

ESCAL : Storing the factorial axes and the partitions

The term cluster analysis encompasses a number of different algorithms and methods for grouping objects (cases) of similar kind into respective categories. A general question facing researchers in many areas of inquiry is how to organize observed data into meaningful structures, that is, how to develop taxonomies. In other words, cluster analysis is an exploratory data analysis tool which aims at sorting different objects into groups in such a way that the degree of association between two objects is maximal if they belong to the same group and minimal otherwise.

Note that the above discussion refers to clustering algorithms and does not mention statistical significance testing. The point here is that, unlike many other statistical procedures, cluster analysis methods are mostly used when we do not have any a priori hypotheses, but are still in the exploratory phase of our research. In a sense, cluster analysis finds the "most significant solution possible".


RECIP / SEMIS - CLUSTERING ON FACTOR SCORES

JUSTIFICATION FOR THE USE OF FACTOR SCORES 

The RECIP/SEMIS procedure allows the SPAD user to perform a cluster analysis on factor scores.

Performing a cluster analysis on a set of p variables is equivalent to performing it on the p factor scores extracted from the factorial analysis. Indeed, by transforming the original variables into factors without reducing the number of dimensions (even though the factors are extracted in decreasing order of explained variance), we do not lose any information. Mathematically, it is simply a rotation of the original space.

However, it is interesting to consider a smaller factorial space with q dimensions (q lower than p) and to perform the cluster analysis on these first q factor scores. This way, we focus on the most interesting part of the information (in the sense that the q factors capture the main part of the overall variability) and we exclude the noisy remaining information captured by the last factors.

In general, reducing the data by selecting the first q factors provides better and more robust clustering.
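As a minimal numpy sketch of this idea (illustrative only, not SPAD's internal routines): project the centered data onto the first q axes, and note that keeping all p axes is only a rotation, so pairwise distances are unchanged:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))            # 100 cases, p = 6 variables

# Center the data and extract the factorial axes via an SVD.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

q = 3                                    # retain the first q factors
scores = Xc @ Vt[:q].T                   # factor scores on the first q axes

# Keeping all p scores is only a rotation: pairwise distances are preserved.
all_scores = Xc @ Vt.T
d_original = np.linalg.norm(Xc[0] - Xc[1])
d_rotated = np.linalg.norm(all_scores[0] - all_scores[1])
```

Clustering on `scores` rather than on `all_scores` is exactly the data reduction described above: the discarded axes carry the least variance.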

The factors to retain for the cluster analysis are the ones that span a smaller space in which the point cloud is stable. In general, we retain a little more than half of the factors (for MCA), even if a scree appears after a few factors on the eigenvalues graph.

In the parameters tab of this method, you can set the number of factors to retain for thecluster analysis (10 by default).

Working with factorial scores means that, whatever factorial analysis was performed, the cluster analysis will always be processed on quantitative data. A single distance, the Euclidean distance, is used to measure the resemblance between cases, and a single aggregation criterion, Ward's, is used to measure the difference between two disjoint groups of cases.
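Ward's criterion merges, at each step, the pair of clusters whose fusion least increases the within-cluster inertia. A short numpy sketch of that increase (the quantity being minimized, assuming equal case weights; the function name is ours, not SPAD's):

```python
import numpy as np

def ward_increase(A, B):
    """Increase in within-cluster inertia when merging clusters A and B
    (rows = cases). Equals nA*nB/(nA+nB) times the squared Euclidean
    distance between the two cluster centroids."""
    nA, nB = len(A), len(B)
    gA, gB = A.mean(axis=0), B.mean(axis=0)
    return nA * nB / (nA + nB) * np.sum((gA - gB) ** 2)

A = np.array([[0.0, 0.0], [2.0, 0.0]])   # centroid (1, 0)
B = np.array([[4.0, 0.0], [6.0, 0.0]])   # centroid (5, 0)
print(ward_increase(A, B))               # 2*2/4 * 16 = 16.0
```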


THE CLUSTERING ALGORITHMS 

The clustering algorithms available in the SPAD software are Hierarchical Agglomerative Clustering (HAC or AHC, RECIP in SPAD), which provides a hierarchy of partitions, and the k-means method, which provides a single partition.

Combining these two methods (mixed clustering) consolidates the partition and increases its stability (SEMIS).

These two methods present the following disadvantages:

•  HAC provides a large number of partitions, among which one has to be chosen: it is not always simple to select the most significant cut in the clustering tree. Moreover, the clustering tree is not an optimal tree, because the partition produced at a given level depends on the one produced at the previous step.

•  With the k-means method, the number of clusters has to be set by the user before performing the analysis, and the partition produced depends on the initial position of the cluster centroids.

In order to compensate for these disadvantages, and to try to approach the optimal partition if it exists, we can combine HAC and the k-means method: this is the purpose of the mixed clustering method, called SEMIS in SPAD.

A first way of combining these two methods is the following: we run k-means with a large number of centroids, and then build a hierarchical tree by successively aggregating the large number of clusters produced by k-means.

However, this method is relatively unstable on small samples. It is advised to choose HAC for sample sizes lower than 10,000 cases. For larger samples, mixed clustering greatly reduces processing time and produces stable partitions.
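The two-step scheme above can be sketched in numpy (a toy illustration with synthetic data, not SPAD's implementation): a crude k-means pass with deliberately many centroids, then Ward-style agglomeration of those centroids down to the target number of clusters:

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(60, 2))
               for c in ([0, 0], [4, 0], [0, 4])])      # three underlying groups

# Step 1: k-means with a deliberately large number of centroids.
k = 12
centers = X[rng.choice(len(X), size=k, replace=False)].copy()
for _ in range(20):
    labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
    for j in range(k):
        if np.any(labels == j):
            centers[j] = X[labels == j].mean(axis=0)

# Step 2: successively aggregate the (non-empty) k-means clusters with
# Ward's criterion until the desired number of clusters remains.
sizes = np.bincount(labels, minlength=k).astype(float)
keep = sizes > 0
means, sizes = centers[keep], sizes[keep]
groups = [[j] for j in range(len(means))]
while len(groups) > 3:
    best, pair = np.inf, None
    for a in range(len(groups)):
        for b in range(a + 1, len(groups)):
            d = (sizes[a] * sizes[b] / (sizes[a] + sizes[b])
                 * ((means[a] - means[b]) ** 2).sum())
            if d < best:
                best, pair = d, (a, b)
    a, b = pair
    means[a] = (sizes[a] * means[a] + sizes[b] * means[b]) / (sizes[a] + sizes[b])
    sizes[a] += sizes[b]
    groups[a] += groups[b]
    groups.pop(b)
    means = np.delete(means, b, axis=0)
    sizes = np.delete(sizes, b)
```

Each final group collects several small k-means clusters, which is exactly how the mixed method trades one expensive HAC on all cases for a cheap HAC on a handful of centroids.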


In the CORMU parameters, set the number of retained coordinates to 14.

THE RECIP / SEMIS PARAMETERS 

The HAC algorithm (RECIP)

Coordinates used for aggregation
With this parameter, the SPAD user defines the number of factors retained for the cluster analysis. This choice depends on the study of the eigenvalues in the previous factorial analysis. In our example, we use the first 14 factors.


The mixed clustering method (SEMIS)

Starting partition
Three procedures are available.

•  The first one consists in searching for stable clusters by crossing several partitions built from randomly selected centroids. The item « Number » defines the number of partitions (2 by default) and « Size » determines the number of centroids in each partition.

•  The others produce a single partition based on N centroids, either chosen by the SPAD user or randomly selected.


THE HIERARCHY EDITOR 

To access the hierarchy editor, double-click on the corresponding icon.

THE CLUSTERING TREE

 

This tree is the graphical display of the hierarchy of partitions. Its interest is to suggest graphically the number of clusters that exist in the dataset. We can cut the tree where the gap between successive aggregation levels is the largest.

[Clustering tree screenshots: cut into 7 clusters (7%, 8%, 7%, 11%, 14%, 16%, 37%), into 5 clusters (7%, 8%, 7%, 48%, 30%) and into 10 clusters (7%, 9%, 8%, 7%, 8%, 11%, 8%, 8%, 14%, 20%).]

THE TOOL BAR OF THE HIERARCHY EDITOR 

[Toolbar screenshot. The buttons allow you to: display or delete the labels, delete a node, display the cuts of the tree, display the node numbers, display the aggregation criterion, and switch between a vertical and a horizontal tree.]


CURVE OF THE LEVEL INDEXES 

« Edit » - « Curve of the level indexes »

The level index is the gain of between-cluster inertia obtained by subdividing one node into two nodes. The largest bar corresponds to the cut of the tree into two clusters.
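A small numpy sketch of how these indexes can guide the cut (illustrative values, not SPAD's output; with Ward aggregation each merge height is precisely this inertia gain):

```python
import numpy as np

# Hypothetical level indexes, one per node of the tree,
# ordered from the top merge downwards (as in the bar chart).
level_indexes = np.array([0.40, 0.12, 0.09, 0.05, 0.04, 0.02])

# Cutting just below the i-th merge from the top yields i + 1 clusters;
# a large gap between successive indexes suggests a natural cut.
gaps = level_indexes[:-1] - level_indexes[1:]
n_clusters = int(np.argmax(gaps)) + 2
print(n_clusters)  # here the top merge dominates: cut into 2 clusters
```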


PARTI - DECLA - CUT OF THE TREE AND CLUSTERS DESCRIPTION 

The PARTI procedure constructs partitions by pruning an aggregation tree. The procedure creates the partitions requested by the user, or finds the best partitions by an automatic search, possibly improving them by iterating on moving centers (consolidation). The partitions created in this way are then characterized automatically.

The DECLA procedure lets you describe the partitions determined by the PARTIprocedure.

We can characterize either each cluster of a partition, or the partition itself globally. All the available elements (active and illustrative) may participate in the characterization: categories of categorical variables, the categorical variables themselves, continuous variables, frequencies and the factorial axes.

THE PARAMETERS OF THE PARTI-DECLA METHOD 

THE « CHOICE OF PARTITIONS » TAB 


THE « PARTITIONING PARAMETERS » TAB 

THE « PARTITIONS CHARACTERIZATION » TAB 

See the DEMOD method.


THE PARTI-DECLA RESULTS 

BUILDING UP PARTITIONS

DETERMINING THE BEST PARTITIONS

SEARCH FOR IRREGULARITIES

+--------------+--------------+------------------------------------------------------+
| IRREGULARITY | IRREGULARITY |                                                      |
|   BETWEEN    |    VALUE     |                                                      |
+--------------+--------------+------------------------------------------------------+
|  1993--1994  |    -39.99    | **************************************************** |
|  1990--1991  |    -19.79    | **************************                           |
|  1995--1996  |    -17.70    | ************************                             |
+--------------+--------------+------------------------------------------------------+

LIST OF THE BEST 3 PARTITIONS BETWEEN 3 AND 10 CLUSTERS

1 - PARTITION IN 7 CLUSTERS
2 - PARTITION IN 10 CLUSTERS
3 - PARTITION IN 5 CLUSTERS

CUT "b" OF THE TREE INTO 7 CLUSTERS

CLUSTERS FORMATION (ON ACTIVE CASES)

SUMMARY DESCRIPTION

+---------+----------+------------+------------+
| CLUSTER |  COUNT   |   WEIGHT   |  CONTENT   |
+---------+----------+------------+------------+
|  bb1b   |   106    |   106.00   |  1 TO 5    |
|  bb2b   |   375    |   375.00   |  6 TO 23   |
|  bb3b   |    70    |    70.00   | 24 TO 27   |
|  bb4b   |    79    |    79.00   | 28 TO 32   |
|  bb5b   |    67    |    67.00   | 33 TO 36   |
|  bb6b   |   141    |   141.00   | 37 TO 42   |
|  bb7b   |   162    |   162.00   | 43 TO 50   |
+---------+----------+------------+------------+

LOADINGS AND TEST-VALUES BEFORE CONSOLIDATION

AXES 1 TO 5
+---------------------------------------------+-------------------------------+-------------------------------+--------+
| CLUSTERS                                    | TEST-VALUES                   | LOADINGS                      |        |
| IDEN - LABEL              COUNT   ABS.WT.   |    1     2     3     4     5  |    1     2     3     4     5  | DISTO. |
+---------------------------------------------+-------------------------------+-------------------------------+--------+
| CUT "b" OF THE TREE INTO 7 CLUSTERS                                                                                  |
| bb1b - Cluster 1 / 7        106    106.00   |   3.7  -8.7  -0.3   2.3   6.9 |  0.18 -0.39 -0.01  0.09  0.27 |  0.83  |
| bb2b - Cluster 2 / 7        375    375.00   | -15.2  -7.2   2.3  -6.6   4.2 | -0.32 -0.14  0.04 -0.12  0.07 |  0.22  |
| bb3b - Cluster 3 / 7         70     70.00   | -10.6   7.0  -9.4   6.8   0.6 | -0.63  0.39 -0.49  0.35  0.03 |  1.75  |
| bb4b - Cluster 4 / 7         79     79.00   |  -6.3   2.4   3.6   8.2  -4.9 | -0.35  0.13  0.18  0.39 -0.23 |  1.57  |
| bb5b - Cluster 5 / 7         67     67.00   |   2.8  -2.1  -4.3  -2.8  -1.7 |  0.17 -0.12 -0.23 -0.15 -0.09 |  1.98  |
| bb6b - Cluster 6 / 7        141    141.00   |  12.2  -1.6  -1.9   2.9 -11.2 |  0.49 -0.06 -0.07  0.10 -0.38 |  0.75  |
| bb7b - Cluster 7 / 7        162    162.00   |  15.4  13.0   5.8  -4.8   3.5 |  0.58  0.46  0.19 -0.15  0.11 |  0.73  |
+---------------------------------------------+-------------------------------+-------------------------------+--------+

CLUSTERING CONSOLIDATION

AROUND THE CENTERS OF THE 7 CLUSTERS, ACHIEVED BY 10 ITERATIONS WITH MOVING CENTERS
BETWEEN-CLUSTERS INERTIA INCREASE
+-----------+---------------+---------------+---------------+
| ITERATION | TOTAL INERTIA | INTER-CLUSTERS|     RATIO     |
|           |               |    INERTIA    |               |
+-----------+---------------+---------------+---------------+
|     0     |    2.35008    |    0.77272    |    0.32881    |
|     1     |    2.35008    |    0.82435    |    0.35078    |
|     2     |    2.35008    |    0.82613    |    0.35153    |
|     3     |    2.35008    |    0.82630    |    0.35160    |
|     4     |    2.35008    |    0.82630    |    0.35160    |
+-----------+---------------+---------------+---------------+
STOP AFTER ITERATION 4. THE RELATIVE INCREASE OF BETWEEN-CLUSTER INERTIA
WITH RESPECT TO THE PREVIOUS ITERATION IS ONLY 0.000 %.
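The consolidation step is essentially k-means restarted from the centers of the tree cut, iterating until the between-cluster inertia stops increasing. A hedged numpy sketch of that loop on synthetic data (the starting centers and data are hypothetical, not SPAD's):

```python
import numpy as np

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(loc=c, size=(50, 2))
               for c in ([0, 0], [5, 0], [0, 5])])
# Hypothetical starting centers, standing in for the centers of the cut.
centers = np.array([[0.5, 0.5], [4.0, 0.5], [0.5, 4.0]])

g = X.mean(axis=0)
total_inertia = ((X - g) ** 2).sum() / len(X)   # constant across iterations
prev_ratio = -1.0
for iteration in range(10):
    # Reassign each case to its nearest center, then move the centers.
    labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
    centers = np.array([X[labels == j].mean(axis=0) for j in range(3)])
    # Between-cluster inertia: weighted squared distances of the centers
    # to the grand mean.
    between = sum((labels == j).sum() * ((centers[j] - g) ** 2).sum()
                  for j in range(3)) / len(X)
    ratio = between / total_inertia
    if ratio - prev_ratio <= 1e-6:   # stop when the gain is negligible
        break
    prev_ratio = ratio
```

As in the printout above, the total inertia is fixed, and each iteration can only increase the between-cluster share, so the ratio climbs and then stalls.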


INERTIA DECOMPOSITION

COMPUTED ON 14 AXES.
+------------------+-----------------+--------------+-------------------+-----------------+
|                  |    INERTIAS     |    COUNTS    |      WEIGHTS      |    DISTANCES    |
|     INERTIAS     |  BEFORE  AFTER  | BEFORE AFTER |  BEFORE   AFTER   |  BEFORE  AFTER  |
+------------------+-----------------+--------------+-------------------+-----------------+
| BETWEEN CLUSTERS |  0.7727  0.8263 |              |                   |                 |
| WITHIN CLUSTER   |                 |              |                   |                 |
|   CLUSTER 1 / 7  |  0.1299  0.1731 |  106    128  |  106.00   128.00  |  0.8283  0.8028 |
|   CLUSTER 2 / 7  |  0.6116  0.5710 |  375    358  |  375.00   358.00  |  0.2191  0.2551 |
|   CLUSTER 3 / 7  |  0.0930  0.0945 |   70     72  |   70.00    72.00  |  1.7521  1.7687 |
|   CLUSTER 4 / 7  |  0.1233  0.1336 |   79     82  |   79.00    82.00  |  1.5661  1.5452 |
|   CLUSTER 5 / 7  |  0.1293  0.1293 |   67     67  |   67.00    67.00  |  1.9831  1.9831 |
|   CLUSTER 6 / 7  |  0.2054  0.2180 |  141    149  |  141.00   149.00  |  0.7483  0.7707 |
|   CLUSTER 7 / 7  |  0.2849  0.2043 |  162    144  |  162.00   144.00  |  0.7286  0.9060 |
| TOTAL INERTIA    |  2.3501  2.3501 |              |                   |                 |
+------------------+-----------------+--------------+-------------------+-----------------+
RATIO (INTER INERTIA / TOTAL INERTIA): BEFORE .. 0.3288
                                       AFTER  .. 0.3516

LOADINGS AND TEST-VALUES AFTER CONSOLIDATION

AXES 1 A 5+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - +| CLUSTERS | TEST-VALUES | LOADI NGS | || - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - | - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - | - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - | - - - - - - - - - - || I DEN - LABEL COUNT ABS. WT. | 1 2 3 4 5 | 1 2 3 4 5 | DI STO. |+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - +| CUT "b" OF THE TREE I NTO 7 CLUSTERS || || bb1b - Cl uster 1 / 7 128 128.00 | 3.8 - 8.6 - 1.8 4.1 8.0 | 0.16 - 0.35 - 0.07 0.15 0.28 | 0.80 || bb2b - Cl uster 2 / 7 358 358.00 | - 16. 0 - 6.4 1.9 - 8.1 4.0 | - 0.35 - 0.13 0.04 - 0.15 0.07 | 0.26 || bb3b - Cl uster 3 / 7 72 72. 00 | - 10. 7 8.3 - 9.3 6.5 - 0.1 | -0. 63 0.46 - 0.48 0.32 0.00 | 1.77 || bb4b - Cl uster 4 / 7 82 82. 00 | -5. 8 2.7 2.9 7.9 - 5.6 | -0. 32 0. 14 0. 14 0. 37 - 0.25 | 1.55 || bb5b - Cl uster 5 / 7 67 67. 00 | 2. 8 - 2.1 - 4. 3 - 2. 8 - 1.7 | 0. 17 - 0. 12 - 0. 23 - 0. 15 - 0. 09 | 1. 98 || bb6b - Cl uster 6 / 7 149 149.00 | 13. 3 - 1.2 - 2.6 2.4 - 11. 6 | 0.52 - 0.04 - 0.09 0.08 - 0.38 | 0.77 || bb7b - Cl uster 7 / 7 144 144.00 | 15. 1 11. 5 9.4 - 4.1 4.3 | 0.61 0.43 0.33 - 0.14 0.14 | 0.91 |+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +- - - - - - - - - - +

CLUSTERS REPRESENTATIVES

CLUSTER 1 / 7
COUNT: 128
+----+-----------+--------+  +----+-----------+--------+  +----+-----------+--------+
| RK |  DISTANCE | IDENT. |  | RK |  DISTANCE | IDENT. |  | RK |  DISTANCE | IDENT. |
+----+-----------+--------+  +----+-----------+--------+  +----+-----------+--------+
|  1 |   0.51034 |  0980  |  |  2 |   0.56936 |  0091  |  |  3 |   0.58376 |  0485  |
|  4 |   0.58376 |  0619  |  |  5 |   0.62658 |  0368  |  |  6 |   0.62658 |  0897  |
|  7 |   0.63989 |  0704  |  |  8 |   0.66465 |  0184  |  |  9 |   0.66465 |  0232  |
| 10 |   0.66465 |  0238  |  |    |           |        |  |    |           |        |
+----+-----------+--------+  +----+-----------+--------+  +----+-----------+--------+

CLUSTER 2 / 7
COUNT: 358
+----+-----------+--------+  +----+-----------+--------+  +----+-----------+--------+
| RK |  DISTANCE | IDENT. |  | RK |  DISTANCE | IDENT. |  | RK |  DISTANCE | IDENT. |
+----+-----------+--------+  +----+-----------+--------+  +----+-----------+--------+
|  1 |   0.66989 |  0459  |  |  2 |   0.80053 |  0043  |  |  3 |   0.80753 |  0322  |
|  4 |   0.86366 |  0393  |  |  5 |   0.86366 |  0450  |  |  6 |   0.86366 |  0780  |
|  7 |   0.86366 |  0540  |  |  8 |   0.86366 |  0460  |  |  9 |   0.90535 |  0082  |
| 10 |   0.91404 |  0593  |  |    |           |        |  |    |           |        |
+----+-----------+--------+  +----+-----------+--------+  +----+-----------+--------+

CLUSTER 3 / 7
COUNT: 72
+----+-----------+--------+  +----+-----------+--------+  +----+-----------+--------+
| RK |  DISTANCE | IDENT. |  | RK |  DISTANCE | IDENT. |  | RK |  DISTANCE | IDENT. |
+----+-----------+--------+  +----+-----------+--------+  +----+-----------+--------+
|  1 |   0.58799 |  0741  |  |  2 |   0.60470 |  0940  |  |  3 |   0.61735 |  0639  |
|  4 |   0.61735 |  0788  |  |  5 |   0.69764 |  0789  |  |  6 |   0.70722 |  0758  |
|  7 |   0.78494 |  0766  |  |  8 |   0.78494 |  0806  |  |  9 |   0.82442 |  0742  |
| 10 |   0.82442 |  0946  |  |    |           |        |  |    |           |        |
+----+-----------+--------+  +----+-----------+--------+  +----+-----------+--------+

CLUSTER 4 / 7
COUNT: 82
+----+-----------+--------+  +----+-----------+--------+  +----+-----------+--------+
| RK |  DISTANCE | IDENT. |  | RK |  DISTANCE | IDENT. |  | RK |  DISTANCE | IDENT. |
+----+-----------+--------+  +----+-----------+--------+  +----+-----------+--------+
|  1 |   0.74814 |  0156  |  |  2 |   0.98976 |  0575  |  |  3 |   1.01170 |  0730  |
|  4 |   1.07622 |  0569  |  |  5 |   1.12107 |  0721  |  |  6 |   1.12879 |  0148  |
|  7 |   1.12879 |  0660  |  |  8 |   1.12879 |  0715  |  |  9 |   1.14287 |  0566  |
| 10 |   1.14460 |  0360  |  |    |           |        |  |    |           |        |
+----+-----------+--------+  +----+-----------+--------+  +----+-----------+--------+

CLUSTER 5 / 7
COUNT: 67


+----+-----------+--------+  +----+-----------+--------+  +----+-----------+--------+
| RK |  DISTANCE | IDENT. |  | RK |  DISTANCE | IDENT. |  | RK |  DISTANCE | IDENT. |
+----+-----------+--------+  +----+-----------+--------+  +----+-----------+--------+
|  1 |   0.97554 |  0358  |  |  2 |   1.10787 |  0130  |  |  3 |   1.12353 |  0328  |
|  4 |   1.27382 |  0288  |  |  5 |   1.27888 |  0825  |  |  6 |   1.29654 |  0165  |
|  7 |   1.30224 |  0828  |  |  8 |   1.30330 |  0302  |  |  9 |   1.30330 |  0326  |
| 10 |   1.34956 |  0208  |  |    |           |        |  |    |           |        |
+----+-----------+--------+  +----+-----------+--------+  +----+-----------+--------+

CLUSTER 6 / 7
COUNT: 149
+----+-----------+--------+  +----+-----------+--------+  +----+-----------+--------+
| RK |  DISTANCE | IDENT. |  | RK |  DISTANCE | IDENT. |  | RK |  DISTANCE | IDENT. |
+----+-----------+--------+  +----+-----------+--------+  +----+-----------+--------+
|  1 |   0.52061 |  0062  |  |  2 |   0.52061 |  0240  |  |  3 |   0.55153 |  0419  |
|  4 |   0.55153 |  0611  |  |  5 |   0.66158 |  0991  |  |  6 |   0.70375 |  0286  |
|  7 |   0.70767 |  0251  |  |  8 |   0.75757 |  0497  |  |  9 |   0.77031 |  0377  |
| 10 |   0.78869 |  0242  |  |    |           |        |  |    |           |        |
+----+-----------+--------+  +----+-----------+--------+  +----+-----------+--------+

CLUSTER 7 / 7
COUNT: 144
+----+-----------+--------+  +----+-----------+--------+  +----+-----------+--------+
| RK |  DISTANCE | IDENT. |  | RK |  DISTANCE | IDENT. |  | RK |  DISTANCE | IDENT. |
+----+-----------+--------+  +----+-----------+--------+  +----+-----------+--------+
|  1 |   0.54714 |  0141  |  |  2 |   0.58623 |  0007  |  |  3 |   0.60549 |  0243  |
|  4 |   0.63791 |  0200  |  |  5 |   0.64338 |  0025  |  |  6 |   0.72304 |  0172  |
|  7 |   0.72691 |  0004  |  |  8 |   0.74024 |  0006  |  |  9 |   0.74024 |  0352  |
| 10 |   0.74024 |  0343  |  |    |           |        |  |    |           |        |
+----+-----------+--------+  +----+-----------+--------+  +----+-----------+--------+

DISTANCE'S MATRIX BETWEEN CLUSTERS

     |     1      2      3      4      5      6      7
-----+--------------------------------------------------
   1 | 0.000
   2 | 1.134  0.000
   3 | 1.701  1.443  0.000
   4 | 1.628  1.402  1.856  0.000
   5 | 1.752  1.608  1.990  1.984  0.000
   6 | 1.327  1.183  1.746  1.637  1.703  0.000
   7 | 1.383  1.247  1.820  1.702  1.770  1.283  0.000
-----+--------------------------------------------------


DESCRIPTION OF: CUT "b" OF THE TREE INTO 7 CLUSTERS

The characterizing elements are classified by order of importance with the help of a statistical criterion (test-value) with which a probability is associated: the larger the test-value, the lower the probability, and the better the element characterizes the cluster.

In the case of the description of the classes by the categories of the categorical variables, an option allows you to sort the characterizing categories by decreasing test-values or by percentages.

CLUSTERS CHARACTERISATION BY ACTIVE CATEGORIES

CHARACTERISATION BY CATEGORIES OF CLUSTERS OR CATEGORIES OF CUT "b" OF THE TREE INTO 7 CLUSTERS

Cluster 1 / 7

------------------------------------------------------------------------------------------------------------------------
T.VALUE  PROB.   ---- PERCENTAGES ----    CHARACTERISTIC                                                          WEIGHT
                 GRP/CAT CAT/GRP GLOBAL   CATEGORIES            OF VARIABLES
------------------------------------------------------------------------------------------------------------------------
                                  12.80   Cluster 1 / 7                                                              128

  24.52  0.000    81.01  100.00   15.80   BEPC-BE-BEPS          Diploma in 5 categories                              158
   4.73  0.000    17.59   71.88   52.30   tenant                Occupation status of housing in 4 categories         523
   3.10  0.001    18.31   40.63   28.40   25 to 34 yo           Age in 5 categories                                  284
   3.08  0.001    17.61   46.09   33.50   Employee              Job category                                         335
   2.85  0.002    20.67   24.22   15.00   Lower than 25 yo      Age in 5 categories                                  150

  -2.04  0.021     8.73   15.63   22.90   Manager               Job category                                         229
  -2.27  0.012     8.97   20.31   29.00   owner                 Occupation status of housing in 4 categories         290
  -2.33  0.010     2.08    0.78    4.80   Other                 Job category                                          48
  -2.72  0.003     3.61    2.34    8.30   Lower than 2.000      Urban area size (number of inhabitants)               83
  -3.01  0.001     5.92    7.81   16.90   65 yo and more        Age in 5 categories                                  169
  -3.28  0.001     6.22   10.16   20.90   35 to 49 yo           Age in 5 categories                                  209
  -3.81  0.000     0.00    0.00    6.70   free housing, other   Occupation status of housing in 4 categories          67
  -4.49  0.000     0.00    0.00    8.70   2.000 - 20.000        Urban area size (number of inhabitants)               87
  -6.27  0.000     0.00    0.00   15.00   University            Diploma in 5 categories                              150
  -7.06  0.000     0.00    0.00   18.20   Bac - Brevet sup.     Diploma in 5 categories                              182
  -7.22  0.000     0.00    0.00   18.90   No one                Diploma in 5 categories                              189
 -10.07  0.000     0.00    0.00   32.10   CEP                   Diploma in 5 categories                              321
------------------------------------------------------------------------------------------------------------------------
Cluster 2 / 7

------------------------------------------------------------------------------------------------------------------------
T.VALUE  PROB.   ---- PERCENTAGES ----    CHARACTERISTIC                                                          WEIGHT
                 GRP/CAT CAT/GRP GLOBAL   CATEGORIES            OF VARIABLES
------------------------------------------------------------------------------------------------------------------------
                                  35.80   Cluster 2 / 7                                                              358

  14.73  0.000    68.54   61.45   32.10   CEP                   Diploma in 5 categories                              321
  12.34  0.000    67.68   49.72   26.30   Worker                Job category                                         263
  11.58  0.000    73.02   38.55   18.90   No one                Diploma in 5 categories                              189
   6.09  0.000    49.24   45.25   32.90   greater than 100.000  Urban area size (number of inhabitants)              329
   5.94  0.000    56.00   27.37   17.50   20.000 - 100.000      Urban area size (number of inhabitants)              175
   5.32  0.000    38.68   94.97   87.90   No                    Do you own some securities ?                         879
   4.33  0.000    50.89   24.02   16.90   65 yo and more        Age in 5 categories                                  169
   4.14  0.000    41.87   61.17   52.30   tenant                Occupation status of housing in 4 categories         523
   2.54  0.005    44.15   23.18   18.80   50 to 64 yo           Age in 5 categories                                  188

  -2.58  0.005    30.06   27.37   32.60   Paris                 Urban area size (number of inhabitants)              326
  -3.64  0.000    22.67    9.50   15.00   Lower than 25 yo      Age in 5 categories                                  150
  -3.88  0.000    26.41   20.95   28.40   25 to 34 yo           Age in 5 categories                                  284
  -3.91  0.000    10.42    1.40    4.80   Other                 Job category                                          48
  -4.20  0.000    19.20    6.70   12.50   missing category      Job category                                         125
  -5.32  0.000    14.88    5.03   12.10   Yes                   Do you own some securities ?                         121
  -7.50  0.000     0.00    0.00    6.70   free housing, other   Occupation status of housing in 4 categories          67
  -8.47  0.000     0.00    0.00    8.30   Lower than 2.000      Urban area size (number of inhabitants)               83
  -8.69  0.000     0.00    0.00    8.70   2.000 - 20.000        Urban area size (number of inhabitants)               87
 -11.07  0.000     7.42    4.75   22.90   Manager               Job category                                         229
 -11.86  0.000     0.00    0.00   15.00   University            Diploma in 5 categories                              150
 -12.22  0.000     0.00    0.00   15.80   BEPC-BE-BEPS          Diploma in 5 categories                              158
 -13.28  0.000     0.00    0.00   18.20   Bac - Brevet sup.     Diploma in 5 categories                              182
------------------------------------------------------------------------------------------------------------------------


Cluster 3 / 7

------------------------------------------------------------------------------------------------------------------------
T.VALUE  PROB.   ---- PERCENTAGES ----    CHARACTERISTIC                                                          WEIGHT
                 GRP/CAT CAT/GRP GLOBAL   CATEGORIES            OF VARIABLES
------------------------------------------------------------------------------------------------------------------------
                                   7.20   Cluster 3 / 7                                                               72

  21.01  0.000    86.75  100.00    8.30   Lower than 2.000      Urban area size (number of inhabitants)               83
   9.61  0.000    20.34   81.94   29.00   owner                 Occupation status of housing in 4 categories         290
   8.27  0.000    50.00   33.33    4.80   Other                 Job category                                          48
   4.40  0.000    12.77   56.94   32.10   CEP                   Diploma in 5 categories                              321
   2.94  0.002    12.77   33.33   18.80   50 to 64 yo           Age in 5 categories                                  188
   2.13  0.017     7.85   95.83   87.90   No                    Do you own some securities ?                         879

  -2.13  0.017     2.48    4.17   12.10   Yes                   Do you own some securities ?                         121
  -2.22  0.013     2.40    4.17   12.50   missing category      Job category                                         125
  -2.55  0.005     0.00    0.00    6.70   free housing, other   Occupation status of housing in 4 categories          67
  -2.81  0.003     3.06    9.72   22.90   Manager               Job category                                         229
  -3.02  0.001     2.20    5.56   18.20   Bac - Brevet sup.     Diploma in 5 categories                              182
  -3.08  0.001     0.00    0.00    8.70   2.000 - 20.000        Urban area size (number of inhabitants)               87
  -3.10  0.001     3.04   11.11   26.30   Worker                Job category                                         263
  -3.26  0.001     1.33    2.78   15.00   Lower than 25 yo      Age in 5 categories                                  150
  -3.79  0.000     0.67    1.39   15.00   University            Diploma in 5 categories                              150
  -4.89  0.000     0.00    0.00   17.50   20.000 - 100.000      Urban area size (number of inhabitants)              175
  -7.33  0.000     0.00    0.00   32.60   Paris                 Urban area size (number of inhabitants)              326
  -7.38  0.000     0.00    0.00   32.90   greater than 100.000  Urban area size (number of inhabitants)              329
  -7.81  0.000     1.34    9.72   52.30   tenant                Occupation status of housing in 4 categories         523
------------------------------------------------------------------------------------------------------------------------
Cluster 4 / 7

------------------------------------------------------------------------------------------------------------------------
T.VALUE  PROB.   ---- PERCENTAGES ----    CHARACTERISTIC                                                          WEIGHT
                 GRP/CAT CAT/GRP GLOBAL   CATEGORIES            OF VARIABLES
------------------------------------------------------------------------------------------------------------------------
                                   8.20   Cluster 4 / 7                                                               82

  22.73  0.000    94.25  100.00    8.70   2.000 - 20.000        Urban area size (number of inhabitants)               87
   3.15  0.001    16.67   24.39   12.00   homeowner             Occupation status of housing in 4 categories         120
   2.17  0.015    11.38   40.24   29.00   owner                 Occupation status of housing in 4 categories         290
   2.03  0.021    11.96   30.49   20.90   35 to 49 yo           Age in 5 categories                                  209

  -1.98  0.024     4.00    7.32   15.00   Lower than 25 yo      Age in 5 categories                                  150
  -2.80  0.003     0.00    0.00    6.70   free housing, other   Occupation status of housing in 4 categories          67
  -3.10  0.001     5.54   35.37   52.30   tenant                Occupation status of housing in 4 categories         523
  -3.25  0.001     0.00    0.00    8.30   Lower than 2.000      Urban area size (number of inhabitants)               83
  -5.29  0.000     0.00    0.00   17.50   20.000 - 100.000      Urban area size (number of inhabitants)              175
  -7.89  0.000     0.00    0.00   32.60   Paris                 Urban area size (number of inhabitants)              326
  -7.94  0.000     0.00    0.00   32.90   greater than 100.000  Urban area size (number of inhabitants)              329
------------------------------------------------------------------------------------------------------------------------
Cluster 5 / 7

------------------------------------------------------------------------------------------------------------------------
T.VALUE  PROB.   ---- PERCENTAGES ----    CHARACTERISTIC                                                          WEIGHT
                 GRP/CAT CAT/GRP GLOBAL   CATEGORIES            OF VARIABLES
------------------------------------------------------------------------------------------------------------------------
                                   6.70   Cluster 5 / 7                                                               67

  21.82  0.000   100.00  100.00    6.70   free housing, other   Occupation status of housing in 4 categories          67

  -3.65  0.000     0.00    0.00   12.00   homeowner             Occupation status of housing in 4 categories         120
  -6.51  0.000     0.00    0.00   29.00   owner                 Occupation status of housing in 4 categories         290
  -9.91  0.000     0.00    0.00   52.30   tenant                Occupation status of housing in 4 categories         523
------------------------------------------------------------------------------------------------------------------------
Cluster 6 / 7

------------------------------------------------------------------------------------------------------------------------
T.VALUE  PROB.   ---- PERCENTAGES ----    CHARACTERISTIC                                                          WEIGHT
                 GRP/CAT CAT/GRP GLOBAL   CATEGORIES            OF VARIABLES
------------------------------------------------------------------------------------------------------------------------
                                  14.90   Cluster 6 / 7                                                              149

  25.06  0.000    80.77   98.66   18.20   Bac - Brevet sup.     Diploma in 5 categories                              182
   5.87  0.000    27.95   42.95   22.90   Manager               Job category                                         229
   4.67  0.000    30.40   25.50   12.50   missing category      Job category                                         125
   4.23  0.000    27.33   27.52   15.00   Lower than 25 yo      Age in 5 categories                                  150
   3.52  0.000    20.86   45.64   32.60   Paris                 Urban area size (number of inhabitants)              326
   2.27  0.012    22.50   18.12   12.00   homeowner             Occupation status of housing in 4 categories         120
   2.17  0.015    19.01   36.24   28.40   25 to 34 yo           Age in 5 categories                                  284

  -2.57  0.005    10.75   24.16   33.50   Employee              Job category                                         335
  -3.00  0.001     7.98   10.07   18.80   50 to 64 yo           Age in 5 categories                                  188
  -3.49  0.000     6.51    7.38   16.90   65 yo and more        Age in 5 categories                                  169
  -4.20  0.000     1.20    0.67    8.30   Lower than 2.000      Urban area size (number of inhabitants)               83
  -4.21  0.000     0.00    0.00    6.70   free housing, other   Occupation status of housing in 4 categories          67
  -4.95  0.000     0.00    0.00    8.70   2.000 - 20.000        Urban area size (number of inhabitants)               87
  -6.87  0.000     0.00    0.00   15.00   University            Diploma in 5 categories                              150
  -7.09  0.000     0.00    0.00   15.80   BEPC-BE-BEPS          Diploma in 5 categories                              158
  -7.41  0.000     0.53    0.67   18.90   No one                Diploma in 5 categories                              189
  -7.85  0.000     1.90    3.36   26.30   Worker                Job category                                         263
 -10.57  0.000     0.31    0.67   32.10   CEP                   Diploma in 5 categories                              321
------------------------------------------------------------------------------------------------------------------------


Cluster 7 / 7

------------------------------------------------------------------------------------------------------------------------
T.VALUE  PROB.   ---- PERCENTAGES ----    CHARACTERISTIC                                                          WEIGHT
                 GRP/CAT CAT/GRP GLOBAL   CATEGORIES            OF VARIABLES
------------------------------------------------------------------------------------------------------------------------
                                  14.40   Cluster 7 / 7                                                              144

  24.37  0.000    88.67   92.36   15.00   University            Diploma in 5 categories                              150
  11.52  0.000    40.17   63.89   22.90   Manager               Job category                                         229
   7.36  0.000    26.69   60.42   32.60   Paris                 Urban area size (number of inhabitants)              326
   5.76  0.000    33.88   28.47   12.10   Yes                   Do you own some securities ?                         121
   2.21  0.014    16.83   61.11   52.30   tenant                Occupation status of housing in 4 categories         523

  -2.80  0.003     7.98   10.42   18.80   50 to 64 yo           Age in 5 categories                                  188
  -4.12  0.000     0.00    0.00    6.70   free housing, other   Occupation status of housing in 4 categories          67
  -4.70  0.000     0.00    0.00    8.30   Lower than 2.000      Urban area size (number of inhabitants)               83
  -4.84  0.000     0.00    0.00    8.70   2.000 - 20.000        Urban area size (number of inhabitants)               87
  -5.40  0.000     6.27   14.58   33.50   Employee              Job category                                         335
  -5.76  0.000    11.72   71.53   87.90   No                    Do you own some securities ?                         879
  -6.95  0.000     0.00    0.00   15.80   BEPC-BE-BEPS          Diploma in 5 categories                              158
  -7.25  0.000     0.53    0.69   18.90   No one                Diploma in 5 categories                              189
  -7.57  0.000     0.00    0.00   18.20   Bac - Brevet sup.     Diploma in 5 categories                              182
  -7.65  0.000     3.12    6.94   32.10   CEP                   Diploma in 5 categories                              321
  -8.65  0.000     0.76    1.39   26.30   Worker                Job category                                         263
------------------------------------------------------------------------------------------------------------------------

THE GRAPH EDITOR 



CLASS - MINER  -  CLUSTERS DESCRIPTION 

This procedure lets you describe the partitions created by the PARTI procedure using the variables that did not participate in the analysis.

You can thus select variables by theme and evaluate their characterizing power on the constructed partitions (typologies). The parameter settings and the outputs are identical to those of the DECLA procedure of the PARTI-DECLA icon.

Characteristic elements are classified in order of importance using a statistical criterion (test-value) with an associated probability: the higher the test-value and the lower the probability, the more strongly the element characterizes the cluster.


ESCAL  -  STORING THE FACTORIAL AXES AND THE PARTITIONS 


THE LINEAR MODEL AND ITS APPLICATIONS 

REGRESSION AND ANALYSIS OF VARIANCE, GENERAL LINEAR MODEL 

OBJECT 

The general purpose of this procedure, called VAREG, is to learn more about the relationship between several independent (predictor) variables and a continuous dependent variable.

VAREG allows you to fit least-squares adjustment models with a constant term. It can be used for many different analyses, including:

•  Simple regression

•  Multiple regression

•  Analysis of variance

•  Analysis of covariance
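All four analyses fit into the same least-squares framework: an analysis of variance or covariance is a regression in which each factor is coded as 0/1 dummy columns of the design matrix. A minimal sketch of that coding (plain Python; the convention of dropping the last category as the reference is an assumption for illustration, SPAD's own coding may differ):

```python
def design_row(x_cont, category, levels):
    """One row of a general linear model design matrix:
    constant term, one continuous variable, and 0/1 dummies for a
    factor (last category dropped as the reference level)."""
    dummies = [1.0 if category == lev else 0.0 for lev in levels[:-1]]
    return [1.0, x_cont] + dummies

# Hypothetical 3-category factor:
levels = ["worker", "employee", "manager"]
print(design_row(2.5, "employee", levels))  # [1.0, 2.5, 0.0, 1.0]
print(design_row(1.0, "manager", levels))   # [1.0, 1.0, 0.0, 0.0]
```

With the factor expanded this way, ordinary least squares estimates one coefficient per non-reference category, which is exactly how a single procedure can cover regression, ANOVA and ANCOVA.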

VAREG enables you to specify interactions (crossed effects) up to the 3rd order. Each regression coefficient is associated with a nullity test, which is valid in the classical context where the random term is assumed to follow a Laplace-Gauss (normal) distribution.
The REPEATED statement enables you to specify effects in the model that represent repeated measurements on the same experimental unit for the same response.
The VAREG procedure automatically generates a rule file that allows you to create a new data set (with the Deployment – Archiving\Archiving\Predictions method) containing the input dataset together with the predicted values and residuals.
The treatment of missing data is controlled by the parameters.

OUTPUTS 

Summary statistics on the variables of the model are output: marginal distributions of the categorical variables; mean, standard deviation, minimum and maximum of the continuous variables. The method supplies the identification of the coefficients of the model: coefficients of the continuous explanatory variables, of the categories of the factors and of any interactions. It is then possible to output the variance-covariance matrix or the correlation matrix.


The procedure prints the coefficients, the estimates of their standard deviations, the corresponding Student statistics, as well as the critical probabilities and the associated test-values. Also shown are the sum of squared deviations, the multiple correlation coefficient, and the estimate of the residual variance. Finally, the test of simultaneous nullity of all the coefficients (the test that the endogenous "y" reduces to a constant) is provided.
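To make these quantities concrete, here is a minimal simple-regression sketch (plain Python with made-up data; an illustration of the standard formulas, not SPAD's own computation) producing the slope, its standard error, the Student statistic of the nullity test, and the overall F:

```python
from math import sqrt
from statistics import mean

def simple_ols_inference(x, y):
    """Fit y = a + b*x by least squares and return the slope b,
    its standard error, Student's t for the nullity test of b,
    and the overall F statistic (n - 2 residual degrees of freedom)."""
    n, mx, my = len(x), mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s2 = rss / (n - 2)          # residual variance estimate
    se_b = sqrt(s2 / sxx)       # standard error of the slope
    t = b / se_b                # Student statistic for the slope
    tss = sum((yi - my) ** 2 for yi in y)
    f = (tss - rss) / s2        # overall F: with one predictor, f == t**2
    return b, se_b, t, f

b, se_b, t, f = simple_ols_inference([1, 2, 3, 4], [2, 3, 5, 6])
print(round(t, 2), round(f, 1))  # 9.9 98.0
```

With a single predictor the overall nullity test and the slope's Student test coincide (F equals t squared), which is a useful sanity check when reading the printed output.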

In the case of an analysis of variance, you also get the sums of squared deviations broken down by source (residual, criterion or interaction) as well as the Fisher statistics, the critical probabilities and the associated test-values. In the case of repeated observations, the repeatability variance is displayed, together with the estimates obtained from it.

DEFINE A MODEL

The interface allows you to define one or more models. The CTRL and SHIFT keys work in the standard way for multiple selection.

1.  In the Selection list choose the TYPE of the variable(s) you want to define


2.  Then, in the Variables Available list, select one or more variables of that TYPE and confirm your choices with the transfer button. A double click on a variable also confirms the choice.

To delete a variable or an interaction from the model under construction, select it in the list of models and confirm with the transfer button.

3.  Save a model
Once you have specified at least one endogenous variable and one exogenous variable, click on the "Validate" button to add the model to the Model list.

Delete a model
Select the model from the list and click on the "Delete" button.

Change a model
Select the model in the list and click on the "Modify" button.

PARAMETERS 

The VAREG parameters allow you to handle missing data and to specify whether measurements are repeated or not. With the printout parameters, you can specify the desired outputs.


Missing data handling for continuous variables (LSUPR)
Possible values: Deleted case / Mean imputation

•  If LSUPR = Deleted case, the cases presenting missing data for one of the variables of the model (endogenous or exogenous) will be eliminated from the analysis.

•  If LSUPR = Mean imputation, the missing exogenous data will be replaced by the mean of the corresponding variable.

Warning: if LSUPR = Mean imputation, the endogenous variable must not have any missing data.
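What mean imputation does can be sketched in a few lines (plain Python, hypothetical column values with None as the missing code; not SPAD's implementation):

```python
from statistics import mean

def impute_mean(column):
    """Replace missing values (None) by the mean of the observed values."""
    observed = [v for v in column if v is not None]
    m = mean(observed)  # mean of the non-missing cases
    return [m if v is None else v for v in column]

print(impute_mean([10.0, None, 14.0]))  # [10.0, 12.0, 14.0]
```

This keeps every case in the analysis at the price of shrinking the imputed variable's variance slightly, which is why the deleted-case option remains available.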

Missing data handling – categorical variables (LZERO)
Possible values: Re-coded / Deleted case

•  If LZERO = Re-coded, the missing values will be treated as a regular category.

•  If LZERO = Deleted case, the cases with missing data will be eliminated.

Treatment with repetitions (LREP)

Possible values: No (there are no repetitions) / Repetitions in sequence / Repetitions in disorder

This parameter concerns the treatment of experimental designs. When there are repetitions, the variance of the observations may be estimated from the repeated observations rather than from the whole set of observations. The number of repetitions need not be the same everywhere.

•  Choose LREP = Repetitions in sequence if the repetitions appear one below the other, in consecutive rows of the data table.

•  Choose LREP = Repetitions in disorder if the repetitions are unordered.
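Estimating the variance on the repetitions amounts to pooling the within-repetition variances across experimental conditions. A minimal sketch (plain Python with made-up measurements; an illustration of the standard pooled-variance formula, not SPAD's code):

```python
from statistics import variance

def repeatability_variance(groups):
    """Pooled within-group variance: each group holds the repeated
    measurements of one experimental condition. Groups may have
    different numbers of repetitions."""
    num = sum((len(g) - 1) * variance(g) for g in groups if len(g) > 1)
    den = sum(len(g) - 1 for g in groups)
    return num / den

# Two conditions, measured 3 and 2 times respectively:
print(round(repeatability_variance([[5.0, 6.0, 7.0], [10.0, 12.0]]), 3))  # 1.333
```

Because it uses only the scatter within each set of repetitions, this estimate is unaffected by real differences between conditions, which is what makes it a pure-error (repeatability) variance.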

Output Parameters

Summary statistics on the variables in the model (LSTAT)
Possible values: Yes / No

If LSTAT = Yes, one obtains marginal distributions for the categorical variables of the model, as well as the various statistics concerning the continuous variables: mean, standard deviation, minimum and maximum.

Printout of the covariances matrix (LMAT)
Possible values:

•  No (No output)

•  Variances, covariances (output the variance-covariance matrix)


•  Correlations (Output the correlations matrix)

File for Excel application (LEXCE)
Possible values: Yes / No
If LEXCE = Yes, an ASCII delimited file is also output, which can be directly imported into the Excel application.

Variables labels (LABEL)
Possible values: short / long

•  If LABEL = short, 4 characters are used for categorical variable labels and 20 for continuous variable labels.

•  If LABEL = long, 60 characters are used for both categorical and continuous variable labels.


OPTIMAL REGRESSIONS RESEARCH 

General principles

This procedure selects the N "best" adjustments for a regression. The selection criterion can be the R², the adjusted R² or Mallows' Cp.

Let N be the number of best adjustments requested, and P the number of explicative (exogenous) variables for the model. The procedure shows the N best adjustments for all sizes of models, from 1 to P-1 variables (the adjustment with all P variables is unique).

The procedure supplies the criterion value (R², adjusted R² or Cp), the Fisher F associated with the R², the critical probability associated with this F, and the corresponding test value.

The list of the variables of the model is then shown with the estimated coefficients, the nullity tests, the critical probability and the associated test value. Finally, a diagram representing the evolution of the criterion as a function of the number of variables in the model gives a quick summary of the selections.

For the R² criterion, all the printed selections are optimal. For the other two criteria, the selections are not always optimal (the adjusted R² and Mallows' Cp vary in a non-monotone way as a function of the number of variables). A non-optimal selection is identified when the procedure does not show the coefficients of the variables (only the names of the variables and the value of the criterion are shown). In this case the selected adjustment, even if it is not optimal for the criterion, is nonetheless better than the adjustments that were not calculated.

Reference: the selection algorithm is a transcription of the "leaps and bounds" algorithm of Furnival & Wilson (Technometrics, 1974, Vol. 16, pp. 499-511).
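The leaps-and-bounds algorithm prunes the subset tree for speed; on a small problem an exhaustive search yields the same optima and makes the selection idea concrete. The following is an illustrative brute-force sketch (not SPAD's implementation), ranking every subset of exogenous variables by R²:

```python
# Illustrative best-subsets search ranked by R-squared.  This is a naive
# enumeration, not the Furnival & Wilson leaps-and-bounds algorithm, but
# on small problems it returns the same optimal selections.
from itertools import combinations
import numpy as np

def r_squared(y, X):
    """R2 of the least-squares fit of y on X plus a constant term."""
    Xc = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    resid = y - Xc @ beta
    sse = float(resid @ resid)
    sst = float(((y - y.mean()) ** 2).sum())
    return 1.0 - sse / sst

def best_subsets(y, X, n_best=3):
    """For each model size, return the n_best variable subsets by R2."""
    p = X.shape[1]
    results = {}
    for size in range(1, p + 1):
        scored = [(r_squared(y, X[:, list(c)]), c)
                  for c in combinations(range(p), size)]
        scored.sort(reverse=True)
        results[size] = scored[:n_best]
    return results
```

As in the procedure, the result is read per model size: the top entry for each size is the optimal adjustment for the R² criterion.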

Data

This dataset corresponds to the perception that 100 companies have of their suppliers. The criteria are the following:

  Delivery time
  Price index
  Price flexibility
  Perceived quality
  Service quality
  Commercial image
  Product quality
  Satisfaction

The main goal is to find the best model explaining Satisfaction by a subset of the other items.


Id   Size             Delivery  Price  Price        Perceived  Service  Commercial  Product
                      delay     Index  Flexibility  Quality    Quality  Image       Quality
 1   < 50 employees   4,1       0,6    6,9          4,7        2,4      2,3         5,2
 2   = 50 employee    1,8       3      6,3          6,6        2,5      4           8,4
 3   = 50 employee    3,4       5,2    5,7          6          4,3      2,7         8,2
 4   = 50 employee    2,7       1      7,1          5,9        1,8      2,3         7,8
 5   < 50 employees   6         0,9    9,6          7,8        3,4      4,6         4,5
 6   = 50 employee    1,9       3,3    7,9          4,8        2,6      1,9         9,7
 7   < 50 employees   4,6       2,4    9,5          6,6        3,5      4,5         7,6
 8   = 50 employee    1,3       4,2    6,2          5,1        2,8      2,2         6,9
 9   < 50 employees   5,5       1,6    9,4          4,7        3,5      3           7,6
10   = 50 employee    4         3,5    6,5          6          3,7      3,2         8,7
11   < 50 employees   2,4       1,6    8,8          4,8        2        2,8         5,8
12   < 50 employees   3,9       2,2    9,1          4,6        3        2,5         8,3
13   = 50 employee    2,8       1,4    8,1          3,8        2,1      1,4         6,6
14   < 50 employees   3,7       1,5    8,6          5,7        2,7      3,7         6,7
15   < 50 employees   4,7       1,3    9,9          6,7        3        2,6         6,8
16   < 50 employees   3,4       2      9,7          4,7        2,7      1,7         4,8
17   < 50 employees   3,2       4,1    5,7          5,1        3,6      2,9         6,2
18   < 50 employees   4,9       1,8    7,7          4,3        3,4      1,5         5,9
19   < 50 employees   5,3       1,4    9,7          6,1        3,3      3,9         6,8
20   < 50 employees   4,7       1,3    9,9          6,7        3        2,6         6,8
21   < 50 employees   3,3       0,9    8,6          4          2,1      1,8         6,3
22   < 50 employees   3,4       0,4    8,3          2,5        1,2      1,7         5,2
23   < 50 employees   3         4      9,1          7,1        3,5      3,4         8,4
24   = 50 employee    2,4       1,5    6,7          4,8        1,9      2,5         7,2
25   < 50 employees   5,1       1,4    8,7          4,8        3,3      2,6         3,8
26   < 50 employees   4,6       2,1    7,9          5,8        3,4      2,8         4,7
27   = 50 employee    2,4       1,5    6,6          4,8        1,9      2,5         7,2
28   < 50 employees   5,2       1,3    9,7          6,1        3,2      3,9         6,7
29   < 50 employees   3,5       2,8    9,9          3,5        3,1      1,7         5,4
30   = 50 employee    4,1       3,7    5,9          5,5        3,9      3           8,4
31   = 50 employee    3         3,2    6            5,3        3,1      3           8
32   < 50 employees   2,8       3,8    8,9          6,9        3,3      3,2         8,2
33   < 50 employees   5,2       2      9,3          5,9        3,7      2,4         4,6
34   = 50 employee    3,4       3,7    6,4          5,7        3,5      3,4         8,4
35   = 50 employee    2,4       1      7,7          3,4        1,7      1,1         6,2
36   = 50 employee    1,8       3,3    7,5          4,5        2,5      2,4         7,6
37   = 50 employee    3,6       4      5,8          5,8        3,7      2,5         9,3
38   < 50 employees   4         0,9    9,1          5,4        2,4      2,6         7,3
39   = 50 employee    0         2,1    6,9          5,4        1,1      2,6         8,9
40   = 50 employee    2,4       2      6,4          4,5        2,1      2,2         8,8
41   = 50 employee    1,9       3,4    7,6          4,6        2,6      2,5         7,7
42   < 50 employees   5,9       0,9    9,6          7,8        3,4      4,6         4,5
43   < 50 employees   4,9       2,3    9,3          4,5        3,6      1,3         6,2
44   < 50 employees   5         1,3    8,6          4,7        3,1      2,5         3,7
45   = 50 employee    2         2,6    6,5          3,7        2,4      1,7         8,5
46   < 50 employees   5         2,5    9,4          4,6        3,7      1,4         6,3
47   < 50 employees   3,1       1,9    10           4,5        2,6      3,2         3,8
48   = 50 employee    3,4       3,9    5,6          5,6        3,6      2,3         9,1
49   < 50 employees   5,8       0,2    8,8          4,5        3        2,4         6,7
50   < 50 employees   5,4       2,1    8            3          3,8      1,4         5,2
51   < 50 employees   3,7       0,7    8,2          6          2,1      2,5         5,2
52   = 50 employee    2,6       4,8    8,2          5          3,6      2,5         9
53   = 50 employee    4,5       4,1    6,3          5,9        4,3      3,4         8,8
54   = 50 employee    2,8       2,4    6,7          4,9        2,5      2,6         9,2
55   < 50 employees   3,8       0,8    8,7          2,9        1,6      2,1         5,6
56   < 50 employees   2,9       2,6    7,7          7          2,8      3,6         7,7


Fuwil – 3 - Excel sheet output

Missing data handling for exogenous variables

Missing values are replaced by the general means.

Variable label       Mean    Number of missing values
Delivery Time         3,515                         0
Price Index           2,364                         0
Price Flexibility     7,894                         0
Perceived Quality     5,248                         0
Service Quality       2,916                         0
Commercial Image      2,665                         0
Product Quality       6,971                         0
Usage Index          46,100                         0
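The mean imputation reported above is easy to reproduce; here is a minimal sketch (not SPAD's code) that replaces each missing value of an exogenous variable by that variable's overall mean:

```python
# Minimal sketch of mean imputation: each missing (NaN) value is replaced
# by the general mean of its column, as in the SPAD report above.
import numpy as np

def impute_with_means(X):
    """Replace NaN entries column-by-column with the column mean."""
    X = np.asarray(X, dtype=float).copy()
    col_means = np.nanmean(X, axis=0)        # means ignoring missing values
    nan_rows, nan_cols = np.where(np.isnan(X))
    X[nan_rows, nan_cols] = col_means[nan_cols]
    return X
```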

R² criterion

Curve of R² according to the number of variables

The following graph displays the evolution of the R² criterion according to the number of variables entered in the model. The higher this criterion, the better the model.
But as this criterion increases automatically when new variables enter the model, we must evaluate the relative gain of adding each new variable. We will see further on two criteria that penalize the R² for each newly entered variable: the adjusted R² and the Mallows C(p).
By looking at the graph below, we see that the R² increases significantly up to 3 variables. The next variables are redundant and do not bring any additional information that could significantly improve the model.
The R² can be interpreted as the part of the variance explained by the model. It takes its values between 0 and 1.
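The monotonicity just described can be checked numerically: adding a variable can never decrease the R², which is why the raw R² curve must be read for its gains rather than its level. A small synthetic check (illustrative only):

```python
# Numeric check: R2 of nested models is non-decreasing in the number of
# variables, so the raw R2 curve always climbs even past the useful ones.
import numpy as np

def r2(y, X):
    Xc = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    sse = float(((y - Xc @ beta) ** 2).sum())
    sst = float(((y - y.mean()) ** 2).sum())
    return 1 - sse / sst

rng = np.random.default_rng(1)
X = rng.normal(size=(80, 5))
y = 2 * X[:, 0] + X[:, 1] + rng.normal(size=80)
curve = [r2(y, X[:, : k + 1]) for k in range(5)]  # nested models, 1..5 vars
```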

[Graph: curve of R² according to the number of model variables (1 to 8); vertical axis from 0.45 to 0.80]


1 var

This output presents the 3 best adjustments with one exogenous variable.

Adjustments with 1 variable + constant   DDL(Student) = 98

Adjustment 1 (Full printout)

R**2 = 0.5051

Fisher = 100.0162

Probability = 0.0000

Test-Value = 8.283

Variable label Coefficient Student Probability Test-Value

Usage Index 0,0676 10,00 0,000 8,28

Adjustment 2 (Full printout)

R**2 = 0.4233

Fisher = 71.9390

Probability = 0.0000

Test-Value = 7.327

Variable label Coefficient Student Probability Test-Value

Delivery Time 0,4215 8,48 0,000 7,33

Adjustment 3 (Full printout)

R**2 = 0.3985

Fisher = 64.9139

Probability = 0.0000

Test-Value = 7.040

Variable label Coefficient Student Probability Test-Value

Service Quality 0,7189 8,06 0,000 7,04

The number of degrees of freedom is 98.

The first adjustment is the best one, with an R² of 0.5051, meaning that the variance explained by the model represents 50.51% of the total variance.

The Fisher statistic corresponds to the global validation of the model. This statistic follows a Fisher distribution with 1 and 98 degrees of freedom. Its value of 100.02 corresponds to a p-value lower than 1/10000 (0.0000). Thus, the model is acceptable. This p-value is also expressed as a test value: 8.283 here.

The Coefficient column presents the estimate of the coefficient of the variable "Usage Index"; the model can be written: Satisfaction Index = constant + 0.0676 x Usage Index

The Student column tests the nullity of the coefficient of the concerned variable: this statistic follows a Student distribution with 98 degrees of freedom. Its value of 10 corresponds to a p-value lower than 1/10000 (0.0000). The coefficient is significantly different from zero.

This probability is also expressed as a test value. Since the model has only one explanatory variable, the test value of the coefficient is the same as that of the global model.
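The coincidence of the two test values is not accidental: for a one-variable model the global Fisher F equals the square of the Student t of the coefficient. A sketch checking this relation on synthetic data (illustrative, not SPAD's computation):

```python
# For a simple regression (one variable + constant), the global Fisher
# statistic equals the squared Student statistic of the slope: F = t**2.
import numpy as np

def one_var_fit(y, x):
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = n - 2                               # n - (1 variable + constant)
    s2 = float(resid @ resid) / dof           # residual variance estimate
    cov = s2 * np.linalg.inv(X.T @ X)
    t = beta[1] / np.sqrt(cov[1, 1])          # Student statistic of the slope
    sst = float(((y - y.mean()) ** 2).sum())
    r2 = 1 - float(resid @ resid) / sst
    F = r2 / (1 - r2) * dof                   # global Fisher statistic
    return beta[1], t, F
```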


6 vars

Adjustments with 6 variables + constant   DDL(Student) = 93

Adjustment 1 (Full printout)

R**2 = 0.8009

Fisher = 62.3410

Probability = 0.0000

Test-Value = 11.408

Variable label Coefficient Student Probability Test-Value

Delivery Time 0,3061 8,10 0,000 7,03

Price Index 0,2446 5,95 0,000 5,47

Price Flexibility 0,2912 7,99 0,000 6,95

Perceived Quality 0,4324 7,39 0,000 6,54

Commercial Image -0,1978 2,35 0,021 2,31

Product Quality -0,0470 1,49 0,139 1,48

Adjustment 2 (Full printout)

R**2 = 0.7993

Fisher = 61.7159

Probability = 0.0000

Test-Value = 11.376

Variable label Coefficient Student Probability Test-Value

Delivery Time 0,0777 1,49 0,140 1,47

Price Flexibility 0,2846 7,84 0,000 6,85

Perceived Quality 0,4210 7,13 0,000 6,35

Service Quality 0,4536 5,87 0,000 5,40

Commercial Image -0,1926 2,28 0,025 2,24

Product Quality -0,0417 1,33 0,188 1,32

Adjustment 3 (Full printout)

R**2 = 0.7973

Fisher = 60.9833

Probability = 0.0000

Test-Value = 11.338

Variable label Coefficient Student Probability Test-Value

Price Index -0,0624 1,14 0,256 1,14

Price Flexibility 0,2891 7,83 0,000 6,84

Perceived Quality 0,4167 7,03 0,000 6,28

Service Quality 0,5884 7,93 0,000 6,91

Commercial Image -0,1894 2,23 0,028 2,20

Product Quality -0,0453 1,42 0,159 1,41

The three adjustments listed above have 6 exogenous variables.

For the first adjustment, we can see that the variable "Product Quality" has a coefficient not significantly different from zero at the usual threshold of 5%.

Finally, the best adjustment is obtained with 6 exogenous variables. This is confirmed by the graphs that follow.

But since one coefficient is not significantly different from zero, we may prefer the model with 5 variables:


Adjustments with 5 variables + constant   DDL(Student) = 94

Adjustment 1 (Full printout)

R**2 = 0.7961

Fisher = 73.4081

Probability = 0.0000

Test-Value = 11.506

Variable label Coefficient Student Probability Test-Value

Delivery Time 0,3247 9,05 0,000 7,65

Price Index 0,2291 5,73 0,000 5,29

Price Flexibility 0,2993 8,25 0,000 7,14

Perceived Quality 0,4303 7,31 0,000 6,49

Commercial Image -0,2100 2,49 0,015 2,44

The adjusted R² criterion

Curve of the adjusted R² according to the number of explanatory variables

The adjusted R² criterion is based on the standard R², but it imposes a penalty for each additional explanatory variable that is used to build the model. To increase this criterion, the contribution of a new variable must be sufficient (if the variable is redundant with the ones already included in the model, the criterion decreases).

The graph below shows that the best models are to be found among those with 5 or 6 explanatory variables.

[Graph: curve of the adjusted R² according to the number of model variables (1 to 8); vertical axis from 0.44 to 0.79]


The Mallows Cp criterion

Curve of Mallows Cp according to the number of explanatory variables

The lower this criterion, the better the adjustment. We get the same result as with the previous criteria: the best models have 5 or 6 variables.

[Graph: curve of Mallows Cp according to the number of model variables (1 to 8); vertical axis from 0.0 to 1.3]


Formulas of the criteria R², adjusted R² and Mallows Cp

1.  R²:
The coefficient of determination R² (which takes values in the range 0 to 1) is a measure of the proportion of the total variation that is associated with the regression:

    R² = 1 - SSE / SST

SSE: Error Sum of Squares
SST: Total Sum of Squares

2.  Adjusted R²:
The adjusted R² criterion is based on the standard R², but it imposes a penalty for each additional explanatory variable that is used to build the model:

    Adjusted R² = 1 - (n - 1)(1 - R²) / (n - p)

n: the number of observations,
p: the number of variables used for the model plus one.

3.  Mallows Cp - C(p):
The Mallows C(p) is positively related to the error (SSE) and to the number of explanatory variables in the model: a model with many variables or with a high error will be penalized by this criterion:

    C(p) = SSE / SST + 2p - n
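The three formulas can be computed for one fitted model as follows. This sketch follows the guide's C(p) formula as printed, which normalizes SSE by SST; note that other texts divide by the full model's residual variance instead:

```python
# R2, adjusted R2, and Mallows C(p) for one least-squares fit, following
# the formulas above (C(p) = SSE/SST + 2p - n, with p = variables + 1).
import numpy as np

def criteria(y, X):
    n, k = X.shape
    p = k + 1                                    # variables plus constant
    Xc = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    sse = float(((y - Xc @ beta) ** 2).sum())    # error sum of squares
    sst = float(((y - y.mean()) ** 2).sum())     # total sum of squares
    r2 = 1 - sse / sst
    r2_adj = 1 - (n - 1) * (1 - r2) / (n - p)
    cp = sse / sst + 2 * p - n
    return r2, r2_adj, cp
```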

References:

  Furnival, G.M. and Wilson, R.W. (1974), "Regression by Leaps and Bounds", Technometrics, 16, 499-511.


LOGISTIC REGRESSION 

Logistic regression is a model used to predict the probability of occurrence of an event by fitting the data to a logistic curve. It makes use of several predictor variables that may be either numerical or categorical.
Binomial (or binary) logistic regression is a form of regression used when the dependent variable Y is a dichotomy and the independents X1, X2, ..., Xp are of any type.

LOGIT INTRODUCTION

Logistic regression aims to explain the probability of a binary event. This probability cannot be explained by a traditional regression model using the least squares method.

We therefore perform a so-called LOGIT transformation, which fits into the generalized linear model framework and relies on maximum likelihood estimation.

If P is the probability that we are trying to explain, the ratio P/(1-P) is defined as the ODDS, and the quantity that is finally explained corresponds to the logarithm of these ODDS.

We want to explain P(Y = 1 | X1, X2).

Thus: P(Y = 1 | X1, X2) + P(Y = 2 | X1, X2) = 1

The logit of the probability P is the logarithm of the quotient P / (1 - P):

    Logit(P) = Log( P / (1 - P) )     (1)

[Graph: the logit Log(P / (1 - P)) as a function of P, for P between 0 and 1; it equals 0 at P = 1/2]
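Transformation (1) and its inverse are worth seeing numerically: the logit maps probabilities in (0, 1) to the whole real line, and the logistic (sigmoid) function maps back. A minimal illustration:

```python
# The logit of transformation (1) and its inverse, the logistic function.
import math

def logit(p):
    return math.log(p / (1 - p))        # log of the ODDS P / (1 - P)

def inv_logit(x):
    return 1 / (1 + math.exp(-x))       # logistic (sigmoid) function

# logit(1/2) = 0, and the two functions are inverses of each other
```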

8/18/2019 SPAD7 Data Miner Guide

http://slidepdf.com/reader/full/spad7-data-miner-guide 95/176

The Linear Model and its applications

95

LOGISTIC MODEL WITH BINARY EXPLANATORY VARIABLES 

The model can be written:

    Log( P / (1 - P) ) = β0 + β1 X1 + β2 X2     (2)

The logit of the probability is a linear function of the explanatory variables, but the probability itself is a non linear function. Indeed, according to (2):

    P = exp(β0 + β1 X1 + β2 X2) / ( 1 + exp(β0 + β1 X1 + β2 X2) )

The model (2) is an additive model for binary categorical exogenous variables (coded 0 or 1). The models with categorical exogenous variables with more than 2 categories and with crossed effects are presented further on.

LOGISTIC MODEL WITH CATEGORICAL EXOGENOUS VARIABLES WITH MORE THAN 2 CATEGORIES

A categorical variable with no hierarchy in its categories needs to be recoded into several binary (0/1) variables, well known under the name of design variables, before its introduction in the model.
We introduce as many design variables as there are categories.

But the following problem appears: the k design variables are not independent, because their sum equals 1 whatever the individual.

A simple solution is to eliminate one of the design variables. The category not introduced in the model has a zero coefficient by convention. We can consider that it represents the reference situation.

Mathematically, the choice of the reference category has no importance.

We can for example choose as reference the modal category (the category with the largest count).

Consider Y the dependent variable with 2 categories, 1 and 2. Consider Z a categorical variable with 4 categories corresponding to the race of the individual:
Z = 1: White
Z = 2: Black
Z = 3: Hispanic
Z = 4: Others

If we choose the White category as reference, the D matrix is the following.

The three columns of D (D2, D3, D4) correspond to the coding of Z into the design variables that will be introduced in the model.


Table 1 - D matrix construction

RACE (category)    D2   D3   D4
White (1)           0    0    0
Black (2)           1    0    0
Hispanic (3)        0    1    0
Others (4)          0    0    1

The logistic model is then written this way:

    Logit[ P(Y = 1 | Z = 1) ] = β0
    Logit[ P(Y = 1 | Z = 2) ] = β0 + βD2
    Logit[ P(Y = 1 | Z = 3) ] = β0 + βD3
    Logit[ P(Y = 1 | Z = 4) ] = β0 + βD4

Thus, the explanatory variable Z with k categories is transformed into (k-1) design variables, noted du. If the first category is the reference, the logit is written:

    Logit[ P(Y = 1 | Z) ] = β0 + Σ (u = 2 to k) βu du

For example, we obtain:

    Logit[ P(Y = 1 | Z) ] = β0 + β2 d2 + β3 d3 + β4 d4

with

    du = 1 if Z = u
    du = 0 otherwise
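The (k-1) design variables above are straightforward to build. A minimal sketch using the race example, with "White" as the reference category (its row is all zeros, so its effect is absorbed by the intercept β0):

```python
# Recoding one categorical value into k-1 binary design variables d_u,
# dropping the reference category as described above.
def design_variables(z, categories, reference):
    """Code one categorical value as k-1 binary indicators."""
    return [1 if z == c else 0 for c in categories if c != reference]

categories = ["White", "Black", "Hispanic", "Others"]
rows = {z: design_variables(z, categories, "White") for z in categories}
```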


LOGISTIC REGRESSION WITH SPAD

Iterations number:
Specifies the maximum number of iterations to perform. By default, Iterations number = 25. If convergence is not attained in n iterations, the displayed output created by the procedure contains results based on the last maximum likelihood iteration.

Alpha threshold for the tests (in %):
Sets the level of significance α for the (100 - α)% confidence intervals of the regression parameters or odds ratios. The value α must be between 0 and 100. By default, α equals 5%.
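The maximum likelihood fit behind these parameters uses Fisher's scoring (as the output below reports). A hedged sketch of that iteration, not SPAD's code, showing how the iteration limit and a convergence tolerance interact:

```python
# Sketch of logistic-regression fitting by Fisher scoring (IRLS): iterate
# until the coefficient step falls below `tol`, or stop at `max_iter` and
# keep the last iteration's estimates, as described above.
import numpy as np

def fit_logit(X, y, max_iter=25, tol=1e-7):
    """y in {0,1}; X must include a leading column of ones (intercept)."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1 / (1 + np.exp(-X @ beta))          # fitted probabilities
        W = p * (1 - p)                          # IRLS weights
        grad = X.T @ (y - p)                     # score vector
        info = X.T @ (X * W[:, None])            # Fisher information matrix
        step = np.linalg.solve(info, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:           # converged
            break
    return beta
```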


Parameterization method for categorical variables

Consider a model with one categorical variable A with four categories: 1, 2, 5, and 7.

Comparison to mean
Three columns are created to indicate group membership of the nonreference categories. For the reference category, all three design variables have a value of -1. For instance, if the reference category is 7 (REF='7'), the design matrix columns for A are as follows.

Comparison to Mean Coding

    A    A1   A2   A5
    1     1    0    0
    2     0    1    0
    5     0    0    1
    7    -1   -1   -1

Parameter estimates of the main effects of a categorical variable using the "Comparison to mean" coding scheme estimate the difference between the effect of each nonreference category and the average effect over all 4 categories.

GLM
Four columns are created to indicate group membership. The design matrix columns for A are as follows.

GLM Coding

    A    A1   A2   A5   A7
    1     1    0    0    0
    2     0    1    0    0
    5     0    0    1    0
    7     0    0    0    1

As in ANOVA, the coefficient of the last category is fixed to 0. Parameter estimates of the main effects of a categorical variable using the GLM coding scheme estimate the difference between the effects of each category and the last category.

Comparison to a reference
Three columns are created to indicate group membership of the nonreference categories. For the reference category, all three design variables have a value of 0. For instance, if the reference category is 7 (REF='7'), the design matrix columns for A are as follows.

Comparison to a Reference Coding

    A    A1   A2   A5
    1     1    0    0
    2     0    1    0
    5     0    0    1
    7     0    0    0

Parameter estimates of the main effects of a categorical variable using the "Comparison to a reference" coding scheme estimate the difference between the effect of each nonreference category and the effect of the reference category.

Variable selections:

The selection options are available only if the model contains simple factors (no interaction).

No selection
The model is estimated with all the input variables; this is the default option.

Forward
The procedure first estimates parameters for factors forced into the model. These factors are the intercept and the first n explanatory factors in the model statement, where n is the number specified by "Number of variables in initial model" (n is zero by default). Next, the procedure computes the score chi-square statistic for each factor not in the model and examines the largest of these statistics. If it is significant at the "Threshold (%) for the variable's entry in model" level, the corresponding factor is added to the model. Once a factor is entered in the model, it is never removed. The process is repeated until none of the remaining effects meet the specified level for entry or until the "Number of variables in final model" value is reached.

Backward
Parameters for the complete model as specified in the model statement are estimated unless the "Number of variables in initial model" option is specified. In that case, only the parameters for the intercept and the first n explanatory factors in the model statement are estimated, where n is the "Number of variables in initial model". The results of the Wald test for the individual parameters are examined. The least significant factor that does not meet the "Threshold (%) for a variable to stay in the model" is removed. Once a factor is removed from the model, it remains excluded. The process is repeated until no other factor in the model meets the specified level for removal or until the "Number of variables in final model" value is reached. Backward selection is often less successful than forward or stepwise selection because the full model fitted in the first step is the model most likely to result in a complete or quasi-complete separation of the response values.

Stepwise
This option is similar to the Forward option except that factors already in the model do not necessarily remain. Factors are entered into and removed from the model in such a way that each forward selection step may be followed by one or more backward elimination steps. The stepwise selection process terminates if no further factor can be added to the model, or if the factor just entered into the model is the only factor removed in the subsequent backward elimination.
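The control flow of the Forward scheme (enter the best candidate if it passes a threshold, never remove it) can be sketched compactly. This illustration runs on a linear model and uses a minimum partial F statistic as a stand-in for SPAD's score chi-square p-value threshold; it is not SPAD's implementation:

```python
# Illustrative forward selection: at each step the candidate variable with
# the largest partial F enters the model if F exceeds the entry threshold;
# entered variables are never removed, as in the Forward option above.
import numpy as np

def sse(y, cols):
    X = np.column_stack([np.ones(len(y))] + cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(((y - X @ beta) ** 2).sum())

def forward_select(y, X, f_entry=4.0):
    n, p = X.shape
    selected, remaining = [], list(range(p))
    current_sse = sse(y, [])
    while remaining:
        trials = []
        for j in remaining:
            new_sse = sse(y, [X[:, k] for k in selected + [j]])
            df_resid = n - (len(selected) + 2)   # n - vars - constant
            F = (current_sse - new_sse) / (new_sse / df_resid)
            trials.append((F, j, new_sse))
        F, j, new_sse = max(trials)
        if F < f_entry:                # no candidate meets the entry level
            break
        selected.append(j)             # entered once, never removed
        remaining.remove(j)
        current_sse = new_sse
    return selected
```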

EXAMPLE BASED ON THE CREDIT.SBA DATASET 

Response variable:
 1. Type of client           2 CATEGORIES

Categorical explanatory variables:
 2. Age of client            4 CATEGORIES
 3. Family Situation         4 CATEGORIES
 4. Seniority                5 CATEGORIES
 5. Salary domiciliation     2 CATEGORIES
 6. Size of savings          4 CATEGORIES
 7. Profession               3 CATEGORIES
 8. Average outstanding      3 CATEGORIES
 9. Average transactions     4 CATEGORIES
10. Number of withdrawals    3 CATEGORIES
11. Overdraft                2 CATEGORIES
12. Checkbook                2 CATEGORIES


LOGISTIC REGRESSION MODEL PRESENTATION

MODEL DEFINITION
================
RESPONSE VARIABLE .............: Type of client
NUMBER OF RESPONSE LEVELS .....: 2
NUMBER OF OBSERVATIONS ........: 468
LINK FUNCTION .................: BINARY LOGIT
OPTIMIZATION TECHNIQUE ........: FISHER'S SCORING

RESPONSE PROFILE
================
VARIABLE RESPONSE : Type of client
==========================
 ORDER   RESPONSE  FREQUENCY
--------------------------
   1     Good        237
   2     Bad         231
==========================
PROBABILITY MODELED IS: Type of client = Good

DESCRIPTIVE STATISTICS FOR EXPLANATORY VARIABLES
===============================================
FREQUENCY DISTRIBUTION OF CATEGORICAL VARIABLES
========================================================================
                                                Type of client
                                                --------------
VARIABLE               VALUE                    Good   Bad   TOTAL
------------------------------------------------------------------------
Seniority              1 year or less             66   133     199
                       From 1 to 4 years          19    28      47
                       From 4 to 6 years          42    27      69
                       From 6 to 12 years         44    22      66
                       Over 12 years              66    21      87
------------------------------------------------------------------------
Salary domiciliation   Sal. domiciliated         204   112     316
                       Sal. not domicil.          33   119     152
------------------------------------------------------------------------
Size of savings        No saving                 169   201     370
                       Less than 10 KF            34    24      58
                       From 10 to 100 KF          26     6      32
                       More than 100 KF            8     0       8
  THIS VARIABLE IS PARTIALLY NESTED IN THE RESPONSE VARIABLE!
------------------------------------------------------------------------
Profession             executive                  51    26      77
                       employee                  127   110     237
                       other                      59    95     154
------------------------------------------------------------------------
Age of client          Less than 23 years         31    57      88
                       From 23 to 40 years        71    79     150
                       From 40 to 50 years        68    54     122
                       Over 50 years              67    41     108
------------------------------------------------------------------------
Family Situation       Single                     80    90     170
                       Married                   129    92     221
                       Divorced                   24    37      61
                       Widow                       4    12      16
------------------------------------------------------------------------
Average outstanding    Less than 2 KF             19    79      98
                       From 2 to 5 KF            168   140     308
                       More than 5 KF             50    12      62
------------------------------------------------------------------------
Average transactions   Less than 10 KF            44   110     154
                       From 10 to 30 KF           32    39      71
                       From 30 to 50 KF           82    47     129
                       More than 50 KF            79    35     114
------------------------------------------------------------------------
Number of withdrawals  Less than 40              113    58     171
                       From 40 to 100             87    74     161
                       More than 100              37    99     136
------------------------------------------------------------------------
Overdraft              Authorized                 83   119     202
                       Forbidden                 154   112     266
------------------------------------------------------------------------
Checkbook              Authorized                231   184     415
                       Forbidden                   6    47      53
========================================================================
NB: TO ALLOW CALCULATIONS, ONE CASE WITH THE OPPOSITE RESPONSE WAS
AFFECTED TO EACH LEVEL BECAUSE OF THE PARTIAL NESTING!


RESULTS ABOUT THE MODEL

FITTING OF MODEL
CONVERGENCE CRITERION (.1E-07) SATISFIED
================================================
                     INTERCEPT   INTERCEPT AND
                     ONLY        COVARIATES
================================================
AKAIKE CRITERION     650.752     460.104
SCHWARZ CRITERION    654.900     567.964
-2 LOG (L)           648.752     408.104
================================================

TESTING GLOBAL NULL HYPOTHESIS: BETA = 0
======================================================
                   CHI-SQUARE   DF   PROB > KHI2
------------------------------------------------------
LIKELIHOOD RATIO   240.6475     25   < 0.0001
WALD               119.1086     25   < 0.0001
======================================================

TYPE III ANALYSIS OF EFFECTS
=======================================================
EFFECT                  DF   WALD CHI-SQU   PROB > CHISQ
-------------------------------------------------------
Seniority                4      23.2572        0.0001
Salary domiciliation     1      25.9650      < 0.0001
Size of savings          3       0.6047        0.8953
Profession               2       2.3555        0.3080
Age of client            3       8.0984        0.0440
Family Situation         3      12.6296        0.0055
Average outstanding      2       6.4046        0.0407
Average transactions     3       8.0692        0.0446
Number of withdrawals    2      21.1787      < 0.0001
Overdraft                1       0.2441        0.6213
Checkbook                1      15.6171      < 0.0001
=======================================================

ANALYSIS OF MAXIMUM LIKELIHOOD ESTIMATES
==================================================================================================
PARAMETER                 DF   ESTIMATE   STAND. ERROR   WALD CHI-SQU.   PROB > CHI2   EXP(ESTIM.)
--------------------------------------------------------------------------------------------------
Intercept                  1   -1.3248       0.5152          6.6123         0.0101        0.2659
Seniority             1    1   -1.0047       0.2304         19.0143       < 0.0001        0.3662
                      2    1   -0.1850       0.3369          0.3016         0.5829        0.8311
                      3    1    0.7539       0.3165          5.6730         0.0172        2.1252
                      4    1    0.0304       0.3123          0.0094         0.9226        1.0308
Salary domiciliation  1    1    0.7396       0.1451         25.9650       < 0.0001        2.0950
Size of savings       1    1    0.0430       0.5466          0.0062         0.9374        1.0439
                      2    1    0.2895       0.4440          0.4250         0.5145        1.3357
                      3    1    0.0220       0.5631          0.0015         0.9688        1.0223
Profession            1    1    0.3516       0.2681          1.7197         0.1897        1.4213
                      2    1   -0.0442       0.1853          0.0570         0.8113        0.9567
Age of client         1    1   -0.7262       0.2822          6.6230         0.0101        0.4838
                      2    1   -0.0130       0.2101          0.0039         0.9505        0.9870
                      3    1    0.4832       0.2242          4.6423         0.0312        1.6212
Family Situation      1    1    0.9222       0.2983          9.5593         0.0020        2.5147
                      2    1    0.2492       0.2639          0.8918         0.3450        1.2830
                      3    1   -0.6348       0.3555          3.1889         0.0741        0.5300
Average outstanding   1    1   -0.8553       0.3446          6.1612         0.0131        0.4252
                      2    1    0.0486       0.2946          0.0272         0.8690        1.0498
Average transactions  1    1   -0.5518       0.2245          6.0422         0.0140        0.5759
                      2    1   -0.1342       0.2564          0.2741         0.6006        0.8744
                      3    1    0.1469       0.2183          0.4527         0.5010        1.1582
Number of withdrawals 1    1    0.9794       0.2213         19.5817       < 0.0001        2.6629
                      2    1    0.0606       0.1804          0.1127         0.7371        1.0624
Overdraft             1    1   -0.0660       0.1336          0.2441         0.6213        0.9361
Checkbook             1    1    1.0448       0.2644         15.6171       < 0.0001        2.8427
==================================================================================================


ODDS RATIO ESTIMATES
=========================================================================
EFFECT                           ESTIMATE      CONFIDENCE LIMITS *
-------------------------------------------------------------------------
Seniority               1 VS 5      0.244       0.109       0.548
                        2 VS 5      0.554       0.200       1.538
                        3 VS 5      1.417       0.535       3.755
                        4 VS 5      0.687       0.263       1.798
Salary domiciliation    1 VS 2      4.389       2.485       7.752
Size of savings         1 VS 4      1.488       0.101      22.004
                        2 VS 4      1.904       0.150      24.208
                        3 VS 4      1.457       0.126      16.898
Profession              1 VS 3      1.933       0.816       4.577
                        2 VS 3      1.301       0.745       2.271
Age of client           1 VS 4      0.374       0.146       0.962
                        2 VS 4      0.764       0.350       1.668
                        3 VS 4      1.255       0.585       2.690
Family Situation        1 VS 4      4.300       0.851      21.734
                        2 VS 4      2.194       0.455      10.579
                        3 VS 4      0.906       0.166       4.960
Average outstanding     1 VS 3      0.190       0.041       0.882
                        2 VS 3      0.469       0.114       1.922
Average transactions    1 VS 4      0.336       0.154       0.732
                        2 VS 4      0.510       0.219       1.188
                        3 VS 4      0.676       0.325       1.404
Number of withdrawals   1 VS 3      7.534       3.164      17.939
                        2 VS 3      3.006       1.419       6.366
Overdraft               1 VS 2      0.876       0.519       1.479
Checkbook               1 VS 2      8.081       2.867      22.779
=========================================================================
* 95% WALD CONFIDENCE LIMITS
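These contrast estimates can be tied back to the coefficients printed earlier: for the two-category Checkbook effect, the "1 VS 2" odds ratio matches exp(2β) rather than exp(β), which is what deviation (effect) coding of the categories would produce. The coding convention is our assumption, since the guide does not state it here. A sketch of the computation, including the 95% Wald limits:

```python
import math

# Checkbook, category 1: coefficient and standard error from the estimates table
beta, se = 1.0448, 0.2644
z = 1.959964  # 97.5% standard-normal quantile, for 95% Wald limits

# Under deviation coding, the category-1-vs-category-2 log-odds contrast is 2*beta
or_est = math.exp(2 * beta)
ci_low = math.exp(2 * (beta - z * se))
ci_high = math.exp(2 * (beta + z * se))

print(round(or_est, 2), round(ci_low, 2), round(ci_high, 1))
# close to the printed 8.081 / 2.867 / 22.779
```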

CONFUSION MATRIX

FREQUENCIES
---------------------------------------------
              | ESTIM    Good     Bad |  TOTAL
--------------+-----------------------+------
OBSERV  Good  |           191      45 |    236
        Bad   |            38     194 |    232
--------------+-----------------------+------
TOTAL         |           229     239 |    468

ROW PERCENTAGES
---------------------------------------------
              | ESTIM    Good     Bad |   TOTAL
--------------+-----------------------+--------
OBSERV  Good  |        80.932  19.068 | 100.000
        Bad   |        16.379  83.621 | 100.000
--------------+-----------------------+--------
TOTAL         |        48.932  51.068 | 100.000

COLUMN PERCENTAGES
---------------------------------------------
              | ESTIM    Good     Bad |   TOTAL
--------------+-----------------------+--------
OBSERV  Good  |        83.406  18.828 |  50.427
        Bad   |        16.594  81.172 |  49.573
--------------+-----------------------+--------
TOTAL         |       100.000 100.000 | 100.000

CLASSIFICATION
---------------------------------------------
              | CLASS.   WELL     BAD |   TOTAL
--------------+-----------------------+--------
OBSERV  Good  |        80.932  19.068 | 100.000
        Bad   |        83.621  16.379 | 100.000
--------------+-----------------------+--------
TOTAL         |        82.265  17.735 | 100.000
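The percentage tables are simple transformations of the frequency table; a quick recomputation of two of the printed figures:

```python
# Confusion matrix counts from the frequency table (rows: observed, cols: estimated)
counts = {("Good", "Good"): 191, ("Good", "Bad"): 45,
          ("Bad", "Good"): 38, ("Bad", "Bad"): 194}

n = sum(counts.values())                       # 468 cases in total
well_classified = counts[("Good", "Good")] + counts[("Bad", "Bad")]

overall_rate = 100 * well_classified / n
row_rate_good = 100 * counts[("Good", "Good")] / (counts[("Good", "Good")] + counts[("Good", "Bad")])

print(round(overall_rate, 3))   # 82.265 -- the TOTAL of the classification table
print(round(row_rate_good, 3))  # 80.932 -- the "Good" row percentage
```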


THE DISCRIMINANT AND ITS METHODS 

FUWILD  -  OPTIMAL DISCRIMINANT ANALYSIS 

Purpose

This method is the branch and bound algorithm of Furnival and Wilson (1974).

The FUWILD procedure selects the N "best" adjustments for the linear discriminant analysis. The selection criterion can be the R², the adjusted R² or the Mallows Cp.

If N is the number of best adjustments required and P is the number of explanatory variables of the model, the procedure calculates the N best adjustments for all model sizes from 1 to P-1 variables (the adjustment with all P variables is unique).

The procedure supplies the value of the criterion (R², adjusted R² or Cp), the Fisher's F associated with the R², the critical probability associated with this F, and the corresponding test-value.

The list of the variables of the model is then presented with the estimated coefficients, the nullity tests, the critical probabilities and the associated test-values. Finally, a diagram representing the evolution of the criterion as a function of the number of variables in the models supplies a quick summary of the selections.
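As an illustration of what this selection does (a naive exhaustive sketch on synthetic data, not the Furnival-Wilson branch and bound itself), one can rank, for each model size, all subsets of explanatory variables by the R² of the corresponding least-squares fit:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 4
X = rng.normal(size=(n, p))
y = 1.5 * X[:, 0] - 0.8 * X[:, 2] + rng.normal(size=n)  # only variables 0 and 2 matter

def r_squared(cols):
    """R² of the least-squares fit of y on the given columns plus a constant."""
    A = np.column_stack([np.ones(n), X[:, list(cols)]])
    resid = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
    return 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

n_best = 3  # the "N best adjustments" of each size
for size in range(1, p + 1):
    ranked = sorted(itertools.combinations(range(p), size),
                    key=r_squared, reverse=True)[:n_best]
    print(size, [(s, round(r_squared(s), 3)) for s in ranked])
```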

Dataset

The dataset is extracted from a survey where 100 respondents judge their suppliers. The criteria are:

- Delivery time
- Prices level
- Prices flexibility
- Image
- Services
- Commercial image
- Product quality

About the suppliers, we know the size of the company in two classes: fewer than 50 employees, or 50 and more.

The goal of the analysis is to study the differences between these two classes.


ID  Delivery Time  Prices Level  Prices Flexibility  Image  Services  Commercial Image  Product Quality  Supplier's Company Size

1 4,1 0,6 6,9 4,7 2,4 2,3 5,2 < 50 employees

2 1,8 3 6,3 6,6 2,5 4 8,4 >= 50 employees

3 3,4 5,2 5,7 6 4,3 2,7 8,2 >= 50 employees

4 2,7 1 7,1 5,9 1,8 2,3 7,8 >= 50 employees

5 6 0,9 9,6 7,8 3,4 4,6 4,5 < 50 employees
6 1,9 3,3 7,9 4,8 2,6 1,9 9,7 >= 50 employees

7 4,6 2,4 9,5 6,6 3,5 4,5 7,6 < 50 employees

8 1,3 4,2 6,2 5,1 2,8 2,2 6,9 >= 50 employees

9 5,5 1,6 9,4 4,7 3,5 3 7,6 < 50 employees

10 4 3,5 6,5 6 3,7 3,2 8,7 >= 50 employees

11 2,4 1,6 8,8 4,8 2 2,8 5,8 < 50 employees

12 3,9 2,2 9,1 4,6 3 2,5 8,3 < 50 employees

13 2,8 1,4 8,1 3,8 2,1 1,4 6,6 >= 50 employees

14 3,7 1,5 8,6 5,7 2,7 3,7 6,7 < 50 employees

15 4,7 1,3 9,9 6,7 3 2,6 6,8 < 50 employees

16 3,4 2 9,7 4,7 2,7 1,7 4,8 < 50 employees

17 3,2 4,1 5,7 5,1 3,6 2,9 6,2 < 50 employees
18 4,9 1,8 7,7 4,3 3,4 1,5 5,9 < 50 employees

19 5,3 1,4 9,7 6,1 3,3 3,9 6,8 < 50 employees

20 4,7 1,3 9,9 6,7 3 2,6 6,8 < 50 employees

21 3,3 0,9 8,6 4 2,1 1,8 6,3 < 50 employees

22 3,4 0,4 8,3 2,5 1,2 1,7 5,2 < 50 employees

23 3 4 9,1 7,1 3,5 3,4 8,4 < 50 employees

24 2,4 1,5 6,7 4,8 1,9 2,5 7,2 >= 50 employees

25 5,1 1,4 8,7 4,8 3,3 2,6 3,8 < 50 employees

26 4,6 2,1 7,9 5,8 3,4 2,8 4,7 < 50 employees

27 2,4 1,5 6,6 4,8 1,9 2,5 7,2 >= 50 employees

28 5,2 1,3 9,7 6,1 3,2 3,9 6,7 < 50 employees

29 3,5 2,8 9,9 3,5 3,1 1,7 5,4 < 50 employees
30 4,1 3,7 5,9 5,5 3,9 3 8,4 >= 50 employees

31 3 3,2 6 5,3 3,1 3 8 >= 50 employees

32 2,8 3,8 8,9 6,9 3,3 3,2 8,2 < 50 employees

33 5,2 2 9,3 5,9 3,7 2,4 4,6 < 50 employees

34 3,4 3,7 6,4 5,7 3,5 3,4 8,4 >= 50 employees

35 2,4 1 7,7 3,4 1,7 1,1 6,2 >= 50 employees

36 1,8 3,3 7,5 4,5 2,5 2,4 7,6 >= 50 employees

37 3,6 4 5,8 5,8 3,7 2,5 9,3 >= 50 employees

38 4 0,9 9,1 5,4 2,4 2,6 7,3 < 50 employees

39 0 2,1 6,9 5,4 1,1 2,6 8,9 >= 50 employees

40 2,4 2 6,4 4,5 2,1 2,2 8,8 >= 50 employees

41 1,9 3,4 7,6 4,6 2,6 2,5 7,7 >= 50 employees
42 5,9 0,9 9,6 7,8 3,4 4,6 4,5 < 50 employees

43 4,9 2,3 9,3 4,5 3,6 1,3 6,2 < 50 employees

44 5 1,3 8,6 4,7 3,1 2,5 3,7 < 50 employees

45 2 2,6 6,5 3,7 2,4 1,7 8,5 >= 50 employees

46 5 2,5 9,4 4,6 3,7 1,4 6,3 < 50 employees

47 3,1 1,9 10 4,5 2,6 3,2 3,8 < 50 employees

48 3,4 3,9 5,6 5,6 3,6 2,3 9,1 >= 50 employees

49 5,8 0,2 8,8 4,5 3 2,4 6,7 < 50 employees

50 5,4 2,1 8 3 3,8 1,4 5,2 < 50 employees


Fuwil – 4

The Fuwil – 4 Excel sheet gives the main statistics of each class with respect to the explanatory variables.

The "Within-group mean" column displays the mean of each explanatory variable for groups 1 and 2 respectively. By default, group 1 is the first category (in the list) of the endogenous variable. In this example, group 1 concerns the small suppliers (< 50 employees) and group 2 the bigger suppliers (50 or more employees). The "General mean" column displays the mean of each variable observed on the whole sample.

Missing data handling for exogenous variables

Missing values are replaced by the within-group means.
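This replacement rule can be sketched as follows (a generic illustration with made-up values, not SPAD's internal code):

```python
from statistics import mean

# toy data: (group, value) pairs, None marking a missing exogenous value
rows = [(1, 4.0), (1, 4.4), (1, None), (2, 2.4), (2, None), (2, 2.6)]

# within-group means computed on the observed values only
groups = {g for g, _ in rows}
group_mean = {g: mean(v for gg, v in rows if gg == g and v is not None) for g in groups}

# each missing value is replaced by the mean of its own group
imputed = [(g, v if v is not None else group_mean[g]) for g, v in rows]
print(imputed)  # missing values become 4.2 (group 1) and 2.5 (group 2)
```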

Group  Variable label      Within-group mean  General mean  Number of missing values
1      Delivery Time                   4,192         3,515                         0
1      Prices Level                    1,948         2,364                         0
1      Prices Flexibility              8,622         7,894                         0
1      Image                           5,213         5,248                         0
1      Services                        3,050         2,916                         0
1      Commercial Image                2,692         2,665                         0
1      Product Quality                 6,090         6,971                         0
2      Delivery Time                   2,500         3,515                         0
2      Prices Level                    2,988         2,364                         0
2      Prices Flexibility              6,803         7,894                         0
2      Image                           5,300         5,248                         0
2      Services                        2,715         2,916                         0
2      Commercial Image                2,625         2,665                         0
2      Product Quality                 8,293         6,971                         0

This table is useful to detect the variables with the largest mean differences between a class and the overall sample. For example, class number 2 (suppliers with 50 or more employees) obtains an average quality score of 8.293, while class number 1 obtains a score of 6.090. The Image variable does not differentiate the small suppliers from the bigger ones.

With the DEMOD procedure (Descriptive statistics), we would get these results:


The R² criterion

Curve of R² according to the number of explanatory variables

This graph displays the evolution of the R² criterion according to the number of explanatory variables included in the model. The higher the R², the better the adjustment.

The R² automatically increases with the number of explanatory variables. Therefore, it is recommended to find a compromise between the best R² and the smallest model in terms of explanatory variables. Other criteria are available in the parameters tab, such as the adjusted R² and the Mallows Cp.

The graph below shows that the R² increases until the entry of the 4th explanatory variable; adding further variables does not increase the R² or the quality of the adjustment: these variables are redundant. The R² can be interpreted as the part of the variance explained by the linear discriminant function. It ranges from 0 to 1.

[Graph: curve of R² according to the number of model variables (1 to 7); the R² values shown range from 0.43 to 0.67.]

The Excel sheets "1 var" to "7 vars" display the 3 best adjustments in terms of R² for models with 1 to 7 explanatory variables.


1 var

This table lists the 3 best adjustments (R²) with one single explanatory variable.

Adjustments with 1 variable + constant          DF (Student) = 98

Adjustment 1 (Full printout)
R**2 = 0.4680   Fisher = 86.2000   Probability = 0.0000   Test-Value = 7.845

Variable label        Coefficient   Student   Probability   Test-Value
Product Quality           -0,4337      9,28         0,000         7,85

Adjustment 2 (Full printout)
R**2 = 0.4173   Fisher = 70.1912   Probability = 0.0000   Test-Value = 7.258

Variable label        Coefficient   Student   Probability   Test-Value
Prices Flexibility         0,4683      8,38         0,000         7,26

Adjustment 3 (Full printout)
R**2 = 0.3977   Fisher = 64.7156   Probability = 0.0000   Test-Value = 7.032

Variable label        Coefficient   Student   Probability   Test-Value
Delivery Time              0,4799      8,04         0,000         7,03

The number of degrees of freedom is 98.

The first adjustment is the best one, with an R² of 0.468; this means that the between-group variance (between the two classes) represents 46.8% of the overall variance. A model that is unable to differentiate the two classes would get an R² of 0.

The Fisher statistic corresponds to the global model validation. The higher the between-group variance, the higher the Fisher statistic. This criterion follows a Fisher distribution with 1 and 98 degrees of freedom. The 86.2 Fisher statistic corresponds to a probability lower than 1/10000 (0.0000).

The model is acceptable. This probability is converted into a test-value, here 7.85.
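The relation between the R² and Fisher's F can be checked directly: with p explanatory variables and n observations, F = (R² / p) / ((1 - R²) / (n - p - 1)). Applying it to the first adjustment (n = 100 is the sample size given in the dataset description):

```python
# First one-variable adjustment: R² = 0.4680, n = 100 respondents, p = 1 variable
r2, n, p = 0.4680, 100, 1

f = (r2 / p) / ((1 - r2) / (n - p - 1))
print(round(f, 1))  # ~86.2, matching the printed Fisher statistic
```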

The "Coefficient" column contains the estimation of the "Product Quality" coefficient: the discriminant function D is written D = constant - 0.4337 x Product Quality.

The Student column tests the nullity of the "Product Quality" coefficient: this statistic follows a Student distribution with 98 degrees of freedom; the 9.28 value corresponds to a probability lower than 1/10000 (0.0000). The coefficient is significantly different from 0.

The probability is also converted into a test-value; here we obtain 7.85. As the model contains a single explanatory variable, the test-values of the coefficient and of the overall adjustment quality are equal.


6 vars

The following adjustments each contain 6 explanatory variables.

Adjustments with 6 variables + constant          DF (Student) = 93

Adjustment 1 (Full printout)
R**2 = 0.6718   Fisher = 31.7290   Probability = 0.0000   Test-Value = 9.210

Variable label        Coefficient   Student   Probability   Test-Value
Delivery Time              0,3005      1,12         0,264         1,12
Prices Level               0,1242      0,45         0,656         0,44
Prices Flexibility         0,2418      4,40         0,000         4,18
Services                  -0,2308      0,45         0,657         0,44
Commercial Image           0,1516      1,85         0,067         1,83
Product Quality           -0,2812      5,90         0,000         5,42

Adjustment 2 (Full printout)
R**2 = 0.6716   Fisher = 31.6987   Probability = 0.0000   Test-Value = 9.207

Variable label        Coefficient   Student   Probability   Test-Value
Delivery Time              0,1863      3,27         0,002         3,17
Prices Level               0,0070      0,11         0,910         0,11
Prices Flexibility         0,2383      4,33         0,000         4,12
Image                     -0,0328      0,37         0,711         0,37
Commercial Image           0,1833      1,44         0,152         1,43
Product Quality           -0,2790      5,87         0,000         5,40

Adjustment 3 (Full printout)
R**2 = 0.6716   Fisher = 31.6925   Probability = 0.0000   Test-Value = 9.206

Variable label        Coefficient   Student   Probability   Test-Value
Delivery Time              0,1844      2,35         0,021         2,31
Prices Flexibility         0,2368      4,34         0,000         4,13
Image                     -0,0317      0,36         0,722         0,36
Services                   0,0029      0,02         0,980         0,02
Commercial Image           0,1831      1,44         0,153         1,43
Product Quality           -0,2779      5,88         0,000         5,41

For the first adjustment, the variables "Prices Flexibility" and "Product Quality" are the only ones significant at the 5% level (the probability that the related coefficient is null is lower than 5%).


3 vars

Finally, we should search for the best adjustments among the models with 3 or 4 explanatory variables, where all the coefficients are significant and the model test-values are the highest.

Adjustments with 3 variables + constant          DF (Student) = 96

Adjustment 1 (Full printout)
R**2 = 0.6591   Fisher = 61.8789   Probability = 0.0000   Test-Value = 9.660

Variable label        Coefficient   Student   Probability   Test-Value
Delivery Time              0,2031      3,64         0,000         3,51
Prices Flexibility         0,2370      4,55         0,000         4,32
Product Quality           -0,2592      5,79         0,000         5,35

Adjustment 2 (Full printout)
R**2 = 0.6392   Fisher = 56.6932   Probability = 0.0000   Test-Value = 9.378

Variable label        Coefficient   Student   Probability   Test-Value
Prices Flexibility         0,3016      6,06         0,000         5,56
Services                   0,2206      2,68         0,009         2,63
Product Quality           -0,3097      7,12         0,000         6,36

Adjustment 3 (Full printout)
R**2 = 0.6338   Fisher = 55.3919   Probability = 0.0000   Test-Value = 9.303

Variable label        Coefficient   Student   Probability   Test-Value
Prices Flexibility         0,3018      6,02         0,000         5,53
Commercial Image           0,1953      2,38         0,019         2,34
Product Quality           -0,3323      7,46         0,000         6,61


The adjusted R² criterion

Curve of adjusted R² according to the number of explanatory variables

The adjusted R² criterion is based on the standard R², but it imposes a penalty for each additional explanatory variable used to build the model. For this criterion to increase, the contribution of a new variable needs to be sufficient (if the variable is redundant with the ones already included in the model, the criterion decreases).

The graph below shows that the best models are to be found among those with 3 or 4 explanatory variables.

[Graph: curve of adjusted R² according to the number of model variables (1 to 7); the values shown range from 0.42 to 0.66.]

4 vars

The first adjustment with 4 explanatory variables is the following:

Adjustments with 4 variables + constant          DF (Student) = 95

Adjustment 1 (Full printout)
Adjusted R² = 0.6574   Fisher = 48.4911   Probability = 0.0000   Test-Value = 9.612

Variable label        Coefficient   Student   Probability   Test-Value
Delivery Time              0,1840      3,28         0,001         3,18
Prices Flexibility         0,2390      4,64         0,000         4,40
Commercial Image           0,1476      1,86         0,066         1,84
Product Quality           -0,2788      6,13         0,000         5,61

The adjusted R² is about 0.6574, very close to the standard R² of 0.6711. The explanatory variables are meaningful, thus the penalty related to the adjusted R² is very small.


The Mallows Cp criterion

Curve of Mallows Cp according to the number of explanatory variables

The lower this criterion, the better the adjustment. We get the same results as with the previous criteria: the best models have 3 or 4 variables.

[Graph: curve of Mallows Cp according to the number of model variables (1 to 7).]

4 vars

Adjustments with 4 variables + constant          DF (Student) = 95

Adjustment 1 (Full printout)
C(p) = 2.2916   Fisher = 48.4607   Probability = 0.0000   Test-Value = 9.610

Variable label        Coefficient   Student   Probability   Test-Value
Delivery Time              0,1840      3,28         0,001         3,18
Prices Flexibility         0,2390      4,64         0,000         4,40
Commercial Image           0,1476      1,86         0,066         1,84
Product Quality           -0,2788      6,13         0,000         5,61


Formulas of the criteria R², adjusted R² and Mallows Cp

1. R²: the coefficient of determination R² (which takes values in the range 0 to 1) is a measure of the proportion of the total variation that is associated with the regression process:

    R² = 1 - SSE / SST

where SSE is the Error Sum of Squares and SST the Total Sum of Squares.

2. Adjusted R²: the adjusted R² criterion is based on the standard R², but it imposes a penalty for each additional explanatory variable that is used to build the model:

    adjusted R² = 1 - (n - 1)(1 - R²) / (n - p)

where n is the number of observations and p the number of variables used for the model plus one.

3. Mallows Cp: the Mallows C(p) is positively related to the error (SSE) and to the number of explanatory variables in the model: a model with many variables or with a high error will be penalized by this criterion:

    C(p) = SSE / SST + 2p - n
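The adjusted R² formula can be cross-checked against the four-variable adjustment reported earlier (n = 100 observations, p = 4 variables plus one, standard R² of about 0.6711):

```python
# Four-variable adjustment: standard R² ~ 0.6711 (reported), n = 100, p = 4 + 1
r2, n, p = 0.6711, 100, 5

r2_adj = 1 - (n - 1) * (1 - r2) / (n - p)
print(round(r2_adj, 4))  # ~0.6573, close to the printed 0.6574
```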

References:

Furnival, G.M. and Wilson, R.W. (1974), "Regression by Leaps and Bounds", Technometrics, 16, 499-511.


DIS2GD  -  LINEAR DISCRIMINANT ANALYSIS BASED ON CONTINUOUS VARIABLES

This procedure executes a linear discriminant analysis with two groups on continuous variables, using Fisher's classical method. The procedure provides bootstrap estimates of the bias and the precision of the principal results of the discrimination: coefficients, case classification probabilities, and global classification percentages. It allows the modification of the costs and a priori probabilities of classification in the groups. It manages base, test and anonymous cases.

The procedure first outputs the descriptive statistics on the variables of the model in each of the two groups. The discriminant analysis results follow: classification tables, discriminant function, results of the equivalent regression, and output of the assignment of cases.

If a bootstrap validation is required, the results of the discrimination are output again with the bootstrap estimates. In particular, the bias and the precision of the global classifications are shown facing the direct classifications. For anonymous cases, the procedure calculates the bootstrap probability of their assignment.

If an evaluation of the test cases is required, the procedure will output the results of the discrimination for these cases. If the assignment of anonymous cases is requested, only the display of the assignments is shown.

The procedure can archive the rules for the discriminant function so that they can beapplied later on another file with the same structure.


Dis2g – 3

The following table describes the differences observed between the two classes with respect to the input explanatory variables.

Linear discriminant analysis on the BASE sample
Description of the samples

Variable label          G1: < 50 employees [60]   G2: >= 50 employees [40]   Student's T   Probability
Delivery Time                                                                      8.045         0.000
  Mean                          4.192                     2.500
  Std deviation                 1.029                     1.006
  Minimum                       2.100                     0.000
  Maximum                       6.100                     4.900
Prices Flexibility                                                                 8.378         0.000
  Mean                          8.622                     6.803
  Std deviation                 1.154                     0.879
  Minimum                       5.100                     5.000
  Maximum                      10.000                     8.500
Product Quality                                                                    9.284         0.000
  Mean                          6.090                     8.293
  Std deviation                 1.282                     0.918
  Minimum                       3.700                     6.200
  Maximum                       8.500                    10.000

The first group G1 corresponds to the suppliers with fewer than 50 employees; there are 60 of them in the sample. The second group G2 corresponds to the suppliers with 50 or more employees; there are 40 of them.

SPAD displays the means, standard deviations, minima and maxima for each explanatory variable by group.

The Student's T column corresponds to the test that the two group means are equal for each explanatory variable. We reject this hypothesis for the three variables because the associated probabilities are lower than 1/10000.

The product quality is perceived as significantly higher for the suppliers with 50 or more employees (average score of 8.29 against 6.09). Conversely, delivery times and prices flexibility are better for the smaller suppliers.
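The Student's T column can be reproduced from the group statistics with a pooled two-sample t statistic. One assumption on our part: the displayed standard deviations appear to use the 1/n convention rather than 1/(n-1), which is what makes the printed 8.045 come out for Delivery Time:

```python
import math

# Delivery Time: group means, standard deviations (1/n convention) and sizes from the table
m1, s1, n1 = 4.192, 1.029, 60
m2, s2, n2 = 2.500, 1.006, 40

# pooled within-group variance, then the two-sample t statistic with n1 + n2 - 2 df
pooled_var = (n1 * s1**2 + n2 * s2**2) / (n1 + n2 - 2)
t = (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
print(round(t, 2))  # ~8.05, matching the printed 8.045
```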


Dis2g – 4

This table displays all the correlation matrices associated with the discriminant analysis.

Correlation matrix

Correlation matrix on group 1: < 50 employees (count = 60)
                     Delivery Time   Prices Flexibility   Product Quality
Delivery Time                 1,00
Prices Flexibility            0,32                 1,00
Product Quality              -0,17                 0,04              1,00

Correlation matrix on group 2: >= 50 employees (count = 40)
                     Delivery Time   Prices Flexibility   Product Quality
Delivery Time                 1,00
Prices Flexibility           -0,12                 1,00
Product Quality               0,07                -0,16              1,00

Within-group common correlation
                     Delivery Time   Prices Flexibility   Product Quality
Delivery Time                 1,00
Prices Flexibility            0,17                 1,00
Product Quality              -0,09                -0,01              1,00

Total correlation
                     Delivery Time   Prices Flexibility   Product Quality
Delivery Time                 1,00
Prices Flexibility            0,51                 1,00
Product Quality              -0,48                -0,45              1,00

The first two correlation matrices display the correlations between explanatory variables inside each group. For example, the correlation between Delivery Time and Prices Flexibility is 0.32 for group 1 and -0.12 for group 2. These two matrices allow us to detect redundancies between explanatory variables: there are none in this example.
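The gap between the within-group and total correlations comes from the group mean differences themselves: a shift of both means between groups creates correlation in the pooled sample even when there is none inside each group. A synthetic illustration (made-up data, not the guide's):

```python
import numpy as np

rng = np.random.default_rng(1)

# two groups whose means differ on both variables, with no correlation inside each group
g1 = rng.normal([4.0, 8.5], 0.5, size=(60, 2))
g2 = rng.normal([2.5, 6.8], 0.5, size=(40, 2))

total_corr = np.corrcoef(np.vstack([g1, g2]).T)[0, 1]

# within-group correlation: center each group on its own mean before pooling
within = np.vstack([g1 - g1.mean(axis=0), g2 - g2.mean(axis=0)])
within_corr = np.corrcoef(within.T)[0, 1]

print(round(total_corr, 2), round(within_corr, 2))
# the total correlation is inflated by the shift in group means
```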


Dis2g – 6

Classification table of the discriminant analysis

Result of the FISHER linear discriminant analysis on sample: TRAIN

Table of group counts
                                   Assignment group:   Assignment group:
                                      < 50 employees     >= 50 employees    Total
Original group: < 50 employees                    50                  10       60
Original group: >= 50 employees                    4                  36       40

Classification table (counts and percentages)
                                   Well classified   Misclassified    Total
Original group: < 50 employees                  50              10       60
                                             83,33           16,67   100,00
Original group: >= 50 employees                 36               4       40
                                             90,00           10,00   100,00
Total                                           86              14      100
                                             86,00           14,00   100,00

The adjustment presents a good classification rate on the current set: 50 of the 60 small suppliers and 36 of the 40 big suppliers are well classified, i.e. 83% and 90% respectively.

Globally, the good classification rate is 86% = (50 + 36) / 100.


Dis2g – 9

This table displays the characteristics of the linear discriminant function:

Linear discriminant function

R² = 0.65913            Fisher = 61.87877          Probability = 0.0000
D² (Mahalanobis) = 7.89599   T² (Hotelling) = 189.50369   Probability = 0.0000

Variable label        Correlations     D.L.F.         Regression     Standard       Student's T    Probability
                      with D.L.F.      coefficients   coefficients   deviation      (regression)
                      (Threshold =                                   (regression)
                      0.201)
Delivery Time               0,632        1,191760       0,203073       0,0558         3,6373         0,0004
Prices Flexibility          0,648        1,390700       0,236972       0,0521         4,5482         0,0000
Product Quality            -0,686       -1,521000      -0,259174       0,0448         5,7880         0,0000
CONSTANT                                -3,774790      -0,777758       0,5981         1,3005         0,1966

The R² is 0.659; it means that the between-group variance (which expresses the differences between the two groups) represents 65.9% of the total variance.

The Fisher statistic corresponds to the global model validation. The higher the between-group variance, the higher the Fisher statistic. This criterion follows a Fisher distribution with 3 and 96 degrees of freedom. The 61.87 Fisher statistic corresponds to a probability lower than 1/10000 (0.0000). The model is acceptable.

D² is the Mahalanobis distance between the two groups. This distance takes into account the relationships between explanatory variables (the common correlation matrix).

The T² of Hotelling is a generalization of the Student test to the case of more than one explanatory variable. It tests the hypothesis that all the means are equal. In this example, the T² of Hotelling is 189.503; the associated probability is lower than 1/10000: the differences between means are significant.

For each explanatory variable, SPAD displays its correlation with the D.L.F. (the linear discriminant function). The threshold of 0.201 corresponds to the limit beyond which a correlation is considered significant (the threshold is given in absolute value). The correlations between each explanatory variable and the linear discriminant function are significant and quite close: the linear discriminant function is a well-balanced compromise between these three variables.

The D.L.F. coefficients give the model equation: the best linear combination of the 3 explanatory variables to separate the two groups is the following:

S1(x) = 1.191 x Delivery Time + 1.39 x Prices Flexibility - 1.52 x Product Quality - 3.77

This equation gives high scores to suppliers that provide good delivery times and prices flexibility (group 1, < 50 employees), and low scores to suppliers that have good quality products (group 2, >= 50 employees).


Of course, the following equation is equivalent to the previous one but inverts the sign of the scores:

S2(x) = - 1.191 x Delivery Time - 1.39 x Prices Flexibility + 1.52 x Product Quality + 3.77

The suppliers' hierarchy is not modified.

The regression coefficients column is redundant with the discriminant function coefficients column: they are proportional. Linear discriminant analysis based on two groups is a particular case of multiple regression. This equation:

S3(x) = 0.203 x Delivery Time + 0.237 x Prices Flexibility - 0.259 x Product Quality - 0.778

is still equivalent to the two previous ones.

The Student's T and the associated probabilities are calculated from the regression coefficients, but are valid for the discriminant function coefficients because of the proportionality. The Student's T is the ratio between the regression coefficient and its standard deviation: for example, 3.64 = 0.203 / 0.0558.

Thus, we can see that our three coefficients are significant at 1%, but not the constant term.


BOOTSTRAP estimations: Dis2g – 12 and Dis2g – 13

SPAD provides a bootstrap validation for all its discriminant functions: the purpose is to simulate, by resampling, several samples and to calculate an adjustment for each one. In this example, we have chosen 250 samples.

At the end, we obtain 250 estimations of the classification table and of the coefficients of the linear discriminant function.

The good classification and misclassification rates are calculated as the average of the 250 estimations. The same holds for the coefficients.
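The resampling principle can be sketched generically: draw B samples with replacement, recompute the statistic of interest on each, and summarize the B estimates by their mean and standard deviation. The sketch below applies it to a simulated well-classified indicator, not to SPAD's actual refitting:

```python
import numpy as np

rng = np.random.default_rng(42)
correct = rng.random(100) < 0.86   # simulated per-case "well classified" indicators

B = 250  # number of bootstrap samples, as in the example above
boot = np.array([rng.choice(correct, size=correct.size, replace=True).mean()
                 for _ in range(B)])

# bootstrap mean close to the observed rate; the std gives the precision estimate
print(round(boot.mean(), 3), round(boot.std(ddof=1), 3))
```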

Dis2g – 12

Discriminant analysis by bootstrap estimations: 250 random samples
Classification table (counts and percentages)

                                   Training sample   Training sample   Bootstrap         Bootstrap
                                   Well classified   Misclassified     Well classified   Misclassified     Total
Original group: < 50 employees               50,00             10,00             49,53           10,47     60,00
                                             83,33             16,67             82,55           17,45    100,00
Original group: >= 50 employees              36,00              4,00             35,78            4,22     40,00
                                             90,00             10,00             89,45           10,55    100,00
Total                                        86,00             14,00             85,31           14,69    100,00
                                             86,00             14,00             85,31           14,69    100,00

Dis2g – 13

Bootstrap estimations for the linear discriminant function

Variable label        Correlations      Standard     D.L.F.          Standard     Mean /
                      with D.L.F.       deviation    coefficients    deviation    Standard
                      (Mean)                         (Mean)                       deviation
Delivery Time               0,637          0,051          1,296         0,379        3,418
Prices Flexibility          0,648          0,064          1,500         0,513        2,924
Product Quality            -0,691          0,038         -1,633         0,327        4,996
CONSTANT                                                 -4,163         4,680        0,889


Dis2g – 11

In this Excel sheet, SPAD displays, for each case, its observed group, its assigned group, the probability of being assigned to this group by the model, and its discriminant score.

The "Original group" column gives, for each case, the group to be compared with the "Assignment" column. If the model is right, SPAD prints '=='.

The Fisher function, or score, is calculated by the model with the following equation:

S(x) = 1.191 x Delivery Time + 1.39 x Prices Flexibility - 1.52 x Product Quality - 3.77

For example, for case n°79 (Delivery Time 1.00, Prices Flexibility 7.1 and Product Quality 9.9), the score -7.767 is calculated this way:

-7.767 = 1.19176 x 1.00 + 1.39070 x 7.1 - 1.52100 x 9.9 - 3.77479

Cases are listed by decreasing score. Thus case n°79 gets the lowest score and therefore the highest probability of assignment to group 2 (50 or more employees). Conversely, cases with high scores have a higher probability of assignment to group 1 (fewer than 50 employees).

For each case, SPAD calculates the probabilities of being assigned to each group and assigns the case to the group with the highest probability. The "indifference" point (equal probabilities for the two groups) corresponds here to a zero Fisher score; it does not appear in this example.

The assignment probability is obtained from the Fisher score S(x):

    P(G1/x) = exp(S(x)) / (1 + exp(S(x)))    and then    P(G2/x) = 1 - P(G1/x)

Sample: TRAINING
List of group assignments and related probabilities

Case identifier     Original group    Assignment    Assignment probability    Fisher function

Individu n° 79 >=50 == 1,000 -7,767

Individu n° 39 >=50 == 1,000 -7,716

Individu n° 65 >=50 == 1,000 -7,661

Individu n° 93 <50 >=50 0,877 -1,962

Individu n° 88 <50 >=50 0,873 -1,932

Individu n° 84 <50 >=50 0,848 -1,720

Individu n° 87 >=50 <50 0,640 0,577

Individu n° 13 >=50 <50 0,687 0,788

Individu n° 85 >=50 <50 0,690 0,802

Individu n° 25 <50 == 1,000 8,623

Individu n° 42 <50 == 1,000 9,763

Individu n° 5 <50 == 1,000 9,882  


DIS2GFP  -  LINEAR DISCRIMINANT ANALYSIS BASED ON PRINCIPAL FACTORS

General principles

This procedure outputs a linear discriminant analysis with two groups on the factorial coordinates from a NOT NORMED principal components analysis, using the classical Fisher method.

It provides bootstrap estimates of the bias and the precision of the principal results of the discrimination: coefficients, case classification probabilities, global classification percentages. It also allows the modification of the a priori costs and probabilities of the classification in the groups. It provides the management of the base cases, of the test cases and of the anonymous cases.

The procedure offers a print preview of the descriptive statistics of the model variables in each of the two groups. Next, the results of the discriminant analysis are shown: classification tables, discriminant function, and output of the assignment of cases.

The decision rule is finally expressed as a function of the original variables. The results of the regression equivalent are only indicative, since the classical hypotheses of normality are meaningless in this context.

If a bootstrap validation is requested, the results of the discrimination are repeated with the bootstrap estimates. In particular, the bias and the precision of the global classifications are shown with the direct classifications. For anonymous cases, the procedure calculates their bootstrap assignment probability.

If an evaluation of the test cases is required, the procedure outputs the results of the discrimination relative to these cases. If the assignment of anonymous cases is required, only the assignments are output.

The procedure can archive the rules of the discriminant function so they can be applied later to another file with the same structure.



Dis2g – 1

This first Excel sheet displays the studied model: the variable to explain is the same as in the previous methods (Supplier's Company Size); the explanatory variables are the principal factors obtained from the principal component analysis based on all the continuous variables available in the dataset except the Satisfaction index.

By default, SPAD names each factor with the prefix F followed by the factor number: « F1 », « F2 », etc. We asked SPAD to run this analysis on the 7 factors, that is to say 99.99% of the total inertia.

Model : V8 = F1 + F2 + F3 + F4 + F5 + F6 + F7

| Variable number | Variable label |
| 8 | Supplier's Company Size |
| 1 | F 1 |
| 2 | F 2 |
| 3 | F 3 |
| 4 | F 4 |
| 5 | F 5 |
| 6 | F 6 |
| 7 | F 7 |

EIGENVALUES

COMPUTATIONS PRECISION SUMMARY : TRACE BEFORE DIAGONALISATION.. 89.9375
SUM OF EIGENVALUES ............ 89.9375

HISTOGRAM OF THE FIRST 8 EIGENVALUES

| NUMBER | EIGENVALUE | PERCENTAGE | CUMULATED PERCENTAGE |
| 1 | 81.8822 | 91.04 |  91.04 |
| 2 |  4.0759 |  4.53 |  95.58 |
| 3 |  1.4053 |  1.56 |  97.14 |
| 4 |  1.2298 |  1.37 |  98.51 |
| 5 |  0.7842 |  0.87 |  99.38 |
| 6 |  0.3903 |  0.43 |  99.81 |
| 7 |  0.1617 |  0.18 |  99.99 |
| 8 |  0.0081 |  0.01 | 100.00 |

 


Dis2g – 6 : Classification Table

Result of the FISHER linear discriminant analysis on sample: TRAIN

Table of group counts

|                                 | Assignment: < 50 employees | Assignment: >= 50 employees | Total |
| Original group: < 50 employees  | 54 |  6 | 60 |
| Original group: >= 50 employees |  0 | 40 | 40 |

Classification table (counts and percentages)

|                                 | Well classified | Misclassified | Total  |
| Original group: < 50 employees  | 54     |  6    |  60    |
|                                 | 90,00  | 10,00 | 100,00 |
| Original group: >= 50 employees | 40     |  0    |  40    |
|                                 | 100,00 |  0,00 | 100,00 |
| Total                           | 94     |  6    | 100    |
|                                 | 94,00  |  6,00 | 100,00 |

The model achieves a good classification rate on this sample: it correctly assigns 54 of the 60 small suppliers and all of the big ones, respectively 90% and 100%.

The global good classification rate is 94% = (54+40)/100.
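The global rate can be recomputed from the counts in the classification table; this is plain arithmetic, not a SPAD feature:

```python
# Counts taken from the classification table above.
table = {
    "< 50 employees":  {"well": 54, "mis": 6},
    ">= 50 employees": {"well": 40, "mis": 0},
}

well = sum(g["well"] for g in table.values())
total = sum(g["well"] + g["mis"] for g in table.values())
print(f"global good classification rate: {100 * well / total:.2f}%")
# global good classification rate: 94.00%
```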

Comparison with the model of the previous chapter

We can notice that this model obtains better results than the previous one, which only used three predictors (“Delivery Time”, “Prices Flexibility” and “Product Quality”).

Classification table of the previous model:

Result of the FISHER linear discriminant analysis on sample: TRAIN

Table of group counts

|                                 | Assignment: < 50 employees | Assignment: >= 50 employees | Total |
| Original group: < 50 employees  | 50 | 10 | 60 |
| Original group: >= 50 employees |  4 | 36 | 40 |

Classification table (counts and percentages)

|                                 | Well classified | Misclassified | Total  |
| Original group: < 50 employees  | 50    | 10    |  60    |
|                                 | 83,33 | 16,67 | 100,00 |
| Original group: >= 50 employees | 36    |  4    |  40    |
|                                 | 90,00 | 10,00 | 100,00 |
| Total                           | 86    | 14    | 100    |
|                                 | 86,00 | 14,00 | 100,00 |

In our current model we have kept almost all the available information (all the explanatory variables, through all the factors), so it is normal to get better results.


Dis2g – 9 : results of the model based on principal factors

Linear discriminant function

R2 = 0.71210    Fisher = 32.50721    Probability = 0.0000
D2 (Mahalanobis) = 10.09961    T2 (Hotelling) = 242.39072    Probability = 0.0000

| Axis label | Correlations with D.L.F. (Threshold = 0.201) | D.L.F. coefficients | Regression coefficients | Standard deviation (Regression) | Student's T (regression) | Probability |
| F 1 | -0,380 | -0,291099 | -0,041896 | 0,0062 |  6,7769 | 0,0000 |
| F 2 | -0,651 | -2,234360 | -0,321575 | 0,0277 | 11,6055 | 0,0000 |
| F 3 |  0,240 |  1,403290 |  0,201965 | 0,0472 |  4,2798 | 0,0000 |
| F 4 |  0,036 |  0,226999 |  0,032670 | 0,0504 |  0,6477 | 0,5188 |
| F 5 |  0,028 |  0,221504 |  0,031879 | 0,0632 |  0,5046 | 0,6150 |
| F 6 |  0,278 |  3,090510 |  0,444793 | 0,0895 |  4,9676 | 0,0000 |
| F 7 | -0,101 | -1,747540 | -0,251510 | 0,1391 |  1,8078 | 0,0739 |
| CONSTANT | |  1,009960 |  0,000000 | 0,0559 |  0,0000 | 1,0000 |

The R² is 0.7121; it means that the between-group variance represents 71.21% of the total variance. The Fisher statistic is 32.50, corresponding to a probability lower than 1/10000 (0.0000). Thus, the model is accepted.

All the statistics displayed in the above table are described in the previous section, page 19.

We can see that factors 4 and 5 present coefficients not significantly different from zero (probabilities 0.5188 and 0.6150). Factor 7 also presents a probability greater than 0.05.

The coefficients of the Linear Discriminant Function give the following equation:
S1(x) = – 0.291 x F1 – 2.23 x F2 + 1.40 x F3 + 0.226 x F4 + 0.221 x F5 + 3.09 x F6 – 1.75 x F7 + 1.0099.
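To make the equation concrete, here is a small sketch that evaluates the score from a case's factorial coordinates; only the coefficients come from the table above, while the example coordinates are invented:

```python
# DLF coefficients on the principal factors (rounded, from the table above).
coef = {"F1": -0.291, "F2": -2.23, "F3": 1.40, "F4": 0.226,
        "F5": 0.221, "F6": 3.09, "F7": -1.75}
CONSTANT = 1.0099

def s1(coordinates):
    """Fisher score of a case given its factorial coordinates."""
    return sum(coef[f] * v for f, v in coordinates.items()) + CONSTANT

# Invented coordinates for a fictitious case:
example = {"F1": 1.0, "F2": -0.5, "F3": 0.2, "F4": 0.0,
           "F5": 0.0, "F6": 0.3, "F7": 0.1}
print(round(s1(example), 3))  # 2.866
```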

Dis2g – 10 : Fisher linear discriminant function rebuilt, starting from original variables

This Excel sheet is the most interesting for the user because it displays the model equation based on the original variables and no longer on the principal factors. Thus, we find the variables « Delivery Time » and « Prices Flexibility » with strong positive coefficients.

To understand the coefficients, we have to remember that the equation opposes the two groups by giving high scores to the small suppliers and low scores to the bigger ones. By default, SPAD always gives high scores to the first category (in the list) of the endogenous variable.


Remark : calculation of the coefficients on the original variables

SPAD displays in the table below the linear discriminant function based on the original variables; it has been calculated from the linear discriminant function based on the principal factors. We know that each principal factor is a linear combination of the original variables.

The coefficients of these combinations are available in the PCA outputs, in the column called “Normed eigenvectors”.

Normed eigenvectors

Label variable Axis 1 Axis 2 Axis 3 Axis 4 Axis 5 Axis 6 Axis 7

Delivery Time -0,10 -0,28 0,26 -0,09 -0,74 0,36 -0,04

Prices Level -0,01 0,48 0,03 -0,47 0,35 0,49 -0,07

Prices Flexibility -0,09 -0,40 -0,25 0,49 0,33 0,65 -0,01

Image -0,03 0,26 0,69 0,41 0,11 0,07 0,52

Services -0,06 0,11 0,16 -0,29 -0,18 0,42 -0,03

Commercial Image -0,02 0,15 0,40 0,31 0,07 -0,04 -0,85

Product Quality 0,04 0,65 -0,45 0,43 -0,42 0,12 0,02
Frequency of use -0,99 0,07 -0,06 -0,02 0,03 -0,12 0,01

The factor 1 can be calculated this way:
F1 = – 0.10 x "Delivery Time" – 0.01 x "Prices Level" – 0.09 x "Prices Flexibility" – 0.03 x "Image" – 0.06 x "Services" – 0.02 x "Commercial Image" + 0.04 x "Product Quality" – 0.99 x "Frequency of use".

… and so on for all the factors. Starting from these equations, SPAD can assign a coefficient to each original variable.
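The reconstruction can be sketched as a small matrix product; the numbers below are taken from the first two eigenvector columns and factor coefficients above, reduced to three variables for brevity:

```python
# U[j][k]: weight of original variable j in factor k (normed eigenvectors).
U = [[-0.10, -0.28],   # Delivery Time
     [-0.01,  0.48],   # Prices Level
     [-0.09, -0.40]]   # Prices Flexibility
b = [-0.291, -2.234]   # DLF coefficients of factors F1 and F2

# Since F_k = sum_j U[j][k] * x_j, the coefficient of x_j in the rebuilt
# function is sum_k b_k * U[j][k]:
coeffs = [sum(bk * row[k] for k, bk in enumerate(b)) for row in U]
print([round(c, 3) for c in coeffs])  # [0.655, -1.069, 0.92]
```

With all seven factors and eight variables, the same dot products yield the full rebuilt function shown below.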

FISHER linear function rebuilt starting original variables

| Variable label | D.L.F. coefficients | Regression coefficients | Standard deviation (Regression) | Student's T (regression) | Probability |
| Delivery Time      |  2,018560 |  0,290515 | 0,0588 | 4,9366 | 0,0000 |
| Prices Level       |  0,590870 |  0,085039 | 0,0573 | 1,4851 | 0,1410 |
| Prices Flexibility |  2,771660 |  0,398903 | 0,0684 | 5,8360 | 0,0000 |
| Image              | -0,198512 | -0,028570 | 0,0831 | 0,3437 | 0,7319 |
| Services           |  1,257900 |  0,181039 | 0,0430 | 4,2083 | 0,0001 |
| Commercial Image   |  1,674080 |  0,240937 | 0,1206 | 1,9979 | 0,0487 |
| Product Quality    | -1,764960 | -0,254017 | 0,0453 | 5,6085 | 0,0000 |
| Frequency of use   | -0,327111 | -0,047079 | 0,0130 | 3,6132 | 0,0005 |
| CONSTANT           | -9,065840 | -1,450130 | | | |

The linear discriminant function equation is the following:

D1 = 2.02 x Delivery Time + 0.59 x Prices Level + 2.77 x Prices Flexibility – 0.20 x Image + 1.26 x Services + 1.67 x Commercial Image – 1.76 x Product Quality – 0.33 x Frequency of use – 9.07.

The variables « Image » and « Prices Level » are not significant (respective probabilities of 0.7319 and 0.141). The small contribution of the variable « Image » is not surprising: we get


the same result as the ones obtained with the automatic characterization (see tables below).

About the variable « Prices Level », it is surprising to find it not significant in the model while it appears significant in the automatic characterization. This is due to the correlations existing between the explanatory variables: the prices level is related to the variables « Delivery Time », « Prices Flexibility », etc. These variables tend to reduce the specific effect due to the prices level.

Characterisation by continuous variables of categories of Supplier's Company Size

< 50 employees (Weight = 60.00  Count = 60)

| Characteristic variables | Category mean | Overall mean | Category Std. deviation | Overall Std. deviation | Test-value | Probability |
| Prices Flexibility |  8,622 |  7,894 | 1,154 | 1,380 |  6,43 | 0,000 |
| Delivery Time      |  4,192 |  3,515 | 1,029 | 1,314 |  6,27 | 0,000 |
| Frequency of use   | 48,767 | 46,100 | 8,724 | 8,944 |  3,63 | 0,000 |
| Services           |  3,050 |  2,916 | 0,584 | 0,747 |  2,18 | 0,014 |
| Commercial Image   |  2,692 |  2,665 | 0,859 | 0,767 |  0,42 | 0,336 |
| Image              |  5,213 |  5,248 | 1,281 | 1,126 | -0,38 | 0,354 |
| Prices Level       |  1,948 |  2,364 | 1,018 | 1,190 | -4,26 | 0,000 |
| Product Quality    |  6,090 |  6,971 | 1,282 | 1,577 | -6,81 | 0,000 |

>= 50 employees (Weight = 40.00 Count = 40 )

| Characteristic variables | Category mean | Overall mean | Category Std. deviation | Overall Std. deviation | Test-value | Probability |
| Product Quality    |  8,293 |  6,971 | 0,918 | 1,577 |  6,81 | 0,000 |
| Prices Level       |  2,988 |  2,364 | 1,156 | 1,190 |  4,26 | 0,000 |
| Image              |  5,300 |  5,248 | 0,838 | 1,126 |  0,38 | 0,354 |
| Commercial Image   |  2,625 |  2,665 | 0,601 | 0,767 | -0,42 | 0,336 |
| Services           |  2,715 |  2,916 | 0,905 | 0,747 | -2,18 | 0,014 |
| Frequency of use   | 42,100 | 46,100 | 7,690 | 8,944 | -3,63 | 0,000 |
| Delivery Time      |  2,500 |  3,515 | 1,006 | 1,314 | -6,27 | 0,000 |
| Prices Flexibility |  6,803 |  7,894 | 0,879 | 1,380 | -6,43 | 0,000 |


Simplified Model : dis2g - 9 and dis2g - 10

We modify our previous model by keeping only the significant principal factors: 1, 2, 3 and 6. The results are listed below:

Linear discriminant function

R2 = 0.69976    Fisher = 55.35311    Probability = 0.0000
D2 (Mahalanobis) = 9.51685    T2 (Hotelling) = 228.40439    Probability = 0.0000

| Axis label | Correlations with D.L.F. (Threshold = 0.201) | D.L.F. coefficients | Regression coefficients | Standard deviation (Regression) | Student's T (regression) | Probability |
| F 1 | -0,380 | -0,279138 | -0,041896 | 0,0062 |  6,7436 | 0,0000 |
| F 2 | -0,651 | -2,142560 | -0,321575 | 0,0278 | 11,5484 | 0,0000 |
| F 3 |  0,240 |  1,345630 |  0,201965 | 0,0474 |  4,2588 | 0,0000 |
| F 6 |  0,279 |  2,963530 |  0,444793 | 0,0900 |  4,9432 | 0,0000 |
| CONSTANT | |  0,951688 |  0,000000 | 0,0562 |  0,0000 | 1,0000 |

Since the factors are orthogonal, the Student statistics do not change except for rounding errors: we keep the same hierarchy and the same relative importance of the factors. The new linear discriminant function is now written:

S1(X) = – 0.27 x F1 – 2.14 x F2 + 1.34 x F3 + 2.96 x F6 + 0.95.

FISHER linear function rebuilt starting original variables

| Variable label | D.L.F. coefficients | Regression coefficients | Standard deviation (Regression) | Student's T (regression) | Probability |
| Delivery Time      |  2,043930 |  0,306772 | 0,0353 | 8,6795 | 0,0000 |
| Prices Level       |  0,472917 |  0,070980 | 0,0461 | 1,5395 | 0,1272 |
| Prices Flexibility |  2,460700 |  0,369324 | 0,0605 | 6,1044 | 0,0000 |
| Image              |  0,572879 |  0,085983 | 0,0340 | 2,5320 | 0,0131 |
| Services           |  1,261200 |  0,189292 | 0,0390 | 4,8564 | 0,0000 |
| Commercial Image   |  0,103654 |  0,015557 | 0,0196 | 0,7920 | 0,4304 |
| Product Quality    | -1,669530 | -0,250579 | 0,0300 | 8,3423 | 0,0000 |
| Frequency of use   | -0,297403 | -0,044637 | 0,0128 | 3,4890 | 0,0007 |
| CONSTANT           | -8,387220 | -1,401670 | | | |

We find the same opposition between the characteristic variables of the small suppliers (Delivery Time and Prices Flexibility) and the bigger ones (Product Quality).

The variable « Commercial Image » is still not significant, but the variable « Image » becomes significant. Moreover, its positive coefficient indicates a characteristic of the small companies. However, it is recommended to interpret this result with care, because the automatic characterization shows that small suppliers have a lower image score than the big ones (average of 5.21 compared to 5.30). This is due to the correlations existing between variables: working on a restricted number of factors was not sufficient to remove them.

Finally, by eliminating non-significant variables or principal factors, and variables whose coefficient's sign is not coherent, we get back to the model of the previous chapter with the variables « Delivery Time », « Prices Flexibility » and « Product Quality ». Even if it discriminates less well than the other models studied in this chapter, we may keep this one because of its coherence regarding the relative contributions and the signs of the effects.


DISCO  -  DISCRIMINANT ANALYSIS

BASED ON QUALITATIVE VARIABLES 

SCORE  -  SCORING FUNCTION 

With SPAD, building a scoring function requires the following steps:

- Firstly, we determine the most discriminant variables with regard to the endogenous variable (the DEMOD and MSMOD procedures).

- Then, we perform a Multiple Correspondence Analysis (MCA) on the selected qualitative variables.

- We perform a linear discriminant analysis based on the factorial coordinates extracted from the Multiple Correspondence Analysis.

- Then, we rebuild the discriminant function starting from the original qualitative variables.

- We normalize the coefficients of each explanatory category to get only zero or positive scores. The maximum score is defined by the user (100, 1000…).

- Then, each case is assigned a score according to its profile.

NB : Steps 2 and 3 are implemented in the DISCO procedure of the scoring chain.

The SPAD scoring method performs a multiple correspondence analysis for the following reasons:

- Linear discriminant analysis is a method that only accepts continuous input variables.

- The MCA transforms qualitative variables into continuous factorial coordinates that can be used for the discriminant analysis.

- The factorial coordinates are orthogonal, so we are freed from the multicollinearity problems.

- Finally, the selection of factorial coordinates optimizes the results.
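The rationale can be illustrated with a toy sketch (pure Python, standing in for SPAD's actual MCA): the one-hot columns of a qualitative variable are perfectly collinear, while orthogonalized centered coordinates are not:

```python
# Toy qualitative variable with 3 categories, one-hot (disjunctive) coded.
categories = ["A", "B", "A", "C", "B", "A"]
levels = sorted(set(categories))
X = [[1.0 if c == lv else 0.0 for lv in levels] for c in categories]

# The one-hot columns are perfectly collinear: each row sums to 1.
assert all(abs(sum(row) - 1.0) < 1e-12 for row in X)

def column(M, k):
    return [row[k] for row in M]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Center the columns, then orthogonalize them (Gram-Schmidt); the surviving
# columns stand in for the orthogonal factorial coordinates of an MCA.
n = len(X)
means = [sum(column(X, k)) / n for k in range(len(levels))]
centered = [[row[k] - means[k] for k in range(len(levels))] for row in X]

factors = []
for k in range(len(levels)):
    v = column(centered, k)
    for f in factors:
        proj = dot(v, f) / dot(f, f)
        v = [vi - proj * fi for vi, fi in zip(v, f)]
    if dot(v, v) > 1e-12:          # the redundant dimension drops out
        factors.append(v)

orthogonal = all(abs(dot(factors[i], factors[j])) < 1e-9
                 for i in range(len(factors)) for j in range(i))
print(len(factors), orthogonal)  # 2 True
```

A variable with m categories contributes at most m - 1 independent dimensions, which is why one one-hot column drops out above.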


THE SCORING FAVOURITE 

We create a new chain using the Predefined Chain command from the general Chain menu.

In the favourites tab, select the Scoring rubric and double-click on « Discriminant analysis on categorical variables and scoring ».

SPAD displays the following methods in the diagram.

Import the dataset credit.sda by using the “SPAD Data Archive File” import method.

The method icons are grey because you have to configure them.

The SCORING parameters will be defined by default.


DISCO PARAMETERS 

The configuration of the DISCO procedure starts by defining the model to build: the endogenous variable and the qualitative exogenous variables.

The model is the following:

V1 = V2 + … + V12

In this « Model » tab, we can specify the “real” model, i.e. the one built on the factorial coordinates extracted from the MCA.

To proceed, click on the button « Calculation Options ».

We decide to build the complete model, with all the factorial coordinates.

Click on « OK » to go back to the « Model » tab and again on « OK » to finish the DISCO configuration.

Run the methods.


Right-click on the discriminant method icon to access the results.

Starting with the complete model allows us to keep, in a second step, only the significant factors.

We visualize and select the factorial axes that really discriminate the target variable.

To do this, we use the ratio Coefficient/StDev, which can be interpreted as a Student's T. We could keep all the axes with an absolute ratio greater than 1.96.


DISCO RESULTS

Linear Discriminant Function

Model

V1=F1+F2+F3+F4+F5+F6+F7+F8+F9+F10+F11+F12+F13+F14+F15+F16+F17+F18+F19+F20+F21+F22+F23+F24+F25

Linear discriminant function

R2 = 0.41398 Fisher = 12.48967 Probability = 0.0000

D2 (Mahalanobis) = 2.81410 T2 (Hotelling) = 329.19614 Probability = 0.0000

| Axis label | Correlations with D.L.F. (Threshold = 0.093) | D.L.F. coefficients | Regression coefficients | Standard deviation (Regression) | Ratio Coefficient / St. Deviation |
| F 1  | -0,475 | -3,228700 | -0,950022 | 0,0729 | -13,0262 |
| F 2  |  0,290 |  2,342510 |  0,689267 | 0,0867 |   7,9474 |
| F 3  |  0,104 |  0,897833 |  0,264181 | 0,0925 |   2,8551 |
| F 4  |  0,170 |  1,532160 |  0,450828 | 0,0967 |   4,6611 |
| F 5  | -0,007 | -0,072457 | -0,021320 | 0,1057 |  -0,2018 |
| F 6  | -0,057 | -0,571836 | -0,168259 | 0,1077 |  -1,5617 |
| F 7  | -0,022 | -0,227015 | -0,066797 | 0,1099 |  -0,6076 |
| F 8  |  0,061 |  0,641800 |  0,188845 | 0,1130 |   1,6705 |
| F 9  |  0,139 |  1,515070 |  0,445797 | 0,1173 |   3,8017 |
| F 10 | -0,045 | -0,502921 | -0,147981 | 0,1192 |  -1,2411 |
| F 11 |  0,004 |  0,051269 |  0,015086 | 0,1224 |   0,1233 |
| F 12 | -0,028 | -0,319744 | -0,094082 | 0,1237 |  -0,7605 |
| F 13 | -0,030 | -0,356309 | -0,104841 | 0,1279 |  -0,8197 |
| F 14 | -0,070 | -0,847106 | -0,249255 | 0,1300 |  -1,9170 |
| F 15 |  0,045 |  0,567041 |  0,166848 | 0,1350 |   1,2364 |
| F 16 |  0,002 |  0,023938 |  0,007043 | 0,1359 |   0,0518 |
| F 17 | -0,017 | -0,219652 | -0,064631 | 0,1405 |  -0,4599 |
| F 18 | -0,105 | -1,389350 | -0,408807 | 0,1425 |  -2,8691 |
| F 19 |  0,049 |  0,676453 |  0,199041 | 0,1487 |   1,3381 |
| F 20 | -0,008 | -0,119744 | -0,035234 | 0,1546 |  -0,2279 |
| F 21 | -0,074 | -1,071810 | -0,315374 | 0,1553 |  -2,0303 |
| F 22 |  0,024 |  0,367523 |  0,108141 | 0,1624 |   0,6659 |
| F 23 |  0,068 |  1,151150 |  0,338719 | 0,1819 |   1,8622 |
| F 24 | -0,061 | -1,190570 | -0,350316 | 0,2089 |  -1,6768 |
| F 25 |  0,019 |  0,608556 |  0,179063 | 0,3351 |   0,5343 |
| CONSTANT | |  0,018039 |  0,000000 | 0,0364 |   0,0000 |

The factors with a ratio whose absolute value is greater than 1.96 are displayed in bold. These factors are to be included in the optimal model.

To build this model, we need to return to the DISCO configuration and click on the button « Calculation Options ».


NEW CONFIGURATION OF THE DISCO METHOD 

We have specified the optimal model to use for building the discriminant function and, in a second step, the scoring function.

The optimal model is built with the following factors: F1 to F4, F9, F18 and F21.

We have to re-run the chain.

Now that the optimal model is available, we want to partition the dataset into two subsets: one to perform the analysis, the other one to confirm and validate it. This part is called validation. We talk about a learning set and a testing set (or test cases) in the following tab.

In this example, we choose to select randomly 25% of the cases to test the model based on the 75% remaining cases.

Validation is very useful for testing that the model does not overfit the data and has a good predictive power.
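The random 75/25 partition can be sketched as follows; the seed is illustrative, and the case count (468 = 351 + 117) matches the credit example:

```python
import random

random.seed(0)                  # illustrative seed, for a reproducible split
case_ids = list(range(468))     # 468 cases, as in the credit example

test_set = set(random.sample(case_ids, k=round(0.25 * len(case_ids))))
train_set = [i for i in case_ids if i not in test_set]
print(len(train_set), len(test_set))  # 351 117
```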


THE DISCO RESULTS 

To measure the prediction performance of the model, we read the following classification tables:

Result of the FISHER linear discriminant analysis on sample: TRAINING

Table of group counts

|                      | Assignment: GOOD | Assignment: BAD | Total |
| Original group: GOOD | 150 |  28 | 178 |
| Original group: BAD  |  35 | 138 | 173 |

Classification table (counts and percentages)

|                      | Well classified | Misclassified | Total  |
| Original group: GOOD | 150   | 28    | 178    |
|                      | 84,27 | 15,73 | 100,00 |
| Original group: BAD  | 138   | 35    | 173    |
|                      | 79,77 | 20,23 | 100,00 |
| Total                | 288   | 63    | 351    |
|                      | 82,05 | 17,95 | 100,00 |

Result of the FISHER linear discriminant analysis on sample: TEST

Table of group counts

|                      | Assignment: GOOD | Assignment: BAD | Total |
| Original group: GOOD | 50 |  9 | 59 |
| Original group: BAD  | 21 | 37 | 58 |

Classification table (counts and percentages)

|                      | Well classified | Misclassified | Total  |
| Original group: GOOD | 50    |  9    |  59    |
|                      | 84,75 | 15,25 | 100,00 |
| Original group: BAD  | 37    | 21    |  58    |
|                      | 63,79 | 36,21 | 100,00 |
| Total                | 87    | 30    | 117    |
|                      | 74,36 | 25,64 | 100,00 |

On the TRAINING SET, 82.05% of the cases are well classified.
On the TESTING SET, 74.36% of the cases are well classified.

The built model presents a good predictive power on both sets. It does not overfit the training set and looks reproducible.

Another way to validate the model would be to use bootstrapping.


Linear discriminant function

R2 = 0.38387    Fisher = 40.94150    Probability = 0.0000
D2 (Mahalanobis) = 2.48185    T2 (Hotelling) = 290.32867    Probability = 0.0000

| Axis label | Correlations with D.L.F. (Threshold = 0.093) | D.L.F. coefficients | Regression coefficients | Standard deviation (Regression) | Ratio Coefficient / St. Deviation |
| F 1  | -0,475 | -3,070890 | -0,950022 | 0,0733 | -12,9600 |
| F 2  |  0,290 |  2,228010 |  0,689267 | 0,0872 |   7,9070 |
| F 3  |  0,104 |  0,853949 |  0,264181 | 0,0930 |   2,8406 |
| F 4  |  0,170 |  1,457270 |  0,450828 | 0,0972 |   4,6374 |
| F 9  |  0,139 |  1,441010 |  0,445797 | 0,1179 |   3,7824 |
| F 18 | -0,105 | -1,321450 | -0,408807 | 0,1432 |  -2,8545 |
| F 21 | -0,074 | -1,019430 | -0,315374 | 0,1561 |  -2,0200 |
| CONSTANT | |  0,015909 |  0,000000 | 0,0366 |   0,0000 |

FISHER linear function rebuilt starting original variables

| Variable label | Category label | D.L.F. coefficients | Regression coefficients | Standard deviation (Regression) | Ratio Coefficient / St. Deviation |
| Age of client         | Less than 23 years  |  -4,413170 | -1,365270 | 0,4690 |  -2,9112 |
|                       | From 23 to 40 years |   1,944030 |  0,601412 | 0,2750 |   2,1873 |
|                       | From 40 to 50 years |   1,169940 |  0,361938 | 0,2642 |   1,3698 |
|                       | Over 50 years       |  -0,425731 | -0,131706 | 0,3968 |  -0,3319 |
| Family Situation      | Single              |   0,009629 |  0,002979 | 0,3449 |   0,0086 |
|                       | Married             |   1,427100 |  0,441492 | 0,2180 |   2,0249 |
|                       | Divorced            |  -3,502810 | -1,083640 | 0,1992 |  -5,4407 |
|                       | Widow               |  -6,459620 | -1,998370 | 0,9437 |  -2,1177 |
| Seniority             | 1 year or less      |  -6,138460 | -1,899020 | 0,3039 |  -6,2495 |
|                       | From 1 to 4 years   |  -8,631250 | -2,670200 | 0,3826 |  -6,9787 |
|                       | From 4 to 6 years   |   9,017780 |  2,789770 | 0,5665 |   4,9243 |
|                       | From 6 to 12 years  |   2,082220 |  0,644162 | 0,4131 |   1,5592 |
|                       | Over 12 years       |   9,972050 |  3,084990 | 0,6592 |   4,6798 |
| Salary domiciliation  | Sal. domiciliated   |   4,923760 |  1,523230 | 0,1359 |  11,2049 |
|                       | Sal. not domicil.   | -10,236200 | -3,166720 | 0,2826 | -11,2049 |
| Size of savings       | No saving           |  -1,401220 | -0,433488 | 0,0864 |  -5,0190 |
|                       | Less than 10 KF     |   3,659600 |  1,132150 | 0,2840 |   3,9865 |
|                       | From 10 to 100 KF   |   6,438820 |  1,991940 | 0,6526 |   3,0524 |
|                       | More than 100 KF    |  12,519100 |  3,872970 | 1,1221 |   3,4515 |
| Profession            | executive           |   3,238490 |  1,001870 | 0,3962 |   2,5287 |
|                       | employee            |   2,657760 |  0,822213 | 0,1786 |   4,6033 |
|                       | other               |  -5,709430 | -1,766290 | 0,2101 |  -8,4074 |
| Average outstanding   | Less than 2 KF      | -12,409300 | -3,839000 | 0,4235 |  -9,0644 |
|                       | From 2 to 5 KF      |   2,567540 |  0,794304 | 0,1589 |   4,9987 |
|                       | More than 5 KF      |   6,859870 |  2,122200 | 0,4772 |   4,4468 |
| Average transactions  | Less than 10 KF     |  -3,598390 | -1,113210 | 0,3180 |  -3,5007 |
|                       | From 10 to 30 KF    |  -0,489471 | -0,151425 | 0,2069 |  -0,7318 |
|                       | From 30 to 50 KF    |   1,643420 |  0,508415 | 0,4045 |   1,2571 |
|                       | More than 50 KF     |   3,306170 |  1,022810 | 0,2583 |   3,9605 |
| Number of withdrawals | Less than 40        |   6,128640 |  1,895980 | 0,2382 |   7,9587 |
|                       | From 40 to 100      |  -0,076000 | -0,023512 | 0,2219 |  -0,1060 |
|                       | More than 100       |  -7,615890 | -2,356080 | 0,3085 |  -7,6361 |
| Overdraft             | Authorized          |  -1,481820 | -0,458423 | 0,3886 |  -1,1795 |
|                       | Forbidden           |   1,125290 |  0,348125 | 0,2951 |   1,1795 |
| Checkbook             | Authorized          |   1,654080 |  0,511713 | 0,0658 |   7,7814 |
|                       | Forbidden           | -12,951800 | -4,006810 | 0,5149 |  -7,7814 |
| CONSTANT              |                     |   0,015909 |  0,000000 | | |


THE SCORING FUNCTION 

The SCORE procedure transforms the FLD coefficients by using the following two rules:

Minimum coefficient for each variable: for each categorical variable, the smallest coefficient is set to zero. The minimum possible score for a case is therefore zero; it is obtained by a case which, for each variable, presents the category assigned to zero.

Maximum possible value of the score function: the maximum possible score is chosen by the user (for example 1000). This maximum corresponds to the sum of the largest transformed coefficients of each variable.

The score attributed to a case is obtained by adding the transformed coefficients associated with the categories of the case. The transformed score function classifies the cases in the same way as the initial discriminant function.
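These two rules can be sketched in a few lines; the coefficients below are invented (two variables only), not SPAD's output:

```python
# Invented category coefficients for two variables.
coeffs = {
    "Salary domiciliation": {"domiciliated": 4.92, "not domiciliated": -10.24},
    "Checkbook":            {"authorized": 1.65,  "forbidden": -12.95},
}
MAX_SCORE = 1000

# Rule 1: shift each variable so its smallest coefficient becomes zero.
shifted = {v: {c: x - min(cats.values()) for c, x in cats.items()}
           for v, cats in coeffs.items()}

# Rule 2: scale so the best possible profile reaches MAX_SCORE.
top = sum(max(cats.values()) for cats in shifted.values())
scores = {v: {c: round(x * MAX_SCORE / top) for c, x in cats.items()}
          for v, cats in shifted.items()}

def score(profile):
    """Sum of the transformed coefficients of the case's categories."""
    return sum(scores[v][c] for v, c in profile.items())

worst = score({"Salary domiciliation": "not domiciliated", "Checkbook": "forbidden"})
best = score({"Salary domiciliation": "domiciliated", "Checkbook": "authorized"})
print(worst, best)  # 0 1000
```

With more variables, per-category rounding can make the top score differ from the chosen maximum by a point or two.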


THE SCORE CONFIGURATION 

Click OK and run the method.

Parameter to modify, if needed, to assign the target category to the low scores.

Tick to create a file containing the Decision Rules to be applied to new cases.


OPTIMAL SCORING PILOT 

Double-clicking on the following icon opens the « Optimal Scoring Pilot » interface. Click on “New” in the “File” menu to display the graph below.

The user can define a rate called the Classification Error Tolerance, abbreviated as CET, in the parameters tab of the Score method. In this example, we kept the default of 10%. This rate supports the calculation of regions on the score function scale:

The low boundary 528 has been chosen to assign 10.0% of the real good customers to the weak scores group (misclassified), and the high boundary 655 to assign 9.7% of the real bad customers to the high scores group (misclassified).

These boundaries can be moved if the user wants to modify the misclassification rates.
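The boundary calculation can be approximated as a quantile lookup; the scores below are invented, and SPAD's real computation may differ in detail:

```python
# Invented scores of real GOOD and real BAD customers.
good = sorted([720, 680, 650, 610, 590, 560, 540, 530, 500, 470])
bad = sorted([640, 600, 570, 520, 480, 450, 420, 390, 350, 300])
CET = 0.10

# Low boundary: at most CET of the real GOODs fall below it.
low = good[int(CET * len(good))]
# High boundary: at most CET of the real BADs lie above it.
high = sorted(bad, reverse=True)[int(CET * len(bad))]
print(low, high)  # 500 600
```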

Three regions are displayed on the graph:

A "green" region, which corresponds to the high scores (here the category GOOD), where one expects to find the majority of the GOOD customers. In this region, a misclassified case is a BAD customer assigned to GOOD because of its high score. The boundary is calculated to contain a rate of misclassified cases that does not exceed the CET.

In this example: 10.0% of the real BAD customers are assigned to this region and 62.4% of the real GOOD are well assigned.


A "red" region of low scores, containing most of the cases of the BAD category - and therefore correctly classified - and a percentage, not exceeding the CET, of GOOD cases, which are therefore misclassified.

In this example: 64.5% of the real BAD are well assigned and 9.7% of the real GOOD are misclassified.

An intermediate "orange" region between the boundaries of the red and green regions, where group assignment is left undecided. This region of indecision shrinks when the user increases the CET.

In this example: 25.5% of the real BAD and 27.8% of the real GOOD are assigned to the orange region.

Sometimes it is not necessary to keep this intermediate region, for example for direct marketing campaigns. Then, by ticking the Single score checkbox, we keep only two regions (red and green) and one single boundary.

Modifying the boundaries by using the scores table

This part of the user interface allows us to modify the CET manually, and therefore the boundaries.

The « Data » view

This view is not available when the number of cases is greater than 10 000.

8/18/2019 SPAD7 Data Miner Guide

http://slidepdf.com/reader/full/spad7-data-miner-guide 148/176

SCORE - Scoring Function

148

The fields of the data view are described below:

Identifier : the case identifier, truncated to 40 characters.

Weight : the weight defined in the Weighting tab of the DISCO method; 1 by default.

Sample : the set assignment of each case: learning or test set.

Score : the score calculated for each case.

Group : the original group (G1 or G2) of the case.

Assign. : the group assignment determined by the model (G1, NC or G2). « NC » means that the case is not assigned, i.e. assigned to the orange region.

Err. G1 - Error group 1 : if the case belongs to group 1 (G1) and is assigned to:
- group 1, no error
- the orange zone, error coded (x)
- group 2, error coded (xx)

Err. G2 - Error group 2 : if the case belongs to group 2 (G2) and is assigned to:
- group 2, no error
- the orange zone, error coded (x)
- group 1, error coded (xx)

Sort the data by a field : clicking on any field name allows us to:
- sort the data in increasing order,
- sort the data in decreasing order,
- return to the initial order.

The case profile : by clicking on any case in the “Data” view, it is possible to:
- locate the case on the previous graph (the case is displayed in red with its identifier; press « Escape » to return to the « Data » view),
- display its profile in a condensed view,
- display its “questionnaire” and the associated scores (the case's categories and associated scores are shown in blue; we see its original group and its assignment).


The Discriminant and its methods


Interactive simulations, after clicking on « Questionnaire and score »: It is possible to simulate a new score by clicking on the chosen categories; they turn red.


Density curves

This graph draws, respectively, the density curves of the real BAD and the real GOOD customers.


Lift or Gain Curve 

Horizontal axis: % of the scored cases selected, sorted by decreasing scores
Vertical axis: % of the target category captured by the selection

The optimal curve is the grey one, where the selection captures the entire target category and only the target category.
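To make the construction of the curve concrete, here is a minimal sketch of how the gain points can be computed from a list of scores. This is an illustration under our own naming, not SPAD's internal code:

```python
def gain_curve(scores, is_target):
    """Points of the lift (gain) curve: for each fraction of cases
    selected by decreasing score (horizontal axis), the fraction of
    the target category captured by the selection (vertical axis)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    total_target = sum(is_target)
    points, captured = [(0.0, 0.0)], 0
    for rank, i in enumerate(order, start=1):
        captured += is_target[i]  # 1 if case i belongs to the target category
        points.append((rank / len(scores), captured / total_target))
    return points
```

With a perfect score, the curve rises with slope 1/(target share) and then stays flat at 100%, which is the grey optimal curve described above.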


ROC Curve (Receiver Operating Characteristic) 

Sensitivity: Percentage of the target category captured (GOOD classified as GOOD)
Specificity: Percentage of the other category well classified
1 - Specificity: Percentage of the other category misclassified in the target category (BAD misclassified as GOOD)

The closer the curve is to the upper left part of the graph, the better the separation between the two categories of the target variable. When the densities are equal, the ROC curve coincides with the diagonal of the square.
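The construction of the ROC points can be sketched as follows: each observed score is used in turn as the decision boundary, and the resulting (1 - specificity, sensitivity) pair is recorded. A hedged illustration with our own function name, not SPAD's implementation:

```python
def roc_points(scores, is_good):
    """(1 - specificity, sensitivity) pairs obtained by using each
    observed score in turn as the decision boundary; cases with a
    score at or above the boundary are assigned to GOOD."""
    n_good = sum(is_good)
    n_bad = len(is_good) - n_good
    points = []
    for boundary in sorted(set(scores), reverse=True):
        tp = sum(1 for s, g in zip(scores, is_good) if s >= boundary and g)
        fp = sum(1 for s, g in zip(scores, is_good) if s >= boundary and not g)
        points.append((fp / n_bad, tp / n_good))  # (1 - specificity, sensitivity)
    return points
```

When the scores separate the two groups perfectly, the points pass through the upper left corner (0, 1); when they carry no information, the points fall along the diagonal.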


APPLY THE SCORING FUNCTION TO A NEW DATASET 

Firstly, you need to archive the model rules using the method “Predictive model rules file” from the “Deployment – Archiving\Archiving” rubric. Connect this new method to the scoring method as follows:

•  Give a name to the rule file and specify its location.
•  Import the new dataset on which you want to apply the archived model.
•  Connect the method “Predictive model deployment” from the “Deployment – Archiving\Deployment” rubric and configure it.
•  Run this new method and check the data view.


IDT 1  -  INTERACTIVE DECISION TREE 1

IDT 2  -  INTERACTIVE DECISION TREE 2 

The IDT procedure produces decision trees from a data set. It is a discriminant procedure for predicting the values of a categorical variable (the variable to explain, with K groups) from a set of explanatory variables that may be categorical, ordinal or continuous.

The IDT procedure gives the user a choice of three well-established methods in Data Mining: CHAID, C4.5 and C&RT. The model produced by the method is a Decision Tree, which can be evaluated with a test sample or by cross-validation. The procedure includes additional information that lets you refine the results: integration of an adjustment with the a priori group inclusion probabilities, and the introduction of a cost matrix for incorrect assignment.

The IDT procedure lets the user interactively manipulate the decision tree produced by the method: pruning from the root, interactive segmentation of a node, and description of the properties of a segmentation. The procedure also offers a fully interactive mode, in which the construction of the tree is entirely based on the user's ideas. Several supporting tools (a list of the best segmentations, descriptive statistics, etc.) let you choose the tree which best corresponds to the problem to be solved.

At all stages of the design conceived by the user, it is possible to output the reports in HTML format: on the complete decision tree, or locally on each node including a subset of the database analyzed.

To illustrate this method, we use the same dataset as for the scoring function: the Credit English.sba dataset.


MARGINAL DISTRIBUTIONS OF THE CATEGORICAL VARIABLES 

GOOD_BAD
Categories label        Counts       %
GOOD                       237   50,64
BAD                        231   49,36
Overall                    468  100,00

JOB
Categories label        Counts       %
Executive                   77   16,45
Employee                   237   50,64
Other                      154   32,91
Overall                    468  100,00

AGE
Categories label        Counts       %
LT 23 years                 88   18,80
GE 23 LT 40 years          150   32,05
GE 40 LT 50 years          122   26,07
GE 50 years                108   23,08
Overall                    468  100,00

CHECKIN ACCOUNT
Categories label        Counts       %
LT 2KF Account              98   20,94
GE 2 LT 5KF Account        308   65,81
GE 5KF Account              62   13,25
Overall                    468  100,00

MARITAL
Categories label        Counts       %
Single                     170   36,32
Married                    221   47,22
Divorced                    61   13,03
Widowed                     16    3,42
Overall                    468  100,00

AVERAGE TRANSACTIONS
Categories label        Counts       %
LT 10 KF Trans.            154   32,91
GE 10 LT 30 KF Trans        71   15,17
GE 30 LT 50 KF Trans       129   27,56
GE 50 KF Trans.            114   24,36
Overall                    468  100,00

SENIORITY
Categories label        Counts       %
LE 1 year                  199   42,52
GT 1 LE 4 years             47   10,04
GT 4 LE 6 years             69   14,74
GT 6 LT 12 years            66   14,10
GT 12 years                 87   18,59
Overall                    468  100,00

WITHDRAWALS
Categories label        Counts       %
LT 40 With.                171   36,54
GE 40 LT 100 With.         161   34,40
GE 100 With.               136   29,06
Overall                    468  100,00

SALARY
Categories label        Counts       %
SALARY AT THE BANK         316   67,52
NO SALARY                  152   32,48
Overall                    468  100,00

NEGATIVE ACCOUNT BALANCE
Categories label        Counts       %
Allowed                    202   43,16
Not allowed                266   56,84
Overall                    468  100,00

SAVINGS
Categories label        Counts       %
No saving                  370   79,06
LT 10 KF Sav.               58   12,39
GE 10 LT 100 KF Sav.        32    6,84
GE 100 KF Sav.               8    1,71
Overall                    468  100,00

CHEQUE AUTHORIZATION
Categories label        Counts       %
CHEQUE OK                  415   88,68
NO CHEQUE                   53   11,32
Overall                    468  100,00


IDT 1

The IDT1 procedure prepares the data for the construction of the tree (procedure IDT2). In particular, it handles the missing data of the selected variables. The procedure outputs a report of the treatment of the missing data.

By default, you also have available an automatic characterization of the variable to discriminate by the set of selected explanatory variables.

This characterization helps you make a better selection of the explanatory variables, for example by removing all those that have no connection with the variable to discriminate.


The CHAID algorithm is particularly well suited to the analysis of large datasets and to a first exploration of the data.


The CART algorithm (written C&RT in SPAD because of the copyright) 

C&RT builds classification and regression trees for predicting continuous dependent variables (regression) and categorical dependent variables (classification). The classic C&RT algorithm was popularized by Breiman et al. (Breiman, Friedman, Olshen, & Stone, 1984; see also Ripley, 1996).

For classification, the CART algorithm uses the Gini impurity criterion. It is based on the squared probabilities of membership of each target category in the node. It reaches its minimum (zero) when all cases in the node fall into a single target category.
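The Gini impurity criterion described above can be written as a one-line computation over the category counts of a node (an illustrative sketch, not SPAD's internal code):

```python
def gini_impurity(counts):
    """Gini impurity of a node, from the counts of each target
    category: 1 minus the sum of squared class proportions.
    It is zero when all cases fall into a single target category."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)
```

A pure node such as [468, 0] gives 0.0; an evenly split node such as [234, 234] gives 0.5, the maximum for two groups.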

CART is based on a decade of research, assuring stable performance and reliable results. CART's proven methodology is characterized by:

Reliable pruning strategy – The CART algorithm considers that no stopping rule can be relied on to discover the optimal tree, so CART integrates the notion of over-growing trees and then pruning back; this idea, fundamental to CART, ensures that important structure is not overlooked by stopping too soon.

Powerful binary-split search approach – CART's binary decision trees are more sparing with data and detect more structure before too little data are left for learning. Other decision-tree approaches use multi-way splits that fragment the data rapidly, making it difficult to detect rules that require broad ranges of data to discover.

Automatic self-validation procedures – In the search for patterns in databases it is essential to avoid the trap of "overfitting", i.e. finding patterns that apply only to the training data. CART's embedded test disciplines ensure that the patterns found will hold up when applied to new data. Further, the testing and selection of the optimal tree are an integral part of the CART algorithm.

In addition, CART accommodates many different types of modeling problems by providing a unique combination of automated solutions:

•  surrogate splitters intelligently handle missing values;
•  adjustable misclassification penalties help avoid the most costly errors…

The classification and regression trees (C&RT) algorithms are generally aimed at achieving the best possible predictive accuracy. Operationally, the most accurate prediction is defined as the prediction with the minimum costs. The notion of costs was developed as a way to generalize, to a broader range of prediction situations, the idea that the best prediction has the lowest misclassification rate.

In most applications, the cost is measured in terms of the proportion of misclassified cases, or variance. In this context, it follows that a prediction would be considered best if it has the lowest misclassification rate or the smallest variance. The need for minimizing costs, rather than just the proportion of misclassified cases, arises when some predictions that fail are more catastrophic than others, or when some predictions that fail occur more frequently than others.



IDT2 parameters

Type of analysis:

By default: Automatic

Automatic: The tree is grown automatically according to the stopping criteria defined by the user.

Automatic and cross-validation:

The tree is grown automatically and the procedure estimates the error by cross-validation. In this case, you have to define the number of divisions (subsets) to be used for the cross-validation.

Interactive:

The tree is not grown at all. In the graphical interface, the user can develop it manually.
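The cross-validation error estimate used by the "Automatic and cross-validation" mode can be sketched as follows. The `train` callable is a hypothetical stand-in for the tree-growing procedure; it returns a prediction function fitted on the training subset:

```python
def cross_validation_error(cases, labels, train, k=10):
    """Estimate the misclassification rate by k-fold cross-validation:
    each of the k subsets is held out once while the model is grown on
    the remaining cases, then scored on the held-out subset."""
    n = len(cases)
    errors = 0
    for fold in range(k):
        train_idx = [i for i in range(n) if i % k != fold]
        test_idx = [i for i in range(n) if i % k == fold]
        predict = train([cases[i] for i in train_idx],
                        [labels[i] for i in train_idx])
        errors += sum(predict(cases[i]) != labels[i] for i in test_idx)
    return errors / n
```

Every case is used exactly once for testing and k-1 times for training, so the estimate uses all the data without ever scoring a case with a model that saw it.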


Thresholds:

Minimum count for cutting (splitting) a segment (node): (by default: 5)
This parameter defines the minimum count for splitting a node. Below this threshold, no further split can be performed.
By increasing this parameter, one reduces the size of the tree.

Admissible count: (by default: 1)

This parameter defines the minimum count for a leaf after a split.
By increasing this parameter, one reduces the size of the tree.

Number of tree levels: (by default: 10)
This parameter defines the depth of the tree.
By decreasing this parameter, one reduces the size of the tree.

Specification threshold: (by default: 0.9)
This parameter defines the threshold above which we consider a node to belong to a single target category; no further split is then performed.

When the target categories have markedly unbalanced weights, it is recommended to choose 1.
By decreasing this parameter, one reduces the size of the tree.
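The specification threshold test amounts to comparing the share of the majority category of a node with the threshold. A minimal sketch under our own naming (not SPAD's internal code):

```python
def is_specialised(counts, threshold=0.9):
    """A node is considered to belong to a single target category,
    and is no longer split, when the share of its majority category
    reaches the specification threshold (0.9 by default)."""
    return max(counts) / sum(counts) >= threshold
```

With the default of 0.9, a node holding 95 GOOD and 5 BAD cases is no longer split; setting the threshold to 1 only stops splitting on perfectly pure nodes.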

Configure the method and run it. Right-click on the IDT2 method, choose the Results command and click on Interactive decision tree Editor.


INTERACTIVE DECISION TREE EDITOR 

Results windows

The tool for viewing the Decision Tree produced by the IDT procedure comprises several windows grouped together in a tabbed page. They correspond to different levels of information relative to the model constructed.


View the data

In a grid, this window shows all the data under analysis. Used together with the information window on the nodes, it lets you follow the path of a case in the Decision Tree.

You can copy the contents of the grid to the clipboard and then paste it into a spreadsheet application.

The data grid has the format « Cases x Variables »:

•  the leftmost column in grey corresponds to the identifier of individuals;

•  the next column contains the variable to explain;

•  the following columns represent the explanatory variables;

•  the column furthest on the right indicates the weight associated with each case.

8/18/2019 SPAD7 Data Miner Guide

http://slidepdf.com/reader/full/spad7-data-miner-guide 167/176

The Discriminant and its methods

167

View the Decision Tree

This window offers a graphical representation of the Decision Tree. You can adjust the display scale with the Zoom in/Zoom out commands, or by clicking on the corresponding icons in the Toolbar.

The tree is presented horizontally, starting from the root on the left and moving towards the terminal leaf nodes on the right. Each node shows the distribution of the estimated conditional probability of the predicted variable in absolute terms (real counts) and in relative terms (percentages). In the upper right of the window, a caption is available associating the categories with the color codes used. Attention! If an adjustment is requested, the tool shows the adjusted estimated probabilities. In the upper part of the node is shown the decision rule (variable -- operator -- value) related to the creation of the node.

By clicking on a terminal node in the Tree, it is possible to obtain additional information, supplied on the right side of the window: the full path from the root to the active terminal leaf node, and the relevance of the candidate variables for the segmentation. The latter may be sorted according to the name of the variable, or according to the value of the quality of the segmentation (click on the list's header).

You can also explore further the subset of individuals circumscribed by the terminal node, or interactively control your analysis.


Information on the roots

When you click on a specific node, you can carry out an in-depth analysis by shifting to the Local exploration window via the Local exploration menu, or by clicking on the corresponding tab.

Path information and the relevance of the variables are repeated.

It is also possible to view the individual elements present in the selected node, together with their values for each variable in the analysis. Note that to each node there corresponds a conclusion assigned by the method: the individuals that do not correspond to this conclusion are shown in red.

Finally, it is possible to go deeper into the analysis by requesting, for each variable, in the lower part of the window, descriptive statistics on the whole set of individuals (the root of the Tree).


Information on the Decision Tree

This window lets you judge the quality of the Decision Tree. The window is divided into several areas:

•  Characteristics of the Tree: shows the properties of the Decision Tree produced by the method, as well as the number of nodes in the tree, the number of terminal leaf nodes and its maximum depth. Also shown is the size of the sample used for training, for the test and, if required, for the pruning.

•  Impact of the attributes: shows the role of each attribute in the elaboration of the Tree. The value indicated represents the weighted mean of the impact of each attribute on all the segmentation candidates. Less importance is given to the impacts measured on the lower parts of the Tree.

•  Confusion Matrix: compares the predictions of the tree with the values observed on the dependent variable to predict. The matrix may be measured on the training sample, on the test sample, or in cross-validation. These last two options are active if they were requested during the parameter setting of the procedure.

•  Profile: presents the current confusion matrix in the form of a row profile (e.g. to measure sensitivities), or in the form of a column profile (to measure specificities).
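The two profile views amount to rescaling the confusion matrix to percentages, by rows or by columns. A sketch under our own naming, assuming the matrix is stored as a list of rows of counts (observed class in rows, predicted class in columns):

```python
def row_profile(matrix):
    """Row profile: each row (observed class) rescaled to
    percentages, e.g. to read off the sensitivity of each class."""
    return [[100.0 * c / sum(row) for c in row] for row in matrix]

def column_profile(matrix):
    """Column profile: each column (predicted class) rescaled to
    percentages, e.g. to read off the specificities."""
    totals = [sum(col) for col in zip(*matrix)]
    return [[100.0 * c / t for c, t in zip(row, totals)] for row in matrix]
```

For a matrix [[40, 10], [20, 30]], the row profile of the first class is [80.0, 20.0]: 80% of its cases are correctly predicted.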


Explore and modify the Decision Tree

The originality of the IDT procedure rests largely on the fact that the user can explore and edit the Tree: either by changing the tree produced by the induction procedure, or by constructing it from scratch using their expert knowledge.

Several tools are made available to the user, allowing them to set the properties of the node segmentations and to prune the parts of the Tree that are of little interest. The available operators can be applied either to the set of nodes (more generally, leaves) or to a previously selected node.

Operations on a node in the Tree

By right-clicking on a node, you have access to the context menu. According to the status of the node (leaf or internal node), different options are available. These options let the user specify precisely the Tree which will be best suited to the current analysis.

Two main operators are available: Prune for a node within the tree, and Segment for a leaf node.

Prune a sub-Tree

Pruning a sub-tree consists in deleting the nodes and leaves located beneath a previously selected node. This operation is necessary when we consider that the corresponding sub-tree does not add anything to the active analysis, or when we want to manually induce another segmentation starting from the selected node.

Warning! This operation is only possible on the internal nodes of the Tree.

Procedure
1.  Select the node from which you wish to start the pruning process
2.  Right-click: the Prune menu is available if the node is not a leaf
3.  Click on the Prune menu


Segment a Tree node

For each tree node, we have the list of candidate variables for segmentation, with their respective impacts. At your convenience, you can sort these variables by name or by relevance so as to find the variables of interest.

A first originality of the IDT procedure is the ability given to the user to introduce the segmentation that seems to them the most pertinent, either by following the suggestions of the IDT method, or by choosing the segmentation variable themselves.

A second very useful innovation is the possibility given to the user to change the properties of a segmentation themselves, by letting them introduce the discretization limit for a continuous variable. For example, the method proposes, for a segmentation according to age, to set the limit at the 17.5 age level. On the basis of their personal knowledge of the problem, the user may decide to change this value and manually set a limit of 18 years, corresponding to adulthood.

Segmentation is impossible in three specific cases:

•  the node is not a leaf: in other words, it has already been segmented and already has child nodes;

•  the node is empty: there are no cases on the node;

•  the node is pure, which means that a single category of the variable to predict is attached to the node; in this case the decision rule is unambiguous, so it is pointless to take the analysis any further.

Attention! In this setting the rules for halting the expansion of the tree are deactivated (e.g., minimum elements on the node, specialization threshold, etc.).


Change the properties of a segmentation

IDT lets the user select the variable most relevant for a segmentation. It also lets the user change the properties of the segmentation they have selected.

According to the type of the variable involved in the segmentation, you can:

•  change the discretization threshold for the continuous variables

•  change the re-grouping of the categories for the categorical variables (nominal or ordinal)

Procedure
The procedure for changing the properties of the segmentation has a part in common with the procedure for manual segmentation.

1.  Select the leaf node you want to segment
2.  Right-click on the leaf node. For the Segment with... menu to be active, the node must be a leaf and the segmentation must be possible.
3.  In the dialogue box which appears, we see the list of candidate explanatory variables and the segmentations they propose. The sort order of the variables respects the sort order requested in the Decision Tree window.
4.  To change the properties of the selected variable, click on the Change button
5.  Depending on the type of the variable, one of two dialogue boxes appears:

•  for the continuous variables
6.  the dialogue box indicates the variable on which we are working, and shows the discretization limit used up until now
7.  the user must then enter a new threshold. Attention! The edit area only accepts numerical values, and the decimal character is the full stop.
8.  now validate your new threshold by clicking on the OK button

•  for the categorical variables (nominal or ordinal)
6.  the dialogue box shows, in the list on the left, the sub-trees (leaves resulting from the segmentation) and, in the list on the right, the categories available for building the sub-trees
7.  to change the content of a sub-tree, pass its contents (the categories of the explanatory variable) to the list on the right with the help of the ">>" button, then transfer them to another sub-tree with the help of the "<<" button
8.  you can add or delete a sub-tree with the help of the "+" and "-" buttons
9.  when the changes have been completed, validate the new segmentation with the help of the OK button


Edit a Tree by levels

The user can examine various options on each node. They also have the ability to interactively edit the Decision Tree while moving through the hierarchical structure of the nodes.

In this context, the procedure will carry out the requested operation on all the leaves situated at the lowest level of the tree.

Two types of operation are available:

•  Move up one level: the procedure prunes all the nodes situated on the penultimate level of the tree.

•  Go down one level: for each leaf situated on the last level of the tree, the procedure looks for the most efficient segmentation. Warning! The rules for stopping the expansion of the tree are deactivated at this level.

Procedure
According to the operation requested, click on the menu Operations -- Go up one level or Operations -- Go down one level.

Continue with an Automatic Analysis

At all stages during the exploration and editing of the Tree, the user has the option to request the procedure to continue the construction of the model automatically, using the options specified when the method's parameters were set. Users can, for example, identify the segmentation which they find the most interesting on the tree root, then ask the application to automatically continue the search for the best tree following this first cut.

All the options selected are active in this context, in particular the rules for halting the expansion of the Tree.

Procedure

•  Make sure the Decision Tree is selected

•  Click on the menu: Operations -- Automatic analysis


Save and Backup procedures

On the first execution of the procedure, the Decision Tree is saved with the title Default Analysis. This Tree is shown by default when the IDT procedure starts up.

You can freely edit and change the Tree supplied by default. The results can then be saved in two different ways:

•  you can either save the Tree by overwriting the previous version,

•  or save a new version of the Tree for the same problem with a suitable title.

At any time you can reload into IDT a decision tree that you have saved. The different versions are identified by the titles you have given them.

Warning! Changing the analysis parameters automatically deletes all the saves carried out for the problem analyzed. If you want to save your work permanently, you are advised to use reports or exports.

Save the current tree

On the execution of the IDT procedure, a tree with the title Default Analysis is automatically created. This is the tree shown when the IDT procedure starts up. The user can personalize this tree and save the results of their changes permanently.

In general, it is possible to save any tree on which the user is working.

Procedure
1.  Click on the File -- Save menu
2.  IDT deletes the old version of the tree and replaces it with the new one

Save a new version of the Tree

When working on an analysis, the user may wish to work in parallel on several different scenarios corresponding to multiple trains of thought: you therefore have the option to save individual versions of the Tree under different names.

Procedure
The user wants to save a new version of the Tree, from which they have pruned a branch.
1.  Proceed with pruning a part of the Tree
2.  Then click on the File -- Save as... menu
3.  A dialogue box appears, asking the user to give a new title to this new version of the Tree. This operation is obligatory, since the different versions are distinguished by their titles.
4.  Click on the OK button

Warning! By clicking on File -- Save, the user overwrites the version in memory.


Load a Decision Tree

At any time in IDT, the user can load into memory a previously saved version of the tree. Each version of the tree has a title assigned by the user.

Procedure
1.  Click on the menu File -- Open
2.  A box lists the different versions associated with the current problem
3.  Select the analysis you want by clicking on its title
4.  Confirm by clicking on OK

Export Rules

Any Decision Tree may be transformed into a rule base without loss of information. A rule is a path leading from the root to a given terminal leaf node. The conclusion associated with the rule corresponds to the conclusion associated with the leaf node.

The rule is therefore of the form: If condition Then conclusion

IDT produces the list of rules associated with a tree in HTML format. Additional information is provided: the support of the rule corresponds to the number of individuals concerned by the rule; this is an indicator of the reliability of the rule. The confidence of the rule indicates the percentage of individuals correctly classified by the rule; this is an indicator of the precision of the rule.
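The support and confidence indicators can be sketched as follows, for a rule "If condition Then conclusion" evaluated on a set of (case, observed label) pairs. The function and argument names are ours, for illustration only:

```python
def rule_stats(cases, condition, conclusion):
    """Support and confidence of a rule 'If condition Then conclusion':
    the support is the number of individuals matching the condition;
    the confidence is the percentage of those individuals whose
    observed class matches the rule's conclusion."""
    matched = [label for case, label in cases if condition(case)]
    support = len(matched)
    confidence = 100.0 * matched.count(conclusion) / support if support else 0.0
    return support, confidence
```

A rule with high confidence but a support of only a few individuals should be treated with caution: the support measures reliability, the confidence measures precision.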