
8 October 2010

PASW-SPSS

- PASW: Predictive Analytics SoftWare
- SPSS: Statistical Package for Social Sciences
- Collect data from experiments, questionnaires or time series
- Organize data and check their validity
- Extract and generate new data
- Analyze data and perform statistical tests
- Predict other data through regressions
- Comes in different packages
- Version 18 for Windows
- Competitors: EViews, SAS, LISREL, R, Stata


Questionnaire and variables

- Answers
  - Open
  - Single closed
  - Multiple closed: must be coded into many variables
  - Single/Multiple with "other (please specify)"
- Variable types
  - Numeric, text, date
- Variable measure
  - Scale, ordinal, nominal
- Missing value: usually coded as the last possible value


Variables

- Variable view
  - Name
  - Type
  - Label
  - Values
  - Missing
  - Measure


Variables

- User-missing values: coded with a specific number
  - No answer
  - Wrong or incomprehensible answer
  - Impossible to answer
- System-missing values: represented by a dot (·)
  - You forgot to input the datum
  - No result of a transformation
    - Bad mathematical operations
    - Partial recoding
- Code-book
  - Print the Variable View window
  - File → Display Data File Information


Program overview

- Three different windows
  - Data/Variables (.sav)
    - Cases in rows, variables in columns
    - Variable information
  - Output (.spv)
    - Summary and results of every operation
    - File → Export…


Program overview

- Menus change slightly according to the Data, Syntax or Output window
- File: open, save current window, print
- Edit: cut, paste, copy, options
  - Options → Output Labels
  - Options → General → Language
- View: value labels


Program overview

- Transform: modify data content with horizontal operations
- Data: modify data structure with vertical operations
- Analyze: analyze data and perform tests
- Graphs: produce graphs
- Windows: move windows
- Help: tutorial, case studies, Statistics Coach


Program overview

- Interesting icons
  - Recall recently used dialogs
  - Variables
  - Value labels


Inserting data

- Inserting manually
- Getting data from Excel files
  - Copy & paste
  - File → Open… → Data → Excel
- Getting data from text files
  - File → Open… → Data → Text
  - Delimited
  - Fixed width


Exercises

- Build the SPSS variable structure for the customer satisfaction questionnaire
- Import the SPSS data for the Internet Behavior questionnaire and then build its data structure


Statistical tests for PASW-SPSS

- Example: we want to study the age of Internet users, checking whether the expected value is 35 years or not
- The only information we have is the observations on a sample of 100 users, which are: 25; 26; 27; 28; 29; 30; 31; 30; 33; 34; 35; 36; 37; 38; 30; 30; 41; 42; 43; 44; 45; 46; 47; 48; 49; 50; 51; 52; 20; 54; 55; 56; 57; 20; 20; 20; 30; 31; 32; 33; 34; 35; 36; 37; 38; 39; 40; 41; 42; 43; 44; 45; 46; 47; 48; 49; 50; 20; 21; 22; 23; 24; 25; 26; 27; 28; 29; 30; 31; 32; 33; 34; 35; 36; 37; 38; 39; 40; 35; 36; 37; 35; 36; 37; 35; 36; 37; 35; 36; 37; 35; 36; 37; 35; 36; 37; 35; 36; 37; 35.


Statistical tests for PASW-SPSS

- Test's hypotheses
  - H0: expected value is 35
  - H1: expected value is not 35
- We calculate the average age on the sample, 36.2, which is an estimate of the expected value. We compare this result with the 35 of the H0 hypothesis and we find a difference of +1.2.
- We ask ourselves whether this difference is:
  - large, implying that the expected value is not 35 and thus H0 must be rejected
  - small, so that it can be caused by random fluctuation in the sample choice, and therefore H0 must be accepted.


Statistical tests for PASW-SPSS

- In order to answer, the test provides us with a significance: the probability, assuming H0 is true, of observing a difference at least as large as the one found in the sample
  - In this example significance is 16%
- If significance is large, we accept H0
  - this implies that we do not know whether H0 is true or false
- If significance is small, we reject H0
  - this implies that we are almost sure that H0 is false
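The age example can also be checked outside SPSS. The following is a minimal Python sketch using only the standard library. The variable names are illustrative, and the two-sided significance is computed with a normal approximation to Student's t distribution. That approximation is an assumption, reasonable here only because the sample has 100 observations, so the result should land close to the 16% mentioned above rather than match SPSS digit for digit.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# The 100 observed ages from the example.
ages = [
    25, 26, 27, 28, 29, 30, 31, 30, 33, 34, 35, 36, 37, 38, 30, 30, 41, 42, 43, 44,
    45, 46, 47, 48, 49, 50, 51, 52, 20, 54, 55, 56, 57, 20, 20, 20, 30, 31, 32, 33,
    34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 20, 21, 22,
    23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 35, 36,
    37, 35, 36, 37, 35, 36, 37, 35, 36, 37, 35, 36, 37, 35, 36, 37, 35, 36, 37, 35,
]

h0_value = 35  # expected value under H0
n = len(ages)
sample_mean = mean(ages)

# Standard error of the mean, from the sample standard deviation.
standard_error = stdev(ages) / sqrt(n)

# One-sample t statistic: distance from the H0 value in standard errors.
t_statistic = (sample_mean - h0_value) / standard_error

# Two-sided significance (normal approximation, see the assumption above).
significance = 2 * (1 - NormalDist().cdf(abs(t_statistic)))

print(f"n = {n}, mean = {sample_mean:.1f}, t = {t_statistic:.2f}")
print(f"approximate significance = {significance:.0%}")
```

Since the significance is well above the usual 5% threshold, H0 is accepted, which matches the reasoning on the slide.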


Typical univariate analysis techniques

| Variables | Numerical description | Graphical description | Parametric test | Non-parametric test |
|---|---|---|---|---|
| nominal | Frequencies (one-dimensional contingency table) | Bar plot, pie chart | --- | Chi-square for a one-dimensional contingency table |
| scale | Descriptive statistics | Histogram, boxplot | Student's t for one variable | Sign test |


Univariate data analysis

- Analyze → Descriptive Statistics → Frequencies…
  - Do it for every variable after data input!
- Analyze → Descriptive Statistics → Descriptives… → Options
- Chi-square test for a one-dimensional contingency table
  - Analyze → Nonparametric Tests → One Sample
  - Settings: Chi-square test → Options…
  - H0: classification follows a predetermined distribution
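To see what the one-dimensional chi-square test measures, the statistic can be computed by hand. The sketch below is illustrative only: the function name and the counts are hypothetical, H0 is taken to be a uniform distribution over four categories, and instead of a significance it compares the statistic with the tabulated 5% critical value for 3 degrees of freedom.

```python
def chi_square_statistic(observed, expected_shares):
    """Sum of (observed - expected)^2 / expected over all categories."""
    total = sum(observed)
    expected = [total * share for share in expected_shares]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical frequencies of a nominal variable with four categories.
observed_counts = [18, 22, 35, 25]

# H0: the classification follows a uniform distribution.
uniform_shares = [0.25, 0.25, 0.25, 0.25]

statistic = chi_square_statistic(observed_counts, uniform_shares)
degrees_of_freedom = len(observed_counts) - 1

# Critical value of the chi-square distribution, 3 degrees of freedom, 5% level.
critical_value_5_percent = 7.815
reject_h0 = statistic > critical_value_5_percent

print(f"chi-square = {statistic:.2f} with {degrees_of_freedom} df")
print("reject H0" if reject_h0 else "accept H0")
```

SPSS reports a significance rather than a critical value, but the decision at the 5% level is the same.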


Univariate data analysis

- Student's t test for one variable
  - Analyze → Compare Means → One-Sample T Test
  - Test value
  - H0: expected value = m
- Sign test
  - Analyze → Nonparametric Tests → One Sample
  - Settings: Binomial → Options… → Custom cut point → Cut point…
  - H0: median = m
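The sign test only counts how many observations fall above and below m, and then asks whether that split is compatible with a fair coin. A minimal Python sketch follows, with hypothetical data and function name. It drops observations equal to m, which is an assumption: the SPSS binomial dialog assigns values equal to the cut point to one of the two groups, so its result may differ when ties are present.

```python
from math import comb


def sign_test(values, hypothesized_median):
    """Exact two-sided sign test based on the Binomial(n, 0.5) distribution."""
    above = sum(1 for v in values if v > hypothesized_median)
    below = sum(1 for v in values if v < hypothesized_median)
    n = above + below  # observations equal to the median are dropped
    k = min(above, below)
    # Probability of a split at least this uneven on one side...
    one_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # ...doubled for the two-sided test, capped at 1.
    return above, below, min(1.0, 2 * one_tail)


# Hypothetical numbers of passed exams for 12 students; H0: median = 22.
passed_exams = [18, 20, 25, 22, 19, 17, 21, 16, 24, 15, 20, 23]
above, below, significance = sign_test(passed_exams, 22)

print(f"{above} above, {below} below, significance = {significance:.3f}")
```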


Exercises

- Describe variable "come"
- Describe variable "mates"
- Is the variable university's grade distributed uniformly?
- Is the number of passed exams significantly different from 22?
- Is the number of people who play football significantly different from the number of people who do not play football?


Typical bivariate analysis techniques

| Variables | Description | Parametric test | Non-parametric test | Forecast |
|---|---|---|---|---|
| nominal vs nominal | Bar plot; two-dimensional contingency table | --- | Chi-square for a two-dimensional contingency table | --- |
| binary nominal vs scale | Boxplot; compare means | Student's t for two populations | Mann-Whitney | Regression; logistic regression (nominal as dependent) |
| non-binary nominal vs scale | Boxplot; compare means | One-way analysis of variance (ANOVA) | Kruskal-Wallis | Regression |
| scale vs scale | Scatterplot | Pearson; Student's t for paired data | Spearman; Wilcoxon signed rank test | Regression |


Contingency tables (crosstabs)

- Analyze → Descriptive Statistics → Crosstabs…
  - Cells… → Percentages
- Chi-square test for a two-dimensional contingency table
  - Statistics… → Chi-square
  - H0: classifications are independent
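Under H0 the expected count of each cell is (row total × column total) / grand total, and the chi-square statistic sums the squared deviations from those expected counts. The following Python sketch shows the computation on a hypothetical 2×2 crosstab; the function name and the counts are made up for illustration, and no continuity correction is applied.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a crosstab."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the classifications are independent.
            expected = row_totals[i] * column_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, degrees_of_freedom


# Hypothetical crosstab: rows = sex, columns = plays volleyball (yes, no).
crosstab = [
    [10, 30],
    [20, 20],
]
statistic, degrees_of_freedom = chi_square_independence(crosstab)

# Critical value of the chi-square distribution, 1 degree of freedom, 5% level.
reject_h0 = statistic > 3.841

print(f"chi-square = {statistic:.2f} with {degrees_of_freedom} df")
```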


Comparing means

- Analyze → Compare Means → Means
- Student's t for two populations
  - Independent-Samples T Test
  - H0: expected value on population A = expected value on population B
- One-way analysis of variance (ANOVA)
  - One-Way ANOVA
  - H0: expected value of the variable is the same on all populations


Paired and not paired data

- Not paired data
  - Example: sex and number of exams
  - You usually have a scale variable and a nominal variable which splits the sample into groups (populations)
- Paired data
  - Example: MathA grade and MathB grade
  - You usually have two scale variables on the whole sample


Typical paired data analysis techniques

- Pearson and Spearman correlations
  - Analyze → Correlate → Bivariate
    - H0: correlation is 0
  - Analyze → Correlate → Partial
    - Control variable for spurious correlation
- Student's t test for paired data
  - Analyze → Compare Means → Paired-Samples T Test
  - H0: expected value of ζ - expected value of ξ = m (unfortunately SPSS is able to test only for m = 0)
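A paired t test is a one-sample t test on the differences ζ - ξ, so a value of m other than 0 can be tested by computing the difference as a new variable and comparing its mean with m. The Python sketch below illustrates the idea; the function name and the grades are hypothetical, and it returns only the t statistic and its degrees of freedom, to be compared with a Student's t table.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(first, second, m=0.0):
    """t statistic for H0: expected value of (first - second) = m."""
    differences = [a - b for a, b in zip(first, second)]
    n = len(differences)
    standard_error = stdev(differences) / sqrt(n)
    return (mean(differences) - m) / standard_error, n - 1


# Hypothetical MathA and MathB grades of the same six students.
math_a = [24, 27, 30, 22, 26, 28]
math_b = [22, 26, 27, 21, 23, 27]

# H0: MathA grades are on average exactly 1 point higher than MathB grades.
t_statistic, degrees_of_freedom = paired_t_statistic(math_a, math_b, m=1)

print(f"t = {t_statistic:.2f} with {degrees_of_freedom} df")
```

The same approach should work inside SPSS by building the difference with Transform → Compute Variable and then running a One-Sample T Test with test value m.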


Exercises

Is there a relation between
- sex and playing volleyball?
- sex and the preferred exam?
- the university's grade and the apartment mates' grade?
- passed exams and year of birth?
- the degree and living with other students?
- having passed the Decision Theory exam and practicing one of the indicated sports?
- passed exams and the degree course?
- passed exams and days spent here during exams?
- passed exams and days spent here during exams, controlling for year of birth?
- university's grade (considered scale) and living with other students?


Graphs

- Graphs → Chart Builder…
- Graph's types
  - Bar and pie
  - Histogram
    - double-click, then double-click on bars → Binning
  - Boxplot
  - Scatterplot
- Graph's modification
  - Element properties (Box, X-axis, Y-axis)
  - Chart Editor
  - Chart templates
  - Copy the graph


Exercises

- Vertical axis from 0 to 62.5 by steps of 2.5
- Green background
- Orange bars with little squares and white background
- No vertical label


Exercises

- Vertical axis from 0 to 40 by steps of 10
- Vertical degree value labels
- Grey horizontal background
- No vertical background and no vertical frame
- Bars with vertical orange and white stripes
- Rotation with a look from above


Exercises

- "Hate them" sector orange with red plusses
- Large legend


Exercises

- Vertical axis from 0 to 105 by steps of 5
- No background
- Very large boxes
- No numbers next to outliers
- Red box
- Thick blue median line


Exercises

- Vertical axis with steps of 5
- Green background
- Stacked histogram
- Thick red normal line
- Legend
- Horizontal axis value labels from 1950 to 2000 with steps of 5
- Horizontal axis values aligned horizontally
- No vertical axis title


Exercises

- Vertical axis with steps of 20
- Yellow background
- Small blue full points
- Horizontal axis from 1970 to 1990 with steps of 10


Exercises

- Boxplot in the horizontal direction
- Blue background
- Orange boxes
- Red, bold, large vertical axis value label
- No vertical axis title
- No None and Statistics categories
- Horizontal axis from 0 to 50 with steps of 5
- Expanded boxes


Is a variable normally distributed?

- Histogram with normal curve
  - Graphs → Chart Builder → Histogram → Display normal curve
- Skewness (negative: tail on the left, positive: tail on the right) and kurtosis (negative: flat, 0: normal, positive: too pointy)
  - Analyze → Descriptive Statistics → Descriptives → Options
- Q-Q plot (data must be on the line)
  - Analyze → Descriptive Statistics → Q-Q Plots
- Kolmogorov-Smirnov test
  - Analyze → Nonparametric Tests → One Sample
  - Settings: Kolmogorov-Smirnov test → Options → Normal
  - H0: variable follows a known distribution
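The sign conventions for skewness and kurtosis can be illustrated with a small computation. The Python sketch below uses the plain moment-based definitions, where kurtosis is reported as excess over the normal value so that 0 means normal. This choice is an assumption: SPSS applies small-sample corrections, so its figures will differ slightly, although the signs are read in the same way. The data are hypothetical.

```python
def skewness_and_excess_kurtosis(values):
    """Moment-based skewness and excess kurtosis (0 for a normal distribution)."""
    n = len(values)
    average = sum(values) / n
    # Central moments of order 2, 3 and 4.
    m2 = sum((v - average) ** 2 for v in values) / n
    m3 = sum((v - average) ** 3 for v in values) / n
    m4 = sum((v - average) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3


# Hypothetical sample with one large value: the tail is on the right.
right_tailed = [1, 2, 2, 3, 3, 3, 4, 4, 10]
skew_right, kurt_right = skewness_and_excess_kurtosis(right_tailed)

# Hypothetical evenly spread sample: symmetric and flatter than a normal.
flat = [1, 2, 3, 4, 5]
skew_flat, kurt_flat = skewness_and_excess_kurtosis(flat)

print(f"right-tailed: skewness = {skew_right:.2f}")
print(f"flat: skewness = {skew_flat:.2f}, kurtosis = {kurt_flat:.2f}")
```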


Exercises

Does this variable come from a normal distribution:
- birth year?
- first year of elementary school?
- days passed here during exam's months?
- months considered as numbers (1 to 12)?


Data

- Sort cases…
- Split file…
- Select cases…
- Weight cases…


Transform

- Works only WITHIN a case (horizontally)
- Recode into different variables…
- Compute Variable…
  - Logical operators
  - Functions, usage of the MEAN function
- Count values within cases…
- Date and time wizard…
- Replace missing values…
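"Within case" means that each new value is computed from other variables of the same row, never from other rows. The Python sketch below mimics this with a list of dictionaries, one per case; the variable names and grades are hypothetical, and `None` stands in for a missing value. It assumes that, like the SPSS MEAN function, the average is taken over the non-missing arguments only.

```python
def mean_of_valid(*values):
    """Average of the non-missing arguments; missing if none is valid."""
    valid = [v for v in values if v is not None]
    return sum(valid) / len(valid) if valid else None


# Hypothetical cases (rows) with three grade variables (columns).
cases = [
    {"math_a": 24, "math_b": 28, "stats": None},
    {"math_a": 30, "math_b": 27, "stats": 21},
    {"math_a": None, "math_b": None, "stats": None},
]

# Horizontal operation: every case gets a new variable from its own values.
for case in cases:
    case["grade_mean"] = mean_of_valid(case["math_a"], case["math_b"], case["stats"])

print([case["grade_mean"] for case in cases])
```

A vertical quantity, such as the average of one variable over all cases, cannot be obtained this way, which is why the exercises compute group averages separately.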


Exercises

- Recode variable degree course into variable degree_type with values 1-Bachelor and 2-Master. Do it automatically for the coded answers and then manually for the non-coded "other" answers, deciding which answer is "Bachelor" and which is "Master".
- Build a new variable equal to 1 for the male students who like Law and the female students who like Computer Science, and equal to 0 otherwise (missing when some information is missing).
- Build a new variable called MisUnderstood equal to 1-yes if the student answered that he or she does not live with other students but then answered the mates' opinion question, and 0-no otherwise.
- Build a new variable which is 1 if the birth year is above the average of the degree type (Bachelor or Master) and 0 if it is below or equal. Hint: averages must not be calculated with the MEAN function, but separately with Analyze → Compare Means.


Exercises

- Build a new variable counting how many exams (from the list of 6 exams, not the total) the student has passed, minus the average number of passed exams for his or her degree type (Bachelor or Master).
- Build a new variable equal to the number of practiced sports for Bachelor students, and to the number of practiced sports multiplied by 1.3 for male Master students and by 1.2 for female Master students.
- Build a new variable containing the age when elementary school started; put it equal to 9 (missing) when this number is smaller than 4 or larger than 8.
- Build a new date variable for the start of elementary school, supposing 1 for the missing day and October for the missing month.
- Build a new variable with the number of months passed from the start of elementary school till now.
- Order the cases by birth year. Build a new variable replacing the missing values of the passed exams with the average of the two closest cases. Then calculate the autocorrelation.


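Weighted indexes

Indexes are always scale variables

$$\text{index} = \frac{\sum_i \text{weight}_i \cdot \left(\text{variable}_i - \min(\text{variable}_i)\right)}{\max(\text{numerator})}$$

With binary variables (0-1)

$$\text{index} = \frac{\sum_i \text{weight}_i \cdot \text{variable}_i}{\sum_i \text{weight}_i}$$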
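A weighted index of this kind can be sketched in Python as follows. The sketch assumes that the index is the weighted sum of the variables, each shifted so that its minimum is 0, divided by the largest value that sum can take, so that the result lies between 0 and 1. For binary variables this reduces to the weighted sum divided by the sum of the weights. Function names and data are hypothetical.

```python
def weighted_index(values, weights, minimums, maximums):
    """Weighted sum of min-shifted variables, rescaled to the range 0-1."""
    numerator = sum(w * (v - lo) for v, w, lo in zip(values, weights, minimums))
    # Largest possible numerator: every variable at its maximum.
    max_numerator = sum(w * (hi - lo) for w, lo, hi in zip(weights, minimums, maximums))
    return numerator / max_numerator


def binary_weighted_index(values, weights):
    """Special case for 0-1 variables: weighted share of the 1 answers."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


# Hypothetical subject: two binary answers with weights 2 and 3.
general = weighted_index([1, 0], [2, 3], minimums=[0, 0], maximums=[1, 1])
binary = binary_weighted_index([1, 0], [2, 3])

# Hypothetical mix: a binary answer (weight 2) and a grade from 1 to 10 (weight 3).
mixed = weighted_index([1, 7], [2, 3], minimums=[0, 1], maximums=[1, 10])

print(general, binary, round(mixed, 3))
```

Both functions agree on binary data, which is a quick way to check that the weights and ranges have been entered consistently.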


Exercises on indexes

- Build an index to measure the sportiveness of the subject.
- Recode variable favorite assigning high values to scientific subjects. Build an index to measure the interest in scientific subjects, including also the passed scientific exams.
- Build an index to measure the "participation" of the subject in academic activities and in unibz, using:
  - the number of passed exams divided by the mean of passed exams for their degree type (weight=1, assume max is 5)
  - living with other students (weight=2)
  - the grade given to university (weight=3)
  - how much the subject likes his or her roommates (weight=1)
- Build an index to measure the general attitude of the subject when asked to grade something.
- Recode variables day_el/month_el/year_el into appropriate binary variables which tell when they are missing (0) and when not (1). Build an index to measure the "memory" of the subject using the presence of information on his or her first school day/month/year.


Exercises on tests

- Study the relation between the sportiveness of the subject and the number of passed exams.
- Study the relation between the interest in scientific subjects and the sportiveness.
- Does the sportiveness depend on the year of birth? And on the degree course?
- Is there a relation between interest in scientific subjects and participation?
- Is there a relation between the memory of the subject and the general attitude towards grading?
- If any of the previous relations exists, check whether there is a partial correlation due to birth year.


Regression

year_el = c0 + c1 × birth

year_el = 67.2 + 0.97 × birth

R² = 0.899
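The coefficients c0 and c1 and the R² of a linear regression come from the least squares formulas, which are short enough to write out. The Python sketch below is illustrative: the function name and the five data points are hypothetical, so the fitted numbers are not those of the slide.

```python
def fit_line(x, y):
    """Least squares fit of y = c0 + c1 * x, returning (c0, c1, R squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: covariation of x and y divided by the variation of x.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    c1 = sxy / sxx
    c0 = mean_y - c1 * mean_x
    # R squared: share of the variation of y explained by the line.
    ss_residual = sum((yi - (c0 + c1 * xi)) ** 2 for xi, yi in zip(x, y))
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    return c0, c1, 1 - ss_residual / ss_total


# Hypothetical birth years and first years of elementary school.
birth = [1980, 1982, 1984, 1986, 1988]
year_el = [1986, 1988, 1991, 1992, 1994]

c0, c1, r_squared = fit_line(birth, year_el)
print(f"year_el = {c0:.1f} + {c1:.2f} × birth, R² = {r_squared:.3f}")
```

An R² close to 1 means that the points lie close to the line, which is what the scatterplot with regression line shows graphically.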


Regression

- Analyze → Regression → Linear
  - Plots… → Standardized residuals plot
  - Save…
  - How to draw a scatterplot with regression line
- Curve estimation
- Binary Logistic


Exercises

- Using the appropriate regression, decided after looking at the variables' types and at the scatterplot, build a regression model for:
  - Sportiveness based on having passed the Commercial Law A and Commercial Law B exams
  - Degree type based on year of birth
  - Interest in scientific subjects based on sex and age
  - Attitude towards grading based on year of birth
- Calculate the approximate age when the questionnaire was submitted. Build a regression model for degree type based on age.


Nonparametric tests philosophy

- Non-parametric tests consider only order and not values
  - For example: 2, 5, 6, 7 is the same as 2, 5, 6, 999
- We have already seen:
  - Chi-square
  - Spearman correlation
- Non-parametric tests check the position of distributions
- Parametric tests usually check the averages' values
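The statement that only order matters can be made concrete by replacing each value with its rank, which is what rank-based tests work on. The Python sketch below uses a hypothetical helper that assumes there are no tied values; real procedures assign average ranks to ties.

```python
from statistics import mean


def ranks(values):
    """Rank of each value, 1 for the smallest (assumes no tied values)."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


first = [2, 5, 6, 7]
second = [2, 5, 6, 999]

# A rank-based test sees the same data in both cases...
print(ranks(first), ranks(second))

# ...while the averages used by parametric tests are very different.
print(mean(first), mean(second))
```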


Analyze → Nonparametric Tests

- Mann-Whitney test
  - Independent Samples → Settings: Mann-Whitney U
  - H0: position/order of distribution for population A = position/order of distribution for population B
- Kruskal-Wallis test
  - Independent Samples → Settings: Kruskal-Wallis 1-way ANOVA
  - H0: position/order of distribution of populations is the same
- Wilcoxon matched pair signed rank test
  - Related Samples → Settings: Wilcoxon matched-pair signed-rank
  - H0: position/order of distribution of variable 1 = position/order of distribution of variable 2


Exercises on non-parametric tests

- Study the relation between the order of year of birth and sex.
- Study the relation between the order of sportiveness of the subject and the number of passed exams.
- Does the position of the distribution of sportiveness depend on the degree course? And on the degree type?
- Study the relation between the order of interest in scientific subjects and the sportiveness.