8 October 2010 1
PASW-SPSS
  Predictive Analytics SoftWare, Statistical Package for Social Sciences
  Collect data from experiments, questionnaires or time series
  Organize data and check their validity
  Extract and generate new data
  Analyze data and perform statistical tests
  Predict other data through regressions
  Comes in different packages
  Version 18, Windows
  Competitors: EViews, SAS, LISREL, R, Stata

Questionnaire and variables
  Answers
    Open
    Single closed
    Multiple closed: must be coded into many variables
    Single/Multiple with “other (please specify)”
  Variable types
    Numeric, text, date
  Variable measure
    Scale, ordinal, nominal
  Missing value: usually coded with the last possible value (for example 9)

Variables
  Variable view
    Name
    Type
    Label
    Values
    Missing
    Measure

Variables
  User-missing values: coded with a specific number
    No answer
    Wrong or incomprehensible answer
    Impossible to answer
  System-missing values: represented by a ·
    you forgot to input the datum
    no result of a transformation
      bad mathematical operations
      partial recoding
  Code-book
    Print the Variable View window
    File > Display data file information
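
The difference between user-missing codes and system-missing values can be mimicked outside SPSS. The following is a minimal Python sketch, not SPSS behaviour itself: the code 9 for "no answer", the 1-5 answer scale and the data are all assumptions made for illustration, and None plays the role of the system-missing dot.

```python
# Hypothetical answers on a 1-5 scale. 9 is an ASSUMED user-missing code,
# None stands in for SPSS's system-missing value (the dot).
USER_MISSING = {9}

answers = [4, 5, 9, None, 3, 9, 2]


def is_valid(value):
    """A value is valid when it is neither system- nor user-missing."""
    return value is not None and value not in USER_MISSING


valid = [a for a in answers if is_valid(a)]
n_user_missing = sum(1 for a in answers if a in USER_MISSING)
n_system_missing = sum(1 for a in answers if a is None)

# Statistics are computed on valid values only; counting the 9s as real
# answers would inflate the mean.
mean_valid = sum(valid) / len(valid)
print(valid, n_user_missing, n_system_missing, mean_valid)
```

Declaring the code in the Missing column of the Variable view has the same purpose: the coded values stay in the data file but are left out of the calculations.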

Program overview
  Three different windows
    Data/Variables (.sav)
      Cases in rows, variables in columns
      Variable information
    Syntax
    Output (.spv)
      summary and results of every operation
      File > Export …

Program overview
  Menus change slightly according to the Data, Syntax or Output window
  File
    open, save current window, print
  Edit
    cut, paste, copy, options
    Options > Output Labels
    Options > General > Language
  View
    value labels

Program overview
  Transform: modify data content with horizontal operations
  Data: modify data structure with vertical operations
  Analyze: analyze data and perform tests
  Graphs: produce graphs
  Windows: move windows
  Help: tutorial, case studies, statistics coach

Program overview
  Interesting icons
    Recall recently used dialogs
    Variables
    Value labels

Inserting data
  Inserting manually
  Getting data from Excel files
    copy & paste
    File > Open… > Data > Excel
  Getting data from text files
    File > Open… > Data > Text
    Delimited
    Fixed width

Exercises
  Build the SPSS variable structure for the customer satisfaction questionnaire
  Import the data of the Internet Behavior questionnaire into SPSS and then build its data structure

Statistical tests for PASW-SPSS
  Example: we want to study the age of Internet users, checking whether the expected value is 35 years or not
  The only information we have is the observations on a sample of 100 users, which are: 25; 26; 27; 28; 29; 30; 31; 30; 33; 34; 35; 36; 37; 38; 30; 30; 41; 42; 43; 44; 45; 46; 47; 48; 49; 50; 51; 52; 20; 54; 55; 56; 57; 20; 20; 20; 30; 31; 32; 33; 34; 35; 36; 37; 38; 39; 40; 41; 42; 43; 44; 45; 46; 47; 48; 49; 50; 20; 21; 22; 23; 24; 25; 26; 27; 28; 29; 30; 31; 32; 33; 34; 35; 36; 37; 38; 39; 40; 35; 36; 37; 35; 36; 37; 35; 36; 37; 35; 36; 37; 35; 36; 37; 35; 36; 37; 35; 36; 37; 35.

Statistical tests for PASW-SPSS
  Test’s hypotheses:
    H0: expected value is 35
    H1: expected value is not 35
  We calculate the average age on the sample, 36.2, which is an estimate of the expected value. We compare this result with the 35 of the H0 hypothesis and we find a difference of +1.2.
  We ask ourselves whether this difference is:
    large, implying that the expected value is not 35 and thus H0 must be rejected
    small, so that it can be caused by random fluctuation in the sample choice and therefore H0 must be accepted.

Statistical tests for PASW-SPSS
  In order to answer, the test provides us with a significance: the probability, calculated assuming that H0 is true, of observing a difference at least as large as the one found in the sample
  In this example significance is 16%
  If significance is large, we accept H0
    this implies that we do not know
  If significance is small, we reject H0
    this implies that we are almost sure that H0 is false
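
The 36.2 average and the 16% significance of the age example can be checked by hand. The sketch below is one possible way to do it with the Python standard library, assuming the test used is the two-sided one-sample Student's t test; the helper names `t_pdf` and `two_sided_p` are invented for this illustration, and the p-value is obtained by simple numerical integration rather than by whatever routine SPSS uses.

```python
import math
import statistics

# The 100 ages listed in the example.
ages = [
    25, 26, 27, 28, 29, 30, 31, 30, 33, 34, 35, 36, 37, 38, 30, 30, 41, 42,
    43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 20, 54, 55, 56, 57, 20, 20, 20,
    30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,
    48, 49, 50, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
    35, 36, 37, 38, 39, 40, 35, 36, 37, 35, 36, 37, 35, 36, 37, 35, 36, 37,
    35, 36, 37, 35, 36, 37, 35, 36, 37, 35,
]


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log(1 + x * x / df))


def two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|): one minus the central area, via Simpson's rule."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = 2 * total * h / 3  # area between -|t| and +|t|
    return 1 - central


n = len(ages)
mean = statistics.mean(ages)
std_error = statistics.stdev(ages) / math.sqrt(n)
t = (mean - 35) / std_error          # H0: expected value is 35
p = two_sided_p(t, n - 1)
print(f"n={n} mean={mean:.1f} t={t:.3f} significance={p:.3f}")
```

With a significance of about 16%, well above the usual 5% threshold, the difference of +1.2 is compatible with random fluctuation and H0 is not rejected.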

Typical univariate analysis techniques
  Variables | Numerical description | Graphical description | Parametric test | Non-parametric test
  nominal | Frequencies (one-dimensional contingency table) | Bar plot, Pie chart | --- | Chi-square for a one-dimensional contingency table
  scale | Descriptive statistics | Histogram, Boxplot | Student’s t for one variable | Sign test

Univariate data analysis
  Analyze
    Descriptive statistics > Frequencies…
      do it for every variable after data input!!!
    Descriptive statistics > Descriptives… > Options
    Chi-square test for a one-dimensional contingency table
      Nonparametric tests > One-Sample
      Settings: chi-square test > Options …
      H0: classification follows a predetermined distribution
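
What the chi-square test for a one-dimensional contingency table computes can be sketched in a few lines of Python. The observed counts are hypothetical, the predetermined distribution is assumed to be uniform, and the decision uses the tabulated 5% critical value for 3 degrees of freedom instead of an exact significance.

```python
# Hypothetical counts of answers in four categories (not course data).
observed = [18, 30, 27, 25]
# H0: the classification follows this predetermined (here uniform) distribution.
expected_share = [0.25, 0.25, 0.25, 0.25]

n = sum(observed)
expected = [share * n for share in expected_share]

# Chi-square statistic: squared gaps between observed and expected counts,
# each scaled by the expected count.
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

# Tabulated 5% critical value of the chi-square distribution with 3 d.f.
CRITICAL_5PCT_DF3 = 7.815
reject_h0 = chi2 > CRITICAL_5PCT_DF3
print(chi2, df, reject_h0)
```

A statistic below the critical value corresponds to a significance above 5%, so in this made-up case the uniform distribution is not rejected.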

Univariate data analysis
  Analyze
    Student’s t test for one variable
      Compare means > One-sample t-test
      Test value
      H0: expected value = m
    Sign test
      Nonparametric tests > One-Sample
      Settings: Binomial > Options … > Custom cut point > Cut point …
      H0: median = m
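
The sign test only counts how many observations fall above and below the cut point m. A minimal sketch of the exact binomial version follows; the data are hypothetical, the function name `sign_test` is invented here, and SPSS may handle ties or large samples differently.

```python
from math import comb


def sign_test(values, m):
    """Exact two-sided sign test for H0: median = m.

    Values equal to m are dropped. Under H0 the number of values below m
    follows a Binomial(n, 0.5) distribution.
    """
    above = sum(1 for v in values if v > m)
    below = sum(1 for v in values if v < m)
    n = above + below
    k = min(above, below)
    # Probability of a split at least as unbalanced as the observed one,
    # doubled for the two-sided test and capped at 1.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return above, below, min(1.0, 2 * tail)


# Hypothetical numbers of passed exams for 12 students.
exams = [20, 25, 18, 24, 26, 23, 27, 22, 19, 28, 30, 21]
above, below, p = sign_test(exams, 22)
print(above, below, round(p, 3))
```

Because only the side of the cut point matters, replacing 30 with 300 in the data would leave the result unchanged.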

Exercises
  Describe variable “come”
  Describe variable “mates”
  Is variable university’s grade distributed uniformly?
  Is the number of passed exams significantly different from 22?
  Are the people who play football significantly different from the people who do not play football?

Typical bivariate analysis techniques
  Variables | Description | Parametric test | Non-parametric test | Forecast
  nominal vs nominal | Bar plot, Two-dimensional contingency table | --- | Chi square for a two-dimensional contingency table | ---
  binary nominal vs scale | Boxplot, Compare means | Student’s t for two populations | Mann-Whitney | Regression, Logistic regression (nominal as dependent)
  non binary nominal vs scale | Boxplot, Compare means | One-way analysis of variance (ANOVA) | Kruskal-Wallis | Regression
  scale vs scale | Scatterplot | Pearson, Student’s t for paired data | Spearman, Wilcoxon signed rank test | Regression

Contingency tables (crosstabs)
  Analyze > Descriptive statistics > Crosstabs…
    Cells… > Percentages
  Chi-square test for a two-dimensional contingency table
    Statistics… > Chi-square
    H0: classifications are independent
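
The chi-square statistic of a crosstab compares each observed cell with the count expected under independence (row total × column total / grand total). A small sketch with a hypothetical 2×2 table, again using a tabulated 5% critical value rather than an exact significance:

```python
# Hypothetical crosstab: rows = sex, columns = plays volleyball (yes, no).
table = [
    [30, 20],
    [20, 30],
]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

# Expected counts if the two classifications were independent (H0).
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi2 = sum(
    (table[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(len(table))
    for j in range(len(table[0]))
)
df = (len(table) - 1) * (len(table[0]) - 1)

# Tabulated 5% critical value of the chi-square distribution with 1 d.f.
CRITICAL_5PCT_DF1 = 3.841
reject_h0 = chi2 > CRITICAL_5PCT_DF1
print(expected, chi2, df, reject_h0)
```

Here every expected count is 25, the statistic is just above the critical value, and independence would be rejected at the 5% level.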

Comparing means
  Analyze > Compare means > Means
  Student’s t for two populations
    Independent-Samples T Test
    H0: expected value on population A = expected value on population B
  One-way analysis of variance (ANOVA)
    One-way ANOVA
    H0: expected value of the variable is the same on all populations
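
The two tests on this slide are closely related: with exactly two populations, the one-way ANOVA F statistic equals the square of the equal-variances Student's t statistic. The sketch below illustrates this on hypothetical data; the function names `pooled_t` and `anova_f` are invented for the example and only the test statistics, not the significances, are computed.

```python
import statistics

# Hypothetical numbers of passed exams in two groups.
group_a = [20, 22, 19, 24, 25]
group_b = [28, 27, 30, 26, 29]


def pooled_t(a, b):
    """Student's t statistic for two populations, assuming equal variances."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / (
        pooled_var * (1 / na + 1 / nb)) ** 0.5


def anova_f(groups):
    """One-way ANOVA F: between-group over within-group mean square."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.mean(all_values)
    between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    within = sum((v - statistics.mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (between / df_between) / (within / df_within)


t = pooled_t(group_a, group_b)
f = anova_f([group_a, group_b])
print(t, f)
```

ANOVA is therefore the natural extension of the two-population t test to a nominal variable with more than two categories.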

Paired and not paired data
  Not paired data
    Example: sex and number of exams
    You usually have a scale variable and a nominal variable which splits the sample into groups (populations)
  Paired data
    Example: MathA grade and MathB grade
    You usually have two scale variables on the whole sample

Typical paired data analysis techniques
  Pearson and Spearman correlations
    Analyze > Correlate > Bivariate
      H0: correlation is 0
    Analyze > Correlate > Partial
      Control variable for spurious correlation
  Student’s t test for paired data
    Analyze > Compare means > Paired-Samples T Test
    H0: expected value of ζ – expected value of ξ = m (unfortunately SPSS is able to test only for m=0)
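
The m=0 restriction can be worked around by noting that the paired test is a one-sample t test on the differences ζ – ξ. Testing the differences against m, or the differences minus m against 0, gives the same statistic. A sketch with hypothetical MathA and MathB grades and an assumed m = 1:

```python
import math
import statistics

# Hypothetical paired grades of six students.
math_a = [24, 27, 22, 30, 26, 28]
math_b = [22, 24, 21, 27, 25, 25]
m = 1  # assumed value for H0: expected value of (A - B) = m


def one_sample_t(values, test_value):
    """Student's t statistic for H0: expected value = test_value."""
    std_error = statistics.stdev(values) / math.sqrt(len(values))
    return (statistics.mean(values) - test_value) / std_error


diffs = [a - b for a, b in zip(math_a, math_b)]

# Option 1: test the differences against m.
t = one_sample_t(diffs, m)
# Option 2: shift the differences by m and test against 0.
t_shifted = one_sample_t([d - m for d in diffs], 0)
print(diffs, t, t_shifted)
```

In SPSS terms this would mean building the difference with Compute Variable and running the one-sample t test with m as test value; that procedure is a suggestion based on the equivalence above, not something stated on the slide.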

Exercises
  Is there a relation between
    sex and playing volleyball?
    sex and the preferred exam?
    university’s and apartment mates’ grades?
    passed exams and year of birth?
    the degree and living with other students?
    having passed the Decision Theory exam and practicing one of the indicated sports?
    passed exams and the degree course?
    passed exams and days spent here during exams?
    passed exams and days spent here during exams, controlling for year of birth?
    university’s grade (considered scale) and living with other students?

Graphs
  Graphs > Chart Builder…
  Graph’s types
    Bar and pie
    Histogram
      double-click, then double-click on bars > Binning
    Boxplot
    Scatterplot
  Graph’s modification
    Element properties (Box, X-axis, Y-axis)
    Chart Editor
    Chart templates
    Copy the graph

Exercises
  Vertical axis from 0 to 62.5 by steps of 2.5
  Green background
  Orange bars with little squares and white background
  No vertical label

Exercises
  Vertical axis from 0 to 40 by steps of 10
  Vertical degree value labels
  Grey horizontal background
  No vertical background and no vertical frame
  Bars with vertical orange and white stripes
  Rotation with a look from above

Exercises
  “Hate them” sector orange with red plusses
  Large legend

Exercises
  Vertical axis from 0 to 105 by steps of 5
  No background
  Very large boxes
  No numbers next to outliers
  Red box
  Thick blue median line

Exercises
  Vertical axis with steps of 5
  Green background
  Stacked histogram
  Thick red normal line
  Legend
  Horizontal axis value labels from 1950 to 2000 with steps of 5
  Horizontal axis value labels aligned horizontally
  No vertical axis title

Exercises
  Vertical axis with steps of 20
  Yellow background
  Small blue full points
  Horizontal axis from 1970 to 1990 with steps of 10

Exercises
  Boxplot in the horizontal direction
  Blue background
  Orange boxes
  Red, bold, large vertical axis value label
  No vertical axis title
  No None and Statistics categories
  Horizontal axis from 0 to 50 with steps of 5
  Expanded boxes

Is a variable normally distributed?
  Histogram with normal curve
    Graphs > Chart Builder > Histogram > Display normal curve
  Skewness (negative: tail left, positive: tail right) and Kurtosis (negative: flat, 0: normal, positive: too pointy)
    Analyze > Descriptive Statistics > Descriptives > Options
  Q-Q plot (data must be on the line)
    Analyze > Descriptive Statistics > Q-Q plot
  Kolmogorov-Smirnov test
    Analyze > Nonparametric tests > One Sample
    Settings: Kolmogorov-Smirnov test > Options > Normal
    H0: variable follows a known distribution
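
Skewness and kurtosis can be illustrated with their plain moment definitions. The sketch below uses those textbook formulas on hypothetical data; statistical packages often apply small-sample corrections, so values reported by SPSS may differ slightly from these.

```python
def skewness_and_kurtosis(values):
    """Moment-based skewness and excess kurtosis (0 for a normal curve)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3
    return skewness, excess_kurtosis


# Symmetric and flat: no skewness, negative kurtosis.
symmetric = [1, 2, 3, 4, 5]
# One large value stretches the right tail: positive skewness.
right_tailed = [1, 1, 2, 2, 3, 10]

skew_sym, kurt_sym = skewness_and_kurtosis(symmetric)
skew_right, kurt_right = skewness_and_kurtosis(right_tailed)
print(skew_sym, kurt_sym, skew_right, kurt_right)
```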

Exercises
  Does this variable come from a normal distribution:
    birth year?
    first year of elementary school?
    days passed here during exam’s months?
    months considered as numbers (1 to 12)?

Data
  Sort cases …
  Split file …
  Select cases …
  Weight cases …

Transform
  Works only WITHIN case (in horizontal)
  Recode into different variables …
  Compute Variable …
    Logical operators
    Functions, usage of MEAN function
  Count values within cases …
  Date and time wizard …
  Replace missing values …
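
"Within case" means that each new value is built from the other variables of the same row. The sketch below imitates two such operations in Python on hypothetical data: a mean over the valid values of a case, which is how the MEAN function is understood to behave, and a count of a given value within each case. Variable names and grades are invented for the example.

```python
# Each dictionary is one case (one row); None marks a missing grade.
cases = [
    {"exam1": 28, "exam2": 30, "exam3": None},
    {"exam1": 25, "exam2": None, "exam3": None},
    {"exam1": None, "exam2": None, "exam3": None},
]
exam_vars = ["exam1", "exam2", "exam3"]


def mean_valid(values):
    """Mean of the non-missing values of one case; None if all are missing."""
    valid = [v for v in values if v is not None]
    return sum(valid) / len(valid) if valid else None


def count_value(values, target):
    """How many variables of one case are equal to target."""
    return sum(1 for v in values if v == target)


# One result per case: the operation never looks at other rows.
means = [mean_valid([case[v] for v in exam_vars]) for case in cases]
n_top_grades = [count_value([case[v] for v in exam_vars], 30) for case in cases]
print(means, n_top_grades)
```

Averages across cases, such as the mean birth year of all Bachelor students, are not within-case operations and need the Analyze menu instead.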

Exercises
  Recode variable degree course into variable degree_type with values 1-Bachelor and 2-Master. Do it automatically for the coded answers and then manually for the non-coded “other” answers, deciding which answer is “Bachelor” and which is “Master”.
  Build a new variable equal to 1 for the male students who like Law and the female students who like Computer Science, and equal to 0 otherwise (missing when some information is missing).
  Build a new variable called MisUnderstood equal to 1-yes if the student answered that they do not live with other students but then answered the mates’ opinion question, and 0-no otherwise.
  Build a new variable which is 1 if the birth year is above the degree type’s (Bachelor or Master) average and 0 if it is below or equal. Hint: averages must not be calculated with the MEAN function, but separately with Analyze > Compare Means

Exercises
  Build a new variable counting how many exams (of the list of 6 exams, not the total) the student has passed, minus the average of the passed exams for their degree type (Bachelor or Master)
  Build a new variable equal to the number of practiced sports for Bachelor students, the number of practiced sports multiplied by 1.3 for male Master students and by 1.2 for female Master students.
  Build a new variable containing the age when elementary school started; put it equal to 9 (missing) when this number is smaller than 4 or larger than 8.
  Build a new date variable for the start of elementary school, supposing 1 for the missing day and October for the missing month.
  Build a new variable with the number of months passed from the start of elementary school till now.
  Order the cases by birth year. Build a new variable replacing the missing values of the passed exams with the average of the two closest cases. Then calculate the autocorrelation.
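
Weighted indexes
  Indexes are always scale variables
  \[ \text{index} = \frac{\sum_i \text{weight}_i \, \bigl(\text{variable}_i - \min(\text{variable}_i)\bigr)}{\max(\text{numerator})} \]
  With binary variables (0-1)
  \[ \text{index} = \frac{\sum_i \text{weight}_i \, \text{variable}_i}{\sum_i \text{weight}_i} \]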
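
A weighted index can be computed case by case. The sketch below assumes one reading of the slide's formula: each variable is shifted by its minimum, multiplied by its weight and summed, and the sum is divided by the largest value it can take, so that the index runs from 0 to 1. With 0-1 variables this reduces to the weighted sum divided by the sum of the weights. The function name, variables and weights are hypothetical.

```python
def weighted_index(values, weights, minima, maxima):
    """Weighted index scaled to the 0-1 range (assumed formula, see above)."""
    numerator = sum(w * (v - lo) for v, w, lo in zip(values, weights, minima))
    # Largest possible numerator: every variable at its maximum.
    max_numerator = sum(w * (hi - lo) for w, lo, hi in zip(weights, minima, maxima))
    return numerator / max_numerator


weights = [1, 2, 3]

# Binary variables: minimum 0 and maximum 1 for each of them.
binary_index = weighted_index([1, 0, 1], weights, [0, 0, 0], [1, 1, 1])

# Mixed scales: a 0-5 count, a 0-1 answer and a 1-10 grade.
mixed_index = weighted_index([3, 1, 8], weights, [0, 0, 1], [5, 1, 10])
print(binary_index, mixed_index)
```

Scaling by the maximum keeps indexes built from different numbers of variables comparable with each other.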

Exercises on indexes
  Build an index to measure the sportiveness of the subject.
  Recode variable favorite assigning high values to scientific subjects. Build an index to measure the interest in scientific subjects, including also the passed scientific exams.
  Build an index to measure the “participation” of the subject in academic activities and in unibz, using:
    the number of passed exams divided by the mean of passed exams for their degree type (weight=1, assume max is 5)
    living with other students (weight=2)
    the grade given to university (weight=3)
    how much they like their room mates (weight=1)
  Build an index to measure the general attitude of the subject when asked to grade something.
  Recode variables day_el/month_el/year_el into appropriate binary variables which tell when they are missing (0) and when not (1). Build an index to measure the “memory” of the subject using the presence of information on their first school day/month/year.

Exercises on tests
  Study the relation between the sportiveness of the subject and the number of passed exams.
  Study the relation between the interest in scientific subjects and the sportiveness.
  Does the sportiveness depend on the year of birth? And on the degree course?
  Is there a relation between interest in scientific subjects and participation?
  Is there a relation between the memory of the subject and the general attitude towards grading?
  If any of the previous relations exists, check whether there is a partial correlation due to birth year.

Regression
  year_el = c0 + c1 × birth
  year_el = 67.2 + 0.97 × birth
  R² = 0.899
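
The coefficients c0 and c1 of a linear regression and its R² come from the least-squares formulas. The sketch below applies them to five hypothetical (birth, year_el) pairs, so its results are not the 67.2, 0.97 and 0.899 of the slide, which were estimated on the course data.

```python
# Hypothetical birth years and years of first elementary school class.
birth = [1980, 1982, 1984, 1986, 1988]
year_el = [1986, 1988, 1990, 1992, 1995]

n = len(birth)
mean_x = sum(birth) / n
mean_y = sum(year_el) / n

# Sums of squares and cross-products around the means.
sxx = sum((x - mean_x) ** 2 for x in birth)
syy = sum((y - mean_y) ** 2 for y in year_el)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(birth, year_el))

c1 = sxy / sxx                  # slope
c0 = mean_y - c1 * mean_x       # intercept
r2 = sxy ** 2 / (sxx * syy)     # share of variance explained by the line

predicted_1985 = c0 + c1 * 1985
print(c0, c1, r2, predicted_1985)
```

An R² close to 1 means the points lie almost on the line, which is what the scatterplot with regression line shows graphically.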

Regression
  Analyze > Regression > Linear
    Plots … > Standardized residuals plot
    Save …
    How to draw a scatterplot with regression line
  Curve estimation
  Binary Logistic

Exercises
  Using the appropriate regression, decided after looking at the variables’ types and at the scatterplot, build a regression model for:
    Sportiveness based on having passed the Commercial Law A and Commercial Law B exams
    Degree type based on year of birth
    Interest in scientific subjects based on sex and age
    Attitude towards grading based on year of birth
  Calculate the approximate age when the questionnaire was submitted. Build a regression model for degree type based on age.

Nonparametric tests philosophy
  Non-parametric tests consider only order and not values
    For example: 2, 5, 6, 7 is treated the same as 2, 5, 6, 999
  We have already seen:
    Chi-square
    Spearman correlation
  Non-parametric tests check the position of distributions
  Parametric tests usually check the averages’ values
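
The "only order, not values" idea can be made concrete with ranks. In the sketch below, the Spearman correlation is computed as the Pearson correlation of the ranks, which is its usual definition; the function names and the second variable are invented for the illustration.

```python
def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = average_rank
        pos = end + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


plain = [2, 5, 6, 7]
with_outlier = [2, 5, 6, 999]
other = [1, 2, 3, 4]  # hypothetical second variable

print(ranks(plain), ranks(with_outlier))
print(pearson(plain, other), pearson(with_outlier, other))
print(spearman(plain, other), spearman(with_outlier, other))
```

The outlier 999 lowers the Pearson correlation but leaves the Spearman correlation untouched, because both series have the same ranks.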

Analyze > Nonparametric Tests
  Mann-Whitney test
    Independent Samples > Settings: Mann-Whitney U
    H0: position/order of distribution for population A = position/order of distribution for population B
  Kruskal-Wallis test
    Independent Samples > Settings: Kruskal-Wallis 1-way ANOVA
    H0: position/order of distribution of populations is the same
  Wilcoxon matched pair signed rank test
    Related Samples > Settings: Wilcoxon matched-pair signed-rank
    H0: position/order of distribution of variable 1 = position/order of distribution of variable 2
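
The Mann-Whitney U statistic behind the first test can be obtained by comparing every value of population A with every value of population B. The sketch below computes only the statistic on hypothetical data, not its significance; the function name is invented for the example.

```python
def mann_whitney_u(a, b):
    """U for sample a: pairs in which a's value exceeds b's, ties count 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)


# Hypothetical numbers of passed exams in two populations.
group_a = [20, 22, 19, 24, 25]
group_b = [28, 27, 30, 26, 29]

u_a = mann_whitney_u(group_a, group_b)
u_b = mann_whitney_u(group_b, group_a)

# Replacing 30 with 999 changes the average of B but not the order.
u_b_outlier = mann_whitney_u([28, 27, 999, 26, 29], group_a)
print(u_a, u_b, u_b_outlier)
```

Under H0 both U values are expected to be near half the number of pairs (here 12.5); a U of 0 against 25 means that every value of A lies below every value of B.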

Exercises on non-parametric tests
  Study the relation between the order of year of birth and sex.
  Study the relation between the order of sportiveness of the subject and the number of passed exams.
  Does the position of the distribution of sportiveness depend on the degree course? And on the degree type?
  Study the relation between the order of interest in scientific subjects and the sportiveness.