SPSS and its usage - Bijay Lal Pradhan, Ph.D. · 2016-09-23 · Spreadsheet-like system for defining, entering, editing, and displaying data. Extension of the saved file will be

SPSS and its usage

Dr. Bijay Lal Pradhan

http://bijaylalpradhan.com.np

2073/06/07 – 06/12

Copyright @ Dr Bijay Lal Pradhan


Ground Rule

• Mobile

• Penalty

• System

• Involvement


Object of session I

• Define Statistics and SPSS

• Install SPSS 20 and crack

• Open and exit SPSS

• Importing and exporting data

• Different format of files


What is Statistics?

• Singular form: The process of collection, organization, presentation, analysis and interpretation of numerical facts.

• Plural form: Aggregates of facts that have the following characteristics:

– Comparable

– Affected by numerous factors

– Numerically expressed

– Systematically collected

– Purposefully collected

– Reasonably accurate

Introduction: What is SPSS?

• Originally an acronym for Statistical Package for the Social Sciences; it now stands for Statistical Product and Service Solutions.

• One of the most popular statistical packages; it can perform highly complex data organization, presentation and analysis with simple instructions.


The Three Windows:

Data editor

Output viewer

Syntax editor


The Three Windows: Data Editor

• Data Editor

Spreadsheet-like system for defining, entering, editing, and displaying data. Saved files have the extension “.sav”.


The Three Windows: Output Viewer

• Output Viewer

Displays output and errors. Saved files have the extension “.spv”.


The Three Windows: Syntax editor

• Syntax Editor

Text editor for syntax composition. Saved files have the extension “.sps”.


The basics of managing

software.


Installation of SPSS 20.0

• You have software SPSS 20.0 in your computer

• There are two folders namely setup and crack

• Open setup folder and double click on

application file “setup”.

• Follow the instruction and install SPSS in your

computer.

• Don’t go for licensing process. Copy "lservrc"

from crack folder and paste it into the installed

directory (C:\Programme\

IBM\SPSS\Statistics\20)

Opening Screen

From the Start button, click on IBM SPSS Statistics 20


Obtain the data

• Open your saved file with SPSS

• data1.sav


Variable View

[Figure: Variable View screen, with callouts for the variable descriptions, drop-down menus and action buttons]


Variable View window: Type

• Type

– This column lets you specify the type of variable. Click on the ‘Type’ box to choose one; the two basic types of variables that you will use are numeric and string.


Variable View window: Width

• Width

– Width allows you to determine the number of characters SPSS will allow to be entered for the variable


Variable View window: Decimals

• Decimals

– Number of decimals

– It has to be less than or equal to 16

3.14159265


Variable View window: Label

• Label

– You can specify the details of the variable

– You can write characters with spaces up to 256

characters


Variable View window: Values

• Values

– When a variable represents a category, this column is used to record which numbers represent which categories


Defining the value labels

• Click the cell in the Values column.

• The value and the label can each be up to 60 characters long.

• After defining each value, click Add; when all values are defined, click OK.


Measure scale ??

• Nominal

• Ordinal

• Scale


Nominal

Gender Caste Marital status


Ordinal

First, Second, Third, ...

Scale


Scales of Measure

Nominal

– Basic characteristics: Numbers identify & classify objects

– Examples: Social Security nos., numbering of football players; brand nos., store types

– Permissible descriptive statistics: Percentages, mode

– Permissible inferential statistics: Chi-square, binomial test

Ordinal

– Basic characteristics: Nos. indicate the relative positions of objects but not the magnitude of differences between them

– Examples: Quality rankings, rankings of teams in a tournament; preference rankings, market position, social class

– Permissible descriptive statistics: Percentile, median, quartile deviation

– Permissible inferential statistics: Rank-order correlation, Friedman ANOVA

Scale

– Basic characteristics: Zero point is fixed, ratios of scale values can be compared

– Examples: Length, weight; age, sales, income, costs

– Permissible descriptive statistics: Arithmetic, geometric and harmonic mean; range, MD, SD

– Permissible inferential statistics: Z test, t-test, ANOVA test, all other tests


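Data Editor

[Figure: Data Editor window, with the action buttons marked]


SPSS output viewer

[Figure: Output Viewer window, with the drop-down menus, action buttons and navigation window marked]


SPSS Viewer – export results


Syntax Editor

[Figure: Syntax Editor window, with the drop-down menus, action buttons and navigation window marked]


Export

Import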


Data management

with SPSS


Practice 1

A study was conducted to learn the attitude of a bank’s customers towards the bank. The question asked of each customer was:

• “Do you feel safe in your transactions with the bank?”

• The respondents were to answer the question on a seven-point scale (1 = Strongly Disagree, 7 = Strongly Agree). Data were also collected on the other variables listed below.

Construct the following variables in the Variable View on the basis of the following information.

• Strongly disagree 1

• Moderately disagree 2

• Little disagree 3

• No difference 4

• Little agree 5

• Moderately agree 6

• Strongly agree 7

Other variables

1. Sex of the respondent: Male - M, Female - F

2. Marital status: Married - M, Single - S

3. Income of the respondent (in rupees)

4. Age of the respondent (in years)

5. Educational background of the respondent

Below higher secondary - 1

Higher secondary - 2

Graduation - 3

Post graduation - 4
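The coding scheme of Practice 1 can also be written down outside SPSS, which helps when checking that every code has a label. The following is only a sketch in Python: the variable names (safe, sex, marital, education) and the example respondent are made up for illustration and are not part of the exercise.

```python
# Hypothetical codebook for Practice 1; variable names are illustrative only.
SAFE_LABELS = {
    1: "Strongly disagree",
    2: "Moderately disagree",
    3: "Little disagree",
    4: "No difference",
    5: "Little agree",
    6: "Moderately agree",
    7: "Strongly agree",
}
SEX_LABELS = {"M": "Male", "F": "Female"}
MARITAL_LABELS = {"M": "Married", "S": "Single"}
EDUCATION_LABELS = {
    1: "Below higher secondary",
    2: "Higher secondary",
    3: "Graduation",
    4: "Post graduation",
}


def label_case(case):
    """Return a copy of one case with coded values replaced by their labels."""
    return {
        "safe": SAFE_LABELS[case["safe"]],
        "sex": SEX_LABELS[case["sex"]],
        "marital": MARITAL_LABELS[case["marital"]],
        "income": case["income"],  # scale variable, no value labels
        "age": case["age"],        # scale variable, no value labels
        "education": EDUCATION_LABELS[case["education"]],
    }


# One made-up respondent, stored as codes the way the Data Editor would hold it.
example_case = {"safe": 6, "sex": "F", "marital": "S",
                "income": 25000, "age": 31, "education": 3}
labelled = label_case(example_case)
print(labelled)
```

Sex and marital status are string codes here, while the attitude item and education are numeric codes, mirroring the numeric/string distinction of the Type column.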


Entering Data

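• Data can be copied from Word and pasted into SPSS.

• It is usually easier to paste the data into MS Excel first and then copy it from Excel into SPSS.

• Alternatively, save the data in Excel and import the file into SPSS.

• Or save the data in CSV format and import that file into SPSS.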
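To see what a CSV file carries into SPSS, it can help to read one by hand. The sketch below uses Python's standard csv module on made-up text (the column names and values are assumptions, not data from the exercise); it is not how SPSS itself imports the file.

```python
import csv
import io

# Made-up CSV text standing in for a file saved from Excel.
CSV_TEXT = """sex,marital,income,age
M,S,18000,24
F,M,32000,41
"""


def read_cases(text):
    """Parse CSV text into a list of cases, converting the numeric columns."""
    cases = []
    for row in csv.DictReader(io.StringIO(text)):
        row["income"] = float(row["income"])  # scale variable
        row["age"] = int(row["age"])          # scale variable
        cases.append(row)
    return cases


cases = read_cases(CSV_TEXT)
print(len(cases), "cases; first case:", cases[0])
```

The first row of the CSV becomes the variable names and every following row becomes one case, which is the same layout the Data Editor uses.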


Variable/Case in and out

• Entering new variable

• Deleting the existing variable

• Entering new case

• Deleting the existing cases


Saving the data

• To save the data file you created, click ‘File’ and then ‘Save As’. You can save the file in a different format by choosing one under “Save as type”.


Sorting the data

• Click ‘Data’ and then click Sort Cases


Sorting the data (cont’d)

• Double-click ‘Name of the students’, then click OK.


Transforming data

• Click ‘Transform’ and then click ‘Compute Variable…’

Transforming data (cont’d)

• Example: Adding a new variable named ‘corrected_CI’, which is the corrected confidence interval

– Type corrected_CI in the ‘Target Variable’ box. Then type 8-CI in the ‘Numeric Expression’ box. Click OK.

Transforming data (cont’d)

• In the same way, find the log of income.

• Type ln_income in the ‘Target Variable’ box. Then type LN(income) in the ‘Numeric Expression’ box. Click OK.

• In a similar manner, create a new variable named “sqrtage”, which is the square root of age.
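The three computed variables above amount to simple arithmetic on each case. As a rough cross-check outside SPSS, a Python sketch might look like this; the input names CI, income and age are assumed to match the variable names in the data file, and the example case is made up.

```python
import math


def compute_variables(case):
    """Add the three computed variables to a copy of one case."""
    out = dict(case)
    out["corrected_CI"] = 8 - case["CI"]         # same expression as 8-CI
    out["ln_income"] = math.log(case["income"])  # natural log; needs income > 0
    out["sqrtage"] = math.sqrt(case["age"])      # square root; needs age >= 0
    return out


# Made-up case for illustration.
example = compute_variables({"CI": 3, "income": 20000, "age": 36})
print(example)
```

Note that the natural log is undefined for zero or negative incomes, so such cases would need attention before the transformation.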


Visual Binning

• Visual Binning is the process of grouping the values of a scale variable into suitable classes, so that the data can be tabulated and conclusions can be drawn from scale-type data.
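The idea behind binning can be sketched in a few lines of Python. The cut points and class labels below are assumptions chosen for illustration; in SPSS they would be set in the Visual Binning dialog.

```python
# Assumed upper limits of each class; the last class is open-ended.
CUT_POINTS = [(20, "20 or below"), (30, "21-30"), (40, "31-40"), (50, "41-50")]


def bin_age(age):
    """Return the label of the first class whose upper limit covers the age."""
    for upper, label in CUT_POINTS:
        if age <= upper:
            return label
    return "51 and above"


ages = [18, 25, 30, 31, 47, 62]  # made-up ages
binned = [bin_age(a) for a in ages]
print(list(zip(ages, binned)))
```

Once the scale variable is binned, the new variable is ordinal and can be tabulated like any other categorical variable.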


The basic analysis

using SPSS


Scopes & Limitations of Data Analysis

If the data were collected from a random sample drawn from

a well-defined population in such a way that every unit in the

population has a known non-zero probability of being

included in the sample, then the information derived from

such sample can be generalized to the population (inferential

statistics). If the data were collected from a non-random

sample, then the information derived from sample cannot be

generalized (descriptive statistics).

If data and variables are not properly organized in a computer, then computer software fails to provide meaningful results.

[Diagram: Collection, Organization, Analysis, Reporting, Presentation]


Condensation of Data

Summarizing data in tables and graphs (stem-and-leaf display, line graph, bar graph, pie chart and histogram) and with measures of central tendency and measures of dispersion.

1. small tables (frequency tables)

2. graphs or diagrams (histogram, bar

graph, pie chart etc.)

3. summary statistics (percentage, mean,

standard deviation etc.)


The basic analysis of SPSS that will be

introduced in this class

• Frequencies – This analysis produces frequency tables showing frequency counts and percentages of the values of individual variables.

• Graphical Presentation

– Pie chart, Bar chart, Histogram, Area chart, Line chart, Scatter plot

• Descriptive Statistics – This analysis shows the maximum, minimum, mean, and standard deviation of the variables
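What the Frequencies and Descriptive Statistics procedures report can be imitated with Python's standard library. This is only a sketch on made-up values; the function names are hypothetical, and the standard deviation used is the sample (n - 1) version.

```python
from collections import Counter
import statistics


def frequency_table(values):
    """Return (value, count, percent) rows sorted by value."""
    counts = Counter(values)
    total = len(values)
    return [(v, c, 100.0 * c / total) for v, c in sorted(counts.items())]


def describe(values):
    """Return the minimum, maximum, mean and sample standard deviation."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # divides by n - 1
    }


education = [1, 2, 2, 3, 3, 3, 4, 4]  # made-up education codes
ages = [22, 25, 31, 40, 52]           # made-up ages in years
table = frequency_table(education)
summary = describe(ages)
for value, count, percent in table:
    print(value, count, round(percent, 1))
print(summary)
```

The frequency table suits the categorical variable, while the summary statistics suit the scale variable.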

Descriptive & Inferential Statistics

Statistics

– Descriptive: Tabular, Graphical, Numerical

– Inferential: Estimation (Point, Interval) and Hypothesis Testing (Parametric, Non-Parametric)

The methods of inferential statistics are applicable when results are obtained from a random sample.

Uncertainty always remains while generalizing results from a sample to a population. The degree of uncertainty is measured in terms of probability in inferential statistics.

Univariate Data Analysis

Analysis of the data of a single variable at a time is univariate analysis. The suitable univariate data analysis methods, by scale of the variable, are listed below.

What type of data?

Nominal or Ordinal

1. Prepare frequency table

2. Compute mode

3. Compute median (ordinal)

4. Draw graphs

• Bar diagram

• Pie-chart

5. Chi-square test

Scale data

1. Prepare frequency table (discrete)

2. Compute mean, median and mode

3. Compute positional statistics

4. Compute SD, range etc.

5. Draw graphs

• Stem-and-leaf plot (discrete)

• Box-Whisker plot

• Histogram (continuous)

• Bar diagram (discrete)

6. Z, t, F & χ² tests

7. Transform into categorical


Bivariate Data Analysis

Analysis of the data of two variables at a time. The kinds of data analysis are listed below.

Nominal / Ordinal

1. Prepare two-way frequency tables

2. Compute row or column percentages

3. Draw charts and diagrams

4. Test hypotheses (chi-square test of independence)

Scale

1. Prepare two-way frequency tables

2. Draw scatter diagram

3. Test hypotheses (chi-square, z, t, F tests)

4. Carry out correlation & simple regression analysis


Frequency Distribution

Frequency distribution of a nominal/ordinal data


Frequency Distribution


Frequency table of scale data

Frequency distribution

Stem Leaf Display

Income of the Respondent Stem-and-Leaf Plot

Frequency Stem & Leaf

.00 0 .

28.00 0 . 5555566667777777889999999999

14.00 1 . 01122222333444

12.00 1 . 556667788899

12.00 2 . 011122233444

4.00 2 . 5577

Stem width: 10000

Each leaf: 1 case(s)

Diagrammatic Presentation

• Bar diagram

• Line graphs

• Pie diagram

• Scatter diagram

• Histogram


Descriptive measures

• Measure of Central Tendency

– Mean – Arithmetic, Geometric, Harmonic

– Median

– Mode

• Measure of dispersion

– range

– QD

– SD


Mean value for ungrouped data

Next method for mean


Mean Value for Grouped Data

Height     Mid value   Frequency
146-150    148         3
151-155    153         10
156-160    158         21
161-165    163         29
166-170    168         25
171-175    173         10
176-180    178         2

Go to: Data > Weight Cases

Select ‘Weight cases by’, choose the frequency variable and click OK

Then find the mean using the mid value as the variable
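Weighting cases by frequency is equivalent to computing a frequency-weighted mean of the mid values. A small Python sketch of that arithmetic, using the mid values and frequencies from the table above (the function name is hypothetical):

```python
# Mid values and frequencies taken from the height table.
MID_VALUES = [148, 153, 158, 163, 168, 173, 178]
FREQUENCIES = [3, 10, 21, 29, 25, 10, 2]


def grouped_mean(mid_values, frequencies):
    """Frequency-weighted mean: sum(f * x) / sum(f)."""
    total = sum(frequencies)
    weighted_sum = sum(m * f for m, f in zip(mid_values, frequencies))
    return weighted_sum / total


mean_height = grouped_mean(MID_VALUES, FREQUENCIES)
print(round(mean_height, 2))
```

With 100 observations in total and a weighted sum of 16305, the mean comes to 163.05, which is the figure the weighted SPSS run should agree with.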

Changing Report Row-Column


Skewness - Kurtosis

• Use Compare Means to find the skewness and kurtosis of the data




Estimation

• Point Estimation

• Interval estimation

– Confidence Interval

• (Analyse>descriptive statistics>explore>estimation)


Fundamental of Hypothesis Testing

• There are two types of statistical inference: Estimation and Hypothesis Testing

• Hypothesis Testing: A hypothesis is a claim (assumption) about one or more population parameters.

– The average price of a lunch in Hetauda is μ = Rs 200

– The population mean monthly cell phone bill of this city is μ = Rs 125

– The average number of TV sets in homes is equal to three: μ = 3


• It is always about a population parameter, not about a sample statistic

• Sample evidence is used to assess the probability that the claim about the population parameter is true

A. It starts with the Null Hypothesis, H0

• We begin with the assumption that H0 is true and that any difference between the sample statistic and the true population parameter is due to chance and not a real (systematic) difference.

• Always contains the “=”, “≤” or “≥” sign

• May or may not be rejected

H0: μ = 3 and X̄ = 2.79


B. Next we state the Alternative Hypothesis, H1

• Is the opposite of the null hypothesis

– e.g., The average number of TV sets in homes is not equal to 3 (H1: μ ≠ 3)

• Never contains the “=”, “≤” or “≥” sign

• May or may not be proven

• Is generally the hypothesis that the researcher is trying to prove. Evidence is always examined with respect to H1, never with respect to H0.

• We never “accept” H0; we either “reject” or “do not reject” it


A. Rejection Region Method:

• Divide the distribution into rejection and non-rejection regions

• The critical value(s) define the unlikely values of the sample statistic if the null hypothesis is true

– They define the rejection region of the sampling distribution

• The rejection region(s) is designated by α (the level of significance)

– Typical values are .01, .05, or .10

• α is selected by the researcher at the beginning

• α provides the critical value(s) of the test


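Rejection Region or Critical Value Approach:

Level of significance = α

[Figure: three sampling distributions with the rejection region shaded and the critical value marked]

• Lower-tail test: H0: μ ≥ 12, H1: μ < 12; rejection region of area α in the lower tail

• Upper-tail test: H0: μ ≤ 12, H1: μ > 12; rejection region of area α in the upper tail

• Two-tail test: H0: μ = 12, H1: μ ≠ 12; rejection regions of area α/2 in each tail

The remainder of each distribution is the non-rejection region.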


• P-Value Approach

• P-value = max. probability of a Type I Error, calculated from the sample.

• Given the sample information, what is the size of the blue area?

[Figure: lower-tail test (H0: μ ≥ 12, H1: μ < 12), upper-tail test (H0: μ ≤ 12, H1: μ > 12) and two-tail test (H0: μ = 12, H1: μ ≠ 12), each with the p-value area shaded]

Type I and II Errors:

• The size of α, the rejection region, affects the risk of making different types of incorrect decisions.

Type I Error

– Rejecting a true null hypothesis, which should NOT have been rejected

– Considered a serious type of error

– The probability of a Type I Error is α

– It is also called the level of significance of the test

Type II Error

– Failing to reject a false null hypothesis, which should have been rejected

– The probability of a Type II Error is β


Decision     Truth: H0 true       Truth: H0 false
Retain H0    Correct retention    Type II error
Reject H0    Type I error         Correct rejection

α ≡ probability of a Type I error

β ≡ probability of a Type II error

Two types of decision errors:

Type I error = erroneous rejection of true H0

Type II error = erroneous retention of false H0

• P-Value approach to Hypothesis Testing:

• The P-value is the smallest value of α for which H0 can be rejected based on the sample information

• Convert the sample statistic (e.g., sample mean) to a test statistic (e.g., Z statistic)

• Obtain the p-value from a table or computer

• Compare the p-value with α

– If p-value < α, reject H0

– If p-value ≥ α, do not reject H0
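The p-value steps can be traced with a short Python sketch of a one-sample Z test. The hypothesized mean of 3 and the sample mean of 2.79 echo the TV-sets example; the population standard deviation (0.8), the sample size (100) and the significance level (0.05) are assumptions added purely for illustration.

```python
import math
from statistics import NormalDist


def z_test(sample_mean, mu0, sigma, n, tail="two"):
    """Return (z, p_value) for a one-sample Z test with known sigma."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    cdf = NormalDist().cdf  # standard normal cumulative distribution
    if tail == "lower":
        p_value = cdf(z)
    elif tail == "upper":
        p_value = 1.0 - cdf(z)
    else:  # two-tail: area in both tails beyond |z|
        p_value = 2.0 * (1.0 - cdf(abs(z)))
    return z, p_value


ALPHA = 0.05  # assumed level of significance
z, p = z_test(sample_mean=2.79, mu0=3.0, sigma=0.8, n=100, tail="two")
decision = "reject H0" if p < ALPHA else "do not reject H0"
print(round(z, 3), round(p, 4), decision)
```

Under these assumed inputs the p-value falls below 0.05, so H0 would be rejected; with a smaller sample or a larger standard deviation the same sample mean might not lead to rejection.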


P-value (Observed Significance Level)

• P-value: a measure of the strength of evidence the sample data provide against the null hypothesis:

P(Evidence this strong or stronger against H0 | H0 is true)

$P\text{-val}: \; P = P(Z \ge z_{\text{obs}})$ (for an upper-tail Z test)

Test of Hypothesis for the Mean

σ known: the test statistic is

$Z = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}}$

σ unknown: the test statistic is

$t_{n-1} = \dfrac{\bar{X} - \mu}{S / \sqrt{n}}$


Steps to Hypothesis Testing

1. State H0 and H1 clearly

2. Identify the test statistic (two-tail, one-tail, and type of test to be used)

3. Depending on the type of risk you are willing to take, specify the level of significance, α

4. Find the decision rule, critical values, and rejection regions. If –CV < actual value (sample statistic) < +CV, then do not reject H0

5. Collect the data and calculate the actual value of the test statistic from the sample


Steps to Hypothesis testing, continued

Make statistical decision

• Do not reject H0: conclude H0 may be true

• Reject H0: conclude H1 is “true” (there is sufficient evidence of H1)

Make management/business/administrative decision


When do we use a two-tail test? When do we use a one-tail test?

• The answer depends on the question you are trying to answer.

• A two-tail test is used when the researcher has no idea which direction the study will go and is interested in both directions.

• (Example: testing a new technique, a new product or a new theory when we don’t know the direction)

• A new machine is producing 12 fluid ounce cans of soft drink. The quality control manager is concerned with cans containing too much or too little, so the test is a two-tailed test. That is, the two rejection regions in the tails are most likely (higher probability) to provide evidence of H1.

H0: μ = 12 oz

H1: μ ≠ 12 oz

• A one-tail test is used when the researcher is interested in the direction.

• Example: The soft-drink company puts a label on cans claiming they contain 12 oz. A consumer advocate wishes to test this statement. She would assume that each can contains at least 12 oz and try to find evidence to the contrary. That is, she examines the evidence for less than 12 oz.

• Which tail of the distribution is the most logical (higher probability) place to find that evidence? The only way to reject the claim is to get evidence of less than 12 oz, in the left tail.

H0: μ ≥ 12 oz

H1: μ < 12 oz

Type of Hypothesis

• What to test

• Significance of means

– Single mean test

– Double mean test: dependent pairs or independent pairs

– More than two mean test


Correlation and Regression


How do we measure association

between two variables?

1. For ordinal and nominal variable

• Odds Ratio (OR)

• Chi square test of independence of

attributes

2. For scale variables

• Correlation Coefficient R

• Coefficient of Determination (R-Square)


Example

• A researcher believes that there is a linear relationship between BMI (Kg/m2) of pregnant mothers and the birth-weight (BW in Kg) of their newborn

• The following data set provide information on 15 pregnant mothers who were contacted for this study


BMI (Kg/m2)   Birth-weight (Kg)
20            2.7
30            2.9
50            3.4
45            3.0
10            2.2
30            3.1
40            3.3
25            2.3
50            3.5
20            2.5
10            1.5
55            3.8
60            3.7
50            3.1
35            2.8


Scatter Diagram

• Scatter diagram is a graphical method to display the relationship between two variables

• Scatter diagram plots pairs of bivariate observations (x, y) on the X-Y plane

• Y is called the dependent variable

• X is called an independent variable


[Figure: Scatter diagram of BMI and Birthweight]


Is there a linear relationship

between BMI and BW?

• Scatter diagrams are important for

initial exploration of the relationship

between two quantitative variables

• In the above example, we may wish to

summarize this relationship by a

straight line drawn through the scatter

of points

Simple Linear Regression

• Although we could fit a line "by eye" e.g. using a transparent ruler, this would be a subjective approach and therefore unsatisfactory.

• An objective, and therefore better, way of determining the position of a straight line is to use the method of least squares.

• Using this method, we choose a line such that the sum of squares of vertical distances of all points from the line is minimized.


Least-squares or regression line

• These vertical distances, i.e., the distance

between y values and their corresponding

estimated values on the line are called

residuals

• The line which fits the best is called the

regression line or, sometimes, the least-squares line

• The line always passes through the point

defined by the mean of Y and the mean of

X
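The least-squares idea can be sketched directly in Python. The pairs below are the BMI and birth-weight values as read from the flattened example table, so they should be treated as illustrative; the fitted numbers this sketch prints are not claimed to match the coefficients quoted elsewhere in the slides.

```python
# (BMI, birth-weight) pairs as read from the example table; illustrative only.
DATA = [(20, 2.7), (30, 2.9), (50, 3.4), (45, 3.0), (10, 2.2),
        (30, 3.1), (40, 3.3), (25, 2.3), (50, 3.5), (20, 2.5),
        (10, 1.5), (55, 3.8), (60, 3.7), (50, 3.1), (35, 2.8)]


def least_squares(pairs):
    """Fit y = intercept + slope * x by minimizing the sum of squared residuals."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # forces the line through the means
    return intercept, slope, mean_x, mean_y


intercept, slope, mean_x, mean_y = least_squares(DATA)
# Residuals are the vertical distances between observed and fitted y values.
residuals = [y - (intercept + slope * x) for x, y in DATA]
print(round(intercept, 4), round(slope, 4), round(sum(residuals), 10))
```

Two properties are visible here: the fitted line passes through the point (mean of X, mean of Y), and the residuals sum to zero apart from rounding error.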

Linear Regression Model

• The method of least-squares is available in most of the statistical packages (and also on some calculators) and is usually referred to as linear regression

• Y is also known as an outcome variable

• X is also called as a predictor

Estimated Regression Line

$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x = 1.775351 + 0.0330187\,x$

1.775351 is called the y-intercept

0.0330187 is called the slope


Application of Regression Line

This equation allows you to estimate the BW of other newborns when the BMI is given.

e.g., for a mother who has BMI = 40, i.e. X = 40, we predict BW to be

$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x = 1.775351 + 0.0330187\,(40) = 3.096$


Correlation Coefficient, R

• R is a measure of strength of the linear association between two variables, x and y.

• Most statistical packages and some hand calculators can calculate R

• For the data in our Example R=0.94

• R has some unique characteristics

Correlation Coefficient, R

• R takes values between -1 and +1

• R = 0 represents no linear relationship between the two variables

• R > 0 implies a direct linear relationship

• R < 0 implies an inverse linear relationship

• The closer R comes to either +1 or -1, the stronger is the linear relationship

Coefficient of Determination

• R² is another important measure of linear association between x and y (0 ≤ R² ≤ 1)

• R² measures the proportion of the total variation in y which is explained by x

• For example, R² = 0.8751 indicates that 87.51% of the variation in BW is explained by the independent variable x (BMI).
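The behaviour of R and R² at the extremes is easy to check numerically. The sketch below computes the Pearson correlation coefficient by hand in Python on small made-up data sets; none of these numbers come from the BMI example.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5]
r_direct = pearson_r(xs, [2, 4, 6, 8, 10])   # perfect direct relationship
r_inverse = pearson_r(xs, [10, 8, 6, 4, 2])  # perfect inverse relationship
r_noisy = pearson_r(xs, [2, 5, 5, 9, 10])    # strong but imperfect
print(r_direct, r_inverse, round(r_noisy, 4), round(r_noisy ** 2, 4))
```

Squaring R gives the coefficient of determination, so the imperfect data set yields an R² strictly between 0 and 1.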


Difference between Correlation and

Regression

• Correlation Coefficient, R, measures

the strength of bivariate association

• The regression line is a prediction equation that estimates the values of

y for any given x


Limitations of the correlation

coefficient

• Though R measures how closely the two variables approximate a straight line, it does not validly measure the strength of a nonlinear relationship

• When the sample size, n, is small, we also have to be careful about the reliability of the correlation

• Outliers could have a marked effect on R

• Causal Linear Relationship

Regression Analysis

• Click ‘Analyze,’ ‘Regression,’ then click

‘Linear’ from the main menu.


Regression Analysis

• For example, let’s analyze the model $\text{salbegin} = \beta_0 + \beta_1\,\text{edu}$

• Put ‘Beginning Salary’ as Dependent and ‘Educational Level’ as Independent.


Regression Analysis

• Clicking OK gives the result


Plotting the regression line

• Click ‘Graphs,’ ‘Legacy Dialogs,’

‘Interactive,’ and ‘Scatterplot’ from the main menu.


Plotting the regression line

• Drag ‘Current Salary’ into the vertical axis box and ‘Beginning Salary’ into the horizontal axis box.

• Click the ‘Fit’ bar. Make sure the Method is set to Regression in the Fit box. Then click ‘OK’.



Is the model significant?

• r² is the proportion of the variance in y that is explained by our regression model

• SE is another measure; check significance through the

• F-statistic:

$F(df_{\hat{y}}, df_{er}) = \dfrac{s_{\hat{y}}^2}{s_{er}^2} = \dots = \dfrac{r^2 (n - 2)}{1 - r^2}$ (after some complicated rearranging)

• We should also know the significance of the regression coefficient:

$t = \dfrac{b_{yx}}{\text{S.E.}}$

If all of these are satisfied, then we can say the model fits.
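For a regression with one predictor, the F-statistic can be obtained from r² and the sample size alone. The Python sketch below plugs in r² = 0.8751 and n = 15 from the BMI example; the function name is hypothetical, and the relation t² = F used at the end holds only for simple (one-predictor) regression.

```python
import math


def f_from_r2(r2, n):
    """F-statistic of a simple linear regression: r^2 (n - 2) / (1 - r^2)."""
    return r2 * (n - 2) / (1.0 - r2)


R2 = 0.8751  # coefficient of determination quoted for the BMI example
N = 15       # number of mothers in the example

f_stat = f_from_r2(R2, N)
t_stat = math.sqrt(f_stat)  # magnitude of the t-statistic for the slope
print(round(f_stat, 2), round(t_stat, 2))
```

The F value would be compared against the F distribution with 1 and n - 2 degrees of freedom, which SPSS reports as the significance of the model in its ANOVA table.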


For further questions: http://bijaylalpradhan.com.np

