Organizing Your Data for Statistical Analysis in SPSS

Edward A. Greenberg, PhD

ASU HEALTH SOLUTIONS DATA LAB

REVISED JANUARY 4, 2013

SPSS Data Sets

• Rows are cases or observations

• Columns are variables (measurements)

• Up to 2^31 - 1 columns (2,147,483,647)

• No limit on the number of cases

Variable Types

• Numeric (40 character maximum length)

• Dates and times (various formats)

• Other variations of numeric (currency, comma, scientific notation, etc.)

• String (32,767 maximum length)

Variable Names

• Variable names must be unique.

• Variable names may be up to 64 characters in length.

• Names can contain letters, numbers, or special characters.

• Names must start with a letter or @, #, or $.
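
The naming rules above can be checked mechanically before a data set is built. The following is a minimal Python sketch, not SPSS syntax. The function name `name_problems` is hypothetical, only ASCII letters are treated as letters, and names that differ only in case are assumed to count as duplicates. It checks only the rules listed on this slide.

```python
import string

MAX_NAME_LENGTH = 64
# Characters allowed in the first position, per the slide: a letter or @, #, $.
ALLOWED_FIRST = set(string.ascii_letters) | {"@", "#", "$"}


def name_problems(names):
    """Return (name, problem) pairs for names that break the rules above."""
    problems = []
    seen = set()
    for name in names:
        if not name or name[0] not in ALLOWED_FIRST:
            problems.append((name, "must start with a letter or @, #, $"))
        if len(name) > MAX_NAME_LENGTH:
            problems.append((name, "longer than 64 characters"))
        # Assumption: uniqueness is judged without regard to case.
        key = name.lower()
        if key in seen:
            problems.append((name, "duplicate name"))
        seen.add(key)
    return problems


problems = name_problems(["CASENO", "AGE", "1stchoice", "age"])
```

Here `problems` flags `1stchoice` for its first character and `age` as a duplicate of `AGE`.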

Unit of Analysis

What constitutes a “case”?

• A person

• A household

• An organization

• An experimental trial

Level of Measurement

• Nominal

• Ordinal

• Interval

• Ratio

SPSS treats interval and ratio variables together as “Scale.”

Labeling Data

• Variable names may be short and cryptic.

• Variable labels can be up to 255 characters.

• SPSS procedures display at least 40 characters of variable labels.

• Value labels can be up to 120 characters.

Order of Variables

• The order of variables in the SPSS data file normally should be the same as the order of items in the questionnaire.

• Use variable names that help you identify the scale or instrument to which they apply.

Case Numbers

• Each case in an SPSS file should include a case number.

• Often this will be the first variable in the file.

• The case number does not identify the subject, but it links the data record to the subject’s questionnaire.

• Useful for correcting data entry errors

Create a Codebook

• When preparing to enter your data into SPSS, prepare a codebook for the data set.

• The codebook documents all of the items to be entered in the data set:

– Variable names and labels

– Variable types and formats

– Coded values for categorical items

– Missing values

Sample Codebook

VARIABLE NAME   TYPE & LENGTH   DESCRIPTION / VARIABLE LABEL / CODED VALUE / VALUE LABEL

CASENO          NUM 3           Case number
                                Case number

SEX             STR 1           6. I am:
                                M  Male
                                F  Female

AGE             NUM 2           7. My age is:
                                (Code actual age in years)

EDUC            NUM 1           8. What is the highest level of education that you have completed?
                                Education level
                                1  No formal education
                                2  Some grade school
                                3  Completed grade school
                                4  Some high school
                                5  Completed high school
                                6  Some college
                                7  Completed college
                                8  Some graduate work
                                9  A graduate degree
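
A codebook can also be kept in machine-readable form and used to screen records as they are entered. The sketch below is one possible Python representation of the sample codebook, not an SPSS feature. The dictionary layout and the `check_record` function are hypothetical, and the length check simply counts the characters of the entered value.

```python
# Entries copied from the sample codebook; the dictionary layout is an assumption.
CODEBOOK = {
    "CASENO": {"type": "NUM", "length": 3, "label": "Case number", "values": None},
    "SEX": {"type": "STR", "length": 1, "label": "6. I am:",
            "values": {"M": "Male", "F": "Female"}},
    "AGE": {"type": "NUM", "length": 2, "label": "7. My age is:", "values": None},
    "EDUC": {"type": "NUM", "length": 1, "label": "Education level",
             "values": {
                 1: "No formal education", 2: "Some grade school",
                 3: "Completed grade school", 4: "Some high school",
                 5: "Completed high school", 6: "Some college",
                 7: "Completed college", 8: "Some graduate work",
                 9: "A graduate degree",
             }},
}


def check_record(record):
    """Return a list of messages for values that do not match the codebook."""
    errors = []
    for name, spec in CODEBOOK.items():
        value = record.get(name)
        if value is None:
            continue  # blank entries are left to the missing-value rules
        if spec["type"] == "NUM" and not isinstance(value, int):
            errors.append(f"{name}: expected a number")
            continue
        if len(str(value)) > spec["length"]:
            errors.append(f"{name}: {value!r} is wider than {spec['length']} characters")
        if spec["values"] is not None and value not in spec["values"]:
            errors.append(f"{name}: {value!r} is not a coded value")
    return errors


good = check_record({"CASENO": 12, "SEX": "M", "AGE": 34, "EDUC": 5})
bad = check_record({"CASENO": 1234, "SEX": "X", "AGE": 34, "EDUC": 0})
```

The first record passes with no messages; the second is flagged for an overlong case number and two values that are not in the coded lists.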

Missing Data

Data may be missing for several reasons:

• Don’t know

• Refused to answer

• Not applicable

• Skipped a question

• Instrument problem

• Data entry omission

Missing Values

SPSS provides several ways of designating numeric data as “missing values.”

• A blank cell is treated as “system missing,” represented by a dot (“.”) in the SPSS Data Editor.

• Specific values can be declared as “user missing” values.

Missing Values

• Up to three “user missing” values can be declared for a variable.

• Or, a range of values plus one additional value can be declared to be missing.
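
The two ways of declaring user-missing values can be illustrated outside SPSS. This is a minimal Python sketch of the rule as stated above, not SPSS's own implementation. The function `make_missing_test` and the codes 97, 98, 99 are hypothetical, and `None` stands in for a system-missing (blank) cell.

```python
def make_missing_test(discrete=(), value_range=None):
    """Build a test for missing values.

    Either up to three discrete values, or one (low, high) range plus at
    most one additional discrete value, may be declared.
    """
    if value_range is None and len(discrete) > 3:
        raise ValueError("at most three discrete missing values")
    if value_range is not None and len(discrete) > 1:
        raise ValueError("a range allows only one additional value")

    def is_missing(value):
        if value is None:          # stand-in for a system-missing blank cell
            return True
        if value in discrete:      # declared user-missing value
            return True
        if value_range is not None:
            low, high = value_range
            return low <= value <= high
        return False

    return is_missing


# Hypothetical codes: 97, 98 and 99 declared as user missing.
three_values = make_missing_test(discrete=(97, 98, 99))
# Hypothetical codes: 90 through 99 plus the single value 0.
range_plus_one = make_missing_test(discrete=(0,), value_range=(90, 99))
```

With these declarations, `three_values(98)` and `range_plus_one(0)` are treated as missing, while an ordinary value such as 25 is not.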

Missing Values

In this example, variable AGEWED has three labeled values that are to be treated as missing.

Missing Values

The three values are declared to be missing in the Missing Values dialog.

Missing Values

• Expressions handle missing values in different ways.

• The result of (var1+var2+var3)/3 is missing if any of the three variables is missing.

• The result of MEAN(var1, var2, var3) is missing only if all three of the variables are missing; otherwise it is the mean of the values that are present.
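
The difference between the two expressions is easiest to see with one missing value. The following Python sketch mimics the behavior described above; it is an illustration, not SPSS code. `None` stands in for a missing value, and both function names are hypothetical.

```python
def average_by_arithmetic(*values):
    """Mimic (var1 + var2 + var3) / 3: missing if any input is missing."""
    if any(v is None for v in values):
        return None
    return sum(values) / len(values)


def average_by_mean_function(*values):
    """Mimic MEAN(var1, var2, var3): missing only if all inputs are missing."""
    valid = [v for v in values if v is not None]
    if not valid:
        return None
    return sum(valid) / len(valid)


arithmetic_result = average_by_arithmetic(2, None, 4)   # missing
mean_result = average_by_mean_function(2, None, 4)      # mean of 2 and 4
```

For a case with values 2, missing, and 4, the arithmetic expression yields a missing result, while the MEAN-style function averages the two values that are present.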

Missing Values in Procedures

The FREQUENCIES procedure excludes cases with missing values from computations.

Multiple Responses

• Multiple-response items are questions that can have more than one value for each case.

• Two ways of coding:

– For each response, a variable can have one of two values, e.g., 1=Yes and 2=No (“multiple-dichotomy” method)

– Create a series of variables for 1st choice, 2nd choice, etc. (“multiple categories” method)

MULT RESPONSE Procedure

• In the MULT RESPONSE procedure, multiple response variables are combined into groups.

• The MULT RESPONSE procedure counts responses in multiple response groups and displays them in frequency tables or crosstabulations.

• Percentages based on cases generally total more than 100%, because a case can give more than one response.
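
A small worked example shows why case-based percentages can pass 100%. The Python sketch below tabulates a multiple-dichotomy group by hand; it is not the MULT RESPONSE procedure itself. The variable names and data are hypothetical, and percentages of cases are assumed to be based on cases with at least one "Yes."

```python
# Hypothetical data: one row per case, one 1=Yes / 2=No variable per option.
cases = [
    {"news_tv": 1, "news_web": 1, "news_paper": 2},
    {"news_tv": 2, "news_web": 1, "news_paper": 2},
    {"news_tv": 1, "news_web": 1, "news_paper": 1},
    {"news_tv": 2, "news_web": 2, "news_paper": 1},
]
YES = 1
group = ["news_tv", "news_web", "news_paper"]


def tabulate(cases, group, counted_value=YES):
    """Count 'Yes' responses per variable and express them two ways."""
    counts = {var: sum(1 for case in cases if case[var] == counted_value)
              for var in group}
    # Assumption: only cases with at least one counted response are valid.
    responding = sum(1 for case in cases
                     if any(case[var] == counted_value for var in group))
    total_responses = sum(counts.values())
    pct_of_responses = {var: 100 * n / total_responses for var, n in counts.items()}
    pct_of_cases = {var: 100 * n / responding for var, n in counts.items()}
    return counts, pct_of_responses, pct_of_cases


counts, pct_of_responses, pct_of_cases = tabulate(cases, group)
```

In this data, four cases give seven responses in total, so the percentages of responses sum to 100 while the percentages of cases sum to 175.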

Repeated Measures

• Data that are recorded on more than one occasion for each subject

• Some procedures, such as GLM, require that all measurements for a case be on the same data record.

• Other procedures, such as the MIXED procedure, may expect one data record per occasion.

Repeated Measures

One data record per subject, one variable per occasion on which it is measured

Repeated Measures

One data record per occasion per subject

Repeated Measures

The good news is that SPSS allows you to easily restructure a data set:

• Restructure selected variables into cases

• Restructure selected cases into variables

• Transpose all data
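
The first two kinds of restructuring correspond to moving between the two repeated-measures layouts described earlier. The Python sketch below illustrates the idea on a tiny hypothetical data set; it does not reproduce the SPSS restructuring tools. The function names, variable names, and scores are all assumptions made for illustration.

```python
# Hypothetical wide layout: one record per subject, one variable per occasion.
wide = [
    {"id": 1, "score1": 10, "score2": 12, "score3": 15},
    {"id": 2, "score1": 8, "score2": 9, "score3": 11},
]
occasion_vars = ["score1", "score2", "score3"]


def variables_to_cases(rows, id_var, occasion_vars, value_name="score"):
    """Wide to long: produce one record per occasion per subject."""
    long_rows = []
    for row in rows:
        for occasion, var in enumerate(occasion_vars, start=1):
            long_rows.append(
                {id_var: row[id_var], "occasion": occasion, value_name: row[var]}
            )
    return long_rows


def cases_to_variables(rows, id_var, value_name="score"):
    """Long to wide: collect each subject's occasions into one record."""
    by_id = {}
    for row in rows:
        record = by_id.setdefault(row[id_var], {id_var: row[id_var]})
        record[f"{value_name}{row['occasion']}"] = row[value_name]
    return list(by_id.values())


long_rows = variables_to_cases(wide, "id", occasion_vars)
round_trip = cases_to_variables(long_rows, "id")
```

Two subjects with three occasions each become six long records, and converting those back recovers the original wide records.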