Dr. Michael R. Hyman, NMSU
Data Preparation
2
File, Record, and Field
3
Data Matrix
4
Data Entry
Process of transferring data from a research project to computers
5
Five Steps for Data Preparation
(1) Validation
(2) Editing
(3) Coding
(4) Data entry/transcription
(5) Machine cleaning of data
6
Validation
Check that interviews conducted as specified
• Ensure respondent qualified
• Interviewer looked/acted professionally
• Interview conducted in proper environment
• All appropriate questions asked
7
Editing: Personal Interviews
Check for:
• Omissions
• Ambiguities
• Inconsistencies
• Proper skip patterns
• Properly recorded answers, especially to open-ended questions
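Some of these editing checks can be partly automated. The following is a minimal sketch, assuming a hypothetical questionnaire in which Q1 is a yes/no screener (1 = yes, 2 = no) and Q2 is asked only when Q1 is 1. The question names and codes are illustrative assumptions, not taken from the slides.

```python
# Hypothetical layout (assumption): Q1 is a screener coded 1 = yes, 2 = no;
# Q2 should be answered only when Q1 == 1. None marks a blank answer.

def edit_checks(record):
    """Return a list of editing problems found in one questionnaire."""
    problems = []
    # Omission: a question everyone should answer was left blank.
    if record.get("Q1") is None:
        problems.append("Q1 omitted")
    # Skip pattern: Q2 is required after a "yes" on Q1 ...
    if record.get("Q1") == 1 and record.get("Q2") is None:
        problems.append("Q2 omitted although Q1 == 1")
    # ... and must be blank after a "no" on Q1.
    if record.get("Q1") == 2 and record.get("Q2") is not None:
        problems.append("Q2 answered although Q1 == 2")
    return problems

records = [
    {"id": 1, "Q1": 1, "Q2": 4},        # clean
    {"id": 2, "Q1": 2, "Q2": 3},        # skip pattern violated
    {"id": 3, "Q1": None, "Q2": None},  # omission
]
flagged = {r["id"]: edit_checks(r) for r in records}
```

Ambiguous or poorly recorded open-ended answers still need a human editor; a check like this only catches omissions and skip-pattern violations.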
8
Editing: Self-Administered Questionnaires
Check for:
• All questionnaire sections and key questions answered
• Respondents understood instructions and took task seriously
• No missing pages
• Questionnaire returned before cutoff date
9
Solutions for Editing Problems
• Re-contact respondent
• Discard questionnaire
• Use only good items
– Data analysis implications (beyond scope of class)
10
Coding
• Process of grouping and assigning numeric codes to different question responses
• Closed-ended questions easier because pre-coded
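As an illustration of why pre-coded questions are easier, here is a minimal sketch in which each printed response option already carries its numeric code. The five agreement labels and their codes are assumptions chosen for the example.

```python
# Hypothetical pre-coded closed-ended item (labels and codes are assumptions).
AGREEMENT_CODES = {
    "Strongly disagree": 1,
    "Disagree": 2,
    "Neither agree nor disagree": 3,
    "Agree": 4,
    "Strongly agree": 5,
}

def code_response(answer):
    """Look up the numeric code printed next to the chosen option.

    Returns None when the answer is not one of the listed options,
    so the questionnaire can be sent back to editing.
    """
    return AGREEMENT_CODES.get(answer)

coded = [code_response(a) for a in ["Agree", "Strongly disagree", "Maybe"]]
```

Because the codes are fixed before fieldwork, coding reduces to a lookup and no judgment is needed.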
11
Pre-coding Example
12
Coding an Open-Ended Question
• Generate list of responses
• Consolidate responses (subjective judgment)
• Set response category codes
• Assign each response to a category and record the associated numeric code
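The four steps above can be sketched roughly as follows. The question ("Why did you choose this airline?"), the categories, the keyword lists, and the codes are all hypothetical; in practice the consolidation step relies on the coder's judgment rather than keyword matching.

```python
# Hypothetical response categories and codes (assumptions for illustration).
CATEGORY_CODES = {"price": 1, "schedule": 2, "service": 3, "other": 9}

# Keywords a coder might settle on after reading the raw list of responses.
KEYWORDS = {
    "price": ("cheap", "fare", "price"),
    "schedule": ("schedule", "nonstop", "departure"),
    "service": ("staff", "service", "friendly"),
}

def code_open_ended(text):
    """Assign one response to the first matching category and return its code."""
    lowered = text.lower()
    for category, words in KEYWORDS.items():
        if any(word in lowered for word in words):
            return CATEGORY_CODES[category]
    return CATEGORY_CODES["other"]

responses = [
    "Cheapest fare I could find",
    "Only nonstop flight",
    "Friendly staff",
    "My company booked it",
]
codes = [code_open_ended(r) for r in responses]
```

A large "other" category is usually a sign that the category list needs another pass.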
13
Portion of Travel Study Code Book
14
Data Entry Process
• Validated, edited, and coded questionnaires given to data entry operator
• More accurate and efficient to go directly from questionnaire to data entry device and storage medium
• Skip coding sheets
15
Data Transcription
16
Intelligent Data Entry
• Checking entered data for internal logic by either the data entry device or another connected device
• Excel/Quattro and SPSS rely on dumb data entry
• Require data cleaning
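The difference between intelligent and dumb data entry can be sketched as a check that runs at the moment a value is keyed in. The field names and valid codes below are assumptions made for the example.

```python
# Hypothetical code book: the set of valid codes for each field (assumption).
VALID_CODES = {
    "gender": {1, 2},
    "satisfaction": {1, 2, 3, 4, 5},
}

def enter_value(record, field, value):
    """Store a keyed-in value only if it is a valid code for that field."""
    if value not in VALID_CODES[field]:
        raise ValueError(f"{value!r} is not a valid code for {field}")
    record[field] = value

record = {}
enter_value(record, "satisfaction", 4)      # accepted
try:
    enter_value(record, "satisfaction", 9)  # rejected at entry time
    rejected = False
except ValueError:
    rejected = True
```

With dumb data entry the 9 would be stored silently and would have to be caught later, during machine cleaning.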
17
Machine Cleaning of Data
• Computerized error check
– Identifies and suggests fixes for logical errors
• Marginal report
– Computer-generated table of response frequencies for questions
– Monitor entry of valid codes and skip patterns
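A marginal report is essentially a frequency count per question, compared against the valid codes. Here is a minimal sketch using only the standard library; the field name, the valid codes, and the sample data are assumptions.

```python
from collections import Counter

def marginal_report(records, field, valid_codes):
    """Return (frequency table, invalid codes) for one question.

    None is treated as a blank answer: it appears in the frequency
    table but is not reported as an invalid code.
    """
    counts = Counter(r.get(field) for r in records)
    invalid = {
        code: n for code, n in counts.items()
        if code is not None and code not in valid_codes
    }
    return counts, invalid

# Hypothetical entered data for a 5-point satisfaction item.
data = [{"satisfaction": v} for v in [1, 2, 2, 7, None, 1]]
counts, invalid = marginal_report(data, "satisfaction", {1, 2, 3, 4, 5})
```

Here the 7 stands out in the table as a code that cannot occur on a 5-point item, so that record goes back for checking against the original questionnaire.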
18
Machine Cleaning Instructions
19
Recoding Data
20
Recoding Data
• Using computers to convert original codes used for raw data into codes that are more suitable for analysis
• Example: Var1 = 8 - Var1 (reverse-scores an item on a 7-point scale)
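The recode Var1 = 8 - Var1 reverses an item when the scale runs from 1 to 7: 1 becomes 7, 4 stays 4, and 7 becomes 1. A minimal sketch that generalizes it to other scale lengths follows; the function name and the `scale_points` parameter are illustrative assumptions.

```python
def reverse_code(value, scale_points=7):
    """Reverse-score a value on a scale that runs from 1 to scale_points.

    With the default of 7 points this is the same as 8 - value.
    """
    return (scale_points + 1) - value

original = [1, 4, 7]
recoded = [reverse_code(v) for v in original]
```

Reverse-scoring is typically applied to negatively worded items so that a high number means the same thing on every item in a scale.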
21
Collapsing a Five-Point Likert Scale
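The slide's own table is not reproduced here, so the mapping below is an assumption: one common way to collapse a five-point agreement scale into three categories (disagree, neutral, agree).

```python
# Assumed mapping: 1-2 -> 1 (disagree), 3 -> 2 (neutral), 4-5 -> 3 (agree).
COLLAPSE_5_TO_3 = {1: 1, 2: 1, 3: 2, 4: 3, 5: 3}

def collapse(value):
    """Recode a five-point Likert response into three categories."""
    return COLLAPSE_5_TO_3[value]

collapsed = [collapse(v) for v in [1, 2, 3, 4, 5]]
```

Collapsing discards information about intensity, so the original codes are usually kept in the data file alongside the recoded variable.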
22
Coping with Missing Data
23
24
Item Non-response to Questions of Fact
25
Ways to Handle Missing Responses
• Leave blank
• Case-wise deletion
• Pair-wise deletion
• Mean response
• Imputed response
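Three of these options can be sketched on a tiny data set. The variables and values below are made up for illustration, and None marks a missing response. Imputing a response from a respondent's other answers is not shown, because it depends on the model chosen.

```python
from statistics import mean

# Hypothetical data: None marks a missing response.
rows = [
    {"age": 34, "income": 52, "trips": 3},
    {"age": 41, "income": None, "trips": 5},
    {"age": None, "income": 61, "trips": 2},
    {"age": 29, "income": 47, "trips": None},
]

def casewise(rows):
    """Case-wise deletion: keep only respondents with no missing values."""
    return [r for r in rows if all(v is not None for v in r.values())]

def pairwise(rows, a, b):
    """Pair-wise deletion: keep every respondent who answered both a and b."""
    return [(r[a], r[b]) for r in rows
            if r[a] is not None and r[b] is not None]

def mean_substitute(rows, field):
    """Mean response: replace missing values with the mean of observed ones."""
    fill = mean(r[field] for r in rows if r[field] is not None)
    return [{**r, field: fill if r[field] is None else r[field]} for r in rows]

complete = casewise(rows)                 # only the first respondent survives
age_trips = pairwise(rows, "age", "trips")
filled = mean_substitute(rows, "income")
```

Case-wise deletion leaves one usable respondent here, while pair-wise deletion keeps two for the age and trips analysis, which shows the trade-off: pair-wise deletion retains more data, but each analysis then rests on a different subset of respondents.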