Postcollection Processing of Survey Data: Coding and Weighting. Professor Ron Fricker, Naval Postgraduate School, Monterey, California. Reading Assignment: Groves et al., chapter 10. 3/26/13


Postcollection Processing of Survey Data: Coding and Weighting

Professor Ron Fricker, Naval Postgraduate School

Monterey, California

Reading Assignment: Groves et al., chapter 10

3/26/13


Goals for this Lecture

•  Talk about coding and what it means to code survey data

•  Discuss the various types of editing, from simple coding checks to assessing the quality of the survey data

•  Describe the various types of survey weights and how to calculate and use them
   –  Weights for differential selection probabilities
   –  Unit nonresponse weights
   –  Poststratification weights


Post-survey Processing

•  There are a number of steps between collection of survey data and analysis
   –  Varies by mode of data collection
   –  Easier or harder depending on a number of factors

Source: Survey Methodology, 1st ed., Groves, et al, 2004.


Terminology

•  Coding: Process of turning text-based answers into numerically coded categories

•  Data entry: Process of entering numeric data into data files

•  Editing: Examination of data file to detect errors and inconsistencies (to possibly correct or delete)

•  Imputation: Repair of item-level missing data by estimating and inserting an answer in the data field

•  Weighting: Adjustment of survey statistic computations to account for sampling design, nonresponse, and noncoverage

•  Variance estimation: Computation of the variance of the sampling distribution of a statistic


Coding for Closed-ended Items

•  Example: “Indicate the extent to which you agree or disagree with the following statement: OA4109 is the best class I’ve taken at NPS.”

   Response             Code
   Strongly agree       1
   Agree                2
   Neutral              3
   Disagree             4
   Strongly disagree    5

•  When creating codes, be sure the direction of the scale, from positive to negative, is always the same
   –  Makes it easier to interpret and report results
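The coding scheme above can be sketched in a few lines of Python. This is only an illustration: the names `AGREEMENT_CODES`, `code_response`, and `reverse_code` are hypothetical, and the sample answers are made up. `reverse_code` shows one way to keep the scale direction the same when an item was keyed the other way round.

```python
# Hypothetical codebook for the five-point agreement item on this slide.
AGREEMENT_CODES = {
    "Strongly agree": 1,
    "Agree": 2,
    "Neutral": 3,
    "Disagree": 4,
    "Strongly disagree": 5,
}


def code_response(text, codebook=AGREEMENT_CODES):
    """Return the numeric code for an answer, or None if it is not in the codebook."""
    normalized = text.strip().lower()
    for label, code in codebook.items():
        if label.lower() == normalized:
            return code
    return None  # leave unmatched answers for the editing step


def reverse_code(code, n_points=5):
    """Flip the direction of an n-point scale (1 <-> n, 2 <-> n-1, ...)."""
    return n_points + 1 - code


# Made-up raw answers, including one that is not a valid category.
responses = ["Agree", "strongly disagree ", "Neutral", "No opinion"]
coded = [code_response(r) for r in responses]
```

Answers that do not match any category come back as `None` rather than being forced into a code, so they can be reviewed during editing.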


More on Coding Closed-ended Responses

•  When re-coding like this, the main issue is whether the imposed structure is correct, i.e., whether adjacent categories really are one unit apart:
   –  code(“Agree”) − code(“Strongly agree”) = 1
   –  code(“Neutral”) − code(“Agree”) = 1
   –  Etc.

•  Clearly, for example, this is not reasonable:

   Response               Code
   Absolutely agree       1
   Very strongly agree    2
   Neutral                3
   Absolutely disagree    4


My Take

•  Only matters if doing quantitative analysis
•  No issues with purely numeric scales
•  Three- and five-point Likert scales okay
   –  Clear extremes and neutral with equal distance
   –  Just must assume “agree” and “disagree” each sit midway between neutral and the corresponding end
•  Even Likert scales and qualitative scales with a higher number of points are problematic (at best)
   –  Good reason not to try to label every point on longer scales: just stick with numbers and labeled endpoints


Coding for Open-ended Items

•  Coding is both an act of translation and an act of summarization

“Well, in our house we have a dog, a cat, and two parakeets. Oh yeah, and my stepdaughter visits every Tuesday and brings her ferret…” → “code = 5”

Source: Survey Methodology, 1st ed., Groves, et al, 2004.


Example of Field Coding

Source: Survey Methodology, 1st ed., Groves, et al, 2004.


Example of Standard Codes


Source: Survey Methodology, 1st ed., Groves, et al, 2004.


Editing

•  Most basic form of editing is accomplished via different kinds of data checks
   –  Range checks
   –  Ratio checks
   –  Balance checks
   –  Outlier checks
   –  Consistency checks
   –  Logic checks
   –  Comparisons to historical data

•  Basic point: Sanity check the raw data file
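A few of the checks listed above can be sketched in plain Python. The function names, thresholds, and data below are all hypothetical, and the outlier rule (distance from the median in units of the median absolute deviation) is just one common choice, not necessarily what the lecture has in mind.

```python
from statistics import median


def range_check(values, low, high):
    """Return the positions of non-missing values outside [low, high]."""
    return [i for i, v in enumerate(values)
            if v is not None and not (low <= v <= high)]


def balance_check(parts, total, tol=0):
    """True if the parts add up to the reported total (within tol)."""
    return abs(sum(parts) - total) <= tol


def outlier_check(values, k=3.0):
    """Return positions more than k median absolute deviations from the median."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread to compare against
    return [i for i, v in enumerate(values) if abs(v - med) / mad > k]


# Made-up data for illustration only.
ages = [23, 31, 27, 140, 35, -1]
bad_ages = range_check(ages, 17, 70)

hours = [40, 38, 42, 41, 39, 400]
odd_hours = outlier_check(hours)

percent_ok = balance_check([30, 50, 20], 100)
```

Each check returns positions or a flag rather than changing the data, which keeps the decision to correct or delete separate from detection.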


Editing Suggestions

•  Tables and plots help easily identify odd data
   –  Bar charts, histograms and scatterplots
   –  Don’t forget to look at amount of missingness

•  Can identify problem items or skip patterns
•  On longer / complicated instruments, logic checks important
   –  If service = Navy and rank in {1LT, 2LT, CPT, MAJ, LTC, COL} then problem_flag_1 = 1
   –  If family_sep_allowance > 0 and deploy_ind = 0 then problem_flag_2 = 1
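The two logic checks above translate directly into code. The sketch below assumes each survey record is a Python dict with the field names used on the slide; the records themselves and the helper name `flag_record` are hypothetical.

```python
# Rank abbreviations the slide treats as inconsistent with service = Navy.
NON_NAVY_RANKS = {"1LT", "2LT", "CPT", "MAJ", "LTC", "COL"}


def flag_record(rec):
    """Apply the two logic checks from the slide to one record."""
    return {
        # Navy respondent reporting a rank from the non-Navy list
        "problem_flag_1": int(rec["service"] == "Navy"
                              and rec["rank"] in NON_NAVY_RANKS),
        # Family separation allowance reported without a deployment
        "problem_flag_2": int(rec["family_sep_allowance"] > 0
                              and rec["deploy_ind"] == 0),
    }


# Made-up records for illustration only.
records = [
    {"service": "Navy", "rank": "CPT", "family_sep_allowance": 0, "deploy_ind": 0},
    {"service": "Army", "rank": "CPT", "family_sep_allowance": 250, "deploy_ind": 0},
    {"service": "Navy", "rank": "LT", "family_sep_allowance": 250, "deploy_ind": 1},
]
flags = [flag_record(r) for r in records]
```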


Weighting

•  Three main types:
   –  Weighting for differential selection probabilities
   –  Weighting to adjust for unit nonresponse
   –  Poststratification weighting
      •  Variance reduction
      •  Undercoverage
      •  Unit nonresponse


What is a Weight?

•  Think of it as the number of units in the population that the respondent represents

•  SRS is simplest: imagine a population of size $N = 2{,}000{,}000$ from which we take a sample of size $n = 1{,}000$
   –  Then the probability of selecting any one unit is $\pi = n/N$, or $\pi = 0.0005$
   –  Each unit in the sample gets a weight of $1/\pi = N/n = 2{,}000$

•  That is, each unit in the sample represents 2,000 units in the population


Weights with Stratification

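•  Imagine that, of the 2,000,000 individuals, 500,000 are Marine Corps and 1,500,000 are Army
   –  The sample will be 500 from each service

•  Then

$$\pi_{\text{Army}} = \frac{n_{\text{Army}}}{N_{\text{Army}}} = \frac{500}{1{,}500{,}000} \approx 0.00033$$

$$\pi_{\text{USMC}} = \frac{n_{\text{USMC}}}{N_{\text{USMC}}} = \frac{500}{500{,}000} = 0.001$$

•  And

$$w_{\text{Army}} = \frac{1}{\pi_{\text{Army}}} = \frac{1{,}500{,}000}{500} = 3{,}000$$

$$w_{\text{USMC}} = \frac{1}{\pi_{\text{USMC}}} = \frac{1}{0.001} = 1{,}000$$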
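The stratum weights can be computed directly as N/n per stratum, which avoids rounding the selection probability first. A minimal sketch using the numbers from this slide; the function name `selection_weights` is hypothetical.

```python
def selection_weights(pop_sizes, sample_sizes):
    """Weight per stratum = 1 / selection probability = N_h / n_h."""
    return {h: pop_sizes[h] / sample_sizes[h] for h in pop_sizes}


pop_sizes = {"Army": 1_500_000, "USMC": 500_000}
sample_sizes = {"Army": 500, "USMC": 500}

weights = selection_weights(pop_sizes, sample_sizes)

# Sanity check: summed over every sampled unit, the weights
# should reproduce the population size.
total_represented = sum(weights[h] * sample_sizes[h] for h in weights)
```

Here `weights` comes out as 3,000 for Army and 1,000 for USMC, and `total_represented` equals the 2,000,000 individuals in the population.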


Weighting for Differential Probabilities

•  From the Horvitz-Thompson estimator, the weight is just the inverse of the selection probability:

$$\bar{y}_w = \frac{1}{N}\sum_{i=1}^{n} \frac{1}{\pi_i}\, y_i = \frac{1}{N}\sum_{i=1}^{n} w_i y_i$$

•  An equivalent way to write this is

$$\bar{y}_w = \frac{\sum_{i=1}^{n} w_i y_i}{\sum_{i=1}^{n} w_i}$$
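Both forms of the estimator are easy to compute and compare. The sketch below uses made-up data in which the weights sum to N, which is the situation where the two forms give the same number; the function names are hypothetical.

```python
def ht_mean(y, pi, N):
    """Horvitz-Thompson form: (1/N) * sum of y_i / pi_i."""
    return sum(yi / p for yi, p in zip(y, pi)) / N


def weighted_mean(y, w):
    """Ratio form: sum of w_i * y_i divided by sum of w_i."""
    if len(y) != len(w) or not y:
        raise ValueError("y and w must be non-empty and the same length")
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


# Made-up example: four respondents, weights summing to N = 6,000.
y = [1, 1, 0, 1]
w = [1000, 1000, 2000, 2000]
pi = [1 / wi for wi in w]   # selection probabilities implied by the weights
N = sum(w)

estimate_ht = ht_mean(y, pi, N)
estimate_ratio = weighted_mean(y, w)
```

The ratio form is convenient in practice because it needs only the weights, not N.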


Examples

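•  Consider SRS (just to show it makes sense):

$$\bar{y}_w = \frac{\sum_{i=1}^{n} w_i y_i}{\sum_{i=1}^{n} w_i} = \frac{\sum_{i=1}^{n} (N/n)\, y_i}{\sum_{i=1}^{n} (N/n)} = \frac{(N/n)\sum_{i=1}^{n} y_i}{N} = \frac{1}{n}\sum_{i=1}^{n} y_i = \bar{y}$$

•  Now, assume two strata with $N_1 + N_2 = N$, samples of size $n_1 < N_1$ and $n_2 < N_2$, and $n_1 + n_2 = n$:

$$\bar{y}_w = \frac{\sum_{i=1}^{n} w_i y_i}{\sum_{i=1}^{n} w_i} = \frac{(N_1/n_1)\sum_{i=1}^{n_1} y_i + (N_2/n_2)\sum_{i=n_1+1}^{n} y_i}{\sum_{i=1}^{n_1} N_1/n_1 + \sum_{i=1}^{n_2} N_2/n_2} = \frac{N_1 \bar{y}_1 + N_2 \bar{y}_2}{N}$$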
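As a numeric illustration of the two-stratum result, take the Army/USMC sizes from the earlier stratification example and suppose, purely hypothetically, that the sample means are $\bar{y}_{\text{Army}} = 0.60$ and $\bar{y}_{\text{USMC}} = 0.40$:

```latex
\bar{y}_w
  = \frac{N_1 \bar{y}_1 + N_2 \bar{y}_2}{N}
  = \frac{1{,}500{,}000\,(0.60) + 500{,}000\,(0.40)}{2{,}000{,}000}
  = \frac{900{,}000 + 200{,}000}{2{,}000{,}000}
  = 0.55
```

The unweighted mean of the 1,000 respondents would be $(500 \times 0.60 + 500 \times 0.40)/1{,}000 = 0.50$, so ignoring the weights would understate the Army's larger share of the population.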


Weighting to Adjust for Unit Nonresponse

•  Weight is the inverse of the response rate for each categorical group
   –  Same idea as weight for differential selection probability

•  Assumes data are missing at random (MAR) within observed groups
   –  Example: In an SRS-based survey of NPS students, the response rate for males was 50% while for females it was 80%
      •  Nonresponse weight for males = 1/0.5 = 2.0
      •  Nonresponse weight for females = 1/0.8 = 1.25
      •  Idea is that each male respondent’s answer counts for both himself and one nonrespondent, while each female respondent’s answer counts for herself and a quarter of a nonrespondent
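The nonresponse adjustment is the same inverse calculation applied to response rates within each group. In the sketch below the sampled and responding counts are hypothetical, chosen only so that the response rates match the 50% and 80% in the example.

```python
def nonresponse_weights(n_sampled, n_responded):
    """Weight per group = 1 / response rate = sampled / responded."""
    return {g: n_sampled[g] / n_responded[g] for g in n_sampled}


# Hypothetical counts giving 50% (male) and 80% (female) response rates.
n_sampled = {"male": 200, "female": 100}
n_responded = {"male": 100, "female": 80}

nr_weights = nonresponse_weights(n_sampled, n_responded)

# With the adjustment, respondents again "stand in" for everyone sampled.
represented = {g: nr_weights[g] * n_responded[g] for g in nr_weights}
```

If a selection weight is also in use, one common approach is to multiply it by the nonresponse weight to get the adjusted weight for each respondent.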


Calculating Sampling Weights

$$w_{\text{Latino}} = 24{,}937{,}500 / 62{,}500 = 399$$

$$w_{\text{Non-Latino}} = 174{,}562{,}500 / 62{,}500 = 2{,}793$$

So Latinos are oversampled at 7 times the rate of non-Latinos (2,793/399 = 7)

Source: Survey Methodology, 1st ed., Groves, et al, 2004.


Adjusting Weights for Nonresponse

•  Note the differential response by age
   –  Response rate lower if younger
   –  Hence, they are proportionally underrepresented

•  What’s the major assumption with adjusting for nonresponse?

Source: Survey Methodology, 1st ed., Groves, et al, 2004.


Poststratification Weighting

•  Poststratification weights most frequently used to ensure survey totals match known population totals

•  Can only adjust for nonresponse using variables known for everyone in the sample
   –  Has to be based on external (not survey) data
   –  Often these are demographic types of variables


Example Continued with Poststratification Weighting

•  Population is known to be 52% female and 48% male
   –  However, survey results (perhaps using sampling weights and/or nonresponse weights) differ
   –  Survey results show 50% female and 50% male
   –  Then adjust female weight by 0.52/0.50 = 1.04 and male weight by 0.48/0.50 = 0.96
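The adjustment factors in this example are the population share divided by the weighted sample share for each group. A minimal sketch, with a made-up set of ten respondents whose starting weights are all 100; the function name is hypothetical.

```python
def poststrat_factors(pop_share, sample_share):
    """Factor per group = known population share / weighted sample share."""
    return {g: pop_share[g] / sample_share[g] for g in pop_share}


pop_share = {"female": 0.52, "male": 0.48}
sample_share = {"female": 0.50, "male": 0.50}
factors = poststrat_factors(pop_share, sample_share)

# Made-up respondents: five female and five male, each with weight 100.
sexes = ["female"] * 5 + ["male"] * 5
start_weights = [100.0] * 10
final_weights = [w * factors[s] for w, s in zip(start_weights, sexes)]

# After the adjustment the weighted female share matches the population.
female_share = (sum(w for w, s in zip(final_weights, sexes) if s == "female")
                / sum(final_weights))
```

The total of the weights is unchanged; only the split between the two groups moves to match the known population shares.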


Weighting: Putting It All Together


Source: Survey Methodology, 1st ed., Groves, et al, 2004.


Poststratification Weighting

•  When there are multiple poststratification variables, things can get pretty complicated

•  Options
   –  Raking (aka sample balancing)
      •  Method for sequentially adjusting weights for each variable until the weights converge
      •  See www.abtassociates.com/presentations/raking_survey_data_2_JOS.pdf
   –  Logistic regression
      •  Fit a logistic regression model of response and use the inverse of the predicted response probabilities as weights
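Raking can be sketched as a loop that rescales the weights to match each variable's population totals in turn and stops once a full pass changes nothing. The code below is a minimal illustration, not the procedure from the linked reference: the function `rake`, its arguments, and the four-respondent data set are all hypothetical, and it assumes every category has at least one respondent and that the target totals agree across variables.

```python
def rake(weights, categories, targets, max_iter=100, tol=1e-8):
    """Adjust weights so weighted totals match the targets for each variable.

    weights:    starting weight per respondent
    categories: {variable: [category of each respondent]}
    targets:    {variable: {category: population total}}
    """
    w = list(weights)
    for _ in range(max_iter):
        max_change = 0.0
        for var, labels in categories.items():
            # Current weighted total in each category of this variable
            totals = {}
            for wi, lab in zip(w, labels):
                totals[lab] = totals.get(lab, 0.0) + wi
            # Scale each respondent's weight toward the target total
            for i, lab in enumerate(labels):
                factor = targets[var][lab] / totals[lab]
                max_change = max(max_change, abs(factor - 1.0))
                w[i] *= factor
        if max_change < tol:
            break  # a full pass changed nothing: converged
    return w


# Made-up example: four respondents, two poststratification variables.
categories = {
    "sex": ["F", "F", "M", "M"],
    "age": ["young", "old", "young", "old"],
}
targets = {
    "sex": {"F": 52.0, "M": 48.0},
    "age": {"young": 40.0, "old": 60.0},
}
raked = rake([25.0, 25.0, 25.0, 25.0], categories, targets)
```

After raking, the weighted totals match both the sex and the age targets at once, even though neither variable was adjusted with the other in mind.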


What We Have Covered

•  Talked about coding, what it means and how to code survey data

•  Discussed various types of editing, from coding checks to assessing data quality

•  Described the various types of survey weights and how to calculate and use them
   –  First-stage ratio adjustment
   –  Differential selection probabilities
   –  Unit nonresponse
   –  Poststratification
