What is… The Analysis of Longitudinal Survey Data
Paul Lambert, University of Stirling
Prepared for: National Centre for Research Methods, Research Methods Festival, St Catherine’s College, Oxford, 7 July 2010
Also see: www.longitudinal.stir.ac.uk / www.dames.org.uk
July 2010: LDA 2
So what’s distinct about the analysis of longitudinal survey data?
You already know... working with (survey) datasets with longitudinal information (data about time), and the specialist techniques of statistical analysis that are appropriate to them
You maybe don’t realise...
1) Groups of techniques and data types
2) Complex data and data management components
1) Types of longitudinal survey data
• Survey resources
• Longitudinal [‘..of or about time..’]
  i. Analysis is concerned with time
  ii. Data is concerned with more than one time point [e.g. Taris 2000; Blossfeld and Rohwer 2002]
  iii. Repeated measures over time [e.g. Menard 2002; Martin et al 2006]
Data analysis is used to give a parsimonious summary of patterns of relations between variables in the survey dataset
Types of data and analysis traditions for longitudinal surveys (cf. www.longitudinal.stir.ac.uk)
0. Temporal effects in cross-sectional data
1. Repeated cross-sections
2. Panel datasets
3. Cohort studies
4. Event history datasets
5. Time series analyses
Temporal effects in single cross-sectional surveys
• Temporal effects are (a) present and (b) of interest in most social science studies
• We can measure differences between people in terms of their age / year of birth
• These matter empirically & are interesting substantively
• But we can’t tell if differences are due to age or period or cohort (or other things that are collinear with these, e.g. life course stage or major events)
[Data type: 1/6]
Longitudinal statements from cross-sectional data are common...
• We typically fit linear/curvilinear trend lines for time effects
• Treiman (2009: 162): nonlinear specifications of time and age effects
– Year of birth effect on literacy in China: discontinuity at 1955; curve 1955-1967; knot at 1967
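As a rough illustration of what such a nonlinear specification can look like, the Python sketch below builds hypothetical regression terms for year of birth: a step dummy at 1955 for the discontinuity, linear and squared terms for a curve over 1955-1967, and a linear spline term after the 1967 knot. The parameterisation (centring at 1955, capping the curve at 1967) is an assumption made for illustration, not Treiman's own coding.

```python
def yob_terms(yob):
    """Hypothetical design terms for a year-of-birth effect with a break at 1955
    and a knot at 1967 (illustrative parameterisation, not Treiman's)."""
    trend = yob - 1955                       # overall linear trend, centred at 1955
    post55 = 1 if yob >= 1955 else 0         # discontinuity (step) at 1955
    seg = max(0, min(yob, 1967) - 1955)      # years elapsed within 1955-1967
    seg_sq = seg ** 2                        # curvature within 1955-1967
    post67 = max(0, yob - 1967)              # change of slope after the 1967 knot
    return [trend, post55, seg, seg_sq, post67]


for year in (1950, 1955, 1960, 1967, 1970):
    print(year, yob_terms(year))
```

These terms would enter a (for example logistic) regression for literacy alongside other covariates, so that the fitted coefficients trace the jump, the curve and the kink rather than a single straight trend line.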
health status over last 12 months by educ4 (frequencies, with column percentages below)

                 1. Degree  2. Diplom  3. Higher  4. Low sc     Total
1. excellent          644      1,009        839        336      2,828
                    32.31      26.44      22.91      12.44      23.23
2. good               990      1,778      1,726      1,050      5,544
                    49.67      46.59      47.13      38.86      45.54
3. fair               276        753        775        847      2,651
                    13.85      19.73      21.16      31.35      21.78
4. poor                65        219        248        354        886
                     3.26       5.74       6.77      13.10       7.28
5. very poor           18         57         74        115        264
                     0.90       1.49       2.02       4.26       2.17
Total               1,993      3,816      3,662      2,702     12,173
                   100.00     100.00     100.00     100.00     100.00

Pearson chi2(12) = 700.6095   Pr = 0.000
Cramér's V = 0.1385
gamma = 0.2626   ASE = 0.010

. pwcorr qhlstat educ4 yob, obs sig

             qhlstat     educ4       yob
qhlstat       1.0000
               12671
educ4         0.2219    1.0000
              0.0000
               12173     12173
yob          -0.2080   -0.3330    1.0000
              0.0000    0.0000
               12671     12173     12671

Within yob cohort, gamma on educ to health is:
20’s 0.15 | 30’s 0.28 | 40’s 0.22 | 50’s 0.23 | 60’s 0.22 | 70’s 0.15 | 80’s 0.10
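Gamma is Goodman and Kruskal's measure of ordinal association: the excess of concordant over discordant pairs of cases, ignoring ties. A short Python sketch computes it directly from the health-by-education frequencies on this slide; the function name is our own, and this is a plain re-implementation of the formula rather than Stata's code.

```python
def goodman_kruskal_gamma(table):
    """Goodman-Kruskal gamma for a table whose rows and columns are both
    ordered categories: (C - D) / (C + D) over concordant/discordant pairs."""
    n_rows, n_cols = len(table), len(table[0])
    concordant = discordant = 0
    for i in range(n_rows):
        for j in range(n_cols):
            # pair each cell with cells in strictly lower rows; tied columns are ignored
            for k in range(i + 1, n_rows):
                for l in range(n_cols):
                    if l > j:
                        concordant += table[i][j] * table[k][l]
                    elif l < j:
                        discordant += table[i][j] * table[k][l]
    return (concordant - discordant) / (concordant + discordant)


# Frequencies: rows 1. excellent ... 5. very poor; columns 1. Degree ... 4. Low sc
health_by_educ = [
    [644, 1009, 839, 336],
    [990, 1778, 1726, 1050],
    [276, 753, 775, 847],
    [65, 219, 248, 354],
    [18, 57, 74, 115],
]
print(round(goodman_kruskal_gamma(health_by_educ), 4))  # 0.2626
```

Running the same function on the table for each birth-decade subgroup is how within-cohort gammas such as those listed above would be obtained.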
Repeated cross-sections: surveys on the same topics, on multiple occasions, to different people
Data example: GHS pooled ‘time-series’ dataset (UKDA, SN: 5664)
Adults aged 25-65 only

. tab year

survey year (financial year
from 1988 onwards)        Freq.   Percent     Cum.
1972                     16,467      3.89     3.89
1973                     18,134      4.29     8.18
1974                     15,157      3.58    11.76
1975                     16,804      3.97    15.73
1976                     16,513      3.90    19.63
1977                     16,206      3.83    23.46
1978                     15,946      3.77    27.23
1979                     15,311      3.62    30.85
1980                     15,662      3.70    34.55
1981                     16,034      3.79    38.34
1982                     13,617      3.22    41.56
1983                     13,204      3.12    44.68
1984                     12,651      2.99    47.67
1985                     13,203      3.12    50.79
1986                     13,355      3.16    53.95
1987                     13,519      3.20    57.14
1988                     13,000      3.07    60.22
1989                     13,119      3.10    63.32
1990                     12,235      2.89    66.21
1991                     12,856      3.04    69.25
1992                     12,913      3.05    72.30
1993                     12,474      2.95    75.25
1994                     12,393      2.93    78.18
1995                     12,621      2.98    81.16
1996                     11,798      2.79    83.95
1998                     10,940      2.59    86.53
2000                     10,389      2.46    88.99
2001                     11,467      2.71    91.70
2002                     10,930      2.58    94.28
2003                     13,070      3.09    97.37
2004                     11,120      2.63   100.00
Total                   423,108    100.00

[Data type: 2/6]
Repeated cross sections
• Easy to communicate and appealing: how things have changed between certain time points
• Can distinguish any 2 of age / period / cohort
• Easier to analyse, with less data management
However...
• We don’t get the other attractions of quantitative longitudinal research (QnLR): the nature of changers; residual heterogeneity; causality; durations
• Hidden complications: are sampling methods and variable operationalisations really comparable? More on this below...
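Why any two of age, period and cohort can be separated, but never all three, follows from the identity cohort = period − age. The stdlib-only Python sketch below builds hypothetical design matrices (a constant plus the chosen time variables) for made-up survey years and ages, and computes their rank with exact fractions: a rank below the number of columns means the effects cannot all be estimated.

```python
from fractions import Fraction
from itertools import product


def matrix_rank(rows):
    """Rank of a matrix via exact Gaussian elimination (no floating-point error)."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot: this column is a combination of earlier ones
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def design(periods, ages, terms):
    """Rows of [1, <terms>] for every period x age combination."""
    rows = []
    for period, age in product(periods, ages):
        values = {"age": age, "period": period, "cohort": period - age}
        rows.append([1] + [values[t] for t in terms])
    return rows


ages = [25, 35, 45, 55]
repeated = [1991, 1996, 2001]  # hypothetical repeated cross-sections
single = [2001]                # a single cross-section

print(matrix_rank(design(repeated, ages, ["age", "cohort"])))            # 3 of 3 columns
print(matrix_rank(design(repeated, ages, ["age", "period", "cohort"])))  # 3 of 4 columns
print(matrix_rank(design(single, ages, ["age", "cohort"])))              # 2 of 3 columns
```

The last line shows the single cross-section case: with period fixed, age and year of birth carry the same information.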
Example: Labour Force Survey yearly stats
Percent of UK workers with a higher degree, by employment category and gender (m / f)
Sample size ~35,000 m / 30,000 f each year

                    1991    1996    2001
Men (m)
  Professional      14.4    19.9    24.9
  Non-prof.          1.3     2.5     3.5
Women (f)
  Professional      11.0    24.4    28.3
  Non-prof.          0.6     2.3     3.2
LFS and time (example in SPSS from www.longitudinal.stir.ac.uk)
Logistic regression: odds of being a professional, LFS adult workers in 1991, 1996 and 2001

                                        B      Sig.    Exp(B)
Higher degree                       2.383      .000    10.842
Female                              -.955      .000      .385
Age in years (/10)                   .777      .000     2.174
Age in years squared (/1000)        -.857      .000      .424
Time point 1991                      .094      .000     1.098
Time point 2001                     -.195      .000      .823
(Time in years)*(Higher degree)     -.030      .000      .971
Constant                           -4.232      .000      .015

Nagelkerke R2 = 0.11
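Exp(B) is simply e raised to the coefficient B: the multiplicative change in the odds of being a professional for a one-unit change in that term. A quick Python check recomputes exp(B) from the B column; the rounding tolerance is our assumption, since both columns are printed to three decimals. Because of the interaction term, the higher-degree odds ratio of about 10.8 applies only at the reference value of the time variable, whose coding is not stated here.

```python
import math

# (term, B, printed Exp(B)) transcribed from the logistic regression table
TABLE = [
    ("Higher degree", 2.383, 10.842),
    ("Female", -0.955, 0.385),
    ("Age in years (/10)", 0.777, 2.174),
    ("Age in years squared (/1000)", -0.857, 0.424),
    ("Time point 1991", 0.094, 1.098),
    ("Time point 2001", -0.195, 0.823),
    ("(Time in years)*(Higher degree)", -0.030, 0.971),
    ("Constant", -4.232, 0.015),
]


def consistent(b, printed, tol=0.001):
    """True if exp(B) matches the printed odds ratio, allowing for both
    figures having been rounded to three decimals."""
    return abs(math.exp(b) - printed) <= tol * (1 + printed)


for term, b, printed in TABLE:
    flag = "ok" if consistent(b, printed) else "CHECK"
    print(f"{term:33s} exp({b:+.3f}) = {math.exp(b):7.3f}  printed {printed:7.3f}  {flag}")
```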
Panel Datasets
Information collected on the same cases at more than one point in time
– The ‘classic’ longitudinal design; incorporates ‘follow-up’, ‘repeated measures’ and ‘cohort’ designs; large and small in scale
– Several major panel studies in the UK, e.g. www.esds.ac.uk/longitudinal
– Many cross-sectional surveys feature additional panel elements
[Data type: 3/6]
Illustration: Unbalanced panel

Wave*   Person   Person-level vars
1       1        1   38   1   36
1       2        2   34   2   0
1       3        2   6    9   -
2       1        1   39   1   38
2       2        2   35   1   16
3       1        1   40   1   36
3       2        2   36   1   18
3       3        2   8    9   -

N_w = 3, N_p = 3
*also ‘sweep’, ‘contact’, ...
Complex data example: BHPS panel dataset [SN 5151]
. tab year

year          Freq.   Percent     Cum.
1991         10,264      4.57     4.57
1992          9,845      4.38     8.95
1993          9,600      4.27    13.23
1994          9,481      4.22    17.45
1995          9,249      4.12    21.56
1996          9,438      4.20    25.77
1997         11,193      4.98    30.75
1998         10,906      4.86    35.60
1999         15,623      6.96    42.56
2000         15,603      6.95    49.51
2001         18,867      8.40    57.91
2002         16,597      7.39    65.29
2003         16,238      7.23    72.52
2004         15,791      7.03    79.55
2005         15,627      6.96    86.51
2006         15,392      6.85    93.36
2007         14,910      6.64   100.00
Total       224,624    100.00

. xtdes, i(pid) t(year)

     pid:  10002251, 10004491, ..., 1.794e+08             n =  31877
    year:  1991, 1992, ..., 2007                          T =     17
           Delta(year) = 1 unit
           Span(year)  = 17 periods
           (pid*year uniquely identifies each observation)

Distribution of T_i:   min    5%   25%   50%   75%   95%   max
                         1     1     2     6     9    17    17

    Freq.   Percent     Cum.   Pattern
     4294     13.47    13.47   11111111111111111
     2726      8.55    22.02   ........111111111
     2032      6.37    28.40   ..........1111111
     1224      3.84    32.24   ......11111......
      964      3.02    35.26   1................
      840      2.64    37.90   ..........1......
      632      1.98    39.88   ........1........
      631      1.98    41.86   ................1
      593      1.86    43.72   11...............
    17941     56.28   100.00   (other patterns)
    31877    100.00            XXXXXXXXXXXXXXXXX
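The pattern column of xtdes records, for each person, the waves in which they were observed (‘1’) or absent (‘.’), and T_i is the number of waves observed. A minimal Python sketch of the same bookkeeping, applied to the (wave, person) pairs of the three-person unbalanced-panel illustration; the function name is hypothetical and this is not how Stata implements xtdes.

```python
from collections import Counter


def participation_patterns(observations, waves):
    """Map each person to a string like '1.1': '1' if observed in that wave,
    '.' if not (similar in spirit to the pattern column of xtdes)."""
    seen = {}
    for wave, pid in observations:
        seen.setdefault(pid, set()).add(wave)
    return {
        pid: "".join("1" if w in present else "." for w in waves)
        for pid, present in seen.items()
    }


# (wave, person) pairs from the unbalanced-panel illustration
obs = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (3, 1), (3, 2), (3, 3)]
patterns = participation_patterns(obs, waves=[1, 2, 3])
t_i = {pid: p.count("1") for pid, p in patterns.items()}  # waves observed per person

print(patterns)                    # {1: '111', 2: '111', 3: '1.1'}
print(t_i)                         # {1: 3, 2: 3, 3: 2}
print(Counter(patterns.values()))  # Counter({'111': 2, '1.1': 1})
```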
Panel data advantages
• Study ‘changers’ – how many of them, what are they like, what caused change
• Control for individuals’ unknown characteristics (‘residual heterogeneity’)
• Develop a full and reliable life history – e.g. family formation, employment patterns
Example: Panel transitions
Young people’s household circumstance changes by subjective well-being between 1994 and 1995.
BHPS youth panel, aged 11-14 in 1994, row percents.

              Stays happy   Cheers up   Becomes miserable   Stays miserable     N
HH stable         54%          19%             10%                18%          499
HH changes        42%          22%             14%                22%           81
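The four column categories come from cross-classifying a binary well-being state (happy or miserable) at the two waves. A hypothetical Python sketch of how such row percentages could be tabulated from person-level records; the record layout and the toy records are invented for illustration and are not BHPS data.

```python
from collections import Counter

# (happy in 1994, happy in 1995) -> transition category used in the table
TRANSITIONS = {
    (True, True): "Stays happy",
    (False, True): "Cheers up",
    (True, False): "Becomes miserable",
    (False, False): "Stays miserable",
}


def transition_row_percents(records):
    """records: iterable of (group, happy_1994, happy_1995).
    Returns {group: ({category: row percent}, N)}."""
    counts = {}
    for group, h94, h95 in records:
        counts.setdefault(group, Counter())[TRANSITIONS[(h94, h95)]] += 1
    table = {}
    for group, c in counts.items():
        n = sum(c.values())
        table[group] = ({cat: 100 * c[cat] / n for cat in TRANSITIONS.values()}, n)
    return table


# Toy records, not BHPS data
toy = [
    ("HH stable", True, True),
    ("HH stable", True, True),
    ("HH stable", False, True),
    ("HH stable", True, False),
    ("HH changes", False, False),
    ("HH changes", True, True),
]
for group, (row, n) in transition_row_percents(toy).items():
    print(group, {k: round(v) for k, v in row.items()}, "N =", n)
```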
Panel data can be ‘wide’ or ‘long’
• Depends upon the analytical approach
• Wide format is simpler to envisage but analysis will need unbalanced data or missing value imputations
• Long format is harder to manipulate (e.g. to cross-check), but is more flexible in the types of analysis it supports
[Figure: schematic of wide format (years 1991-1995 as columns) and long format (years 1991-1996 as rows)]
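Moving from long to wide format turns each person's rows into one record with a slot per wave; for an unbalanced panel, the slots for missed waves come out empty, which is exactly why wide-format analysis runs into missing values. A minimal Python sketch using the wave, person and second person-level variable from the unbalanced-panel illustration; the function name and the use of None for a missed wave are our own choices.

```python
def long_to_wide(rows, waves):
    """rows: (wave, person, value) records in long format.
    Returns {person: {wave: value or None}}: one record per person (wide format),
    with None marking waves in which the person was not observed."""
    wide = {}
    for wave, pid, value in rows:
        wide.setdefault(pid, {w: None for w in waves})[wave] = value
    return wide


# Wave, person and the second person-level variable from the illustration
long_rows = [(1, 1, 38), (1, 2, 34), (1, 3, 6), (2, 1, 39), (2, 2, 35),
             (3, 1, 40), (3, 2, 36), (3, 3, 8)]
for pid, values in long_to_wide(long_rows, waves=[1, 2, 3]).items():
    print(pid, values)
```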
Panel models: regression-style models with various estimators to recognise the repeated contacts, e.g. random effects; fixed effects; population average; linear
(Model: influences on GHQ score in the BHPS; Stata examples available via www.dames.org.uk/workshops)
Variable        lin2          clus2         be2           fe2          re2
lninc        -.2985***     -.2985***     -.4575***     -.1574***    -.1826***
fem          1.147***      1.147***      1.038***      0            1.164***
age          .09416***     .09416***     .1125***      .01852       .06013***
age2         -.000918***   -.000918***   -.001081***   .000299*     -.000469***
hied         -.1179**      -.1179        -.07051       .04218       -.02425
noed         .5503***      .5503***      .6927***      -.1867       .5079***
convot       -.6563***     -.6563***     -.5992***     -.1973*      -.502***
labvot       .01591        .01591        .104          -.07397      .04198
_cons        10.56***      10.56***      11.21***      10.64***     10.22***
N            103306        103306        103306        103306       103306
r2           .02755        .02755        .0438         .00203
ll           -317544       -317544       -61415        -277029
legend: * p<0.05; ** p<0.01; *** p<0.001
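The fe2 column reports 0 for fem because a fixed-effects (within) estimator uses only variation within each person over time: after subtracting each person's own mean, a time-invariant variable becomes a column of zeros. The stdlib Python sketch below illustrates this with made-up data in which person-level effects are correlated with x; it is a one-regressor toy, not the GHQ model.

```python
from statistics import mean


def demean_within(data):
    """data: list of (person, x, y). Subtract each person's own means of x and y."""
    xs, ys = {}, {}
    for pid, x, y in data:
        xs.setdefault(pid, []).append(x)
        ys.setdefault(pid, []).append(y)
    xbar = {pid: mean(v) for pid, v in xs.items()}
    ybar = {pid: mean(v) for pid, v in ys.items()}
    return [(pid, x - xbar[pid], y - ybar[pid]) for pid, x, y in data]


def ols_slope(pairs):
    """Slope of y on x (with intercept) for a list of (x, y) pairs."""
    mx = mean(x for x, _ in pairs)
    my = mean(y for _, y in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    return sxy / sxx


# Toy panel: y = person effect + 0.5 * x; person 2 has both higher x and a higher effect
effects = {1: 0.0, 2: 10.0}
data = [(1, x, effects[1] + 0.5 * x) for x in (1, 2, 3)] + \
       [(2, x, effects[2] + 0.5 * x) for x in (4, 5, 6)]

pooled = ols_slope([(x, y) for _, x, y in data])
within = ols_slope([(x, y) for _, x, y in demean_within(data)])
print(f"pooled slope {pooled:.3f}, within slope {within:.3f}")  # pooled slope 3.071, within slope 0.500

# A time-invariant variable (e.g. a sex indicator) is wiped out by demeaning
sex = [(1, 1, 0.0), (1, 1, 0.0), (2, 0, 0.0)]
print([x for _, x, _ in demean_within(sex)])
```

The pooled slope absorbs the person effects; the within slope recovers the 0.5 built into the toy data, at the cost of discarding anything that never changes within a person.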
Cohort Datasets
Information on a group of cases which share a common circumstance, collected repeatedly as they progress through a life course
– Intuitive type of repeated contact data, e.g. the ‘7-up’ series
– Often contributes to cross-cohort comparisons, e.g. the UK birth cohort studies of 1946, 1958, 1970 and 2000
[Data type: 4/6]
Cohort data and analysis in the social sciences
• Many circumstances parallel other panel types:
  – Large-scale studies are ambitious and expensive
  – Small-scale cohorts are still quite common…
  – Attrition problems are often more severe
  – Study duration imposes considerable limits
• Glenn (2005) argues that ‘cohort analysis’ should be specifically directed to understanding the effects of ageing/progression over time
  – Other uses of cohort data are just the same as uses of panel data
• It remains hard, even with extensive cohort data, to understand ageing effects authoritatively (age = period – cohort)
Event history data analysis [esp. Blossfeld et al 2007]
• Data sources are panel / cohort studies, or retrospective interviews (which are prone to recall errors)
• Focus shifts to the length of time in a ‘state’: analyse determinants of, and patterns in, time spent in state(s)
  – Analysis of event durations: ‘event history analysis’; ‘survival data analysis’; ‘failure time analysis’; ‘hazards’; ‘risks’; ...
  – Analysis of event patterns: ‘sequence analysis’; ‘trajectory analysis’; ‘optimal matching analysis’; ‘latent growth curves’
[Data type: 5/6]
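For the duration strand, a common first descriptive step is the Kaplan-Meier estimate of the proportion still in a state after each duration; it handles spells still in progress at the last interview (right-censored). Kaplan-Meier is named here as a standard example of survival data analysis, not a method these slides specify, and the durations below are made up.

```python
def kaplan_meier(durations, exited):
    """Kaplan-Meier survivor function S(t): estimated probability of still being in
    the state after duration t. exited[i] is True if the exit was observed, False if
    the spell was still ongoing (right-censored) at last contact."""
    spells = sorted(zip(durations, exited))
    at_risk = len(spells)
    survival = 1.0
    curve = []
    i = 0
    while i < len(spells):
        t = spells[i][0]
        exits = leaving = 0
        # group all spells that end at the same duration t
        while i < len(spells) and spells[i][0] == t:
            exits += spells[i][1]
            leaving += 1
            i += 1
        if exits:
            survival *= 1 - exits / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # exits and censored spells both leave the risk set
    return curve


# Toy spell lengths in years; False marks spells still in progress at last interview
durations = [2, 3, 3, 5, 8]
exited = [True, True, False, True, False]
for t, s in kaplan_meier(durations, exited):
    print(f"after {t} years: S = {s:.2f}")
```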
Key to event histories is the ‘state space’
[Figure: episodes within the state space (FT work, PT work, Not in work): lifetime work histories for 3 adults born 1935, plotted over 1950-2000]
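Event history analysis works with episodes (spells) rather than with year-by-year records, so a common data management step is collapsing a state sequence into episodes with start and end dates. A minimal Python sketch under stated assumptions: yearly observations and a made-up work history using the figure's three states; it is not one of the three histories plotted.

```python
from itertools import groupby


def to_episodes(start_year, yearly_states):
    """Collapse a year-by-year state sequence into episodes (state, start, end),
    where end is exclusive, i.e. the year the next episode begins."""
    episodes = []
    year = start_year
    for state, run in groupby(yearly_states):
        length = sum(1 for _ in run)
        episodes.append((state, year, year + length))
        year += length
    return episodes


# Hypothetical work history observed yearly from 1950
history = ["FT work"] * 12 + ["Not in work"] * 6 + ["PT work"] * 10 + ["FT work"] * 4
for episode in to_episodes(1950, history):
    print(episode)
```

Each episode's length (end minus start) is then the duration analysed, with the last episode typically treated as censored if it is still in progress at the final interview.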
Example: Cox regression (SPSS example at www.longitudinal.stir.ac.uk)
Cox regression estimates: risks of quicker exit from first employment state of BHPS adults

                               B      SE     Sig.
Female                       .194    .081    .017
Self-employed               -.617    .179    .001
Age in 1990                 -.062    .003    .000
Age in 1990 squared          .000    .000    .000
Hope-Goldthorpe scale       -.013    .001    .000
Female*self-employed         .214    .109    .049
Female*HG scale             -.003    .002    .061
Self-employed*HG scale       .000    .004    .897
Female*Age in 1990           .006    .001    .000
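Because the model includes interactions with Female, the Female coefficient (.194) is the log hazard ratio for women only where the interacting variables equal zero. The Python sketch below combines the printed (rounded) coefficients into a women-versus-men hazard ratio for chosen covariate values; terms not involving Female cancel in this comparison, including the age-squared term. The covariate profiles, and the assumption that HG scale and age enter in their raw units, are hypothetical.

```python
import math

# Coefficients (B) for the terms involving Female, from the Cox regression table
B_FEMALE = 0.194
B_FEMALE_X_SELFEMP = 0.214
B_FEMALE_X_HG = -0.003
B_FEMALE_X_AGE = 0.006


def female_hazard_ratio(self_employed, hg_score, age_1990):
    """Hazard ratio for women vs otherwise identical men implied by the
    interactions. Terms without Female cancel, so only the Female main effect
    and its interactions enter the log hazard ratio."""
    log_hr = (B_FEMALE
              + B_FEMALE_X_SELFEMP * self_employed
              + B_FEMALE_X_HG * hg_score
              + B_FEMALE_X_AGE * age_1990)
    return math.exp(log_hr)


# Hypothetical profiles: (self-employed 0/1, HG scale score, age in 1990)
for profile in [(0, 50, 40), (1, 50, 40), (0, 30, 25)]:
    print(profile, round(female_hazard_ratio(*profile), 3))
```

Note that the Female*HG term is not significant at the 5% level (Sig. .061), so its contribution here is uncertain.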
Time series data
Statistical summary of one particular concept, collected at repeated time points from one or more subjects
Examples:
• Unemployment rates by year in the UK
• University entrance rates by year by country
Comments:
– Panel = many variables, few time points (= ‘cross-sectional time series’ to economists)
– Time series = few variables, many time points
– Descriptive analyses, e.g. charts of statistics over time
– Advanced modelling analyses typically involve including ‘autoregressive’ terms (e.g. lag effects) amongst explanatory factors
[Data type: 6/6]
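An ‘autoregressive’ term is the outcome's own lagged value used as an explanatory factor. The simplest case, an AR(1) model fitted by ordinary least squares, is sketched below in Python on a synthetic series generated exactly by y_t = 2 + 0.5 y_{t-1}; real time series work would use dedicated tools and check issues such as stationarity, which this toy ignores.

```python
def lag(series, k=1):
    """Shift a series by k >= 1 periods; the first k positions have no lag (None)."""
    return [None] * k + series[:-k]


def fit_ar1(series):
    """OLS fit of y_t = a + b * y_{t-1}; returns (a, b)."""
    pairs = [(x, y) for x, y in zip(lag(series), series) if x is not None]
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    b = sum((x - mx) * (y - my) for x, y in pairs) / sum((x - mx) ** 2 for x, _ in pairs)
    return my - b * mx, b


# Synthetic series generated by y_t = 2 + 0.5 * y_{t-1}, starting from 10
series = [10.0]
for _ in range(8):
    series.append(2 + 0.5 * series[-1])

print(lag(series)[:3])   # [None, 10.0, 7.0]
print(fit_ar1(series))   # approximately (2.0, 0.5)
```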
….Six types of data/analysis…!
0. Temporal effects in cross-sectional data
1. Repeated cross-sections
2. Panel datasets
3. Cohort studies
4. Event history datasets
5. Time series analyses
2. Data management issues
• Working with longitudinal survey data is made more challenging by important issues of ‘data management’:
  – Variable operationalisations for comparisons, e.g. strategies for standardisation and harmonisation
  – Linking datasets internally to a study
  – Linking with other datasets to enhance analysis
  [Value of organising your data and files, e.g. Long, 2009]
  – Recognising data structure in analysis, e.g. missing data; survey effects; modelling specifications
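Harmonisation and linkage can be sketched in a few lines of Python: map each source's own codes onto one shared scheme, failing loudly on unmapped codes, then attach fields from another file by a shared identifier. The crosswalk codes, source names and the extra file below are hypothetical; real harmonisation schemes involve far more detailed decisions than this toy mapping.

```python
# Hypothetical crosswalks from each source's own education codes to one shared scheme
CROSSWALKS = {
    "survey_a": {"higher degree": "degree", "first degree": "degree",
                 "gce a levels": "school", "gce o levels or equiv": "school",
                 "no qf": "none"},
    "survey_b": {1: "degree", 2: "school", 3: "none"},
}


def harmonise(source, code):
    """Return the shared category, failing loudly on unmapped codes."""
    try:
        return CROSSWALKS[source][code]
    except KeyError:
        raise ValueError(f"no harmonised category for {code!r} in {source}") from None


def link(records, lookup, key):
    """Left join: add fields from lookup[record[key]] to each record, if present."""
    return [{**rec, **lookup.get(rec[key], {})} for rec in records]


people = [
    {"pid": 1, "source": "survey_a", "educ": "first degree"},
    {"pid": 2, "source": "survey_b", "educ": 3},
]
for p in people:
    p["educ_h"] = harmonise(p["source"], p["educ"])

other_file = {1: {"region": "north"}}  # hypothetical second file keyed on pid
linked = link(people, other_file, "pid")
print(linked)
```

Raising an error on unmapped codes, rather than silently returning a default, is a deliberate choice: unexpected codes in a new wave are exactly the comparability problems worth catching early.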
[..and then there’s another thing..]
Dealing with complex data
In the UK we host many projects and centres which help to enable the analysis of complex longitudinal data for social science research:
– Specifying suitably complex statistical models
  • Examples at the Centre for Multilevel Modelling (‘E-Stat’, a generic tool for specifying advanced models; Realcom, for analysing longitudinal missing data); the Lancaster-Warwick-Stirling NCRM Node; ULSC (Essex) on survey design effects
– Resources on accessing and handling complex data
  • e.g. ESDS; ADMIN Node; Obesity e-lab; DAMES Node
  • ..Session 17 in yesterday’s programme..
My own pet project concerns the comparability of variables over time... (see www.dames.org.uk)
[Figure: Goldthorpe class scheme harmonised over time; percent of year category in each class (Salariat, Non-manual, Petty-bourg., Skilled manual, Unskilled). Source: Females from LFS/GHS, using data from Li and Heath (2008)]
[Figure: category means of father's Cambridge Scale score for the categories of three variables:
occupational group (Managers and Administrators; Professional; Associate professional and technical; Clerical and secretarial; Craft and related; Personal and protective services; Sales; Plant and machine operatives; Other occupations);
highest qualification (higher degree; first degree; teaching qf; other higher qf; nursing qf; gce a levels; gce o levels or equiv; commercial qf, no o levels; cse grade 2-5, scot grade 4-5; apprenticeship; other qf; no qf);
ethnic group (white; black-carib; black-african; black-other; indian; pakistani; bangladeshi; chinese; other ethnic grp).
Source: British Household Panel Survey 2007, adults aged 18+ and father's Cambridge Scale score. Points at 1-3 show category mean. Points at 0 show individual values (scaled mean=28, sd=6; pop. mean=28, sd=18).]
…‘Effect proportional scaling’ using parents’ occupational advantage
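One simple reading of this idea, sketched below in Python, scores each category by the mean of a criterion variable (here, father's occupational score) among its members, and gives each person their category's score. This is an assumption about the approach for illustration; the DAMES procedure may involve further steps such as rescaling, and the data are made up. The toy reproduces the qualitative pattern in the figure caption: the overall mean is preserved while the spread shrinks.

```python
from collections import defaultdict
from statistics import mean, pstdev


def category_means(categories, criterion):
    """Score each category by the mean of a criterion variable among its members."""
    groups = defaultdict(list)
    for cat, value in zip(categories, criterion):
        groups[cat].append(value)
    return {cat: mean(values) for cat, values in groups.items()}


# Toy data: a qualification category and a made-up father's occupational score
cats = ["no qf", "no qf", "first degree", "first degree", "higher degree"]
father_score = [14, 30, 34, 50, 50]

scores = category_means(cats, father_score)
scaled = [scores[c] for c in cats]  # each person gets their category's score
print(scores)
print(mean(scaled), mean(father_score))      # overall mean is preserved
print(pstdev(scaled), pstdev(father_score))  # spread is smaller after scaling
```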
3. Some closing comments on the analysis of longitudinal survey data
Why bother with all this..?
– Focus on change / stability
– Focus on the life course: distinguishing age, period and cohort effects; career trajectories / life course sequences
– Focus on time / durations: the substantive role of durations (e.g. unemployment)
– Getting the ‘full picture’: causality and residual heterogeneity; examining multivariate relationships; representative conclusions
[e.g. Abbott 2006; Mayer 2005; Menard 2002; Baltagi 2001; Rose 2000; Dale and Davies 1994; Hannan and Tuma 1979; Moser 1958]
Research traditions
• ‘geographers study space and economists study time’ [adage quoted in Fotheringham et al. 2000: 245]
  – There is a vast economics literature using techniques for temporal analysis
  – Other social science disciplines are to some degree catching up
  – Methodological research on longitudinal models and data quality does, however, cross-cut disciplines [e.g. Dale and Davies, 1994]
• Data expansions from c.1990 have supported more encompassing models and new substantive application areas, for example:
  – [Platt 2005] ethnic minorities’ social mobility, 1971-2001
  – [Pahl & Pevalin 2005] friendship patterns over time
  – [Verbakel & de Graaf 2008] spouses’ effects on careers, 1941-2003
• …One challenge is getting used to talking about time in a more disciplined way: e.g. traditional sociological characterisations of ‘the past’ and ‘social change’ may not be empirically satisfactory
What’s exciting in the analysis of longitudinal social survey data?
• A personal view:
By and large, the core analytical & methodological issues have been recognised for some time
What is exciting is the rapid expansion of secondary quantitative longitudinal data, its quality, its volume and its accessibility
(a) new data
(b) new tools for accessing, handling and modelling large and complex data
References
• Abbott, A. (2006). 'Mobility: What? When? How?' in Morgan, S.L., Grusky, D.B. and Fields, G.S. (eds.) Mobility and Inequality. Stanford: Stanford University Press.
• Baltagi, B.H. (2001). Econometric Analysis of Panel Data. New York: Wiley.
• Blossfeld, H.P. and Rohwer, G. (2002). Techniques of Event History Modelling: New Approaches to Causal Analysis, 2nd Edition. Mahwah, NJ: Lawrence Erlbaum Associates.
• Blossfeld, H.P., Golsch, K. and Rohwer, G. (2007). Event History Analysis with Stata. New York: Lawrence Erlbaum.
• Davies, R.B. (1994). 'From Cross-Sectional to Longitudinal Analysis' in Dale, A. and Davies, R.B. (eds.) Analysing Social and Political Change: A Casebook of Methods. London: Sage.
• Fotheringham, A.S., Brunsdon, C. and Charlton, M. (2000). Quantitative Geography: Perspectives on Spatial Data Analysis. London: Sage.
• Glenn, N.D. (2005). Cohort Analysis, 2nd Edition. London: Sage.
• Hannan, M.T. and Tuma, N.B. (1979). Methods for Temporal Analysis. Annual Review of Sociology, 5, 303-328.
• Li, Y. and Heath, A.F. (2008). Socio-Economic Position and Political Support of Black and Ethnic Minority Groups in the United Kingdom, 1972-2005 [computer file]. 2nd Ed. Colchester, Essex: UK Data Archive [distributor], SN: 5666.
• Long, J.S. (2009). The Workflow of Data Analysis using Stata. Boca Raton, Texas:
• Martin, J., Bynner, J., Kalton, G., Boyle, P., Goldstein, H., Gayle, V., Parsons, S. and Piesse, A. (2006). Strategic Review of Panel and Cohort Studies. London: Longview, and www.longviewuk.com/
• Mayer, K.U. (2005). 'Life courses and life chances in a comparative perspective' in Svallfors, S. (ed.) Analyzing Inequality: Life Chances and Social Mobility in Comparative Perspective. Stanford: Stanford University Press.
• Menard, S. (2002). Longitudinal Research, 2nd Edition. London: Sage, Number 76 in Quantitative Applications in the Social Sciences Series.
• Moser, C.A. (1958). Survey Methods in Social Investigation. London: Heinemann.
• Pahl, R. and Pevalin, D. (2005). Between family and friends: a longitudinal study of friendship choice. British Journal of Sociology, 56(3), 433-450.
• Platt, L. (2005). Migration and Social Mobility: The Life Chances of Britain's Minority Ethnic Communities. Bristol: The Policy Press.
• Rose, D. (2000). Researching Social and Economic Change: The Uses of Household Panel Studies. London: Routledge.
• Taris, T.W. (2000). A Primer in Longitudinal Data Analysis. London: Sage.
• Treiman, D.J. (2009). Quantitative Data Analysis: Doing Social Research to Test Ideas. New York: Jossey-Bass.
• Verbakel, E. and de Graaf, P.M. (2008). Resources of the Partner: Support or Restriction in the Occupational Career Developments in the Netherlands Between 1940 and 2003. European Sociological Review, 24(1), 81-95.