Data Collection, Harmonisation and Storage (An international perspective)
Jon Johnson (CLS, Senior Database Manager)
CLS is an ESRC Resource Centre based at the Institute of Education
Contents
1 Introduction
2 Survey Data ‘production line’
3 Data Management Compared
4 National Longitudinal Surveys
5 PSID and HRS (USA)
6 MCS, NCDS and BCS70 (UK)
7 LISS Panel (Netherlands)
8 Management strategies compared
9 Storage, maintenance and output
10 Meta Data Standards
11 New Requirements
Introduction
In November 2008 CLS (MCS, NCDS, BCS70) and ULSC (BHPS, Understanding Society) were commissioned by the ESRC, as part of Objective 5 of the Survey Resources Network, to:
• Examine potential efficiencies in data management processes, particularly in relation to data management software
• Examine the use of cutting-edge data collection methods for longitudinal surveys carried out at CLS/ULSC
A wide-ranging review of the survey data process was completed and submitted to the ESRC in November 2009.
www.cls.ioe.ac.uk
Survey Data ‘production line’
Data Management Compared
Various strategies to cope with the complex data flows of survey collection, management and dissemination:
Final report will be available from http://surveynet.ac.uk/sms/introduction.asp
Highly Integrated : National Longitudinal Surveys (USA)
Partnership : PSID and HRS (USA)
Contracted : MCS, NCDS, BCS70, BHPS and USoc (UK)
Loosely Integrated : LISS Panel (Netherlands)
National Longitudinal Surveys (USA)
Over more than two decades the NLS has developed in-house software to capture the survey.
More recently they have integrated this into a turnkey solution where the storage of the survey is itself a mirror of the data collection instrument.
Based on a highly normalised Oracle database, a snapshot of the data is auto-processed and made available to researchers on a “create your own dataset” basis, and is then turned into standard flat datasets for use by researchers.
Ref: http://www.chrr.ohio-state.edu/
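To illustrate the idea of deriving a flat dataset from normalised storage, here is a minimal Python sketch. It is not the NLS software: the row layout (respondent, round, question, answer), the sample values and the `flatten` helper are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical normalised rows: (respondent_id, survey_round, question, answer)
rows = [
    (1, 1979, "employed", "yes"),
    (1, 1980, "employed", "no"),
    (2, 1979, "employed", "yes"),
    (2, 1979, "region", "north"),
]

def flatten(rows, wanted):
    """Return one flat record per respondent, one column per (question, round) in `wanted`."""
    columns = [f"{question}_{rnd}" for question, rnd in wanted]
    # Every respondent starts with all requested columns set to None (missing).
    flat = defaultdict(lambda: dict.fromkeys(columns))
    for rid, rnd, question, answer in rows:
        if (question, rnd) in wanted:
            flat[rid][f"{question}_{rnd}"] = answer
    return [{"id": rid, **record} for rid, record in sorted(flat.items())]

# A researcher "creates their own dataset" by choosing the variables and rounds.
dataset = flatten(rows, wanted=[("employed", 1979), ("employed", 1980)])
```

Respondents with no answer for a requested column keep `None` there, which is how the sketch represents missing data in the flat output.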
PSID and HRS (USA)
Both the Panel Study of Income Dynamics (PSID) and the Health and Retirement Study (HRS) utilise the in-house resources of the Survey Research Center, which provides survey data collection resources primarily to studies based at the University of Michigan.
Survey instrument design is closely linked to both the PI and data management teams, with Blaise used for data collection.
Data is prepared internally using SAS and processed for download as packaged datasets from PSID and also from ICPSR.
Ref: http://psidonline.isr.umich.edu/ and http://hrsonline.isr.umich.edu/
MCS, NCDS and BCS70 (UK)
CLS is responsible for the specification of the instruments and data output, which is implemented by a third-party survey organisation.
Data is further processed within CLS using SIR and provided to researchers as packaged datasets for download from the ESDS Data Archive. Meta-data is harvested from the CAI instrumentation and held in an SQL database for generation of HTML web pages directly from DDI 2.0 XML.
Ref: http://www.cls.ioe.ac.uk and http://www.cls.ioe.ac.uk/datadictionary
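As a rough sketch of generating HTML pages from DDI-style XML, the Python below renders one page fragment per variable. It is not the CLS pipeline: the XML is a simplified, namespace-free fragment that borrows the DDI Codebook element names `var`, `labl`, `catgry` and `catValu`, and the sample variable and the `variable_to_html` helper are assumptions.

```python
import xml.etree.ElementTree as ET
from html import escape

# Simplified, namespace-free fragment in the style of a DDI 2.x codebook (illustrative only).
DDI_FRAGMENT = """
<codeBook>
  <dataDscr>
    <var name="sex">
      <labl>Sex of cohort member</labl>
      <catgry><catValu>1</catValu><labl>Male</labl></catgry>
      <catgry><catValu>2</catValu><labl>Female</labl></catgry>
    </var>
  </dataDscr>
</codeBook>
"""

def variable_to_html(var):
    """Render one <var> element as a small HTML fragment."""
    name = escape(var.get("name", ""))
    label = escape(var.findtext("labl", default=""))
    items = "".join(
        "<li>{}: {}</li>".format(
            escape(cat.findtext("catValu", default="")),
            escape(cat.findtext("labl", default="")),
        )
        for cat in var.findall("catgry")
    )
    return f"<h2>{name}</h2><p>{label}</p><ul>{items}</ul>"

root = ET.fromstring(DDI_FRAGMENT)
# One HTML fragment per variable, keyed by variable name.
pages = {var.get("name"): variable_to_html(var) for var in root.iter("var")}
```

Real DDI files carry XML namespaces and many more elements, so a production version would need namespace-aware lookups.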
LISS Panel (Netherlands)
The LISS Panel is primarily a web-based survey, which uses a layer over Blaise, with a dedicated survey instrument programming section closely linked to the survey design team.
Data is produced from Blaise, managed in SPSS and provided as prepared datasets for researchers to download from LISS.
A separate SQL metadata database, based on DDI 3.0, is used to provide navigation and generate the codebook etc.
Ref: http://www.lissdata.nl/lissdata/Homec
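The idea of keeping metadata in its own SQL database and generating a codebook from it can be sketched in Python with an in-memory SQLite database. The two-table schema, the sample variables and the `codebook` function are hypothetical; they are far simpler than DDI 3.0 and are not the LISS schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical minimal metadata schema: variables and their answer categories.
conn.executescript("""
CREATE TABLE variable (name TEXT PRIMARY KEY, label TEXT, wave INTEGER);
CREATE TABLE category (variable TEXT REFERENCES variable(name), value INTEGER, label TEXT);
""")
conn.executemany(
    "INSERT INTO variable VALUES (?, ?, ?)",
    [("health", "Self-rated health", 1), ("income", "Net monthly income", 1)],
)
conn.executemany(
    "INSERT INTO category VALUES (?, ?, ?)",
    [("health", 1, "Poor"), ("health", 2, "Good")],
)

def codebook(conn, wave):
    """Build a plain-text codebook for one wave from the metadata tables."""
    lines = []
    variables = conn.execute(
        "SELECT name, label FROM variable WHERE wave = ? ORDER BY name", (wave,)
    ).fetchall()
    for name, label in variables:
        lines.append(f"{name}: {label}")
        categories = conn.execute(
            "SELECT value, label FROM category WHERE variable = ? ORDER BY value", (name,)
        ).fetchall()
        for value, cat_label in categories:
            lines.append(f"  {value} = {cat_label}")
    return "\n".join(lines)

text = codebook(conn, wave=1)
```

Because the metadata lives apart from the survey data, the same tables could drive both web navigation and the printed codebook.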
Management strategies compared
All studies face the same challenges
1. Complex data
2. Data description handling
3. Management of meta-data
4. Myriad audiences
5. Longitudinal consistency
6. Resource constraints
7. Re-purposing of data
All in one basket approach
NLS NHANES
Data and Meta-data separated
LISS / PSID / HRS MCS / NCDS / BCS / BHPS / USoc
Storage, maintenance, output
Cleaning your data
• Cohort data continually evolves
• 2-3% of people mis-report sex
• Interviewers mis-key data
• Data entry clerks mis-key data
• Respondents mis-understand questions
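One cleaning check suggested by the points above is to compare a fixed characteristic, such as sex, across sweeps and flag disagreements for review. The Python sketch below assumes a hypothetical record layout (person, sweep, reported sex) and a majority-vote rule; neither comes from the studies described here.

```python
from collections import Counter

# Hypothetical records: (person_id, sweep, reported_sex)
records = [
    ("A01", 1, "F"), ("A01", 2, "F"), ("A01", 3, "M"),
    ("B02", 1, "M"), ("B02", 2, "M"),
]

def flag_inconsistent(records):
    """Return {person_id: [sweeps]} where the reported sex differs from that person's majority value."""
    by_person = {}
    for pid, sweep, sex in records:
        by_person.setdefault(pid, []).append((sweep, sex))
    flags = {}
    for pid, observations in by_person.items():
        counts = Counter(sex for _, sex in observations)
        if len(counts) > 1:  # more than one distinct value was reported
            majority, _ = counts.most_common(1)[0]
            flags[pid] = [sweep for sweep, sex in observations if sex != majority]
    return flags

flags = flag_inconsistent(records)
```

A majority vote is only a starting point: tied cases and recent changes would still need a manual decision by data management staff.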
Outputting and deriving data
Synchronizing changes, derivations and internal consistency (e.g. geographical identifiers), and outputting in the best format for research, are functions best done by database staff.
Meta Data Standards
The Data Documentation Initiative has emerged as the front runner as the basis for an international standard
1. Existing foothold is limited
2. Lacks sufficient support for longitudinal studies
3. Provides at least a minimum of data which would enable international cross-cohort data discovery
Can we establish a ‘Dublin Core’ for longitudinal / birth cohort surveys?
New Requirements
• Video / audio
• Genetics
• Web capture e.g. social networks
• Paper Archives
• Record Linkage
• Biological measures
• Data security (ISO27001)
• Disclosure control
Any questions?
Institute of Education
University of London
20 Bedford Way
London WC1H 0AL
Tel +44 (0)20 7612 6000
Fax +44 (0)20 7612 6126
Email [email protected]
www.ioe.ac.uk