Data Collection, Harmonisation and Storage (An international perspective)

Jon Johnson (CLS, Senior Database Manager)

CLS is an ESRC Resource Centre based at the Institute of Education

Contents

1 Introduction

2 Survey Data ‘production line’

3 Data Management Compared

4 National Longitudinal Surveys

5 PSID and HRS (USA)

6 MCS, NCDS and BCS70 (UK)

7 LISS Panel (Netherlands)

8 Management strategies compared

9 Storage, maintenance and output

10 Meta Data Standards

11 New Requirements

Introduction

In November 2008 CLS (MCS, NCDS, BCS70) and ULSC (BHPS, Understanding Society) were commissioned by the ESRC, as part of Objective 5 of the Survey Resources Network, to:

• Examine potential efficiencies in data management processes, particularly in relation to data management software
• Examine the use of cutting-edge data collection methods for longitudinal surveys carried out at CLS/ULSC

A wide-ranging review of the survey data process was completed and submitted to the ESRC in November 2009.

www.cls.ioe.ac.uk

Survey Data ‘production line’

Data Management Compared

Various strategies are used to cope with the complex data flows of survey collection, management and dissemination:

Highly integrated: National Longitudinal Surveys (USA)

Partnership: PSID and HRS (USA)

Contracted: MCS, NCDS and BCS70, BHPS, USoc (UK)

Loosely integrated: LISS Panel (Netherlands)

Final report will be available from http://surveynet.ac.uk/sms/introduction.asp

National Longitudinal Surveys (USA)

Over more than two decades the NLS has developed in-house software to capture the survey.

More recently they have integrated this into a turnkey solution where the storage of the survey is itself a mirror of the data collection instrument.

The data is held in a highly normalised Oracle database. A snapshot is auto-processed and made available to researchers on a “create your own dataset” basis, and is then turned into standard flat datasets for use by researchers.

Ref: http://www.chrr.ohio-state.edu/
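The step from a normalised store to a flat research dataset can be sketched roughly as follows. This is only an illustration of the general idea, not the NLS implementation: it uses an in-memory SQLite database in place of Oracle, and the table `response`, its columns and the function `build_flat_dataset` are all hypothetical names.

```python
import sqlite3


def build_flat_dataset(conn, variables):
    """Pivot long-format responses into one flat record per respondent.

    Assumes a hypothetical table response(respondent_id, variable, value).
    """
    placeholders = ",".join("?" for _ in variables)
    rows = conn.execute(
        "SELECT respondent_id, variable, value FROM response "
        f"WHERE variable IN ({placeholders}) ORDER BY respondent_id",
        variables,
    ).fetchall()

    flat = {}
    for respondent_id, variable, value in rows:
        # Start each record with every requested variable set to None,
        # so missing answers remain visible in the flat output.
        record = flat.setdefault(
            respondent_id,
            {"respondent_id": respondent_id, **{v: None for v in variables}},
        )
        record[variable] = value
    return [flat[rid] for rid in sorted(flat)]


# Illustrative data only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE response (respondent_id INTEGER, variable TEXT, value TEXT)"
)
conn.executemany(
    "INSERT INTO response VALUES (?, ?, ?)",
    [
        (1, "sex", "F"), (1, "region", "North"), (1, "income", "21000"),
        (2, "sex", "M"), (2, "income", "18500"),
    ],
)

# A researcher picks the variables they want in their own dataset.
flat_rows = build_flat_dataset(conn, ["sex", "income"])
```

Each entry in `flat_rows` is one respondent with one column per requested variable, which is the shape most statistical packages expect.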

PSID and HRS (USA)

Both the Panel Study of Income Dynamics (PSID) and the Health and Retirement Study (HRS) use the in-house resources of the Survey Research Center, which provides survey data collection resources primarily to studies based at the University of Michigan.

Survey instrument design is closely linked to both the PI and data management teams, using Blaise for data collection.

Data is prepared internally using SAS and processed into packaged datasets for download from PSID and also from ICPSR.

Ref: http://psidonline.isr.umich.edu/ and http://hrsonline.isr.umich.edu/

MCS, NCDS and BCS70 (UK)

CLS is responsible for the specification of the instruments and data output, which are implemented by a third-party survey organisation.

Data is further processed within CLS using SIR and provided to researchers as packaged datasets for download from the ESDS Data Archive.

Meta-data is harvested from the CAI instrumentation and held in an SQL database for generation of HTML web pages directly from DDI 2.0 XML.

Ref: http://www.cls.ioe.ac.uk and http://www.cls.ioe.ac.uk/datadictionary
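As a rough illustration of generating HTML from DDI-style XML, the sketch below turns a small codebook fragment into a definition list. It is not the CLS data dictionary code. The sample uses the DDI Codebook element names `var`, `labl`, `catgry` and `catValu`, but it is a simplified fragment without XML namespaces, and the function name `codebook_to_html` and the sample variables are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from html import escape

# Simplified, namespace-free fragment in the style of a DDI 2.x codebook.
SAMPLE_CODEBOOK = """\
<codeBook>
  <dataDscr>
    <var name="SEX">
      <labl>Sex of cohort member</labl>
      <catgry><catValu>1</catValu><labl>Male</labl></catgry>
      <catgry><catValu>2</catValu><labl>Female</labl></catgry>
    </var>
    <var name="REGION">
      <labl>Region of residence</labl>
    </var>
  </dataDscr>
</codeBook>
"""


def codebook_to_html(xml_text):
    """Render each variable, its label and its categories as an HTML list."""
    root = ET.fromstring(xml_text)
    parts = ["<dl>"]
    for var in root.iter("var"):
        name = escape(var.get("name", ""))
        # findtext("labl") matches only the direct child, i.e. the variable label.
        label = escape(var.findtext("labl", default=""))
        parts.append(f"<dt>{name}</dt><dd>{label}")
        categories = var.findall("catgry")
        if categories:
            parts.append("<ul>")
            for cat in categories:
                value = escape(cat.findtext("catValu", default=""))
                cat_label = escape(cat.findtext("labl", default=""))
                parts.append(f"<li>{value}: {cat_label}</li>")
            parts.append("</ul>")
        parts.append("</dd>")
    parts.append("</dl>")
    return "\n".join(parts)


page = codebook_to_html(SAMPLE_CODEBOOK)
```

A real DDI instance would normally declare a namespace and carry far more detail (question text, universe, summary statistics), so the element lookups would need adjusting accordingly.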

LISS Panel (Netherlands)

The LISS Panel is primarily a web-based survey. It uses a layer over Blaise, with a dedicated survey instrument programming section closely linked to the survey design team.

Data is produced from Blaise, managed in SPSS and provided to researchers as prepared datasets for download from LISS.

A separate SQL metadata database, based on DDI 3.0, is used to provide navigation, generate the codebook, etc.

Ref: http://www.lissdata.nl/lissdata/Homec

Management strategies compared

All studies face the same challenges

1. Complex data
2. Data description handling
3. Management of meta-data
4. Myriad audiences
5. Longitudinal consistency
6. Resource constraints
7. Re-purposing of data

All in one basket approach

NLS, NHANES

Data and Meta-data separated

LISS / PSID / HRS

MCS / NCDS / BCS / BHPS / USoc

Storage, maintenance, output

Cleaning your data

• Cohort data continually evolves
• 2-3% of people mis-report sex
• Interviewers mis-key data
• Data entry clerks mis-key data
• Respondents misunderstand questions

Outputting and deriving data

Synchronising changes, derivations and internal consistency (e.g. of geographical identifiers), and outputting data in the best format for research, are functions best carried out by database staff.
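One kind of longitudinal consistency check implied by the cleaning issues above is to flag cohort members whose recorded sex differs between sweeps, so that the records can be reviewed by hand. The sketch below is a minimal illustration under assumed names: the record layout `(respondent_id, sweep, sex)`, the identifiers and the function `find_inconsistent_sex` are hypothetical, and it makes no attempt to decide which value is correct.

```python
from collections import defaultdict


def find_inconsistent_sex(records):
    """Return {respondent_id: {sweep: sex}} for members whose sex varies."""
    by_respondent = defaultdict(dict)
    for respondent_id, sweep, sex in records:
        by_respondent[respondent_id][sweep] = sex
    return {
        rid: sweeps
        for rid, sweeps in by_respondent.items()
        if len(set(sweeps.values())) > 1  # more than one distinct value recorded
    }


# Illustrative records only: (respondent_id, sweep, recorded sex).
records = [
    ("A001", 1, "F"), ("A001", 2, "F"), ("A001", 3, "F"),
    ("A002", 1, "M"), ("A002", 2, "F"), ("A002", 3, "M"),
    ("A003", 1, "M"), ("A003", 2, "M"),
]

flagged = find_inconsistent_sex(records)
```

Here only `A002` is flagged. Whether sweep 2 is a mis-key or a mis-report is a judgement for data management staff, which is why the check reports the discrepancy rather than correcting it.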

Meta Data Standards

The Data Documentation Initiative (DDI) has emerged as the front runner as the basis for an international standard

1. Existing foothold is limited

2. Lacks sufficient support for longitudinal studies

3. Provides at least a minimum of data which would enable international cross-cohort data discovery

Can we establish a ‘Dublin Core’ for longitudinal / birth cohort surveys?

New Requirements

• Video / audio
• Genetics
• Web capture e.g. social networks
• Paper Archives
• Record Linkage
• Biological measures

• Data security (ISO 27001)
• Disclosure control

Any questions?

Institute of Education
University of London
20 Bedford Way
London WC1H 0AL

Tel +44 (0)20 7612 6000
Fax +44 (0)20 7612 6126
Email [email protected]
www.ioe.ac.uk