Experiences of managing Birth Cohort Data at CLS
Jon Johnson (Senior Database Manager)
CLS is an ESRC Resource Centre based at the Institute of Education
Contents
1 Introduction
2 (Pre) History
3 Centralised Computing
4 Semi-centralised computing
5 Personal Computing
6 Consequences
7 Survey Data ‘production line’
8 Requirements
9 Potential Database strategies
10 Staffing and skills
Introduction
CLS has been an ESRC Resource Centre since 2005. We are responsible for three of the four British Birth Cohort studies:
•NCDS (1958)
•BCS70 (1970)
•MCS (2000)
NSHD (1946) is funded by MRC at UCL.
www.cls.ioe.ac.uk
(Pre) History
NCDS has its origins in the Perinatal Mortality Survey. Sponsored by the National Birthday Trust Fund, this was designed to examine the social and obstetric factors associated with stillbirth and death in early infancy among the children born in Great Britain in one week of 1958. It was a ‘follow-up’ to the 1946 study, with a similar scope.
BCS70 began as the British Births Survey (BBS), and it was sponsored by the National Birthday Trust Fund in association with the Royal College of Obstetricians and Gynaecologists to follow up the 1958 study.
MCS was specifically designed as a longitudinal survey to follow up on the three previous birth surveys.
Centralised Computing
“If one had coded and tried to use all the information received from the 68 questions it is calculated that the results could have been expressed in a vast number of permutations probably in the region of 10 to 480th power” Perinatal Mortality (1963)
Four years after the data collection, the tabulations were eventually finalised.
Things got faster ...
“The first batch of coded forms were sent for punching in October 1970 ... 113,994 punch cards there being a minimum of 6 cards per case. The punching was completed in November 1971”
Researchers were reliant on the DP and computer professionals to generate tabulations.
Semi-Centralised Computing
In the mid-1970s, as first SPSS and then other statistical packages became available, researchers had the opportunity to analyse the data themselves on the central computer, using data prepared and marshalled by the DP and computer scientists.
Most users still relied on computer professionals to retrieve and tabulate data.
Personal Computing (c1984)
With a powerful 386 computer on their desk and a copy of SPSS, researchers could take the raw data and manipulate it for their own purposes.
By the mid-1990s this process had accelerated to the point where all the data from a survey could easily be handled on a single machine, and the need for database professionals could be circumvented.
Consequences
Each study became a series of snapshots, one per survey, making it cumbersome and inefficient to manage as a longitudinal resource:
•Data fragmentation as derivations became disconnected from original data
•Longitudinal linkage discrepancies, e.g. in partnership and fertility histories
•Coding frame discrepancies
•Data security moved from IT to individuals
•Metadata was viewed as being separate from data
The introduction of dependent interviewing would make these problems worse.
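A coding frame discrepancy of the kind listed above can be detected mechanically once the frames for each sweep are held side by side. The following is a minimal sketch only: the two frames, their codes and labels are hypothetical illustrations, not taken from any CLS dataset, and the function name is an assumption.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical coding frames for the same variable in two sweeps
# (illustrative values only, not real cohort study codes).
FRAME_SWEEP_A: Dict[int, str] = {1: "Married", 2: "Cohabiting", 3: "Single"}
FRAME_SWEEP_B: Dict[int, str] = {1: "Married", 2: "Single", 3: "Cohabiting", 4: "Widowed"}


def compare_frames(
    a: Dict[int, str], b: Dict[int, str]
) -> List[Tuple[int, Optional[str], Optional[str]]]:
    """Return (code, label_in_a, label_in_b) for each code whose label
    differs between the two frames or is missing from one of them."""
    issues = []
    for code in sorted(set(a) | set(b)):
        label_a = a.get(code)  # None if the code is absent from frame a
        label_b = b.get(code)  # None if the code is absent from frame b
        if label_a != label_b:
            issues.append((code, label_a, label_b))
    return issues


if __name__ == "__main__":
    for code, label_a, label_b in compare_frames(FRAME_SWEEP_A, FRAME_SWEEP_B):
        print(f"code {code}: sweep A = {label_a!r}, sweep B = {label_b!r}")
```

Here codes 2 and 3 have swapped meanings between sweeps and code 4 exists in only one frame, which is exactly the kind of mismatch that is hard to spot when each sweep lives in its own file.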
Survey Data ‘production line’
[Diagram: stages of the survey data production line: study design, instrument design, instrument realisation, data collection, data processing, data documentation, science]
Requirements
Migrate and restructure the data back into a database to restore integrity and clean discrepancies
Re-derive variables
Integrate metadata with the data
Create longitudinal checking algorithms
Ability to manipulate data in-situ
Log of changes and version control
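As an illustration of what a longitudinal checking algorithm might look like, the sketch below flags cases where a characteristic that should not change over time is recorded differently in different sweeps. It is a simplified assumption-laden example: the record layout, the field names (`case_id`, `sweep`, `sex`) and the sample values are all hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List

# Hypothetical records, one per cohort member per sweep (illustrative only).
RECORDS: List[Dict[str, Any]] = [
    {"case_id": "A001", "sweep": 1, "sex": "F"},
    {"case_id": "A001", "sweep": 2, "sex": "F"},
    {"case_id": "A002", "sweep": 1, "sex": "M"},
    {"case_id": "A002", "sweep": 2, "sex": "F"},  # disagrees with sweep 1
]


def find_inconsistent_cases(
    records: List[Dict[str, Any]], field: str
) -> Dict[str, Dict[int, Any]]:
    """Return, for each case whose value of `field` differs between sweeps,
    a mapping of sweep number to the value recorded in that sweep."""
    values_by_case: Dict[str, Dict[int, Any]] = defaultdict(dict)
    for rec in records:
        values_by_case[rec["case_id"]][rec["sweep"]] = rec[field]
    return {
        case_id: by_sweep
        for case_id, by_sweep in values_by_case.items()
        if len(set(by_sweep.values())) > 1  # more than one distinct value
    }


if __name__ == "__main__":
    print(find_inconsistent_cases(RECORDS, "sex"))
```

A check like this only becomes practical once all sweeps sit in one linked structure, which is the motivation for migrating the data back into a database.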
Staffing and Skills
At CLS we chose to use SIR as our main database and SQL for holding metadata (DDI 2.0 model)
•Existing SIR experience
•Easy to cross-train from SPSS
•Migration of data from SPSS is straightforward
•Security very configurable
•Version control and change log easy to implement
•Derivations, manipulations done in one place
•3 FTE (mix of skills, data management, DBA)
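To make the idea of a change log with version control concrete, here is a generic sketch in Python. It is not how SIR implements this and is not the CLS implementation; the class names, fields and sample data are assumptions chosen for illustration. Every correction is applied in one place and recorded with its old value, new value and reason, so earlier versions can be reconstructed.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class Change:
    """One logged correction (hypothetical structure)."""
    version: int
    case_id: str
    variable: str
    old_value: Any
    new_value: Any
    reason: str


class LoggedDataset:
    """Holds data in one place and records every correction made to it."""

    def __init__(self, data: Dict[str, Dict[str, Any]]) -> None:
        # Copy the input so later corrections do not alter the caller's data.
        self.data = {case_id: dict(values) for case_id, values in data.items()}
        self.log: List[Change] = []

    @property
    def version(self) -> int:
        # Version 0 is the data as loaded; each correction adds one version.
        return len(self.log)

    def correct(self, case_id: str, variable: str, new_value: Any, reason: str) -> Change:
        old_value = self.data[case_id].get(variable)
        self.data[case_id][variable] = new_value
        change = Change(self.version + 1, case_id, variable, old_value, new_value, reason)
        self.log.append(change)
        return change

    def value_at(self, case_id: str, variable: str, version: int) -> Any:
        """Reconstruct a value as it stood at `version` by undoing later changes."""
        value = self.data[case_id].get(variable)
        for change in reversed(self.log):
            if change.version <= version:
                break
            if change.case_id == case_id and change.variable == variable:
                value = change.old_value
        return value


if __name__ == "__main__":
    ds = LoggedDataset({"A002": {"sex": "F"}})
    ds.correct("A002", "sex", "M", reason="Conflicts with birth record")
    print(ds.version, ds.data["A002"]["sex"], ds.value_at("A002", "sex", 0))
```

Keeping corrections and their history in the same store as the data is what avoids the fragmentation described earlier, where derived files drift away from the originals.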
Any questions?
Institute of Education
University of London
20 Bedford Way
London WC1H 0AL
Tel +44 (0)20 7612 6000
Fax +44 (0)20 7612 6126
Email [email protected]
www.ioe.ac.uk