www.cls.ioe.ac.uk
Return from Anarchy
Jon Johnson
11 May 2005
Migrating from SPSS to SIR
Introduction
- CLS runs 3 of the 4 British Birth Cohort Studies
- Multi-disciplinary study of the life-course of three generations born in 1958, 1970 and 2000
- Data collected in various ways: paper, CAPI, administrative data
- Complex data: 100,000 variables, 18,000 participants per study
History
- Punch cards, different data centres, SIR, SPSS
- The data has been through the range of data storage fashions
- Social science versus medical data access models
- Goal of increased accessibility and understanding of relationships within data
- Development of social science metadata standards
Current Data Collection
- Data collection methods such as CAPI have a negative and a positive side
  - Data is pre-punched
  - Data is pre-checked
  - Data is less understandable
  - Data is more complicated
- Recent data supplied for one sweep had more than 100,000 variables
Taming data
- Datasets are routinely supplied in SPSS format
- SPSS is not an ideal environment to manage such data
- SIR is an ideal environment to manage this data
Data Migration with minimum information loss
- SPSS Data List
  - Rarely used, high level of manual intervention
- Visual Basic (a.k.a. SaxBasic)
  - Platform dependent
  - Limited functionality, multi-step process
- ODBC
  - Flaky at best
- Reverse engineer SPSS file
  - SPSS Portable format: a stable, if poorly documented, format
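Reverse engineering the portable format largely means decoding its text encoding of numbers. The following is a minimal sketch, assuming the base-30 numeric encoding (digits 0-9 then A-T, fields ended by "/") described in the PSPP project's notes on the format. `decode_base30_int` is a hypothetical helper for illustration, not a function from spss_parser.py, and it ignores floating-point fields and the file's character-set translation table.

```python
# Digits assumed for base-30 numbers in SPSS portable files: 0-9 then A-T.
BASE30_DIGITS = "0123456789ABCDEFGHIJKLMNOPQRST"


def decode_base30_int(field: str) -> int:
    """Decode a base-30 integer field such as '1A/' (hypothetical helper).

    A trailing '/' terminator and a leading '-' sign are accepted.
    """
    text = field.strip().rstrip("/")
    negative = text.startswith("-")
    if negative:
        text = text[1:]
    if not text:
        raise ValueError("empty numeric field")
    value = 0
    for ch in text:
        digit = BASE30_DIGITS.find(ch)
        if digit < 0:
            raise ValueError(f"not a base-30 digit: {ch!r}")
        # Shift the running total one base-30 place and add the new digit.
        value = value * 30 + digit
    return -value if negative else value
```

For example, `decode_base30_int("1A/")` gives 1 x 30 + 10. A full parser would use such a decoder to read string lengths, variable counts and data values in sequence.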
Implementation
- PQL, Perl, Python?
- Stable across operating systems
- Good text manipulation
- Good XML support
- Case based databases
How it works
- Parses the SPSS file
- Grabs variable names, value labels, data values etc.
- Looks up a configuration file for BDI settings
- Checks whether it is also setting up the database or just adding a new record
- Does some conversions: time, date, scaled variables
- Does some analysis of the data to grab the range of values; writes out a warning if there are more than 3 missing values or a range of missing values
- Writes out the schema
python spss_parser.py -f <input filename> -s <sir config file> -d <ddi config file>
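Three of these steps can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the spss_parser.py source: the function names and option destinations are hypothetical, only `-f` is assumed to be mandatory, and the date conversion relies on SPSS storing dates as seconds since 14 October 1582.

```python
import argparse
from datetime import datetime, timedelta

# SPSS date values count seconds from midnight on 14 October 1582.
SPSS_EPOCH = datetime(1582, 10, 14)


def parse_args(argv=None):
    """Parse the -f, -s and -d options shown on the slide."""
    parser = argparse.ArgumentParser(prog="spss_parser.py")
    # Assumption: only the input file is mandatory.
    parser.add_argument("-f", dest="input_file", required=True,
                        help="SPSS input file")
    parser.add_argument("-s", dest="sir_config", help="SIR config file")
    parser.add_argument("-d", dest="ddi_config", help="DDI config file")
    return parser.parse_args(argv)


def spss_seconds_to_datetime(seconds):
    """Convert an SPSS date/time value to a Python datetime."""
    return SPSS_EPOCH + timedelta(seconds=seconds)


def missing_value_warnings(name, discrete=(), value_range=None):
    """Return warnings for missing-value declarations that need review.

    Mirrors the rule on the slide: warn on more than 3 discrete
    missing values, or on a range of missing values.
    """
    warnings = []
    if len(discrete) > 3:
        warnings.append(
            f"{name}: {len(discrete)} missing values declared (more than 3)")
    if value_range is not None:
        low, high = value_range
        warnings.append(
            f"{name}: missing values declared as range {low} to {high}")
    return warnings
```

Returning the warnings as a list, rather than printing them, leaves the caller free to write them to a log alongside the generated schema.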
Use
- Once in SIR, the data can be restructured
- Extend to other datasets held in other statistical packages such as Stata or SAS
  - Go via StatTransfer -> SPSS portable format and continue from there
- Also creates XML to add to a data store - superseded !!!