www.cls.ioe.ac.uk

Return from Anarchy

Jon Johnson

11 May 2005

Migrating from SPSS to SIR

Introduction

CLS runs three of the four British birth cohort studies

Multidisciplinary studies of the life course of three generations, born in 1958, 1970 and 2000

Data collected in various ways: paper, CAPI, administrative data

Complex data: 100,000 variables, 18,000 participants per study

History

Punch cards, different data centres, SIR, SPSS: the data has been through the full range of data storage fashions

Social science versus medical data access models

Goal: increased accessibility and understanding of relationships within the data

Development of social science metadata standards

Current Data Collection

Data collection methods such as CAPI have both a positive and a negative side

Positive: data is pre-punched and pre-checked

Negative: data is less understandable and more complicated

Recent data supplied for one sweep had more than 100,000 variables

Taming data

Datasets are routinely supplied in SPSS format

SPSS is not an ideal environment for managing data of this size and complexity

SIR is an ideal environment for managing it

Data Migration with minimum information loss

SPSS Data List: rarely used, high level of manual intervention

Visual Basic (a.k.a. SaxBasic): platform dependent, limited functionality, multi-step process

ODBC: flaky at best

Reverse engineer the SPSS file: SPSS portable format, a stable if poorly documented format
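The portable-format route depends on decoding the file's numbers, which are written in base 30 (digits 0-9, then A-T) and terminated by a slash. Below is a minimal sketch of decoding one number token, assuming the layout described in the PSPP documentation of the portable file format; it is not the author's parser, and the name `decode_base30` is hypothetical. It handles an optional minus sign, a fractional part after `.`, an exponent (a power of 30) introduced by `+` or `-`, and the system-missing marker `*.`. Splitting the 80-column lines into tokens and applying the file's character translation table are left out.

```python
import re

# Hypothetical helper: the regex mirrors the portable-format number layout as
# described in the PSPP docs: [spaces][-]digits[.digits][(+|-)digits]/
_NUMBER = re.compile(r"^\s*(-?)([0-9A-T]*)(?:\.([0-9A-T]*))?(?:([+-])([0-9A-T]+))?/$")


def decode_base30(token):
    """Decode one portable-format number token (e.g. '1.F/') into a float.

    Returns None for the system-missing marker '*.'.
    """
    if token.strip() == "*.":
        return None
    m = _NUMBER.match(token)
    # Reject tokens that do not match or that contain no digits at all.
    if m is None or not (m.group(2) or m.group(3)):
        raise ValueError("not a portable-format number: %r" % token)
    sign, whole, frac, exp_sign, exp_digits = m.groups()
    # int(..., 30) accepts 0-9 and A-T, exactly the base-30 digit set.
    value = float(int(whole, 30)) if whole else 0.0
    if frac:
        # Fraction digits are base-30 places: 'F' after the point is 15/30.
        value += int(frac, 30) / 30 ** len(frac)
    if exp_sign:
        exponent = int(exp_digits, 30)
        value *= 30.0 ** (exponent if exp_sign == "+" else -exponent)
    return -value if sign else value
```

For example, `decode_base30("1.F/")` returns 1.5 and `decode_base30("1+2/")` returns 900.0. Decoding the fraction as a single integer divided by a power of 30 avoids accumulating rounding error digit by digit.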

Implementation

PQL, Perl or Python?

Criteria: stable across operating systems, good text manipulation, good XML support, case-based databases

How it works

Parse the SPSS file: grab variable names, value labels, data values, etc.

Look up a configuration file for DDI settings

Check whether to set up the database as well or just add a new record

Do some conversions: time, date, scaled variables

Analyse the data to find the range of values; write out a warning if there are more than 3 missing values or a range of missing values

Write out the schema

Usage: python spss_parser.py -f <input filename> -s <sir config file> -d <ddi config file>
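Two of the steps above, the date conversion and the missing-value check, can be sketched in Python along with the command line. This assumes SPSS's date representation (seconds since midnight on 14 October 1582). The helper names `spss_to_datetime`, `describe_variable` and `parse_args` are hypothetical, and the missing-value rule (flag more than three discrete missing codes, or any missing range) is one reading of the slide, not the original script's logic. The `-f`, `-s` and `-d` flags mirror the usage line. The contents of the two configuration files are not described, so the sketch only captures their paths.

```python
import argparse
from datetime import datetime, timedelta

# SPSS date and datetime values count seconds from midnight, 14 October 1582.
SPSS_EPOCH = datetime(1582, 10, 14)


def spss_to_datetime(seconds):
    """Convert an SPSS date/datetime value to a Python datetime (None stays None)."""
    if seconds is None:
        return None
    return SPSS_EPOCH + timedelta(seconds=seconds)


def describe_variable(name, values, missing_values=(), missing_range=None):
    """Return (minimum, maximum, warnings) over the non-missing values.

    missing_values: discrete user-missing codes.
    missing_range: (low, high) tuple of user-missing values, or None.
    """
    def is_missing(v):
        if v is None or v in missing_values:
            return True
        return missing_range is not None and missing_range[0] <= v <= missing_range[1]

    present = [v for v in values if not is_missing(v)]
    warnings = []
    # Assumed reading of the slide's warning rule.
    if len(missing_values) > 3:
        warnings.append("%s: more than 3 discrete missing values" % name)
    if missing_range is not None:
        warnings.append("%s: uses a range of missing values" % name)
    if not present:
        return None, None, warnings
    return min(present), max(present), warnings


def parse_args(argv=None):
    """Parse the three options shown on the usage line (hypothetical dest names)."""
    parser = argparse.ArgumentParser(description="SPSS portable file to SIR (sketch)")
    parser.add_argument("-f", dest="input_file", required=True, help="input SPSS file")
    parser.add_argument("-s", dest="sir_config", required=True, help="SIR config file")
    parser.add_argument("-d", dest="ddi_config", required=True, help="DDI config file")
    return parser.parse_args(argv)
```

For example, `spss_to_datetime(86400)` is midnight on 15 October 1582. Also, `describe_variable("x", [1, 5, 9, 99], missing_values=(99,))` gives `(1, 9, [])` because the code 99 is excluded before the range is computed.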

Use

Once the data is in SIR, it can be restructured

Extend to datasets held in other statistical packages such as Stata or SAS, going via Stat/Transfer to SPSS portable format and proceeding from there

Also creates XML to add to a data store (since superseded)
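The deck mentions the XML output only in passing and does not give its format. To illustrate the "good XML support" criterion, the sketch below uses Python's standard `xml.etree.ElementTree` to write one variable's name, label and value labels as an element. The element names (`dataDscr`, `var`, `labl`, `catgry`, `catValu`) follow the DDI Codebook (version 2) style as an assumption. The function names are hypothetical, and the output has not been validated against a DDI schema.

```python
import xml.etree.ElementTree as ET


def variable_to_xml(name, label, value_labels):
    """Build a DDI-Codebook-style <var> element for one variable (assumed layout).

    value_labels maps each code to its label, e.g. {1: "Male"}.
    """
    var = ET.Element("var", name=name)
    ET.SubElement(var, "labl").text = label
    for code, text in value_labels.items():
        # One <catgry> per coded value, holding the code and its label.
        category = ET.SubElement(var, "catgry")
        ET.SubElement(category, "catValu").text = str(code)
        ET.SubElement(category, "labl").text = text
    return var


def variables_to_string(variables):
    """Wrap (name, label, value_labels) tuples in a <dataDscr> block and serialise it."""
    root = ET.Element("dataDscr")
    for name, label, value_labels in variables:
        root.append(variable_to_xml(name, label, value_labels))
    return ET.tostring(root, encoding="unicode")
```

Passing `encoding="unicode"` makes `ET.tostring` return a `str` without an XML declaration. The result can then be embedded in a larger document or written to a data store.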