Programmability in SPSS 15 The Revolution Continues Jon Peck Technical Advisor SPSS Copyright (c) SPSS Inc, 2006


Programmability in SPSS 15

The Revolution Continues

Jon Peck
Technical Advisor
SPSS

Copyright (c) SPSS Inc, 2006

Agenda

Recap of SPSS 14 Python programmability

Developer Central

New features in SPSS 15 programmability
    Writing first-class procedures
    Updating the data

The Bonus Pack modules

Interacting with the user

Q & A

Conclusion

Quotations from SPSS Users

"Because of programmability, SPSS 14 is the most important release since I started using SPSS fifteen years ago."

"I think I am going to like using Python."

"Python, here I come!"

"I now think Python is an amazing language."

"Python and SPSS 14 and later are, IMHO, GREAT!"

"By the way, Python is a great addition to SPSS."

The Combination of SPSS and Python

SPSS provides a powerful engine for statistical and graphical methods and for data management.

Python® provides a powerful, elegant, and easy-to-learn language for controlling and responding to this engine.

Together they provide a comprehensive system for serious applications of analytical methods to data.

Programmability Features in SPSS 14 and 15

SPSS 14.0 provided
    Programmability
    Multiple datasets
    Variable and File Attributes
    Programmability read-access to case data
    Ability to control SPSS from a Python program

SPSS 15 adds
    Read and write case data
    Create new variables directly rather than generating syntax
    Create pivot tables and text blocks via backend APIs
    Easier setup

Programmability Advantages

Makes possible jobs that respond to datasets, output, environment

Allows greater generality, more automation

Makes jobs more robust

Allows extending the capabilities of SPSS

Enables better organized and more maintainable code

Facilitates staff specialization

Increases productivity

More fun

Programmability Overview

Python extends SPSS via
    General programming language
    Access to variable dictionary, case data, and output
    Access to standard and third-party modules
    SPSS Developer Central modules
    Module structure for building libraries of code

Runs in "back-end" syntax context (like macro)
    SaxBasic scripting runs in "front-end" context

Two modes
    Traditional SPSS syntax window
    Drive SPSS from Python (external mode)

Optional install

Legal Notice

SPSS is not the owner or licensor of the Python software. Any user of Python must agree to the terms of the Python license agreement located on the Python web site. SPSS is not making any statement about the quality of the Python program. SPSS fully disclaims all liability associated with your use of the Python program.

The SPSS Programmability SDK

Supports implementing various programming languages
    Requires a programmer to implement a new language

VB.NET Plug-In available on Developer Central
    Works only in external mode

How Programmability Works

Python interpreter embedded within SPSS

SPSS runs in the traditional way until a BEGIN PROGRAM command is found

Python collects commands until the END PROGRAM command is found, then runs the program

Python can communicate with SPSS through APIs (calls to functions)
    Includes running SPSS syntax inside Python program
    Includes creating macro values for later use in syntax

Python can access SPSS output and data
    OMS is a key tool

Example: Summarize Categorical Variables

BEGIN PROGRAM.
import spss, spssaux
spssaux.GetSPSSInstallDir("SPSSDIR")
spssaux.OpenDataFile("SPSSDIR/employee data.sav")

# find categorical variables
catVars = spssaux.VariableDict(variableLevel=['nominal', 'ordinal'])
if catVars:
    spss.Submit("FREQ " + " ".join(catVars.variables))
    # create a macro listing categorical variables
    spss.SetMacroValue("!catVars", " ".join(catVars.variables))
END PROGRAM.

DESC !catVars.
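The string-building step in this example can be tried without SPSS. The sketch below is only an illustration: the `variables` dictionary is a hypothetical stand-in for what `spssaux.VariableDict` would report, and `categorical_names` and `build_freq_command` are made-up helper names, not part of the spss or spssaux modules.

```python
# Hypothetical stand-in for the variable dictionary: name -> measurement level.
variables = {
    "gender": "nominal",
    "jobcat": "nominal",
    "educ": "ordinal",
    "salary": "scale",
}


def categorical_names(var_levels, levels=("nominal", "ordinal")):
    """Return the names whose measurement level is in `levels`, sorted."""
    return sorted(name for name, level in var_levels.items() if level in levels)


def build_freq_command(names):
    """Assemble the FREQ command text, or None when there is nothing to run."""
    if not names:
        return None
    return "FREQ " + " ".join(names)


cat_vars = categorical_names(variables)
command = build_freq_command(cat_vars)
print(command)
```

Inside SPSS, the resulting text is what would be handed to `spss.Submit`, and the same joined list is what `spss.SetMacroValue` stores for later syntax.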

Programmability Inside or Outside SPSS

Two modes of operation

SPSS Drives mode (inside): traditional syntax context
    BEGIN PROGRAM
    …program…
    END PROGRAM

X Drives mode (outside): eXternal program drives SPSS
    Python interpreter (or VB.NET)
    import spss
    No SPSS Viewer, Data Editor, or SPSS user interface
    Output sent as text to the application – can be suppressed
    Has performance advantages

Build programs with an IDE
    Even if to be run in traditional mode

PythonWin IDE Controlling SPSS

Python Resources

Python.org

Python Tutorial

Global (standard) Module Index

Python help system and help command

Cheeseshop
    1627 packages as of Sept 21, 2006

SPSS Developer Central

SPSS Programming and Data Management, 3rd ed, 2006.

Many books
    Look for books at the Python 2.4 level

Python Books

Dive Into Python book or PDF

Practical Python by Magnus Lie Hetland
    Extensive examples and discussion of Python

Python Cookbook, 2nd ed by Martelli, Ravenscroft, & Ascher

Second edition (July, 2006) of Martelli, Python in a Nutshell, O'Reilly
    Very clear, comprehensive reference material

wxPython in Action by Rappin and Dunn
    Explains user interface building with wxPython

Cheeseshop: scipy

scipy 0.5.0
    Scientific Algorithms Library for Python

scipy is an open source library of scientific tools for Python. scipy gathers a variety of high level science and engineering modules together as a single package. scipy provides modules for statistics, optimization, integration, linear algebra, Fourier transforms, signal and image processing, genetic algorithms, ODE solvers, special functions, and more. scipy requires and supplements NumPy, which provides a multidimensional array object and other basic functionality.

scipy rework currently beta

Visit Scipy.org

SPSS Developer Central

Went live 21-May-2006

New Web home for developing SPSS applications

SPSS Developer Central
    Old URL: forums.spss.com/code_center

Python Integration Plug-Ins

Useful supplementary modules by SPSS and others
    Updated for SPSS 15

Articles on programmability and graphics

Place to ask questions and exchange information

Programmability Extension SDK

Get Python itself from Python.org
    SPSS uses 2.4 (2.4.3)

Not limited to programmability
    GPL graphics
    User-contributed code

Key supplementary modules
    spssaux
    spssdata

New for SPSS 15
    trans
    extendedTransforms
    rake
    pls

Approaches to Creating New Procedures

You can extend SPSS capabilities by building new procedures
    Or use ones that others have built

Combine SPSS procedures and transformations with Python logic
    Poisson regression (SPSS 14) example using iterated CNLR
    New raking procedure built over GENLOG

Calculate data aggregates in SPSS and pass to algorithm coded in Python
    Raking procedure starts with AGGREGATE

Acquire case data and compute in Python
    Use Python standard modules and third-party additions
    Partial Least Squares Regression (pls module)

Adapt Existing Code Libraries

Common to adapt existing libraries or code for use as Python extension modules
    C, C++, VB, Fortran,...

Extension modules are normal Python modules
    Python itself written in C
    Many standard modules are C code

Python tools and APIs to assist
    Chap 25 in Python in a Nutshell
    Tutorial on extending and embedding the Python interpreter

Partial Least Squares Regression

Regression with large number of predictors (even k > N)

Similar to Principal Components but considers dependent variable simultaneously

Calculates principal components of (y, X), then uses regression on the scores instead of the original data

User chooses number of factors

Equivalent to ordinary regression when the number of factors equals the number of predictors and there is one y variable

For more information see An Optimization Perspective on Kernel Partial Least Squares Regression.pdf.

The pls Module

Strategy
    Fetches data from SPSS
    Uses scipy matrix operations to compute results

Third-party module from Cheeseshop

Writes pivot tables to SPSS Viewer
    Subject to OMS
    SPSS 14 viewer module created pivot table using OLE automation

Saves predicted values to active dataset

pls Example: REGRESSION vs PLS

GET FILE="c:/spss15/tutorial/sample_files/car_sales.sav".
REGRESSION /STATISTICS COEFF R /DEPENDENT sales
  /METHOD=ENTER curb_wgt engine_s fuel_cap horsepow length mpg price resale type wheelbas width .

begin program.
import spss, pls

pls.plsproc("sales", """curb_wgt engine_s fuel_cap horsepow
length mpg price resale type wheelbas width""", yhat="predsales")
end program.

plsproc defaults to five factors

Results

PLS with 5 factors almost equals regression with 11 variables

Raking Sample Weights

"Raking" adjusts sample weights to control totals in n dimensions

Example: data classified by age and sex with known population totals or proportions

Calculated by fitting a main effects loglinear model
    Various adjustments required
    Not a complete solution to reweighting

Not directly available in SPSS
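To make the idea of adjusting weights to control totals concrete, here is a minimal sketch in plain Python. It uses iterative proportional fitting, the classical raking algorithm, rather than the loglinear-model fit described on the slide, so it illustrates the goal and not how the rake module works. The sample counts are invented for illustration; the control totals are the ones used in the deck's `rake.rake` example. The sketch assumes every category has a positive sample count.

```python
def rake(counts, row_totals, col_totals, tol=1e-9, max_iter=100):
    """Iterative proportional fitting on a two-way table.

    counts: dict mapping (row, col) -> weighted sample count
    row_totals, col_totals: control totals for each row / column category
    Returns adjusted cell counts whose margins match the control totals.
    """
    fitted = dict(counts)
    for _ in range(max_iter):
        # Scale every row to its control total.
        for r, target in row_totals.items():
            current = sum(v for key, v in fitted.items() if key[0] == r)
            for key in fitted:
                if key[0] == r:
                    fitted[key] *= target / current
        # Scale every column to its control total.
        for c, target in col_totals.items():
            current = sum(v for key, v in fitted.items() if key[1] == c)
            for key in fitted:
                if key[1] == c:
                    fitted[key] *= target / current
        # Stop once the row margins still match after the column step.
        worst = max(
            abs(sum(v for key, v in fitted.items() if key[0] == r) - target)
            for r, target in row_totals.items()
        )
        if worst < tol:
            break
    return fitted


# Hypothetical sample counts keyed by (age, sex); not taken from the slides.
sample = {(0, 0): 20.0, (0, 1): 480.0, (1, 0): 30.0, (1, 1): 470.0}
age_totals = {0: 1140, 1: 1140}
sex_totals = {0: 104.6, 1: 2175.4}

fitted = rake(sample, age_totals, sex_totals)
# The final weight for each cell is the adjusted count over the sample count.
weights = {cell: fitted[cell] / sample[cell] for cell in sample}
```

Each case in a cell would receive that cell's weight, which is the role the `finalweight` variable plays in the module's call.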

Raking Module

Strategy: combine SPSS procedures with Python logic

rake.py (part of SPSS 15 Bonus Pack)
    Aggregates data via AGGREGATE to new dataset
    Creates new variable with control totals
    Applies GENLOG, saving predicted counts
    Adjusts predicted counts
    Matches back into original dataset

Does not use MATCH FILES or require a SORT command

Written in one (long) day

rake.rake("age sex", [{0: 1140, 1:1140}, {0: 104.6, 1:2175.4}], finalweight="finalwt")

Extending SPSS Transformations

SPSS 14 programmability can wrap SPSS syntax in Python logic
    Useful when definitions can be expressed in SPSS syntax

SPSS 15 programmability can generate new variables directly
    Cursor can have accessType='w'

SPSS 15 programmability can add cases directly
    Cursor can have accessType='a'

SPSS 15 programmability can create new datasets from scratch
    Cursor can have accessType='n'

spssdata module on Developer Central updated to support these modes

trans and extendedTransforms Modules

trans module facilitates plugging in Python code to iterate over cases

Runs as an SPSS procedure
    Passes the data
    Adds variables to the SPSS variable dictionary
    Can apply any calculation casewise

Use with
    Standard Python functions (e.g., math module)
    Any user-written functions or appropriate classes
    Functions in extendedTransforms module
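The casewise idea can be sketched without SPSS. In the illustration below, a list of dictionaries stands in for the active dataset, and `apply_casewise` is a hypothetical helper, not the trans module's API; its `listwise_deletion` argument only loosely mirrors the `listwiseDeletion` option of `trans.Tfunction`, and its handling of missing values here is an assumption.

```python
import math


def apply_casewise(cases, func, new_names, input_names, listwise_deletion=False):
    """Apply func to each case and store its results under new_names.

    cases: list of dicts, one per case (a stand-in for the active dataset)
    func: callable taking the input values and returning one value or a tuple
    """
    if isinstance(new_names, str):
        new_names = (new_names,)
    for case in cases:
        args = [case.get(name) for name in input_names]
        if listwise_deletion and any(a is None for a in args):
            # Missing input: leave the new variables missing as well.
            results = (None,) * len(new_names)
        else:
            results = func(*args)
            if not isinstance(results, tuple):
                results = (results,)
        case.update(zip(new_names, results))
    return cases


# A standard-library function applied to every case.
data = [{"income": 100.0}, {"income": 1000.0}, {"income": None}]
apply_casewise(data, math.log10, "logincome", ["income"], listwise_deletion=True)
```

Any function that accepts the input values and returns one value, or a tuple of values, fits this pattern, which is why standard-library functions and user-written ones can be mixed freely.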

trans and extendedTransforms Modules

trans strategy
    Pass case data through Python code, writing result back to SPSS in new variables

extendedTransforms: collection of ten functions to apply to SPSS variables
    Regular expression search/replace
    Template-based substitution
    soundex and nysiis functions for phonetic equivalence
    Levenshtein distance function for string similarity
    Date/time conversions based on patterns
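For readers unfamiliar with Levenshtein distance, the following is a textbook dynamic-programming sketch in plain Python. It is not the extendedTransforms implementation, and the function name and signature here are assumptions made for illustration.

```python
def levenshtein(a, b):
    """Return the minimum number of single-character edits turning a into b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or keep when equal
            ))
        previous = current
    return previous[-1]


# Spelling variants of a surname are a small distance apart.
for variant in ("Pech", "Pek", "Beck"):
    print(variant, levenshtein("Peck", variant))
```

A small distance flags likely spelling variants of the same value, which makes the measure useful for cleaning names and other free-text string variables.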

Python Regular Expressions

Pattern matching in text strings

If you use SPSS index or replace, you need these

Standardize string data (Mr, Mr., Herr, Senor,...)

Patterns can be simple strings (as with SPSS index) or complex patterns

Pick out variable names with common parts

Regular Expressions: A Few Examples

"age" – string containing the letters age

"\bage\b" – string containing the word age

"abc|xyz|pqrst" – string containing any of abc etc

"\d+" – a string of one or more digits

"x.*y" – a string containing an x followed, after any characters, by a y

Can be case sensitive or not

Can greatly simplify code currently using SPSS index and replace functions
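These kinds of patterns can be tried directly with Python's standard `re` module. The sketch below uses made-up sample strings and a small hypothetical helper, `matching`; note that in Python a whole-word match is written with the word-boundary marker `\b`, and that `re.search` looks for the pattern anywhere in the string.

```python
import re

# Made-up sample strings for illustration.
samples = ["average", "age 42", "teenager", "xyz123", "x marks the y"]


def matching(pattern, strings, flags=0):
    """Return the strings in which `pattern` is found anywhere."""
    return [s for s in strings if re.search(pattern, s, flags)]


letters_age = matching(r"age", samples)          # the letters a-g-e anywhere
word_age = matching(r"\bage\b", samples)         # "age" as a separate word
alternatives = matching(r"abc|xyz|pqrst", samples)
digits = re.findall(r"\d+", "age 42, id 1007")   # every run of digits
x_then_y = matching(r"x.*y", samples)            # an x followed later by a y
ignore_case = matching(r"AGE", samples, re.IGNORECASE)

print(letters_age, word_age, alternatives, digits, x_then_y, ignore_case)
```

Raw string literals (`r"..."`) keep the backslashes intact, so the pattern reaches the regular expression engine exactly as written.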

Using trans and extendedTransforms search Function

import spss, trans, spssaux, extendedTransforms

spssaux.OpenDataFile("c:/data/names.sav")
tproc = trans.Tfunction(listwiseDeletion=True)

tproc.append(extendedTransforms.search, 'match', 'a8', ['names', trans.const('Peck|Pech|Pek')])

tproc.append(extendedTransforms.search, 'matchignorecase', 'a8', ['names', trans.const('peck'), trans.const(True)])

tproc.append(extendedTransforms.search, ('match2', 'startpos', 'length'), ('a12', 'f4.0', 'f4.0'), ['names', trans.const('Peck')])

tproc.execute()
spss.Submit("SELECT IF length > 0")

spssaux.SaveDataFile("c:/temp/namesplus.sav")

Using trans: Writing Your Own Function

begin program.
import trans, re

def splitAndExtract(s):
    """split a string on "--" and return the left part and the number in the right part.
    Ex: "simvastatin-- PO 80mg TAB" -> "simvastatin", 80"""
    parts = s.split("--")
    try:
        # raw string so that \d reaches the regular expression engine unchanged
        number = re.search(r"\d+", parts[1]).group()
    except (IndexError, AttributeError):
        # no "--" separator, or no digits after it
        number = None
    return parts[0], number

tproc = trans.Tfunction()
tproc.append(splitAndExtract, ("name", "number"), ("a30", "f5.0"), ["medicine"])
tproc.execute()
end program.

extendedTransforms soundex and nysiis

Algorithms for approximating phonetic equivalence of names

soundexallwords can be used on unstructured text

Applied to database of 20,000+ surnames

import spss, trans, spssaux, extendedTransforms

spssaux.OpenDataFile("c:/data/names.sav")
tproc = trans.Tfunction()

tproc.append(extendedTransforms.soundex, 'soundex', 'a5', ['names'])
tproc.append(extendedTransforms.nysiis, 'nysiis', 'a20', ['names'])
tproc.execute()

spssaux.SaveDataFile("c:/temp/namesplusplus.sav")
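As background on what a soundex code is, here is a minimal sketch of the standard American Soundex rules in plain Python. It is an independent illustration and may differ in detail from `extendedTransforms.soundex`; the `CODES` table and function below are written for this example only.

```python
# Letter groups that share a Soundex digit; vowels, h, w and y have no digit.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return the four-character American Soundex code for `name`."""
    letters = [ch for ch in name.lower() if ch.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    code = first.upper()
    previous = CODES.get(first, "")
    for ch in letters[1:]:
        digit = CODES.get(ch, "")
        # Adjacent letters with the same digit are coded only once.
        if digit and digit != previous:
            code += digit
        # h and w do not separate letters with the same digit; vowels do.
        if ch not in "hw":
            previous = digit
    # Pad with zeros and keep one letter plus three digits.
    return (code + "000")[:4]


for surname in ("Peck", "Pech", "Pek", "Robert", "Rupert"):
    print(surname, soundex(surname))
```

Names that sound alike collapse to the same code, so grouping or matching on the code catches spelling variants that an exact comparison would miss.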

Results

soundex on Unstructured Text

(Overly) simple processing of unstructured text

Use soundex word by word to abstract spelling

No stemming, linguistic analysis etc
    Use STAFS for serious work

Very simple to use

begin program.
import spss, trans, extendedTransforms
t = trans.Tfunction()
t.append(extendedTransforms.soundexallwords, 'allsoundexn66', 'a108', ['n_66'])
t.execute()
end program.

soundex on Unstructured Text

Creating a Graphical User Interface

Python comes with Tkinter, a GUI toolkit

There are better ones freely downloadable
    E.g., wxPython
    Visit wxpython.org

Very easy to do small user interactions

Examples
    Message box
    File chooser
    Variable picker

Simple Message Box Using wxPython

Simple File Chooser Using wxPython

Variable Picker Using wxPython

Other New spss Module APIs

User-missing values
    GetVarMissingValues
    GetSPSSLowHigh

Pivot table APIs
    BasePivotTable
    CellText
    Dimension

Output
    Text block support
    Good for writing comments to the Viewer

Miscellaneous
    GetWeightVar
    HasCursor
    SplitChange

Recap

SPSS 14 introduced major programmability features

SPSS 15 adds
    Reading and writing case data: new variables; new cases
    Creating pivot tables and text blocks
    Writing first-class SPSS procedures

Bonus Pack and Partial Least Squares modules illustrate these features

Developer Central improves ability to provide modules and information
    Will soon have four new SPSS 15 modules

Questions

SPSS 15: The Revolution Continues

SPSS 15 programmability makes it easy to add capabilities beyond what is already built in to SPSS

SPSS 15 makes it easier to build complete applications on top of SPSS

SPSS 15 programmability makes you more productive

SPSS 15 has lots of other great features, too

Try it out

Write to Me!