Programmability in SPSS 16 and 17 Jon K Peck Technical Advisor and Principal Software Engineer Athens, May 2008

Page 1: Programmability in SPSS 16 & 17, Jon Peck

Programmability in SPSS 16 and 17

Jon K Peck

Technical Advisor and Principal Software Engineer

Athens, May 2008

Page 2: Programmability in SPSS 16 & 17, Jon Peck

Agenda

• Review of programmability
• The Extension mechanism and the PROPOR procedure
• User-defined dialog boxes
• The Dataset class and comparing datasets
• Examples: custom sorting, pattern matching
• Building applications that embed SPSS
• Integrating R into SPSS
• Q and A
• Wrap up

Page 3: Programmability in SPSS 16 & 17, Jon Peck

Programmability extends the standard SPSS capabilities

• Makes it easy to build jobs that respond to data, output, and the environment
• Allows greater generality, more automation
• Makes jobs more flexible and robust
• Allows extending the capabilities of SPSS
• Allows the use of existing or new statistical modules written in R or Python
• Enables simpler and more maintainable code
• Increases your productivity
• Puts you in control

More fun

Page 4: Programmability in SPSS 16 & 17, Jon Peck

SPSS syntax
BEGIN PROGRAM PYTHON or R.
Python or R code
END PROGRAM.
SPSS syntax

• A program in the SPSS input stream can communicate with SPSS and control it and use the language's facilities and modules
• A Python or .NET application can embed SPSS inside itself
• Resources and forums are at SPSS Developer Central: www.spss.com/devcentral
• Programmability plugins are an optional install

Programmability embeds Python or R inside SPSS

Page 5: Programmability in SPSS 16 & 17, Jon Peck

BEGIN PROGRAM.
import spss, spssaux, spssdata

def findUnlabelledValues(name):
    d = spssaux.VariableDict()
    # values that carry a value label
    labels = set(d[name].ValueLabelsTyped)
    # read only the requested variable and collect its distinct values
    data = spssdata.Spssdata(indexes=[name])
    values = set()
    for case in data:
        values.add(case[0])
    data.close()
    values.discard(None)  # ignore missing values
    print "\nUnlabeled Values:\n", sorted(values.difference(labels))

findUnlabelledValues("origin")
END PROGRAM.

Example: Automate the job of finding unlabelled values of a variable

No label may indicate an error

Unlabeled Values:
[4.0, 7.0, 11.0]
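The heart of the program above is a set difference between the values seen in the data and the values that carry labels. The following is a minimal standalone sketch of just that step in Python 3, with no SPSS dependency. The helper name `find_unlabelled`, the sample values, and the sample labels are all made up for illustration.

```python
def find_unlabelled(values, labels):
    """Return the sorted values that occur in the data but have no label.

    values: iterable of observed data values (None marks a missing value)
    labels: dict mapping a value to its label text
    """
    observed = set(values)
    observed.discard(None)  # missing values are not expected to be labelled
    return sorted(observed.difference(labels))


# Made-up sample data, not taken from any real SPSS file.
sample_values = [1.0, 2.0, 4.0, 3.0, None, 7.0, 1.0, 11.0]
sample_labels = {1.0: "American", 2.0: "European", 3.0: "Japanese"}

unlabelled = find_unlabelled(sample_values, sample_labels)
print("Unlabelled values:", unlabelled)
```

Passing the label dictionary straight to `difference` works because iterating a dict yields its keys, which here are the labelled values.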

Page 6: Programmability in SPSS 16 & 17, Jon Peck

Python and R are open source software

SPSS is not the owner or licensor of the Python or R software. Any user of Python or R must agree to the terms of the license agreement located on the Python or R web site. SPSS is not making any statement about the quality of the Python or R programs. SPSS fully disclaims all liability associated with your use of the Python or R programs.

Page 7: Programmability in SPSS 16 & 17, Jon Peck

SPSS is divided into two parts

The SPSS Processor: invisible
– Syntax processing
– Computation
– Data handling
– Procedures
– May be remote with SPSS Server

The SPSS Front End: what you see
– Menus and dialog boxes
– Output Viewer
– Data Editor
– Syntax window

Page 8: Programmability in SPSS 16 & 17, Jon Peck

SPSS 16 added new programmability and scripting features

SPSS 15

SPSS Processor
– SPSS syntax
– Python programs
– .NET programs

SPSS Front End
– SaxBasic scripts
– COM support

SPSS 16

SPSS Processor
– SPSS syntax
– Python programs
– .NET programs
– R programs
– Extensions

SPSS Front End
– Basic scripts (Windows)
– COM support (Windows)
– Python scripts

Page 9: Programmability in SPSS 16 & 17, Jon Peck

Scripting is useful for working with Viewer contents

• Scripts can be written in Python or, on Windows, in Basic
• Python APIs have a structure similar to familiar SaxBasic scripting
– Import the spssClient module
• IDEs are provided for Python and Basic
• SPSS 17 will allow programs to use the spssClient module
• Autoscripts are triggered by specified types of output events
– E.g., creating a table of regression coefficients

Autoscripts have been generalized in SPSS 16

Page 10: Programmability in SPSS 16 & 17, Jon Peck

• Python and R add great functionality to SPSS
• Many users know only SPSS syntax

MEANS TABLES = accel BY origin
  /CELLS MEAN COUNT STDDEV MEDIAN
  /STATISTICS LINEARITY.

• Extensions define SPSS syntax for programs via XML
• Definitions are loaded automatically on SPSS startup
• Parsed syntax is passed to the Python or R module
• User never needs to know about the programs
• Author never needs to parse SPSS syntax
• PLS module in SPSS 16 is an extension

The EXTENSION mechanism turns Python or R programs into user-defined SPSS syntax

Page 11: Programmability in SPSS 16 & 17, Jon Peck

Extensions simplify the author's job

[Diagram boxes: User's SPSS Syntax; SPSS Parser; Extension XML; Author code Module; Run; extension module; Template; parsecmd; Output]

The author supplies only the gold parts

The user just enters the command syntax

Page 12: Programmability in SPSS 16 & 17, Jon Peck

PROPOR is a new extension procedure

Calculates confidence intervals for proportions

Produces pivot table output

PROPOR /HELP.

Confidence Intervals for Proportions and Differences in Proportions.

PROPOR /HELP displays this help and does nothing else.

Syntax:
PROPOR NUM=list DENOM=list [ID=varname]
  [/DATASET NAME=dsname]
  [/LEVEL ALPHA=value]
  [/HELP]

Example:
PROPOR NUM= 55 DENOM=100.

Developer Central
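The slide does not say which interval formula PROPOR uses. As an illustration of the kind of calculation involved, here is a minimal Python 3 sketch of the Wilson score interval for a single proportion, applied to the same 55 out of 100 used in the example. The choice of the Wilson method and the function name `wilson_interval` are assumptions made for this sketch, not a description of PROPOR itself.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(num, denom, alpha=0.05):
    """Two-sided Wilson score interval for the proportion num/denom."""
    if denom <= 0 or not 0 <= num <= denom:
        raise ValueError("need 0 <= num <= denom and denom > 0")
    z = NormalDist().inv_cdf(1 - alpha / 2)  # standard normal critical value
    p = num / denom
    scale = 1 + z * z / denom
    center = (p + z * z / (2 * denom)) / scale
    half = z * sqrt(p * (1 - p) / denom + z * z / (4 * denom * denom)) / scale
    return center - half, center + half


low, high = wilson_interval(55, 100)
print("55/100: %.4f to %.4f" % (low, high))
```

The `alpha` argument plays the same role as the ALPHA value on the /LEVEL subcommand: 0.05 gives a 95% interval.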

Page 13: Programmability in SPSS 16 & 17, Jon Peck

PROPOR produces a pivot table of confidence intervals

Page 14: Programmability in SPSS 16 & 17, Jon Peck

What about user interfaces?

SPSS 17

Page 15: Programmability in SPSS 16 & 17, Jon Peck

User-defined dialog boxes look like SPSS-defined dialogs

Which is the real one?

SPSS 17

Page 16: Programmability in SPSS 16 & 17, Jon Peck

Programmability can enhance procedures: A program to customize sorting in CTABLES

CTABLES /TABLE occupation[COUNT]
  /CATEGORIES VARIABLES=occupation ORDER=D KEY=COUNT TOTAL=YES.

This table is sorted in descending order, but category Other should be at the bottom.

Page 17: Programmability in SPSS 16 & 17, Jon Peck

A program to customize sorting in CTABLES

import spss, spssaux2
# build an explicit category list in macro !other, with value 4 (Other) handled specially
spssaux2.genCategoryList("occupation", specialvalues=[4], macroname="other")
spss.Submit("""CTABLES /TABLE occupation[COUNT]
  /CATEGORIES VARIABLES=occupation [!other] TOTAL=YES.""")
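The ordering the program asks for can be stated simply: sort categories by descending count, then move the special values to the end. The sketch below shows that logic in standalone Python 3. It is not the implementation of `genCategoryList`; the function name `category_order` and the sample codes are hypothetical.

```python
from collections import Counter


def category_order(values, special_values=()):
    """Order categories by descending count, with special values moved last."""
    counts = Counter(values)
    special = [v for v in special_values if v in counts]
    regular = [v for v in counts if v not in special]
    # Sort by count descending; break ties by category value for a stable result.
    regular.sort(key=lambda v: (-counts[v], v))
    return regular + special


# Made-up occupation codes; 4 stands for "Other" and is the most frequent.
occupation = [4, 4, 4, 4, 4, 1, 1, 1, 2, 2, 3]
order = category_order(occupation, special_values=[4])
print(order)
```

Without `special_values`, code 4 would sort first because it has the highest count; with it, the remaining categories keep their descending order and 4 lands at the bottom.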

Page 18: Programmability in SPSS 16 & 17, Jon Peck

Python regular expressions greatly simplify tasks involving patterns in strings

• A regular expression defines a pattern that can be searched for or used in a replacement
• Example: a dataset contains three variables, firstname, lastname, and narrative. The names need to be replaced in the narratives so that they are anonymous.
• Sample data:

Page 19: Programmability in SPSS 16 & 17, Jon Peck

Using regular expressions to work with patterns: Making a narrative anonymous

begin program.
import spss, spssaux, spssdata, re

vard = spssaux.VariableDict()
curs = spssdata.Spssdata(indexes='firstname lastname narrative', accessType='w')
curs.append(spssdata.vdef("anonnarrative",
    vtype=vard['narrative'].VariableType + 100))
curs.commitdict()
wbound = r"\b"
for case in curs:
    fnregex = re.compile(wbound + case.firstname.strip() + wbound, flags=re.IGNORECASE)
    lnregex = re.compile(wbound + case.lastname.strip() + wbound, flags=re.IGNORECASE)
    newnarr = fnregex.sub("-firstname-", case.narrative)
    newnarr = lnregex.sub("-lastname-", newnarr)
    curs.casevalues([newnarr])
curs.CClose()
end program.

E.g. \bSmith\b
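The word-boundary pattern can be tried outside SPSS. The following standalone Python 3 sketch applies the same substitution to one made-up narrative. The function name `anonymize` is hypothetical, and the use of `re.escape` is an addition not present in the slide code: it guards against names that contain characters with a special meaning in regular expressions.

```python
import re


def anonymize(narrative, firstname, lastname):
    """Replace whole-word, case-insensitive name matches with placeholders."""
    result = narrative
    for name, placeholder in ((firstname, "-firstname-"), (lastname, "-lastname-")):
        name = name.strip()
        if not name:
            continue  # an empty name would match at every word boundary
        # re.escape keeps characters such as "." in a name from acting as regex syntax.
        pattern = re.compile(r"\b" + re.escape(name) + r"\b", flags=re.IGNORECASE)
        result = pattern.sub(placeholder, result)
    return result


# Made-up narrative; trailing blanks mimic padded SPSS string values.
text = "John Smith met the blacksmith. SMITH left; john stayed."
anonymized = anonymize(text, "John ", "Smith ")
print(anonymized)
```

The word boundaries are what keep "blacksmith" intact while "SMITH" and "john" are still replaced.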

Page 20: Programmability in SPSS 16 & 17, Jon Peck

Before and After

Page 21: Programmability in SPSS 16 & 17, Jon Peck

The Dataset class delivers new functionality for data management

• Available for Python and .NET
• Retrieve, add, delete and change variables, properties, and values
• Process multiple datasets at the same time
• Access any case by case number
• Included in the spss module in the plug-in

SPSS 16

ds = spss.Dataset()
ds.varlist['accel'].label = "acceleration"  # change label
print len(ds.cases)
ds.cases[10,2] = [100]  # change a value

Page 22: Programmability in SPSS 16 & 17, Jon Peck

comparedatasets uses the Dataset class to compare cases and variables in two datasets

BEGIN PROGRAM.
import spss, comparedatasets
c = comparedatasets.CompareDatasets("first", "second",
    idvar="id", diffcount="differences", reportroot="compare")
c.cases()
c.dictionaries()
c.close()
END PROGRAM.

As an extension:

COMPDS DS1=first, DS2=second
  /DATA ID=id DIFFCOUNT=differences ROOTNAME=compare.

Developer Central
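To make the idea of a per-case difference count concrete, here is a standalone Python 3 sketch that matches cases from two small in-memory datasets on an ID variable and counts how many variables differ. It only illustrates the concept and is not the comparedatasets implementation; the function name `compare_cases` and the data are made up.

```python
def compare_cases(first, second, idvar):
    """Count differing variables per case for cases present in both datasets.

    first, second: lists of dicts, one dict per case
    Returns (diffcounts, only_in_first, only_in_second).
    """
    a = {case[idvar]: case for case in first}
    b = {case[idvar]: case for case in second}
    diffcounts = {}
    for key in sorted(a.keys() & b.keys()):
        names = (a[key].keys() | b[key].keys()) - {idvar}
        diffcounts[key] = sum(1 for n in names if a[key].get(n) != b[key].get(n))
    return diffcounts, sorted(a.keys() - b.keys()), sorted(b.keys() - a.keys())


# Made-up cases keyed by "id".
first = [{"id": 1, "mpg": 18, "accel": 12.0}, {"id": 2, "mpg": 15, "accel": 11.5},
         {"id": 3, "mpg": 16, "accel": 11.0}]
second = [{"id": 1, "mpg": 18, "accel": 12.0}, {"id": 2, "mpg": 17, "accel": 10.5},
          {"id": 4, "mpg": 14, "accel": 9.0}]

diffs, only_first, only_second = compare_cases(first, second, "id")
print(diffs, only_first, only_second)
```

Cases that appear in only one dataset are reported separately rather than counted as differences, which keeps the difference count meaningful for matched cases.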

Page 23: Programmability in SPSS 16 & 17, Jon Peck

Comparedatasets: The output dataset reports case differences

Page 24: Programmability in SPSS 16 & 17, Jon Peck

comparedatasets: A summary is written to the SPSS Viewer

You can do selection, summary statistics, and charts on the outcome variables for further information.

SPSS 17 will have a built-in procedure

Page 25: Programmability in SPSS 16 & 17, Jon Peck

The Dataset class makes it easy to use the functions in the extendedTransforms module

data list fixed /dt(a21).
begin data.
2/22/2008 11:47:45 AM
2/22/2008 11:47:45 PM
end data.

begin program.
import spss, extendedTransforms
spss.StartDataStep()
ds = spss.Dataset()
ds.varlist.append("newdt", 0)
ds.varlist[-1].format = (22,22,0)  # DATETIME22.0 format
for i, case in enumerate(ds.cases):
    ds.cases[i, -1] = extendedTransforms.strtodatetime(case[0],
        "%m/%d/%Y %I:%M:%S %p")
spss.EndDataStep()
end program.

strtodatetime and datetimetostr allow patterns to be used for dates and times

14 functions in extendedTransforms

Developer Central
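The pattern in the example uses the same directives as Python's standard `strptime`. The standalone Python 3 sketch below parses the two sample strings with that pattern and converts the result to seconds since midnight on 14 October 1582, the origin SPSS uses for date and time values. The helper name `to_spss_datetime` is hypothetical, and this is not the code of `extendedTransforms.strtodatetime`.

```python
from datetime import datetime

SPSS_EPOCH = datetime(1582, 10, 14)  # origin of SPSS date and time values


def to_spss_datetime(text, pattern):
    """Parse text with a strptime pattern and return seconds since the SPSS epoch."""
    parsed = datetime.strptime(text.strip(), pattern)
    return (parsed - SPSS_EPOCH).total_seconds()


pattern = "%m/%d/%Y %I:%M:%S %p"
morning = to_spss_datetime("2/22/2008 11:47:45 AM", pattern)
evening = to_spss_datetime("2/22/2008 11:47:45 PM", pattern)
print(evening - morning)  # the two values are 12 hours apart, in seconds
```

The `%I` and `%p` directives together handle the 12-hour clock, which is why the AM and PM strings produce different values.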

Page 26: Programmability in SPSS 16 & 17, Jon Peck

You can write applications where SPSS is hidden using external drives mode

Application built by SPSS Services

Page 27: Programmability in SPSS 16 & 17, Jon Peck

A Reporting Application

Real names have been scrambled

Page 28: Programmability in SPSS 16 & 17, Jon Peck

• Written entirely in Python
• Uses SPSS invisibly for calculation and charting
• Output is captured with the Output Management System (OMS)
• Uses free packages to supplement SPSS
– wxPython for user interface
– ReportLab for PDF production

Similar things could be done with .NET

The application was built with Python, SPSS, and standard Python packages

Page 29: Programmability in SPSS 16 & 17, Jon Peck

R programs can be run inside SPSS

• SPSS datasets and output can be processed by R
• New SPSS datasets can be created from R
• R can communicate with SPSS via 30 APIs

BEGIN PROGRAM R.
cases <- spssdata.GetDataFromSPSS(c("mpg", "accel"), 5)
spsspivottable.Display(cases, collabels=c("mpg", "accel"))
END PROGRAM.

• Output appears in the SPSS Viewer
• spsspivottable.Display produces pivot tables
• print() produces plain text
• SPSS 17 will include graphical output

Page 30: Programmability in SPSS 16 & 17, Jon Peck

R brings many statistical methods into SPSS

52 packages starting with "a"

Page 31: Programmability in SPSS 16 & 17, Jon Peck

Example: Estimate Rents Using the R Package kknn: K Nearest Neighbors

BEGIN PROGRAM R.
dict <- spssdictionary.GetDictionaryFromSPSS()
data <- spssdata.GetDataFromSPSS()
library(kknn)
kl <- c("rectangular", "triangular", "epanechnikov", "gaussian", "rank")
t.con <- train.kknn(nmqm ~ wfl + bjkat + zh, data=data, kmax=25, kernel=kl)
print(t.con)
newv <- spssdictionary.CreateSPSSDictionary(c("predictedRent",
    "Predicted Rent", 0, "F8.2", "scale"))
spssdictionary.SetDictionaryToSPSS("newrents", data.frame(dict, newv))
best <- (charmatch(t.con$best.parameters$kernel, kl) - 1) * 25 +
    t.con$best.parameters$k
spssdata.SetDataToSPSS("newrents",
    data.frame(c(t.con$fitted.values[[best]]), data))
spssdictionary.EndDataStep()
END PROGRAM.

(Adapted from an example in the kknn package)
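For readers unfamiliar with the method, the sketch below shows the basic idea of k-nearest-neighbors regression in standalone Python 3: predict a value by averaging the targets of the k closest training points. It uses plain unweighted averaging, which corresponds to what kknn calls the "rectangular" kernel, and does none of the kernel or k selection that `train.kknn` performs. The function name `knn_predict` and the data are made up, and Python is used here only for consistency with the other sketches.

```python
def knn_predict(train_x, train_y, query, k):
    """Predict by averaging the targets of the k nearest training points."""
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training points")

    def squared_distance(point):
        return sum((p - q) ** 2 for p, q in zip(point, query))

    # Rank training rows by distance to the query and keep the k closest.
    nearest = sorted(range(len(train_x)), key=lambda i: squared_distance(train_x[i]))[:k]
    return sum(train_y[i] for i in nearest) / k


# Made-up data: two numeric predictors and a numeric target.
x = [(30.0, 1.0), (45.0, 2.0), (60.0, 2.0), (80.0, 3.0), (120.0, 4.0)]
y = [9.0, 8.0, 7.5, 7.0, 6.0]
prediction = knn_predict(x, y, (50.0, 2.0), k=2)
print(prediction)
```

In practice the predictors would usually be scaled first so that no single variable dominates the distance.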

Page 32: Programmability in SPSS 16 & 17, Jon Peck

R output appears in the Viewer. The output data appear in the Data Editor

Page 33: Programmability in SPSS 16 & 17, Jon Peck

Where We Have Been Today

• Programmability adds flexibility and power to SPSS
• The extension mechanism integrates programs better into SPSS syntax
• The new Dataset class adds data management power
• The new scripting capabilities provide more ways to work with output
• R integration opens a large collection of statistical techniques to SPSS users

Page 34: Programmability in SPSS 16 & 17, Jon Peck

Questions and Answers

Page 35: Programmability in SPSS 16 & 17, Jon Peck

In Conclusion

• Programmability capabilities continue to grow
• Opening up SPSS puts you in control through plugging in your own code
• More tasks can be automated
• You can easily tap large R and Python libraries
• New capabilities extend data management
• The Extension mechanism integrates capabilities with a consistent syntax

Page 36: Programmability in SPSS 16 & 17, Jon Peck

Tell us about your programmability experiences

Jon Peck, Ph.D.

SPSS Inc
233 S Wacker Drive
Chicago, IL
[email protected]