SPSS® Statistics - download.e-bookshelf.de · with standard print versions of this book may not be included in e-books or in print-on-demand. If this book ... more than 30 years


SPSS® Statistics for Data Analysis and Visualization


Keith McCormick
Jesus Salcedo

with Jon Peck and Andrew Wheeler


Published by
John Wiley & Sons, Inc.
10475 Crosspoint Boulevard
Indianapolis, IN 46256
www.wiley.com

Copyright © 2017 by John Wiley & Sons, Inc., Indianapolis, Indiana

Published simultaneously in Canada

ISBN: 978-1-119-00355-7
ISBN: 978-1-119-00557-5 (ebk)
ISBN: 978-1-119-00366-3 (ebk)

Manufactured in the United States of America

10 9 8 7 6 5 4 3 2 1

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 646-8600. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permissions.

Limit of Liability/Disclaimer of Warranty: The publisher and the author make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation warranties of fitness for a particular purpose. No warranty may be created or extended by sales or promotional materials. The advice and strategies contained herein may not be suitable for every situation. This work is sold with the understanding that the publisher is not engaged in rendering legal, accounting, or other professional services. If professional assistance is required, the services of a competent professional person should be sought. Neither the publisher nor the author shall be liable for damages arising herefrom. The fact that an organization or Web site is referred to in this work as a citation and/or a potential source of further information does not mean that the author or the publisher endorses the information the organization or website may provide or recommendations it may make. Further, readers should be aware that Internet websites listed in this work may have changed or disappeared between when this work was written and when it is read.

For general information on our other products and services please contact our Customer Care Department within the United States at (877) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley publishes in a variety of print and electronic formats and by print-on-demand. Some material included with standard print versions of this book may not be included in e-books or in print-on-demand. If this book refers to media such as a CD or DVD that is not included in the version you purchased, you may download this material at http://booksupport.wiley.com. For more information about Wiley products, visit www.wiley.com.

Library of Congress Control Number: 2017936609

Trademarks: Wiley and the Wiley logo are trademarks or registered trademarks of John Wiley & Sons, Inc. and/or its affiliates, in the United States and other countries, and may not be used without written permission. SPSS is a registered trademark of International Business Machines Corporation. All other trademarks are the property of their respective owners. John Wiley & Sons, Inc. is not associated with any product or vendor mentioned in this book.


We would like to dedicate this book to Jon Peck, who retired from more than 30 years with SPSS and IBM while this book was in its final stages. We wish him the best of retirements even though he probably won’t be able to resist staying in the SPSS community in some form.


About the Authors

Keith McCormick is a data mining consultant, trainer, and speaker. A passionate user of SPSS for 25 years, he has trained thousands on how to effectively use SPSS Statistics and SPSS Modeler. He blogs at keithmccormick.com.

Jesus Salcedo is an independent statistical consultant. He is a former SPSS Curriculum Team Lead and Senior Education Specialist, who has written numerous SPSS training courses and trained thousands of users.

Jon Peck, recently retired from IBM and SPSS, was instrumental in developing and introducing the R and Python connections to the SPSS community. This expertise made him uniquely qualified to produce Chapter 18. He is the author of all the extension commands discussed in that chapter and has a patent pending on the algorithm in the SPSSINC TURF procedure discussed there. He can be reached at [email protected].

Andrew Wheeler is a professor of criminology at the University of Texas at Dallas and a former crime analyst. The application of geospatial techniques in his research created the opportunity for a powerful real-world example in Chapter 8. He has used SPSS for over 10 years, and often blogs SPSS tutorials at andrewpwheeler.wordpress.com.


About the Technical Editors

Jon Peck, now retired from IBM, was a senior engineer, statistician, and product strategy person for SPSS and IBM for 32 years. He earned a Ph.D in economics from Yale University, and taught econometrics and statistics there for 13 years before joining SPSS. He designed and contributed to many features of SPSS Statistics and has consulted with and trained many users. He remains active on social media and in consulting.

Terry Taerum has fifteen years’ experience as a statistician at the University of Alberta, fifteen years as a data analyst at SPSS Inc., and five years as a predictive analyst and consultant with IBM Inc.


Credits

Project Editor
Tom Dinse

Technical Editors
Jon Peck
Terry Taerum

Production Editor
Dassi Zeidel

Copy Editor
Kim Cofer

Production Manager
Katie Wisor

Manager of Content Development & Assembly
Mary Beth Wakefield

Marketing Manager
Christie Hilbrich

Professional Technology & Strategy Director
Barry Pruett

Business Manager
Amy Knies

Executive Editor
Jim Minatel

Project Coordinator, Cover
Brent Savage

Proofreader
Nancy Carrasco

Indexer
Johnna VanHoose Dinse

Cover Designer
Wiley

Cover Image
iStock.com/agsandrew


Acknowledgments

Keith and Jesus are especially proud to have worked with Bob Elliot before he retired. Our good friend Dean Abbott recommended Keith to Bob when Bob was seeking out a follow-up to Dean’s excellent Applied Predictive Analytics, but specifically in SPSS Statistics. Without both of them, this book would not have been created.

Terry’s and Jon’s contribution extended well beyond technical reviewing. We consider both of them mentors and friends. Jon took over technical reviewing when Terry took on a new role with a return to IBM. Jon, in particular, was an interlocutor and trusted advisor, and we produced a better book as a result.

Tom, our project editor, had to be patient with us. Deadlines slipped, contributors became unavailable, and Bob retired before the book was complete. Whenever it seemed that something wasn’t quite as it should be, it was often Tom who ultimately made it right. He deserves credit for multiple roles, and we thank him.

We would also like to thank all of the many SPSSers that we turn to when we have a question even if they haven’t heard from us in a while. We love the sense of community that we have all managed to maintain even when so many have moved on to other roles. And we thank Jason for capturing that sense of community in his foreword.


Contents at a Glance

Foreword
Introduction

Part I Advanced Statistics
Chapter 1 Comparing and Contrasting IBM SPSS AMOS with Other Multivariate Techniques
Chapter 2 Monte Carlo Simulation and IBM SPSS Bootstrapping
Chapter 3 Regression with Categorical Outcome Variables
Chapter 4 Building Hierarchical Linear Models

Part II Data Visualization
Chapter 5 Take Your Data Visualizations to the Next Level
Chapter 6 The Code Behind SPSS Graphics: Graphics Production Language
Chapter 7 Mapping in IBM SPSS Statistics
Chapter 8 Geospatial Analytics
Chapter 9 Perceptual Mapping with Correspondence Analysis, GPL, and OMS
Chapter 10 Display Complex Relationships with Multidimensional Scaling

Part III Predictive Analytics
Chapter 11 SPSS Statistics versus SPSS Modeler: Can I Be a Data Miner Using SPSS Statistics?
Chapter 12 IBM SPSS Data Preparation
Chapter 13 Model Complex Interactions with IBM SPSS Neural Networks
Chapter 14 Powerful and Intuitive: IBM SPSS Decision Trees
Chapter 15 Find Patterns and Make Predictions with K Nearest Neighbors

Part IV Syntax, Data Management, and Programmability
Chapter 16 Write More Efficient and Elegant Code with SPSS Syntax Techniques
Chapter 17 Automate Your Analyses with SPSS Syntax and the Output Management System
Chapter 18 Statistical Extension Commands

Index


Contents

Foreword
Introduction

Part I Advanced Statistics

Chapter 1 Comparing and Contrasting IBM SPSS AMOS with Other Multivariate Techniques
  T-Test
  ANCOVA
  MANOVA
  Factor Analysis and Unobserved Variables in SPSS
  AMOS
  Revisiting Factor Analysis and a General Orientation to AMOS
  The General Model

Chapter 2 Monte Carlo Simulation and IBM SPSS Bootstrapping
  Monte Carlo Simulation
  Monte Carlo Simulation in IBM SPSS Statistics
  Creating an SPSS Model File
  IBM SPSS Bootstrapping
  Proportions
  Bootstrap Mean
  Bootstrap and Linear Regression

Chapter 3 Regression with Categorical Outcome Variables
  Regression Approaches in SPSS
  Logistic Regression
  Ordinal Regression Theory
  Assumptions of Ordinal Regression Models
  Ordinal Regression Dialogs
  Ordinal Regression Output
  Categorical Regression Theory
  Assumptions of Categorical Regression Models
  Categorical Regression Dialogs
  Categorical Regression Output

Chapter 4 Building Hierarchical Linear Models
  Overview of Hierarchical Linear Mixed Models
  A Two-Level Hierarchical Linear Model Example
  Mixed Models…Linear
  Mixed Models…Linear (Output)
  Mixed Models…Generalized Linear
  Mixed Models…Generalized Linear (Output)
  Adjusting Model Structure

Part II Data Visualization

Chapter 5 Take Your Data Visualizations to the Next Level
  Graphics Options in SPSS Statistics
  Understanding the Revolutionary Approach in The Grammar of Graphics
  Bar Chart Case Study
  Bubble Chart Case Study

Chapter 6 The Code Behind SPSS Graphics: Graphics Production Language
  Introducing GPL: Bubble Chart Case Study
  GPL Help
  Bubble Chart Case Study Part Two
  Double Regression Line Case Study
  Arrows Case Study
  MBTI Bubble Chart Case Study

Chapter 7 Mapping in IBM SPSS Statistics
  Creating Maps with the Graphboard Template Chooser
  Creating a Choropleth of Counts Map
  Creating Other Map Types
  Creating Maps Using Geographical Coordinates

Chapter 8 Geospatial Analytics
  Geospatial Association Rules
  Case Study: Crime and 311 Calls
  Spatio-Temporal Prediction
  Case Study: Predicting Weekly Shootings

Chapter 9 Perceptual Mapping with Correspondence Analysis, GPL, and OMS
  Starting with Crosstabs
  Correspondence Analysis
  Multiple Correspondence Analysis
  Crosstabulations
  Applying OMS and GPL to the MCA Perceptual Map

Chapter 10 Display Complex Relationships with Multidimensional Scaling
  Metric and Nonmetric Multidimensional Scaling
  Nonmetric Scaling of Psychology Sub-Disciplines
  Multidimensional Scaling Dialog Options
  Multidimensional Scaling Output Interpretation
  Subjective Approach to Dimension Interpretation
  Statistical Approach to Dimension Interpretation

Part III Predictive Analytics

Chapter 11 SPSS Statistics versus SPSS Modeler: Can I Be a Data Miner Using SPSS Statistics?
  What Is Data Mining?
  What Is IBM SPSS Modeler?
  Can Data Mining Be Done in SPSS Statistics?
  Hypothesis Testing, Type I Error, and Hold-Out Validation
  Significance of the Model and Importance of Each Independent Variable
  The Importance of Finding and Modeling Interactions
  Classic and Important Data Mining Tasks
  Partitioning and Validating
  Feature Selection
  Balancing
  Comparing Results from Multiple Models
  Creating Ensembles
  Scoring New Records

Chapter 12 IBM SPSS Data Preparation
  Identify Unusual Cases
  Identify Unusual Cases Dialogs
  Identify Unusual Cases Output
  Optimal Binning
  Optimal Binning Dialogs
  Optimal Binning Output

Chapter 13 Model Complex Interactions with IBM SPSS Neural Networks
  Why “Neural” Nets?
  The Famous Case of Exclusive OR and the Perceptron
  What Is a Hidden Layer and Why Is It Needed?
  Neural Net Results with the XOR Variables
  How the Weights Are Calculated: Error Backpropagation
  Creating a Consistent Partition in SPSS Statistics
  Comparing Regression to Neural Net with the Bank Salary Case Study
  Calculating Mean Absolute Percent Error for Both Models
  Classification with Neural Nets Demonstrated with the Titanic Dataset

Chapter 14 Powerful and Intuitive: IBM SPSS Decision Trees
  Building a Tree with the CHAID Algorithm
  Review of the CHAID Algorithm
  Adjusting the CHAID Settings
  CRT for Classification
  Understanding Why the CRT Algorithm Produces a Different Tree
  Missing Data
  Changing the CRT Settings
  Comparing the Results of All Four Models
  Alternative Validation Options
  The Scoring Wizard

Chapter 15 Find Patterns and Make Predictions with K Nearest Neighbors
  Using KNN to Find “Neighbors”
  The Titanic Dataset and KNN Used as a Classifier
  The Trade-Offs between Bias and Variance
  Comparing Our Models: Decision Trees, Neural Nets, and KNN
  Building an Ensemble

Part IV Syntax, Data Management, and Programmability

Chapter 16 Write More Efficient and Elegant Code with SPSS Syntax Techniques
  A Syntax Primer for the Uninitiated
  Making the Connection: Menus and the Grammar of Syntax
  What Is “Inefficient” Code?
  The Case Study
  Customer Dataset
  Fixing the ZIP Codes
  Addressing Case Sensitivity of City Names with UPPER() and LOWER()
  Parsing Strings and the Index Function
  Aggregate and Restructure
  Pasting Variable Names, TO, Recode, and Count
  DO REPEAT Spend Ratios
  Merge
  Final Syntax File

Chapter 17 Automate Your Analyses with SPSS Syntax and the Output Management System
  Overview of the Output Management System
  Running OMS from Menus
  Automatically Writing Selected Categories of Output to Different Formats
  Suppressing Output
  Working with OMS data
  Running OMS from Syntax

Chapter 18 Statistical Extension Commands
  What Is an Extension Command?
  TURF Analysis: Designing Product Bundles
  Large Problems
  Quantile Regression: Predicting Airline Delays
  Comparing Ordinary Least Squares with Quantile Regression Results
  Operational Considerations
  Support Vector Machines: Predicting Loan Default
  Background
  An Example
  Operational Issues
  Computing Cohen’s d Measure of Effect Size for a T-Test

Index


Foreword

In my various roles at SPSS and IBM I met Keith and Jesus many years ago. They both have over 20 years of statistical consulting experience, and they both have been training people on statistics and how to use SPSS for many years. Each has in fact trained thousands of students. They are uniquely qualified to bring the message and content of this book to you, and they have done so with rigor and grace. SPSS has so many techniques and procedures to perform both simple and complex analysis, and Keith and Jesus will introduce you to this rich tapestry so that it pays dividends in benefiting your endeavors in driving societal change based on data and analytics for years to come. This book goes beyond the elementary treatments found in most of the other books on SPSS Statistics but is written for users who do not necessarily have an advanced statistical background. It can make the reader a better analyst by expanding their toolkit to include powerful techniques that he or she might not otherwise consider but that can have a big payoff in increased insight.

Keith and Jesus’ outstanding new book on SPSS Statistics has brought back so many thoughts about this great product and the influence it has had on so many people that I thought I would briefly reminisce.

I first became involved with this software when I went to work for SPSS in 1995 as Director of Quality Assurance. A year earlier, SPSS had released its first Microsoft Windows product, which, while solid, did not really take advantage of the amazing possibilities a true graphical interface could provide. This was a huge and important time for the company as the SPSS team was hard at work revolutionizing both the front-end user interface and the output to create a standard that is still in place and considered best of breed today. These innovations enabled sophisticated pivot table output as well as much more customized graphical output than had ever been attempted before. Indeed, in the years to come it was that spirit of always getting ahead of every technological trend that would keep this software right in the heart of what the data analysis community demanded.

When I say the heart of the data analysis community I am not in any way exaggerating. This software has been used by hundreds of thousands of students in college and graduate school and by similar numbers in government and commercial environments worldwide. Over the years I have literally had hundreds, if not thousands, of people say to me “I used SPSS in college” when I introduced myself. And of course, I can’t leave out the bootleg copies I have seen in innumerable places during my travels and personally purchased on the streets of Santiago and Beijing.

Impressive? Absolutely. But of course the real question is … WHY is SPSS so heavily used and so well loved? WHY has its community of users stayed vibrant and loyal even eight years after the company itself was acquired by IBM?

The answer is the combination of power and simplicity combined with elegance. This is a big statement. To back this up—and apropos of the subject matter—I’ll contribute a data point as my best evidence. A few years ago, when I was still with IBM (which acquired SPSS in 2009), we hired a summer intern who had used our software for a semester in college. After about a month on the job, we debriefed her on the progress of her user interface design assignment. She discussed at length the challenges she was having coming up with a design that was up to the standard of the rest of the product in terms of simplicity, backed by immense power. This led to a discussion of the first time she used the product as a student. Of course, opening a “statistics” product for the first time filled this iPhone-using millennial with much trepidation; however, as she described to us within just a few minutes she was loading and manipulating data, building predictive models, and producing output for her class. In just a short time beyond that she was digging into the depths of some of the power the product provided. Even a user nearly born and bred with the beautiful user designs of the smartphone consumer era was right at home using SPSS. What an amazing statement in and of itself. Think about it! This is made even more extraordinary because this same student had interactions with professors and researchers on her campus who were using—in fact, relying on—that very same product to do their cutting-edge work. As I said, the answer is the combination of power and simplicity combined with elegance.

This amazing simplicity does not come at the expense of power. As Keith and Jesus make clear in this book, SPSS Statistics is an incredibly powerful tool for data analysis and visualization. Even today there is no tool that works with its users of any level (novice, intermediate, or expert) to uncover meanings and relationships in data as powerfully as SPSS does. Further, once the data has been prepared, the models built, and the analysis done, there is no software available that is better at explaining the results to non-data analysts who have to act on it. This increases the value of the tool immeasurably, since it creates the understanding and confidence to deploy its insights into the real world to create real value. Having seen this done so many times, by so many people, in so many domains, I can say to those starting with this product for the first time that I truly envy you: you are about to start on a journey of learning and getting results that will amaze you and the people you work with.

Let’s put this all in perspective. This product is now in its sixth decade of existence. That’s right: it first came out in the late 1960s. How many products can you name that have survived and prospered for that long? Not many. The Leica M camera and the Porsche 911 car with their classic timeless designs come to mind, but not much else. How many COMPUTER products? Even fewer; perhaps only the venerable IBM mainframe, in fact. But here we have IBM SPSS Statistics, not only surviving but still as relevant and vital as ever, right in the midst of the new age of big data and machine learning, heavily used by experts who dig deep into data and model building, but usable by novices in the iPhone era as well.

Now, let us switch our focus from celebrating the vibrancy and staying power of the SPSS journey to the heart of what Keith and Jesus have addressed in this book. This is first and foremost a book for data analysis practitioners at intermediate and advanced levels. The question this raises is how this product can help that audience create the most value in the modern era.

Unlike the world of the late 1960s when SPSS was created, we now live in an age where there are many tools to do quick and fast analysis of datasets. For example, Tableau is a fine tool for more business-oriented users with less data analysis training to get immediate and useful visual insights from their data. So what then is the need for IBM SPSS Statistics in this new world?

To answer that question, let me take you back several years to a conference called “MinneAnalytics,” sponsored by a Minnesota-based organization of analytic professionals, where I delivered a presentation on Advanced Analytics called “What’s Your World View?” In that presentation, I envisioned a rapidly approaching new age where “big data” would meet advanced analytic techniques running in real time and that combination would drive every decision-making aspect of how our society would work. I compared the importance of this movement to previous huge steps that changed the very foundation of society, including the invention of the automobile and the invention of assembly-line production for manufacturing many different types of goods.

Well, a mere three years later that “future” society is here already, right now. It is happening all around us. Analytics on big data is driving decision making and processes everywhere you look. Hospitals apply real-time analytics to data feeds from patient-monitoring instruments in intensive care units to message doctors automatically that their patient in the ICU will shortly take a turn for the worse. Firms managing trucking use analytics to intervene proactively when the system tells them one of their drivers is predicted to have an accident. Airplanes and cars apply real-time analytics to engine sensors to predict failure and inform the pilots and drivers to take action before such failure occurs. Indeed, big data analytics has become one of the most disruptive forces in business history and is unleashing new value creation quite literally wherever you look. All of these examples clearly show a fundamental point: quick visual understanding is one thing, but deep insight yielding confidence in a predictive model that is deployed in real time at critical decision points at vast scale is quite another. It is in this realm of confirmation and confidence that SPSS Statistics shines like no other.

Mass deployment of advanced analytics will create benefits for society that are for all intents and purposes unimaginable. Assuming, of course, that the deployed analytics are in fact correct (and with the right tweaking and trade-offs between accuracy and stability) and deployed properly. It is the almost unique benefit of SPSS that no matter how those analytics are built (SPSS, R, Python, supervised or unsupervised, standard or machine learning, executed programmatically or through visual interfaces, or any other variant you can think of), the product can be used to confirm confidence that the desired results will be achieved, and to understand the risks involved. It can also be used to explain the results to others in the enterprise, aligning those who need to be in the know on exactly and precisely how analytics drive their new business models. There is no better “hub” for data scientists to practice their craft and contribute their value to the creation of a new world, a new world of staggering rates of change guided or driven by data and analytics.

IBM SPSS Statistics is the perfect tool for this new world when used by well-trained analysts who can put all the data and all the insights together without mistakes to create the most value. People who can take the output of machine learning, add traditional data and then other new forms of data (like sensors and social media for example), to get insights well beyond those quick insights from Tableau and other surface-level tools. People who know how to use the advanced capabilities of the tool, such as the ability to do mixed model analysis of data at different levels (for example, within a hierarchy to find even deeper insights). Such a tool, in the hands of such people—well-trained data scientists—can drive us into this new remarkable world with both confidence and safety. To become one of those who drive this societal transformation using SPSS you can benefit from having this book as your guide.

Enjoy the book…and enjoy the next 50 years of IBM SPSS Statistics as well! — Jason Verlen

Jason Verlen is currently Senior Vice President of Product Management and Marketing at CCC Information Services, based in Chicago. Before moving to CCC he spent 20 years at SPSS and then IBM (after its acquisition of SPSS) in various roles ending with being named Vice President of Big Data Analytics at IBM.

Introduction

This book is a collaboration between me (Keith) and several other career-long “SPSSers,” and the editorial decisions about what to cover, and how to cover it, are greatly affected by that fact. Almost 20 years ago, my own career took a turn down a road that led to a life of learning, teaching, and consulting about SPSS. I was contemplating a PhD in Psychometrics at the University of North Carolina, Chapel Hill. My plans didn’t get much further than auditing some prerequisites and establishing residency. So, on paper, I hadn’t made much progress, but moving 1,000 miles (from Massachusetts) to relocate and purchasing a house represented a milestone in my life and career. I’m still in that same house (more than 22 years now), and I’m still using SPSS almost daily. Like many things in life, it seems almost accidental. I was doing contract statistics work using SPSS, working from home while I planned for a life in graduate school, and I drove up to Arlington, VA to take advantage of what SPSS training then called the training “subscription.”

The concept was to take as many classes as you can manage in a year. It was remarkably cost effective. I was able to convince my primary contract client to pay for the subscription under the condition that I covered all other expenses, and didn’t let it affect my deadlines. I already had several years of daily SPSS use under my belt, so I was hardly a rookie, but it was too good to pass up. I found a summer sublet in Washington, DC, took advantage of the training classes almost daily for a couple of months, learned all the latest features, learned about modules that I had never tried, made some good new friends, and worked late into the evening trying to keep my contract research work on schedule. Then suddenly I was asked if I wanted to relocate and take on teaching the basic classes in that same office. I declined the full-time position (the grad school idea was still alive), but I did start making occasional trips. Within a year they were frequent trips, and it became effectively full time, including training trips all over the United States and Canada.

A bit of nostalgia, perhaps, but there is a good reason to reflect on that time period in SPSS Inc.’s history. As Jason Verlen notes in his foreword to this book, the mid to late ’90s was a pivotal time in the development of SPSS. With Windows 95 came a whole new world, and SPSS Inc. leaped into the fray. Also, in the late ’90s, SPSS Inc. bought ISL, and with it, Clementine. The revolutionary software package then became SPSS Clementine, and is now called IBM SPSS Modeler. While this book is dedicated to SPSS Statistics and not SPSS Modeler, my career has certainly never been quite the same since. Although that was the acquisition that most influenced my career, it was certainly not the only one. There were numerous acquisitions during that period, growing the SPSS family to include products like AMOS, SPSS Data Collection, and Showcase.

It was also a bit of a golden age in SPSS training. Almost 20 of us offered SPSS training frequently. On any given day, there were at least a couple of SPSS training events being held in one of several cities that had permanent full-time SPSS training facilities. Traveling to public training was common then; online training hadn’t yet arrived. It simply was how training was done. In this very active, live, corporate-managed, instructor-led training economy, more than 30 distinct classes were offered, representing 50–60+ days of training content. It took me three years before I found myself teaching 80% of them, and even longer before I taught all of them. Classroom training was seen as a key way to support the user community, so even classes that were infrequent, and therefore not very profitable, were still scheduled to support the product. Everything changes over time, and certainly traveling cross-country to a corporate training center for five consecutive days of training, with a stack of huge books, alongside 16 strangers from other companies, seems quaint now.

All of us who experienced it as trainers and participants, however, are forever changed. One of the things that always struck me, and that still knocks me off my feet, was that the 32 books we used were not enough! SPSS had so many great new features coming out with each new version that it was hard to keep up, even though we were in the classroom three-quarters of the time. The Arlington office frequently had another trainer teaching in a room next door, so we would have lunch together, and admit to each other that we had left ourselves with a few too many pages for day three. Day three! And that was just the Regression class! We’d sometimes lament that someone had shown up for a class, but had skipped one or more of the three prerequisites. Can you imagine? Seven days of prerequisites to take a training class! It just wouldn’t work to require that many days now, but we worked hard, and covered a lot of ground, and we went through all the software output, step by step. Then we would make a change to the model, or respond to an audience question, and go through the entire output again, step by step. Go ahead and admit it: if you are like us, it probably sounds great. And it was.