IBM SPSS Decision Trees 20

  • Note: Before using this information and the product it supports, read the general information under Notices on p. 104.

    This edition applies to IBM SPSS Statistics 20 and to all subsequent releases and modifications until otherwise indicated in new editions.

    Adobe product screenshot(s) reprinted with permission from Adobe Systems Incorporated.

    Microsoft product screenshot(s) reprinted with permission from Microsoft Corporation.

    Licensed Materials - Property of IBM

    Copyright IBM Corporation 1989, 2011.

    U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

  • Preface

    IBM SPSS Statistics is a comprehensive system for analyzing data. The Decision Trees optional add-on module provides the additional analytic techniques described in this manual. The Decision Trees add-on module must be used with the SPSS Statistics Core system and is completely integrated into that system.

    About IBM Business Analytics

    IBM Business Analytics software delivers complete, consistent and accurate information that decision-makers trust to improve business performance. A comprehensive portfolio of business intelligence, predictive analytics, financial performance and strategy management, and analytic applications provides clear, immediate and actionable insights into current performance and the ability to predict future outcomes. Combined with rich industry solutions, proven practices and professional services, organizations of every size can drive the highest productivity, confidently automate decisions and deliver better results.

    As part of this portfolio, IBM SPSS Predictive Analytics software helps organizations predict future events and proactively act upon that insight to drive better business outcomes. Commercial, government and academic customers worldwide rely on IBM SPSS technology as a competitive advantage in attracting, retaining and growing customers, while reducing fraud and mitigating risk. By incorporating IBM SPSS software into their daily operations, organizations become predictive enterprises able to direct and automate decisions to meet business goals and achieve measurable competitive advantage. For further information or to reach a representative, visit http://www.ibm.com/spss.

    Technical support

    Technical support is available to maintenance customers. Customers may contact Technical Support for assistance in using IBM Corp. products or for installation help for one of the supported hardware environments. To reach Technical Support, see the IBM Corp. web site at http://www.ibm.com/support. Be prepared to identify yourself, your organization, and your support agreement when requesting assistance.

    Technical Support for Students

    If you're a student using a student, academic or grad pack version of any IBM SPSS software product, please see our special online Solutions for Education (http://www.ibm.com/spss/rd/students/) pages for students. If you're a student using a university-supplied copy of the IBM SPSS software, please contact the IBM SPSS product coordinator at your university.

    Customer Service

    If you have any questions concerning your shipment or account, contact your local office. Please have your serial number ready for identification.

  • Training Seminars

    IBM Corp. provides both public and onsite training seminars. All seminars feature hands-on workshops. Seminars will be offered in major cities on a regular basis. For more information on these seminars, go to http://www.ibm.com/software/analytics/spss/training.

    Additional Publications

    The SPSS Statistics: Guide to Data Analysis, SPSS Statistics: Statistical Procedures Companion, and SPSS Statistics: Advanced Statistical Procedures Companion, written by Marija Norušis and published by Prentice Hall, are available as suggested supplemental material. These publications cover statistical procedures in the SPSS Statistics Base module, Advanced Statistics module and Regression module. Whether you are just getting started in data analysis or are ready for advanced applications, these books will help you make best use of the capabilities found within the IBM SPSS Statistics offering. For additional information including publication contents and sample chapters, please see the author's website: http://www.norusis.com

  • Contents

    Part I: User's Guide

    1 Creating Decision Trees

        Selecting Categories
        Validation
        Tree-Growing Criteria
            Growth Limits
            CHAID Criteria
            CRT Criteria
            QUEST Criteria
            Pruning Trees
            Surrogates
        Options
            Misclassification Costs
            Profits
            Prior Probabilities
            Scores
            Missing Values
        Saving Model Information
        Output
            Tree Display
            Statistics
            Charts
            Selection and Scoring Rules

    2 Tree Editor

        Working with Large Trees
            Tree Map
            Scaling the Tree Display
            Node Summary Window
        Controlling Information Displayed in the Tree
        Changing Tree Colors and Text Fonts
        Case Selection and Scoring Rules
            Filtering Cases
            Saving Selection and Scoring Rules

  • Part II: Examples

    3 Data assumptions and requirements

        Effects of measurement level on tree models
            Permanently assigning measurement level
            Variables with an unknown measurement level
        Effects of value labels on tree models
            Assigning value labels t