Chapter 8: Decision Tree Algorithms
Rule Based
Suitable for automatic generation
Contents
Presents the concept of decision tree models
Discusses the concept of rule interestingness
Demonstrates decision tree rules on a case
Reviews real applications of decision tree models
Shows the application of decision tree models to larger data sets
Demonstrates See5 decision tree analysis in the appendix
Problems
Grocery stores have a massive data problem in inventory control, dealt with to a high degree by bar-coding.
The massive database of transactions can be mined to monitor customer demand.
Decision trees provide a means to obtain product-specific forecasting models in the form of rules (IF-THEN) that are easy to implement.
Decision trees can be used by grocery stores in a number of policy decisions, including ordering inventory replenishment and evaluating alternative promotion campaigns.
Decision tree
A decision tree refers to a tree structure of rules (often association rules).
The decision tree modeling process involves collecting those variables that the analyst thinks might bear on the decision at issue, and analyzing these variables for their ability to predict the outcome.
The algorithm automatically determines which variables are most important, based on their ability to sort the data into the correct output category.
Decision trees have a relative advantage over ANN and GA in that a reusable set of rules is provided, thus explaining model conclusions.
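As a minimal sketch of this process (not part of the chapter), the example below fits a decision tree to a small hypothetical applicant table with scikit-learn and prints the induced rules; the column names and records are invented for illustration.

```python
# A minimal sketch (assumes scikit-learn is installed); the data are hypothetical.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical loan-applicant records: categorical inputs, binary outcome.
data = pd.DataFrame({
    "age":    ["young", "young", "middle", "old", "middle", "old"],
    "income": ["low", "high", "average", "high", "low", "average"],
    "risk":   ["high", "low", "average", "low", "high", "low"],
    "ontime": [0, 1, 1, 1, 0, 1],
})

# One-hot encode the categorical predictors so the tree can split on them.
X = pd.get_dummies(data[["age", "income", "risk"]])
y = data["ontime"]

tree = DecisionTreeClassifier(criterion="entropy", max_depth=3).fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))  # reusable IF-THEN style rules
```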
Decision Tree Applications
Classifying loan applications, screening potential consumers, and rating job applicants.
Decision trees provide a way to implement the rule-based system approach.
Supervised learning models (tree structures, association models, rule induction):
Decision trees (categorical attributes): ID3, C4.5/C5, CART
Regression trees (continuous attributes): CART, M5, Cubist
Rule induction: CN2, ITRULE
Types of Trees
Classification tree: variable values are classes; finite conditions
Regression tree: variable values are continuous numbers; used for prediction or estimation
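To make the distinction concrete, a brief sketch using scikit-learn (the toy data are invented): a classification tree predicts a class label, while a regression tree estimates a continuous number.

```python
# Minimal sketch of the two tree types (scikit-learn; toy data invented for illustration).
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X = [[25, 20_000], [40, 60_000], [60, 30_000], [35, 90_000]]   # e.g., age, income

# Classification tree: finite class outcomes (e.g., on-time = 1, late = 0).
clf = DecisionTreeClassifier().fit(X, [0, 1, 1, 1])
print(clf.predict([[30, 40_000]]))        # -> a class label

# Regression tree: continuous outcome (e.g., yearly clothing expenditure).
reg = DecisionTreeRegressor().fit(X, [300.0, 1200.0, 500.0, 2000.0])
print(reg.predict([[30, 40_000]]))        # -> an estimated number
```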
Rule Induction
Automatically process data: classification (logical, easier) or regression (estimation, messier)
Search through data for patterns & relationships: pure knowledge discovery
Assumes no prior hypothesis; disregards human judgment
Decision trees
Logical branching
Historical: ID3 (an early rule-generating system), C4.5, See5
Branches: the different possible values
Nodes: the points from which branches emanate
Decision tree operation
A bank may have a database of past applicants for short-term loans (see Table 4.4).
The bank's policy treats applicants differently by age group, income level, and risk.
A tree sorts the possible combinations of these variables. An exhaustive tree enumerates all combinations of variable values, as in Table 8.1.
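The exhaustive enumeration is easy to reproduce; a small sketch follows (the value labels are assumed to match the three levels per variable in Table 8.1).

```python
# Enumerate every combination of the three applicant variables (3 x 3 x 3 = 27 rows),
# mirroring the exhaustive tree of Table 8.1. The value labels are assumed.
from itertools import product

ages = ["young", "middle", "old"]
incomes = ["low", "average", "high"]
risks = ["low", "average", "high"]

combinations = list(product(ages, incomes, risks))
print(len(combinations))        # 27
for age, income, risk in combinations[:3]:
    print(age, income, risk)    # first few branches of the exhaustive tree
```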
Decision tree operation
A rule-based system model would require that bank loan officers whose judgment is respected be interviewed to classify the decision for each of these combinations of variables.
Some situations can be reduced directly in the decision tree.
Rule interestingness
Data, even categorical data, can potentially involve many rules.
In Table 8.1 there are 3 × 3 × 3 = 27 combinations. With 10 variables, each with 4 possible values, the combinations exceed a million (4^10 = 1,048,576), which is unreasonable to enumerate.
Decision tree models identify the most useful rules in terms of predicting outcomes. Rule effectiveness is measured in terms of confidence and support. Confidence is the degree of accuracy of a rule; support is the degree to which the antecedent conditions occur in the data.
Tanagra Example
Support & Confidence
Support for an association rule indicates the proportion of records covered by the set of attributes in the association rule.
Example: if there were 10 million book purchases and a rule's antecedent covered 10 of them, support for that rule would be 10/10,000,000, a very small support measure of 0.000001. These concepts are often used in the form of threshold levels in machine learning systems.
Minimum confidence levels and support levels can be specified to retain rules identified by the decision tree.
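A minimal sketch of how support and confidence could be computed for a single IF-THEN rule over a list of records; the field names, records, and threshold values are illustrative, not from the chapter.

```python
# Support = fraction of records matching the rule's antecedent;
# confidence = fraction of those matching records for which the consequent also holds.
def rule_metrics(records, antecedent, consequent):
    matched = [r for r in records if antecedent(r)]
    support = len(matched) / len(records)
    confidence = (sum(consequent(r) for r in matched) / len(matched)) if matched else 0.0
    return support, confidence

# Hypothetical loan records.
records = [
    {"risk": "low", "ontime": True},
    {"risk": "low", "ontime": True},
    {"risk": "high", "ontime": False},
    {"risk": "average", "ontime": True},
]

# Rule: IF risk = low THEN on-time.
s, c = rule_metrics(records,
                    antecedent=lambda r: r["risk"] == "low",
                    consequent=lambda r: r["ontime"])
print(f"support={s:.2f}, confidence={c:.2f}")     # support=0.50, confidence=1.00

# Keep only rules above minimum thresholds, as rule-generating tools do.
MIN_SUPPORT, MIN_CONFIDENCE = 0.1, 0.8
print(s >= MIN_SUPPORT and c >= MIN_CONFIDENCE)   # True
```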
Machine learning
Rule-induction algorithms can automatically process categorical data (and can also work on continuous data). A clear outcome is needed.
Rule induction works by searching through data for patterns and relationships.
Machine learning starts with no assumptions, looking only at input data and results.
Recursive partitioning algorithms split data (original data) into finer and finer subsets leading to a decision tree.
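A compact sketch of the recursive partitioning idea in plain Python; the entropy helper and stopping rule are simplified relative to commercial tools such as See5.

```python
# Recursive partitioning: pick the attribute whose split gives the lowest weighted
# entropy, split the data on it, and recurse on each subset until the subsets are pure.
from math import log2
from collections import Counter

def entropy(labels):
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def split_entropy(rows, labels, attr):
    total = len(rows)
    weighted = 0.0
    for v in set(r[attr] for r in rows):
        subset = [lab for r, lab in zip(rows, labels) if r[attr] == v]
        weighted += (len(subset) / total) * entropy(subset)
    return weighted

def build_tree(rows, labels, attrs):
    if len(set(labels)) == 1 or not attrs:          # pure node, or nothing left to split
        return Counter(labels).most_common(1)[0][0]
    best = min(attrs, key=lambda a: split_entropy(rows, labels, a))
    tree = {}
    for v in set(r[best] for r in rows):
        idx = [i for i, r in enumerate(rows) if r[best] == v]
        tree[(best, v)] = build_tree([rows[i] for i in idx],
                                     [labels[i] for i in idx],
                                     [a for a in attrs if a != best])
    return tree

# Tiny usage example with invented records.
rows = [{"risk": "low"}, {"risk": "high"}, {"risk": "low"}]
print(build_tree(rows, ["ontime", "late", "ontime"], ["risk"]))
```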
Cases
20 past loan application cases in Table 8.3.
Cases
Automatic machine learning begins by identifying those variables that offer the greatest likelihood of distinguishing between the possible outcomes.
For each of the three variables, the outcome probabilities are illustrated in Table 8.5 (next slide).
Most data mining packages use an entropy measure to gauge the discriminating power of each variable when splitting the data (chi-square measures can also be used to select variables).
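Since chi-square is mentioned as an alternative selection measure, here is a hedged sketch using scipy's chi2_contingency on the grouped Age counts quoted later from Table 8.4 (8/4, 4/1, 3/0 on-time/late); a larger statistic indicates a stronger association with the outcome.

```python
# Chi-square test of association between Age group and outcome, as an alternative
# variable-selection measure to entropy. Counts are the grouped Age data (Table 8.4).
from scipy.stats import chi2_contingency

#            on-time  late
age_counts = [[8, 4],        # young
              [4, 1],        # middle
              [3, 0]]        # old

chi2, p_value, dof, expected = chi2_contingency(age_counts)
print(f"chi2={chi2:.3f}, p={p_value:.3f}")   # larger chi2 -> stronger association
```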
Cases
Table 8.4: Grouped data
Table 8.5: Combination outcomes
Entropy formula
$$\mathrm{Inform} = -\frac{p}{p+n}\log_2\frac{p}{p+n} - \frac{n}{p+n}\log_2\frac{n}{p+n}$$
where p is the number of positive examples and n is the number of negative examples in the training set for each value of the attribute.
The lower the measure (entropy), the greater the information content.
This measure can be used to automatically select the variable with the most productive rule potential.
Entropy formula
The entropy formula has a problem if either p or n is 0, since log2 of 0 is undefined (such terms are treated as 0). Entropy for each Age category generated by the formula is shown in Table 8.6.
Category Young: [-(8/12)(-0.585) - (4/12)(-1.585)] × (12/20) = 0.551
The lower the entropy measure, the greater the information content (the greater the agreement probability).
Rule: If (Risk = low) then predict on-time payment
Else predict late
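A short sketch of the Inform measure defined above, with the zero-count guard, reproducing the Age entropy values given on these slides (the category counts are taken from Table 8.6 as quoted here).

```python
# Inform(p, n) = -p/(p+n) log2 p/(p+n) - n/(p+n) log2 n/(p+n), with 0*log2(0) treated as 0.
from math import log2

def inform(p, n):
    total = p + n
    result = 0.0
    for x in (p, n):
        if x:                                   # guard: log2 is undefined at 0
            result -= (x / total) * log2(x / total)
    return result

# Age categories (on-time, late) from the 20 loan cases: young 8/4, middle 4/1, old 3/0.
groups = {"young": (8, 4), "middle": (4, 1), "old": (3, 0)}
total_cases = sum(p + n for p, n in groups.values())   # 20

weighted = sum(((p + n) / total_cases) * inform(p, n) for p, n in groups.values())
print(round(((8 + 4) / 20) * inform(8, 4), 3))   # 0.551 for the Young category
print(round(weighted, 3))                        # 0.731 total for Age
```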
Entropy
Young: [-(8/12)(-0.585) - (4/12)(-1.585)] × (12/20) = 0.551
Middle: [-(4/5)(-0.322) - (1/5)(-2.322)] × (5/20) = 0.180
Old: [-(3/3)(0) - (0/3)(0)] × (3/20) = 0.000
Age sum: 0.731; Income: 0.782; Risk: 0.446 (lowest)
By the measures, Risk has the greatest information content. If Risk is low, the data indicates a 1.0 probability that the applicant will pay the loan back on time.
Evaluation
Two types of errors may occur:
1. Applicants rated as low risk may actually not pay on time (from the data, the probability of this case is 0.0).
2. Applicants rated as high or average risk might actually have paid if given a loan (from the data, the probability of this happening is 5/20 = 0.25).
Expected error: 0.25 × 0.5 (the probability of being wrong) = 0.125. Test the model using another set of data.
Evaluation
The entropy formula for Age, given that Risk was not low, is 0.99, while the same calculation for Income is 1.971; Age has greater discriminating power.
If Age is middle, the one case did not pay on time.
If (Risk NOT low) AND (Age = Middle)
Then predict Late
Else predict On-time
Evaluation
For the last variable, Income, given that Risk was not low and Age was not middle, there are nine cases left, shown in Table 8.8.
A third rule takes advantage of the cases with a unanimous outcome:
If (Risk NOT low) AND (Age NOT middle) AND (Income high)
Then predict Late
Else predict On-time
See page 141 for more explanations
Rule accuracy
The expected accuracy of the three rules is shown in Table 8.9.
The expected error is 0.375 (1 - 0.625 expected accuracy).
An additional rule could be generated for the case of Risk not low, Age young, and Income not high: four cases with low income (p(on-time) = 0.5) and four cases with average income (p(on-time) = 0.75).
The greater discrimination is provided by average income, resulting in the following rule:
If (Risk NOT low) AND (Age NOT middle) AND (Income average)
Then predict On-time
Else predict either
Rule accuracy
There is no added accuracy obtained with this rule, shown in Table 8.10.
The expected error is 4/20 × 0.5 = 0.10, the same as without the rule.
When machine learning methods encounter no improvement, they generally stop.
Rule accuracy
Table 8.11 shows the results.
Inventory Prediction
Groceries: maybe over 100,000 SKUs; barcode data input
Data mining to discover patterns: random sample of over 1.6 million records, 30 months, 95 outlets; test sample of 400,000 records
Rule induction more workable than regression: 28,000 rules; very accurate, up to 27% improvement
Clinical Database
Headache: over 60 possible causes
Exclusive reasoning uses negative rules (use when a symptom is absent)
Inclusive reasoning uses positive rules
Probabilistic rule induction expert system
Headache: training sample of over 50,000 cases, 45 classes, 147 attributes
Meningitis: 1,200 samples on 41 attributes, 4 outputs
Clinical Database
Used AQ15, C4.5: average accuracy 82%
Expert system: average accuracy 92%
Rough set rule system: average accuracy 70%
Using both positive & negative rules from rough sets: average accuracy over 90%
Software Development Quality
Telecommunications company
Goal: find patterns in modules being developed that are likely to contain faults discovered by customers
Typical module: several million lines of code
Probability of fault averaged 0.074
Apply greater effort for those: specification, testing, inspection
Software Quality
Preprocessed data; reduced data; used CART (Classification & Regression Trees); could specify prior probabilities
First model: 9 rules, 6 variables; better at cross-validation, but variable values not available until late
Second model: 4 rules, 2 variables; about the same accuracy, with data available earlier
Rules and evaluation
The second model's rules
The two models were very close in accuracy. The first model was better at cross-validation accuracy, but its variables were available only just prior to release. The second model had the advantage of being based on data available at an earlier stage and required less extensive data reduction. See also page 146 for the expert system.
Applications of methods to larger data sets
Expenditure application to find the characteristics of potential customers for each expenditure category.
A simple case is to categorize clothing expenditures (or other expenditures in the data set) per year as a 2-class classification problem.
Data preparation and data transformation: see page 154
Comparisons of Apriori, C4.5, and C5.0
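A hedged sketch of setting up such a 2-class problem with a decision tree; C5.0 itself is commercial, so scikit-learn's entropy-based tree is used as a stand-in, and the file name, column names, and threshold are illustrative assumptions.

```python
# Two-class classification of yearly clothing expenditure (high vs. low) with a
# decision tree. The data file, feature names, and threshold are illustrative only.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv("expenditure.csv")           # hypothetical prepared data set
df["clothing_high"] = (df["clothing_per_year"] > df["clothing_per_year"].median()).astype(int)

X = df[["age", "income", "household_size"]]   # assumed predictor columns
y = df["clothing_high"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
model = DecisionTreeClassifier(criterion="entropy", min_samples_leaf=20)
model.fit(X_train, y_train)
print("holdout accuracy:", model.score(X_test, y_test))
```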
Fuzzy Decision Trees
So far we have assumed distinct (crisp) outcomes
Many data points are not that clear-cut
Fuzzy: Membership function represents belief (between 0 and 1)
Fuzzy relationships have been incorporated in decision tree algorithms
Fuzzy Example
Age: Young 0.3, Middle 0.9, Old 0.2
Income: Low 0.0, Average 0.8, High 0.3
Risk: Low 0.1, Average 0.8, High 0.3
Definitions: memberships will not necessarily sum to 1.0; if ambiguous, select the alternative with the larger membership value; aggregate with the mean
Fuzzy Model
Rule 1: IF Risk = Low THEN On-time. Membership function: 0.1
Rule 2: IF Risk NOT Low & Age = Middle THEN Late. Risk NOT Low: MAX(0.8, 0.3) = 0.8; Age Middle: 0.9. Membership function: mean = 0.85
Fuzzy Model cont.
Rule 3: IF Risk NOT Low & Age NOT Middle & Income = High THEN Late. Risk: MAX(0.8, 0.3) = 0.8; Age: MAX(0.3, 0.2) = 0.3; Income: 0.3. Membership function: mean = 0.467
Fuzzy Model cont.
Rule 4: IF Risk NOT Low & Age NOT Middle & Income NOT High THEN On-time. Risk: MAX(0.8, 0.3) = 0.8; Age: MAX(0.3, 0.2) = 0.3; Income: MAX(0.0, 0.8) = 0.8. Membership function: mean = 0.633
Fuzzy Model cont.
Highest membership function is 0.633, for Rule 4
Conclusion: On-time
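A small sketch that reproduces this fuzzy evaluation in code, using the membership values from the example slide; NOT is taken as the maximum membership over the remaining categories and rule memberships are aggregated with the mean, per the slide's definitions.

```python
# Fuzzy evaluation of the four loan rules: combine antecedent memberships with the
# mean, then pick the rule with the highest membership. Values from the example slide.
age    = {"young": 0.3, "middle": 0.9, "old": 0.2}
income = {"low": 0.0, "average": 0.8, "high": 0.3}
risk   = {"low": 0.1, "average": 0.8, "high": 0.3}

def not_(memberships, excluded):
    """Membership of 'NOT excluded' = max membership among the other categories."""
    return max(v for k, v in memberships.items() if k != excluded)

def mean(values):
    return sum(values) / len(values)

rules = {
    "1: Risk low -> on-time": mean([risk["low"]]),
    "2: Risk not low & Age middle -> late": mean([not_(risk, "low"), age["middle"]]),
    "3: Risk not low & Age not middle & Income high -> late":
        mean([not_(risk, "low"), not_(age, "middle"), income["high"]]),
    "4: Risk not low & Age not middle & Income not high -> on-time":
        mean([not_(risk, "low"), not_(age, "middle"), not_(income, "high")]),
}

for name, m in rules.items():
    print(f"{name}: {m:.3f}")
print("fired rule:", max(rules, key=rules.get))   # rule 4, membership 0.633 -> on-time
```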
Decision Trees
Very effective & useful
Automatic machine learning, thus unbiased (but omits judgment)
Can handle very large data sets; not affected much by missing data
Lots of software available