Transcript
Page 1: Classification Using Statistically Significant Rules

1

Classification Using Statistically Significant Rules

Sanjay Chawla, School of IT

University of Sydney

(joint work with Florian Verhein and Bavani Arunasalam)

Page 2: Classification Using Statistically Significant Rules

2

Overview

• Data Mining Tasks

• Associative Classifiers

• Support and Confidence for Imbalanced Datasets

• The use of exact tests to mine rules

• Experiments

• Conclusion

Page 3: Classification Using Statistically Significant Rules

3

Data Mining

• Data Mining research has settled into an equilibrium involving four tasks

[Diagram: the four tasks (Pattern Mining / Association Rules, Classification, Clustering, and Anomaly or Outlier Detection), with Associative Classifiers bridging the DB and ML communities]

Page 4: Classification Using Statistically Significant Rules

4

Outlier Detection

• Outlier Detection (Anomaly Detection) can be studied from two aspects
  – Unsupervised
    • Nearest Neighbor or K-Nearest Neighbor Problem
  – Supervised
    • Classification for imbalanced data sets
      – Fraud Detection
      – Medical Diagnosis

Page 5: Classification Using Statistically Significant Rules

6

Association Rules (Agrawal, Imielinski and Swami, SIGMOD 93)

• Association Rule
  – An implication expression of the form X → Y, where X and Y are itemsets
  – Example: {Milk, Diaper} → {Beer}

• Rule Evaluation Metrics
  – Support (s): fraction of transactions that contain both X and Y
  – Confidence (c): measures how often items in Y appear in transactions that contain X

Example: {Milk, Diaper} → {Beer}

s = σ(Milk, Diaper, Beer) / |T| = 2/5 = 0.4

c = σ(Milk, Diaper, Beer) / σ(Milk, Diaper) = 2/3 ≈ 0.67

TID Items

1 Bread, Milk

2 Bread, Diaper, Beer, Eggs

3 Milk, Diaper, Beer, Coke

4 Bread, Milk, Diaper, Beer

5 Bread, Milk, Diaper, Coke

From “Introduction to Data Mining”, Tan,Steinbach and Kumar
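To make the two metrics concrete, here is a minimal Python sketch (the variable and function names are mine, not from the slides) that recomputes support and confidence for {Milk, Diaper} → {Beer} over the five transactions above.

```python
# Transactions from the example table (TID -> items).
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support(itemset, transactions):
    """Fraction of transactions containing every item in the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Support of (antecedent union consequent) divided by support of the antecedent."""
    return (support(set(antecedent) | set(consequent), transactions)
            / support(antecedent, transactions))

print(support({"Milk", "Diaper", "Beer"}, transactions))       # 0.4
print(confidence({"Milk", "Diaper"}, {"Beer"}, transactions))  # 0.666...
```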

Page 6: Classification Using Statistically Significant Rules

7

Association Rule Mining (ARM) Task

• Given a set of transactions T, the goal of association rule mining is to find all rules having
  – support ≥ minsup threshold
  – confidence ≥ minconf threshold

• Brute-force approach:
  – List all possible association rules
  – Compute the support and confidence for each rule
  – Prune rules that fail the minsup and minconf thresholds

Computationally prohibitive!

Page 7: Classification Using Statistically Significant Rules

8

Mining Association Rules

• Two-step approach:

1. Frequent Itemset Generation
   – Generate all itemsets whose support ≥ minsup

2. Rule Generation
   – Generate high-confidence rules from each frequent itemset, where each rule is a binary partitioning of a frequent itemset

• Frequent itemset generation is still computationally expensive

Page 8: Classification Using Statistically Significant Rules

9

Frequent Itemset Generation

[Itemset lattice over the items A, B, C, D, E, from the empty set (null) down to ABCDE]

Given d items, there are 2^d possible candidate itemsets

From “Introduction to Data Mining”, Tan,Steinbach and Kumar

Page 9: Classification Using Statistically Significant Rules

10

Reducing Number of Candidates

• Apriori principle:
  – If an itemset is frequent, then all of its subsets must also be frequent

• Apriori principle holds due to the following property of the support measure:

– Support of an itemset never exceeds the support of its subsets

– This is known as the anti-monotone property of support

∀ X, Y : (X ⊆ Y) ⇒ s(X) ≥ s(Y)
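As an illustration of how this property is exploited, below is a small level-wise (Apriori-style) sketch in Python; it is not the authors' implementation, just a minimal version of the standard algorithm. Candidates at level k+1 are generated only from frequent k-itemsets, and any candidate with an infrequent subset is pruned.

```python
from itertools import combinations

def apriori(transactions, minsup):
    """Level-wise frequent itemset mining using the anti-monotone property of support."""
    n = len(transactions)
    support = lambda s: sum(s <= t for t in transactions) / n

    frequent = {}  # frozenset -> support
    level = [frozenset([i]) for i in {x for t in transactions for x in t}]
    while level:
        survivors = [c for c in level if support(c) >= minsup]
        frequent.update((c, support(c)) for c in survivors)
        survivor_set = set(survivors)
        # Join frequent k-itemsets into (k+1)-candidates and prune any candidate
        # that has an infrequent k-subset (the Apriori principle).
        level = list({a | b for a in survivors for b in survivors
                      if len(a | b) == len(a) + 1
                      and all(frozenset(s) in survivor_set
                              for s in combinations(a | b, len(a)))})
    return frequent
```

On the five-transaction table from the earlier slide, apriori(transactions, 0.6) returns the frequent single items and pairs such as {Milk, Diaper} and {Diaper, Beer} without ever counting most of the 2^d candidates.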

Page 10: Classification Using Statistically Significant Rules

11

Illustrating Apriori Principle

[Itemset lattice over A, B, C, D, E: once an itemset is found to be infrequent, all of its supersets are pruned from the search]

From “Introduction to Data Mining”, Tan,Steinbach and Kumar

Page 11: Classification Using Statistically Significant Rules

12

Classification using ARM

TID  Items                        Gender
1    Bread, Milk                  F
2    Bread, Diaper, Beer, Eggs    M
3    Milk, Diaper, Beer, Coke     M
4    Bread, Milk, Diaper, Beer    M
5    Bread, Milk, Diaper, Coke    F

In a Classification task we want to predict the class label (Gender) using the attributes.

A good (albeit stereotypical) rule is {Beer, Diaper} → Male, whose support is 60% and confidence is 100%.

Page 12: Classification Using Statistically Significant Rules

13

Classification Based on Association (CBA): Liu, Hsu and Ma (KDD 98)

• Mine association rules of the form A → c on the training data

• Prune or select rules using a heuristic

• Rank rules
  – Higher confidence; higher support; smallest antecedent

• New data is passed through the ordered rule set
  – Apply the first matching rule, or a variation thereof
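A rough sketch of this ranking and first-match pipeline, under the assumption that rules carry precomputed confidence and support (the Rule type and all names here are mine, not CBA's code):

```python
from typing import NamedTuple

class Rule(NamedTuple):
    antecedent: frozenset   # set of attribute-value items, e.g. {("Beer", 1), ("Diaper", 1)}
    label: str              # predicted class
    confidence: float
    support: float

def rank(rules):
    """CBA-style ordering: higher confidence first, then higher support,
    then the smaller (more general) antecedent."""
    return sorted(rules, key=lambda r: (-r.confidence, -r.support, len(r.antecedent)))

def classify(instance, ranked_rules, default_label=None):
    """Apply the first rule whose antecedent is contained in the instance."""
    for r in ranked_rules:
        if r.antecedent <= instance:
            return r.label
    return default_label
```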

Page 13: Classification Using Statistically Significant Rules

14

Several Variations

Acronym   Authors              Forum       Comments
CMAR      Li, Han, Pei         ICDM 01     Uses Chi-squared
--        Antonie, Zaiane      DMKD 04     Positive and negative rules
FARMER    Cong et al.          SIGMOD 04   Row enumeration
Top-K     Cong et al.          SIGMOD 05   Limits the number of rules
CCCS      Arunasalam et al.    SIGKDD 06   Support-free

Page 14: Classification Using Statistically Significant Rules

17

Downsides of Support (3)

• Support is biased towards the majority class
  – E.g.: classes = {yes, no}, sup({yes}) = 90%
  – minSup > 10% wipes out any rule predicting "no"
  – Suppose X → no has confidence 1 and support 3%. The rule is discarded if minSup > 3%, even though it perfectly predicts 30% of the instances in the minority class (3% out of the 10% that are "no")!

• In summary, support has many downsides, especially for classification.

Page 15: Classification Using Statistically Significant Rules

18

Downside of Confidence (1)

        C     ¬C
A       20     5     25
¬A      70     5     75
        90    10    100

Conf(A → C) = 20/25 = 0.8

Support(A → C) = 20/100 = 0.2

Correlation between A and C:

P(A, C) / (P(A) · P(C)) = 0.20 / (0.25 × 0.90) ≈ 0.89 < 1

Thus, when the data set is imbalanced a high support and high confidence rule may not necessarily imply that the antecedent and the consequent are positively correlated.
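The numbers in this slide are easy to reproduce; a small sketch (variable names are mine):

```python
# Contingency table from the slide: rows A / not-A, columns C / not-C.
a_c, a_notc       = 20, 5
nota_c, nota_notc = 70, 5
n = a_c + a_notc + nota_c + nota_notc            # 100

p_a, p_c, p_ac = (a_c + a_notc) / n, (a_c + nota_c) / n, a_c / n   # 0.25, 0.90, 0.20

confidence  = a_c / (a_c + a_notc)   # 0.8  -- passes any reasonable minConf
correlation = p_ac / (p_a * p_c)     # ~0.89 < 1 -- yet A and C are negatively correlated
print(confidence, round(correlation, 2))
```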

Page 16: Classification Using Statistically Significant Rules

19

Downside of Confidence (2)

• Reasonable to expect that for “good rules” the antecedent and consequent are not independent!

• Suppose
  – P(Class = Yes) = 0.9
  – P(Class = Yes | X) = 0.9

• Then X → Yes has confidence 0.9, yet the antecedent and the consequent are independent: the rule conveys no information.

Page 17: Classification Using Statistically Significant Rules

20

Complement Class Support (CCS)

CCS(A → C) = σ(A, ¬C) / σ(¬C)

The following are equivalent for a rule A → C:

1. A and C are positively correlated

2. CCS(A → C) is less than the support of the antecedent (A)

3. Conf(A → C) is greater than the support of the consequent (C)
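Reading CCS as the support of the antecedent within the complement class (the reconstruction used above, since the slide's formula did not survive extraction), the equivalences can be checked numerically on the table from the previous slide, which was negatively correlated:

```python
def ccs(a_c, a_notc, nota_c, nota_notc):
    """CCS(A -> C) = sup(A, not-C) / sup(not-C): how well A covers the complement class."""
    return a_notc / (a_notc + nota_notc)

# Table from "Downside of Confidence (1)": A and C are negatively correlated.
a_c, a_notc, nota_c, nota_notc = 20, 5, 70, 5
sup_a, sup_c = 25 / 100, 90 / 100
conf = a_c / (a_c + a_notc)

print(ccs(a_c, a_notc, nota_c, nota_notc) < sup_a)   # False: CCS = 0.5 is not below sup(A) = 0.25
print(conf > sup_c)                                  # False: 0.8 is not above sup(C) = 0.9
# Both checks agree with the (negative) correlation, as conditions 2 and 3 require.
```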

Page 18: Classification Using Statistically Significant Rules

21

Downsides of Confidence (3): Another useful observation

• For a rule predicting the minority class, higher confidence (support) implies higher correlation, and lower correlation implies lower confidence; neither implication holds for the majority class.

• Confidence (and support) is therefore biased towards the majority class.

Page 19: Classification Using Statistically Significant Rules

22

Statistically Significant Rules

• Support is a computationally efficient measure (anti-monotonic)
  – There is a tendency to "force" a statistical interpretation onto support

• Let's instead start with a statistically correct approach
  – and "force" it to be computationally efficient

Page 20: Classification Using Statistically Significant Rules

23

Exact Tests

• Let the class variable be {0,1}.

• Suppose we have two rules X → 1 and Y → 1

• We want to determine whether the two rules are different, i.e., whether they have a different effect on "causing" or "associating with" 1 or 0
  – e.g., medicine vs. placebo

Page 21: Classification Using Statistically Significant Rules

24

Exact Tests

            X               Y
1           a = x1          b = x2          m1
0           c = n1 − x1     d = n2 − x2     m2
            n1              n2              N = n1 + n2

We assume X and Y are binomial random variables with the same parameter p (X ~ Binomial(n1, p), Y ~ Binomial(n2, p)).

We want to determine the probability that a specific table instance [a, b; c, d] occurs purely by chance.

Page 22: Classification Using Statistically Significant Rules

25

Exact Tests

P(X = x1 | X + Y = m1) = P(X = x1, Y = m1 − x1) / P(X + Y = m1)

= P(X = x1) · P(Y = m1 − x1) / P(X + Y = m1)   (by independence of X and Y)

= [C(n1, x1) p^x1 (1 − p)^(n1 − x1)] · [C(n2, m1 − x1) p^(m1 − x1) (1 − p)^(n2 − m1 + x1)] / [C(n1 + n2, m1) p^m1 (1 − p)^(n1 + n2 − m1)]

= C(n1, x1) · C(n2, m1 − x1) / C(n1 + n2, m1)

We can calculate the exact probability of a specific table instance without resorting to asymptotic approximations.

This can be used to calculate the p-value of [a,b;c,d]
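In code the final expression is just a ratio of binomial coefficients, since the unknown parameter p cancels; a minimal sketch:

```python
from math import comb

def table_probability(a, b, c, d):
    """Exact probability of the 2x2 table [a, b; c, d] given the margins, i.e.
    P(X = a | X + Y = a + b) = C(n1, a) * C(n2, b) / C(n1 + n2, a + b),
    where n1 = a + c and n2 = b + d are the column totals."""
    n1, n2, m1 = a + c, b + d, a + b
    return comb(n1, a) * comb(n2, b) / comb(n1 + n2, m1)
```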

Page 23: Classification Using Statistically Significant Rules

26

Fisher Exact Test

• Given a table [a, b; c, d], the Fisher Exact Test finds the probability (p-value) of obtaining that table or a more positively associated table, under the assumption that X and Y come from the same distribution.
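A sketch of that computation: sum the table probability from the previous slide over the observed table and every more positively associated table reachable by shifting counts into the top-left cell while keeping the margins fixed. (For a real implementation, scipy.stats.fisher_exact with alternative='greater' should give the same one-sided p-value.)

```python
from math import comb

def fisher_exact_pvalue(a, b, c, d):
    """One-sided Fisher Exact Test p-value for the 2x2 table [a, b; c, d]:
    probability of this table or any more positively associated one, margins fixed."""
    n1, n2, m1 = a + c, b + d, a + b
    p = 0.0
    while b >= 0 and c >= 0:
        p += comb(n1, a) * comb(n2, b) / comb(n1 + n2, m1)
        a, b, c, d = a + 1, b - 1, c - 1, d + 1   # shift one count toward a more positive table
    return p
```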

Page 24: Classification Using Statistically Significant Rules

27

Forcing “anti-monotonic”

• We test a rule X → 1 against all of its immediate generalizations {X − z → 1 : z ∈ X}

• The p-value of the rule is the smallest p-value obtained over these tests (see the formula below)

• The rule is significant if
  – p-value < significance level (typically 0.05)

• Use a bottom up approach and only test rules whose immediate generalizations are significant

• Webb[06] has used Fisher Exact tests for generic association rule mining

p-value(X → 1) = min_{z ∈ X} p([a, b; c, d]), where each table compares X → 1 against (X − z) → 1
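Putting the pieces together, a schematic of this bottom-up search, reusing fisher_exact_pvalue from the previous sketch. The candidates_by_level input, the contingency_table helper (which builds [a, b; c, d] for a rule against one of its generalizations), and the frozenset representation of antecedents are my own assumptions, not the authors' code:

```python
def mine_significant_rules(candidates_by_level, contingency_table, alpha=0.05):
    """Bottom-up search for statistically significant rules X -> 1.
    A rule is tested (by Fisher Exact Test) against each immediate
    generalization X - {z}; it is kept if the smallest p-value is below alpha.
    Only rules whose immediate generalizations are significant are tested,
    which plays the pruning role that minsup plays in Apriori."""
    significant = set()
    for level, candidates in enumerate(candidates_by_level, start=1):
        for X in candidates:                       # X is a frozenset of attribute-value items
            if level > 1 and not all(X - {z} in significant for z in X):
                continue                           # an immediate generalization already failed
            p = min(fisher_exact_pvalue(*contingency_table(X, X - {z})) for z in X)
            if p < alpha:
                significant.add(X)
    return significant
```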

Page 25: Classification Using Statistically Significant Rules

28

Example

• Suppose we have already determined that the rules (A1 = a1) → 1 and (A2 = a2) → 1 are significant.

• Now we want to test whether
  – X = (A1 = a1) ∧ (A2 = a2) → 1 is significant

• We carry out a FET on X versus X − {A1 = a1}, and on X versus X − {A2 = a2}.

• If the minimum of the two p-values is less than the significance level we keep the rule X → 1, otherwise we discard it.

Page 26: Classification Using Statistically Significant Rules

30

Ranking Rules

• We have already observed that
  – a high confidence rule for the majority class may be "more" negatively correlated than the same rule predicting the other class
  – a highly positively correlated rule that predicts the minority class may have a lower confidence than the same rule predicting the other class

Page 27: Classification Using Statistically Significant Rules

31

Experiments: Random Dataset

• Attributes are independent and uniformly distributed.
• It makes no sense to find any rules, other than by chance.
• However, minSup = 1% and minConf = 0.5 mines 4149/13092 rules: over 31% of all possible rules.
• Using our FET technique at the standard significance level we find only 11 (0.08%).

Page 28: Classification Using Statistically Significant Rules

32

Experiments: Balanced Dataset

• Similar performance (within 1%)

Page 29: Classification Using Statistically Significant Rules

33

Experiments: Balanced Dataset

• But it mines only 0.06% of the number of rules
• by searching only 0.5% of the search space
• and using only 0.4% of the time.

Page 30: Classification Using Statistically Significant Rules

34

Experiments: Imbalanced Dataset

• Higher performance than support-confidence techniques
• Using 0.07% of the search space and time, and 0.7% of the number of rules.

Page 31: Classification Using Statistically Significant Rules

35

Contributions

• Strong evidence and arguments against the use of support and confidence for imbalanced classification

• A simple technique for using Fisher's Exact Test to find positively associated and statistically significant rules.
  – On average it uses 0.4% of the time, searches only 0.5% of the search space, and finds only 0.06% of the rules compared to support-confidence techniques.
  – Similar performance on balanced datasets, higher on imbalanced datasets.

• Parameter free (except for the significance level)

Page 32: Classification Using Statistically Significant Rules

36

References

• Verhein and Chawla, Classification Using Statistically Significant Rules. http://www.it.usyd.edu.au/~chawla

• Arunasalam and Chawla, CCCS: A Top-Down Associative Classifier for Imbalanced Class Distribution. ACM SIGKDD 2006, pp. 517-522.

