Intelligent Systems
Exploratory pattern discovery
Geoff Webb
http://www.csse.monash.edu.au/~webb
Copyright © Geoffrey I Webb 2006
Outline
• Tutorial covers
  • Data Mining
  • Exploratory Pattern Discovery
  • Association rules
  • Interestingness (objective functions)
  • False discoveries
  • Limitations of minimum support
  • K-most interesting pattern discovery
  • Itemset discovery
  • Contrast rule discovery
  • Impact rules
Part 1:
Data Mining
Data mining
• Data mining seeks to discover unanticipated knowledge from data
• Exponential growth in the quantity of data stored gives urgency to the pursuit of practical analytic approaches that address
  • Large volumes of data
  • Low quality data
  • Post-hoc analysis
  • Loosely defined analytical objectives
So what’s the big deal?
• Don’t statistics identify patterns in data?
• Conventional statistics do not address
  • searching quintillions of potential correlations, eg.
    • market basket data: 2^100,000
    • US phone calls: 2^100,000,000
    • human genome: 2^3,000,000,000
• selecting most interesting from millions of correlations
Example: Should we stock vitamins?
• Major national retailer with detailed records of customer purchasing behaviour
• Considering deleting a low volume product line
• Does data provide evidence of indirect contribution to bottom line?
Example: Steel rolling mill
• Complex control problem for expensive production process influenced by input materials, desired output and state of equipment
• Currently uses imperfect model
• Objective: use data to identify circumstances in which the model is deficient
Photo courtesy G.C. Goodwin, S. Graebe and M. Salgado. Control System Design, Prentice Hall, 2000.
Example: Synchrotron x-ray data analysis
• Synchrotron x-ray scatter patterns reflect micro-structure of material analysed.
[Figure: x-ray scatter patterns, Normal vs Malignant]
• Can x-ray scatter plots be used for cancer diagnosis?
A growth area
• The sum of human data stored doubles every 7 years
• Data mining is critical to commerce
  • Fraud detection
  • Information retrieval
and to science
  • Bioinformatics
  • Mass data analysis
Large unmet demand for good PhDs!
Beyond statistics
• Data mining goes beyond the traditional realm of statistics by encompassing
  • problem formulation
  • interactions between the business process and the analytic process
  • knowledge management
  • data manipulation
[Diagram: Analytics interacting with business processes, data, and other knowledge sources]
Generating models
• The core of the data mining process is generating models from data
Eg neural networks, support vector machines, decision trees
• Most research concentrates on this aspect
• Surrounding activities are also very important
  • Defining analytic task
  • Sourcing data
  • Preprocessing data
  • Identifying appropriate forms of model
  • Identifying appropriate techniques for generating models
  • Interpreting models
  • Applying models
Part 2:
Exploratory Pattern Discovery
The perils of model selection
• Many data mining techniques seek to identify a single model that best fits the observed data.
• In many applications, many models will fit the data (almost) equally well
bruises=f & gill-attachment=f & gill-spacing=c & ring-number=o → poisonous[Coverage=0.406 (3296); Support=0.388 (3152); Confidence=0.956]
bruises=f & gill-spacing=c & veil-color=w & ring-number=o → poisonous [Coverage=0.406 (3296); Support=0.388 (3152); Confidence=0.956]
Perils of model selection (cont.)
• Data mining systems often make arbitrary choices
  • without warning
• A system may have no basis on which to select models, but an expert often will
  • ease / cost of operationalisation
• comprehensibility / compatibility with existing knowledge and beliefs
• social / legal / ethical / political acceptability
Exploratory pattern discovery
• Exploratory pattern discovery seeks all patterns that satisfy user-defined constraints
• The user can select from these patterns
  • can use criteria that might be infeasible to quantify
Patterns
• Rules
  • <antecedent> → <consequent>
• Itemsets
  • <condition1> & <condition2> & …
• Sequences
  • <event1>, <event2>, …
• Structures
Rules
• <antecedent> → <consequent>
  • IF <antecedent> THEN <consequent>
  • IF temp > 36.8 AND pulse > 120 THEN call doctor
• Antecedent
  = condition
  = left hand side, LHS
  = conditions under which the rule holds / applies
• Consequent
  = conclusion
  = right hand side, RHS
  = action to perform or conclusion to reach
Theoretical foundations
• Substantial bodies of theory in Formal Logic, Computational Logic, and Artificial Intelligence can be brought to bear to utilise rules once they are inferred.
• If the antecedent entails the consequent and the antecedent is known (believed) then the consequent can be concluded.
• Can be extended to probabilistic basis.
• Supports complex reasoning.
• Modular knowledge representation.
• can capture knowledge nuggets
Rule discovery as search
• Rule discovery can be viewed as search through a space of expressible rules.
• The rule space (search space / description space) can be partially ordered on generality.
• A → C is a generalisation of B → C iff B entails A (A must be true if B is true)
  • a proper generalisation iff A does not also entail B
• If A → C is a generalisation of B → C then B → C is a specialisation of A → C.
• Eg. IF age > 30 THEN X is a generalisation of
  • IF age > 31 THEN X
  • IF age > 30 AND gender = male THEN X
Generalization lattice for antecedents

{}
{A} {B} {C} {D}
{A,B} {A,C} {A,D} {B,C} {B,D} {C,D}
{A,B,C} {A,B,D} {A,C,D} {B,C,D}
{A,B,C,D}
Search tree for antecedents

{}
{A} {B} {C} {D}
{A,B} {A,C} {A,D} {B,C} {B,D} {C,D}
{A,B,C} {A,B,D} {A,C,D} {B,C,D}
{A,B,C,D}
Search tree with consequent propagation

{} : {A,B,C,D}
{A} : {B,C,D}   {B} : {A,C,D}   {C} : {A,B,D}   {D} : {A,B,C}
{A,B} : {C,D}   {A,C} : {B,D}   {A,D} : {B,C}   {B,C} : {A,D}   {B,D} : {A,C}   {C,D} : {A,B}
{A,B,C} : {D}   {A,B,D} : {C}   {A,C,D} : {B}   {B,C,D} : {A}
{A,B,C,D} : {}
Propositional rule discovery
• Antecedent and consequent are propositions
• Often restricted to antecedent and consequent both conjunctions of Boolean terms
  • IF temp > 36.8 AND pulse > 120 THEN blood pressure > 140 AND condition = critical
Rule discovery is inherently intractable
• If
  • there are n propositions,
  • antecedents can be any set of propositions and
  • consequents are a single proposition
then
  • size of search space ≈ n × 2^n
• It is essential to use powerful pruning techniques to limit the search space
Part 3:
Association rules
Association rule discovery
• Developed for market basket analysis
  • a basket is a collection of products purchased in a single transaction
  • an itemset is a set of products
    • all baskets are itemsets
  • market basket analysis seeks to identify products that are associated with each other
    • diapers and beer
• Can generalize to itemset = any conjunction of Boolean terms
Transaction and tabular data
• Transaction data
  • Each record is a set of items involved in a single transaction
  • Eg. market basket, web site traversal, amino acids in a protein
• Tabular data
  • Each record consists of a vector of values for the predefined attributes or fields
  • Eg. a patient’s signs and symptoms, employee details, the amino acids at each site in a protein
• While association rules were developed for transaction data they generalise directly to attribute-value data
Support and confidence
• F(X) = proportion of records that satisfy condition X
• Coverage(A → C) = F(A)
• Support(A → C) = F(A & C)
• Confidence(A → C) = Support(A → C) / Coverage(A → C)
  • Maximum likelihood estimate of P(C | A)
[Venn diagram: overlap between A and C]
Frequent itemsets
• An itemset is frequent if its cover equals or exceeds a user defined minimum
• Downward closure
  • frequency is anti-monotone
  • if an itemset I is not frequent then no specialization of I is frequent
Association rules
• Antecedent and consequent are frequent itemsets
• An association rule indicates that the presence of the antecedent increases the probability that the consequent will be present
  • bread & butter → honey
Association rule discovery
• Requires minimum support constraint
• Finds all rules that satisfy minimum support together with other user-specified constraints such as minimum confidence
• Example: 1000 transactions, 100 bread, 100 honey, 50 bread & honey
  • support(bread → honey) = 0.05
  • confidence(bread → honey) = 0.50
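As a sketch, the three measures can be computed directly from transaction data. The dataset below is a hypothetical reconstruction of the bread-and-honey example (1000 transactions, 100 bread, 100 honey, 50 both); the function names are illustrative, not from any particular system.

```python
def coverage(transactions, antecedent):
    """F(A): proportion of transactions containing every antecedent item."""
    return sum(antecedent <= t for t in transactions) / len(transactions)

def support(transactions, antecedent, consequent):
    """F(A & C): proportion containing both antecedent and consequent."""
    return sum((antecedent | consequent) <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Maximum likelihood estimate of P(C | A)."""
    return support(transactions, antecedent, consequent) / coverage(transactions, antecedent)

# Hypothetical 1000 transactions: 50 bread & honey, 50 bread only, 50 honey only.
data = ([{"bread", "honey"}] * 50 + [{"bread"}] * 50
        + [{"honey"}] * 50 + [{"milk"}] * 850)
print(support(data, {"bread"}, {"honey"}))     # 0.05
print(confidence(data, {"bread"}, {"honey"}))  # 0.5
```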
The frequent itemset approach
• Find all frequent itemsets
• Generate all association rules therefrom
• Assumes
  • a minimum support constraint
  • sparse data
Finding frequent itemsets
• Once frequent itemsets are found rule generation is straightforward
• Research has concentrated on efficient frequent itemset generation
The Apriori algorithm
Apriori(T, ε)
  L1 ← frequent 1-itemsets relative to T
  k ← 2
  while Lk−1 ≠ ∅
    Ck ← Generate(Lk−1)
    for t ∈ T
      for c ∈ Subsets(Ck, t)
        count[c]++
    Lk ← { c ∈ Ck | count[c] ≥ ε }
    k++
  return ∪k Lk
TRANSACTIONS
a,b,c
a,b,d
a,d
PROCESS, ε=2
L1 {{a},{b},{d}}
C2 {{a,b},{a,d},{b,d}}
L2 {{a,b},{a,d}}
C3 {{a,b,d}}
L3 {}
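A minimal Python sketch of the level-wise algorithm, run on the worked example above. The candidate-generation step here also prunes candidates with an infrequent subset, which the slide's Generate step may or may not do; treat it as an illustration rather than a faithful reimplementation.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Level-wise frequent itemset mining, following the pseudocode above.
    Returns a dict mapping each frequent itemset (frozenset) to its count."""
    items = {i for t in transactions for i in t}
    level = {frozenset([i]) for i in items}   # candidate 1-itemsets
    frequent = {}
    while level:
        counts = {c: 0 for c in level}
        for t in transactions:
            for c in level:
                if c <= t:
                    counts[c] += 1
        survivors = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(survivors)
        # Join surviving k-itemsets into (k+1)-candidates, keeping only
        # those whose k-subsets are all frequent (downward closure).
        prev = list(survivors)
        level = {a | b for a, b in combinations(prev, 2)
                 if len(a | b) == len(a) + 1
                 and all(frozenset(s) in survivors
                         for s in combinations(a | b, len(a)))}
    return frequent

# The worked example: transactions {a,b,c}, {a,b,d}, {a,d} with ε = 2.
T = [{"a", "b", "c"}, {"a", "b", "d"}, {"a", "d"}]
print(sorted("".join(sorted(s)) for s in apriori(T, 2)))
# -> ['a', 'ab', 'ad', 'b', 'd']
```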
Closed itemsets
• In practice many itemsets cover exactly the same items
  • Eg pregnant, pregnant & woman
• A closed itemset is the most specific itemset that covers a particular set of items
• More efficient to find all closed frequent itemsets than all frequent itemsets
• Can generate all association rules from closed itemsets
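A brute-force way to check closedness, assuming the definition above (the most specific itemset for its cover): an itemset is closed iff adding any further item shrinks its cover. The records below are hypothetical, echoing the pregnant / pregnant & woman example.

```python
def cover(transactions, itemset):
    """Indices of the transactions covered by the itemset."""
    return {i for i, t in enumerate(transactions) if itemset <= t}

def is_closed(transactions, itemset):
    """Closed iff no strict superset covers exactly the same transactions,
    i.e. the itemset is the most specific one for its cover."""
    items = {i for t in transactions for i in t}
    return all(cover(transactions, itemset | {x}) != cover(transactions, itemset)
               for x in items - itemset)

# Every record containing "pregnant" also contains "woman",
# so {pregnant} is not closed but {pregnant, woman} is.
recs = [{"pregnant", "woman"}, {"pregnant", "woman"}, {"woman", "married"}]
print(is_closed(recs, {"pregnant"}))           # False
print(is_closed(recs, {"pregnant", "woman"}))  # True
```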
Closed Itemsets Example
Full set of itemsets for gill-size=n, gill-color=b & spore-print-color=w
gill-size=n [Coverage=2512]
spore-print-color=w [Coverage=2388]
gill-size=n & spore-print-color=w [Coverage=1824]
gill-color=b [Coverage=1728]
gill-color=b & spore-print-color=w [Coverage=1728]
gill-size=n & gill-color=b [Coverage=1728]
gill-size=n & gill-color=b & spore-print-color=w [Coverage=1728]
Closed itemsets
gill-size=n [Coverage=2512]
spore-print-color=w [Coverage=2388]
gill-size=n & spore-print-color=w [Coverage=1824]
gill-size=n & gill-color=b & spore-print-color=w [Coverage=1728]
Part 4:
Interestingness (objective functions)
Interestingness (Objective Functions)
• Need some means of selecting the most (potentially) interesting patterns
• Many different measures of interestingness may be relevant
• Most measures relate to the degree to which the antecedent and consequent are interdependent
  • P(A & C) − P(A) P(C)
Interestingness measures: lift
• lift = confidence / (cover(consequent) / n)
  • proportional increase in confidence in context of antecedent
• Example: 1000 transactions, 100 bread, 100 honey, 50 bread & honey
  • confidence(bread → honey) = 0.50
  • lift(bread → honey) = 5.00
M-estimates
• Problem: many rules with low support will have unrealistically high confidence and lift
• Example: 1000 records, 500 females, 1 age>=90, 1 female & age>=90
• confidence(age>=90 → female) = 1.00
• lift(age>=90 → female) = 2.00
• M-estimate is Bayesian estimate of true confidence and lift
  • biases confidence toward prior
  • confidence estimate = (support + m × prior) / (coverage + m)
  • lift estimate = confidence estimate / prior
  • Eg confidence estimate = (1 + 2 × 0.5) / (1 + 2) = 0.667
    lift estimate = 0.667 / 0.500 = 1.333
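The m-estimate arithmetic above is easy to sketch; the function name and default m are illustrative, following the slide's example of 1 covered record, 1 supporting record and a prior of 0.5.

```python
def m_estimate(support_count, coverage_count, prior, m=2):
    """M-estimate of confidence: shrink the raw confidence toward the
    prior, weighting the prior as m virtual observations."""
    confidence = (support_count + m * prior) / (coverage_count + m)
    lift = confidence / prior
    return confidence, lift

# 1 of 1 records with age>=90 is female; prior P(female) = 0.5.
conf, lft = m_estimate(support_count=1, coverage_count=1, prior=0.5, m=2)
print(round(conf, 3), round(lft, 3))  # 0.667 1.333
```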
Interestingness measures: leverage
• leverage = support − cover(antecedent) × cover(consequent) / n
  • absolute increase in comparison to expected cases if antecedent and consequent independent
• Also known as interest
• Example: 1000 transactions, 100 bread, 100 honey, 50 bread & honey
  • confidence(bread → honey) = 0.50
  • lift(bread → honey) = 5.00
  • leverage(bread → honey) = 0.04
• Example 2: 1000 transactions, 10 batteries, 5 vodka, 1 batteries & vodka
  • lift(batteries → vodka) = 20.00
  • leverage(batteries → vodka) = 0.0009
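Both measures can be sketched from counts, reproducing the two examples above (the vodka-and-batteries leverage comes out near the slide's rounded 0.0009). Function names are illustrative.

```python
def lift(n, cover_a, cover_c, support_ac):
    """Confidence divided by the consequent's base rate (counts in, ratio out)."""
    confidence = support_ac / cover_a
    return confidence / (cover_c / n)

def leverage(n, cover_a, cover_c, support_ac):
    """Support minus the support expected if antecedent and consequent
    were independent, as a proportion of n."""
    return support_ac / n - (cover_a / n) * (cover_c / n)

print(lift(1000, 100, 100, 50), leverage(1000, 100, 100, 50))  # lift 5.0, leverage ≈ 0.04
print(lift(1000, 10, 5, 1), leverage(1000, 10, 5, 1))          # lift 20.0, leverage ≈ 0.00095
```

Note how leverage penalises the batteries-and-vodka rule despite its huge lift: it affects almost no transactions.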
Spurious rules
• If condition X is unrelated to conditions A and B,
  • confidence(A & X → B) ≈ confidence(A → B)
  • lift(A & X → B) ≈ lift(A → B)
  • Eg pregnant & AI Researcher → oedema
• One core rule can result in many spurious rules
• If the problem is ignored, the majority of rules can be spurious!
Need to test up the generalization lattice

{} : {A,B,C,D}
{A} : {B,C,D}   {B} : {A,C,D}   {C} : {A,B,D}   {D} : {A,B,C}
{A,B} : {C,D}   {A,C} : {B,D}   {A,D} : {B,C}   {B,C} : {A,D}   {B,D} : {A,C}   {C,D} : {A,B}
{A,B,C} : {D}   {A,B,D} : {C}   {A,C,D} : {B}   {B,C,D} : {A}
{A,B,C,D} : {}
Minimum Improvement
• The improvement of rule X → Y [conf = c] = min(c − k | Z ⊂ X, Z → Y [conf = k])
• A minimum improvement constraint can eliminate many spurious rules
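Given the confidences of a rule's proper generalisations, improvement is just the gap to the best of them; the confidences below are hypothetical.

```python
def improvement(rule_conf, generalisation_confs):
    """improvement(X -> Y) = conf(X -> Y) minus the best confidence of any
    proper generalisation Z -> Y with Z a strict subset of X
    (equivalently, the minimum of c - k over those generalisations)."""
    return rule_conf - max(generalisation_confs)

# The specialised rule barely beats its generalisations, so a minimum
# improvement constraint of, say, 0.01 would discard it as spurious.
print(improvement(0.956, [0.951, 0.940]))  # ≈ 0.005
```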
Non redundant rules
• ∀x,y,z,s,c: x → y [conf = 1.0] ∧ x → z [supp = s, conf = c] ⟹ x, y → z [supp = s, conf = c]
  Eg pregnant → oedema [supp = 0.1, conf = 0.2]
     pregnant, female → oedema [supp = 0.1, conf = 0.2]
• A rule X → Y [supp = s, conf = c] is redundant iff
  ∃x ∈ X: X \ x → Y [supp = s, conf = c] or ∃y ∈ Y: X → Y \ y [supp = s, conf = c]
  Eg, pregnant, female → oedema
• Closed itemset approaches lead to efficient generation of non-redundant rules because a rule is non-redundant iff all immediate specialisations are closed itemsets.
• Note, redundant rules have improvement of 0.0.
Effect

dataset              no filter   non-redundant    %    improvement > 0    %
bms webview             170          170         100        155          91
covtype                 998          815          82        143          14
ipums.la.99             973          959          99        481          49
kddcup98                995          992         100        939          94
letter-recognition      541          524          97        421          78
mush                    891          469          53        128          14
retail                  590          590         100        519          88
shuttle                 666          595          89        312          47
splice-junction         748          727          97        699          93
ticdata-2000            996          996         100        988          99
Part 5:
False discoveries
False discoveries
• Massive search leads to high risk of false discoveries
  • eg 100 observations, two independent events each occurring with 0.5 probability,
    • the probability of perfect correlation is 7.8 × 10^−31
  • if there are 1000 events then there are 2^1000 ≈ 1.07 × 10^301 antecedent–consequent pairs
• What constitutes a false discovery depends upon the analytic objective
• Usually should include rules where
  • antecedent and consequent are independent
  • antecedent and consequent are independent given a generalisation of the antecedent
Testing independence
• Cannot perform simple test of independence because of multiple comparisons problem
  • used previously (eg Webb, Butler & Newlands, 2003) as a statistically unsound filter
Standard statistical correction
• Bonferroni
  • To maintain experimentwise risk ≤ α for n tests
  • use critical value = α / n
• Holm procedure
  • To maintain experimentwise risk ≤ α for n tests with p values ordered from lowest to highest p1 … pn
  • Accept tests corresponding to p1 … pk, where k is the highest value such that ∀ 1 ≤ i ≤ k: pi ≤ α / (n − i + 1)
• Example (α = 0.05, n = 4):
  p values        0.0100, 0.0200, 0.0400, 0.0400
  critical values 0.0125, 0.0167, 0.0250, 0.0500
  decision        accept, accept, reject, reject
Direct adjustment
• I used to think “cannot perform simple adjustment such as Bonferroni or Holm because rule spaces are so large, eg 2^1000 (> 1.0E+301)
  • would result in unacceptable type-2 error
  • eg critical value = 5.0E-303”
• However, search is often restricted to small antecedents (eg ≤ 4), resulting in Bonferroni-adjusted critical values of magnitude 1.0E-10 … 1.0E-20
• With such adjustments often many rules can be found
• Cannot order p values to apply Holm procedure
Discovery as hypothesis generation
• Important to trade-off the risks of both type-1 and type-2 errors
• Perhaps best viewed as hypothesis generation, recognising that ‘discovered’ patterns require independent assessment
Hypothesis testing: proposal
• Why not automate such assessment?

[Flow diagram: Data is split into an exploratory set and a holdout set. Exploratory pattern discovery on the exploratory set produces patterns (a small set is preferable); statistical evaluation on the holdout set — any hypothesis test, with Holm adjustment and limited type-2 error — yields sound patterns.]
Direct adjustment vs Holdout
Direct adjustment
• All data used for exploration and evaluation
• Bonferroni adjustment
• Larger adjustment
• Adjustment alters with size of search space

Holdout
• Half data used for each of exploration and evaluation
• Holm procedure
• Smaller adjustment
• Adjustment alters with number of rules found
Case study: Ten widely used data sets
Name             Description                               Records   Attribute-values
BMS webview      products viewed at a commercial website    59,601        497
covtype          forest cover data                         581,012        125
ipums.la.99      Los Angeles census data                    88,443      1,874
kddcup98         charity donors                             52,256     19,662
letter-recog’n   digital image recognition                  20,000         74
mush             identification of poisonous mushrooms       8,124        127
retail           retail market basket data                  88,162     16,470
shuttle          records of space shuttle flight data       58,000         34
splice-junction  DNA sequence records                        3,177        243
ticdata-2000     insurance risk assessment                   5,822        689
Detecting spurious rules
• Assuming interest only in positive associations
  • P(C | A) > P(C)
• For any rule A → C, want to assess whether it has higher confidence than all its generalisations
• Eg, is confidence(pregnant & female → B) >
  • confidence(pregnant → B)
  • confidence(female → B)
  • confidence(true → B)
Detecting spurious rules (cont)
• Perform one-tailed Fisher exact tests with respect to each generalisation
  • Reject if any test does not exceed critical value
  • no need to adjust for multiple comparisons with respect to the multiple tests for a single rule
• Use Holm adjustment for strict control of type-1 error
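The one-tailed Fisher exact test can be computed from the hypergeometric distribution using only the standard library; this is a generic sketch, not the tutorial's implementation. Here `a` counts records satisfying both antecedent and consequent, with `b`, `c`, `d` the remaining cells of the 2×2 contingency table, and the toy table is hypothetical.

```python
from math import comb

def fisher_one_tailed(a, b, c, d):
    """One-tailed Fisher exact test on the 2x2 table [[a, b], [c, d]]:
    probability of a top-left count of at least `a` given the fixed
    margins (upper hypergeometric tail)."""
    row1, col1, n = a + b, a + c, a + b + c + d
    return sum(comb(col1, x) * comb(n - col1, row1 - x)
               for x in range(a, min(row1, col1) + 1)) / comb(n, row1)

# Toy table: 3 of 4 covered records satisfy the consequent vs 1 of 4 uncovered.
print(fisher_one_tailed(3, 1, 1, 3))  # 17/70 ≈ 0.243
```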
Spurious rules case study: high support & confidence non-redundant rules
Name                Records   Attribute-values   # Rules   # Accepted    %
bms webview          59,601         497           22,135      1,747      8
covtype             581,012         125           10,018          0      0
ipums.la.99          88,443       1,874            9,857        288      3
kddcup98             52,256      19,662            9,863         40     <1
letter-recognition   20,000          74            7,978        952     12
mush                  8,124         127            8,957      1,266     14
retail               88,162      16,470           11,656         97      1
shuttle              58,000          34            9,760        876      9
splice-junction       3,177         243            8,937        132      1
ticdata-2000          5,822         689           10,438         30     <1
KDDCUP98: 99.5% of rules rejected
The following 40 rules passed holdout evaluation
…
ETH12<=0 → HC15<=0 [Coverage=0.987 (25786); Support=0.946 (24722); Confidence=0.959; Lift=1.00]
…
The following 9843 rules failed holdout evaluation, adjusted critical value = 5.09E-06
…
NOEXCH=0 & ETH12<=0 → HC15<=0 [Coverage=0.984 (25703); Support=0.943 (24644); Confidence=0.959; Lift=1.00]
…
NOEXCH=0 & ETH12<=0 & MDMAUD_F=X → HC15<=0 [Coverage=0.981 (25629); Support=0.940 (24573); Confidence=0.959; Lift=1.00]
…
NOEXCH=0 & ETH12<=0 & ADATE_2>=9706 & MDMAUD_R=X → HC15<=0 [Coverage=0.981 (25623); Support=0.940 (24567); Confidence=0.959; Lift=1.00]
…
Comparison of direct adjustment and holdout tests on artificial data
[Charts: true discoveries, false discoveries and experimentwise error for the holdout and direct-adjustment approaches. Averages over 100 runs, 84 true rules at antecedent size 4.]
Comparison on real data
[Chart: Letter Recognition — number of rules found by direct adjustment vs holdout as search space size grows from 2.33E+03 to 1.47E+09.]

[Chart: Retail — number of rules found by direct adjustment vs holdout as search space size grows from 1.36E+08 to 4.56E+26.]
Part 6:
Limitations of minimum support
Limitations of minimum support
• Discontinuity in ‘interestingness’ function
• The vodka and caviar problem
  • some high value associations are infrequent
• Feast or famine
  • minimum support is a crude control mechanism
  • often results in too few or too many associations
• Cannot handle dense data
• Cannot prune search space using constraints on relationship between antecedent and consequent
  • eg confidence
• Minimum support may not be relevant
  • cannot be sufficiently low to capture all valid rules
  • cannot be sufficiently high to exclude all spurious rules
Very low support rules can be significant
Data file: Brijs retail.itl [50% sample]
44081 cases / 44081 holdout cases / 16470 items
The following 5 rules passed holdout evaluation
168 & 4685 → 1 [Coverage=0.000 (3); Support=0.000 (3); Confidence estimate=0.601; Lift estimate=192.06]
168 & 3021 → 1 [Coverage=0.000 (3); Support=0.000 (3); Confidence estimate=0.601; Lift estimate=192.06]
1476 & 4685 → 1 [Coverage=0.000 (2); Support=0.000 (2); Confidence estimate=0.502; Lift estimate=160.21]
168 & 783 → 1 [Coverage=0.000 (4); Support=0.000 (3); Confidence estimate=0.501; Lift estimate=160.05]
3021 & 4685 → 1 [Coverage=0.000 (4); Support=0.000 (3); Confidence estimate=0.501; Lift estimate=160.05]
Very high support rules can be spurious
Data file: covtype.data 581012 cases / 125 values
ST15=0 → ST07=0 [Coverage=1.000 (581009); Support=1.000 (580904); Confidence=1.000; Lift=1.00]
ST07=0 → ST15=0 [Coverage=1.000 (580907); Support=1.000 (580904); Confidence=1.000; Lift=1.00]
ST15=0 → ST36=0 [Coverage=1.000 (581009); Support=1.000 (580890); Confidence=1.000; Lift=1.00]
ST36=0 → ST15=0 [Coverage=1.000 (580893); Support=1.000 (580890); Confidence=1.000; Lift=1.00]
ST15=0 → ST08=0 [Coverage=1.000 (581009); Support=1.000 (580830); Confidence=1.000; Lift=1.00]
ST08=0 → ST15=0 [Coverage=1.000 (580833); Support=1.000 (580830); Confidence=1.000; Lift=1.00]
…
197,183,686 such rules have highest support
Roles of constraints
1. Select most relevant patterns
  • patterns that are likely to be interesting
2. Control the number of patterns that the user must consider
3. Make computation feasible
Minimum support can get overloaded!
• Select most relevant
• Control the number
• Make computation feasible
Part 7:
K-most interesting pattern discovery
K-most interesting pattern discovery
• Find k patterns that maximise a measure of interest within other constraints that the user may specify
  • removes need for minimum support constraint
  • efficient with dense data
  • empowers user to use relevant measure of interest
  • user specifies number of patterns to be returned
  • does not require either monotone or anti-monotone constraints
• Relies on efficient search
  • must be able to retain all data in memory
  • constraints must sufficiently constrain the search space
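The objective can be sketched by exhaustively scoring every small rule by leverage and keeping the k best. Real systems (eg Webb's OPUS-based search) prune the search space rather than enumerating it, so this brute-force version, with illustrative names and data, only shows the idea of optimising a measure of interest with no minimum support.

```python
import heapq
from itertools import combinations

def k_most_interesting(transactions, k, max_antecedent=2):
    """Score every rule with a small antecedent by leverage and return
    the k best as (leverage, antecedent, consequent) tuples."""
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})
    # Precompute counts for all itemsets up to max_antecedent + 1 items.
    freq = {frozenset(s): sum(set(s) <= t for t in transactions)
            for size in range(1, max_antecedent + 2)
            for s in combinations(items, size)}
    rules = []
    for ante_size in range(1, max_antecedent + 1):
        for ante in combinations(items, ante_size):
            for cons in items:
                if cons in ante:
                    continue
                a, c = frozenset(ante), frozenset([cons])
                lev = freq[a | c] / n - (freq[a] / n) * (freq[c] / n)
                rules.append((lev, ante, cons))
    return heapq.nlargest(k, rules)

# Hypothetical transactions: bread and honey co-occur strongly.
T = [{"bread", "honey"}, {"bread", "honey"}, {"bread"}, {"milk"}]
for lev, ante, cons in k_most_interesting(T, 2):
    print(ante, "->", cons, round(lev, 3))
```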
Part 8:
Itemset discovery
Itemset discovery
• In some contexts it is the collection of correlated variables that is of interest, and the rule structure is superfluous.
• If A is associated with B then B must be associated with A (in the sense of the presence of the antecedent increasing the probability of the presence of the consequent).
• Discovering interesting itemsets is an area that has been little explored.
Part 9:
Contrast discovery
Contrast sets (emerging patterns)
• Sometimes it is interesting to identify differences between contrasting groups
• Eg: how do purchasing patterns differ on weekends to weekdays?
• Contrast sets find sets of conditions that differ significantly between groups
  ∃ i,j: P(cset | Gi) ≠ P(cset | Gj)
  max over i,j of |support(cset, Gi) − support(cset, Gj)| ≥ δ
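A sketch of the contrast-set criterion, assuming it compares the support of a condition set across groups against a minimum-difference threshold (the formula on this slide is partly garbled in this copy); the group data and threshold below are hypothetical.

```python
def contrast(records_by_group, cset, min_diff=0.1):
    """Support of the condition set in each group, and whether the spread
    between the extreme groups meets the minimum difference threshold."""
    supports = {g: sum(cset <= r for r in recs) / len(recs)
                for g, recs in records_by_group.items()}
    spread = max(supports.values()) - min(supports.values())
    return supports, spread >= min_diff

# Hypothetical weekend vs weekday baskets.
groups = {
    "weekend": [{"beer", "chips"}, {"beer"}, {"milk"}, {"beer", "milk"}],
    "weekday": [{"milk"}, {"bread"}, {"milk", "bread"}, {"beer"}],
}
supports, interesting = contrast(groups, {"beer"}, min_diff=0.25)
print(supports, interesting)  # weekend 0.75, weekday 0.25 -> True
```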
Contrast sets (cont.)
• Different analytic objective to association rules
  • more directed
  • focus on differences between groups instead of associations between variables
• Different to classification rules
  • not discriminative
  • no attempt to distinguish all individuals of each group
  • find all contrasts rather than sufficient discriminators
Can be discovered by existing techniques!
• Contrast / emerging pattern discovery is strictly equivalent to standard exploratory rule discovery with the consequent restricted to the group variable
  ∃ i,j: P(cset | Gi) ≠ P(cset | Gj)  ⟺  ∃ i,j: P(Gi | cset) ≠ P(Gj | cset)
Part 10:
Impact rules
Impact rules (quantitative association rules)
• Most rule discovery techniques require that numeric variables be discretised.
• This often loses important information.
• Impact rules associate an antecedent with a distribution on a numeric variable.
• The user specifies what makes a distribution interesting
  • eg largest mean, smallest standard deviation, …
• System finds rules that maximise the measure of interest within other user-specified constraints
Impact rule discovery example
LengthOfStay: mean = 10.6; min = -6; max = 1687; sum = 367781
COUNTRYOFBIRTH=1100 -> LengthOfStay: Coverage=0.054 (1861); Mean=22.2; Min=-4; Max=1687; Sum=41314; Impact=21612.4
ADMITDay=Wednesday -> LengthOfStay: Coverage=0.159 (5518); Mean=13.3; Min=0; Max=1548; Sum=73389; Impact=15307.6
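The statistics reported above can be reproduced on hypothetical data. The impact computation here (sum over covered records minus overall mean times cover count) is inferred from the figures on the slide and should be treated as an assumption about the system's definition.

```python
from statistics import mean

def impact_rule(records, antecedent, target):
    """Summarise the target distribution under the antecedent.
    Impact = (sum over covered records) - (overall mean x cover count),
    i.e. the excess of the target attributable to the covered records."""
    values = [r[target] for r in records]
    covered = [r[target] for r in records if antecedent(r)]
    overall_mean = mean(values)
    return {"coverage": len(covered) / len(records),
            "mean": mean(covered),
            "min": min(covered), "max": max(covered),
            "sum": sum(covered),
            "impact": sum(covered) - overall_mean * len(covered)}

# Hypothetical patient records: overall mean stay 15, covered sum 50,
# so impact = 50 - 15 x 2 = 20.
patients = [{"country": 1100, "stay": 30}, {"country": 1100, "stay": 20},
            {"country": 2100, "stay": 5}, {"country": 2100, "stay": 5}]
print(impact_rule(patients, lambda r: r["country"] == 1100, "stay"))
```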
Summary
• Exploratory pattern discovery empowers the user to select the patterns that are most useful
• Rules provide a modular and powerful knowledge representation formalism
• Association rules discover associations between qualitative variables that are frequent
• K-optimal rules discover associations between qualitative variables that optimise a measure of interest
• Impact rules discover associations between qualitative and quantitative variables
• Contrasts discover differences in distributions over variables between different groups
• If you mine for patterns without appropriate statistical evaluation, expect to find fool’s gold!
Recommended