Mining Frequent Patterns, Associations, and Correlations
Compiled By:
Umair Yaqub
Lecturer
Govt. Murray College Sialkot
Frequent Pattern Mining - Basic Concepts
Frequent pattern: a pattern (a set of items, subsequences, substructures, etc.) that occurs frequently in a data set
Association mining finds frequent associations or correlations among sets of items or objects in transaction databases, relational databases, and other information repositories
Let I = {i1, i2, …, im} be a set of items, and let D be a database of transactions, where each transaction T is a set of items (purchased by a customer in one visit) such that T ⊆ I.
An association rule is an implication of the form A → B, where A and B are nonempty subsets of I and A ∩ B = ∅
Transaction ID    Items Bought
2000              A, B, C
1000              A, C
4000              A, D
5000              B, E, F
[Figure: Venn diagram of customers who buy A (computer), customers who buy B (software), and customers who buy both]
Association Mining-Basic Concepts (contd…)
Find all the rules A → B with minimum confidence and support
  support, s: probability that a transaction contains both A and B
  confidence, c: conditional probability that a transaction having A also contains B
Rules satisfying a minimum support threshold and a minimum confidence threshold are called strong
A set of items is referred to as an itemset. An itemset containing k items is a k-itemset.
The occurrence frequency of an itemset (also called its frequency, support count, or count) is the number of transactions that contain the itemset.
An itemset satisfying minimum support (count) is a frequent itemset; the set of frequent k-itemsets is commonly denoted by Lk
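The definitions above can be sketched in a few lines of Python. This is only an illustration: the function names (`support_count`, `support`, `confidence`) are hypothetical, and the data is the four-transaction table shown earlier in these slides.

```python
# Transactions from the example table (one frozenset of items per TID).
transactions = [
    frozenset({"A", "B", "C"}),  # TID 2000
    frozenset({"A", "C"}),       # TID 1000
    frozenset({"A", "D"}),       # TID 4000
    frozenset({"B", "E", "F"}),  # TID 5000
]


def support_count(itemset, db):
    """Number of transactions that contain every item of `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in db if items <= t)


def support(itemset, db):
    """Fraction of transactions that contain `itemset`."""
    return support_count(itemset, db) / len(db)


def confidence(antecedent, consequent, db):
    """Estimate of P(consequent | antecedent): count(A and B) / count(A)."""
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    return support_count(both, db) / support_count(a, db)


print(support({"A", "C"}, transactions))       # 2 of 4 transactions
print(confidence({"A"}, {"C"}, transactions))  # 2 of the 3 transactions with A
print(confidence({"C"}, {"A"}, transactions))  # 2 of the 2 transactions with C
```

Note that confidence is not symmetric: A → C and C → A share the same support but have different confidence values.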
Association Mining-Basic Concepts (contd…)
Association rule mining is a two-step process:
  1. Find all frequent itemsets
  2. Generate strong association rules from the frequent itemsets
Overall performance is determined by the first step
Association Rule Mining: A Road Map
Based on the completeness of mined patterns
  Complete set of frequent itemsets, constrained frequent itemsets
Based on levels of abstraction
  Single level vs. multiple-level analysis
  age(x, “30..39”) → buys(x, “computer”)
  age(x, “30..39”) → buys(x, “laptop”)
Based on number of data dimensions
  Single dimension vs. multiple dimensional associations
Based on the types of values handled
  Boolean vs. quantitative associations
  buys(x, “SQLServer”) ∧ buys(x, “DMBook”) → buys(x, “DBMiner”) [0.2%, 60%]
  age(x, “30..39”) ∧ income(x, “42..48K”) → buys(x, “PC”) [1%, 75%]
Based on kinds of rules to be mined
  Association rules, correlation rules
Based on the kinds of patterns to be mined
  Frequent itemset mining, sequential pattern mining, structured pattern mining
Mining Association Rules: An Example
Min. support 50%, min. confidence 50%

Transaction ID    Items Bought
2000              A, B, C
1000              A, C
4000              A, D
5000              B, E, F

Frequent Itemset    Support
{A}                 75%
{B}                 50%
{C}                 50%
{A, C}              50%
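The second step of association rule mining, generating strong rules from the frequent itemsets, might be sketched as follows. The function name `strong_rules` and the tuple layout of its result are assumptions made for illustration; the supports are copied from the frequent-itemset table in this example.

```python
from itertools import combinations

# Frequent itemsets and their supports, taken from the example table.
frequent = {
    frozenset({"A"}): 0.75,
    frozenset({"B"}): 0.50,
    frozenset({"C"}): 0.50,
    frozenset({"A", "C"}): 0.50,
}
MIN_CONF = 0.50


def strong_rules(frequent, min_conf):
    """Return (antecedent, consequent, support, confidence) per strong rule."""
    rules = []
    for itemset, sup in frequent.items():
        if len(itemset) < 2:
            continue  # a rule needs a nonempty antecedent and consequent
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                a = frozenset(antecedent)
                # Every subset of a frequent itemset is frequent, so its
                # support is already available in `frequent`.
                conf = sup / frequent[a]
                if conf >= min_conf:
                    rules.append((sorted(a), sorted(itemset - a), sup, conf))
    return rules


for a, b, sup, conf in strong_rules(frequent, MIN_CONF):
    print(f"{a} -> {b}  support={sup:.0%}  confidence={conf:.1%}")
```

On this data the only frequent itemset with more than one item is {A, C}, so the sketch reports two rules: A → C with confidence 50% / 75% = 66.7%, and C → A with confidence 50% / 50% = 100%. Both have support 50%.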
The Apriori Algorithm
Method:
Initially, scan DB once to get the frequent 1-itemsets
Generate length (k+1) candidate itemsets from length k frequent itemsets
Test the candidates against DB
Terminate when no frequent or candidate set can be generated
Use the frequent itemsets to generate association rules.
The Apriori principle:
All nonempty subsets of a frequent itemset must be frequent
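A small check can make the Apriori principle concrete. The sketch below reuses the four-transaction example from earlier in these slides with its 50% minimum support (a count of 2 out of 4); the helper names are hypothetical. It tests whether all nonempty proper subsets of an itemset are frequent, which is a necessary condition for the itemset itself to be frequent.

```python
from itertools import combinations

transactions = [frozenset("ABC"), frozenset("AC"), frozenset("AD"), frozenset("BEF")]
MIN_COUNT = 2  # 50% of 4 transactions


def support_count(itemset, db):
    """Number of transactions that contain every item of `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in db if items <= t)


def nonempty_proper_subsets(itemset):
    """Yield every nonempty subset of `itemset` other than the itemset itself."""
    items = sorted(itemset)
    for r in range(1, len(items)):
        for combo in combinations(items, r):
            yield frozenset(combo)


def subsets_all_frequent(itemset, db, min_count):
    """True if no subset rules the itemset out under the Apriori principle."""
    return all(
        support_count(s, db) >= min_count for s in nonempty_proper_subsets(itemset)
    )


print(subsets_all_frequent("AC", transactions, MIN_COUNT))  # {A} and {C} are frequent
print(subsets_all_frequent("AD", transactions, MIN_COUNT))  # {D} occurs only once
print(subsets_all_frequent("AB", transactions, MIN_COUNT), support_count("AB", transactions))
```

{A, D} can be discarded without counting it, because {D} is infrequent. {A, B} passes the subset check but occurs in only one transaction, so the condition is necessary, not sufficient: surviving candidates still have to be counted against the database.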
The Apriori Algorithm — Example
Database D
TID    Items
100    1 3 4
200    2 3 5
300    1 2 3 5
400    2 5

Scan D → C1
itemset    sup.
{1}        2
{2}        3
{3}        3
{4}        1
{5}        3

L1
itemset    sup.
{1}        2
{2}        3
{3}        3
{5}        3

C2 (generated from L1)
itemset
{1 2}
{1 3}
{1 5}
{2 3}
{2 5}
{3 5}

Scan D → C2 with counts
itemset    sup
{1 2}      1
{1 3}      2
{1 5}      1
{2 3}      2
{2 5}      3
{3 5}      2

L2
itemset    sup
{1 3}      2
{2 3}      2
{2 5}      3
{3 5}      2

C3 (generated from L2)
itemset
{2 3 5}

Scan D → L3
itemset    sup
{2 3 5}    2
The Apriori Algorithm
Pseudo-code:
Ck: candidate itemsets of size k
Lk: frequent itemsets of size k

L1 = {frequent items};
for (k = 1; Lk != ∅; k++) do begin
    Ck+1 = candidates generated from Lk;
    for each transaction t in database do
        increment the count of all candidates in Ck+1 that are contained in t
    Lk+1 = candidates in Ck+1 with min_support
end
return ∪k Lk;
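The pseudo-code can be turned into a short runnable sketch. The version below is one possible reading, not a reference implementation: the function name `apriori` and the parameter `min_count` are hypothetical, and candidates are formed by taking unions of pairs of frequent k-itemsets rather than by an ordered prefix join. The minimum support count of 2 is inferred from the worked example above, where {4} (count 1) is dropped and itemsets with count 2 are kept.

```python
from itertools import combinations


def apriori(db, min_count):
    """Return {itemset: support count} for all frequent itemsets in `db`."""
    db = [frozenset(t) for t in db]

    # L1: count single items in one scan of the database.
    counts = {}
    for t in db:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: c for s, c in counts.items() if c >= min_count}

    frequent = dict(level)
    k = 1
    while level:
        # C(k+1): unions of two frequent k-itemsets that have exactly k+1
        # items, kept only if all of their k-subsets are frequent.
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == k + 1 and all(
                frozenset(s) in level for s in combinations(union, k)
            ):
                candidates.add(union)

        # One scan of the database counts every surviving candidate.
        counts = {c: 0 for c in candidates}
        for t in db:
            for c in candidates:
                if c <= t:
                    counts[c] += 1

        level = {s: c for s, c in counts.items() if c >= min_count}
        frequent.update(level)
        k += 1
    return frequent


# Database D from the worked example (TIDs 100, 200, 300, 400).
D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
result = apriori(D, min_count=2)
for itemset in sorted(result, key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset), result[itemset])
```

Run on database D, this should reproduce L1, L2, and L3 from the example: four frequent 1-itemsets, four frequent 2-itemsets, and the single 3-itemset {2 3 5} with count 2.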
Important Details of Apriori
How to generate candidates?
Step 1: self-joining Lk
Step 2: pruning
How to count supports of candidates?
Example of Candidate-generation L3={abc, abd, acd, ace, bcd}
Self-joining: L3*L3
abcd from abc and abd
acde from acd and ace
Pruning:
acde is removed because ade is not in L3
C4={abcd}
How to Generate Candidates?
Suppose the items in Lk-1 are listed in an order
Step 1: self-joining Lk-1
    insert into Ck
    select p.item1, p.item2, …, p.itemk-1, q.itemk-1
    from Lk-1 p, Lk-1 q
    where p.item1=q.item1, …, p.itemk-2=q.itemk-2, p.itemk-1 < q.itemk-1
Step 2: pruning
    forall itemsets c in Ck do
        forall (k-1)-subsets s of c do
            if (s is not in Lk-1) then delete c from Ck
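The join and prune steps can be sketched in Python as follows. This is an illustrative translation of the SQL-style description, assuming each itemset is stored as a tuple of items in sorted order; the function name `apriori_gen` is a hypothetical choice.

```python
from itertools import combinations


def apriori_gen(prev_frequent):
    """Build C(k) from L(k-1); itemsets are tuples of items in sorted order."""
    prev = sorted(prev_frequent)
    prev_set = set(prev)
    candidates = []
    for i, p in enumerate(prev):
        for q in prev[i + 1:]:
            # Join step: the first k-2 items agree and the last item of p
            # precedes the last item of q.
            if p[:-1] == q[:-1] and p[-1] < q[-1]:
                c = p + (q[-1],)
                # Prune step: every (k-1)-subset of c must be in L(k-1).
                if all(s in prev_set for s in combinations(c, len(c) - 1)):
                    candidates.append(c)
    return candidates


# L3 from the candidate-generation example: abc, abd, acd, ace, bcd.
L3 = [tuple("abc"), tuple("abd"), tuple("acd"), tuple("ace"), tuple("bcd")]
C4 = apriori_gen(L3)
print(["".join(c) for c in C4])
```

With the L3 from the earlier example, the join produces abcd (from abc and abd) and acde (from acd and ace); acde is then pruned because ade is not in L3, leaving C4 = {abcd}.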
Example – Transaction DB
Adapted from slides by Han and Kamber http://www-faculty.cs.uiuc.edu/~hanj/bk2/
Example – Finding Frequent Patterns (1)
Example – Finding Frequent Patterns (2)