Mining Frequent Patterns, Associations, and Correlations Compiled By: Umair Yaqub Lecturer Govt. Murray College Sialkot




Frequent Pattern Mining - Basic Concepts

Frequent pattern: a pattern (a set of items, subsequences, substructures, etc.) that occurs frequently in a data set

Frequent pattern mining finds frequent associations or correlations among sets of items or objects in transaction databases, relational databases, and other information repositories

Let I = {i1, i2, …, im} be a set of items, and let D be a database of transactions, where each transaction T is a set of items from I (for example, the items purchased by a customer in one visit).

An association rule is an implication of the form A → B, where A and B are nonempty subsets of I, and A ∩ B = Ø

Transaction ID   Items Bought
2000             A, B, C
1000             A, C
4000             A, D
5000             B, E, F

[Figure: customer buys A (Computer); customer buys B (Software); customer buys both]


Association Mining-Basic Concepts (contd…)

Find all the rules A → B that meet a minimum support and a minimum confidence

support, s: probability that a transaction contains both A and B

confidence, c: conditional probability that a transaction containing A also contains B

Rules satisfying both a minimum support threshold and a minimum confidence threshold are called strong

A set of items is referred to as an itemset. An itemset containing k items is a k-itemset.

The occurrence frequency of an itemset (also called its frequency, support count, or count) is the number of transactions that contain the itemset

An itemset satisfying minimum support (count) is a frequent itemset. The set of frequent k-itemsets is commonly denoted by Lₖ
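As a rough illustration of the two measures, the following Python sketch computes support and confidence on the four-transaction table shown earlier. The function names (support_count, support, confidence) are chosen here for illustration and do not come from the slides.

```python
# Transactions from the running example (TID -> items bought).
TRANSACTIONS = {
    2000: {"A", "B", "C"},
    1000: {"A", "C"},
    4000: {"A", "D"},
    5000: {"B", "E", "F"},
}


def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)


def support(a, b, transactions):
    """support(A -> B): fraction of transactions containing both A and B."""
    return support_count(set(a) | set(b), transactions) / len(transactions)


def confidence(a, b, transactions):
    """confidence(A -> B): fraction of transactions with A that also contain B."""
    both = support_count(set(a) | set(b), transactions)
    return both / support_count(a, transactions)


db = list(TRANSACTIONS.values())
print(support({"A"}, {"C"}, db))     # 0.5
print(confidence({"A"}, {"C"}, db))  # about 0.667
print(confidence({"C"}, {"A"}, db))  # 1.0
```

Note that support is symmetric in A and B, while confidence is not: A → C and C → A share the same support but differ in confidence.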


Association Mining-Basic Concepts (contd…)

Association rule mining is a two-step process

Step 1: Find all frequent itemsets

Step 2: Generate strong association rules from the frequent itemsets

Overall performance is determined by the first step


Association Rule Mining: A Road Map

Based on the completeness of mined patterns

Complete set of frequent itemsets, constrained frequent itemsets

Based on levels of abstraction

Single-level vs. multiple-level analysis

age(x, “30..39”) → buys(x, “computer”)

age(x, “30..39”) → buys(x, “laptop”)

Based on number of data dimensions

Single-dimensional vs. multidimensional associations

Based on the types of values handled

Boolean vs. quantitative associations

buys(x, “SQLServer”) ∧ buys(x, “DMBook”) → buys(x, “DBMiner”) [0.2%, 60%]

age(x, “30..39”) ∧ income(x, “42..48K”) → buys(x, “PC”) [1%, 75%]

Based on kinds of rules to be mined

Association rules, correlation rules

Based on the kinds of patterns to be mined

Frequent itemset mining, sequential pattern mining, structured pattern mining


Mining Association Rules - An Example

Min. support 50%
Min. confidence 50%

Transaction ID   Items Bought
2000             A, B, C
1000             A, C
4000             A, D
5000             B, E, F

Frequent Itemset   Support
{A}                75%
{B}                50%
{C}                50%
{A,C}              50%
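The example can be reproduced by brute force. The sketch below enumerates every possible itemset, keeps those meeting the 50% minimum support, and then derives the rules meeting the 50% minimum confidence. It is only practical for tiny databases and is not the Apriori algorithm; all variable names are illustrative.

```python
from itertools import combinations

transactions = [{"A", "B", "C"}, {"A", "C"}, {"A", "D"}, {"B", "E", "F"}]
MIN_SUPPORT = 0.5
MIN_CONFIDENCE = 0.5

n = len(transactions)
items = sorted(set().union(*transactions))

# Step 1: find all frequent itemsets by exhaustive enumeration.
frequent = {}
for k in range(1, len(items) + 1):
    for combo in combinations(items, k):
        itemset = frozenset(combo)
        sup = sum(1 for t in transactions if itemset <= t) / n
        if sup >= MIN_SUPPORT:
            frequent[itemset] = sup

# Step 2: generate strong rules lhs -> rhs from each frequent itemset.
rules = []
for itemset, sup in frequent.items():
    for size in range(1, len(itemset)):
        for lhs_items in combinations(sorted(itemset), size):
            lhs = frozenset(lhs_items)
            # lhs is frequent too, so its support is already in the table.
            conf = sup / frequent[lhs]
            if conf >= MIN_CONFIDENCE:
                rhs = itemset - lhs
                rules.append((tuple(sorted(lhs)), tuple(sorted(rhs)), sup, conf))

for itemset, sup in frequent.items():
    print(sorted(itemset), f"{sup:.0%}")
for lhs, rhs, sup, conf in rules:
    print(f"{lhs} -> {rhs}  support={sup:.0%}  confidence={conf:.1%}")
```

On this data the only frequent itemset with more than one item is {A,C}, so the strong rules are A → C and C → A.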


The Apriori Algorithm

Method:

Initially, scan the DB once to get the frequent 1-itemsets

Generate length (k+1) candidate itemsets from length k frequent itemsets

Test the candidates against DB

Terminate when no frequent or candidate set can be generated

Use the frequent itemsets to generate association rules.

The Apriori principle:

All nonempty subsets of a frequent itemset must be frequent
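One way to see why the principle holds: any transaction that contains an itemset also contains each of its subsets, so a subset's support count can never be smaller than that of the full itemset. A small check on the four-transaction example from the earlier slides (helper names are illustrative):

```python
from itertools import combinations

transactions = [{"A", "B", "C"}, {"A", "C"}, {"A", "D"}, {"B", "E", "F"}]


def support_count(itemset):
    """Number of transactions containing every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)


def nonempty_proper_subsets(itemset):
    """Yield every nonempty subset of `itemset` except the itemset itself."""
    items = sorted(itemset)
    for size in range(1, len(items)):
        for combo in combinations(items, size):
            yield frozenset(combo)


def subsets_at_least_as_frequent(itemset):
    """True if no subset has a smaller support count than the itemset."""
    whole = support_count(itemset)
    return all(support_count(s) >= whole for s in nonempty_proper_subsets(itemset))


print(subsets_at_least_as_frequent({"A", "C"}))       # True
print(subsets_at_least_as_frequent({"A", "B", "C"}))  # True
```

Read in the other direction, this is what makes pruning possible: if any subset of a candidate is infrequent, the candidate cannot be frequent and need not be counted.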


The Apriori Algorithm — Example

Database D

TID   Items
100   1 3 4
200   2 3 5
300   1 2 3 5
400   2 5

C1 (after scanning D)

itemset   sup.
{1}       2
{2}       3
{3}       3
{4}       1
{5}       3

L1

itemset   sup.
{1}       2
{2}       3
{3}       3
{5}       3

C2 (generated from L1)

itemset
{1 2}
{1 3}
{1 5}
{2 3}
{2 5}
{3 5}

C2 (after scanning D)

itemset   sup
{1 2}     1
{1 3}     2
{1 5}     1
{2 3}     2
{2 5}     3
{3 5}     2

L2

itemset   sup
{1 3}     2
{2 3}     2
{2 5}     3
{3 5}     2

C3 (generated from L2)

itemset
{2 3 5}

L3 (after scanning D)

itemset   sup
{2 3 5}   2


The Apriori Algorithm

Pseudo-code:

Cₖ: candidate itemsets of size k
Lₖ: frequent itemsets of size k

L₁ = {frequent items};
for (k = 1; Lₖ != Ø; k++) do begin
    Cₖ₊₁ = candidates generated from Lₖ;
    for each transaction t in database do
        increment the count of all candidates in Cₖ₊₁ that are contained in t
    Lₖ₊₁ = candidates in Cₖ₊₁ with min_support
end
return ∪ₖ Lₖ;
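The pseudo-code can be sketched in Python as follows. This is a minimal, unoptimized version under a few assumptions: the threshold is given as a support count (min_count), candidates are formed by taking unions of pairs of frequent k-itemsets rather than by the ordered self-join, and the function name apriori is chosen here for illustration.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {itemset: support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]

    # First scan: count single items to obtain the frequent 1-itemsets.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(current)

    k = 1
    while current:
        prev = set(current)
        # Candidates of size k+1: unions of two frequent k-itemsets whose
        # k-subsets are all frequent (the Apriori principle).
        candidates = set()
        for a in prev:
            for b in prev:
                union = a | b
                if len(union) == k + 1 and all(
                    frozenset(s) in prev for s in combinations(union, k)
                ):
                    candidates.add(union)

        # Scan the database and count each candidate contained in t.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1

        current = {s: c for s, c in counts.items() if c >= min_count}
        frequent.update(current)
        k += 1
    return frequent


# Database D from the example slide, with a minimum support count of 2
# (the value implied by {4}, with count 1, being dropped from L1).
D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
result = apriori(D, min_count=2)
for itemset in sorted(result, key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset), result[itemset])
```

Run on database D, this yields the same L1, L2, and L3 as the worked example, ending with {2 3 5} at a support count of 2.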


Important Details of Apriori

How to generate candidates?

Step 1: self-joining Lk

Step 2: pruning

How to count supports of candidates?

Example of Candidate-generation L3={abc, abd, acd, ace, bcd}

Self-joining: L3*L3

abcd from abc and abd

acde from acd and ace

Pruning:

acde is removed because ade is not in L3

C4={abcd}


How to Generate Candidates?

Suppose the items in each itemset of Lₖ₋₁ are listed in a fixed order

Step 1: self-joining Lₖ₋₁

insert into Cₖ
select p.item₁, p.item₂, …, p.itemₖ₋₁, q.itemₖ₋₁
from Lₖ₋₁ p, Lₖ₋₁ q
where p.item₁ = q.item₁, …, p.itemₖ₋₂ = q.itemₖ₋₂, p.itemₖ₋₁ < q.itemₖ₋₁

Step 2: pruning

forall itemsets c in Cₖ do
    forall (k-1)-subsets s of c do
        if (s is not in Lₖ₋₁) then delete c from Cₖ
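The two steps can be sketched in Python, assuming each itemset is stored as a sorted tuple of items. The function names self_join and prune are illustrative, and the input is the L3 = {abc, abd, acd, ace, bcd} example from the earlier slide.

```python
from itertools import combinations


def self_join(prev_frequent):
    """Step 1: join itemsets that agree on all items except the last."""
    prev = sorted(tuple(sorted(s)) for s in prev_frequent)
    joined = []
    for p in prev:
        for q in prev:
            # Same first k-2 items, and p's last item precedes q's last item.
            if p[:-1] == q[:-1] and p[-1] < q[-1]:
                joined.append(p + (q[-1],))
    return joined


def prune(candidates, prev_frequent):
    """Step 2: drop candidates that have an infrequent (k-1)-subset."""
    prev = {tuple(sorted(s)) for s in prev_frequent}
    kept = []
    for c in candidates:
        if all(s in prev for s in combinations(c, len(c) - 1)):
            kept.append(c)
    return kept


L3 = ["abc", "abd", "acd", "ace", "bcd"]
joined = self_join(L3)
C4 = prune(joined, L3)
print(["".join(c) for c in joined])  # ['abcd', 'acde']
print(["".join(c) for c in C4])      # ['abcd']
```

Requiring p's last item to precede q's last item ensures each candidate is produced once, from a single ordered pair, rather than twice.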


Example – Transaction DB

Adapted from slides by Han and Kamber http://www-faculty.cs.uiuc.edu/~hanj/bk2/

Example – Finding Frequent Patterns (1)


Example – Finding Frequent Patterns (2)