

Description: Presentation slides of Association Rule (Decision Support System)

Page 1: Chapter 10 Association Rule

Chapter 10

ASSOCIATION RULE

By:

Aris D.(13406054)

Ricky A.(13406058)

Nadia FR. (13406069)

Amirah K.(13406070)

Paramita AW.(13406091)

Bahana W.(13406102)

Page 2: Chapter 10 Association Rule

Introduction

• Affinity Analysis

The study of attributes or characteristics that "go together".

• Market Basket Analysis

A method that uncovers rules for quantifying the relationship between two or more attributes.

"If antecedent, then consequent"

Page 3: Chapter 10 Association Rule

Affinity Analysis & Market Basket Analysis

• Example: A supermarket may find that of the 1000 customers shopping on a Thursday night, 200 bought diapers, and of the 200 who bought diapers, 50 bought beer.

The association rule: "If buy diapers, then buy beer",
with support of 50/1000 = 5%,
and confidence of 50/200 = 25%
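The support and confidence figures above follow directly from the counts; a two-line check (the numbers 1000, 200, and 50 are taken from the example):

```python
# Counts from the supermarket example above.
total_transactions = 1000
diapers = 200           # transactions containing diapers (the antecedent)
diapers_and_beer = 50   # transactions containing both diapers and beer

support = diapers_and_beer / total_transactions  # P(diapers AND beer)
confidence = diapers_and_beer / diapers          # P(beer | diapers)

print(f"support = {support:.0%}, confidence = {confidence:.0%}")
# support = 5%, confidence = 25%
```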

Page 4: Chapter 10 Association Rule

Affinity Analysis & Market Basket Analysis (2)

Examples in business & research:

• Investigating the proportion of subscribers to your company's cell phone plan that respond positively to an offer of a service upgrade

• Examining the proportion of children whose parents read to them who are themselves good readers

• Predicting degradation in telecommunications networks

• Finding out which items in a supermarket are purchased together & which are never purchased together

• Determining the proportion of cases in which a new drug will exhibit dangerous side effects

Page 5: Chapter 10 Association Rule

Affinity Analysis & Market Basket Analysis (3)

• The number of possible association rules grows exponentially in the number of attributes.

• With k binary (yes/no) attributes, there are k·[2^(k-1)] possible association rules.

• Example: a convenience store that sells 100 items. Possible association rules = 100·[2^99] ≈ 6.3 × 10^31

• The a priori algorithm reduces the search problem to a more manageable size.
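The growth claim is easy to verify numerically; a quick sketch using the k · 2^(k-1) count from the slide:

```python
def possible_rules(k: int) -> int:
    # Number of possible association rules over k binary (yes/no)
    # attributes, per the k * 2^(k-1) formula on the slide.
    return k * 2 ** (k - 1)

# Ten more items multiplies the rule count by more than a thousand.
print(possible_rules(100))   # about 6.34e31 for the 100-item store
print(possible_rules(110) / possible_rules(100))
```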

Page 6: Chapter 10 Association Rule

Notation for Data Representation in Market Basket Analysis

• A farmer sells the item set I = {asparagus, beans, broccoli, corn, green peppers, squash, tomatoes}

• A customer puts a subset of I in a basket, e.g. {broccoli, corn}

• The subset doesn't keep track of how much of each item is purchased, just the name of the item.

Page 7: Chapter 10 Association Rule

Transactional Data Format

Page 8: Chapter 10 Association Rule

Tabular Data Format

Page 9: Chapter 10 Association Rule

Support, Confidence, Frequent Itemsets, & the Apriori Property

• Example:
D : the set of transactions represented in Table 10.1
T : a transaction in D representing a set of items
I : the set of items
Set of items A : {beans, squash}
Set of items B : {asparagus}

THEN …
An association rule takes the form "if A, then B" (A ⇒ B), where A and B are PROPER subsets of I and are mutually exclusive.

Page 10: Chapter 10 Association Rule

Table of Transaction Made

Page 11: Chapter 10 Association Rule

Support and Confidence

• Support, s, is the proportion of transactions in D that contain both A and B:

support = P(A ∩ B)
        = (number of transactions containing both A & B) / (total number of transactions)

• Confidence, c, is a measure of the accuracy of the rule:

confidence = P(B|A) = P(A ∩ B) / P(A)
           = (number of transactions containing both A & B) / (number of transactions containing A)

• Analysts prefer RULES with high support AND high confidence.

Page 12: Chapter 10 Association Rule

Frequent Itemset Definition…

An itemset is a set of items contained in I, and a k-itemset contains k items. E.g. {beans, squash} is a 2-itemset.

The itemset frequency is the number of transactions that contain the particular itemset.

A frequent itemset is an itemset that occurs at least a certain minimum number of times, i.e. has itemset frequency ≥ φ.

Example: Set φ = 4; then itemsets that occur four or more times are said to be frequent.

Page 13: Chapter 10 Association Rule

• Mining Association Rules
It is a two-step process:
1. Find all frequent itemsets (all itemsets with frequency ≥ φ)
2. From the frequent itemsets, generate association rules satisfying the minimum support and confidence conditions

The Apriori Property

• The Apriori property states that if an itemset Z is not frequent, then adding another item A to the itemset Z will not make it frequent (Z ∪ {A} cannot be frequent either). This helpful property significantly reduces the search space for the a priori algorithm.

Page 14: Chapter 10 Association Rule

How does the Apriori Algorithm Work?

• Part 1: Generating Frequent Itemsets

• Part 2: Generating Association Rules

Page 15: Chapter 10 Association Rule

Generating Frequent Itemsets

• Example: let φ = 4, so that an itemset is frequent if it occurs four or more times in D.

F1 = {asparagus, beans, broccoli, corn, green peppers, squash, tomatoes}

F2: first, construct a set Ck of candidate k-itemsets by joining Fk-1 with itself; then prune Ck using the a priori property. Ck for k = 2 consists of all the combinations of vegetables in Table 10.4.

F3: not much different from the steps for F2, but using k = 3.
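The join-prune-count loop sketched above can be written compactly. This is a minimal illustrative Apriori, not the book's exact implementation, and the transactions below are a small hypothetical basket list rather than Table 10.1:

```python
from itertools import combinations

def apriori_frequent_itemsets(transactions, min_count):
    """Return all itemsets occurring at least min_count times (minimal sketch)."""
    def count(itemset):
        return sum(1 for t in transactions if itemset <= t)

    items = sorted({item for t in transactions for item in t})
    # F1: frequent 1-itemsets.
    current = [frozenset([i]) for i in items if count(frozenset([i])) >= min_count]
    frequent = list(current)
    k = 2
    while current:
        # Join step: candidate k-itemsets from unions of frequent (k-1)-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step (a priori property): every (k-1)-subset must itself be frequent.
        prev = set(current)
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        # Count step: keep candidates meeting the frequency threshold.
        current = [c for c in candidates if count(c) >= min_count]
        frequent.extend(current)
        k += 1
    return frequent

# Hypothetical transactions (not Table 10.1).
transactions = [frozenset(t) for t in [
    {"beans", "squash"}, {"beans", "squash", "corn"}, {"beans", "corn"},
    {"beans", "squash"}, {"squash", "corn"}, {"beans", "squash", "corn"},
]]
for itemset in apriori_frequent_itemsets(transactions, min_count=4):
    print(sorted(itemset))
```

With φ = 4 here, {beans, squash} survives (frequency 4) while {beans, corn} and {squash, corn} are filtered out (frequency 3 each).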

Page 16: Chapter 10 Association Rule

Table 10.3 (pg.183)

Page 17: Chapter 10 Association Rule

Table 10.4 (pg. 185)

Page 18: Chapter 10 Association Rule

• However, consider s = {beans, corn, squash}: the subset {corn, squash} has frequency 3 < 4 = φ, so {corn, squash} is not frequent.

By the a priori property, therefore, {beans, corn, squash} cannot be frequent, is pruned, and doesn't appear in F3.

The same holds for s = {beans, squash, tomatoes}: the frequency of one of its subsets is < 4.

Page 19: Chapter 10 Association Rule

Generating Association Rules

For each frequent itemset s:

1. Generate all (non-empty proper) subsets ss of s.

2. Form the candidate association rule R : ss ⇒ (s - ss), where (s - ss) is the set s without ss. Report R if it fulfills the minimum confidence requirement.
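This subset-enumeration step can be sketched as follows; the transactions are hypothetical, not the book's Table 10.1, and confidence is computed as count(s) / count(ss):

```python
from itertools import combinations

def generate_rules(frequent_itemset, transactions, min_confidence):
    """For a frequent itemset s, emit rules ss => (s - ss) meeting min_confidence."""
    def count(itemset):
        return sum(1 for t in transactions if itemset <= t)

    s = frozenset(frequent_itemset)
    rules = []
    # Every non-empty proper subset ss of s is a candidate antecedent.
    for r in range(1, len(s)):
        for antecedent in map(frozenset, combinations(s, r)):
            confidence = count(s) / count(antecedent)
            if confidence >= min_confidence:
                rules.append((set(antecedent), set(s - antecedent), confidence))
    return rules

# Hypothetical transactions (not Table 10.1).
transactions = [{"beans", "squash"}, {"beans", "squash"}, {"beans", "corn"},
                {"beans", "squash"}, {"squash"}]
for ante, cons, conf in generate_rules({"beans", "squash"}, transactions, 0.6):
    print(f"if {sorted(ante)} then {sorted(cons)} (confidence {conf:.0%})")
```

Here both ss = {beans} and ss = {squash} pass, each with confidence 3/4 = 75%.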

Page 20: Chapter 10 Association Rule

Example: two antecedents

• Total transactions = 14

• Transactions including asparagus and beans = 5

• Transactions including asparagus and squash = 5

• Transactions including beans and squash = 6

Page 21: Chapter 10 Association Rule

Ranked by Support × Confidence

• Minimum confidence: 80%

Page 22: Chapter 10 Association Rule

Clementine Generating Association Rules

Page 23: Chapter 10 Association Rule

Clementine Generating Association Rules (2)

• In Clementine, "support" means the occurrences of the antecedent, different from what we defined before.

• The first column indicates the number of times the antecedent occurs.

• To find the actual "support" using Clementine, multiply support and confidence.
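So, for instance, with hypothetical Clementine output of support 24.36% and confidence 85.37% (both figures invented for illustration), the actual support in the earlier sense would be:

```python
clementine_support = 0.2436  # hypothetical: proportion of transactions with the antecedent
confidence = 0.8537          # hypothetical: P(consequent | antecedent)

# Actual support = proportion of transactions containing antecedent AND consequent.
actual_support = clementine_support * confidence
print(f"{actual_support:.2%}")  # 20.80%
```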

Page 24: Chapter 10 Association Rule

Extension from Flag Data to General Categorical Data

• Association rules are not only for flag (Boolean) data.

• The a priori algorithm can also be applied to categorical data.

Page 25: Chapter 10 Association Rule

Example using Clementine

• Recall the normalized adult data set from Chapters 6 and 7

Page 26: Chapter 10 Association Rule

Information-Theoretic Approach: Generalized Rule Induction Method

Why GRI?

• The a priori algorithm is not well equipped to handle numerical attributes; they need discretization

• Discretization can lead to loss of information

• GRI can handle both categorical and numerical variables as inputs, but still requires a categorical variable as output

Page 27: Chapter 10 Association Rule

Generalized Rule Induction Method (2)

J-Measure

• p(x) probability of the value of x (antecedent)

• p(y) probability of the value of y (consequent)

• p(y|x) conditional probability of y given that x has occurred

J = p(x) · [ p(y|x) · ln( p(y|x) / p(y) ) + (1 - p(y|x)) · ln( (1 - p(y|x)) / (1 - p(y)) ) ]
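The J-measure transcribes directly into code; math.log is the natural logarithm. The probabilities in the example call are the ones used in the GRI application worked through later in these slides:

```python
import math

def j_measure(p_x, p_y, p_y_given_x):
    """J-measure ("interestingness") of the rule 'if x then y'.

    Assumes 0 < p_y < 1 and 0 < p_y_given_x < 1 so both logarithms are defined.
    """
    return p_x * (
        p_y_given_x * math.log(p_y_given_x / p_y)
        + (1 - p_y_given_x) * math.log((1 - p_y_given_x) / (1 - p_y))
    )

# p(x): female, never married; p(y): work class = private.
print(j_measure(p_x=0.1463, p_y=0.6958, p_y_given_x=0.763))  # ~0.001637
```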

Page 28: Chapter 10 Association Rule

Generalized Rule Induction Method (3)

• J-Measure shows “interestingness”

• In GRI, user specifies how many association rules would be reported

• If the “interestingness” of new rule > current minimum J in the rule table, new rule is inserted, rule with minimum J is eliminated

Page 29: Chapter 10 Association Rule

Application of GRI

p(x) : female, never married

p(x) = 0.1463

Page 30: Chapter 10 Association Rule

Application of GRI (2)

p(y) : work class = private

p(y) = 0.6958

Page 31: Chapter 10 Association Rule

Application of GRI (3)

p(y|x) : work class = private, given : female, never married

p(y|x) = conditional probability = 0.763

Page 32: Chapter 10 Association Rule

Application of GRI (4)

Calculation:

J = p(x) · [ p(y|x) · ln( p(y|x) / p(y) ) + (1 - p(y|x)) · ln( (1 - p(y|x)) / (1 - p(y)) ) ]
  = 0.1463 · [ 0.763 · ln( 0.763 / 0.6958 ) + 0.237 · ln( 0.237 / 0.3042 ) ]
  = 0.1463 · [ 0.763 · ln(1.0966) + 0.237 · ln(0.7791) ]
  = 0.001637

Page 33: Chapter 10 Association Rule

When not to use Association Rules

• Association rules chosen a priori can be evaluated based on:

▫ Confidence

▫ Confidence difference

▫ Confidence ratio

• Association rules need to be applied with care because the results are sometimes unreliable.

Page 34: Chapter 10 Association Rule

When not to use Association Rules (2)

Association rules chosen a priori, based on confidence

• Applying this association rule reduces the probability of randomly selecting the desired data.

• Even though the rule is useless, the software still reported it, probably because the default ranking mechanism for the a priori algorithm is confidence.

• We should never simply believe the computer output without making the effort to understand the models and mechanisms underlying the results.

Page 35: Chapter 10 Association Rule

When not to use Association Rules (3)

Association rules chosen a priori, based on confidence

Page 36: Chapter 10 Association Rule

When not to use Association Rules (4)

Association rules chosen a priori, based on confidence difference

• A random selection from the database would have provided more effective results (no useless rules reported) than applying the association rule.

• This criterion selects the rule providing the greatest increase in confidence from the prior to the posterior.

• The evaluation measure is the absolute difference between the prior and posterior confidences.

Page 37: Chapter 10 Association Rule

When not to use Association Rules (5)

Association rules chosen a priori, based on confidence difference

Page 38: Chapter 10 Association Rule

When not to use Association Rules (6)

Association rules chosen a priori, based on confidence ratio

• Some analysts prefer to use the confidence ratio to evaluate potential rules.

• Here, the confidence difference criterion yielded the very same rules as the confidence ratio criterion.

Page 39: Chapter 10 Association Rule

When not to use Association Rules (7)

Association rules chosen a priori, based on confidence ratio

• Example: If Marital_Status = Divorced, then Sex = Female, with p(y) = 0.3317 and p(y|x) = 0.60.

Page 40: Chapter 10 Association Rule

Do Association Rules Represent Supervised or Unsupervised Learning?

• Supervised learning:

▫ A target variable is prespecified

▫ The algorithm is provided with a rich collection of examples where possible associations between the target variable and the predictor variables may be uncovered

• Unsupervised learning:

▫ No target variable is identified explicitly

▫ The algorithm searches for patterns and structure among all the variables

• Association rules are generally used for unsupervised learning but can also be applied in supervised learning for a classification task.

Page 41: Chapter 10 Association Rule

Local Patterns Versus Global Models

• Model: a global description or explanation of a data set.

• Patterns: essential local features of the data.

• Association rules are well suited to uncovering local patterns in data.

• Applying an "if" clause drills down deep into the data set, uncovering a hidden local pattern that might be relevant.

• Finding local patterns is one of the most important goals in data mining; it can lead to new profitable initiatives.