
Homework Due Lesson 6 – Association Rules (ctappert/cs816-15fall/hw/hw06-sol.pdf)


All homework will be team homework, but all team members must know the material.

Submit all homework to both instructor and graduate assistant Md Ali ([email protected]).

---------------------------**************--------------------------

Complete Association Rules exercise 4 (end of chapter, page 159 in textbook) manually.

A local retailer has a database that stores 10,000 transactions of last summer. After analyzing the data, a data science team has identified the following statistics:

{battery} appears in 6,000 transactions.

{sunscreen} appears in 5,000 transactions.

{sandals} appears in 4,000 transactions.

{bowls} appears in 2,000 transactions.

{battery,sunscreen} appears in 1,500 transactions.

{battery,sandals} appears in 1,000 transactions.

{battery,bowls} appears in 250 transactions.

{battery,sunscreen,sandals} appears in 600 transactions.

Answer the following questions:

1. What are the support values of the preceding itemsets?

{battery} appears in 6,000 transactions, so support(battery) = 6000/10000 = 0.6.
{sunscreen} appears in 5,000 transactions, so support(sunscreen) = 5000/10000 = 0.5.
{sandals} appears in 4,000 transactions, so support(sandals) = 4000/10000 = 0.4.
{bowls} appears in 2,000 transactions, so support(bowls) = 2000/10000 = 0.2.
{battery,sunscreen} appears in 1,500 transactions, so support(battery, sunscreen) = 1500/10000 = 0.15.
{battery,sandals} appears in 1,000 transactions, so support(battery, sandals) = 1000/10000 = 0.1.
{battery,bowls} appears in 250 transactions, so support(battery, bowls) = 250/10000 = 0.025.
{battery,sunscreen,sandals} appears in 600 transactions, so support(battery, sunscreen, sandals) = 600/10000 = 0.06.

2. Assuming the minimum support is 0.05, which item sets are considered frequent?

An itemset is frequent if its support is greater than or equal to the minimum support. With a minimum support of 0.05, the itemsets {battery}, {sunscreen}, {sandals}, {bowls}, {battery,sunscreen}, {battery,sandals}, and {battery,sunscreen,sandals} are frequent. Only {battery,bowls} (support 0.025) is not a frequent itemset.
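The support values in question 1 and the frequent-itemset test in question 2 can be checked with a few lines of base R. This is a minimal sketch: the named vector `counts` and the other variable names are illustrative, not part of the exercise.

```r
# Transaction counts from the exercise statistics (N = 10,000 transactions)
n_total <- 10000
counts <- c(
  "battery"                   = 6000,
  "sunscreen"                 = 5000,
  "sandals"                   = 4000,
  "bowls"                     = 2000,
  "battery,sunscreen"         = 1500,
  "battery,sandals"           = 1000,
  "battery,bowls"             = 250,
  "battery,sunscreen,sandals" = 600
)

# Question 1: support(X) = count(X) / N
support <- counts / n_total
print(support)

# Question 2: an itemset is frequent when its support >= the minimum support
min_support <- 0.05
frequent   <- names(support)[support >= min_support]
infrequent <- names(support)[support < min_support]
print(frequent)
print(infrequent)
```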

3. What are the confidence values of {battery}→{sunscreen} and {battery, sunscreen}→{sandals}? Which of the two rules is more interesting?

Confidence({battery}→{sunscreen}) = support(battery, sunscreen) / support(battery) = 0.15/0.6 = 0.25. That is, 25% of the transactions that contain a battery also contain sunscreen.

Confidence({battery, sunscreen}→{sandals}) = support(battery, sunscreen, sandals) / support(battery, sunscreen) = 0.06/0.15 = 0.4. That is, 40% of the transactions that contain both a battery and sunscreen also contain sandals.

The second rule, {battery, sunscreen}→{sandals}, is more interesting because its confidence (0.4) is higher than that of the first rule (0.25).
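A quick numeric check of the two confidences in question 3, written as a small base-R sketch (the variable names are illustrative):

```r
# Supports derived from the exercise counts (N = 10,000 transactions)
n_total <- 10000
support_battery           <- 6000 / n_total
support_battery_sunscreen <- 1500 / n_total
support_all_three         <- 600 / n_total   # {battery, sunscreen, sandals}

# confidence(X -> Y) = support(X, Y) / support(X)
conf_rule1 <- support_battery_sunscreen / support_battery    # {battery} -> {sunscreen}
conf_rule2 <- support_all_three / support_battery_sunscreen  # {battery, sunscreen} -> {sandals}

cat(sprintf("conf({battery} -> {sunscreen}) = %.2f\n", conf_rule1))
cat(sprintf("conf({battery, sunscreen} -> {sandals}) = %.2f\n", conf_rule2))

# The rule with the higher confidence is the one the solution calls more interesting
more_interesting <- if (conf_rule2 > conf_rule1) {
  "{battery, sunscreen} -> {sandals}"
} else {
  "{battery} -> {sunscreen}"
}
cat("Higher-confidence rule:", more_interesting, "\n")
```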

4. List all the candidate rules that can be formed from the statistics. Which rules are considered interesting at the minimum confidence 0.25? Out of these interesting rules, which rule is considered the most useful (that is, least coincidental)?

Support:

{battery} appears in 6,000 transactions, so support(battery) = 6000/10000 = 0.6.
{sunscreen} appears in 5,000 transactions, so support(sunscreen) = 5000/10000 = 0.5.
{sandals} appears in 4,000 transactions, so support(sandals) = 4000/10000 = 0.4.
{bowls} appears in 2,000 transactions, so support(bowls) = 2000/10000 = 0.2.
{battery,sunscreen} appears in 1,500 transactions, so support(battery, sunscreen) = 1500/10000 = 0.15.
{battery,sandals} appears in 1,000 transactions, so support(battery, sandals) = 1000/10000 = 0.1.
{battery,bowls} appears in 250 transactions, so support(battery, bowls) = 250/10000 = 0.025.
{battery,sunscreen,sandals} appears in 600 transactions, so support(battery, sunscreen, sandals) = 600/10000 = 0.06.

Confidence:

Confidence(x→y)=support(x,y)/support(x)

Confidence(battery → sunscreen) = support(battery, sunscreen) / support(battery) = 0.15/0.6 = 0.25
Confidence(sunscreen → battery) = support(battery, sunscreen) / support(sunscreen) = 0.15/0.5 = 0.3
Confidence(battery → sandals) = support(battery, sandals) / support(battery) = 0.1/0.6 ≈ 0.17
Confidence(sandals → battery) = support(battery, sandals) / support(sandals) = 0.1/0.4 = 0.25
Confidence(battery → bowls) = support(battery, bowls) / support(battery) = 0.025/0.6 ≈ 0.042
Confidence(bowls → battery) = support(battery, bowls) / support(bowls) = 0.025/0.2 = 0.125
Confidence({battery} → {sunscreen, sandals}) = support(battery, sunscreen, sandals) / support(battery) = 0.06/0.6 = 0.1
Confidence({sunscreen} → {battery, sandals}) = support(battery, sunscreen, sandals) / support(sunscreen) = 0.06/0.5 = 0.12
Confidence({battery, sandals} → {sunscreen}) = support(battery, sunscreen, sandals) / support(battery, sandals) = 0.06/0.1 = 0.6
Confidence({sandals} → {battery, sunscreen}) = support(battery, sunscreen, sandals) / support(sandals) = 0.06/0.4 = 0.15
Confidence({battery, sunscreen} → {sandals}) = support(battery, sunscreen, sandals) / support(battery, sunscreen) = 0.06/0.15 = 0.4

Rules that need the support of {sunscreen, sandals}, such as {sunscreen, sandals} → {battery}, cannot be evaluated because that support is not given in the statistics.
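Enumerating the candidate rules is mechanical: every non-empty proper subset of a multi-item itemset can serve as the antecedent, with the remaining items as the consequent. The base-R sketch below does this for the given statistics. It is illustrative only: `counts`, `key()`, and `candidate_rules` are hypothetical names, itemsets are keyed by their alphabetically sorted items, and an antecedent is skipped when its support is not among the given statistics.

```r
# Counts from the exercise, keyed by item names sorted alphabetically and joined by ","
counts <- c(
  "battery" = 6000, "sunscreen" = 5000, "sandals" = 4000, "bowls" = 2000,
  "battery,sunscreen" = 1500, "battery,sandals" = 1000, "battery,bowls" = 250,
  "battery,sandals,sunscreen" = 600
)
key <- function(items) paste(sort(items), collapse = ",")

rows <- list()
for (itemset in names(counts)) {
  items <- strsplit(itemset, ",")[[1]]
  if (length(items) < 2) next                   # a rule needs at least two items
  for (k in seq_len(length(items) - 1)) {       # antecedent sizes 1 .. |itemset| - 1
    for (lhs in combn(items, k, simplify = FALSE)) {
      lhs_key <- key(lhs)
      if (!lhs_key %in% names(counts)) next     # antecedent support not given
      conf <- counts[[itemset]] / counts[[lhs_key]]
      rows[[length(rows) + 1]] <- data.frame(
        lhs = lhs_key, rhs = key(setdiff(items, lhs)), confidence = conf,
        stringsAsFactors = FALSE)
    }
  }
}
candidate_rules <- do.call(rbind, rows)
print(candidate_rules, digits = 3)
```

Only {sandals, sunscreen} is skipped as an antecedent here, which leaves the same eleven rules listed above.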

With a minimum confidence of 0.25, the interesting rules are:

Confidence(battery → sunscreen) = support(battery, sunscreen) / support(battery) = 0.15/0.6 = 0.25 [a customer who buys a battery has a 25% chance of also buying sunscreen.]

Confidence(sunscreen → battery) = support(battery, sunscreen) / support(sunscreen) = 0.15/0.5 = 0.3 [a customer who buys sunscreen has a 30% chance of also buying a battery.]

Confidence(sandals → battery) = support(battery, sandals) / support(sandals) = 0.1/0.4 = 0.25 [a customer who buys sandals has a 25% chance of also buying a battery.]

Confidence({battery, sandals} → {sunscreen}) = support(battery, sunscreen, sandals) / support(battery, sandals) = 0.06/0.1 = 0.6 [a customer who buys a battery and sandals together has a 60% chance of also buying sunscreen.]

Confidence({battery, sunscreen} → {sandals}) = support(battery, sunscreen, sandals) / support(battery, sunscreen) = 0.06/0.15 = 0.4 [a customer who buys a battery and sunscreen together has a 40% chance of also buying sandals.]
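Applying the 0.25 threshold is a simple filter over the candidate rules. In this base-R sketch (the data frame and column names are illustrative), each rule carries count(X, Y) and count(X) from the statistics. Dividing raw counts gives exactly 0.25 for the two boundary rules, so the `>=` comparison does not depend on floating-point round-off.

```r
# Each candidate rule with count(X, Y) and count(X), taken from the exercise statistics
rules <- data.frame(
  rule = c("battery -> sunscreen", "sunscreen -> battery",
           "battery -> sandals", "sandals -> battery",
           "battery -> bowls", "bowls -> battery",
           "battery -> sunscreen, sandals", "sunscreen -> battery, sandals",
           "sandals -> battery, sunscreen", "battery, sandals -> sunscreen",
           "battery, sunscreen -> sandals"),
  count_xy = c(1500, 1500, 1000, 1000, 250, 250, 600, 600, 600, 600, 600),
  count_x  = c(6000, 5000, 6000, 4000, 6000, 2000, 6000, 5000, 4000, 1000, 1500),
  stringsAsFactors = FALSE
)
# confidence(X -> Y) = count(X, Y) / count(X), the same ratio as the supports
rules$confidence <- rules$count_xy / rules$count_x

min_confidence <- 0.25
interesting <- rules[rules$confidence >= min_confidence, ]
print(interesting[, c("rule", "confidence")], digits = 3)
```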

Lift

Lift(x → y) = support(x, y) / {support(x) * support(y)}

Lift(battery → sunscreen) = support(battery, sunscreen) / {support(battery) * support(sunscreen)} = 0.15/(0.6*0.5) = 0.5
Lift(sandals → battery) = support(battery, sandals) / {support(battery) * support(sandals)} = 0.1/(0.6*0.4) ≈ 0.42
Lift({battery, sunscreen} → {sandals}) = support(battery, sunscreen, sandals) / {support(battery, sunscreen) * support(sandals)} = 0.06/(0.15*0.4) = 1
Lift({battery, sandals} → {sunscreen}) = support(battery, sunscreen, sandals) / {support(battery, sandals) * support(sunscreen)} = 0.06/(0.1*0.5) = 1.2
Lift is symmetric in x and y, so Lift(sunscreen → battery) is also 0.5.

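Therefore, {battery, sandals} → {sunscreen} has a stronger association than the others: it is the only rule with a lift greater than 1. A lift of exactly 1, as for {battery, sunscreen} → {sandals}, means the items occur together exactly as often as they would if they were independent, and a lift below 1 means they occur together less often than expected.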
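The lift values can be reproduced with a short base-R sketch. Assumptions: the data frame below is illustrative and holds count(X), count(Y), and count(X, Y) for the five rules that passed the 0.25 confidence threshold. Lift is computed from raw counts, which is algebraically the same as the support formula above.

```r
n_total <- 10000
# count(X), count(Y) and count(X, Y) for the rules that met the confidence threshold
rules <- data.frame(
  rule     = c("battery -> sunscreen", "sunscreen -> battery", "sandals -> battery",
               "battery, sandals -> sunscreen", "battery, sunscreen -> sandals"),
  count_x  = c(6000, 5000, 4000, 1000, 1500),
  count_y  = c(5000, 6000, 6000, 5000, 4000),
  count_xy = c(1500, 1500, 1000,  600,  600),
  stringsAsFactors = FALSE
)
# lift(X -> Y) = support(X, Y) / (support(X) * support(Y))
#              = N * count(X, Y) / (count(X) * count(Y))
rules$lift <- n_total * rules$count_xy / (rules$count_x * rules$count_y)
print(rules[order(-rules$lift), c("rule", "lift")], digits = 3)
```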

Leverage

Leverage(x → y)=support(x, y)-{support(x)*support(y)}

Leverage(battery → sunscreen) = support(battery, sunscreen) - {support(battery) * support(sunscreen)} = 0.15 - (0.6*0.5) = -0.15
Leverage(sandals → battery) = support(battery, sandals) - {support(battery) * support(sandals)} = 0.1 - (0.6*0.4) = -0.14
Leverage(battery → bowls) = support(battery, bowls) - {support(battery) * support(bowls)} = 0.025 - (0.6*0.2) = -0.095
Leverage({battery, sunscreen} → {sandals}) = support(battery, sunscreen, sandals) - {support(battery, sunscreen) * support(sandals)} = 0.06 - (0.15*0.4) = 0
Leverage({battery, sandals} → {sunscreen}) = support(battery, sunscreen, sandals) - {support(battery, sandals) * support(sunscreen)} = 0.06 - (0.1*0.5) = 0.01

The leverage values again confirm that {battery, sandals} → {sunscreen} has a stronger association than the others: it is the only rule with positive leverage.
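Leverage can be checked the same way in base R. This sketch (illustrative names) covers the five rules listed above and rewrites the formula over raw counts, leverage = (N * count(X, Y) - count(X) * count(Y)) / N^2, so every numerator is an exact integer and the zero for {battery, sunscreen} → {sandals} comes out exactly.

```r
n_total <- 10000
rules <- data.frame(
  rule     = c("battery -> sunscreen", "sandals -> battery", "battery -> bowls",
               "battery, sunscreen -> sandals", "battery, sandals -> sunscreen"),
  count_x  = c(6000, 4000, 6000, 1500, 1000),
  count_y  = c(5000, 6000, 2000, 4000, 5000),
  count_xy = c(1500, 1000,  250,  600,  600),
  stringsAsFactors = FALSE
)
# leverage(X -> Y) = support(X, Y) - support(X) * support(Y)
#                  = (N * count(X, Y) - count(X) * count(Y)) / N^2
rules$leverage <- (n_total * rules$count_xy - rules$count_x * rules$count_y) / n_total^2
print(rules[order(-rules$leverage), c("rule", "leverage")], digits = 3)
```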

So, based on both lift and leverage, {battery, sandals} → {sunscreen} is the most useful (least coincidental) rule. It also has the highest confidence (0.6) of the interesting rules.

Important Notes: Confidence can identify trustworthy rules, but it cannot tell whether a rule is coincidental. A high-confidence rule can be misleading because confidence does not consider the support of the itemset in the rule consequent. Measures such as lift and leverage not only help identify interesting rules but also filter out coincidental ones.

-----------------------*************------------------------------

Given the following 10 grocery store transactions, use appropriate association rule thresholds to find a few interesting rules both by hand and by using R.

1. beer, diapers

2. soda, potato chips, hamburger meat, milk, eggs

3. coffee, eggs

4. beer, bread, cheese, ham

5. diapers, beer, potato chips

6. cheese, ham, beer

7. ham, cheese, bread, coffee, milk

8. soda, cheese, bread, ham

9. coffee, hamburger meat

10. eggs, diapers, beer
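For the "by hand" part, the counting can be sketched in base R without any packages. This is an illustrative sketch: the list `transactions` and the helper `support()` are names introduced here, and multi-word items such as "potato chips" are kept as single items.

```r
# The ten grocery transactions, with multi-word items kept whole
transactions <- list(
  c("beer", "diapers"),
  c("soda", "potato chips", "hamburger meat", "milk", "eggs"),
  c("coffee", "eggs"),
  c("beer", "bread", "cheese", "ham"),
  c("diapers", "beer", "potato chips"),
  c("cheese", "ham", "beer"),
  c("ham", "cheese", "bread", "coffee", "milk"),
  c("soda", "cheese", "bread", "ham"),
  c("coffee", "hamburger meat"),
  c("eggs", "diapers", "beer")
)

# Support of an itemset = fraction of transactions that contain every item in it
support <- function(items) {
  mean(vapply(transactions, function(t) all(items %in% t), logical(1)))
}

# Single-item supports, most frequent first
items <- sort(unique(unlist(transactions)))
item_support <- sort(vapply(items, support, numeric(1)), decreasing = TRUE)
print(item_support)

# Confidence and lift of one rule, here {cheese} -> {ham}
lhs <- "cheese"; rhs <- "ham"
conf <- support(c(lhs, rhs)) / support(lhs)
lift <- conf / support(rhs)
cat(sprintf("{%s} -> {%s}: support %.1f, confidence %.2f, lift %.2f\n",
            lhs, rhs, support(c(lhs, rhs)), conf, lift))
```

Swapping other items into `lhs` and `rhs` gives the remaining hand calculations.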

R Code:

library('arules')
library('arulesViz')

# one basket (comma-separated items) per transaction; "potato chips" and
# "hamburger meat" are entered as two items each, which does not change the
# rules found here because each of those four items has support 0.2 < 0.3
purchases <- c("beer,diapers", "soda,potato,chips,hamburger,meat,milk,eggs",
               "coffee,eggs", "beer,bread,cheese,ham", "diapers,beer,potato,chips",
               "cheese,ham,beer", "ham,cheese,bread,coffee,milk",
               "soda,cheese,bread,ham", "coffee,hamburger,meat", "eggs,diapers,beer")
# write to a basket file (one transaction per line)
data <- paste(purchases, sep="\n")
write(data, file = "purchases")
# read transactions from the "purchases" basket file
trans <- read.transactions("purchases", format = "basket", sep=",")
inspect(trans)
summary(trans)

# with the default target ("rules") and confidence (0.8), these calls return
# rules of the given length; use target = "frequent itemsets" to list the itemsets
items2 <- apriori(trans, parameter=list(minlen=2, maxlen=2, support=0.3))
summary(items2)
inspect(sort(items2, by ="support"))
items3 <- apriori(trans, parameter=list(minlen=3, maxlen=3, support=0.3))
summary(items3)
inspect(sort(items3, by ="support"))
items4 <- apriori(trans, parameter=list(minlen=4, maxlen=4, support=0.3))
summary(items4)

# all rules with support >= 0.3 (default confidence 0.8)
rules <- apriori(trans, parameter=list(minlen=2, support=0.3))
summary(rules)
inspect(rules)
# lower the confidence threshold to 0.3
rules <- apriori(trans, parameter=list(minlen=2, support=0.3, confidence=0.3, target = "rules"))
summary(rules)
inspect(rules)
plot(rules)
plot(rules@quality)

confidentRules <- rules[quality(rules)$confidence > 0.3]
inspect(confidentRules)
plot(confidentRules, method="matrix", control=list(reorder=TRUE))
inspect(head(sort(rules, by="lift"), 10))
highConfidenceRules <- head(sort(rules, by="confidence"), 5)
plot(highConfidenceRules, method="graph", control=list(type="items"))
highLiftRules <- head(sort(rules, by="lift"), 5)
plot(highLiftRules, method="graph", control=list(type="items"))
# plot parallel coordinates of the candidate rules
plot(rules, method="paracoord", control=list(reorder=TRUE))
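To see roughly what apriori() does internally, here is a minimal base-R sketch of the level-wise search for frequent itemsets at minimum support 0.3. It is not the arules implementation: it uses a simplified join that unions any two frequent (k-1)-itemsets, followed by the usual Apriori pruning step (every (k-1)-subset of a candidate must be frequent). The helper names (`support`, `is_frequent`, `key`) and the small tolerance `eps` are assumptions of this sketch.

```r
# The ten transactions; multi-word items such as "potato chips" are kept whole
transactions <- list(
  c("beer", "diapers"),
  c("soda", "potato chips", "hamburger meat", "milk", "eggs"),
  c("coffee", "eggs"),
  c("beer", "bread", "cheese", "ham"),
  c("diapers", "beer", "potato chips"),
  c("cheese", "ham", "beer"),
  c("ham", "cheese", "bread", "coffee", "milk"),
  c("soda", "cheese", "bread", "ham"),
  c("coffee", "hamburger meat"),
  c("eggs", "diapers", "beer")
)
min_support <- 0.3
eps <- 1e-9  # tolerance so a support of exactly 0.3 is not lost to round-off

# Fraction of transactions that contain every item in `items`
support <- function(items) {
  mean(vapply(transactions, function(t) all(items %in% t), logical(1)))
}
is_frequent <- function(items) support(items) >= min_support - eps
key <- function(items) paste(items, collapse = ",")

# Level 1: frequent single items
level <- Filter(is_frequent, as.list(sort(unique(unlist(transactions)))))
frequent <- level
k <- 1
while (length(level) > 1) {
  k <- k + 1
  # Join: union two frequent (k-1)-itemsets and keep the result if it has k items
  candidates <- list()
  for (pair in combn(length(level), 2, simplify = FALSE)) {
    cand <- sort(union(level[[pair[1]]], level[[pair[2]]]))
    if (length(cand) == k) candidates[[key(cand)]] <- cand
  }
  # Prune: every (k-1)-subset of a candidate must itself be frequent
  level_keys <- vapply(level, key, character(1))
  candidates <- Filter(function(cand) all(combn(cand, k - 1, FUN = key) %in% level_keys),
                       candidates)
  # Count: keep the candidates that meet the minimum support
  level <- Filter(is_frequent, candidates)
  frequent <- c(frequent, level)
}

result <- data.frame(
  itemset = unname(vapply(frequent, key, character(1))),
  support = unname(vapply(frequent, support, numeric(1))),
  stringsAsFactors = FALSE
)
print(result)
```

On these transactions the search finds 7 frequent items, 4 frequent pairs, and the single frequent triple {bread, cheese, ham}, then stops. This is consistent with apriori() reporting 7 items after recoding, checking subsets of size 1 2 3, and finding no 4-item rules.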

Console Output

# HW6: Extra Exercise
# CS816 Big Data Analytics
#################
# Extra Exercise
#################
library('arules')

## Loading required package: Matrix
##
## Attaching package: 'arules'
##
## The following objects are masked from 'package:base':
##
##     %in%, abbreviate, write

library('arulesViz')

## Loading required package: grid
##
## Attaching package: 'arulesViz'
##
## The following object is masked from 'package:arules':
##
##     abbreviate
##
## The following object is masked from 'package:base':
##
##     abbreviate

## create the dataset file using basket format
purchases <- c("beer,diapers", "soda,potato,chips,hamburger,meat,milk,eggs",
               "coffee,eggs", "beer,bread,cheese,ham", "diapers,beer,potato,chips",
               "cheese,ham,beer", "ham,cheese,bread,coffee,milk",
               "soda,cheese,bread,ham", "coffee,hamburger,meat", "eggs,diapers,beer")
# write to a basket file
data <- paste(purchases, sep="\n")
write(data, file = "purchases")
# read transactions from the "purchases" basket file
trans <- read.transactions("purchases", format = "basket", sep=",")
inspect(trans)

## items
## 1 {beer,diapers}
## 2 {chips,eggs,hamburger,meat,milk,potato,soda}
## 3 {coffee,eggs}
## 4 {beer,bread,cheese,ham}
## 5 {beer,chips,diapers,potato}
## 6 {beer,cheese,ham}
## 7 {bread,cheese,coffee,ham,milk}
## 8 {bread,cheese,ham,soda}
## 9 {coffee,hamburger,meat}
## 10 {beer,diapers,eggs}

summary(trans)

## transactions as itemMatrix in sparse format with
## 10 rows (elements/itemsets/transactions) and
## 13 columns (items) and a density of 0.2846154
##
## most frequent items:
## beer cheese ham bread coffee (Other)
## 5 4 4 3 3 18
##
## element (itemset/transaction) length distribution:
## sizes
## 2 3 4 5 7
## 2 3 3 1 1
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.0 3.0 3.5 3.7 4.0 7.0
##
## includes extended item information - examples:
## labels
## 1 beer
## 2 bread
## 3 cheese

# apply apriori on the itemsets in the transactions
# frequent 2-itemsets
items2 <- apriori(trans, parameter=list(minlen=2, maxlen=2, support=0.3))

##
## Parameter specification:
## confidence minval smax arem aval originalSupport support minlen maxlen
## 0.8 0.1 1 none FALSE TRUE 0.3 2 2
## target ext
## rules FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## apriori - find association rules with the apriori algorithm
## version 4.21 (2004.05.09) (c) 1996-2004 Christian Borgelt
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[13 item(s), 10 transaction(s)] done [0.00s].
## sorting and recoding items ... [7 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 done [0.00s].
## writing ... [5 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].

summary(items2)

## set of 5 rules
##
## rule length distribution (lhs + rhs):sizes
## 2
## 5
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2 2 2 2 2 2
##
## summary of quality measures:
## support confidence lift
## Min. :0.30 Min. :1 Min. :2.0
## 1st Qu.:0.30 1st Qu.:1 1st Qu.:2.5
## Median :0.30 Median :1 Median :2.5
## Mean :0.34 Mean :1 Mean :2.4
## 3rd Qu.:0.40 3rd Qu.:1 3rd Qu.:2.5
## Max. :0.40 Max. :1 Max. :2.5
##
## mining info:
## data ntransactions support confidence
## trans 10 0.3 0.8

inspect(sort(items2, by ="support"))

## lhs rhs support confidence lift
## 4 {cheese} => {ham} 0.4 1 2.5
## 5 {ham} => {cheese} 0.4 1 2.5
## 1 {diapers} => {beer} 0.3 1 2.0
## 2 {bread} => {cheese} 0.3 1 2.5
## 3 {bread} => {ham} 0.3 1 2.5

# frequent 3-itemsets
items3 <- apriori(trans, parameter=list(minlen=3, maxlen=3, support=0.3))

##
## Parameter specification:
## confidence minval smax arem aval originalSupport support minlen maxlen
## 0.8 0.1 1 none FALSE TRUE 0.3 3 3
## target ext
## rules FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## apriori - find association rules with the apriori algorithm
## version 4.21 (2004.05.09) (c) 1996-2004 Christian Borgelt
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[13 item(s), 10 transaction(s)] done [0.00s].
## sorting and recoding items ... [7 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 done [0.00s].
## writing ... [2 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].

summary(items3)

## set of 2 rules
##
## rule length distribution (lhs + rhs):sizes
## 3
## 2
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3 3 3 3 3 3
##
## summary of quality measures:
## support confidence lift
## Min. :0.3 Min. :1 Min. :2.5
## 1st Qu.:0.3 1st Qu.:1 1st Qu.:2.5
## Median :0.3 Median :1 Median :2.5
## Mean :0.3 Mean :1 Mean :2.5
## 3rd Qu.:0.3 3rd Qu.:1 3rd Qu.:2.5
## Max. :0.3 Max. :1 Max. :2.5
##
## mining info:
## data ntransactions support confidence
## trans 10 0.3 0.8

inspect(sort(items3, by ="support"))

## lhs rhs support confidence lift
## 1 {bread,cheese} => {ham} 0.3 1 2.5
## 2 {bread,ham} => {cheese} 0.3 1 2.5

# frequent 4-itemsets
items4 <- apriori(trans, parameter=list(minlen=4, maxlen=4, support=0.3))

##
## Parameter specification:
## confidence minval smax arem aval originalSupport support minlen maxlen
## 0.8 0.1 1 none FALSE TRUE 0.3 4 4
## target ext
## rules FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## apriori - find association rules with the apriori algorithm
## version 4.21 (2004.05.09) (c) 1996-2004 Christian Borgelt
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[13 item(s), 10 transaction(s)] done [0.00s].
## sorting and recoding items ... [7 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 done [0.00s].
## writing ... [0 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].

summary(items4)

## set of 0 rules

##############################
# Generate and Visualize Rules
##############################
# run Apriori without max (7 rules 100% confidence)
rules <- apriori(trans, parameter=list(minlen=2, support=0.3))

##
## Parameter specification:
## confidence minval smax arem aval originalSupport support minlen maxlen
## 0.8 0.1 1 none FALSE TRUE 0.3 2 10
## target ext
## rules FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## apriori - find association rules with the apriori algorithm
## version 4.21 (2004.05.09) (c) 1996-2004 Christian Borgelt
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[13 item(s), 10 transaction(s)] done [0.00s].
## sorting and recoding items ... [7 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 done [0.00s].
## writing ... [7 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].

summary(rules)

## set of 7 rules
##
## rule length distribution (lhs + rhs):sizes
## 2 3
## 5 2
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.000 2.000 2.000 2.286 2.500 3.000
##
## summary of quality measures:
## support confidence lift
## Min. :0.3000 Min. :1 Min. :2.000
## 1st Qu.:0.3000 1st Qu.:1 1st Qu.:2.500
## Median :0.3000 Median :1 Median :2.500
## Mean :0.3286 Mean :1 Mean :2.429
## 3rd Qu.:0.3500 3rd Qu.:1 3rd Qu.:2.500
## Max. :0.4000 Max. :1 Max. :2.500
##
## mining info:
## data ntransactions support confidence
## trans 10 0.3 0.8

inspect(rules)

## lhs rhs support confidence lift
## 1 {diapers} => {beer} 0.3 1 2.0
## 2 {bread} => {cheese} 0.3 1 2.5
## 3 {bread} => {ham} 0.3 1 2.5
## 4 {cheese} => {ham} 0.4 1 2.5
## 5 {ham} => {cheese} 0.4 1 2.5
## 6 {bread,cheese} => {ham} 0.3 1 2.5
## 7 {bread,ham} => {cheese} 0.3 1 2.5

# (11 rules with 30% confidence)
rules <- apriori(trans, parameter=list(minlen=2, support=0.3, confidence=0.3, target = "rules"))

##
## Parameter specification:
## confidence minval smax arem aval originalSupport support minlen maxlen
## 0.3 0.1 1 none FALSE TRUE 0.3 2 10
## target ext
## rules FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## apriori - find association rules with the apriori algorithm
## version 4.21 (2004.05.09) (c) 1996-2004 Christian Borgelt
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[13 item(s), 10 transaction(s)] done [0.00s].
## sorting and recoding items ... [7 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 done [0.00s].
## writing ... [11 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].

summary(rules)

## set of 11 rules
##
## rule length distribution (lhs + rhs):sizes
## 2 3
## 8 3
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.000 2.000 2.000 2.273 2.500 3.000
##
## summary of quality measures:
## support confidence lift
## Min. :0.3000 Min. :0.6000 Min. :2.000
## 1st Qu.:0.3000 1st Qu.:0.7500 1st Qu.:2.500
## Median :0.3000 Median :1.0000 Median :2.500
## Mean :0.3182 Mean :0.8955 Mean :2.409
## 3rd Qu.:0.3000 3rd Qu.:1.0000 3rd Qu.:2.500
## Max. :0.4000 Max. :1.0000 Max. :2.500
##
## mining info:
## data ntransactions support confidence
## trans 10 0.3 0.3

inspect(rules)

## lhs rhs support confidence lift
## 1 {diapers} => {beer} 0.3 1.00 2.0
## 2 {beer} => {diapers} 0.3 0.60 2.0
## 3 {bread} => {cheese} 0.3 1.00 2.5
## 4 {cheese} => {bread} 0.3 0.75 2.5
## 5 {bread} => {ham} 0.3 1.00 2.5
## 6 {ham} => {bread} 0.3 0.75 2.5
## 7 {cheese} => {ham} 0.4 1.00 2.5
## 8 {ham} => {cheese} 0.4 1.00 2.5
## 9 {bread,cheese} => {ham} 0.3 1.00 2.5
## 10 {bread,ham} => {cheese} 0.3 1.00 2.5
## 11 {cheese,ham} => {bread} 0.3 0.75 2.5
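The 11 rules above can be reproduced from the frequent itemsets with a short base-R sketch. Assumptions: the frequent itemsets are copied from the apriori output rather than recomputed, and because arules' apriori() only generates rules with a single item in the consequent, the sketch does the same. The names `frequent`, `support()`, and `found` are illustrative.

```r
transactions <- list(
  c("beer", "diapers"),
  c("soda", "potato chips", "hamburger meat", "milk", "eggs"),
  c("coffee", "eggs"),
  c("beer", "bread", "cheese", "ham"),
  c("diapers", "beer", "potato chips"),
  c("cheese", "ham", "beer"),
  c("ham", "cheese", "bread", "coffee", "milk"),
  c("soda", "cheese", "bread", "ham"),
  c("coffee", "hamburger meat"),
  c("eggs", "diapers", "beer")
)
support <- function(items) {
  mean(vapply(transactions, function(t) all(items %in% t), logical(1)))
}

# Frequent itemsets with two or more items at minimum support 0.3 (from the apriori run)
frequent <- list(c("beer", "diapers"), c("bread", "cheese"), c("bread", "ham"),
                 c("cheese", "ham"), c("bread", "cheese", "ham"))
min_conf <- 0.3

rows <- list()
for (itemset in frequent) {
  s_xy <- support(itemset)
  for (rhs in itemset) {                   # one item in the consequent, as in arules
    lhs  <- setdiff(itemset, rhs)
    conf <- s_xy / support(lhs)
    if (conf >= min_conf) {
      rows[[length(rows) + 1]] <- data.frame(
        lhs = paste(lhs, collapse = ","), rhs = rhs, support = s_xy,
        confidence = conf, lift = conf / support(rhs), stringsAsFactors = FALSE)
    }
  }
}
found <- do.call(rbind, rows)
print(found, digits = 3)
```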

# visualization of the selected rules
plot(rules)

plot(rules@quality)

# 11 rules matrix
confidentRules <- rules[quality(rules)$confidence > 0.3]
inspect(confidentRules)

## lhs rhs support confidence lift
## 1 {diapers} => {beer} 0.3 1.00 2.0
## 2 {beer} => {diapers} 0.3 0.60 2.0
## 3 {bread} => {cheese} 0.3 1.00 2.5
## 4 {cheese} => {bread} 0.3 0.75 2.5
## 5 {bread} => {ham} 0.3 1.00 2.5
## 6 {ham} => {bread} 0.3 0.75 2.5
## 7 {cheese} => {ham} 0.4 1.00 2.5
## 8 {ham} => {cheese} 0.4 1.00 2.5
## 9 {bread,cheese} => {ham} 0.3 1.00 2.5
## 10 {bread,ham} => {cheese} 0.3 1.00 2.5
## 11 {cheese,ham} => {bread} 0.3 0.75 2.5

plot(confidentRules, method="matrix", control=list(reorder=TRUE))

## Itemsets in Antecedent (LHS)
## [1] "{cheese}" "{cheese,ham}" "{ham}" "{bread,ham}"
## [5] "{bread}" "{bread,cheese}" "{diapers}" "{beer}"
## Itemsets in Consequent (RHS)
## [1] "{cheese}" "{beer}" "{diapers}" "{bread}" "{ham}"

# displays rules with top lift scores
inspect(head(sort(rules, by="lift"), 10))

## lhs rhs support confidence lift
## 3 {bread} => {cheese} 0.3 1.00 2.5
## 4 {cheese} => {bread} 0.3 0.75 2.5
## 5 {bread} => {ham} 0.3 1.00 2.5
## 6 {ham} => {bread} 0.3 0.75 2.5
## 7 {cheese} => {ham} 0.4 1.00 2.5
## 8 {ham} => {cheese} 0.4 1.00 2.5
## 9 {bread,cheese} => {ham} 0.3 1.00 2.5
## 10 {bread,ham} => {cheese} 0.3 1.00 2.5
## 11 {cheese,ham} => {bread} 0.3 0.75 2.5
## 1 {diapers} => {beer} 0.3 1.00 2.0

# graph the 5 rules with the highest CONFIDENCE
highConfidenceRules <- head(sort(rules, by="confidence"), 5)
plot(highConfidenceRules, method="graph", control=list(type="items"))

# graph the 5 rules with the highest LIFT
highLiftRules <- head(sort(rules, by="lift"), 5)
plot(highLiftRules, method="graph", control=list(type="items"))

# plot parallel coordinates of the candidate rules
plot(rules, method="paracoord", control=list(reorder=TRUE))

# references
# http://www.rdatamining.com/examples/association-rules
# http://statistical-research.com/data-frames-and-transactions/