35
1 Weka Tutorial 5 - Association © 2009 – Mark Polczynski Weka Tutorial 5 – Association Technology Forge www.technologyforge.net Version 0.1 ?

1Weka Tutorial 5 - Association © 2009 – Mark Polczynski Weka Tutorial 5 – Association Technology Forge Version 0.1 ?

Embed Size (px)

Citation preview

Page 1: 1Weka Tutorial 5 - Association © 2009 – Mark Polczynski Weka Tutorial 5 – Association Technology Forge  Version 0.1 ?

1Weka Tutorial 5 - Association© 2009 – Mark Polczynski

Weka Tutorial 5 –

Association

TechnologyForge

www.technologyforge.net

Version 0.1

?

Page 2: 1Weka Tutorial 5 - Association © 2009 – Mark Polczynski Weka Tutorial 5 – Association Technology Forge  Version 0.1 ?

Association

Market Basket Analysis

2Weka Tutorial 5 - Association

Got Beer?

Got Diapers?

What things tend to “go together”

in your market basket?

Page 3: 1Weka Tutorial 5 - Association © 2009 – Mark Polczynski Weka Tutorial 5 – Association Technology Forge  Version 0.1 ?

3Weka Tutorial 5 - Association

Example of Market Basket

Analysis

People who bought beer also usually bought diapers.

People who bought diapers also usually bought beer.

Page 4: 1Weka Tutorial 5 - Association © 2009 – Mark Polczynski Weka Tutorial 5 – Association Technology Forge  Version 0.1 ?

4Weka Tutorial 5 - Association

Cash Register Checkout Data:•Items in shopping cart•Time of day•Day of week•Season of year•Method of payment•Location of store•etc…

Fact-Based Marketing Strategies:•Floor plans•Discounts•Coupons•etc…

Associations?

What can you use associations for?

Page 5: 1Weka Tutorial 5 - Association © 2009 – Mark Polczynski Weka Tutorial 5 – Association Technology Forge  Version 0.1 ?

5Weka Tutorial 5 - Association

Then to play, or not to play, that is the question!

If the weather outside is:

Page 6: 1Weka Tutorial 5 - Association © 2009 – Mark Polczynski Weka Tutorial 5 – Association Technology Forge  Version 0.1 ?

6Weka Tutorial 5 - Association

TPONTPNom.xls dataset:

Outlook Temp. Humidity Windy Playsunny hot high false nosunny hot high true no

overcast hot high false yesrainy mild high false yesrainy cool normal false yesrainy cool normal true no

overcast cool normal true yessunny mild high false nosunny cool normal false yesrainy mild normal false yessunny mild normal true yes

overcast mild high true yesovercast hot normal false yes

rainy mild high true no

Page 7: 1Weka Tutorial 5 - Association © 2009 – Mark Polczynski Weka Tutorial 5 – Association Technology Forge  Version 0.1 ?

7Weka Tutorial 5 - Association

Attribute values

and counts

Outlook Temp. Humidity Windy Playsunny hot high false nosunny hot high true no

overcast hot high false yesrainy mild high false yesrainy cool normal false yesrainy cool normal true no

overcast cool normal true yessunny mild high false nosunny cool normal false yesrainy mild normal false yessunny mild normal true yes

overcast mild high true yesovercast hot normal false yes

rainy mild high true no

Outlook Temp. Humidity Windy Playsunny = 5 hot = 4 high = 7 true = 6 yes = 9

overcast = 4 mild = 6 normal = 7 false = 8 no = 5rainy = 5 cool = 4

Page 8: 1Weka Tutorial 5 - Association © 2009 – Mark Polczynski Weka Tutorial 5 – Association Technology Forge  Version 0.1 ?

8Weka Tutorial 5 - Association

Classification: Input attributes Class

Goal: Predict whether the person will play, given weather conditions.

Outlook Temp. Humidity Windy Playsunny hot high false nosunny hot high true no

overcast hot high false yesrainy mild high false yesrainy cool normal false yesrainy cool normal true no

overcast cool normal true yessunny mild high false nosunny cool normal false yesrainy mild normal false yessunny mild normal true yes

overcast mild high true yesovercast hot normal false yes

rainy mild high true no

Page 9: 1Weka Tutorial 5 - Association © 2009 – Mark Polczynski Weka Tutorial 5 – Association Technology Forge  Version 0.1 ?

Outlook Temp. Humidity Windy Playsunny hot high false nosunny hot high true no

overcast hot high false yesrainy mild high false yesrainy cool normal false yesrainy cool normal true no

overcast cool normal true yessunny mild high false nosunny cool normal false yesrainy mild normal false yessunny mild normal true yes

overcast mild high true yesovercast hot normal false yes

rainy mild high true no

9Weka Tutorial 5 - Association

Some things that “go together”: Outlook = sunny AND Temp. = hot go together 2 times, Outlook = sunny AND Temp. = cool go together only once.

For association, we look for things

that “go together”

Page 10: 1Weka Tutorial 5 - Association © 2009 – Mark Polczynski Weka Tutorial 5 – Association Technology Forge  Version 0.1 ?

10Weka Tutorial 5 - Association

1

34

5 2

Outlook: sunny overcast rainy

Temp: hot mild cool

Humidity: high normal

Windy: true false

Two-item sets of attributes that can

go together

Play: yes no

1-22-33-44-55-11-32-43-54-15-2 10node-to-node

links

Page 11: 1Weka Tutorial 5 - Association © 2009 – Mark Polczynski Weka Tutorial 5 – Association Technology Forge  Version 0.1 ?

11Weka Tutorial 5 - Association

1

34

5 2

Outlook: sunny overcast rainy

Temp: hot mild cool

Humidity: high normal

Windy: true false

sunny/hotsunny/mildsunny/coolovercast/hotovercast/mildovercast/coolrainy/hotrainy/mildrainy/cool

Combinations of Outlook and Temperature that

can go together

Play: yes no

3 x 3 = 9

Page 12: 1Weka Tutorial 5 - Association © 2009 – Mark Polczynski Weka Tutorial 5 – Association Technology Forge  Version 0.1 ?

12Weka Tutorial 5 - Association

1

34

5 2

Play: 2

All possibletwo-item sets

1-2: 3x3 = 92-3: 3x2 = 63-4: 2x2 = 44-5: 2x2 = 45-1: 2x3 = 61-3: 3x2 = 62-4: 3x2 = 63-5: 2x2 = 44-1: 2x3 = 65-2: 2x3 = 6 57

Outlook: 3

Temp: 3

Windy: 2 Humidity: 2

node-to-node links

combinations for link

Page 13: 1Weka Tutorial 5 - Association © 2009 – Mark Polczynski Weka Tutorial 5 – Association Technology Forge  Version 0.1 ?

Outlook Temp. Humid. Windy Play

sunn

y

over

cast

rain

y

hot

mild

cool

high

norm

al

true

fals

e

yes

no

5 4 5 4 6 4 7 7 6 8 9 5Oulook sunny 5 2 2 1 3 2 2 3 2 3

overcast 4 2 1 1 2 2 2 2 4 0rainy 5 0 3 2 2 3 2 3 3 2

Temp. hot 4 2 2 0 3 1 1 3 2 2mild 6 2 1 3 4 2 3 3 4 2cool 4 1 1 2 0 4 2 2 3 1

Humid. high 7 3 2 2 3 4 0 3 4 3 4normal 7 2 2 3 1 2 4 3 4 6 1

Windy yes 6 2 2 2 1 3 2 3 3 3 3no 8 3 2 3 3 3 2 4 4 6 2

Play yes 9 2 4 3 2 4 3 3 6 3 6no 5 3 0 2 2 2 1 4 1 3 2

13Weka Tutorial 5 - Association

Occurrencesof each pair

Page 14: 1Weka Tutorial 5 - Association © 2009 – Mark Polczynski Weka Tutorial 5 – Association Technology Forge  Version 0.1 ?

Outlook Temp. Humid. Windy Play

sunn

y

over

cast

rain

y

hot

mild

cool

high

norm

al

true

fals

e

yes

no

5 4 5 4 6 4 7 7 6 8 9 5Oulook sunny 5 2 2 1 3 2 2 3 2 3

overcast 4 2 1 1 2 2 2 2 4 0rainy 5 0 3 2 2 3 2 3 3 2

Temp. hot 4 3 1 1 3 2 2mild 6 4 2 3 3 4 2cool 4 0 4 2 2 3 1

Humid. high 7 3 4 3 4normal 7 3 4 6 1

Windy yes 6 3 3no 8 6 2

Play yes 9no 5

14Weka Tutorial 5 - Association

Temperature = cool and Humidity = normal occurs more often than

Temperature = cool and Humidity = high

Humidity = normal and Play = yesoccurs more often than

Humidity = normal and Play = no

Page 15: 1Weka Tutorial 5 - Association © 2009 – Mark Polczynski Weka Tutorial 5 – Association Technology Forge  Version 0.1 ?

Outlook Temp. Humid. Windy Play

sunn

y

over

cast

rain

y

hot

mild

cool

high

norm

al

true

fals

e

yes

no

5 4 5 4 6 4 7 7 6 8 9 5Oulook sunny 5 2 2 1 3 2 2 3 2 3

overcast 4 2 1 1 2 2 2 2 4 0rainy 5 0 3 2 2 3 2 3 3 2

Temp. hot 4 3 1 1 3 2 2mild 6 4 2 3 3 4 2cool 4 0 4 2 2 3 1

Humid. high 7 3 4 3 4normal 7 3 4 6 1

Windy yes 6 3 3no 8 6 2

Play yes 9no 5

15Weka Tutorial 5 - Association

“Support” is the number of instances

where a particular rule is correct.

Temp. = cool AND Humid. = normal

occurs 4 times in the dataset,

so the support is 4.

Page 16: 1Weka Tutorial 5 - Association © 2009 – Mark Polczynski Weka Tutorial 5 – Association Technology Forge  Version 0.1 ?

Outlook Temp. Humid. Windy Play

sunn

y

over

cast

rain

y

hot

mild

cool

high

norm

al

true

fals

e

yes

no

5 4 5 4 6 4 7 7 6 8 9 5Oulook sunny 5 2 2 1 3 2 2 3 2 3

overcast 4 2 1 1 2 2 2 2 4 0rainy 5 0 3 2 2 3 2 3 3 2

Temp. hot 4 3 1 1 3 2 2mild 6 4 2 3 3 4 2cool 4 0 4 2 2 3 1

Humid. high 7 3 4 3 4normal 7 3 4 6 1

Windy yes 6 3 3no 8 6 2

Play yes 9no 5

16Weka Tutorial 5 - Association

Here are the 47 two-item sets

that have support => 2.

Combinations with highest coverage:Humidity = normal AND Play = yes

Windy = no and Play = yes

Combinations with highest support:Humidity = normal AND Play = yes

Windy = no and Play = yes

Page 17: 1Weka Tutorial 5 - Association © 2009 – Mark Polczynski Weka Tutorial 5 – Association Technology Forge  Version 0.1 ?

Outlook Temp. Humid. Windy Play

sunn

y

over

cast

rain

y

hot

mild

cool

high

norm

al

true

fals

e

yes

no

5 4 5 4 6 4 7 7 6 8 9 5Oulook sunny 5 2 2 1 3 2 2 3 2 3

overcast 4 2 1 1 2 2 2 2 4 0rainy 5 0 3 2 2 3 2 3 3 2

Temp. hot 4 3 1 1 3 2 2mild 6 4 2 3 3 4 2cool 4 0 4 2 2 3 1

Humid. high 7 3 4 3 4normal 7 3 4 6 1

Windy yes 6 3 3no 8 6 2

Play yes 9no 5

17Weka Tutorial 5 - Association

“Confidence” of the association rule

“If Humidity = normal, then Play =

yes”

is 6/7, or 86%

CombinationHumidity = normal AND Play = yes

occurs 6 times.

Number of occurrences of Humidity = normal

is 7.

Page 18: 1Weka Tutorial 5 - Association © 2009 – Mark Polczynski Weka Tutorial 5 – Association Technology Forge  Version 0.1 ?

Outlook Temp. Humid. Windy Play

sunn

y

over

cast

rain

y

hot

mild

cool

high

norm

al

true

fals

e

yes

no

5 4 5 4 6 4 7 7 6 8 9 5Oulook sunny 5 2 2 1 3 2 2 3 2 3

overcast 4 2 1 1 2 2 2 2 4 0rainy 5 0 3 2 2 3 2 3 3 2

Temp. hot 4 3 1 1 3 2 2mild 6 4 2 3 3 4 2cool 4 0 4 2 2 3 1

Humid. high 7 3 4 3 4normal 7 3 4 6 1

Windy yes 6 3 3no 8 6 2

Play yes 9no 5

18Weka Tutorial 5 - Association

Confidence of rule

If Play = yes, then Humidity = normal

is 6/9, or 67%

CombinationHumidity = normal AND Play = yes

occurs 6 times.

Number of occurrences of Play = yes

is 9.

Page 19: 1Weka Tutorial 5 - Association © 2009 – Mark Polczynski Weka Tutorial 5 – Association Technology Forge  Version 0.1 ?

19Weka Tutorial 5 - Association

THENOutlook Temp. Humid. Windy Play

sunn

y

over

cast

rain

y

hot

mild

cool

high

norm

al

true

fals

e

yes

no

IF 5 4 5 4 6 4 7 7 6 8 9 5Oulook sunny 5 0.40 0.40 0.20 0.60 0.40 0.40 0.60 0.40 0.60

overcast 4 0.50 0.25 0.25 0.50 0.50 0.50 0.50 1.00 0.00rainy 5 0.00 0.60 0.40 0.40 0.60 0.40 0.60 0.60 0.40

Temp. hot 4 0.50 0.50 0.00 0.75 0.25 0.25 0.75 0.50 0.50mild 6 0.33 0.17 0.50 0.67 0.33 0.50 0.50 0.67 0.33cool 4 0.25 0.25 0.50 0.00 1.00 0.50 0.50 0.75 0.25

Humid. high 7 0.43 0.29 0.29 0.43 0.57 0.00 0.43 0.57 0.43 0.57normal 7 0.29 0.29 0.43 0.14 0.29 0.57 0.43 0.57 0.86 0.14

Windy yes 6 0.33 0.33 0.33 0.17 0.50 0.33 0.50 0.50 0.50 0.50no 8 0.38 0.25 0.38 0.38 0.38 0.25 0.50 0.50 0.75 0.25

Play yes 9 0.22 0.44 0.33 0.22 0.44 0.33 0.33 0.67 0.33 0.67no 5 0.60 0.00 0.40 0.40 0.40 0.20 0.80 0.20 0.60 0.40

If Humidity = normal, then Play = yes

If Play = yes,then Humidity = normal

The two-item set of Humidity and Play produced two different rules.

Page 20: 1Weka Tutorial 5 - Association © 2009 – Mark Polczynski Weka Tutorial 5 – Association Technology Forge  Version 0.1 ?

20Weka Tutorial 5 - Association

THENOutlook Temp. Humid. Windy Play

sunn

y

over

cast

rain

y

hot

mild

cool

high

norm

al

true

fals

e

yes

no

IF 5 4 5 4 6 4 7 7 6 8 9 5Oulook sunny 5 0.40 0.40 0.20 0.60 0.40 0.40 0.60 0.40 0.60

overcast 4 0.50 0.25 0.25 0.50 0.50 0.50 0.50 1.00 0.00rainy 5 0.00 0.60 0.40 0.40 0.60 0.40 0.60 0.60 0.40

Temp. hot 4 0.50 0.50 0.00 0.75 0.25 0.25 0.75 0.50 0.50mild 6 0.33 0.17 0.50 0.67 0.33 0.50 0.50 0.67 0.33cool 4 0.25 0.25 0.50 0.00 1.00 0.50 0.50 0.75 0.25

Humid. high 7 0.43 0.29 0.29 0.43 0.57 0.00 0.43 0.57 0.43 0.57normal 7 0.29 0.29 0.43 0.14 0.29 0.57 0.43 0.57 0.86 0.14

Windy yes 6 0.33 0.33 0.33 0.17 0.50 0.33 0.50 0.50 0.50 0.50no 8 0.38 0.25 0.38 0.38 0.38 0.25 0.50 0.50 0.75 0.25

Play yes 9 0.22 0.44 0.33 0.22 0.44 0.33 0.33 0.67 0.33 0.67no 5 0.60 0.00 0.40 0.40 0.40 0.20 0.80 0.20 0.60 0.40

Outlook Temp. Humid. Windy Play

sunn

y

over

cast

rain

y

hot

mild

cool

high

norm

al

true

fals

e

yes

no

5 4 5 4 6 4 7 7 6 8 9 5Oulook sunny 5 2 2 1 3 2 2 3 2 3

overcast 4 2 1 1 2 2 2 2 4 0rainy 5 0 3 2 2 3 2 3 3 2

Temp. hot 4 2 2 0 3 1 1 3 2 2mild 6 2 1 3 4 2 3 3 4 2cool 4 1 1 2 0 4 2 2 3 1

Humid. high 7 3 2 2 3 4 0 3 4 3 4normal 7 2 2 3 1 2 4 3 4 6 1

Windy yes 6 2 2 2 1 3 2 3 3 3 3no 8 3 2 3 3 3 2 4 4 6 2

Play yes 9 2 4 3 2 4 3 3 6 3 6no 5 3 0 2 2 2 1 4 1 3 2

Two-item set coverage

Two-item set accuracy

For thresholds set at: accuracy = 100% coverage = >2

Two-item set association rules:If Temperature = cool, then Humidity = normalIf Outlook = overcast, than Play = yes

Page 21: 1Weka Tutorial 5 - Association © 2009 – Mark Polczynski Weka Tutorial 5 – Association Technology Forge  Version 0.1 ?

Weka Tutorial 5 - Association 21

Dataset Association Rule Generator

Association Rules

Support Confidence

Predictive apriori association algorithm:

•Optimally combines support and confidence into predictive accuracy,

•Requires only that user specify number of rules generated.

Optimumthresholds?Optimum

thresholds?

Page 22: 1Weka Tutorial 5 - Association © 2009 – Mark Polczynski Weka Tutorial 5 – Association Technology Forge  Version 0.1 ?

22Weka Tutorial 5 - Association

Load TPONTPNom.arff

dataset

Page 23: 1Weka Tutorial 5 - Association © 2009 – Mark Polczynski Weka Tutorial 5 – Association Technology Forge  Version 0.1 ?

23Weka Tutorial 5 - Association

Show Outlook relative to Play

Show Outlook relative to Play

Page 24: 1Weka Tutorial 5 - Association © 2009 – Mark Polczynski Weka Tutorial 5 – Association Technology Forge  Version 0.1 ?

24Weka Tutorial 5 - Association

Choose PredictiveApriori

algorithm

Page 25: 1Weka Tutorial 5 - Association © 2009 – Mark Polczynski Weka Tutorial 5 – Association Technology Forge  Version 0.1 ?

25Weka Tutorial 5 - Association

Best 100 rules for TPONTP dataset

Page 26: 1Weka Tutorial 5 - Association © 2009 – Mark Polczynski Weka Tutorial 5 – Association Technology Forge  Version 0.1 ?

26Weka Tutorial 5 - Association

Occurrences of Outlook = overcast

Support for Outlook = overcast

ANDPlay = yes

Confidence of If Humidity = normal then

Play = yesis 6/7 = 86%

Confidence ofIf Outlook = overcast,

then Play = yesis 4/4 = 100%

Predictive accuracy = 95.6%

Page 27: 1Weka Tutorial 5 - Association © 2009 – Mark Polczynski Weka Tutorial 5 – Association Technology Forge  Version 0.1 ?

27Weka Tutorial 5 - Association

Page 28: 1Weka Tutorial 5 - Association © 2009 – Mark Polczynski Weka Tutorial 5 – Association Technology Forge  Version 0.1 ?

28Weka Tutorial 5 - Association

Input attributes Class

Outlook Temp. Humidity Windy Playsunny hot high false nosunny hot high true no

overcast hot high false yesrainy mild high false yesrainy cool normal false yesrainy cool normal true no

overcast cool normal true yessunny mild high false nosunny cool normal false yesrainy mild normal false yessunny mild normal true yes

overcast mild high true yesovercast hot normal false yes

rainy mild high true no

Association rule:If Humidity = normalAND Windy = false,then Play = yes

Classification Goal: Predict whether the person will play, given weather conditions.

Question: Is there a linkage between classification and association?

Page 29: 1Weka Tutorial 5 - Association © 2009 – Mark Polczynski Weka Tutorial 5 – Association Technology Forge  Version 0.1 ?

29Weka Tutorial 5 - Association

Mine only the association rules that have the designated class attribute as the antecedent.

Keep the number of rules at 100.

Page 30: 1Weka Tutorial 5 - Association © 2009 – Mark Polczynski Weka Tutorial 5 – Association Technology Forge  Version 0.1 ?

30Weka Tutorial 5 - Association

Page 31: 1Weka Tutorial 5 - Association © 2009 – Mark Polczynski Weka Tutorial 5 – Association Technology Forge  Version 0.1 ?

31Weka Tutorial 5 - Association

Page 32: 1Weka Tutorial 5 - Association © 2009 – Mark Polczynski Weka Tutorial 5 – Association Technology Forge  Version 0.1 ?

32Weka Tutorial 5 - Association

What’s the difference between classification and association?

•In one sense, association is like classification, except that instead of considering just one attribute as the class attribute, it considers every attribute and every combination of attributes as a class, and then performs a form of classification on all possible classes.

•Unlike classification, association does not necessarily consider all attributes when attempting to create rules, but operates on sub-set combinations of attributes.

•Association only works on nominal attributes, not numerical attributes, although numerical attributes can be discretized.

Page 33: 1Weka Tutorial 5 - Association © 2009 – Mark Polczynski Weka Tutorial 5 – Association Technology Forge  Version 0.1 ?

33Weka Tutorial 5 - Association

Topic of the next Weka tutorial

The Weka Experimenter Environment supports efficient automation of algorithm testing.

Page 34: 1Weka Tutorial 5 - Association © 2009 – Mark Polczynski Weka Tutorial 5 – Association Technology Forge  Version 0.1 ?

34Weka Tutorial 5 - Association

Weka Documentation:

Page 35: 1Weka Tutorial 5 - Association © 2009 – Mark Polczynski Weka Tutorial 5 – Association Technology Forge  Version 0.1 ?

Weka Tutorial 5 - Association 35

Contact the Author:

Mark Polczynski, PhDThe Technology [email protected]