Instructor : Prof. Marina Gavrilova

Goal

The goal of this presentation is to discuss in detail how data mining methods are used in market analysis.

Outline of Presentation

- Motivation based on types of learning (supervised/unsupervised)
- Market Basket Analysis
- Association Rule Algorithms
  - More abstract problem redux
  - Breadth-first search
  - Depth-first search
- Summary

What to Learn/Discover?

- Statistical summaries
- Generators
- Density estimation
- Patterns/rules
- Associations
- Clusters/groups
- Exceptions/outliers
- Changes in patterns over time or location

Market Basket Analysis

Consider a shopping cart filled with several items. Market basket analysis tries to answer the following questions:

- Who makes purchases?
- What do customers buy together?
- In what order do customers purchase items?

Market Basket Analysis

Given: a database of customer transactions, where each transaction is a set of items.

Example: the transaction with TID 111 contains items {Pen, Ink, Milk, Juice}.

TID  CID  Date    Item   Qty
111  201  5/1/99  Pen    2
111  201  5/1/99  Ink    1
111  201  5/1/99  Milk   3
111  201  5/1/99  Juice  6
112  105  6/3/99  Pen    1
112  105  6/3/99  Ink    1
112  105  6/3/99  Milk   1
113  106  6/5/99  Pen    1
113  106  6/5/99  Milk   1
114  201  7/1/99  Pen    2
114  201  7/1/99  Ink    2
114  201  7/1/99  Juice  4
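As a minimal sketch of "each transaction is a set of items", the rows of the table can be grouped by TID. The names `ROWS` and `to_transactions` are illustrative assumptions, not part of the original slides.

```python
from collections import defaultdict

# Rows copied from the example table: (TID, CID, date, item, quantity).
ROWS = [
    (111, 201, "5/1/99", "Pen", 2),
    (111, 201, "5/1/99", "Ink", 1),
    (111, 201, "5/1/99", "Milk", 3),
    (111, 201, "5/1/99", "Juice", 6),
    (112, 105, "6/3/99", "Pen", 1),
    (112, 105, "6/3/99", "Ink", 1),
    (112, 105, "6/3/99", "Milk", 1),
    (113, 106, "6/5/99", "Pen", 1),
    (113, 106, "6/5/99", "Milk", 1),
    (114, 201, "7/1/99", "Pen", 2),
    (114, 201, "7/1/99", "Ink", 2),
    (114, 201, "7/1/99", "Juice", 4),
]


def to_transactions(rows):
    """Group purchase rows by TID; each transaction becomes a set of items."""
    baskets = defaultdict(set)
    for tid, _cid, _date, item, _qty in rows:
        baskets[tid].add(item)
    return dict(baskets)


transactions = to_transactions(ROWS)
print(sorted(transactions[111]))  # ['Ink', 'Juice', 'Milk', 'Pen']
```

Quantities, customer IDs and dates are ignored here because the itemset view only records which items occur together.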

Market Basket Analysis (Contd.)

- Co-occurrences: 80% of all customers purchase items X, Y and Z together.
- Association rules: 60% of all customers who purchase X and Y also buy Z.
- Sequential patterns: 60% of customers who first buy X also purchase Y within three weeks.

Example: Face recognition for vending machine product recommendation

Confidence and Support

We prune the set of all possible association rules using two interestingness measures:

- Support of a rule: X => Y has support s if P(X and Y) = s (X and Y are purchased together).
- Confidence of a rule: X => Y has confidence c if P(Y | X) = c (Y is purchased given that X is purchased).
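The two measures can be estimated from transaction data as relative frequencies. The following is a minimal sketch; the function names `support` and `confidence` are assumptions for illustration, and `TRANSACTIONS` reproduces the four example transactions (TIDs 111 to 114).

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(lhs, rhs, transactions):
    """Estimate P(rhs | lhs) as support(lhs and rhs) / support(lhs).

    Assumes `lhs` occurs in at least one transaction (otherwise this divides by zero).
    """
    both = set(lhs) | set(rhs)
    return support(both, transactions) / support(lhs, transactions)


# The four transactions (TIDs 111-114) from the example table.
TRANSACTIONS = [
    {"Pen", "Ink", "Milk", "Juice"},
    {"Pen", "Ink", "Milk"},
    {"Pen", "Milk"},
    {"Pen", "Ink", "Juice"},
]

print(support({"Pen", "Milk"}, TRANSACTIONS))       # 0.75
print(confidence({"Pen"}, {"Milk"}, TRANSACTIONS))  # 0.75
```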

Examples

{Pen} => {Milk}: Support 75%, Confidence 75%

{Ink} => {Pen}: Support 75%, Confidence 100% (Ink appears in three of the four transactions, and each of them also contains Pen)

Example

Find all itemsets with support >= 75%.
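One way to answer this question on a dataset this small is to enumerate every non-empty itemset and keep those that meet the threshold. This is a brute-force sketch (exponential in the number of items), not an efficient algorithm; `frequent_itemsets` is a hypothetical helper name.

```python
from itertools import combinations

# The four transactions (TIDs 111-114) from the example table.
TRANSACTIONS = [
    {"Pen", "Ink", "Milk", "Juice"},
    {"Pen", "Ink", "Milk"},
    {"Pen", "Milk"},
    {"Pen", "Ink", "Juice"},
]


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every non-empty itemset with support >= min_support."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            count = sum(1 for t in transactions if set(candidate) <= t)
            if count / n >= min_support:
                result[frozenset(candidate)] = count / n
    return result


for itemset, s in frequent_itemsets(TRANSACTIONS, 0.75).items():
    print(sorted(itemset), s)
```

On the example data this reports {Pen} at 100% and {Ink}, {Milk}, {Ink, Pen} and {Milk, Pen} at 75%.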

Example

Find all association rules with support >= 50%.
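A rule X => Y can be generated by splitting each sufficiently supported itemset into a non-empty left-hand side and right-hand side. The sketch below does this by brute force; `association_rules` and the optional `min_confidence` parameter are assumptions added for illustration (the exercise itself only constrains support).

```python
from itertools import combinations

# The four transactions (TIDs 111-114) from the example table.
TRANSACTIONS = [
    {"Pen", "Ink", "Milk", "Juice"},
    {"Pen", "Ink", "Milk"},
    {"Pen", "Milk"},
    {"Pen", "Ink", "Juice"},
]


def count(itemset, transactions):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def association_rules(transactions, min_support, min_confidence=0.0):
    """Return (lhs, rhs, support, confidence) for every rule lhs => rhs that qualifies."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    rules = []
    for size in range(2, len(items) + 1):
        for combo in combinations(items, size):
            whole = frozenset(combo)
            whole_count = count(whole, transactions)
            if whole_count == 0 or whole_count / n < min_support:
                continue
            # Split the itemset into every non-empty lhs / rhs pair.
            for k in range(1, size):
                for lhs in combinations(combo, k):
                    lhs = frozenset(lhs)
                    conf = whole_count / count(lhs, transactions)
                    if conf >= min_confidence:
                        rules.append((lhs, whole - lhs, whole_count / n, conf))
    return rules


for lhs, rhs, s, c in association_rules(TRANSACTIONS, 0.5):
    print(sorted(lhs), "=>", sorted(rhs), "support", s, "confidence", round(c, 2))
```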

Market Basket Analysis: Applications

Sample applications:

- Direct marketing
- Fraud detection for medical insurance
- Floor/shelf planning
- Web site layout
- Cross-selling

Applications of Frequent Itemsets

- Market basket analysis
- Association rules
- Classification (especially: text, rare classes)
- Seeds for construction of Bayesian networks
- Web log analysis
- Collaborative filtering

Association Rule Algorithms

- Abstract problem redux
- Breadth-first search
- Depth-first search

Problem Redux

Abstract:

- A set of items {1, 2, …, k}
- A database of transactions (itemsets) D = {T1, T2, …, Tn}, where each Tj is a subset of {1, 2, …, k}
- GOAL: Find all itemsets that appear in at least x transactions (“appear in” means “are subsets of”). If I is a subset of T, we say that T supports I.
- For an itemset I, the number of transactions it appears in is called the support of I.
- x is called the minimum support.

Concrete:

- I = {milk, bread, cheese, …}
- D = { {milk, bread, cheese}, {bread, cheese, juice}, … }
- GOAL: Find all itemsets that appear in at least 1000 transactions.
- {milk, bread, cheese} supports {milk, bread}.

Problem Redux (Cont.)

Definitions:

- An itemset is frequent if it is a subset of at least x transactions. The set of all frequent itemsets is denoted FI.
- An itemset is maximally frequent if it is frequent and does not have a frequent proper superset. The set of all maximally frequent itemsets is denoted MFI.

GOAL: Given x, find all frequent (maximally frequent) itemsets, to be stored in FI (MFI).

Obvious relationship: MFI is a subset of FI.

Example:

D = { {1,2,3}, {1,2,3}, {1,2,3}, {1,2,4} }, minimum support x = 3

- {1,2} is frequent; Support({1,2}) = 4
- {1,2,3} is maximally frequent
- All maximal frequent itemsets: {1,2,3}
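The example can be checked with a short brute-force sketch. The helper names `support_count`, `frequent` and `maximal` are assumptions for illustration; the empty itemset, which is trivially frequent, is left out.

```python
from itertools import combinations

# The example database: three copies of {1,2,3} and one {1,2,4}.
D = [{1, 2, 3}, {1, 2, 3}, {1, 2, 3}, {1, 2, 4}]


def support_count(itemset, database):
    """Number of transactions that support (contain) `itemset`."""
    return sum(1 for t in database if itemset <= t)


def frequent(database, x):
    """All non-empty itemsets that appear in at least x transactions (FI)."""
    items = sorted(set().union(*database))
    return {
        frozenset(c)
        for size in range(1, len(items) + 1)
        for c in combinations(items, size)
        if support_count(frozenset(c), database) >= x
    }


def maximal(fi):
    """Frequent itemsets with no frequent proper superset (MFI)."""
    return {i for i in fi if not any(i < j for j in fi)}


FI = frequent(D, 3)
MFI = maximal(FI)
print(support_count(frozenset({1, 2}), D))  # 4
print([sorted(m) for m in MFI])             # [[1, 2, 3]]
```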

The Itemset Lattice

{}
{1} {2} {3} {4}
{1,2} {1,3} {1,4} {2,3} {2,4} {3,4}
{1,2,3} {1,2,4} {1,3,4} {2,3,4}
{1,2,3,4}

Frequent Itemsets

[Figure: the itemset lattice over {1,2,3,4}, with the frequent and the infrequent itemsets marked.]

Breadth First Search: 1-Itemsets

[Figure: the itemset lattice over {1,2,3,4} while the 1-itemsets are examined. Legend: infrequent, frequent, currently examined, don’t know.]

Breadth First Search: 2-Itemsets

[Figure: the itemset lattice over {1,2,3,4} while the 2-itemsets are examined. Legend: infrequent, frequent, currently examined, don’t know.]

Breadth First Search: 3-Itemsets

[Figure: the itemset lattice over {1,2,3,4} while the 3-itemsets are examined. Legend: infrequent, frequent, currently examined, don’t know.]

Breadth First Search: Remarks

- We prune infrequent itemsets and avoid counting them.
- To find an itemset with k items, we need to count all 2^k subsets.
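The level-wise idea can be sketched as follows: examine all k-itemsets before any (k+1)-itemset, and only count a candidate if all of its k-subsets were found frequent. This is a simplified Apriori-style sketch, not an optimized implementation; the name `apriori` and the use of an absolute count `min_count` (the x of the problem statement) are assumptions.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Breadth-first search for itemsets contained in >= min_count transactions.

    Returns {itemset: support count}; the empty itemset is omitted.
    """
    def count(itemset):
        return sum(1 for t in transactions if itemset <= t)

    items = sorted(set().union(*transactions))

    # Level 1: frequent single items.
    level = {}
    for item in items:
        single = frozenset([item])
        c = count(single)
        if c >= min_count:
            level[single] = c

    frequent = {}
    k = 1
    while level:
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        next_level = {}
        for cand in candidates:
            # Prune: an infrequent k-subset makes the candidate infrequent,
            # so it is discarded without being counted.
            if all(frozenset(s) in level for s in combinations(cand, k)):
                c = count(cand)
                if c >= min_count:
                    next_level[cand] = c
        level = next_level
        k += 1
    return frequent


D = [{1, 2, 3}, {1, 2, 3}, {1, 2, 3}, {1, 2, 4}]
for itemset, c in sorted(apriori(D, 3).items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), c)
```

Note that reaching {1,2,3} required counting all of its non-empty subsets first, which is the 2^k cost mentioned in the remarks.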

Depth First Search (1)

[Figure: the itemset lattice over {1,2,3,4} at step 1 of the depth-first search. Legend: infrequent, frequent, currently examined, don’t know.]

Depth First Search (2)

[Figure: the itemset lattice over {1,2,3,4} at step 2 of the depth-first search. Legend: infrequent, frequent, currently examined, don’t know.]

Depth First Search (3)

[Figure: the itemset lattice over {1,2,3,4} at step 3 of the depth-first search. Legend: infrequent, frequent, currently examined, don’t know.]

Depth First Search (4)

[Figure: the itemset lattice over {1,2,3,4} at step 4 of the depth-first search. Legend: infrequent, frequent, currently examined, don’t know.]

Depth First Search (5)

[Figure: the itemset lattice over {1,2,3,4} at step 5 of the depth-first search. Legend: infrequent, frequent, currently examined, don’t know.]

BFS Versus DFS

Breadth First Search:

- Prunes infrequent itemsets
- Uses anti-monotonicity: Every superset of an infrequent itemset is infrequent

Depth First Search:

- Prunes frequent itemsets
- Uses monotonicity: Every subset of a frequent itemset is frequent
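The depth-first side of this comparison can be illustrated with a small sketch that dives to a maximal frequent itemset first and then skips counting any candidate that is a subset of an itemset already known to be frequent. This is a simplified illustration under assumed names (`maximal_frequent_dfs`, `min_count`), not a specific published algorithm.

```python
def maximal_frequent_dfs(transactions, min_count):
    """Depth-first search for the maximal frequent itemsets.

    Returns (maximal itemsets, number of support counts performed).
    """
    items = sorted(set().union(*transactions))
    maximal = []
    counted = 0

    def count(itemset):
        nonlocal counted
        counted += 1
        return sum(1 for t in transactions if itemset <= t)

    def dfs(current, start):
        has_frequent_extension = False
        for idx in range(start, len(items)):
            candidate = current | {items[idx]}
            # Monotonicity: a subset of a known frequent itemset is frequent,
            # so its support does not need to be counted.
            known_frequent = any(candidate <= m for m in maximal)
            if known_frequent or count(candidate) >= min_count:
                has_frequent_extension = True
                dfs(candidate, idx + 1)
        # A frequent itemset with no frequent extension is maximal unless it is
        # already covered by a maximal itemset found on an earlier branch.
        if current and not has_frequent_extension:
            if not any(current <= m for m in maximal):
                maximal.append(current)

    dfs(frozenset(), 0)
    return maximal, counted


D = [{1, 2, 3}, {1, 2, 3}, {1, 2, 3}, {1, 2, 4}]
mfi, n_counts = maximal_frequent_dfs(D, 3)
print([sorted(m) for m in mfi])  # [[1, 2, 3]]
print(n_counts)
```

Once {1,2,3} has been found, itemsets such as {1,3}, {2}, {2,3} and {3} are accepted as frequent without another pass over the data.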

Extensions

- Imposing constraints
  - Only find rules involving the dairy department
  - Only find rules involving expensive products
  - Only find “expensive” rules
  - Only find rules with “whiskey” on the right hand side
  - Only find rules with “milk” on the left hand side
- Hierarchies on the items
- Calendars (every Sunday, every 1st of the month)

Itemset Constraints

Definition: A constraint is an arbitrary property of itemsets.

Examples:

- The itemset has support greater than 1000.
- No element of the itemset costs more than $40.
- The items in the set average more than $20.

Goal: Find all itemsets satisfying a given constraint P.

“Solution”: If P is a support constraint, use the Apriori Algorithm.

Two Trivial Observations

- Apriori can be applied to any anti-monotone constraint P.
  - Start from the empty set.
  - Prune supersets of sets that do not satisfy P.
- The itemset lattice is a Boolean algebra, so Apriori also applies to a monotone constraint Q.
  - Start from the set of all items instead of the empty set.
  - Prune subsets of sets that do not satisfy Q.
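Both observations can be sketched with one level-wise routine. `bottom_up` assumes its predicate P is anti-monotone (whenever a set satisfies P, so do all its subsets); `top_down` assumes Q is monotone and reuses `bottom_up` on complements, which is one way to read the Boolean-algebra remark. The function names, the `PRICES` table and the $50 threshold are hypothetical.

```python
from itertools import combinations


def bottom_up(items, P):
    """All itemsets satisfying an anti-monotone predicate P, starting from {}."""
    items = sorted(items)
    satisfied = []
    level = [frozenset()] if P(frozenset()) else []
    while level:
        satisfied.extend(level)
        prev = set(level)
        k = len(level[0])
        candidates = {s | {i} for s in level for i in items if i not in s}
        # Prune supersets of sets that do not satisfy P: every k-subset of a
        # candidate must have satisfied P before P is evaluated on the candidate.
        level = [
            c for c in candidates
            if all(frozenset(sub) in prev for sub in combinations(c, k)) and P(c)
        ]
    return satisfied


def top_down(items, Q):
    """All itemsets satisfying a monotone predicate Q, starting from the full set."""
    full = frozenset(items)
    # S satisfies Q exactly when its complement satisfies the dual predicate,
    # and the dual of a monotone predicate is anti-monotone.
    complements = bottom_up(full, lambda c: Q(full - c))
    return [full - c for c in complements]


# Hypothetical prices, for illustration only.
PRICES = {"milk": 3, "cheese": 25, "whiskey": 45}

cheap = bottom_up(PRICES, lambda s: all(PRICES[i] <= 40 for i in s))
pricey = top_down(PRICES, lambda s: sum(PRICES[i] for i in s) >= 50)
print([sorted(s) for s in cheap])
print([sorted(s) for s in pricey])
```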

Negative Pruning a Monotone Q

[Figure: the itemset lattice over {1,2,3,4} with {1,2} highlighted. Legend: satisfies Q, doesn’t satisfy Q, currently examined, don’t know.]

Positive Pruning in Apriori

[Figure: the itemset lattice over {1,2,3,4} with {1,2} highlighted. Legend: frequent, infrequent, currently examined, don’t know.]

Positive Pruning in Apriori

[Figure: the itemset lattice over {1,2,3,4} with {2,3} and {1,2} highlighted. Legend: frequent, infrequent, currently examined, don’t know.]

Positive Pruning in Apriori

[Figure: the itemset lattice over {1,2,3,4} with {1,2} highlighted. Legend: frequent, infrequent, currently examined, don’t know.]

The Problem

Current techniques:

- Approximate the difficult constraints.

New goal:

- Given constraints P and Q, with P a support constraint and Q a statistical constraint, find all itemsets that satisfy both P and Q.

Recent solutions:

- Newer algorithms can handle both P and Q.
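For comparison, a naive baseline for the combined goal is sketched below: it prunes the search using only the support constraint P and then checks Q on each surviving itemset. It is not one of the newer algorithms mentioned above, which also exploit Q during the search. The names `frequent_and_q`, `PRICES` and the price threshold are hypothetical.

```python
def frequent_and_q(transactions, min_count, Q):
    """Non-empty itemsets that are frequent (P) and also satisfy Q.

    Only P is used for pruning; Q is applied as a filter.
    """
    items = sorted(set().union(*transactions))
    found = []

    def extend(current, start):
        for idx in range(start, len(items)):
            candidate = current | {items[idx]}
            # P (support) is anti-monotone: an infrequent candidate
            # cannot have a frequent superset, so the branch is cut.
            if sum(1 for t in transactions if candidate <= t) < min_count:
                continue
            if Q(candidate):
                found.append(candidate)
            extend(candidate, idx + 1)

    extend(frozenset(), 0)
    return found


# The four transactions (TIDs 111-114) from the example table.
TRANSACTIONS = [
    {"Pen", "Ink", "Milk", "Juice"},
    {"Pen", "Ink", "Milk"},
    {"Pen", "Milk"},
    {"Pen", "Ink", "Juice"},
]

# Hypothetical prices, for illustration only.
PRICES = {"Pen": 2, "Ink": 5, "Milk": 3, "Juice": 4}


def total_at_least_8(itemset):
    """Example Q: the items in the set cost at least 8 in total."""
    return sum(PRICES[i] for i in itemset) >= 8


for s in frequent_and_q(TRANSACTIONS, 2, total_at_least_8):
    print(sorted(s))
```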

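[Figure: the itemset lattice from {} to D, showing the region that satisfies Q, the region that satisfies P, and their overlap that satisfies both P and Q. All supersets of a set satisfying Q satisfy Q; all subsets of a set satisfying P satisfy P.]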

Applications

- Spatial association rules
- Web mining
- Market basket analysis
- User/customer profiling

Review Questions

- What is supervised and unsupervised learning?
- Is clustering a supervised or an unsupervised type of learning?
- What are association rule algorithms?
- Differentiate, with the help of an example, breadth-first search and depth-first search.

Useful links

- http://www.oracle.com/technology/industries/life_sciences/pdf/ls_sup_unsup_dm.pdf
- http://www.autonlab.org/tutorials/
- http://www.bandmservices.com/Clustering/Clustering.htm
- http://www.cs.sunysb.edu/~skiena/combinatorica/animations/search.html
- http://www.codeproject.com/KB/java/BFSDFS.aspx