
Frequent Itemset Mining (FIM) on Big Data


Frequent itemset mining on big data using MapReduce and the Apriori and Eclat methods.


A LITERATURE SURVEY ON :-

“FREQUENT ITEMSET MINING ON BIG DATA”

By :-

RAJU GUPTA (9028218451)

PURUSHOTAM SINGH


Big Data :-

Big data usually includes data sets with sizes beyond the ability of commonly used software tools to capture, curate, manage, and process the data within a tolerable elapsed time.


Introduction :-

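Frequent Itemset Mining (FIM) :- finding all itemsets (sets of items that occur together in transactions) whose support is at least a user-specified minimum support threshold.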

Support :- The support supp(X) of an itemset X is defined as the proportion of transactions in the data set that contain the itemset.

supp(X) = (no. of transactions that contain the itemset X) / (total no. of transactions)

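Confidence :- The confidence of a rule X → Y is the proportion of the transactions containing X that also contain Y. Here X ∪ Y is the itemset made of the items of both X and Y, so supp(X ∪ Y) counts transactions that contain X and Y together.

conf(X → Y) = supp(X ∪ Y) / supp(X)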
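The two formulas above can be checked on a small example. The following is a minimal Python sketch, assuming transactions are represented as sets of item names; the helper names `supp` and `conf` and the toy data are hypothetical, not taken from the slides.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]


def supp(itemset: set, transactions: List[Transaction]) -> float:
    """Proportion of transactions that contain every item of `itemset`."""
    containing = sum(1 for t in transactions if itemset <= t)
    return containing / len(transactions)


def conf(x: set, y: set, transactions: List[Transaction]) -> float:
    """conf(X -> Y) = supp(X U Y) / supp(X); X U Y is the union of the two itemsets."""
    return supp(x | y, transactions) / supp(x, transactions)


# Hypothetical toy data set (not from the slides).
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]
print(supp({"bread", "milk"}, transactions))    # 0.5 (2 of 4 transactions)
print(conf({"bread"}, {"milk"}, transactions))  # about 0.667 (2 of the 3 bread transactions)
```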


Fig:- Example of support and confidence


Hadoop Framework :- Apache Hadoop is an open-source software framework for storage and large-scale processing of data sets on clusters of commodity hardware. Two of its core components are:

Hadoop Distributed File System (HDFS), which stores the data.

Hadoop MapReduce, which processes it.


Map Reduce :-

Map :-

A mapper processes a part (split) of the input data and emits key-value pairs.

Reduce :-

The intermediate key-value pairs are grouped by key (shuffle and sort) and fed to reducers, each of which processes the values for its keys and produces the output.

MapReduce flow :- Map (key-value pair generation) → Reduce (gives output)
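To make the map, shuffle and reduce steps concrete, here is a single-process Python sketch that counts how many transactions contain each item, which is the first counting pass of Apriori-style FIM. It only simulates the data flow and does not use Hadoop; the functions `mapper`, `shuffle`, `reducer` and `run_mapreduce` are hypothetical names, and each transaction is assumed to list an item at most once.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def mapper(transaction: List[str]) -> Iterator[Tuple[str, int]]:
    # Map: emit (item, 1) for every item of one transaction.
    for item in transaction:
        yield item, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle/sort: group all values that were emitted under the same key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key: str, values: List[int]) -> Tuple[str, int]:
    # Reduce: combine all values of one key into a single output record.
    return key, sum(values)


def run_mapreduce(transactions: List[List[str]]) -> Dict[str, int]:
    mapped = chain.from_iterable(mapper(t) for t in transactions)
    return dict(reducer(k, v) for k, v in sorted(shuffle(mapped).items()))


data = [["a", "b"], ["a", "c"], ["a", "b", "c"]]
print(run_mapreduce(data))  # {'a': 3, 'b': 2, 'c': 2}
```

On a real cluster the mappers would run on different input splits and the framework, not the program, would perform the shuffle.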


EXAMPLE 1


EXAMPLE 2


MAPREDUCE AND ITS ALGORITHMS :-

• MapReduce is a programming model and an associated implementation for processing and generating large data sets with a parallel, distributed algorithm on a cluster.

• Single Pass Counting (SPC) uses one MapReduce phase for the candidate-generation and frequency-counting steps of each pass, i.e. one phase per itemset length.
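As a rough illustration of Single Pass Counting, the sketch below runs Apriori level by level and performs one simulated MapReduce counting phase per itemset length. It is a minimal single-machine sketch under stated assumptions (hypothetical names `apriori_gen`, `count_phase` and `spc`; an absolute minimum support count `min_count`), not the implementation from the surveyed work.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

Itemset = FrozenSet[str]


def apriori_gen(frequent: Set[Itemset], k: int) -> Set[Itemset]:
    """Join frequent (k-1)-itemsets; prune candidates with an infrequent (k-1)-subset."""
    candidates: Set[Itemset] = set()
    for a in frequent:
        for b in frequent:
            union = a | b
            if len(union) == k and all(
                frozenset(s) in frequent for s in combinations(union, k - 1)
            ):
                candidates.add(union)
    return candidates


def count_phase(transactions: List[Set[str]], candidates: Set[Itemset]) -> Dict[Itemset, int]:
    """One simulated MapReduce phase: map emits (candidate, 1) for each
    transaction containing it, reduce sums the ones per candidate."""
    counts: Dict[Itemset, int] = {}
    for t in transactions:                          # map side: one transaction at a time
        for c in candidates:
            if c <= t:
                counts[c] = counts.get(c, 0) + 1    # reduce side: summing
    return counts


def spc(transactions: List[Set[str]], min_count: int) -> Set[Itemset]:
    """Level-wise Apriori with exactly one counting phase per itemset length."""
    singletons = {frozenset([i]) for t in transactions for i in t}
    counts = count_phase(transactions, singletons)
    frequent = {c for c, n in counts.items() if n >= min_count}
    result = set(frequent)
    k = 2
    while frequent:
        candidates = apriori_gen(frequent, k)
        counts = count_phase(transactions, candidates)
        frequent = {c for c, n in counts.items() if n >= min_count}
        result |= frequent
        k += 1
    return result


data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
print(sorted(sorted(s) for s in spc(data, 3)))
# [['a'], ['a', 'b'], ['a', 'c'], ['b'], ['b', 'c'], ['c']]
```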


MAPREDUCE (Cont..)

• Fixed Passes Combined Counting (FPC): after p phases, it starts to generate candidates of n different lengths and counts their frequencies in one database scan.

• Dynamic Passes Counting (DPC) is similar to FPC; however, n and p are determined dynamically at each phase by the number of generated candidates.
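The combined-counting idea behind FPC can be sketched as follows: starting from the frequent (k-1)-itemsets, candidates of n consecutive lengths are generated (the longer ones from the still-unverified shorter candidates) and all of them are counted in a single scan. This is a hedged single-machine sketch; `gen_next`, `combined_phase` and the parameters `k`, `n` and `min_count` are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, List, Set

Itemset = FrozenSet[str]


def gen_next(prev: Set[Itemset], length: int) -> Set[Itemset]:
    """Apriori-style join and prune. In FPC, `prev` may hold unverified
    candidates of length - 1 rather than itemsets known to be frequent."""
    out: Set[Itemset] = set()
    for a in prev:
        for b in prev:
            u = a | b
            if len(u) == length and all(
                frozenset(s) in prev for s in combinations(u, length - 1)
            ):
                out.add(u)
    return out


def combined_phase(transactions: List[Set[str]], frequent_prev: Set[Itemset],
                   k: int, n: int, min_count: int) -> Set[Itemset]:
    """Generate candidates of lengths k .. k+n-1 and count them in ONE scan."""
    all_candidates: Set[Itemset] = set()
    prev = frequent_prev                      # frequent (k-1)-itemsets
    for length in range(k, k + n):
        prev = gen_next(prev, length)         # built from unverified candidates
        if not prev:
            break
        all_candidates |= prev
    counts = dict.fromkeys(all_candidates, 0)
    for t in transactions:                    # the single database scan
        for c in all_candidates:
            if c <= t:
                counts[c] += 1
    return {c for c, cnt in counts.items() if cnt >= min_count}


data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
singles = {frozenset([i]) for i in "abc"}
print(sorted(sorted(s) for s in combined_phase(data, singles, k=2, n=2, min_count=3)))
# [['a', 'b'], ['a', 'c'], ['b', 'c']]
```

Counting several lengths in one scan saves MapReduce phases at the cost of counting more candidates, because longer candidates are generated before the shorter ones are known to be frequent. Under DPC, as described above, `n` would instead be chosen at each phase from the number of generated candidates.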


MAPREDUCE (Cont..)

o Parallel FP-Growth (PFP) is a parallel version of the well-known FP-Growth algorithm. PFP groups the items and distributes their conditional databases across the cluster, so that each group can be mined independently.

o The PARMA algorithm finds approximate collections of frequent itemsets.

o Twister improves the performance between MapReduce cycles, and NIMBLE provides better programming tools for data mining jobs.
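The grouping step of PFP can be sketched as below: each frequent item is assigned to a group, and every transaction (sorted in F-list order, i.e. by descending support) contributes, for each group, the prefix that ends at its last item belonging to that group. The round-robin group assignment and the function name are assumptions made for illustration; in PFP each resulting shard would then be mined with FP-Growth, which is not shown.

```python
from collections import defaultdict
from typing import Dict, List


def group_dependent_transactions(
    transactions: List[List[str]],
    f_list: List[str],   # frequent items, sorted by descending support
    num_groups: int,
) -> Dict[int, List[List[str]]]:
    # HYPOTHETICAL round-robin assignment of items to groups.
    group_of = {item: i % num_groups for i, item in enumerate(f_list)}
    rank = {item: i for i, item in enumerate(f_list)}
    shards: Dict[int, List[List[str]]] = defaultdict(list)
    for t in transactions:
        # Keep only frequent items and order them as in the F-list.
        items = sorted((i for i in t if i in rank), key=rank.__getitem__)
        emitted = set()
        # Walk from the right: the prefix ending at the last item of a group
        # is that group's conditional transaction; emit it once per group.
        for j in range(len(items) - 1, -1, -1):
            gid = group_of[items[j]]
            if gid not in emitted:
                shards[gid].append(items[: j + 1])
                emitted.add(gid)
    return dict(shards)


shards = group_dependent_transactions([["d", "a", "c"], ["b", "a"]], ["a", "b", "c", "d"], 2)
print(shards[0])  # [['a', 'c'], ['a']]
print(shards[1])  # [['a', 'c', 'd'], ['a', 'b']]
```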


Search space distribution :-

Distributing the search space is the main challenge in adapting FIM algorithms to the MapReduce framework.

The tasks must be defined at start-up.

Prefix tree :-

o Tree structure in which each path represents an itemset.

o Divided into independent groups (subtrees).

o Eclat traverses the tree in a depth-first (DFS) manner to find the frequent itemsets (FIs).

Running time of Eclat.
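A minimal Eclat sketch in Python, assuming the vertical (tidset) representation: each node of the prefix tree stores the set of ids of the transactions containing its itemset, and the tree is explored depth-first by intersecting tidsets. Each first-level call of `dfs` explores one independent subtree, which is what lets the search space be split into groups. The names and the absolute `min_count` threshold are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple


def eclat(transactions: List[Set[str]], min_count: int) -> Dict[FrozenSet[str], int]:
    # Vertical layout: item -> set of transaction ids (tidset).
    tidsets: Dict[str, Set[int]] = {}
    for tid, t in enumerate(transactions):
        for item in t:
            tidsets.setdefault(item, set()).add(tid)
    result: Dict[FrozenSet[str], int] = {}

    def dfs(prefix: FrozenSet[str], extensions: List[Tuple[str, Set[int]]]) -> None:
        # Each (item, tidset) pair is a child of `prefix` in the prefix tree.
        for i, (item, tids) in enumerate(extensions):
            itemset = prefix | {item}
            result[itemset] = len(tids)          # support = size of the tidset
            # Children: intersect with the tidsets of the siblings to the right.
            children = []
            for other, other_tids in extensions[i + 1:]:
                common = tids & other_tids
                if len(common) >= min_count:
                    children.append((other, common))
            dfs(itemset, children)               # depth-first descent

    roots = sorted(
        ((item, tids) for item, tids in tidsets.items() if len(tids) >= min_count),
        key=lambda pair: pair[0],
    )
    dfs(frozenset(), roots)
    return result


data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
for itemset, count in sorted(eclat(data, 3).items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), count)
```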


Search space distribution (cont..) :-

To estimate the computation time of a subtree, consider :-

o Total no. of items.

o Order of frequency of items.

o Total frequency of items.

Balanced partitioning of the prefix tree, so that each worker receives subtrees with a similar estimated computation time.
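One way to act on these estimates is a greedy balanced partitioning: compute a cost for each first-level subtree and repeatedly give the most expensive remaining subtree to the least-loaded worker. The cost formula in `estimate_cost` is a hypothetical combination of the three factors listed above, not a model given in the slides, and the largest-first greedy rule is a standard load-balancing heuristic swapped in for illustration.

```python
import heapq
from typing import Dict, List


def estimate_cost(rank: int, num_items: int, frequency: int) -> float:
    # HYPOTHETICAL cost model: with items in ascending frequency order, the
    # subtree rooted at position `rank` can be extended by the items after it,
    # so (num_items - rank) bounds its width; frequency scales the tidset work.
    return float(frequency * (num_items - rank))


def balanced_partition(item_freq: Dict[str, int], num_workers: int) -> List[List[str]]:
    # Order items by ascending frequency; each item roots one independent subtree.
    order = sorted(item_freq, key=lambda i: (item_freq[i], i))
    n = len(order)
    costs = {item: estimate_cost(r, n, item_freq[item]) for r, item in enumerate(order)}
    # Greedy largest-first: the most expensive remaining subtree goes to the
    # worker with the smallest current load (min-heap of (load, worker id)).
    heap = [(0.0, w) for w in range(num_workers)]
    parts: List[List[str]] = [[] for _ in range(num_workers)]
    for item in sorted(order, key=lambda i: -costs[i]):
        load, w = heapq.heappop(heap)
        parts[w].append(item)
        heapq.heappush(heap, (load + costs[item], w))
    return parts


print(balanced_partition({"a": 5, "b": 4, "c": 3, "d": 2}, 2))
```

With these toy frequencies the two workers end up with estimated loads of 14 and 16, instead of the imbalance a naive alphabetical split could produce.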
