Empirical Study on Mining Association Rules Using Population Based Stochastic Search Algorithms
K. Indira
Under the guidance of Dr. S. Kanmani, Professor, Department of Information Technology, Pondicherry Engineering College


ORGANIZATION
Objectives, Introduction, Motivation, Empirical Study, Conclusion, Publications, References

INDIAN INSTITUTE OF TECHNOLOGY, ROORKEE

OBJECTIVES
To develop a novel hybrid methodology for mining association rules both effectively and efficiently using population-based search methods, namely the Genetic Algorithm (GA) and Particle Swarm Optimization (PSO).

INTRODUCTION

DATA MINING
Extraction of interesting information or patterns from data in large databases is known as data mining.

ASSOCIATION RULE MINING
Association rule mining finds interesting associations and/or correlation relationships among large sets of data items.

ASSOCIATION RULES
Association rules are of the form X → Y, with two control parameters: support and confidence.

Support, s: the probability that a transaction contains X ∪ Y.
Confidence, c: the conditional probability that a transaction containing X also contains Y.

Tid   Items bought
10    Milk, Nuts, Sugar
20    Milk, Coffee, Sugar
30    Milk, Sugar, Eggs
40    Nuts, Eggs, Bread
50    Nuts, Coffee, Sugar, Eggs, Bread

Let minsup = 50% and minconf = 50%.
Frequent patterns: Milk:3, Nuts:3, Sugar:4, Eggs:3, {Milk, Sugar}:3

Association rules:
Milk → Sugar (support 60%, confidence 100%)
Sugar → Milk (support 60%, confidence 75%)

MOTIVATION
EXISTING SYSTEM
Apriori, FP-Growth and Eclat are some of the popular algorithms for mining association rules. Their drawbacks:
They traverse the database many times, with high I/O overhead and computational complexity.
They cannot meet the requirements of large-scale database mining.
Their intermediate structures may not fit in memory and are expensive to build.

EVOLUTIONARY ALGORITHMS
Applicable to problems where no (good) conventional method is available: discontinuities, non-linear constraints, multi-modalities, implicitly defined models (if-then-else constructs).
Most suitable for problems where multiple solutions are required.
Parallel implementation is easier.
Evolutionary algorithms provide a robust and efficient approach to exploring large search spaces.
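The support and confidence figures above can be checked with a short computation over the toy transaction table (a minimal sketch; the function names are illustrative, not from the thesis):

```python
# Support and confidence over the toy transaction table above.
transactions = [
    {"Milk", "Nuts", "Sugar"},
    {"Milk", "Coffee", "Sugar"},
    {"Milk", "Sugar", "Eggs"},
    {"Nuts", "Eggs", "Bread"},
    {"Nuts", "Coffee", "Sugar", "Eggs", "Bread"},
]

def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """P(consequent | antecedent) over the transaction table."""
    return support(set(antecedent) | set(consequent)) / support(antecedent)
```

For example, confidence({Milk} → {Sugar}) evaluates to 1.0 and confidence({Sugar} → {Milk}) to 0.75, matching the rules on the slide.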

GA AND PSO: AN INTRODUCTION
The Genetic Algorithm (GA) and Particle Swarm Optimization (PSO) are both population-based search methods; in a single iteration they move from one set of points (a population) to another, with likely improvement, using a set of control operators.

GENETIC ALGORITHM
A Genetic Algorithm (GA) is a procedure used to find approximate solutions to search problems through the application of the principles of evolutionary biology.

PARTICLE SWARM OPTIMIZATION
PSO's mechanism is inspired by the social and cooperative behaviour displayed by various species, such as flocking birds and schooling fish, including human beings.

BLOCK DIAGRAM OF RESEARCH MODULES
Association Rule (AR) Mining → Population-Based Evolutionary Methods → Genetic Algorithm (GA) and Particle Swarm Optimization (PSO)
GA branch: Mining association rules using GA; Analyzing the role of control parameters in GA for mining ARs; Mining ARs using Self-Adaptive GA; Elitist GA for association rule mining.
PSO branch: Mining association rules with PSO; Mining ARs with chaotic PSO; Mining ARs with neighborhood selection in PSO; Mining ARs with Self-Adaptive PSO.
Hybrid: GA/PSO (GPSO) for AR mining.

DATASET DESCRIPTION
Datasets: Lenses, Haberman's Survival, Car Evaluation, Post Operative Patient Care, Zoo.

Dataset Name            No. of Instances   No. of Attributes   Attribute characteristics
Lenses                  24                 4                   Categorical
Haberman's Survival     306                3                   Integer
Car Evaluation          1728               6                   Categorical
Post Operative Patient  90                 8                   Categorical, Integer
Zoo                     101                17                  Categorical, Integer

MODULES
Module 1, Module 2, Module 3

MODULE 1: MINING AR USING GA
Methodology

Selection : Tournament

Crossover Probability : Fixed (tested with 3 values)

Mutation Probability : No mutation

Fitness Function :

Dataset : Lenses, Iris, Haberman from the UCI repository

Population : Fixed (tested with 3 values)
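A minimal sketch of the GA loop described above, assuming tournament selection, a fixed crossover probability and no mutation; the one-max bitstring fitness is only a placeholder, since the slides do not spell out the rule-encoding fitness:

```python
import random

random.seed(0)

# GA sketch: tournament selection, fixed crossover probability, no mutation.
# The one-max fitness below is a stand-in for the rule-quality fitness.
POP_SIZE, GENES, PC, GENERATIONS = 20, 10, 0.75, 50

def fitness(ind):
    return sum(ind)  # placeholder: count of 1-bits (one-max)

def tournament(pop, k=2):
    """Pick k random individuals, keep the fittest."""
    return max(random.sample(pop, k), key=fitness)

def crossover(a, b):
    """One-point crossover with probability PC, else copy the first parent."""
    if random.random() < PC:
        cut = random.randrange(1, GENES)
        return a[:cut] + b[cut:]
    return a[:]

pop = [[random.randint(0, 1) for _ in range(GENES)] for _ in range(POP_SIZE)]
for _ in range(GENERATIONS):
    pop = [crossover(tournament(pop), tournament(pop)) for _ in range(POP_SIZE)]

best = max(pop, key=fitness)
```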

FLOWCHART OF GA
Initialize Population → Evaluate Fitness → satisfy constraints? (Yes → Output Results; No → Select Survivors → Crossover → Evaluate Fitness)

RESULT ANALYSIS

Predictive accuracy: population size vs. accuracy.

RESULT ANALYSIS

Predictive accuracy: minimum support and confidence vs. accuracy.

RESULT ANALYSIS
Comparison based on variation in crossover probability (accuracy % / no. of generations):

Dataset    Pc = 0.25   Pc = 0.5    Pc = 0.75
Lenses     95 / 8      95 / 16     95 / 13
Haberman   69 / 77     71 / 83     70 / 80
Iris       84 / 45     86 / 51     87 / 55

Optimum parameter values for the maximum accuracy achieved:

Dataset    Instances   Attributes   Pop. size   Min. support   Min. confidence   Crossover rate   Accuracy %
Lenses     24          4            36          0.2            0.9               0.25             95
Haberman   306         3            306         0.9            0.2               0.5              71
Iris       150         5            225         0.2            0.9               0.75             87

INFERENCES
The values of minimum support, minimum confidence and population size decide the accuracy of the system more than the other GA parameters.
The crossover rate affects the convergence rate rather than the accuracy of the system.

MODULE 1

MINING AR USING SAGA
Methodology

Selection : Roulette Wheel

Crossover Probability : Fixed (tested with 3 values)

Mutation Probability : Self-Adaptive

Fitness Function :

Dataset : Lenses, Iris, Car from the UCI repository

Population : Fixed (tested with 3 values)
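The slides do not give the self-adaptation rule for the mutation probability; one commonly used scheme (after Srinivas and Patnaik's adaptive GA) is sketched below as an illustration, not as the thesis's actual rule:

```python
# Adaptive mutation probability sketch (assumed scheme, not the thesis's own):
# below-average individuals get the full mutation rate k, while individuals
# near the population's best are mutated less, preserving good rules.
def adaptive_pm(f, f_avg, f_max, k=0.5):
    """Mutation probability for an individual with fitness f."""
    if f_max == f_avg:          # degenerate population: mutate everyone equally
        return k
    if f >= f_avg:
        return k * (f_max - f) / (f_max - f_avg)
    return k
```

With this rule the best individual receives mutation probability 0, so elite solutions survive while weak ones keep exploring.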

FLOWCHART OF SAGA
Initialize Population → Evaluate Fitness → satisfy constraints? (Yes → Output Results; No → Select Survivors → Crossover and mutation (self-adaptive) → Evaluate Fitness)

RESULT ANALYSIS
Accuracy comparison between GA and SAGA when the parameters are ideal for traditional GA.
Accuracy comparison between GA, SAGA, and GA with parameters set to the termination values of SAGA.

INFERENCES
Self-Adaptive GA gives better accuracy than traditional GA.

MODULE 1

Methodology

Selection : Elitism with roulette wheel

Crossover Probability : Fixed to Pc

Mutation Probability : Self-Adaptive

Fitness Function : Fitness(x) = con(x) * log(sup(x) * length(x) + 1)

Dataset : Lenses, Iris, Car from the UCI repository

Population : Fixed
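The elitist-GA fitness function above can be written directly as a small helper (the argument names are illustrative):

```python
import math

# Fitness from the slide: Fitness(x) = con(x) * log(sup(x) * length(x) + 1),
# where con and sup are the rule's confidence and support, and length(x) is
# the number of items in the rule. The +1 keeps the logarithm non-negative.
def rule_fitness(confidence, support, length):
    return confidence * math.log(support * length + 1)
```

Longer, well-supported rules score higher, but only in proportion to their confidence: a rule with zero confidence always scores 0.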

GA WITH ELITISM FOR MINING AR

RESULTS ANALYSIS
Predictive accuracy for mining AR based on GA with Elitism:

No. of Iterations   Lenses   Car Evaluation   Haberman
4                   90       94.4             70
8                   91.6     92.8             91.6
10                  90       87.5             75
15                  87.5     90               83.3
20                  91.6     87.5             91.6
25                  87.5     87.5             92.5
30                  83.3     93.75            83.3
50                  90       75               75

Execution time for mining AR based on GA with Elitism.

INFERENCES
Marginally better accuracy is achieved.
The execution time increases rapidly once the global optimum is reached.

MODULE 2: Mining AR using PSO

RELATED WORKS
S.No   Variation                                      Application                 Year
1      Inertia weight and acceleration coefficients   Optimization                2006
2      Global-local best inertia weight               Computing optimal control   2006
3      Local optima chaos                             Parameter optimization      2010
4      Adaptive population size                       Optimization                2009
5      Cellular PSO                                   Function optimization       2011
6      Adaptive inertia weight strategy               Optimization                2011

MINING ARS USING PSO
Methodology

Each data itemset is represented as a particle.

The particles move based on their velocities.

The particles' positions are updated based on the velocity update equation.
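The update equation itself is not reproduced in the slides; the sketch below uses the standard PSO velocity and position update, with cognitive and social coefficients c1 = c2 = 2.0 assumed for illustration:

```python
import random

random.seed(1)

# Standard PSO update step (assumed formulation; the slide omits the equation):
# v <- v + c1*r1*(pbest - x) + c2*r2*(gbest - x),  x <- x + v
C1 = C2 = 2.0

def update_particle(x, v, pbest, gbest):
    """Return the new (position, velocity) of one particle."""
    new_v = [v[d] + C1 * random.random() * (pbest[d] - x[d])
                  + C2 * random.random() * (gbest[d] - x[d])
             for d in range(len(x))]
    new_x = [x[d] + new_v[d] for d in range(len(x))]
    return new_x, new_v
```

Each particle is pulled toward both its personal best (pbest) and the swarm's global best (gbest), with fresh random weights in every dimension.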

FLOWCHART OF PSO
Start → initialize particles with random position and velocity vectors → for each particle's position p, evaluate its fitness → if fitness(p) is better than fitness(pbest), set pbest = p → loop until all particles are processed → set the best of the pbest positions as gbest → update each particle's velocity and position → loop until the maximum iteration count → Stop, giving gbest as the optimal solution.

RESULTS ANALYSIS
Predictive accuracy:

Dataset Name     Traditional GA   Self-Adaptive GA   PSO
Lenses           87.5             91.6               92.8
Haberman         75.5             92.5               91.6
Car Evaluation   85               94.4               95

Execution time and predictive accuracy comparisons.

INFERENCES
PSO produces results as effective as self-adaptive GA.

Computationally, PSO is marginally faster than SAGA.

In PSO only the best particle passes information to the others, and hence the computational capability of PSO is marginally better than that of SAGA.

MODULE 2: Mining AR using PSO

MINING ARS USING CHAOTIC PSO

Methodology
The initial points u0 and v0 are set to 0.1; the new chaotic map model generates the sequence used in updating the velocity of each particle.

RESULT ANALYSIS
Predictive accuracy comparison.
Convergence rate comparison for the Lenses, Car Evaluation and Haberman's Survival datasets.
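The slide does not reproduce the chaotic map equation; as an illustration only, the logistic map below is a commonly used choice for generating the chaotic sequence that replaces the uniform random numbers in the velocity update, seeded with the initial point u0 = 0.1 mentioned in the methodology:

```python
# Chaotic sequence sketch (assumed map, not necessarily the thesis's own):
# the logistic map at mu = 4 is fully chaotic on (0, 1) and is a standard
# source of chaotic numbers for chaotic PSO variants.
def chaotic_sequence(u0=0.1, n=5, mu=4.0):
    seq, u = [], u0
    for _ in range(n):
        u = mu * u * (1.0 - u)   # logistic map iteration
        seq.append(u)
    return seq
```

Changing the initial value u0 yields a completely different sequence, which is what the inference slide below means by altering the chaotic operators through their initial values.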

INFERENCES
Better accuracy than standard PSO.
The chaotic operators can be changed by altering the initial values in the chaotic operator function.
The balance between exploration and exploitation is maintained.

MODULE 2: Mining AR using PSO

MINING ARS USING NPSO
The concept of a local best particle (lbest) replacing the particle's own best (pbest) is introduced.

Methodology
The neighborhood best (lbest) selection is as follows:
Calculate the distance of the current particle from the other particles.
Find the nearest m particles as the neighbors of the current particle, based on the calculated distances.
Choose the local optimum lbest among the neighborhood in terms of fitness values.

INTERESTINGNESS MEASURE
The interestingness measure for a rule is taken from relative confidence and is as follows:

where k is the rule, x the antecedent part of the rule, and y the consequent part of rule k.
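The lbest selection steps listed in the methodology above can be sketched as follows (the helper and parameter names are illustrative):

```python
import math

# Neighborhood (lbest) selection: take the nearest m particles by Euclidean
# distance, then pick the fittest of them. `fitness` maps a position to a score.
def select_lbest(current, particles, fitness, m=3):
    neighbors = sorted((p for p in particles if p != current),
                       key=lambda p: math.dist(current, p))[:m]
    return max(neighbors, key=fitness)
```

Because each particle follows its local neighborhood's best rather than a single global best, the swarm converges more gradually, which is how NPSO avoids premature convergence.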

FLOWCHART OF NPSO
Start → k = 1, initialize xi(k), vi(k) → compute f(xi(k)) → reorder the particles → generate neighborhoods → determine the best particle in the neighborhood of i → compute xi(k+1) and f(xi(k+1)) → update the previous best if necessary → repeat for i = 1..N and k = 1..K → Stop.

RESULT ANALYSIS
Predictive accuracy comparison for dynamic neighborhood selection in PSO.

Measure of interestingness:

Dataset               Interestingness Value
Lenses                0.82
Car Evaluation        0.73
Haberman's Survival   0.80

Execution time comparison for dynamic neighborhood selection in PSO.

CONVERGENCE RATE
Convergence rate (accuracy vs. number of iterations) for the Car Evaluation, Lenses and Haberman's Survival datasets.

INFERENCES
The avoidance of premature convergence at local optima tends to enhance the results.
The selection of local best particles based on neighbors (lbest) rather than the particle's own best (pbest) enhances the accuracy of the rules mined.

MODULE 2: Mining AR using PSO

MINING AR USING SAPSO
An inertia-weight parameter is added to the velocity equation as

where w is the inertia weight.

Three self-adaptive methods for the inertia weight are proposed:

w(g) = w_max - (w_max - w_min) * g / G

w(t+1) = w(t) - (w_max - w_min) / G

where g is the generation index representing the current number of evolutionary generations, and G is a predefined maximum number of generations. Here the maximal and minimal weights w_max and w_min are set to 0.9 and 0.4, based on experimental study.

EFFECT OF CHANGING W
Highest predictive accuracy achieved within 50 iterations:

Dataset    No weight (normal PSO)   w = 0.5   w = 0.7
Lenses     87.5                     88.09     84.75
Haberman   87.5                     96.07     99.80
Car        96.4                     99.88     99.84
POP Care   91.6                     98.64     97.91
Zoo        83.3                     96.88     98.97

RESULT ANALYSIS
Predictive accuracy comparisons for the Lenses, Haberman's Survival, Post Operative Patient Care, Zoo and Car Evaluation datasets.

INFERENCES
The self-adaptive methods perform better than the other methods.
In terms of computational efficiency, SAPSO1 performs best.
Setting appropriate values for the control parameters involved in these heuristic methods is the key to their success.
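The two weight schedules given earlier translate directly to code (the third self-adaptive variant is not shown on the slide, so only these two are sketched):

```python
# Inertia weight schedules from the slides, with w_max = 0.9, w_min = 0.4:
#   schedule 1: w(g)   = w_max - (w_max - w_min) * g / G   (linear in generation g)
#   schedule 2: w(t+1) = w(t)  - (w_max - w_min) / G       (constant decrement)
W_MAX, W_MIN = 0.9, 0.4

def weight_linear(g, G):
    """Inertia weight at generation g of G (schedule 1)."""
    return W_MAX - (W_MAX - W_MIN) * g / G

def weight_decrement(w_t, G):
    """One decrement step from the current weight w_t (schedule 2)."""
    return w_t - (W_MAX - W_MIN) / G
```

Both schedules start near 0.9 (favoring exploration) and end near 0.4 (favoring exploitation) after G generations; they differ only in whether the weight is computed from the generation index or decremented in place.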

MODULE 3
Mining Association Rules using Hybrid GA/PSO (GPSO)

RELATED WORKS

S.No   Method                                     Application                  Year
1      Hybrid GA and PSO                          Global Optimization          2009
2      Chaotic Hybrid GA & PSO                    Circle Detection             2010
3      Hybrid GA and PSO                          Job Shop Scheduling          2010
4      PSO with dynamic inertia weight and PSO    Classification Rule Mining   2012

HYBRID GA/PSO (GPSO) MODEL
Initial population → evaluate fitness → ranked population, split into upper and lower parts, one evolved by the Genetic Algorithm and the other by Particle Swarm Optimization → updated population.

RESULT ANALYSIS
Predictive accuracy comparison:

Dataset                  GA     PSO     GPSO
Lenses                   75     87.5    87.5
Haberman's Survival      50     90      95
Car Evaluation           75     94.4    98
Post Operative Patient   85     83.5    90
Zoo                      92.5   95.45   95

Predictive accuracy comparison charts for the Car Evaluation, Haberman's Survival, Post Operative Patient Care, Lenses and Zoo datasets.

INFERENCES
GPSO generates rules with better predictive accuracy than GA and PSO.
The global search optimization of GA and the powerful stochastic optimization offered by PSO are combined in GPSO, resulting in association rules with consistent accuracy.

CONCLUSION
The Genetic Algorithm, when used for mining association rules, performs better than existing methods in the accuracy achieved.

Particle Swarm Optimization, when adopted for mining association rules, produces results close to GA but with lower execution time.

When variations are introduced in both GA and PSO, the self-adaptive mechanism performs better than the others.

Premature convergence, the major drawback of PSO, was handled by introducing inertia weights, chaotic maps, neighborhood selection and adaptive inertia weights.

The hybrid GA/PSO model produces consistent results in comparison with GA and PSO.

WORK TO BE COMPLETED
Balancing exploration and exploitation in PSO through self-adaptive acceleration coefficients.

Breeding Swarms: Adding Mutation into gBest

Implementing other measures such as execution time comparison, number of rules generated, and recall.

CONTRIBUTION TO RESEARCH
Chaotic operators based on two maps for mining association rules.

Chaotic self-adaptation in PSO for mining ARs.

Hybrid GA/PSO (GPSO) for AR mining

PAPERS PUBLISHED

CONFERENCES
K. Indira and S. Kanmani, "Framework for Comparison of Association Rule Mining Using Genetic Algorithm", in International Conference on Computers, Communication & Intelligence, 2010.
K. Indira and S. Kanmani, "Mining Association Rules Using Genetic Algorithm: The Role of Estimation Parameters", in International Conference on Advances in Computing and Communications, Communications in Computer and Information Science, Springer, Volume 190, Part 8, pp. 639-648, 2011.
K. Indira, S. Kanmani, Gaurav Sethia D., Kumaran S. and Prabhakar J., "Rule Acquisition in Data Mining Using a Self Adaptive Genetic Algorithm", in First International Conference on Computer Science and Information Technology, Springer, Volume 204, Part 1, pp. 171-178, 2011.
K. Indira, S. Kanmani, Prasanth, Harish and Jeeva, "Population Based Search Methods in Mining Association Rules", in Third International Conference on Advances in Communication, Network, and Computing (CNC 2012), LNICST, pp. 255-261, 2012.

JOURNALS
K. Indira and S. Kanmani, "Performance Analysis of Genetic Algorithm for Mining Association Rules", IJCSI International Journal of Computer Science Issues, Vol. 9, Issue 2, No. 1, pp. 368-376, March 2012.

K. Indira and S. Kanmani, "Rule Acquisition using Genetic Algorithm", accepted for publication in Journal of Computing.

K. Indira and S. Kanmani, "Enhancing Particle Swarm Optimization using Chaotic Operators for Association Rule Mining", Elixir Computer Science & Engineering Journal, 46, pp. 8563-8566, 2012.

K. Indira and S. Kanmani, "Association Rule Mining by Dynamic Neighborhood Selection in Particle Swarm Optimization", communicated to International Journal of Swarm Intelligence, Inderscience Publications.

K. Indira and S. Kanmani, "Association Rule Mining using Self Adaptive Particle Swarm Optimization", accepted for publication in International Journal of Computer Science Issues, July issue.

K. Indira and S. Kanmani, "Measures for Improving Premature Convergence in Particle Swarm Optimization for Association Rule Mining", communicated to International Journal on Data Warehousing, IGI Press.

K. Indira and S. Kanmani, "Mining Association Rules using Hybrid Genetic Algorithm and Particle Swarm Optimization Algorithm (GPSO)", to be communicated.

REFERENCES
Jing Li, Han Rui-feng, "A Self-Adaptive Genetic Algorithm Based on Real-Coded", International Conference on Biomedical Engineering and Computer Science, pp. 1-4, 2010.
Chuan-Kang Ting, Wei-Ming Zeng, Tzu-Chieh Lin, "Linkage Discovery through Data Mining", IEEE Magazine on Computational Intelligence, Volume 5, February 2010.
Shangping Dai, Li Gao, Qiang Zhu, Changwu Zhu, "A Novel Genetic Algorithm Based on Image Databases for Mining Association Rules", 6th IEEE/ACIS International Conference on Computer and Information Science, pp. 977-980, 2007.
Yamina Mohamed Ben Ali, "Soft Adaptive Particle Swarm Algorithm for Large Scale Optimization", IEEE, 2010.
Yan Chen, Shingo Mabu, Kotaro Hirasawa, "Genetic relation algorithm with guided mutation for the large-scale portfolio optimization", Expert Systems with Applications 38 (2011), pp. 3353-3363.
Xiaoyuan Zhu, Yongquan Yu, Xueyan Guo, "Genetic Algorithm Based on Evolution Strategy and the Application in Data Mining", First International Workshop on Education Technology and Computer Science (ETCS '09), Volume 1, pp. 848-852, 2009.
Hong Guo, Ya Zhou, "An Algorithm for Mining Association Rules Based on Improved Genetic Algorithm and its Application", 3rd International Conference on Genetic and Evolutionary Computing (WGEC '09), pp. 117-120, 2009.
Genxiang Zhang, Haishan Chen, "Immune Optimization Based Genetic Algorithm for Incremental Association Rules Mining", International Conference on Artificial Intelligence and Computational Intelligence (AICI '09), Volume 4, pp. 341-345, 2009.
Maria J. Del Jesus, Jose A. Gamez, Pedro Gonzalez, Jose M. Puerta, "On the Discovery of Association Rules by means of Evolutionary Algorithms", Advanced Review, John Wiley & Sons, Inc., 2011.
Junli Lu, Fan Yang, Momo Li, Lizhen Wang, "Multi-objective Rule Discovery Using the Improved Niched Pareto Genetic Algorithm", Third International Conference on Measuring Technology and Mechatronics Automation, 2011.
Hamid Reza Qodmanan, Mahdi Nasiri, Behrouz Minaei-Bidgoli, "Multi Objective Association Rule Mining with Genetic Algorithm without specifying Minimum Support and Minimum Confidence", Expert Systems with Applications 38 (2011), pp. 288-298.
Miguel Rodriguez, Diego M. Escalante, Antonio Peregrin, "Efficient Distributed Genetic Algorithm for Rule Extraction", Applied Soft Computing 11 (2011), pp. 733-743.
R.J. Kuo, C.M. Chao, Y.T. Chiu, "Application of particle swarm optimization to association rule mining", Applied Soft Computing 11 (2011), pp. 326-336.
Bilal Alatas, Erhan Akin, "Multi-objective rule mining using a chaotic particle swarm optimization algorithm", Knowledge-Based Systems 22 (2009), pp. 455-460.
Mourad Ykhlef, "A Quantum Swarm Evolutionary Algorithm for mining association rules in large databases", Journal of King Saud University - Computer and Information Sciences 23 (2011), pp. 1-6.
Feng Lu, Yanfeng Ge, LiQun Gao, "Self-adaptive Particle Swarm Optimization Algorithm for Global Optimization", Sixth International Conference on Natural Computation (ICNC 2010), 2010.
Fevrier Valdez, Patricia Melin, Oscar Castillo, "An improved evolutionary method with fuzzy logic for combining Particle Swarm Optimization and Genetic Algorithms", Applied Soft Computing 11 (2011), pp. 2625-2632.

Thank You