8/3/2019 Data Mining Using Genetic Algorithm
Contents
What Is Data Mining?
Architecture of Typical Data Mining System
Biological Terminologies
What Is a Genetic Algorithm (GA)?
Basic Principles of GA
Why Data Mining Using Genetic Algorithm?
Functions of Genetic Algorithm
Pseudo Code of GA
Applications of GA
Advantages and Disadvantages
The Tool: MATLAB
Conclusion & Future Work
References
What Is Data Mining?
Data mining (knowledge discovery from data): extraction of interesting (non-trivial, implicit, previously unknown, and
potentially useful) patterns or knowledge from huge amounts of data [1].
Data mining: a misnomer?
Alternative names
Knowledge discovery (mining) in databases (KDD), knowledge
extraction, data/pattern analysis, data archeology, data dredging,
information harvesting, business intelligence, etc.
Is everything data mining? No. The following, for example, are not:
Simple search and query processing
Expert systems
Architecture: Typical Data Mining System
Components shown in the diagram, from the user down to the data sources [1]:
Graphical User Interface
Pattern Evaluation
Data Mining Engine
Database or Data Warehouse Server
Data cleaning, integration, and selection
Data sources: Database, Data Warehouse, World-Wide Web, Other Info Repositories
Knowledge Base
Biological Terminologies [2]
Gene - Each gene encodes a particular protein. Put simply, each gene encodes a trait, for example eye color.
Chromosome - A chromosome consists of genes, which are blocks of DNA. Chromosomes are strings of DNA and serve as a model for the whole organism.
Allele - The possible settings for a trait (e.g. blue, brown) are called alleles.
Locus - Each gene has its own position in the chromosome. This position is called its locus.
Genome - The complete set of genetic material (all chromosomes) is called the genome.
Genotype - A particular set of genes in the genome is called the genotype.
Phenotype - The genotype contains the information required to construct an organism, which is referred to as the phenotype.
Genetic Algorithm (GA)
GAs were developed by John Holland in the 1970s.
They are based on the genetic processes of biological organisms.
Over many generations, natural populations evolve according to the principles of natural selection and survival of the fittest, first clearly stated by Charles Darwin in The Origin of Species.
GAs are adaptive methods which may be used to solve search and optimization problems.
After a number of new generations have been built with the help of the described mechanisms, one obtains a solution that cannot be improved any further. This solution is taken as the final one.
Basic Principles of GA
Coding
Fitness function
Reproduction
Selection
Crossover
Mutation
Convergence
Coding
Before a GA can be run, a suitable coding (or representation) for the problem must be devised.
It is assumed that a potential solution to a problem may be represented as a set of parameters (for example, the dimensions of the beams in a bridge design).
For example, if our problem is to maximize a function of three variables, F(x, y, z), we might represent each variable by a 10-bit binary number. Our chromosome would therefore contain three genes and consist of 30 binary digits.
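The 30-bit coding described above can be sketched as follows. This is a minimal illustration, assuming each variable is an unsigned integer in the range 0 to 1023; the names encode, decode, BITS_PER_GENE and NUM_GENES are made up for this example and do not come from the slides.

```python
BITS_PER_GENE = 10  # each variable is a 10-bit binary number, as in the text
NUM_GENES = 3       # the three variables x, y, z


def encode(values):
    """Pack three integers in [0, 1023] into one 30-character bit string."""
    if len(values) != NUM_GENES:
        raise ValueError(f"expected {NUM_GENES} values")
    for v in values:
        if not 0 <= v < 2 ** BITS_PER_GENE:
            raise ValueError(f"{v} does not fit in {BITS_PER_GENE} bits")
    # Zero-padded binary formatting keeps every gene exactly 10 digits wide.
    return "".join(format(v, f"0{BITS_PER_GENE}b") for v in values)


def decode(chromosome):
    """Split a chromosome back into its integer genes."""
    genes = [chromosome[i:i + BITS_PER_GENE]
             for i in range(0, len(chromosome), BITS_PER_GENE)]
    return [int(g, 2) for g in genes]


chromosome = encode([5, 1023, 0])
print(len(chromosome))     # 30
print(decode(chromosome))  # [5, 1023, 0]
```

Real-valued variables would need an extra scaling step between the integer gene and the variable's actual range; that step is left out here.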
Fitness Function
A fitness function must be devised for each problem to be
solved.
Given a particular chromosome, the fitness function returns a single numerical fitness, or figure of merit,
which is supposed to be proportional to the utility or
ability of the individual that the chromosome represents.
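As an illustration in a data mining setting, the sketch below scores a chromosome that encodes a simple classification rule. The rule encoding (attribute index plus threshold), the tiny training set, and the choice of accuracy as the figure of merit are all assumptions made for this example, not something the slides specify.

```python
# Hypothetical training records: (attribute values, class label).
TRAINING_DATA = [
    ((2.0, 7.5), 0),
    ((3.5, 1.0), 0),
    ((6.0, 4.0), 1),
    ((8.5, 9.0), 1),
]


def fitness(chromosome, data=TRAINING_DATA):
    """Return one figure of merit: the fraction of records the rule gets right.

    The chromosome is assumed to be (attribute_index, threshold), read as
    "IF record[attribute_index] > threshold THEN class 1 ELSE class 0".
    """
    attribute_index, threshold = chromosome
    correct = 0
    for attributes, label in data:
        predicted = 1 if attributes[attribute_index] > threshold else 0
        if predicted == label:
            correct += 1
    return correct / len(data)


print(fitness((0, 5.0)))  # 1.0: thresholding attribute 0 separates the classes
print(fitness((1, 5.0)))  # 0.5: thresholding attribute 1 does not
```

Note that every call walks the whole training set, so the cost of one generation grows with population size times data set size.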
Reproduction
During the reproductive phase of the GA, individuals are selected from the population and recombined, producing offspring which will comprise the next generation.
Parents are selected randomly from the population using a scheme which favours the more fit individuals.
Once two parents have been selected, their chromosomes are recombined, typically using the mechanisms of crossover and mutation.
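One common scheme that favours fitter individuals is fitness-proportionate ("roulette wheel") selection. The slides do not name a specific scheme, so the sketch below is only one possible choice; it assumes all fitness values are non-negative, and select_parent is a name invented for this example.

```python
import random


def select_parent(population, fitnesses, rng=random):
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fitnesses)
    if total <= 0:
        # No fitness signal at all: fall back to a uniform random choice.
        return rng.choice(population)
    spin = rng.random() * total  # a point on the wheel, in [0, total)
    running = 0.0
    for individual, fit in zip(population, fitnesses):
        running += fit
        if spin < running:
            return individual
    return population[-1]  # guard against floating-point round-off


rng = random.Random(42)
population = ["weak", "strong"]
fitnesses = [1.0, 3.0]
picks = [select_parent(population, fitnesses, rng) for _ in range(1000)]
print(picks.count("weak"), picks.count("strong"))
```

With these weights the fitter individual should be drawn about three times as often as the weaker one, yet the weaker one still gets some chance to reproduce.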
Example of Crossover & Mutation
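The figure for this slide did not survive extraction. As a stand-in, here is a small worked example, assuming single-point crossover and bit-flip mutation on binary strings; the parent strings, the crossover point, and the mutated position are chosen arbitrarily for illustration.

```python
def single_point_crossover(parent_a, parent_b, point):
    """Swap the tails of two equal-length bit strings after `point`."""
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b


def flip_bit(chromosome, position):
    """Mutate one gene by flipping the bit at `position`."""
    flipped = "1" if chromosome[position] == "0" else "0"
    return chromosome[:position] + flipped + chromosome[position + 1:]


child_a, child_b = single_point_crossover("11111111", "00000000", point=3)
print(child_a, child_b)      # 11100000 00011111
print(flip_bit(child_a, 7))  # 11100001
```

Each child keeps the head of one parent and takes the tail of the other; mutation then changes a single bit of one child.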
Convergence
Convergence is the progression towards increasing uniformity.
A gene is said to have converged when 95% of the population
share the same value.
The population is said to have converged when all of the genes have converged.
If the GA has been correctly implemented, the population will
evolve over successive generations so that the fitness of the
best and the average individual in each generation increases
towards the global optimum.
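The convergence test can be sketched directly from these definitions. The code below reads "95% of the population" as "at least 95%", which is an interpretation, and assumes all individuals are sequences of equal length; the function names are made up for this example.

```python
from collections import Counter


def gene_converged(population, locus, threshold=0.95):
    """True when at least `threshold` of individuals share one value at `locus`."""
    counts = Counter(individual[locus] for individual in population)
    most_common_count = counts.most_common(1)[0][1]
    return most_common_count / len(population) >= threshold


def population_converged(population, threshold=0.95):
    """True when every gene position has converged."""
    length = len(population[0])
    return all(gene_converged(population, locus, threshold)
               for locus in range(length))


uniform = ["1010"] * 10
mixed = ["1010"] * 9 + ["0101"]
print(population_converged(uniform))  # True
print(population_converged(mixed))    # False: only 90% agree at each locus
```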
Why Data Mining Using Genetic Algorithm?
There are several reasons to prefer genetic algorithms:
Robustness.
Ability to work on large and noisy datasets.
GAs perform a global search of the solution space, in contrast to the many other algorithms that use a greedy approach.
They cope well with attribute interaction.
Parallel approaches to genetic algorithms make these algorithms scalable, a characteristic of great importance in data mining.
Moreover, genetic algorithms have a high degree of autonomy, which enables the discovery of knowledge previously unknown to the user.
Functions of Genetic Algorithm
The Fitness Function
The fitness score is returned as a result.
Parent Selection
Mating Pool
Crossover
The likelihood of crossover being applied is typically between 0.6 and 1.0.
Mutation
Mutation is applied to each child individually after crossover. It randomly alters each gene with a small probability (typically 0.001).
Pseudo Code of GA[3]
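The pseudo code on this slide was an image and is not reproduced here. The following is a sketch of a simple generational GA built from the steps described on the earlier slides; it is not necessarily the listing from [3]. The "count the 1 bits" fitness function, the population size, the fixed number of generations, and all function names are assumptions made for illustration; the crossover and mutation probabilities follow the typical values quoted above.

```python
import random


def onemax(chromosome):
    """Hypothetical fitness: the number of 1 bits in the chromosome."""
    return sum(chromosome)


def select(population, fitnesses, rng):
    """Fitness-proportionate choice of one parent."""
    # A tiny offset keeps the weights valid even if every fitness is zero.
    weights = [f + 1e-9 for f in fitnesses]
    return rng.choices(population, weights=weights, k=1)[0]


def crossover(a, b, rng, pc):
    """With probability pc, swap the tails of a and b at a random point."""
    if rng.random() < pc:
        point = rng.randrange(1, len(a))
        return a[:point] + b[point:], b[:point] + a[point:]
    return a[:], b[:]


def mutate(chromosome, rng, pm):
    """Flip each gene independently with probability pm."""
    return [1 - g if rng.random() < pm else g for g in chromosome]


def run_ga(fitness, length=20, pop_size=30, generations=50,
           pc=0.7, pm=0.001, seed=0):
    rng = random.Random(seed)
    # Generate the initial population at random.
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        fitnesses = [fitness(c) for c in population]
        next_generation = []
        while len(next_generation) < pop_size:
            parent_a = select(population, fitnesses, rng)
            parent_b = select(population, fitnesses, rng)
            child_a, child_b = crossover(parent_a, parent_b, rng, pc)
            next_generation.append(mutate(child_a, rng, pm))
            next_generation.append(mutate(child_b, rng, pm))
        population = next_generation[:pop_size]
    return max(population, key=fitness)


best = run_ga(onemax)
print(onemax(best), best)
```

This version stops after a fixed number of generations to keep the sketch short; a convergence test on the population could be used as the stopping condition instead.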
Applications of GA
Domain: Application types
Control: gas pipeline, pole balancing, missile evasion, pursuit
Design: semiconductor layout, aircraft design, keyboard configuration, communication networks
Scheduling: manufacturing, facility scheduling, resource allocation
Robotics: trajectory planning
Machine Learning: designing neural networks, improving classification algorithms, classifier systems
Signal Processing: filter design
Game Playing: poker, checkers, prisoner's dilemma
Combinatorial Optimization: set covering, travelling salesman, routing, bin packing, graph colouring and partitioning
Advantages and Disadvantages
Advantages:
Concept is easy to understand
Modular, separate from application
It does not have to know any rules of the problem in advance.
This is very useful for very complex and loosely defined
problems.
With a well-defined fitness function and carefully chosen
attributes, a genetic algorithm can perform much faster than
other algorithms, such as the linear method.
Advantages and Disadvantages (continued)
Disadvantages:
The definition of the fitness function can sometimes be very complicated.
The fitness function may significantly affect the performance of the process as its complexity increases, because it is used to compare every element in the sample population against every record in the training data set.
Sometimes an acceptable solution cannot be derived, even after countless iterations, if the genetic operators are wrongly chosen.
The Tool: MATLAB [4]
MATLAB: Matrix Laboratory
MATLAB is a high-performance language for technical
computing. It integrates computation, visualization, and
programming in an easy-to-use environment where problems
and solutions are expressed in familiar mathematical notation.
Simulink
Simulink is an interactive environment for modeling, simulating, and analyzing dynamic, multidomain systems. It
lets you build a block diagram, simulate the system's behavior,
evaluate its performance, and refine the design.
Typical Uses of MATLAB
Math and computation
Algorithm development
Data acquisition
Modeling, simulation, and prototyping
Data analysis, exploration, and visualization
Scientific and engineering graphics
Application development, including graphical user interface building
References
1. Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, 2006.
2. http://www.obitko.com/tutorials/genetic-algorithms/index.php
3. David Beasley et al. (1993). An Overview of Genetic Algorithms: Part 1, Fundamentals. University Computing, vol. 15(2), pp. 58-69.
4. Learning MATLAB, Copyright 1984-2004 by The MathWorks, Inc.
Thank You
Any questions or suggestions?