Data Mining Using Genetic Algorithm

  • 8/3/2019 Data Mining Using Genetic Algorithm

    Contents

    What Is Data Mining?

    Architecture of Typical Data Mining System

    Biological Terminologies

    What is Genetic Algorithm (GA)?

    Basic Principles of GA

    Why Data Mining using Genetic Algorithm?

    Functions of Genetic Algorithm

    Pseudo Code of GA

    Applications of GA

    Advantages and Disadvantages

    The Tool: MATLAB

    Conclusion & Future Work

    References

    What Is Data Mining?

    Data mining (knowledge discovery from data): extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from huge amounts of data [1].

    Data mining: a misnomer?

    Alternative names

    Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.

    Is everything data mining? No. For example, the following are not:

    Simple search and query processing

    Expert systems

    Architecture: Typical Data Mining System

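    Figure: architecture of a typical data mining system [1]. Components shown, from the data sources up to the user:

    Data sources: Database, Data Warehouse, World-Wide Web, Other Info Repositories

    Data cleaning, integration, and selection

    Database or Data Warehouse Server

    Data Mining Engine

    Pattern Evaluation

    Graphical User Interface

    Knowledge Base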

    Biological Terminologies [2]

    Gene - Each gene encodes a particular protein. Put simply, each gene encodes a trait, for example eye color.

    Chromosome - A chromosome consists of genes, which are blocks of DNA. Chromosomes are strings of DNA and serve as a model for the whole organism.

    Alleles - The possible settings for a trait (e.g. blue, brown) are called alleles.

    Locus - Each gene has its own position in the chromosome. This position is called the locus.

    Genome - The complete set of genetic material (all chromosomes) is called the genome.

    Genotype - A particular set of genes in the genome is called the genotype.

    Phenotype - The genotype contains the information required to construct an organism; that organism is referred to as the phenotype.

    Genetic Algorithm (GA)

    GAs were developed by John Holland in the 1970s.

    They are based on the genetic processes of biological organisms.

    Over many generations, natural populations evolve according to the principles of natural selection and survival of the fittest, first clearly stated by Charles Darwin in The Origin of Species.

    GAs are adaptive methods that may be used to solve search and optimization problems.

    After a number of new generations have been built with the mechanisms described below (selection, crossover and mutation), one typically obtains a solution that no longer improves. This solution is taken as the final one.

    Basic Principles of GA

    Coding

    Fitness function

    Reproduction

    Selection

    Crossover

    Mutation

    Convergence

    Coding

    Before a GA can be run, a suitable coding (or representation) for the problem must be devised.

    It is assumed that a potential solution to a problem may be represented as a set of parameters (for example, the dimensions of the beams in a bridge design).

    For example, if our problem is to maximize a function of three variables, F(x, y, z), we might represent each variable by a 10-bit binary number. Our chromosome would therefore contain three genes, and consist of 30 binary digits.
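    The 30-bit coding above can be sketched as follows. The layout is an assumption, not something the slides specify: the three genes are stored consecutively in the order x, y, z, each is read as an unsigned integer, and the result is scaled linearly into a range [lo, hi]. The function name decode and its parameters are illustrative.

```python
# Assumed layout (not from the slides): three consecutive 10-bit genes for
# x, y, z, each read as an unsigned integer and scaled linearly into [lo, hi].
GENE_BITS = 10
NUM_GENES = 3

def decode(chromosome, lo=0.0, hi=1.0):
    """Split a 30-character bit string into three genes and map each to [lo, hi]."""
    if len(chromosome) != GENE_BITS * NUM_GENES:
        raise ValueError("expected a 30-bit chromosome")
    max_int = 2 ** GENE_BITS - 1  # 1023, the largest value a 10-bit gene can hold
    values = []
    for i in range(NUM_GENES):
        gene = chromosome[i * GENE_BITS:(i + 1) * GENE_BITS]
        values.append(lo + (hi - lo) * int(gene, 2) / max_int)
    return tuple(values)

# Three genes: all zeros, all ones, and a leading one followed by zeros.
x, y, z = decode("0000000000" + "1111111111" + "1000000000")
print(x, y, z)
```

    With this layout, the all-zeros gene decodes to the lower bound, the all-ones gene to the upper bound, and 1000000000 (512 of 1023) to a value just above the midpoint.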

    Fitness Function

    A fitness function must be devised for each problem to be solved.

    Given a particular chromosome, the fitness function returns a single numerical fitness, or figure of merit, which is supposed to be proportional to the utility or ability of the individual that the chromosome represents.
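    As a minimal sketch, a fitness function for a 30-bit chromosome might look like this. The objective F(x, y, z) = x(1 - y) + z is a made-up example chosen only because it is non-negative on [0, 1]; the decoding (three 10-bit genes scaled to [0, 1]) is likewise an assumption.

```python
def decode(chromosome, bits=10):
    """Read consecutive `bits`-wide genes as integers scaled to [0, 1] (assumed coding)."""
    top = 2 ** bits - 1
    return [int(chromosome[i:i + bits], 2) / top
            for i in range(0, len(chromosome), bits)]

def fitness(chromosome):
    """Return a single figure of merit for one chromosome.

    Hypothetical objective: F(x, y, z) = x * (1 - y) + z, to be maximized.
    """
    x, y, z = decode(chromosome)
    return x * (1.0 - y) + z

best = "1111111111" + "0000000000" + "1111111111"   # x = 1, y = 0, z = 1
worst = "0000000000" + "1111111111" + "0000000000"  # x = 0, y = 1, z = 0
print(fitness(best))   # 2.0
print(fitness(worst))  # 0.0
```

    Keeping the fitness non-negative is a convenience for selection schemes that pick parents in proportion to fitness.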

    Reproduction

    During the reproductive phase of the GA, individuals are selected from the population and recombined, producing offspring which will comprise the next generation.

    Parents are selected randomly from the population using a scheme which favours the more fit individuals.

    Having selected two parents, their chromosomes are recombined, typically using the mechanisms of crossover and mutation.
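    One common scheme that favours fitter individuals is roulette-wheel (fitness-proportionate) selection. The slides do not name a specific scheme, so the sketch below is an assumption; the function name roulette_select and the toy population are illustrative.

```python
import random

def roulette_select(population, fitnesses, rng=random):
    """Pick one individual with probability proportional to its (non-negative) fitness."""
    total = sum(fitnesses)
    if total <= 0:
        return rng.choice(population)  # all-zero fitness: fall back to a uniform choice
    pick = rng.uniform(0, total)       # spin the wheel
    running = 0.0
    for individual, fit in zip(population, fitnesses):
        running += fit
        if pick < running:
            return individual
    return population[-1]              # guard against floating-point round-off

# Toy check: "C" has the largest fitness, so it should be drawn most often.
rng = random.Random(0)
population = ["A", "B", "C"]
fitnesses = [1.0, 3.0, 6.0]
counts = {name: 0 for name in population}
for _ in range(10_000):
    counts[roulette_select(population, fitnesses, rng)] += 1
print(counts)
```

    Less fit individuals can still be chosen, only less often, which keeps some diversity in the mating pool.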

    Example of Crossover & Mutation
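    A minimal sketch of both operators on bit strings, assuming single-point crossover and a single bit-flip mutation. The parent strings, the crossover point, and the mutated position are illustrative values, not taken from the slides.

```python
def single_point_crossover(parent_a, parent_b, point):
    """Swap the tails of two equal-length bit strings after position `point`."""
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b

def flip_bit(chromosome, index):
    """Mutate one gene by inverting the bit at `index`."""
    flipped = "1" if chromosome[index] == "0" else "0"
    return chromosome[:index] + flipped + chromosome[index + 1:]

parent_a = "1010001110"
parent_b = "0011010010"

# Cut both parents after the fifth bit and exchange the tails.
child_a, child_b = single_point_crossover(parent_a, parent_b, point=5)
print(child_a)  # 1010010010
print(child_b)  # 0011001110

# Flip the third bit (index 2) of the first child.
mutant = flip_bit(child_a, index=2)
print(mutant)   # 1000010010
```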

    Convergence

    Convergence is the progression towards increasing uniformity.

    A gene is said to have converged when 95% of the population share the same value.

    The population is said to have converged when all of the genes have converged.

    If the GA has been correctly implemented, the population will evolve over successive generations so that the fitness of the best and the average individual in each generation increases towards the global optimum.
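    The 95% rule can be checked directly. In this sketch each position of a bit string is treated as one gene, which is a simplifying assumption; the function names and the sample populations are illustrative.

```python
from collections import Counter

def gene_converged(population, locus, threshold=0.95):
    """True if at least `threshold` of the population share one value at `locus`."""
    counts = Counter(individual[locus] for individual in population)
    most_common_count = counts.most_common(1)[0][1]
    return most_common_count / len(population) >= threshold

def population_converged(population, threshold=0.95):
    """True if every gene (here: every string position) has converged."""
    length = len(population[0])
    return all(gene_converged(population, i, threshold) for i in range(length))

# 19 of 20 individuals (95%) agree at every position.
mostly_uniform = ["1101"] * 19 + ["1001"]
# Only 18 of 20 individuals (90%) agree at position 1.
less_uniform = ["1101"] * 18 + ["1001"] * 2

print(population_converged(mostly_uniform))  # True
print(population_converged(less_uniform))    # False
```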

    Why Data Mining using Genetic Algorithm?

    There are several reasons to prefer genetic algorithms:

    Robustness.

    Ability to work on large and noisy datasets.

    GAs perform a global search of the solution space, in contrast to many other algorithms, which use a greedy approach.

    They cope well with attribute interaction.

    Scalability can be achieved through parallel approaches to genetic algorithms. This characteristic is of great importance in data mining.

    Moreover, genetic algorithms have a high degree of autonomy, which enables the discovery of knowledge previously unknown to the user.

    Functions of Genetic Algorithm

    The Fitness Function

    The fitness score is returned as a result.

    Parent Selection

    Mating Pool

    Crossover

    The likelihood of crossover being applied is typically between 0.6 and 1.0.

    Mutation

    Mutation is applied to each child individually after crossover. It randomly alters each gene with a small probability (typically 0.001).

    Pseudo Code of GA[3]
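    A generic GA loop might be sketched as follows. This is not the pseudo code from [3]; it is a minimal illustration under stated assumptions: fitness-proportionate selection, single-point crossover applied with probability 0.7 (inside the 0.6 to 1.0 range quoted earlier), per-bit mutation with probability 0.001, and a toy "count the 1 bits" fitness. All function names, the population size, and the generation count are hypothetical.

```python
import random

def one_max(chromosome):
    """Hypothetical fitness: the number of 1 bits in the chromosome."""
    return sum(chromosome)

def select(population, scores, rng):
    """Fitness-proportionate choice; uniform if every score is zero."""
    if sum(scores) == 0:
        return rng.choice(population)
    return rng.choices(population, weights=scores, k=1)[0]

def crossover(a, b, rng, p_crossover):
    """Single-point crossover, applied with probability p_crossover."""
    if rng.random() < p_crossover:
        point = rng.randrange(1, len(a))
        return a[:point] + b[point:], b[:point] + a[point:]
    return a[:], b[:]  # no crossover: children are copies of the parents

def mutate(chromosome, rng, p_mutation):
    """Flip each bit independently with probability p_mutation."""
    return [1 - bit if rng.random() < p_mutation else bit for bit in chromosome]

def run_ga(fitness, n_bits=20, pop_size=40, generations=60,
           p_crossover=0.7, p_mutation=0.001, seed=0):
    rng = random.Random(seed)
    # Generate a random initial population of bit lists.
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        scores = [fitness(individual) for individual in population]
        next_generation = []
        while len(next_generation) < pop_size:
            parent_a = select(population, scores, rng)
            parent_b = select(population, scores, rng)
            for child in crossover(parent_a, parent_b, rng, p_crossover):
                next_generation.append(mutate(child, rng, p_mutation))
        population = next_generation[:pop_size]
        # Remember the best individual seen in any generation.
        best = max(population + [best], key=fitness)
    return best

best = run_ga(one_max)
print(one_max(best), "of 20 bits set")
```

    Fixing the generation count keeps the sketch short; a convergence test such as the 95% rule could replace it as the stopping condition.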

    Applications of GA

    Domain                       Application Types
    Control                      gas pipeline, pole balancing, missile evasion, pursuit
    Design                       semiconductor layout, aircraft design, keyboard configuration, communication networks
    Scheduling                   manufacturing, facility scheduling, resource allocation
    Robotics                     trajectory planning
    Machine Learning             designing neural networks, improving classification algorithms, classifier systems
    Signal Processing            filter design
    Game Playing                 poker, checkers, prisoner's dilemma
    Combinatorial Optimization   set covering, travelling salesman, routing, bin packing, graph colouring and partitioning

    Advantages and Disadvantages

    Advantages:

    Concept is easy to understand.

    Modular, separate from the application.

    It doesn't have to know any rules of the problem in advance. This is very useful for very complex and loosely defined problems.

    With a well-defined fitness function and carefully chosen attributes, a genetic algorithm can perform much faster than other algorithms such as the linear method.

    Advantages and Disadvantages (contd.)

    Disadvantages:

    The definition of the fitness function can sometimes be very complicated.

    The fitness function may significantly affect the performance of the process as its complexity increases, because it is used to compare every element in the sample population with every record in the training data set.

    Sometimes an acceptable solution cannot be derived even after countless iterations if the genetic operators are poorly chosen.

    The Tool: MATLAB [4]

    MATLAB - Matrix Laboratory

    MATLAB is a high-performance language for technical computing. It integrates computation, visualization and programming in an easy-to-use environment where problems and solutions are expressed in familiar mathematical notation.

    Simulink -

    Simulink is an interactive environment for modeling, simulating, and analyzing dynamic, multidomain systems. It lets you build a block diagram, simulate the system's behavior, evaluate its performance, and refine the design.

    Typical Uses of MATLAB

    Math and computation

    Algorithm development

    Data acquisition

    Modeling, simulation, and prototyping

    Data analysis, exploration, and visualization

    Scientific and engineering graphics

    Application development, including graphical user interface building

    References

    1. Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, 2006.

    2. http://www.obitko.com/tutorials/genetic-algorithms/index.php

    3. David Beasley et al. (1993). An Overview of Genetic Algorithms: Part 1, Fundamentals. University Computing, vol. 15 (2), pp. 58-69.

    4. Learning MATLAB, copyright 1984-2004 by The MathWorks, Inc.

    Thank You

    Any questions or suggestions?