Estimation Of Distribution Algorithm based on Markov Random Fields



  • Estimation Of Distribution Algorithm based on Markov Random Fields
    Siddhartha Shakya
    School of Computing, The Robert Gordon University

    Siddhartha Shakya

  • Outline
    From GAs to EDAs
    Probabilistic Graphical Models in EDAs: Bayesian networks; Markov Random Fields
    Fitness modelling approach to estimating and sampling an MRF in EDAs: Gibbs distribution, energy function and modelling the fitness
    Estimating parameters (fitness modelling approach)
    Sampling the MRF (several different approaches)
    Conclusion


  • Genetic Algorithms (GAs)
    Population-based optimisation technique
    Based on Darwin's theory of evolution
    A solution is encoded as a set of symbols known as a chromosome
    A population of solutions is generated
    Genetic operators are then applied to the population to produce the next generation, which replaces the parent population


  • Simple GA simulation


  • GA to EDA


  • Simple EDA simulation [slide shows a probability vector (0.5, 0.5, 1.0, 1.0, 0.5) and the bit-string population sampled from it]


  • Joint Probability Distribution (JPD)
    A solution is viewed as a set of random variables: $x = (x_1, x_2, \ldots, x_n)$

    Joint probability distribution (JPD): $p(x) = p(x_1, x_2, \ldots, x_n)$

    The full JPD is exponential in the number of variables, and is therefore infeasible to compute in most cases. It needs simplification!


  • Factorisation of the JPD
    Univariate model (no interaction; the simplest model): $p(x) = \prod_{i=1}^{n} p(x_i)$

    Bivariate model: pair-wise interaction

    Multivariate model: interaction among more than two variables


  • Typical estimation and sampling of the JPD in EDAs
    Learn the interactions between variables in the solution
    Learn the probabilities associated with the interacting variables
    Together these specify the JPD: p(x)
    Sample the JPD (i.e. the learned probabilities)
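In its simplest univariate form (UMDA-style, with no interactions learned), the estimate-then-sample loop above can be sketched in Python. The OneMax fitness, population size and selection size are illustrative assumptions, not values from the slides:

```python
import random

random.seed(0)

def onemax(x):
    # OneMax fitness: number of 1s in the chromosome
    return sum(x)

def umda_step(population, fitness, n_select):
    """One generation of a univariate EDA: select, estimate marginals, sample."""
    # Truncation selection: keep the n_select fittest solutions
    selected = sorted(population, key=fitness, reverse=True)[:n_select]
    n = len(selected[0])
    # Estimate p(x_i = 1) independently for each position (univariate model)
    p = [sum(ind[i] for ind in selected) / len(selected) for i in range(n)]
    # Sample a new population from the learned probabilities
    return [[1 if random.random() < p[i] else 0 for i in range(n)]
            for _ in range(len(population))]

pop = [[random.randint(0, 1) for _ in range(20)] for _ in range(50)]
for _ in range(30):
    pop = umda_step(pop, onemax, n_select=25)
```

On OneMax the marginals quickly drift towards 1, so the population converges to (near-)optimal strings.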


  • Probabilistic Graphical Models (PGMs)
    An efficient tool to represent the factorisation of the JPD
    A marriage between probability theory and graph theory
    Consist of two components: structure and parameters
    Two types of PGM: directed PGM (Bayesian networks) and undirected PGM (Markov Random Fields)


  • Directed PGM (Bayesian networks)
    Structure: Directed Acyclic Graph (DAG)

    Independence relationship: a variable is conditionally independent of the rest of the variables given its parents

    Parameters: conditional probabilities


  • Bayesian networks
    The factorisation of the JPD encoded in terms of conditional probabilities is

    JPD for a BN: $p(x) = \prod_{i=1}^{n} p(x_i \mid pa_i)$, where $pa_i$ are the parents of $x_i$


  • Estimating a Bayesian networkEstimate structure

    Estimate parameters

    This completely specifies the JPD

    JPD can then be Sampled


  • BN based EDAs
    1. Initialise parent solutions
    2. Select a set from the parent solutions
    3. Estimate a BN from the selected set (estimate structure, then estimate parameters)
    4. Sample the BN to generate a new population
    5. Replace the parents with the new set and go to 2 until the termination criterion is satisfied


  • How to estimate and sample a BN in EDAs
    Estimating structure: score + search techniques; conditional independence tests
    Estimating parameters: trivial in EDAs, since the dataset is complete; estimate the probabilities of parents before children
    Sampling: Probabilistic Logic Sampling (sample parents before children)
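Probabilistic Logic Sampling visits the variables in ancestral order, so each variable is sampled after its parents. A minimal sketch on a hypothetical two-variable network X1 → X2 (the structure and probability tables are invented for illustration, not taken from the slides):

```python
import random

random.seed(1)

# Assumed toy Bayesian network: X1 -> X2
p_x1 = 0.7                          # p(x1 = 1)
p_x2_given_x1 = {0: 0.2, 1: 0.9}    # p(x2 = 1 | x1)

def probabilistic_logic_sampling():
    """Sample each variable after its parents (ancestral order)."""
    x1 = 1 if random.random() < p_x1 else 0
    x2 = 1 if random.random() < p_x2_given_x1[x1] else 0
    return x1, x2

samples = [probabilistic_logic_sampling() for _ in range(20000)]
```

The sample frequencies approach the network's marginals, e.g. p(x2 = 1) = 0.7·0.9 + 0.3·0.2 = 0.69.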


  • BN based EDAs
    A well established approach in EDAs: BOA, EBNA, LFDA, MIMIC, COMIT, BMDA
    References: Larrañaga and Lozano (2002); Pelikan (2002)


  • Markov Random Fields (MRF)
    Structure: undirected graph

    Local independence: a variable is conditionally independent of the rest of the variables given its neighbours

    Global independence: two sets of variables are conditionally independent of each other if there is a third set that separates them

    Parameters: potential functions defined on the cliques

    [figure: an undirected graph over variables X1, ..., X6]


  • Markov Random Field
    The factorisation of the JPD encoded in terms of potential functions over maximal cliques is

    JPD for an MRF: $p(x) = \frac{1}{Z} \prod_{c} \psi_c(x_c)$, where $Z$ is the normalising constant (partition function)


  • Estimating a Markov Random Field
    Estimate the structure from data

    Estimate the parameters: requires the potential functions to be numerically defined

    This completely specifies the JPD

    The JPD can then be sampled; there is no specific sampling order (not a DAG), so this is somewhat problematic


  • MRF in EDA
    Has recently been proposed as an estimation of distribution technique in EDAs
    Shakya et al. 2004, 2005; Santana et al. 2003, 2005


  • MRF based EDA
    1. Initialise parent solutions
    2. Select a set from the parent solutions
    3. Estimate an MRF from the selected set (estimate structure, then estimate parameters)
    4. Sample the MRF to generate a new population
    5. Replace the parents with the new solutions and go to 2 until the termination criterion is satisfied


  • How to estimate and sample an MRF in EDAs
    Learning structure: conditional independence tests (MN-EDA, MN-FDA); linkage detection algorithm (LDFA)
    Learning parameters: junction tree approach (FDA); junction graph approach (MN-FDA); Kikuchi approximation approach (MN-EDA); fitness modelling approach (DEUM)
    Sampling: Probabilistic Logic Sampling (FDA, MN-FDA); probability vector approach (DEUMpv); direct sampling of the Gibbs distribution (DEUMd); Metropolis sampler (Is-DEUMm); Gibbs sampler (Is-DEUMg, MN-EDA)


  • Fitness modelling approach
    Hammersley-Clifford theorem: the JPD of any MRF follows a Gibbs distribution: $p(x) = \frac{e^{-U(x)/T}}{Z}$  (a)

    The energy of the Gibbs distribution in terms of potential functions over the cliques: $U(x) = \sum_c V_c(x_c)$

    Assuming the probability of a solution is proportional to its fitness: $p(x) = \frac{f(x)}{\sum_y f(y)}$  (b)

    From (a) and (b), $f(x) \propto e^{-U(x)/T}$, giving a model of the fitness function: the MRF fitness model (MFM)


  • MRF fitness model (MFM)
    Properties: completely specifies the JPD for the MRF; negative relationship between fitness and energy, i.e. minimising energy = maximising fitness
    Tasks: need to find the structure for the MRF; need to numerically define the clique potential functions


  • MRF fitness model (MFM)
    Let us start with the simplest model, the univariate model: this eliminates structure learning :)

    For the univariate model there will be n singleton cliques

    For each singleton clique, assign a potential function $V_i(x_i) = \alpha_i x_i$, with $x_i \in \{-1, +1\}$

    Corresponding MFM (taking T = 1): $-\ln f(x) = U(x) = \sum_{i=1}^{n} \alpha_i x_i$

    In terms of the Gibbs distribution: $p(x) = \frac{e^{-\sum_i \alpha_i x_i / T}}{Z}$


  • Estimating MRF parameters using the MFM
    Each chromosome gives us a linear equation
    Applying this to a set of selected solutions gives us a system of linear equations
    Solving it gives us an approximation to the MRF parameters
    Knowing the MRF parameters completely specifies the JPD
    The next step is to sample the JPD
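The system of linear equations described above can be solved by least squares. A sketch for the univariate MFM with spin encoding x_i ∈ {-1, +1}; the added constant term a_0 (absorbing the normalisation) is an assumption, and fitness values must be strictly positive for the logarithm:

```python
import numpy as np

def estimate_mrf_parameters(chromosomes, fitnesses):
    """Fit the univariate MRF fitness model
        -ln f(x) = a_0 + sum_i a_i * x_i
    by least squares: each chromosome contributes one linear equation."""
    spins = 2.0 * np.asarray(chromosomes, dtype=float) - 1.0  # bits -> {-1, +1}
    A = np.hstack([np.ones((spins.shape[0], 1)), spins])      # column for a_0
    b = -np.log(np.asarray(fitnesses, dtype=float))           # fitness must be > 0
    alpha, *_ = np.linalg.lstsq(A, b, rcond=None)
    return alpha  # alpha[0] is a_0, alpha[1:] are the variable parameters
```

If the fitness really is of the Gibbs form, the parameters are recovered exactly; otherwise the least-squares fit gives the best approximation over the selected set.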


  • General DEUM framework
    Distribution Estimation Using MRF algorithm (DEUM)
    1. Initialise parent population P
    2. Select a set D from P (can use D = P!)
    3. Build an MFM and fit it to D to estimate the MRF parameters
    4. Sample the MRF to generate a new population
    5. Replace P with the new population and go to 2 until the termination criterion is satisfied


  • How to sample the MRF
    Probability vector approach
    Direct sampling of the Gibbs distribution
    Metropolis sampling
    Gibbs sampling


  • Probability vector approach to sampling the MRF
    Minimise U(x) to maximise f(x)
    To minimise U(x), each term $\alpha_i x_i$ should be as small as possible
    This suggests: if $\alpha_i$ is negative then the corresponding $x_i$ should be positive, and vice versa
    We could obtain an optimum chromosome for the current population just by looking at the $\alpha_i$
    However, the current population does not always contain enough information to generate the optimum
    Instead, we use the sign of each $\alpha_i$ to update a vector of probabilities


  • DEUM with probability vector (DEUMpv)


  • Updating rule
    Uses the sign of each MRF parameter to direct the search towards the value of the respective variable that minimises the energy U(x)
    A learning rate controls convergence
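A minimal sketch of such an updating rule; the exact rule used in DEUMpv may differ, and the learning-rate value is illustrative:

```python
def update_probability_vector(p, alpha, learning_rate=0.1):
    """Move each p_i towards the bit value that minimises the energy
    U(x) = sum_i alpha_i * x_i with x_i in {-1, +1}:
    a negative alpha_i favours bit 1, a positive alpha_i favours bit 0.
    (A sketch; the original DEUMpv rule may differ in detail.)"""
    return [pi + learning_rate * ((1.0 if ai < 0 else 0.0) - pi)
            for pi, ai in zip(p, alpha)]

# Example: alpha = [-2.0, 3.0] nudges p towards [1, 0]
p = update_probability_vector([0.5, 0.5], [-2.0, 3.0])
```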


  • Simulation of DEUMpv [slide shows a bit-string population with fitness values and the updated probability vector]


  • Results
    OneMax problem


  • Results
    F6 function optimisation


  • Results
    Trap 5 function (a deceptive problem): no solution found


  • Sampling the MRF
    Probability vector approach
    Direct sampling of the Gibbs distribution
    Metropolis sampling
    Gibbs sampling


  • Direct sampling of the Gibbs distribution
    In the probability vector approach, only the sign of each MRF parameter is used

    However, one could directly sample from the Gibbs distribution and make use of the values of the MRF parameters

    One could also use the temperature coefficient to manipulate the probabilities


  • Direct Sampling of Gibbs distribution


  • Direct sampling of the Gibbs distribution
    The temperature coefficient plays an important role
    Decreasing T cools each probability towards either 1 or 0, depending on the sign and value of $\alpha_i$
    This forms the basis for the DEUM based on direct sampling of the Gibbs distribution (DEUMd)
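For the univariate model, the marginal p(x_i = 1) under the Gibbs distribution has a closed form, which shows the cooling effect of T directly. A sketch, assuming the spin encoding x_i ∈ {-1, +1}:

```python
import math

def marginal_p1(alpha_i, T):
    """Marginal p(x_i = 1) for the univariate Gibbs distribution with spins
    x_i in {-1, +1}:
        p(x_i = 1) = e^(-alpha_i/T) / (e^(-alpha_i/T) + e^(+alpha_i/T))
                   = 1 / (1 + e^(2*alpha_i/T))
    As T -> 0 the probability saturates to 1 (alpha_i < 0) or 0 (alpha_i > 0)."""
    return 1.0 / (1.0 + math.exp(2.0 * alpha_i / T))
```

Sampling each bit from this marginal, and lowering T over the run, interpolates between random exploration and greedy energy minimisation.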


  • DEUM with direct sampling (DEUMd)
    1. Generate the initial population, P, of size M
    2. Select the N fittest solutions, N ≤ M
    3. Calculate the MRF parameters
    4. Generate M new solutions by sampling the univariate marginal distribution
    5. Replace P with the new population and go to 2 until complete


  • DEUMd simulation [slide shows a bit-string population with its fitness values]


  • Experimental results
    OneMax problem


  • F6 function optimization


  • Plateau Problem (n=180)


  • Checker Board Problem (n=100)


  • Trap function of order 5 (n=60)


  • Ex