The Implementation of Machine Learning in the Game of Checkers

  • The Implementation of Machine Learning in the Game of Checkers

    Billy Melicher

    Computer Systems lab 08-09

  • Abstract

    Machine learning uses past information to predict future states.

    It can be used in any situation where the past predicts the future.

    It adapts to new situations.

  • Introduction

    Checkers is used to explore machine learning.

    Checkers has many tactical aspects that make it a good subject for study.

  • Background

    Minimax

    Heuristics

    Learning

  • Minimax

    A method of adversarial search.

    Every pattern (board) can be given a fitness value (heuristic).

    Each player chooses the outcome that is best for them from the choices they have.
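A minimal sketch of the idea, assuming a hypothetical tree representation: a leaf is a number standing for the heuristic fitness value of a board, and an internal node is a list of the boards reachable in one move. The maximizing player takes the largest child value and the minimizing player the smallest.

```python
def minimax(node, maximizing):
    """Return the minimax value of a (hypothetical) nested-list game tree.

    A leaf is a number: the heuristic fitness of that board.
    An internal node is a list of child boards, one per legal move.
    """
    if not isinstance(node, list):
        return node  # leaf: use the board's fitness value directly
    # The player to move picks the best outcome among the children;
    # the opponent moves next, so the role flips one level down.
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


tree = [[3, 5], [2, 9], [0, 1]]
print(minimax(tree, True))  # 3
```

The maximizer can guarantee 3 here: the opponent would answer the second move with 2 and the third with 0.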

  • Minimax

    [Figure: minimax chart from Wikipedia]

  • Minimax

    Has an exponential growth rate: each additional ply multiplies the number of positions to examine.

    Can only evaluate a limited number of moves (plies) into the future.
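As a rough illustration of that growth, assume every board has b legal moves (b is a hypothetical average branching factor, not a figure from these slides). A full search to a depth of d plies then examines

```latex
N(d) = \sum_{k=0}^{d} b^{k} = \frac{b^{d+1} - 1}{b - 1},
\qquad
b = 8,\; d = 6:\quad N(6) = \frac{8^{7} - 1}{7} = \frac{2\,097\,151}{7} = 299\,593
```

Each extra ply multiplies the work by roughly b, which is why the search must stop at a fixed depth and fall back on the heuristic.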

  • Heuristic

    Heuristics predict the outcome of a board.

    A heuristic gives a board's fitness value: the higher the value, the better the outcome.

    Not perfect.

    Creating one requires expertise in the situation.

  • Heuristics

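    $H(s) = c_0 F_0(s) + c_1 F_1(s) + \cdots + c_n F_n(s)$

    Here $H(s)$ is the heuristic value of board $s$, each $F_i(s)$ is a feature of the board, and each $c_i$ is that feature's weight. The heuristic can have many different terms. In checkers, the terms could be: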

    Number of checkers

    Number of kings

    Number of checkers on an edge

    How far checkers have advanced on the board
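A sketch of such a linear heuristic over the four features listed. Several conventions here are hypothetical, not taken from the slides: the square codes 0 to 4, indexing the 32 playable squares row by row with row 0 at the top, treating the two side columns as "edges", and Black moving down the board. Every feature is counted from Black's point of view (Black minus White).

```python
# Hypothetical codes for what occupies a playable square.
EMPTY, BLACK_MAN, BLACK_KING, WHITE_MAN, WHITE_KING = range(5)


def column(i):
    """Board column (0-7) of playable square i (0-31); row 0 is the top row."""
    row, pos = divmod(i, 4)
    # Assumed layout: even rows use columns 1,3,5,7; odd rows use 0,2,4,6.
    return 2 * pos + (1 if row % 2 == 0 else 0)


def features(board):
    """Return [F0, F1, F2, F3] as Black's count minus White's count."""
    f = [0, 0, 0, 0]
    for i, sq in enumerate(board):
        if sq == EMPTY:
            continue
        sign = 1 if sq in (BLACK_MAN, BLACK_KING) else -1
        row = i // 4
        f[0] += sign                      # F0: number of checkers (all pieces)
        if sq in (BLACK_KING, WHITE_KING):
            f[1] += sign                  # F1: number of kings
        if column(i) in (0, 7):
            f[2] += sign                  # F2: checkers on a side edge
        if sq == BLACK_MAN:
            f[3] += row                   # F3: Black men advance downward
        elif sq == WHITE_MAN:
            f[3] -= 7 - row               # White men advance upward
    return f


def heuristic(board, weights):
    """H(s) = c0*F0(s) + c1*F1(s) + ... + cn*Fn(s)."""
    return sum(c * f for c, f in zip(weights, features(board)))


board = [EMPTY] * 32
board[4] = BLACK_MAN                      # one Black man on the left edge, row 1
print(features(board))                    # [1, 0, 1, 1]
print(heuristic(board, [10, 5, 2, 1]))    # 13
```

The weights `[10, 5, 2, 1]` are arbitrary placeholders; choosing them well is exactly what the learning described in the following slides is for.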

  • Learning by Rote

    Stores every game played.

    Connects each board to the moves made from it.

    Relates the moves made from a particular board to the outcome of the game.

    More likely to make moves that resulted in a win, less likely to make moves that resulted in a loss.

    Good in the endgame, not as good in the midgame.
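A sketch of a rote-learning table, assuming board keys and moves are any hashable values (a board key could be the base-5 board number described next). The class and method names are hypothetical. The table counts wins and losses for each move made from each board and prefers the move with the best win fraction.

```python
from collections import defaultdict


class RoteMemory:
    """Hypothetical rote table: board key -> move -> [wins, losses]."""

    def __init__(self):
        self.table = defaultdict(lambda: defaultdict(lambda: [0, 0]))

    def record_game(self, history, won):
        """Credit every (board_key, move) pair the learner played with the result."""
        for board_key, move in history:
            self.table[board_key][move][0 if won else 1] += 1

    def best_move(self, board_key, legal_moves):
        """Pick the legal move with the highest win fraction from this board."""
        stats = self.table.get(board_key, {})  # .get does not create entries

        def score(move):
            wins, losses = stats.get(move, (0, 0))
            total = wins + losses
            return wins / total if total else 0.5  # untried move: neutral score

        return max(legal_moves, key=score)


memory = RoteMemory()
memory.record_game([("board-1", "a"), ("board-2", "c")], won=True)
memory.record_game([("board-1", "b")], won=False)
print(memory.best_move("board-1", ["a", "b"]))  # a
```

Untried moves get a neutral score of 0.5, so the program will still try a new move in preference to one that has mostly lost. Because this only helps on boards it has already seen, it fits the slide's note that rote learning is strong in the endgame, where positions repeat more often.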

  • How I store data

    I convert each checkerboard into a 32-digit base-5 number, where each digit corresponds to a playable square and the digit's value (0 to 4) indicates what occupies that square.
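A sketch of that encoding and its inverse. The assignment of digit values to pieces (empty, Black man, Black king, White man, White king) and putting square 0 in the most significant digit are assumptions; the slide only says each square gets one base-5 digit.

```python
# Hypothetical meaning of each base-5 digit.
EMPTY, BLACK_MAN, BLACK_KING, WHITE_MAN, WHITE_KING = range(5)


def encode(board):
    """Pack 32 square codes (each 0-4) into one base-5 integer; square 0 is most significant."""
    if len(board) != 32:
        raise ValueError("expected 32 playable squares")
    key = 0
    for sq in board:
        if not 0 <= sq <= 4:
            raise ValueError(f"invalid square code {sq}")
        key = key * 5 + sq
    return key


def decode(key):
    """Inverse of encode: unpack a base-5 integer into 32 square codes."""
    board = []
    for _ in range(32):
        key, digit = divmod(key, 5)  # peel off the least significant digit
        board.append(digit)
    return board[::-1]               # restore square 0 to the front


start = [BLACK_MAN] * 12 + [EMPTY] * 8 + [WHITE_MAN] * 12
print(decode(encode(start)) == start)  # True
```

The largest key is 5^32 - 1 (about 2.3 × 10^22), which exceeds the 64-bit range (about 1.8 × 10^19). Python integers have arbitrary precision, but a C or Java version would need a 128-bit type or two words per board.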

  • Learning by Generalization

    Uses a heuristic function to guide moves.

    Changes the heuristic function after games are played, based on their outcome.

    Good in the midgame, but not as good in the opening and endgame.

    Requires identifying the features that affect the game.

  • Development

    Use of the minimax algorithm with alpha-beta pruning

    Use of both learning by rote and learning by generalization

    Temporal difference learning
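Alpha-beta pruning returns the same value as plain minimax but skips branches that cannot change the result. A sketch on a hypothetical nested-list tree (a leaf is a board's fitness value, a list holds the boards one move away); the optional `visited` list exists only to show which leaves were examined.

```python
import math


def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf, visited=None):
    """Minimax value of a nested-list tree, pruning with an (alpha, beta) window.

    alpha: best value the maximizer is already guaranteed.
    beta:  best value the minimizer is already guaranteed.
    """
    if not isinstance(node, list):
        if visited is not None:
            visited.append(node)  # record which leaves were actually evaluated
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta, visited))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # Min already has a better option elsewhere
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta, visited))
        beta = min(beta, value)
        if alpha >= beta:
            break  # Max already has a better option elsewhere
    return value


seen = []
print(alphabeta([[3, 5], [2, 9], [0, 1]], True, visited=seen))  # 3
print(seen)  # [3, 5, 2, 0]
```

Once the first branch guarantees 3, finding a 2 or a 0 in the other branches is enough to discard them, so the leaves 9 and 1 are never evaluated.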

  • Temporal Difference Learning

    In temporal difference learning, the heuristic is adjusted based on the difference between its value at one time and its value at another.

    At equilibrium, the heuristic moves toward the ideal function U(s).

  • Temporal Difference Learning

    There is no proof that predictions closer to the end of the game are better, but common sense suggests they are.

    Changes the heuristic so that it better predicts the value of all boards.

    Adjusts the weights of the heuristic.
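The slides do not give this update as code. The sketch below uses the standard TD(0) rule for a linear heuristic, a textbook rule swapped in here rather than the author's own equation: each weight $c_i$ moves by $\alpha \cdot (\text{target} - H(s)) \cdot F_i(s)$, where the target is the heuristic value of the next board, or the final game result after the last board. The feature vectors, the outcome scale, and the function names are hypothetical.

```python
def linear_h(weights, feats):
    """H(s) = c0*F0(s) + ... + cn*Fn(s) for one board's feature vector."""
    return sum(c * f for c, f in zip(weights, feats))


def td_update(weights, feats, target, alpha):
    """Nudge H(s) toward target; each weight moves in proportion to its feature."""
    error = target - linear_h(weights, feats)  # the temporal difference
    return [c + alpha * error * f for c, f in zip(weights, feats)]


def learn_from_game(weights, game_feats, outcome, alpha):
    """Apply TD(0) along one game's sequence of feature vectors.

    Each board is pulled toward the heuristic value of the board after it;
    the last board is pulled toward the final outcome (e.g. +1 win, -1 loss).
    """
    for t, feats in enumerate(game_feats):
        if t + 1 < len(game_feats):
            target = linear_h(weights, game_feats[t + 1])
        else:
            target = outcome
        weights = td_update(weights, feats, target, alpha)
    return weights


print(td_update([1.0, 1.0], [2, 0], target=4.0, alpha=0.5))  # [3.0, 1.0]
```

Pulling each board's prediction toward the one that follows it encodes the slide's assumption that predictions nearer the end of the game are more reliable; only the weights of features actually present on a board change.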

  • Alpha Value

    The alpha value reduces the size of each change to the heuristic as more data is collected (diminishing returns).

    It is necessary to ensure that rare occurrences do not change the heuristic too much.

  • Development

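    Equation for learning, applied to each weight:

    $w = (\text{previous} - \text{current}) \cdot \frac{\text{previous} + \text{current}}{2}$

    Equation for the alpha value, where n is the amount of data collected:

    $\alpha = \frac{50}{49 + n}$
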
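The alpha formula from the slide, a = 50/(49 + n), can be computed directly. A sketch, assuming n counts how much data has been collected so far (for example, the number of training games, starting at 1); the function names and the way alpha scales a proposed weight change are illustrative.

```python
def alpha(n):
    """Learning rate from the slide: a = 50 / (49 + n), for n >= 1."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return 50 / (49 + n)


def scaled_change(raw_change, n):
    """Hypothetical helper: shrink a proposed weight change by the current alpha."""
    return alpha(n) * raw_change


for n in (1, 51, 151):
    print(n, alpha(n))
# 1 1.0
# 51 0.5
# 151 0.25
```

Early updates use a rate near 1, but by n = 151 every change is scaled to a quarter, so a single unusual game late in training moves the weights only slightly.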
  • Results

    The value of each weight reaches an equilibrium.

    The weights change to reflect what the program has learned.

    Occasionally requires programmer intervention when a weight reaches a false equilibrium.

  • Results

    Learning by rote requires a large data set.

    It requires large amounts of memory.

    It is necessary for determining the alpha value in temporal difference learning.

  • Conclusions

    A good way to find equilibrium weights.

    Sometimes requires intervention.

    Doesn't require much memory.

    Substantial learning could be achieved with relatively few runs.

    Learning did not require the program to know strategies, but it does require the program to play toward a win.
