
CMPS 441 Artificial Intelligence Notes

0.0 Course Information

Primary textbook: Artificial Intelligence: A Guide to Intelligent Systems; Second Edition; Michael Negnevitsky

Secondary textbook: Machine Learning; Tom M. Mitchell

1.1 Quizzes, Assignments, Exams, Projects, Etc.

There will be 3 quizzes on the following material:

1. Overview and Classic AI
2. Statistical and Fuzzy Systems
3. Neural Networks and Evolutionary Systems

The final exam will have two parts. If you have a 93% average on your work prior to the final, your final will have one part, otherwise 2 parts.

There will be several homework assignments, and 3 or more programming projects.

If you do all your work on time and study for the quizzes you should do well.

1.2 Grading

The grade will be based on a weighted average. Quizzes and the final will comprise 60% of the grade; homework, programs, and projects fill out the remaining points.

A university grade scale will be used:

A 90 – 100
B 80 – 89
C 70 – 79
D 60 – 69
F < 60

There will be little to no curving done, and only in certain cases. For example, if a student has a 79.5 average and did all their work on time, they are likely to be bumped from a C to a B. If, on the other hand, a student has a 79.8 average, smoked the tests, but did not turn in all of their assignments, they will receive a C.


On another note, quality of work will also be taken into consideration. An A will be awarded for work that is of excellent quality and fulfills all requirements. If work is turned in that is of excellent quality and goes beyond the requirements, this will be noted and will definitely be taken into account in the grading process.

1.3 Miscellaneous
Learning is like many other endeavors in that practice and repetition aid the process. This class will present several topics. It is important to be clear on the concepts being taught, and it is equally important to put those concepts to practical use through the various homework assignments and projects. The homework assignments and projects help by tying together the concepts learned in the classroom in a practical manner.

For an average student, an easy rule of thumb to ensure an "A" is that for every hour spent in class, 3 hours should be spent outside of class. So for this class, spend about 9 hours a week outside of class and you should get an "A". If it takes time, that is almost better. Difficult material will help you learn strategies for dealing with complex material. Many times gifted students have real trouble when they finally run into material that is hard for them. This is because they have never experienced cases in which they don't know what to do, so they have a difficult time developing a strategy to find answers.

1.0 Introduction
Read chapter 1

Objective: Gain an overview of knowledge about the field.

Assignments: Page 21, Questions 2, 4, 5, 6, 7, 8, 10, 11

2.1 Definitions: Intelligence – the ability to understand and learn things.

o Understand – analyze
o Learn – create a behavior in order to cope with a new situation.

Intelligence – the ability to think and understand instead of doing things by instinct or automatically.

o The second of these two definitions implies that there is local adaptation and learning occurring, rather than preprogrammed or generational learning.

o Many behaviors that bugs, animals, and people exhibit are inherited. For example, deer and horses begin to walk within a short time of being born.

Intelligence – The ability to learn and understand, to solve problems and to make decisions.


Thinking is the activity of using your brain to consider a problem or to create an idea. (page 1)

o Can something without a brain think?
o Does something have to be alive to think?
o If something thinks, is it alive?

2.2 History:
Alan Turing – "Computing Machinery and Intelligence", 1950.

o Proposed the concept of a universal machine (Turing Machine)
o Helped break codes in WWII.
o Designed the "Automatic Computing Engine"
o Wrote the first program that could play chess.
o Some key questions that he asked – still relevant today:

Is there thought without experience? Is there intelligence without life?

o Turing test (the imitation game) – Can a computer communicate with a person so well that the person cannot tell whether they are talking with a computer or a person?

Turing thought that by the year 2000 computers would pass this test.

Early History (1943 – 1956)
o Warren McCulloch and Walter Pitts designed a model of human neurons in which each neuron was either in an on state or an off state. They showed that their model was equivalent to a Turing machine and demonstrated some basic structures.

o Claude Shannon showed the need for heuristics in order to solve complex problems.

Heuristic – (book definition) A strategy that can be applied to complex problems; it usually - but not always – yields a correct solution. Heuristics, which are developed from years of experience, are often used to reduce complex problem solving to more simple operations based on judgement. Heuristics are often expressed as rules of thumb.

What does this definition imply in terms of the solution space?
What does it imply in terms of solution optimality?
How does this compare with a search algorithm or a greedy algorithm?

Heuristic search – A technique that applies heuristics to guide the reasoning and thus reduce the search space for a solution.

Algorithm – (American Heritage Dictionary) A rule or procedure for solving a problem.

What does this definition imply?
o Under defined circumstances, an algorithm will find a correct solution.


Optimal Solution – The best solution to a problem. Many algorithms that find optimal solutions have to check every solution in some way. When the solution space is large, this can be prohibitive.

Claude Shannon's 1950 paper on chess-playing programs pointed out that a typical chess game has about 10^120 possible moves.

How long would it take a computer to pick the first move if it could evaluate 1 move in every time cycle and the computer were a 10 gigahertz (10^10 evaluation cycles per second) machine?
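For scale (assuming one move evaluated per cycle): 10^120 moves at 10^10 evaluations per second is 10^110 seconds, and a year is only about 3 × 10^7 seconds, so the machine would need on the order of 10^102 years.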

o Other notable scientists:
John McCarthy
John Von Neumann
Marvin Minsky

Middle History (Great expectations)
o John McCarthy develops LISP
o Marvin Minsky focuses on formal logic
o McCulloch and Pitts neural networks further developed by Rosenblatt (perceptrons)
o Approach to solving problems:
General methods (weak methods)
Reality sets in (late 60s and early 70s)
Expert systems show success – contrast with weak methods
Neural networks rebirth
Evolutionary computation

2.3 Summary
1. What is Artificial Intelligence – AI is the study of making machines think.
2. Two main branches of AI
   a. Classic AI techniques – rooted in heuristics, symbolic computing, and expert systems.
      i. Chess playing – Alan Turing
      ii. Mycin, Prospector – expert systems developed at Stanford
      iii. Symbolic processing – deterministic searching of solution spaces.
   b. Machine Learning techniques – numerically based: neural networks, non-deterministic solution space searching (genetic algorithm, simulated annealing).
      i. McCulloch/Pitts neurons
      ii. Non-deterministic searching – genetic algorithm, simulated annealing, evolutionary programming.


2.0 LISP
1. LISP

o Second high-level language developed. FORTRAN was the first.
o Program and data take the same form.

Functions:

(car <list>)
o (car '(1 2 3)) => 1

(cdr <list>)
o (cdr '(1 2 3)) => (2 3)
o (cdr '(1)) => NIL

(list <element1> <element2> <element3> ... <elementN>)
o (setf fullhouse (list 2 2 3 3 3))

(setf <variable> <value>)
o (setf eat 8)
o (setf eatthis (list 'pie 'cake))

(length <list>)
o (length fullhouse) => 5

(defun <function name> <arg list> (exp1) (exp2) (exp3) ... (expN))

(cond ((condition 1) (exp1)..(expN))
      ((condition 2) (exp1)..(expN))
      (t (exp1)..(expN)))

(append <list1> <list2>)
o (append (list 1 2 3) (list 4 5)) => (1 2 3 4 5)

(cons <element> <list>)
o (cons (list 1 2 3) (list 4 5)) => ((1 2 3) 4 5)

(eval <list>)
o (eval (car '(1 2 3))) => 1

(mapcar <function name> <list 1> <list 2> ... <list n>)
o (mapcar 'car '((1 a) (2 b) (3 c))) => (1 2 3)
o (mapcar 'abs '(3 -4 2 -5 -6)) => (3 4 2 5 6)
o (mapcar 'cons '(a b c) '(1 2 3)) => ((A . 1) (B . 2) (C . 3))

Examples:

(defun init ()
  (setf big (list 1 2 3 4 5 6 7 8 9 10))
  (setf fullhouse (list 2 2 2 3 3))
)

(defun lastone (list)
  (cond
    ((= (length list) 1)
      (first list)
    )
    (t
      (print list)
      (lastone (cdr list))
    )
  )
)

(defun sumList (addlist)
  (cond
    ((= (length addlist) 1)
      (car addlist)
    )
    (t
      (+ (car addlist) (sumList (cdr addlist)))
    )
  )
)

(defun average (list)
  (/ (sumList list) (length list))
)

(defun middleNum (nums)
  (setf mid (/ (length nums) 2))
  (getNum mid nums)
)

(defun getNum (mid nums)
  (cond
    ((= 1 mid)
      (car nums)
    )
    (t
      (getNum (- mid 1) (cdr nums))
    )
  )
)

(defun insultMe (message)
  (append (list 'Yo 'Mamma) message)
)

Link to LISP Help:
http://www.lisp.org/HyperSpec/FrontMatter/Chapter-Index.html


; Homework assignment 2
; Lisp review
; Fall 2007

(setf big '(1 2 3))

(defun revMe (rlist)
  (cond
    ((= (length rlist) 1)
      rlist
    )
    (t
      (append (revMe (cdr rlist)) (list (car rlist)))
    )
  )
)

; (1 2 3)
; (2 3)
; (3)
; (append (3) (list 2)) => (3 2)
; (append (3 2) (list 1)) => (3 2 1)

(defun countMe (rlist)
  (cond
    ((equal NIL (cdr rlist))
      1
    )
    (t
      (+ 1 (countMe (cdr rlist)))
    )
  )
)

(defun fact (num)
  (cond
    ((= 1 num)
      num
    )
    (t
      (* (fact (- num 1)) num)
    )
  )
)


(defun funTheList (op nums)
  (setf comList (append (list op) nums))
  (print comList)
  (eval comList)
)


3.0 Classic AI Systems (Rule-Based Systems)

While important to our study, we don’t want to spend the whole semester on this topic. We will concentrate on rule-based reasoning and do some LISP.

Here is a link to some LISP literature from Wikipedia:

http://en.wikipedia.org/wiki/Lisp_programming_language#Syntax_and_semantics

Early methods were based on providing the computer with general methods for solving broad classes of problems. The computer searched for solutions using these general methods. This approach is referred to as a "weak method": it applied "weak" – that is, non-problem-specific – information to a task domain. The results were not satisfactory.

Domain specific systems showed success so we will talk about those.

Summary – Weak methods relied upon general problem-solving techniques and little to no a-priori domain-specific information. This approach has shown merit for more trivial problems, but has not been shown to be scalable to larger, more complex problems.

Expert systems operate with an abundance of domain-specific knowledge; they are very narrow in their abilities and quite inflexible. They have had good success in several areas, such as medicine and prospecting.

Key points of interest:

2. Knowledge (page 25)
o What is it?
o Where does it come from?
o How do we represent it?
  Production rules (if–then statements)
3. Structure of AI program (page 31)


The production rules are rules of thumb that we obtained through research; reading books and papers; experimentation; asking experts what they would do in a variety of situations.

The short-term memory consists of a set of facts that describes the current problem to be solved.

The reasoning function acts as an inference engine, a user interface, and a tracing mechanism that can be used to explain the program's line of reasoning.

Inference engine – matches facts in the short-term memory against the rules in the long-term memory.

User interface – communicates with the user. Typically its job would be to collect facts from the user and store them in the short-term memory. When the system draws conclusions from its rules, these may also be stored as facts. If the inference engine comes to an impasse before drawing a conclusion, it may direct the user interface to ask the user for more information so it can continue.

Explanation facility – tracks the rule/fact matching process so that the system can show its line of reasoning.

Other modules may include:
External databases
Models/programs
Developer interface


The key to the functionality of the system is the inference engine. The diagram below shows the inference engine operational procedure.

4. Forward chaining – The accumulation of facts to support a conclusion. We gather information and then infer whatever we can from it. Data driven reasoning.

Database Facts:

{a, b, c, d, e}

Rules:

/* Rule 1. */
if ((y == TRUE) && (d == TRUE)) {
    z = TRUE;
}

/* Rule 2. */
if ((x == TRUE) && (b == TRUE) && (e == TRUE)) {
    y = TRUE;
}


/* Rule 3. */
if (a == TRUE) {
    x = TRUE;
}

/* Rule 4. */
if (c == TRUE) {
    l = TRUE;
}

/* Rule 5. */
if ((l == TRUE) && (m == TRUE)) {
    n = TRUE;
}

Step 1:
Facts = {a, b, c, d, e}
Rule 3 fires => Facts = {a, b, c, d, e, x}

Step 2:
Facts = {a, b, c, d, e, x}
Rule 4 fires => Facts = {a, b, c, d, e, x, l}

Step 3:
Facts = {a, b, c, d, e, x, l}
Rule 2 fires => Facts = {a, b, c, d, e, x, l, y}

Step 4:
Facts = {a, b, c, d, e, x, l, y}
Rule 1 fires => Facts = {a, b, c, d, e, x, l, y, z}

Here our rule search pattern always picked up from the last rule that fired and continued down the list.

When we hit the bottom of the list, we started at the top again.

What would happen if we always started at rule 1 and searched downward? (See the sketch below.)

o Would the results be different?
o Would the path of reasoning be different?
o Does it matter?


5. Backward chaining – Given a likely conclusion, can we find the facts to support it? Goal-driven reasoning.

Database Facts:

{a, b, c, d, e}

Rules:

/* Rule 1. */
if ((y == TRUE) && (d == TRUE)) {
    z = TRUE;
}

/* Rule 2. */
if ((x == TRUE) && (b == TRUE) && (e == TRUE)) {
    y = TRUE;
}

/* Rule 3. */
if (a == TRUE) {
    x = TRUE;
}

/* Rule 4. */
if (c == TRUE) {
    l = TRUE;
}

/* Rule 5. */
if ((l == TRUE) && (m == TRUE)) {
    n = TRUE;
}

Step 1:
Facts = {a, b, c, d, e}
Goal = z; Rule 1 => z; sub-goal required => y

Step 2:
Facts = {a, b, c, d, e}
Goal = y; Rule 2 => y; sub-goals required => x, b, e; b and e exist, x is required

Step 3:
Facts = {a, b, c, d, e}
Goal = x; Rule 3 => x; sub-goals required => none


Now we start the forward-chaining process to confirm that the goal z is supportable with the existing fact set.

Step 4:
Facts = {a, b, c, d, e}
Rule 3 fires => Facts = {a, b, c, d, e, x}

Step 5:
Facts = {a, b, c, d, e, x}
Rule 2 fires => Facts = {a, b, c, d, e, x, y}

Step 6:
Facts = {a, b, c, d, e, x, y}
Rule 1 fires => Facts = {a, b, c, d, e, x, y, z}

4.0 Statistical Based Systems

Here we will introduce probability, conditional probability, and Bayesian reasoning. We will relate it back to section 3.

4.1 Introduction to Probability

The science of odds and counting.

Flip a coin. What is the chance of it landing heads side up? There are 2 sides, heads and tails. The heads side is one of them, thus

P(Heads) = (Heads side) / (Heads side + Tails side)

P(Heads) = 1/(1+1) = 1/2 = .50 = 50%

Throw a die. What is the probability of getting a 3?

P(die = 3) = (1 side with a 3) / (6 sides total)

P(die = 3) = 1/6 = .167 = 16.7%


4.2 Conditional Probability (page 59)
Events A and B are events that are not mutually exclusive, but occur conditionally on the occurrence of one another.

The probability that event A will occur:

p(A)

The probability that event B will occur:

p(B)

The number of times that both A and B occur or the probability that both events A and B will occur is called the joint probability.


Mathematically joint probability is defined as:

p(A ∩ B)

i.e. the probability that both A and B will occur

The probability that event A will occur if event B occurs is called conditional probability.

P(A|B) = (the number of times A and B can occur) / (the number of times B can occur)

or

P(A|B) = P(A and B) / P(B)

or

P(A|B) = P(A ∩ B) / P(B)

Let's take an example.


Suppose we have 2 dice and we want to know what the probability of getting an 8 is. Normally if we roll both at the same time the probability is 5/36.

1,1 1,2 1,3 1,4 1,5 1,6
2,1 2,2 2,3 2,4 2,5 2,6
3,1 3,2 3,3 3,4 3,5 3,6
4,1 4,2 4,3 4,4 4,5 4,6
5,1 5,2 5,3 5,4 5,5 5,6
6,1 6,2 6,3 6,4 6,5 6,6

But what happens if we roll the first die and get a 5, now what is the probability of getting an 8?

There is only one way to get an 8 after a 5 has been rolled. You have to roll a 3.

Looking at the formula:

P(A|B) = P(A ∩ B) / P(B)

Let's rephrase it for our problem:

P(getting an 8 with 2 dice | we roll a 5 with the first die) = P(rolling a 5 and a 3) / P(rolling a 5)

P(A|B) = (1/36) / (1/6) = (1/36) * (6/1) = 6/36 = 1/6

So the chances improve slightly: 1/6 > 5/36, by only 1/36.

Why do they improve, but only slightly?

An intuitive explanation is that the only way to not be able to get an 8 using two dice rolled in sequence (one then the other) is if the first die is a 1. The chance of that is only 1/6. So it does not really matter what the first die is; as long as it is not a 1, you can still get an 8 with the combination of the two.
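This can also be checked by brute force; a quick Common Lisp sketch that counts over the 36 equally likely outcomes (count-outcomes is my own helper):

(defun count-outcomes (test)
  "How many of the 36 ordered two-dice outcomes satisfy TEST?"
  (let ((n 0))
    (dotimes (d1 6)
      (dotimes (d2 6)
        (when (funcall test (+ d1 1) (+ d2 1))
          (incf n))))
    n))

;; A and B: first die is 5 AND the sum is 8 -> 1 outcome, so P = 1/36
(count-outcomes (lambda (d1 d2) (and (= d1 5) (= (+ d1 d2) 8))))
;; B: first die is 5 -> 6 outcomes, so P = 6/36 = 1/6
(count-outcomes (lambda (d1 d2) (declare (ignore d2)) (= d1 5)))

P(A|B) = (1/36) / (1/6) = 1/6, matching the calculation above.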


4.3 Bayes Rule
Bayes theorem is an algebraic rewriting of conditional probability. The explanation from Negnevitsky follows (page 59):

Starting with conditional probability:

P(A|B) = P(A ∩ B) / P(B)

P(A ∩ B) = P(A|B) * P(B)

Recognizing that the intersection of A and B is commutative:

P(A ∩ B) = P(B ∩ A)

And that the following is true:

P(B|A) = P(B ∩ A) / P(A)

P(B ∩ A) = P(B|A) * P(A)

Remembering that the intersection is commutative, we can form the following equation:

P(A ∩ B) = P(B|A) * P(A)

Now this can be substituted into the conditional probability equation:

P(A|B) = P(A ∩ B) / P(B)

Yielding Bayes theorem:

P(A|B) = P(B|A) * P(A) / P(B)


We like to rewrite it using h and D, h being a hypothesis that we are testing and D being data that we want to support our hypothesis. Bayes theorem:

p(h|D) = p(D|h) * p(h) / p(D)

Machine Learning, Mitchell, page 156 gives some intuitive meaning to the equations on pages 57 – 60 of Negnevitsky.

P(h|D) – the probability that hypothesis h is true given the data D. Often referred to as the posterior probability. It reflects our confidence that h is true after we have seen the data D. The posterior probability reflects the influence of the data D, while the prior probability P(h) does not; it is independent of D.

P(D|h) – the probability that data D exists when hypothesis h is true. Think of h as the answer to what happened in a murder case. One way to think of this: if h is true, what is the probability that the data (evidence) D will exist? D is the evidence that backs up the case; it is what proves that h is true. In general, P(x|y) denotes the probability of x given y. It is the conditional probability as discussed above.

P(h) – the initial probability that hypothesis h holds, before we have observed the data. Often called the prior probability of h. It will reflect any background knowledge that we have about the correctness of hypothesis h.

If we have no initial knowledge about the hypotheses (h0…hn), we would divide the probability equally among the set of available hypotheses (of which h is a member).

P(D) – the prior probability that the data D will be observed (this is the probability of D given no prior knowledge that h will hold). Remember that it is completely independent of h.

Looking at the equation we can make some observations:

P(h|D), the probability of h being true given the presence of D, increases with the commonness of h being true independently (a larger prior P(h)).

P(h|D), the probability of h being true given the presence of D, increases with the likelihood of data D being associated with hypothesis h. That is, the higher our confidence that data D is present only when h is true, the more the presence of D says for our hypothesis.


When p(D) is high, that means our evidence is likely to exist independently of h, so it weakens the link between h and D.

4.4 Bayes Rule in AI

Brute Force Bayes Algorithm

Here is a basic Bayes rule decision algorithm:

Initialize H, a set of hypotheses such that h0 … hn ∈ H
For each hi in H, calculate the posterior probability

p(hi|D) = p(D|hi) * p(hi) / p(D)

Output the hi with the maximum posterior, MAX(h0 … hn)
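A sketch of this loop in Common Lisp, with made-up priors p(hi) and likelihoods p(D|hi) for three abstract hypotheses. Since p(D) is the same for every hi, the sketch compares unnormalized posteriors:

;; Each hypothesis is (name p(h) p(D|h)); the numbers are made up.
(defparameter *hypotheses*
  '((h0 0.10 0.80)
    (h1 0.30 0.40)
    (h2 0.60 0.05)))

(defun posterior (h)
  "Unnormalized posterior p(D|h) * p(h)."
  (* (second h) (third h)))

(defun best-hypothesis ()
  (first (reduce (lambda (a b)
                   (if (> (posterior a) (posterior b)) a b))
                 *hypotheses*)))

;; (best-hypothesis) => H1   ; .30 * .40 = .12 beats .08 and .03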

Naïve Bayes Classifier

This classifier applies to tasks in which each example is described by a conjunction of attributes and the target value f(x) can take any value from the set v.

A set of training examples for f(x) is provided.

In this example we want to use Bayes theorem to find out the likelihood of playing tennis for a given set of weather attributes.

f(x) ∈ v = (yes, no), i.e. v = (yes we will play tennis, no we will not play tennis)

The attribute values are a0…a3 = (Outlook, Temperature, Humidity, Wind).


To determine our answer (if we are going to play tennis given a certain set of conditions) we make an expression that determines the probability based on our training examples from the table.

Day  Outlook   Temperature  Humidity  Wind    Play Tennis
1    Sunny     Hot          High      Weak    No
2    Sunny     Hot          High      Strong  No
3    Overcast  Hot          High      Weak    Yes
4    Rain      Mild         High      Weak    Yes
5    Rain      Cool         Normal    Weak    Yes
6    Rain      Cool         Normal    Strong  No
7    Overcast  Cool         Normal    Strong  Yes
8    Sunny     Mild         High      Weak    No
9    Sunny     Cool         Normal    Weak    Yes
10   Rain      Mild         Normal    Weak    Yes
11   Sunny     Mild         Normal    Strong  Yes
12   Overcast  Mild         High      Strong  Yes
13   Overcast  Hot          Normal    Weak    Yes
14   Rain      Mild         High      Strong  No

Remembering Bayes Rule:

p(h|D) = p(D|h) * p(h) / p(D)

We write our f(x) in that form:

P(Play Tennis | Attributes) = P(Attributes | Play Tennis) * P(Play Tennis) / P(Attributes)

Or

P(v|a) = P(a|v) * P(v) / P(a)

Let's look closely at P(a|v).

P(a|v) = P(a0…a3 | v0,1)

Or

P(a|v) = P(Outlook, Temperature, Humidity, Wind | Play tennis, Don’t Play tennis)


In order to get a table with reliable measurements for every combination of each attribute a0…a3 for each hypothesis v0,1, our table would have to be of size 3*3*2*2*2 = 72, and each combination would have to be observed multiple times to ensure its reliability. Why? Because we are assuming an inter-dependence of the attributes (probably a good assumption). The Naïve Bayes classifier is based on simplifying away this assumption. That is to say, cool temperature is treated as completely independent of it being sunny, and so on.

So:

P(a0…a3 | v0) ≈ P(a0|v0) * P(a1|v0) * … * P(an|v0)
P(a0…a3 | v1) ≈ P(a0|v1) * P(a1|v1) * … * P(an|v1)

or

P(a0…an | vj) ≈ ∏i P(ai | vj)

A more concrete example:

P(outlook = sunny, temperature = cool, humidity = normal, wind = strong | Play tennis)

≈ P(outlook = sunny | Play tennis) * P(temperature = cool | Play tennis) * P(humidity = normal | Play tennis) * P(wind = strong | Play tennis)

The probability of observing P(a0…an | vj) is taken to be equal to the product of the probabilities of observing the individual attributes. Quite an assumption.

Using the table of 14 examples we can calculate our overall probabilities and conditional probabilities.

First we estimate the probability of playing tennis:

P(Play Tennis = Yes) = 9/14 = .64
P(Play Tennis = No) = 5/14 = .36

Then we estimate the conditional probabilities of the individual attributes. Remember this is the step in which we are assuming that the attributes are independent of each other:

Outlook:
P(Outlook = Sunny | Play Tennis = Yes) = 2/9 = .22
P(Outlook = Sunny | Play Tennis = No) = 3/5 = .60

P(Outlook = Overcast | Play Tennis = Yes) = 4/9 = .44


P(Outlook = Overcast | Play Tennis = No) = 0/5 = 0

P(Outlook = Rain | Play Tennis = Yes) = 3/9 = .33
P(Outlook = Rain | Play Tennis = No) = 2/5 = .40

Temperature:
P(Temperature = Hot | Play Tennis = Yes) = 2/9 = .22
P(Temperature = Hot | Play Tennis = No) = 2/5 = .40

P(Temperature = Mild | Play Tennis = Yes) = 4/9 = .44
P(Temperature = Mild | Play Tennis = No) = 2/5 = .40

P(Temperature = Cool | Play Tennis = Yes) = 3/9 = .33
P(Temperature = Cool | Play Tennis = No) = 1/5 = .20

Humidity:
P(Humidity = High | Play Tennis = Yes) = 3/9 = .33
P(Humidity = High | Play Tennis = No) = 4/5 = .80

P(Humidity = Normal | Play Tennis = Yes) = 6/9 = .66
P(Humidity = Normal | Play Tennis = No) = 1/5 = .20

Wind:
P(Wind = Weak | Play Tennis = Yes) = 6/9 = .66
P(Wind = Weak | Play Tennis = No) = 2/5 = .40

P(Wind = Strong | Play Tennis = Yes) = 3/9 = .33
P(Wind = Strong | Play Tennis = No) = 3/5 = .60

Suppose the day is described by :

a = (Outlook = sunny, Temperature = cool, Humidity = high, Wind = strong)

What would our Naïve Bayes classifier predict in terms of playing tennis on a day like this?

Whichever equation has the higher probability (greater numerical value)

P(Playtennis = Yes | (Outlook = sunny, Temperature = cool, Humidity = high, Wind = strong))

Or

P(Playtennis = No | (Outlook = sunny, Temperature = cool, Humidity = high, Wind = strong))


is the prediction of the Naïve Bayes classifier. (For brevity we omit the attribute names below.)

Working through the first equation….

P(Yes|(sunny, cool, high, strong)) = P((sunny, cool, high, strong) | Yes) * P(Yes) / P(sunny, cool, high, strong)

Now we do the independent substitution for:

P((sunny…..)|Yes)

And noting that the denominator, P(sunny, cool, high, strong), includes both the

P((sunny, cool, high, strong) | Yes) * P(Yes) and

P((sunny, cool, high, strong) | No) * P(No)

cases, our equation expands to:

= [P(sunny|Yes) * P(cool|Yes) * P(high|Yes) * P(strong|Yes) * P(Yes)] / [P((sunny, cool, high, strong) | Yes) * P(Yes) + P((sunny, cool, high, strong) | No) * P(No)]

Remember, the quantities in the denominator are expanded using the independence assumption in a similar way to the numerator.

= [(.22 * .33 * .33 * .33) * .64] / [(.22 * .33 * .33 * .33) * .64 + (.6 * .2 * .8 * .6) * .36]

= .0051 / (.0051 + .0207)

= .1977

Working through the second equation in a similar fashion….

P(No|(sunny, cool, high, strong)) = P((sunny, cool, high, strong) | No) * P(No) / P(sunny, cool, high, strong)

= .0207 / (.0051 + .0207)

= .8023


As we can see, the Naïve Bayes classifier gives a value of just about 20% for playing tennis in the described conditions, and a value of 80% for not playing tennis; therefore the prediction is that no tennis will be played on a day like this.
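The whole classifier fits in a few lines of Common Lisp. Here is a sketch with the 14 training examples hard-coded (names like cond-prob are my own):

;; Each example: (outlook temperature humidity wind class)
(defparameter *examples*
  '((sunny hot high weak no)          (sunny hot high strong no)
    (overcast hot high weak yes)      (rain mild high weak yes)
    (rain cool normal weak yes)       (rain cool normal strong no)
    (overcast cool normal strong yes) (sunny mild high weak no)
    (sunny cool normal weak yes)      (rain mild normal weak yes)
    (sunny mild normal strong yes)    (overcast mild high strong yes)
    (overcast hot normal weak yes)    (rain mild high strong no)))

(defun class-count (class)
  (count class *examples* :key #'fifth))

(defun cond-prob (slot value class)
  "P(attribute in position SLOT = VALUE | CLASS), estimated from the table."
  (/ (count-if (lambda (ex) (and (eq (nth slot ex) value)
                                 (eq (fifth ex) class)))
               *examples*)
     (class-count class)))

(defun score (attrs class)
  "Unnormalized posterior: P(class) * product of P(ai | class)."
  (let ((p (/ (class-count class) (length *examples*))))
    (loop for a in attrs
          for slot from 0
          do (setf p (* p (cond-prob slot a class))))
    p))

(defun classify (attrs)
  (if (> (score attrs 'yes) (score attrs 'no)) 'yes 'no))

;; (classify '(sunny cool high strong)) => NO, matching the hand calculation.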

5.0 Fuzzy Systems
Chapter 4 in the book – page 87.

Statements and Definitions:

The father of fuzzy logic is Lotfi Zadeh.
The word fuzzy is the opposite of crisp. In our conventional programming logic we use Boolean logic. Boolean logic uses sharp distinctions (yes/no, true/false, etc.).

o Question with crisp answer: Q: Do you want a donut? A: Yes.
o Fuzzy answer to the same question: Q: Do you want a donut? A: Sort of.

Fuzzy logic is not logic that is fuzzy, but logic that is used to describe fuzziness. It is the theory of fuzzy sets, sets that describe fuzziness.

Fuzzy logic is useful to us because it helps us quantify and describe the words that people use when describing and thinking about problems and situations. Examples:

o The car is really quick.
o Tom's back is quite hairy.
o The motor is running really hot.

Considering the examples above, using Boolean logic we would need to define a point at which a car is really quick – but where is that point?


A Camaro is considered really quick by many people. If it runs a 14.1-second quarter mile, are all cars that are slower than Camaros really slow?

How many hairs does someone need to have on their back for it to be quite hairy? If Tom's back has 2500 hairs, and Mary's has 2200, is Mary lucky because she does not have a hairy back?

If we define really hot as 245 degrees or higher, is it normal for my car to run at 243 degrees?

As can be seen from the examples above, under Boolean logic a car is either quick or slow, with nothing in between.


5.1 Fundamentals of Fuzzy Logic
The basic idea of fuzzy set theory is that an element belongs to a fuzzy set with a certain degree of membership. Thus a proposition may be partly true or partly false. This degree is a real number between 0 and 1.

The universe of discourse is the range of all possible values applicable to a chosen variable. The classic example is the set of tall men. The elements of the set are all men, but they have varying degrees of membership based on their heights. The universe of discourse is their heights.

Name    Height (cm)   Degree of membership
                      Crisp   Fuzzy
Chris   208           1       1.0
Mark    205           1       1.0
John    198           1       .98
Tom     181           1       .82
David   179           0       .78
Mike    172           0       .24
Bob     167           0       .15
Steven  158           0       .06
Bill    155           0       .01
Peter   152           0       .00

Below we graph the degree of membership versus height for both Crisp and Fuzzy sets.


5.2 Set definitions

Crisp

Let X be the universe of discourse; x is a member of X.
Let A be a crisp set.
The membership function fA(x) determines whether x is a member of A.

fA(x) : X → {0, 1}

where

fA(x) = 1 if x is a member of A; fA(x) = 0 if x is not a member of A.

Fuzzy

Let X be the universe of discourse; x is a member of X.
Let A be a fuzzy set.
The membership function µA(x) determines the degree of membership element x has in fuzzy set A.

µA(x) : X → [0, 1]

µA(x) = 1 if x is totally in A;
µA(x) = 0 if x is not in A;
0 < µA(x) < 1 if x is partly in A.

5.3 Defining membership functions
So the key here is how to define the membership function µA(x).

Let's take another example:

Universe of discourse X – men’s heights

X = {x1, x2, x3, x4, x5}

A is a crisp subset of X such that A = { x2, x3 }

Membership in A can be described by:

A = {(x1, 0), (x2, 1), (x3, 1), (x4, 0), (x5, 0)}

That is, A is a set of pairs {(xi, µA(xi))} where µA(xi) is the membership function of element xi in the subset A.


Representation of Fuzzy sets:

A = {(x1, µA(x1)), (x2, µA(x2)), … (xn, µA(xn))}

Or

A = {µA(x1)/x1, µA(x2)/x2, … µA(xn)/xn}

The function µA is the function that determines an element's degree of membership in the set. Typical functions are Gaussian, sigmoid, etc. Linear functions are commonly used because they take less computation time; thus the linear fit function. A fit vector can be used. Below are the fit vectors for the 'short, average, tall men' graph.

For the above example the fit vectors are:

Tall men = { 0/180, .5/185, 1/190} = {0/180, 1/190}

Short men = {1/160, .5/165, 0/170} = {1/160, 0/170}

Average men = {0/165, 1/175, 0/185}
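Looking up a degree of membership from a fit vector is just linear interpolation between the knot points. A minimal Common Lisp sketch (the (mu . x) pair representation is my own choice):

;; A fit vector as a list of (mu . x) knots in increasing x order,
;; e.g. tall men = {0/180, 1/190}.
(defun membership (fit x)
  "Degree of membership of X, interpolating linearly between knots."
  (let ((first-pt (first fit))
        (last-pt (car (last fit))))
    (cond ((<= x (cdr first-pt)) (car first-pt))   ; clamp left of the first knot
          ((>= x (cdr last-pt)) (car last-pt))     ; clamp right of the last knot
          (t (loop for (p q) on fit
                   when (and q (<= (cdr p) x (cdr q)))
                   return (let ((m (/ (- (car q) (car p))     ; m = (y1 - y0)/(x1 - x0)
                                      (- (cdr q) (cdr p)))))
                            (+ (car p) (* m (- x (cdr p))))))))))

;; (membership '((0 . 180) (1 . 190)) 185)           => 1/2   ; tall men
;; (membership '((0 . 165) (1 . 175) (0 . 185)) 180) => 1/2   ; average men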


5.4 Linguistic Variables and Hedges
Example –

John is tall.

The linguistic variable John takes the linguistic value tall.

We can use these variables in fuzzy expert systems.

IF wind is strong
THEN sailing is good

IF John is tall
THEN John runs fast

We use hedges (very, somewhat, quite, more or less, etc) to modify our linguistic variables.

tall

height   linear   a little (µA(x)^1.3)   very (µA(x)^2)   more or less (sqrt(µA(x)))
170      0        0                      0                0
180      0        0                      0                0
181      0.1      0.050119               0.01             0.316228
182      0.2      0.123407               0.04             0.447214
183      0.3      0.209054               0.09             0.547723
184      0.4      0.303863               0.16             0.632456
185      0.5      0.406126               0.25             0.707107
186      0.6      0.51475                0.36             0.774597
187      0.7      0.628966               0.49             0.83666
188      0.8      0.748199               0.64             0.894427
189      0.9      0.871998               0.81             0.948683
190      1        1                      1                1
200      1        1                      1                1


[Chart: "Linguistic variables and hedges" – membership vs. height (170–200 cm) for tall (linear), tall (a little), tall (very), and tall (more or less).]

To calculate these variable values use the line equation to find the linear function:

y = m*x + b

m = (y1 – y0)/(x1 – x0)

b = y0 – m*x0

Then do the operation that the hedge calls for.
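In Common Lisp the hedges are one-liners applied to the membership value; a sketch using the linear tall set from the table above (the clamping in tall is my own detail):

(defun tall (height)
  "Linear membership: 0 at 180 cm rising to 1 at 190 cm."
  (max 0 (min 1 (/ (- height 180) 10))))

(defun a-little (mu) (expt mu 1.3))
(defun very (mu) (expt mu 2))
(defun more-or-less (mu) (sqrt mu))

;; (very (tall 185))         => 1/4        ; matches the 185 row: .5^2 = .25
;; (more-or-less (tall 185)) => 0.70710677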

5.45 Operations on Fuzzy sets – see page 97

5.5 Putting it all together

Fuzzy Rules:
IF x is A
THEN y is B

x, y : linguistic variables (crisp)

A, B : linguistic values determined by fuzzy sets on the universe of discourses X, and Y.


Example:

IF speed over speed limit is somewhat fast
THEN ticket is very expensive

[Chart: membership vs. speed over the speed limit (0–80 mph) for the "somewhat fast" set.]

[Chart: membership vs. ticket expense (0–160 dollars) for the "very expensive" set.]

Let speed over speed limit = 30 mph

That gives a degree of membership of .77 in the somewhat fast set.


Then the .77 degree of membership in the somewhat fast set maps to a ticket costing between 80 and 100 dollars in the very expensive ticket set.

speed over (mph)   fast   somewhat fast
0                  0      0
5                  0.1    0.316228
10                 0.2    0.447214
15                 0.3    0.547723
20                 0.4    0.632456
25                 0.5    0.707107
30                 0.6    0.774597
35                 0.7    0.83666
40                 0.8    0.894427
45                 0.9    0.948683
50                 1      1
55                 1      1
60                 1      1
65                 1      1
70                 1      1

ticket cost ($)   expensive   very expensive
0                 0           0
20                0.2         0.04
40                0.4         0.16
60                0.6         0.36
80                0.8         0.64
100               1           1
120               1           1
140               1           1

To find the exact value of the ticket, you would need to interpolate between 80 and 100 dollars.

You need to make sure that you can do this. What are the fit vectors for these two sets?

How does it all work?

Rule 1:
IF project_funding is adequate
OR project_staffing is small
THEN risk is low

Rule 2:
IF project_funding is marginal
AND project_staffing is large
THEN risk is normal

Rule 3:
IF project_funding is inadequate
THEN risk is high

Rule 1:
IF x is A3
OR y is B1
THEN z is C1

Rule 2:
IF x is A2
AND y is B2
THEN z is C2

Rule 3:
IF x is A1
THEN z is C3

funding:
percent   adequate   low    high
0         0          1      0
10        0          1      0
20        0          1      0
30        0          0.8    0
40        0.33       0.6    0
50        0.67       0.4    0
60        1          0.2    0
70        0.67       0      0.25
80        0.33       0      0.5
90        0          0      0.75
100       0          0      1

staffing:
percent   small   large
0         1       0
10        1       0
20        1       0
30        0.8     0
40        0.6     0.2
50        0.4     0.4
60        0.2     0.6
70        0       0.8
80        0       1
90        0       1
100       0       1

risk:
percent   low    high   normal
0         1      0      0
10        1      0      0
20        0.66   0      0
30        0.33   0      0.33
40        0      0      0.67
50        0      0      1
60        0      0      0.67
70        0      0.33   0.33
80        0      0.66   0
90        0      1      0
100       0      1      0

[Chart: degree of membership vs. percent of funding for the adequate, low, and high sets.]


[Chart: degree of membership vs. percent staffing for the small and large sets.]

[Chart: degree of membership vs. percent risk for the low, high, and normal sets.]

5.6 Mamdani-style inference

1. Fuzzification
2. Rule evaluation
3. Aggregation of rule outputs
4. Defuzzification


Example:

Step 1: Fuzzify inputs
Let project funding = 35%

µ(inadequate_funding) = µ(A1) = .5;

µ(marginal_funding) = µ(A2) = .2;

µ(adequate_funding) = µ(A3) = 0.0;

Let project staffing = 60%

µ(staffing_small) = µ(B1) = .1;

µ(staffing_large) = µ(B2) = .7;

Step 2: Evaluate the rules

When we evaluate the rules, we must evaluate all of the IF parts (antecedents) and find values for the THEN parts (consequents). For connectives like AND and OR we apply operations like min (for AND) and max (for OR).

Rule 1:
(A3 OR B1) ≈ max(0, .1) = .1

Mapping .1 to the risk_low set we get:

C1 = .1

Rule 2:
(A2 AND B2) ≈ min(.2, .7) = .2

Mapping .2 to the risk_normal set we get:

C2 = .2

Rule 3:
A1 = .5

C3 = .5
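In code, this rule-evaluation step is just min and max over the fuzzified values from Step 1; a small Common Lisp sketch:

(let ((a1 .5) (a2 .2) (a3 0.0)   ; funding: inadequate, marginal, adequate
      (b1 .1) (b2 .7))           ; staffing: small, large
  (list :c1 (max a3 b1)          ; Rule 1: A3 OR B1  -> clip risk_low
        :c2 (min a2 b2)          ; Rule 2: A2 AND B2 -> clip risk_normal
        :c3 a1))                 ; Rule 3: A1        -> clip risk_high
;; => (:C1 0.1 :C2 0.2 :C3 0.5)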


Step 3: Aggregation of consequents
In this step we unify the outputs of all the rules.

In our example we clipped the top off each of our 3 risk sets, so now we sum these areas into one resultant area. See the bottom of figure 4.10 on page 108.

Step 4: Defuzzification
In order to turn our resultant fuzzy sum (we summed the clipped fuzzy sets from each of our rules into 1 big fuzzy resultant set) into a single crisp value, we find the center of gravity (COG) of the set.

What we mean by center of gravity is that we want to locate the point at where we can slice the set into two equal pieces.

We can use the following formula to get an estimate of the COG.

COG ≈ ( ∑x=a..b µA(x) * x ) / ( ∑x=a..b µA(x) )

µA(x) – the consequent of each of our rules
x – the x-axis value of the set
b – the upper range of the x domain of the set
a – the lower range of the x domain of the set

x        y1     |  x        y2
9.99     0      |  30.001   0
10       0.1    |  30.001   0.3
20       0.1    |  40       0.3
30       0.1    |  50       0.3
30.001   0      |  60       0.3
                |  60.001   0

[Chart: "Center gravity test" – membership vs. x (0–70) for the step functions y1 and y2.]


For our example problem:

COG = [(0 + 10 + 20)*.1 + (30 + 40 + 50 + 60)*.2 + (70 + 80 + 90 + 100)*.5] / [.1 + .1 + .1 + .2 + .2 + .2 + .2 + .5 + .5 + .5 + .5]

= (3 + 36 + 170) / 3.1

= 209/3.1 = 67.4%

So the aggregate risk is 67.4%
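As code, the COG estimate is one line over the sampled memberships; a Common Lisp sketch (cog is my own name):

(defun cog (xs mus)
  "Center of gravity of a fuzzy set sampled at points XS with memberships MUS."
  (/ (reduce #'+ (mapcar #'* xs mus))
     (reduce #'+ mus)))

;; The aggregated risk set sampled every 10%:
;; (cog '(0 10 20 30 40 50 60 70 80 90 100)
;;      '(.1 .1 .1 .2 .2 .2 .2 .5 .5 .5 .5))
;; => 67.41935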

Assignment questions – page 126: 1, 2, 3, 4, 5, 6, 7, 8, 11

6.0 Evolutionary Systems/Non Deterministic Searching

Chapter 7, page 219

Now we change our tack from the use of a-priori knowledge to one of adaptation.

Evolutionary Computation consists of:
Genetic Algorithm – computer optimization based on Darwinian processes in biology.
Evolutionary strategies – similar to the Genetic Algorithm, but uses statistically generated offsets to vary the solution.
Genetic Programming – use of the Genetic Algorithm to generate code.

Based on:
Darwin's classical theory of evolution
Weismann's theory of natural selection
Mendel's concept of genetics

Process:
Reproduction
Mutation
Competition
Selection

Species undergo various pressures:
Environmental
Competition for resources among:


o Themselves
o Other species
Dangers from other species

Topics include:

Genetic algorithm
Genetic programming
Simulated annealing

6.1 Basic Genetic algorithm


Pseudo Code

#define popSize 100
#define chromoSize 20
#define numGens 100
#define MU 10

char population[popSize][chromoSize]
char newPop[popSize][chromoSize]

int scores[popSize]

void main()
{
    int j, k
    int kid, mom, dad

    /* Set up random number generator. */
    srand(time)

    Generate Population(popSize)

    for j = 1 to numGens
        rankPop(popSize, chromoSize, "hello tv land!")
        sortLo2Hi(popSize)

        for k = 1 to popSize
            kid = k
            mom = getMom(popSize)
            dad = getDad(popSize)
            crossOver(mom, dad, kid, chromoSize)
            mutate(MU, chromoSize, kid)
        end for

        population = newPop
    end for
}

void Create Chromosome (int chromoIndex, int chromoSize)
{
    range = high limit - lower limit

    for j = 1 to chromoSize
        chromo[chromoIndex][j] = rand() % range + lower limit
    end for
}

void Generate Population (int popSize)
{
    for j = 1 to popSize
        population[j] = Create Chromosome(j, chromoSize)
    end for
}

int Get fitness (int chromoIndex, int chromoSize, char *fitString)
{
    int j
    int sum

    sum = 0
    for j = 1 to chromoSize
        sum = sum + abs(fitString[j] - population[chromoIndex][j])
    end for

    return sum
}

void rankPop(int popSize, int chromoSize, char *string)
{
    int j

    for j = 1 to popSize
        scores[j] = Get fitness(j, chromoSize, string)
    end for
}

void sortLo2Hi(int popSize)
{
    int j, k

    for j = 1 to popSize
        for k = j + 1 to popSize
            if (scores[j] > scores[k])    /* swap so the lowest (best) scores end up first */
                exchange(scores[j], scores[k])
                exchange(population[j], population[k])
            end if
        end for k
    end for j
}

int getMom(int popSize)
{
    top = .3 * popSize    /* parents are drawn from the best 30% of the sorted population */

    mom = rand() % top

    return mom
}

int getDad(int popSize)
{
    return getMom(popSize)
}

void crossOver(int mom, int dad, int kid, int chromoSize)
{
    int cross, j

    cross = rand() % chromoSize

    for j = 1 to cross
        newPop[kid][j] = population[mom][j]
    end for

    for j = cross to chromoSize
        newPop[kid][j] = population[dad][j]
    end for
}

void mutate(int chance, int chromoSize, int kid)
{
    int muGene

    if (rand() % 100 < chance)
        muGene = rand() % chromoSize
        ....... Alter newPop[kid][muGene] ......
    end if
}


Example – Spell "hello tv land"
Generations: 100
Population: 100
Mutation rate: 1

hello tv land D≤N{╫WnX(╬q?I generation 0; best score 608.000000 'M[ª}ëïï‼→Çg generation 1; best score 460.000000 ÇÇsÇÇÇÇÇÇÇÇg generation 2; best score 431.000000 ÇÇsÇÇ►ïì‼▬Çj generation 3; best score 355.000000 fÇÇÇÇ►ïì‼▬Çj generation 4; best score 342.000000 fÇÇÇÇ►|Ç~|Çgk► generation 5; best score 320.000000 `ÇÇÇÇ►ïì‼|~gk► generation 6; best score 259.000000 fÇÇÇê►ïÇ‼|sgh► generation 7; best score 246.000000 fÇÇÇÇ►|ì‼|}gh► generation 8; best score 246.000000 fÇÇÇÇ►|Ç‼|~gk► generation 9; best score 237.000000 SÇÇÇÇ►ïÇ‼u~gh► generation 10; best score 223.000000 SpÇÇê►ïÇ▬|sgh☼ generation 11; best score 209.000000 SÇ⌂ÇÇ►|Ç‼|sgh► generation 12; best score 203.000000 SpÇyÇ►ïÇ‼usgh► generation 13; best score 189.000000 SpÇyÇ►|Ç‼ufgh► generation 14; best score 161.000000 RiÇÇÇ►|Ç‼ufgh► generation 15; best score 160.000000 SpÇyy►|Ç‼ufgh► generation 16; best score 154.000000 SpÇyÇ►|x‼ufgh► generation 17; best score 153.000000 SiÇwÇ►|z‼ufgh► generation 18; best score 146.000000 SiÇwÇ►|x‼ufgh► generation 19; best score 144.000000 RiÇyÇ►|x‼ufgh↔ generation 20; best score 132.000000 RhÇly►|Ç ufgh► generation 21; best score 119.000000 ShÇjy►|v‼ofgh↔ generation 22; best score 106.000000 ShÇjy►|v‼ofgh↔ generation 23; best score 106.000000 RhÇly►|z ufgh↔ generation 24; best score 100.000000 Oisy|∟yz ufgh↔ generation 25; best score 86.000000 Oisjy∟yz ufgh↔ generation 26; best score 72.000000 Oisjy∟yz ofgh↔ generation 27; best score 66.000000 Oisjy∟yv ufjh▲ generation 28; best score 64.000000 Oisjy∟yv ufjh▲ generation 29; best score 64.000000 Oiljy∟yv ufjh▲ generation 30; best score 57.000000 Khljy∟|v u`gh▲ generation 31; best score 54.000000 Khljy∟|v u`jh▲ generation 32; best score 51.000000 Ohljy∟yv u`kh generation 33; best score 49.000000 Khlky∟zv ufmh generation 34; best score 47.000000 Khljy∟yv s`kh generation 35; best score 43.000000 Kills∟yv u`kh▲ generation 36; best score 40.000000 Khljl∟yv u`lh generation 37; best score 37.000000 Kilks∟yv qclh generation 38; best score 35.000000 Kdlls∟yv s`mh generation 39; best score 31.000000


Kdlls∟yv h`mh generation 40; best score 28.000000 Kdlls▼yv g`mh generation 41; best score 26.000000 Kdllo▼yt q`mh generation 42; best score 24.000000 Kdllo▼yv q`mh generation 43; best score 22.000000 Hdllo▼yt h`mh generation 44; best score 20.000000 Hdllo▼yv g`me generation 45; best score 16.000000 Hdllo▼yv i`me generation 46; best score 14.000000 Hdllo▼yv j`me generation 47; best score 13.000000 Heljo!uv j`me generation 48; best score 10.000000 Hello!uv j`me generation 49; best score 8.000000 Hello!uv j`me generation 50; best score 8.000000 Hello▼sv l`me generation 51; best score 6.000000 Hello▼uv m`me generation 52; best score 7.000000 Hello▼uv m`me generation 53; best score 7.000000 Hello!tv m`me generation 54; best score 6.000000 Hello!uv m`md generation 55; best score 6.000000 Hello!tv m`md generation 56; best score 5.000000 Hello!tv mamd generation 57; best score 4.000000 Hello tv m`md generation 58; best score 4.000000 Hello!tv lamd generation 59; best score 3.000000 Hello tv lamd generation 60; best score 2.000000 Hello tv lamd generation 61; best score 2.000000 Hello tv lamd generation 62; best score 2.000000 Hello tv lamd generation 63; best score 2.000000 Hello tv lamd generation 64; best score 2.000000 Hello tv lamd generation 65; best score 2.000000 Hello tv land generation 66; best score 1.000000 Hello tv land generation 67; best score 1.000000 Hello tv lamd generation 68; best score 2.000000 Hello tv lamd generation 69; best score 2.000000 Hello tv lamd generation 70; best score 2.000000 Hello tv lamd generation 71; best score 2.000000 Hello tv lamd generation 72; best score 2.000000 Hello tv lamd generation 73; best score 2.000000 Hello tv lamd generation 74; best score 2.000000 Hello tv lamd generation 75; best score 2.000000 Hello tv land generation 76; best score 1.000000 Hello tv land generation 77; best score 1.000000 Hello tv land generation 78; best score 1.000000 Hello tv land generation 79; best score 1.000000 Hello tv land generation 80; best score 1.000000 Hello tv land generation 81; best score 1.000000 Hello tv land generation 82; best score 1.000000 Hello tv land generation 83; best score 1.000000 Hello tv land generation 84; best score 1.000000 Hello tv land generation 85; best score 1.000000


Hello tv land generation 86; best score 1.000000 Hello tv land generation 87; best score 1.000000 Hello tv land generation 88; best score 1.000000 Hello tv land! generation 89; best score 0.000000 Hello tv land! generation 90; best score 0.000000 Hello tv land! generation 91; best score 0.000000 Hello tv land! generation 92; best score 0.000000 Hello tv land! generation 93; best score 0.000000 Hello tv land! generation 94; best score 0.000000 Hello tv land! generation 95; best score 0.000000 Hello tv land! generation 96; best score 0.000000 Hello tv land! generation 97; best score 0.000000 Hello tv land! generation 98; best score 0.000000 Hello tv land! generation 99; best score 0.000000Press any key to continue

6.2 Discussion of Parent Selection Functions
We talk about several methods of parent selection:

Elite
Roulette
Polygamy
Dog and Cat
Alien

6.3 Evolutionary Strategies (page 242)
Similar to GA.

Major differences:
Uses only mutation
Population consists of 1 individual, many parameters.


6.4 Simulated Annealing


6.5 Genetic Programming

6.6 Fitness Functions

7.0 Neural Networks

Book Sections: 6.1 to 6.3; pages 163 to 175

Brief intro to neural networks
o Nerve cells transmit information to and from various parts of the body.


o As creatures get more complex, a thing called cephalization occurs. This is when nerves tend to lead towards a central point.

o The more advanced creatures are, the higher the degree of cephalization.
o A little biology
  Nerve cell

o Artificial Neural Network

o Structure of ANN


o Supervised learning
  PERCEPTRON

y = x1*w1 + x2*w2 + … + xn*wn

e(p) = yd(p) – y(p); p = 1, 2, 3, 4, …
wi(p+1) = wi(p) + α * xi(p) * e(p)

PERCEPTRON training algorithm

1. Create training set, each example needs to have a desired output matched to it.

2. Initialization: set weights to random numbers (-1 to 1).
3. For each training example:
   a. Present to perceptron
   b. Calculate y
   c. Execute transfer function
   d. Adjust weights
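A minimal Common Lisp sketch of this loop (the AND training set, the step transfer function, the 0.1 learning rate, and the 0.2 threshold are assumptions chosen for illustration, not values from the notes):

(defparameter *training* '(((0 0) 0) ((0 1) 0) ((1 0) 0) ((1 1) 1)))
(defparameter *alpha* 0.1)

(defun output (weights threshold inputs)
  "Step transfer function applied to the weighted sum."
  (if (>= (reduce #'+ (mapcar #'* weights inputs)) threshold) 1 0))

(defun train (weights threshold epochs)
  "One pass per epoch over the training set, adjusting weights by the error."
  (dotimes (i epochs weights)
    (loop for (inputs desired) in *training*
          for y = (output weights threshold inputs)
          for e = (- desired y)                                 ; e(p) = yd(p) - y(p)
          do (setf weights
                   (mapcar (lambda (w x) (+ w (* *alpha* x e))) ; wi <- wi + alpha*xi*e
                           weights inputs)))))

;; (let ((w (train (list (- (random 2.0) 1) (- (random 2.0) 1)) 0.2 50)))
;;   (mapcar (lambda (ex) (output w 0.2 (first ex))) *training*))
;; => (0 0 0 1)   ; the trained perceptron computes AND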
