
Chapter 2 - WordPress.com · Web viewSyntactic Analysis : Linear sequences of words are transformed into structures that show how the word relate to each other. English syntactic



Artificial Intelligence

UNIT I

CO.1 Explain the concepts behind problem representation paradigms and their characteristics, production systems, and defining a problem as a state space representation.

1.1 WHAT IS AI? HISTORY & APPLICATIONS

1.2 ARTIFICIAL INTELLIGENCE AS REPRESENTATION & SEARCH

AI Problems

AI is the study of how to make computers do things which, at the moment, people do better. This definition fails to include some areas of potentially very large impact, namely problems that cannot now be solved well by either computers or people.

Page 2: Chapter 2 - WordPress.com · Web viewSyntactic Analysis : Linear sequences of words are transformed into structures that show how the word relate to each other. English syntactic


Samuel wrote a checkers-playing program that not only played games with opponents but also used its experience at those games to improve its later performance.

Computers could perform well at such tasks simply by being fast at exploring a large number of solution paths and then selecting the best one. It was thought that this process required very little knowledge and could therefore be programmed easily. Later this assumption turned out to be false, since no computer is fast enough to overcome the combinatorial explosion generated by most problems.

Common sense reasoning: reasoning about objects, sequences of actions, and their consequences. Example: if you let go of something, it will fall to the floor and may break.

GPS (the General Problem Solver, by Newell, Shaw & Simon) was applied to several common-sense tasks.

Perception – Vision, Speech

Natural Language – understanding, generation, translation

AI is flourishing most as a practical discipline, as opposed to a purely research one, primarily in domains that require only specialized expertise without the assistance of commonsense knowledge.

Before embarking on a study of specific AI problems and solution techniques, it is important to discuss, if not to answer, the following questions:

1) What are our underlying assumptions about intelligence?

2) What kinds of techniques will be useful for solving AI problems?

3) At what level of detail, if at all, are we trying to model human intelligence?

4) How will we know when we have succeeded in building an intelligent program?

Page 3: Chapter 2 - WordPress.com · Web viewSyntactic Analysis : Linear sequences of words are transformed into structures that show how the word relate to each other. English syntactic

Task Domains of AI

Mundane tasks:
1) Perception
   - Vision
   - Speech
2) Natural language
   - Understanding (understanding spoken language is a perceptual problem)
   - Generation
   - Translation
3) Commonsense reasoning
4) Robot control

Formal tasks:
1) Games
   - Chess
   - Backgammon
   - Checkers
   - Go
2) Mathematics
   - Geometry
   - Logic
   - Integral calculus
   - Proving properties of programs

Expert tasks:
1) Engineering
   - Design
   - Fault finding
   - Manufacturing planning
2) Scientific analysis
3) Medical diagnosis
4) Financial analysis

The Underlying Assumption:

The Physical Symbol System Hypothesis. Newell & Simon describe a physical symbol system as consisting of a set of entities called symbols, which can occur as components of another type of entity called an expression (or symbol structure): a symbol structure is composed of a number of instances (or tokens) of symbols related in some physical way. Besides these structures, the system also contains a collection of processes that operate on expressions to produce other expressions: processes of creation, modification, reproduction, and destruction. A physical symbol system is thus a machine that produces through time an evolving collection of symbol structures.

The Physical Symbol System Hypothesis: a physical symbol system has the necessary and sufficient means for general intelligent action.

This hypothesis is only a hypothesis; there is no way of proving or disproving it on logical grounds. We may find that it is false; the bulk of the evidence so far suggests it is true; but the only way to determine its truth is by experimentation. The computer provides the perfect medium for this experimentation, since it can be programmed to perform selected tasks the way people do. As it has become increasingly easy to build computing machines, it has become increasingly possible to conduct empirical investigations of the physical symbol system hypothesis.


One attempt to reduce a particularly human activity, the understanding of jokes, to a process of symbol manipulation is provided in the book Mathematics and Humor. It may be that physical symbol systems will prove able to model some aspects of human intelligence and not others. The importance of the physical symbol system hypothesis is twofold: it is a significant theory of the nature of human intelligence, and it forms the basis of the belief that it is possible to build programs that perform intelligent tasks.

What is an AI technique?

AI problems span a very broad spectrum. Are there any techniques that are appropriate for the solution of a variety of these problems, besides the fact that they all manipulate symbols? How can a technique be useful across many AI tasks? One answer: intelligence requires knowledge, and knowledge possesses some less convenient properties:

- It is voluminous.
- It is hard to characterize accurately.
- It is constantly changing.
- It differs from data by being organized in a way that corresponds to the ways it will be used.

So we conclude: an AI technique is a method that exploits knowledge, and that knowledge should be represented in such a way that:

- It captures generalizations: situations that share important properties are grouped together. Otherwise inordinate amounts of memory and updating will be required.
- It can be understood by the people who must provide it.
- It can easily be modified to correct errors and to reflect changes in the world and in our world view.
- It can be used in a great many situations even if it is not totally accurate or complete.
- It can be used to help overcome its own sheer bulk by helping to narrow the range of possibilities that must usually be considered.

There is some degree of independence between problems and problem-solving techniques. To illustrate, consider one problem, Tic-tac-toe, and a series of programs that solve it. The programs can be compared along several dimensions:

- Their complexity.
- Their use of generalizations.
- The clarity of their knowledge.
- The extensibility of their approach.

Along these dimensions the series moves toward being representative of what we call AI techniques.

In the simplest program, the board is represented as a nine-element vector, one element per square:

1 2 3
4 5 6
7 8 9

An element contains 2 if the square is blank, 3 if it holds an X, and 5 if it holds an O.
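The vector encoding can be exercised in code. In this sketch the 2/3/5 encoding is from the text, while the product-based win test built on it is an invented illustration:

```python
# Board: nine-element vector (squares 1-9, zero-indexed here);
# 2 = blank, 3 = X, 5 = O, as described in the text.
BLANK, X, O = 2, 3, 5
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def winner(board):
    """Return 'X', 'O', or None. Because 2, 3, 5 are distinct primes,
    a line's product is 27 only for three X's and 125 only for three O's."""
    for a, b, c in LINES:
        product = board[a] * board[b] * board[c]
        if product == X ** 3:   # 27
            return "X"
        if product == O ** 3:   # 125
            return "O"
    return None

board = [X, X, X,
         O, O, BLANK,
         BLANK, BLANK, BLANK]
print(winner(board))  # -> X
```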

Question answering is another example problem that can be treated with a similar series of increasingly sophisticated programs.

Three important AI techniques:

- Search: provides a framework in which any direct techniques that are available can be embedded.
- Use of knowledge: provides a way of solving complex problems by exploiting the structure of the objects involved.
- Abstraction: provides a way of separating important features and variations from the many unimportant ones that would otherwise overwhelm any process.

It is not possible to give a precise definition of an AI technique.

The level of the model

Before building programs, we should decide what we are trying to do: is our goal to produce programs that do tasks the way people do them, or programs that simply do the tasks in the easiest way possible? Some programs, such as ones that look up phrases in a dictionary, solve non-AI problems in ways that have nothing to do with human intelligence. A second class of programs, such as ones that read a newspaper story and answer questions about it, attempt to model human performance, for one or more of the following reasons:

1) To test psychological theories of human performance, e.g. a program that simulated the conversational behavior of a paranoid person at a terminal.

2) To enable computers to understand human reasoning.

3) To enable people to understand computer reasoning.

4) To exploit what knowledge we can glean from people, as clues about how to proceed.

Such modeling may be attempted at the level of individual neurons or at the level of higher human cognitive theories. Whether our goal is simulating human performance or simply building an intelligent program, we need a good model of the processes involved in intelligent reasoning. Such models are pursued in the field of cognitive science, in which psychologists, linguists, and computer scientists work together.


Criteria for success

How will we know if we have succeeded in building an intelligent program? What is intelligence? Alan Turing proposed the following method of determining whether a machine can think, now known as the Turing Test. Two people and the machine to be evaluated take part. One person, the interrogator, sits in a separate room and asks questions by typing; the interrogator knows the other two participants only as A and B, and aims to determine which is the person and which is the machine. The goal of the machine is to fool the interrogator into believing that it is the person. If the machine succeeds, we conclude that the machine can think.

Narrower measures are also used: a chess program can be rated in the same way as a human player, and the DENDRAL program, which analyzes organic compounds to determine their structure, can be judged against expert performance. A program that meets some performance standard for a particular task satisfies a more modest criterion for success.

1.3 PRODUCTION SYSTEM, BASICS OF PROBLEM SOLVING

*Production system:-

A production system structures AI problems in a way that facilitates describing and performing the search process. It consists of:

- A set of rules, each consisting of a left side, which determines the applicability of the rule, and a right side, which describes the operation to be performed when the rule is applied.
- A knowledge/database, whose information must be structured in an appropriate way.
- A control strategy, which matches rules against the database and resolves conflicts when several rules apply at once.

There is a family of general production system interpreters: basic production system languages such as OPS5 and ACT*, more complex hybrid systems called expert system shells for building knowledge-based expert systems, and general problem-solving architectures like SOAR, a system based on a specific set of cognitively motivated hypotheses about the nature of problem solving. The process of solving a problem can usefully be modeled as a production system.
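The rule/database/control-strategy loop described above can be sketched as follows. The rule encoding, the first-match conflict-resolution policy, and the toy counting example are assumptions for illustration, not taken from OPS5 or SOAR:

```python
# Minimal production-system interpreter sketch.
# A rule is a (condition, action) pair: condition(state) -> bool is the
# left side; action(state) -> new_state is the right side.

def run_production_system(state, rules, goal, max_steps=100):
    for _ in range(max_steps):
        if goal(state):
            return state
        applicable = [r for r in rules if r[0](state)]   # match phase
        if not applicable:
            return None                                  # no rule applies
        condition, action = applicable[0]                # conflict resolution:
        state = action(state)                            # take first match, act
    return None

# Toy database: an integer; one rule counts up toward the goal.
rules = [(lambda s: s < 5, lambda s: s + 1)]
print(run_production_system(0, rules, goal=lambda s: s == 5))  # -> 5
```

A real interpreter would differ mainly in the conflict-resolution strategy (e.g. rule ordering, specificity, recency), which is exactly the control-strategy question discussed next.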

Control strategies

The first requirement of a good control strategy is that it causes motion. In the water jug problem, for example, a strategy that keeps filling the 4-gallon jug causes no motion toward the goal and loops indefinitely. The second requirement of a good control strategy is that it be systematic. A random strategy causes motion, but it is likely to arrive at the same state several times during the process, exploring useless sequences of operators several times before it finally finds a solution.

To build a system to solve a particular problem, we need to do four things:

1) Define the problem precisely: the initial situation(s) and the final situations that constitute acceptable solutions.

2) Analyze the problem: a few important features can have an immense impact on the appropriateness of various techniques.

3) Isolate and represent the task knowledge necessary to solve the problem.

4) Choose the best problem-solving technique(s) and apply them to the problem.

Problem classification: identifying a generic control strategy that is appropriate for solving the problem.

Production system characteristics:-

For different classes of problems, production systems are a good way to describe the operations that can be performed in a search for a solution. Two questions arise:

1. Can production systems, like problems, be described by a set of characteristics that shed some light on how they can easily be implemented?

2. If so, what relationships are there between problem types and the types of production systems best suited to solving them?

A monotonic production system is a production system in which the application of a rule never prevents the later application of another rule that could also have been applied at the time the first rule was selected. A non-monotonic production system is one in which this is not true.

A partially commutative production system is one with the property that if the application of a particular sequence of rules transforms state X into state Y, then any allowable permutation of those rules also transforms state X into state Y.

A commutative production system is one that is both monotonic and partially commutative.

Relating kinds of problems to kinds of production systems: all problems can be solved by all kinds of systems, but some pairings are more natural.

                           Monotonic            Non-monotonic
Partially commutative      Theorem proving      Robot navigation
Not partially commutative  Chemical synthesis   Bridge

Partially commutative, monotonic systems are useful for ignorable problems and can be implemented without the ability to backtrack.

Page 8: Chapter 2 - WordPress.com · Web viewSyntactic Analysis : Linear sequences of words are transformed into structures that show how the word relate to each other. English syntactic

Non-monotonic, partially commutative systems are useful for problems in which changes occur but can be reversed, and in which the order of operations is not critical (e.g. moves north, south, east, west: the order in which a path is executed does not matter). The 8-puzzle and blocks-world problems are partially commutative. Non-partially-commutative systems are useful for problems in which irreversible changes occur, such as chemical synthesis: in an irreversible process the order of operations is important, and this matters particularly when a problem is attempted for the first time.

1.4 EXAMPLE-WATER JUG PROBLEM


1.5 PROBLEM REPRESENTATION PARADIGMS

To play chess we must specify:

- The starting position of the chess board.
- The rules that define the legal moves.
- The board positions that represent a win. The implicit goal is not only playing legally but winning, e.g. reaching a goal state in which the opponent's king is under attack.

The board can be represented as an 8x8 array. Moves can be described as a set of rules with two parts: a left side, a pattern to be matched against the current board position, and a right side that describes the change to be made to the board position to reflect the move. Writing a separate rule for each of the roughly 10^120 possible board positions raises practical difficulties:

- No person could ever supply a complete set of such rules; it would take too long and could certainly not be done without mistakes.
- No program could easily handle all those rules; there are severe storage problems.

To minimize such problems, we use convenient notations for describing patterns and substitutions.

EX:

White pawn at Square(file e, rank 2)
AND Square(file e, rank 3) is empty
AND Square(file e, rank 4) is empty
→ move pawn from Square(file e, rank 2) to Square(file e, rank 4)

We can think of the problem of playing chess as a problem of moving around in a state space, where each state corresponds to a legal position of the board. State space representations form the basis of most AI methods. Their structure corresponds to the structure of problem solving in two important ways:

- It allows for a formal definition of a problem as the need to convert some given situation into some desired situation using a set of permissible operations.
- It permits us to define the process of solving a problem as a combination of known techniques, each represented as a rule, and search, the general technique of exploring the space to find some path from the current state to a goal state.

Search is a very important process in the solution of hard problems for which no more direct techniques are available.

State space representation example: you are given a 4-gallon jug and a 3-gallon jug, neither with any measuring markers, and a pump. How can you get exactly 2 gallons of water into the 4-gallon jug? The state space can be described as the set of ordered pairs of integers (x, y) such that x = 0, 1, 2, 3, or 4 and y = 0, 1, 2, or 3, where x is the amount in the 4-gallon jug and y the amount in the 3-gallon jug. The start state is (0, 0); the goal is (2, n) for any n. The operators are rules whose left sides are matched against the current state and whose right sides describe the change after applying the rule. The control structure loops, selecting applicable rules; there are several ways of making the selection. Conditions matter: "If the 4-gallon jug is not full, fill it" (applying the rule when the jug is already full would be pointless).

Example production rules for the water jug problem (a sample of the full set):

1) (x, y) if x < 4 → (4, y)      Fill the 4-gallon jug
2) (x, y) if y < 3 → (x, 3)      Fill the 3-gallon jug
3) (x, y) if x > 0 → (x − d, y)  Pour some water out of the 4-gallon jug
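The rules above can be turned into a runnable searcher. This sketch adds the remaining fill/empty/pour rules in the same pattern and uses breadth-first control to find a solution path; the breadth-first choice is an assumption here, since the text discusses control strategies separately:

```python
from collections import deque

def successors(state):
    """Apply every water-jug rule to (x, y); x is the 4-gal jug, y the 3-gal."""
    x, y = state
    results = set()
    if x < 4: results.add((4, y))          # fill the 4-gallon jug
    if y < 3: results.add((x, 3))          # fill the 3-gallon jug
    if x > 0: results.add((0, y))          # empty the 4-gallon jug
    if y > 0: results.add((x, 0))          # empty the 3-gallon jug
    pour = min(x, 3 - y)
    results.add((x - pour, y + pour))      # pour from 4-gal into 3-gal
    pour = min(y, 4 - x)
    results.add((x + pour, y - pour))      # pour from 3-gal into 4-gal
    return results

def solve(start=(0, 0)):
    """Breadth-first search; goal is 2 gallons in the 4-gallon jug."""
    frontier, parents = deque([start]), {start: None}
    while frontier:
        state = frontier.popleft()
        if state[0] == 2:                  # goal reached: rebuild the path
            path = []
            while state is not None:
                path.append(state)
                state = parents[state]
            return path[::-1]
        for nxt in successors(state):
            if nxt not in parents:         # each state visited once
                parents[nxt] = state
                frontier.append(nxt)

print(solve())  # a shortest path of 6 moves, e.g. via (0,3), (3,0), (3,3), (4,2)
```

Breadth-first control guarantees that the path returned uses the minimum number of rule applications (six moves for this problem).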

In one solution to the water jug problem, rules like 3 (pouring water out) never get us any closer to the goal; the rule set both defines the problem and encodes some knowledge about its solution. Special-purpose rules can also be added to capture special-case knowledge, e.g. a rule that, once the state (4, 2) is reached, empties the 4-gallon jug and then transfers the 2 gallons from the 3-gallon jug into it. Such rules cannot add power to the system, since their effects could already be achieved by sequences of the basic rules; in fact, depending on the control strategy used, adding them can degrade performance.

Operationalization:-

The first step toward solving a problem must be the creation of a formal and manipulable description of the problem itself. Ultimately we would like to be able to write programs that produce such formal descriptions from informal ones; consider, for example, what it would mean to "understand" an English statement of a problem. Our goal is to be able to solve difficult, unstructured problems.

Summarizing, to represent a problem formally we must:

1) Define a state space that contains all the possible configurations of the relevant objects (and perhaps some impossible ones).

2) Specify one or more states within that space as initial states.

3) Specify one or more goal states as acceptable solutions.

4) Specify the rules that describe the available actions (operators).

Doing this will lead us to think about:

* What unstated assumptions are present in the informal problem description?
* How general should the rules be?
* How much of the work required to solve the problem should be precomputed and represented in the rules?

1.6 DEFINING PROBLEM AS A STATE SPACE REPRESENTATION


1.7 PROBLEM CHARACTERISTICS I

Problem Characteristics:-

Heuristic search is a very general method applicable to a large class of problems. In order to choose the most appropriate method for a particular problem, it is necessary to analyze the problem along several key dimensions:

- Is the problem decomposable into a set of independent, smaller or easier subproblems?
- Can solution steps be ignored, or at least undone, if they prove unwise?
- Is the problem's universe predictable?


- Is a good solution to the problem obvious without comparison to all other possible solutions?
- Is the desired solution a state of the world or a path to a state?
- Is a large amount of knowledge absolutely required to solve the problem, or is knowledge important only to constrain the search?
- Can a computer that is simply given the problem return the solution, or will solving the problem require interaction between the computer and a person?

1.6.1 Is the problem decomposable?

Using the technique of problem decomposition, we can often solve very large problems easily. Example: the blocks world problem. In the start state, C is on A, and A and B are on the table: ON(C, A). The goal is ON(B, C) and ON(A, B), i.e. a single stack with A on B on C. A proposed solution: put B on C; then move C to the table (clearing A) and put A on B.

The available operators:

1. CLEAR(x) [block x has nothing on it] → ON(x, Table) [pick up x and put it on the table]
2. CLEAR(x) and CLEAR(y) → ON(x, y) [put x on y]

The goal decomposes as follows (in the original figure, goals are underlined and states that have already been achieved are not): ON(B, C) and ON(A, B) splits into the subgoals ON(B, C) and ON(A, B); achieving ON(A, B) in turn requires CLEAR(A) and then ON(A, B).


Putting B on C and putting A on B look like two separate problems:

1. Given the start state, putting B on C is simple.
2. The second subgoal is not quite so simple. Since the only operators we have allow us to pick up one block at a time, we must clear A by removing C before we can pick up A and put it on B. This too can be done easily.

However, if we now try to combine the two solutions into one solution, we will fail: the two subproblems are not independent.

1.6.2 Can solution steps be ignored or undone?

Theorem proving: we may proceed by first proving a lemma that we think will be useful, and eventually realize that the lemma is no help at all. Are we in trouble? No. We simply proceed as before; everything proved remains true and in memory. We lost only the effort spent going down the blind alley.

8-puzzle: in solving it, if we make a stupid move we cannot simply slide another tile into the occupied space, but we can backtrack and undo the first move. Mistakes can be recovered, though not as easily as in theorem proving: an additional step must be performed to undo each incorrect step, whereas no action was required to "undo" a useless lemma, and the control structure must keep a record of all the steps taken.

Chess: no backup is possible; we simply try to make the best of the current situation and go on from there.

These examples illustrate three important classes of problems:

- Ignorable (e.g. theorem proving), in which solution steps can be ignored: a simple control structure suffices.
- Recoverable (e.g. 8-puzzle), in which solution steps can be undone: a slightly more complicated control structure is needed, one that sometimes makes mistakes.
- Irrecoverable (e.g. chess), in which solution steps cannot be undone: a great deal of care in decision making is required.

1.6.3 Is the universe predictable?

The 8-puzzle has certain outcomes: we can plan an entire sequence of moves and be confident of the resulting state. Playing bridge has uncertain outcomes: we can plan, but not with certainty, since we cannot know where all the cards are or what the other players will do on their turns; we must investigate several plans and use probabilities to choose among them. Planning, problem solving without feedback from the environment, works in certain-outcome problems; with uncertain outcomes, plans must be revised as they are executed, which is expensive. The hardest problems to solve are those that are both irrecoverable and of uncertain outcome.


Playing bridge fairly well is possible because fairly accurate estimates of the probabilities are available. Controlling a robot arm is harder: the outcome of each action is uncertain. Helping a lawyer decide how to defend a client against a murder charge is harder still: we cannot even list all the possible outcomes.

1.8 PROBLEM CHARACTERISTICS II

1.6.4 Is a good solution absolute or relative?

Consider the problem of answering questions based on a database of simple facts:

1. Marcus was a man.
2. Marcus was a Pompeian.
3. Marcus was born in 40 A.D.
4. All men are mortal.
5. All Pompeians died when the volcano erupted in 79 A.D.
6. No mortal lives longer than 150 years.
7. It is now 1991 A.D.

Is Marcus alive? By representing these facts in a formal language, such as predicate logic, and then using formal inference methods, we can derive an answer in more than one way:

8.  Marcus is mortal.              (1, 4)
9.  Marcus' age is 1951 years.     (3, 7)
10. Marcus is dead.                (8, 6, 9)

OR

11. All Pompeians are dead now.    (7, 5)
12. Marcus is dead.                (11, 2)

Since we only want the answer, the particular path by which it was derived, i.e. which proof we use, is not essential. Contrast this with the traveling salesman problem, where the path itself is the answer and we want the shortest one. These two examples illustrate the difference between any-path problems and best-path problems; best-path problems are computationally harder to solve.
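The derivation above can be mimicked mechanically. This sketch uses simple forward chaining over fact tuples; the encoding is an invented illustration of the any-path idea, not a predicate-logic prover:

```python
# Facts are tuples; axioms 4, 5, and 6 are hard-coded as chaining steps.
facts = {("man", "Marcus"), ("pompeian", "Marcus"),
         ("born", "Marcus", 40), ("now", 1991)}

def forward_chain(initial_facts):
    facts = set(initial_facts)
    now = next(f[1] for f in facts if f[0] == "now")
    changed = True
    while changed:
        changed = False
        new = set()
        for f in facts:
            if f[0] == "man":                       # axiom 4: all men are mortal
                new.add(("mortal", f[1]))
            if f[0] == "pompeian":                  # axiom 5: Pompeians died 79 A.D.
                new.add(("died", f[1], 79))
        for f in facts | new:
            # axiom 6: no mortal lives longer than 150 years
            if f[0] == "born" and ("mortal", f[1]) in (facts | new) \
               and now - f[2] > 150:
                new.add(("dead", f[1]))
            if f[0] == "died" and f[2] <= now:
                new.add(("dead", f[1]))
        if not new <= facts:                        # anything genuinely new?
            facts |= new
            changed = True
    return facts

print(("dead", "Marcus") in forward_chain(facts))  # -> True
```

Both proof routes (via mortality and age, and via the eruption) fire here; since this is an any-path problem, either alone would suffice.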

1.6.5 Is the solution a state or a path?

Consider the problem of finding a consistent interpretation for the sentence "The bank president ate a dish of pasta salad with the fork." The sentence has several components, each of which, in isolation, may have more than one interpretation (ambiguity). "Bank" may mean a financial institution or the side of a river, but only one of them has a president.

Page 15: Chapter 2 - WordPress.com · Web viewSyntactic Analysis : Linear sequences of words are transformed into structures that show how the word relate to each other. English syntactic

"Dish" is the object of the verb "eat"; it is possible that a dish was eaten, but it is far more likely that the pasta salad was. "Pasta salad" is a salad containing pasta, although such meanings formed from pairs of nouns do not generalize (dog food does not normally contain dog). "With the fork" could modify several parts of the sentence; compare "with vegetables" and "with her friends".

For this problem, only the interpretation itself is required: the solution is a final state. In contrast, for the water jug problem it is not enough to report that we have reached the final state (2, 0); the path of operations is required. Thus there are two kinds of problems: those, like natural language understanding, whose solution is a state of the world, and those, like the water jug problem, whose solution is a path to a state. (In the former, problem states correspond to situations in the world, not to sequences of operations.)

1.6.6 What is the role of knowledge?

Compare chess, where in principle only the rules are required and knowledge serves mainly to constrain the search, with understanding a newspaper story, where a great deal of knowledge is required even to recognize a solution.

1.6.7 Does the task require interaction with a person?

- Solitary problems: the program is given the problem data and hands back the solution, with no intermediate interaction, e.g. theorem proving.
- Conversational problems: there is intermediate communication between the person and the computer.

UNIT II

CO.2 Analyse various AI search algorithms (uninformed, informed, heuristic, constraint satisfaction, best-first search, problem reduction).

2.1 UNINFORMED SEARCH TECHNIQUES


Breadth-first search: generate all the offspring of the root by applying each of the applicable rules to the initial state; for each leaf node, generate all its successors; continue level by level until some rule produces a goal state.

Algorithm

1) Put the initial node on a list, START.

2) If (START is empty) or (START = GOAL), terminate the search.

3) Remove the first node from START; call this node A.

4) If (A = GOAL), terminate the search with success.

5) Else, if node A has successors, generate all of them and add them at the tail of START.

6) Go to step 2.

Example tree (nodes generated breadth-first, level by level):

Root
A B C
D E F G H I J

where J is the goal node.
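The algorithm above can be sketched directly, using the figure's tree; the exact grouping of children under A, B, and C is an assumption read off the two figures:

```python
from collections import deque

# Assumed tree: A -> D, E; B -> F, H; C -> I, J, G (J is the goal).
tree = {"Root": ["A", "B", "C"], "A": ["D", "E"],
        "B": ["F", "H"], "C": ["I", "J", "G"]}

def breadth_first_search(initial, goal):
    start = deque([initial])              # step 1: initial node on START
    visited = []
    while start:                          # step 2: empty START -> failure
        node = start.popleft()            # step 3: remove first node, call it A
        visited.append(node)
        if node == goal:                  # step 4: success
            return visited
        start.extend(tree.get(node, []))  # step 5: successors at the TAIL
    return None

print(breadth_first_search("Root", "J"))
# visits Root, A, B, C, then D, E, F, H, I, J: whole levels in order
```

Adding successors at the tail (a FIFO queue) is what makes the search proceed level by level.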

Depth-first search: the goal is reached early if it lies on the left-hand side of the tree.

Algorithm

1) Put the initial node on a list, START.

2) If (START is empty) or (START = GOAL), terminate the search.

3) Remove the first node from START; call this node A.

4) If (A = GOAL), terminate the search with success.

5) Else, if node A has successors, generate all of them and add them at the beginning of START.

6) Go to step 2.

Example depth-first expansions for the water jug problem: from the root (0,0), filling the 4-gallon jug gives (4,0), whose successors are (4,3), (0,0), and (1,3); from (0,3), the successors are (4,3), (0,0), and (3,0). In the letter tree above, depth-first search explores A's subtree (D, E) completely before moving on to B (F, H) and then C (I, J, G), stopping as soon as the goal is reached.
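Depth-first search differs from the breadth-first algorithm only in where successors are placed. A minimal sketch, with the letter tree's child grouping assumed from the figures:

```python
# Assumed tree: A -> D, E; B -> F, H; C -> I, J, G (J is the goal).
tree = {"Root": ["A", "B", "C"], "A": ["D", "E"],
        "B": ["F", "H"], "C": ["I", "J", "G"]}

def depth_first_search(initial, goal):
    start = [initial]                     # step 1: initial node on START
    visited = []
    while start:                          # step 2: empty START -> failure
        node = start.pop(0)               # step 3: remove first node, call it A
        visited.append(node)
        if node == goal:                  # step 4: success
            return visited
        start = tree.get(node, []) + start  # step 5: successors at the BEGINNING
    return None

print(depth_first_search("Root", "J"))
# visits Root, A, D, E, B, F, H, C, I, J: the left branch is exhausted first
```

Prepending successors turns START into a stack, so the search dives down one branch before backing up.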

Advantages of depth-first search

- It requires less memory: only the nodes on the current path are stored, whereas in breadth-first search all of the tree generated so far must be stored.
- It may find a solution without examining much of the search space at all, whereas breadth-first search must examine all nodes at level n before any node at level n+1. This is particularly significant if many acceptable solutions exist: depth-first search can stop as soon as one of them is found.

Advantages of breadth-first search

- It will not get trapped exploring a blind alley, in contrast to depth-first search, which may follow a single unfruitful path for a long time.
- If a solution exists, breadth-first search is guaranteed to find it, and to find a shortest one: a longer path is never explored until all shorter ones have already been examined. This contrasts with depth-first search, which may find a long path to a solution in one part of the tree when a shorter path exists elsewhere.

Ex – Traveling salesman problem:-

Combinatorial explosion: spending far more time than we are willing to. If there are N cities, then the number of different paths among them is 1 · 2 · … · (N−1), i.e. (N−1)!. The time to examine a single path is proportional to N, so the total time required to perform this search is proportional to N!, a very large number. The branch-and-bound technique, which extends only the currently shortest partial path, prunes the search but remains exponential in the worst case.

2.2 INFORMED HEURISTIC BASED SEARCH

Heuristic search:-

A heuristic is a technique that improves the efficiency of a search process, possibly by sacrificing claims of completeness. Heuristics are like tour guides: they are good to the extent that they point in generally interesting directions, and bad to the extent that they may miss points of interest to particular individuals.

Nearest-neighbour heuristic: works by selecting the locally superior alternative at each step. Applied to the traveling salesman problem:

1) Arbitrarily select a starting city.

2) To select the next city, look at all cities not yet visited, select the one closest to the current city, and go to it next.

3) Repeat step 2 until all cities have been visited.

This procedure executes in time proportional to N², a significant improvement over N!, and it is possible to prove an upper bound on the error it incurs.
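The three steps above can be sketched as follows; the five cities and their coordinates are invented for illustration:

```python
import math

# Invented example cities (name -> (x, y) coordinates).
cities = {"A": (0, 0), "B": (1, 5), "C": (4, 1), "D": (6, 4), "E": (2, 2)}

def dist(p, q):
    return math.dist(cities[p], cities[q])  # Euclidean distance

def nearest_neighbour_tour(start):
    tour = [start]                          # step 1: pick a starting city
    unvisited = set(cities) - {start}
    while unvisited:                        # steps 2-3: repeat until done
        nxt = min(unvisited, key=lambda c: dist(tour[-1], c))  # closest city
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

print(nearest_neighbour_tour("A"))
```

Each step scans the remaining cities once, giving the N² behaviour mentioned above; the tour found is locally greedy and not necessarily the shortest overall.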

Even without guarantees, heuristics are valuable. Do we actually need the optimum solution? A good approximation will usually serve very well. In fact, there is some evidence that people, when they solve problems, are not optimizers but rather satisfiers: when searching for a parking space, most people stop as soon as they find a fairly good space, even if there might be a slightly better space up ahead. Although the approximations produced by heuristics may not be very good in the worst case, worst cases rarely arise in the real world. And trying to understand why a heuristic works, or why it doesn't work, often leads to a deeper understanding of the problem.


*Problem solving: Polya's work on how to solve problems, in part by exploiting earlier solutions, serves as an excellent guide for people who want to become better problem solvers. There are two major ways in which domain-specific heuristic knowledge can be incorporated into a search program:

- In the rules themselves; e.g. the rules for a chess-playing program might describe not simply the set of legal moves but a set of "sensible" moves, as determined by the rule writer.
- As a heuristic function that evaluates individual problem states and determines how desirable they are.

AI can be viewed as the study of techniques for solving exponentially hard problems in polynomial time by exploiting knowledge about the problem domain.

Heuristic Search:

Heuristics are approximations used to minimize the searching process: for problems for which no exact algorithm is known and an approximate, satisfying solution is needed, and for problems for which exact solutions are known but are computationally infeasible. Heuristic values are numbers that guide the search process. The following algorithms make use of heuristic evaluation functions:

1) Hill climbing 2) Constraint satisfaction 3) Best-first search 4) A* algorithm 5) AO* algorithm 6) Beam search

2.3 GENERATE AND TEST

Generate & Test:-

Algorithm:-

1. Generate a possible solution. For some problems this means generating a point in the problem space; for others it means generating a path from a start state.

2. Test to see if this is actually a solution by comparing the chosen point, or the endpoint of the chosen path, to the set of acceptable goal states.

3. If a solution has been found, quit; otherwise return to step 1.

This can take a long time if the problem space is large. It is a depth-first search, since complete solutions must be generated before they can be tested.


Performed systematically, generate-and-test is an exhaustive search of the problem space. Generating solutions randomly instead gives no guarantee that a solution will ever be found; this form is also known as the British Museum algorithm, a reference to finding an object in the British Museum by wandering randomly. A more reasonable form is depth-first search with backtracking, possibly traversing a graph rather than a tree.

2.4 HILL-CLIMBING

Hill Climbing –

A variant of generate-and-test in which feedback from the test procedure is used to help the generator decide which direction to move in the search space. In pure generate-and-test the test procedure responds only yes or no; here a heuristic function is attached that provides an estimate of how close a given state is to a goal state. Hill climbing is used when a good heuristic function is available. Example: you are in an unfamiliar city without a map and want to get downtown; you simply aim for the tall buildings. The heuristic function is the distance between the current location and the location of the tall buildings, and the desirable states are those in which this distance is minimized. Is a good solution absolute or relative? If an absolute solution exists, we can recognize a goal state just by examining it; otherwise we have a relative, maximization (or minimization) problem.

Algorithm:-

1. Evaluate the initial state. If it is also a goal state, then return it and quit; otherwise continue with the initial state as the current state.

2. Loop until a solution is found or until there are no new operators left to be applied in the current state:

a. Select an operator that has not yet been applied to the current state and apply it to produce a new state.

b. Evaluate the new state:

i. If it is a goal state, then return it and quit.

ii. If it is not a goal state but it is better than the current state, then make it the current state.

iii. If it is not better than the current state, then continue in the loop.
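The loop above can be sketched in Python (a toy illustration; the neighbour generator and value function are hypothetical stand-ins for the problem's operators and heuristic):

```python
def hill_climb(initial, neighbours, value):
    """Simple hill climbing: move to the first neighbour that is better
    than the current state; stop when no move improves."""
    current = initial
    while True:
        improved = False
        for candidate in neighbours(current):
            if value(candidate) > value(current):
                current = candidate      # rule (ii): a better state becomes current
                improved = True
                break                    # one improving move per step
            # not better: continue in the loop (rule iii)
        if not improved:
            return current               # no operator helps: local maximum or goal

# Toy landscape (an assumption for illustration): maximise f(x) = -(x - 7)^2
# over the integers, with operators x - 1 and x + 1.
best = hill_climb(0, lambda x: [x - 1, x + 1], lambda x: -(x - 7) ** 2)
```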


The key difference from generate-and-test is the use of an evaluation function as a way to inject task-specific knowledge into the control process. A better solution corresponds to a higher or a lower heuristic value, depending on the problem.

Steepest-Ascent Hill Climbing (gradient search):

Considers all the moves from the current state and selects the best one as the next state.

Algorithm:

1. Evaluate the initial state. If it is also a goal state, return it and quit; otherwise continue with the initial state as the current state.

2. Loop until a solution is found or until a complete iteration produces no change to the current state:

a. Let SUCC be a state such that any possible successor of the current state will be better than SUCC.

b. For each operator that applies to the current state:

i. Apply the operator and generate a new state.

ii. Evaluate the new state. If it is a goal state, return it and quit. If not, compare it to SUCC: if it is better, set SUCC to this state; if it is not better, leave SUCC alone.

c. If SUCC is better than the current state, then set the current state to SUCC.
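Steepest ascent differs from the simple version only in examining every successor before moving. A toy sketch on the same kind of one-dimensional landscape (an assumption for illustration):

```python
def steepest_ascent(initial, neighbours, value):
    """Steepest-ascent hill climbing: evaluate every successor of the
    current state and move to the best one, stopping when none is better."""
    current = initial
    while True:
        succ = max(neighbours(current), key=value)   # best of all moves
        if value(succ) <= value(current):            # no successor beats current
            return current
        current = succ

# Maximise f(x) = -(x - 7)^2 with four moves available from each state,
# so the best of the four is taken at every step.
peak = steepest_ascent(0, lambda x: [x - 2, x - 1, x + 1, x + 2],
                       lambda x: -(x - 7) ** 2)
```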

Both algorithms may fail to find a solution: the algorithm may terminate not by finding a goal state but by reaching a state from which no better states can be generated. This will happen if the program has reached a local maximum, a plateau or a ridge.

Local maximum: a state that is better than all its neighbours but is not better than some other states farther away. At a local maximum all moves appear to make things worse. Local maxima are particularly frustrating because they often occur almost within sight of a solution; in this case they are called foothills.

Plateau: a flat area of the search space in which a whole set of neighbouring states have the same value. It is difficult to determine the best direction in which to move by making local comparisons.

Ridge: a special kind of local maximum; an area of the search space that is higher than the surrounding areas and that itself has a slope. There are some ways of dealing with these problems, although they are by no means guaranteed:

Backtrack to some earlier node and try going in a different direction. Maintain a list of paths almost taken and go back to one of them if the path that was taken leads to a dead end.

Make a big jump.

Apply two or more rules before doing the test. This corresponds to moving in several directions at once.

Hill Climbing Algorithm:


1. Put the initial node on a list START.

2. If (START is empty) or (START = GOAL), terminate the search.

3. Remove the first node from START; call it a.

4. If (a = GOAL), terminate the search with success.

5. Else, if node a has successors, generate all of them. Find how far they are from the goal node, sort them by the remaining distance from the goal, and add them to the beginning of START.

6. Go to step 2.

(Figure: search tree for the hill climbing procedure.)

Examples: while listening to music, adjusting the tone and volume to make it melodious; tuning the carburettor of a scooter while the accelerator is raised to its maximum, so that once it is tuned the engine keeps running for a considerably long period of time.

Problems: local maxima, plateaus (all neighbouring points have the same value) and ridges; remedies include backtracking, making a big jump, and trying different paths.

2.5 BEST-FIRST SEARCH

Best-First Search: in hill climbing one move is selected and all the others are rejected, never to be reconsidered. This produces the straight-line behaviour that is characteristic of hill climbing.

In best-first search one move is selected, but the others are kept around so that they can be revisited later if the selected path becomes less promising.

(Figure: a best-first search tree with heuristic values attached to the nodes.)


Furthermore, the best available state is selected in best-first search even if that state has a value lower than the value of the state that was just explored. This contrasts with hill climbing, which will stop if there are no successor states with better values than the current state.

Depth-first search is good because it allows a solution to be found without all competing branches having to be expanded. Breadth-first search is good because it does not get trapped on dead-end paths. Combining the two: follow a single path at a time, but switch paths whenever some competing path looks more promising than the current one. This is best-first search.
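This combination is usually implemented with an open list ordered by heuristic value. A minimal sketch (the successor function and heuristic are toy assumptions, not part of the original notes):

```python
import heapq

def best_first_search(start, goal, successors, h):
    """Best-first search: expand the open node with the lowest heuristic
    estimate. Unlike hill climbing, non-chosen nodes stay on the open list
    and can be revisited if the chosen path stops looking promising."""
    open_list = [(h(start), start)]
    closed = set()
    while open_list:
        _, state = heapq.heappop(open_list)   # most promising open node
        if state == goal:
            return state
        if state in closed:
            continue
        closed.add(state)
        for s in successors(state):
            if s not in closed:
                heapq.heappush(open_list, (h(s), s))
    return None

# Toy state space: integers with moves n-1, n+1 and n*2,
# heuristic = distance from the goal value 12.
found = best_first_search(0, 12,
                          lambda n: [n - 1, n + 1, n * 2],
                          lambda n: abs(12 - n))
```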

Best-first search operates on an OR graph, since each of its branches represents an alternative problem-solving path. The AO* algorithm is used for searching AND/OR graphs, which arise from the concept of problem reduction using AND/OR trees. The A* algorithm is not adequate for AND/OR graphs because, for an AND arc, all of its branches must be scanned to arrive at a solution.

Consider the fig.

(Figure: successive expansions of an AND/OR graph. Node A has an OR arc to B and an AND arc to C and D, with estimates such as B = 5, C = 3 and D = 4; as successors E, F, G, H, I and J are generated, the estimates at A are revised.)


We find that the node with the minimal estimate is B (value 4). But B forms part of an AND arc, and hence we have to take into account the other branch of the AND tree; the estimate for that path now rises to 9. This forces us to rethink the options, and we now choose D because it has the lowest revised cost.

Algorithm:

1) Create an initial graph GRAPH with a single node NODE. Compute the evaluation function value of NODE.

2) Repeat until NODE is solved or its cost reaches a value so high that it cannot be expanded:

2.1) Select a node NODE1 from NODE along the current best path. Keep track of the path.

2.2) Expand NODE1 by generating its children. For each child which is not an ancestor of NODE1, compute the evaluation function value. If the child node is a terminal one, label it END_NODE.

2.3) Generate a set of nodes DIFF_NODES containing only NODE1.

2.4) Repeat until DIFF_NODES is empty:

2.4.1) Choose from DIFF_NODES a node CHOOSE_NODE such that none of the descendants of CHOOSE_NODE is in DIFF_NODES.

2.6 PROBLEM REDUCTION

2.4.2) Estimate the cost of each connector emerging from CHOOSE_NODE. This cost is the total of the evaluation function values of the successor nodes and the costs of the arcs. Find the minimal value and mark the connector through which the minimum is achieved, overwriting the previous marking if it is different.

2.4.3) If all the successor nodes of the marked connector are labelled END_NODE, label CHOOSE_NODE as OVER.

2.4.4) If CHOOSE_NODE has been marked OVER or its cost has changed, add to the set DIFF_NODES all the ancestors of CHOOSE_NODE.

(Figure: a simple AND/OR graph. The goal 'Acquire TV set' is solved either by the single subgoal 'Steal TV set', or by the AND combination of the subgoals 'Earn some money' and 'Buy TV'; each node carries a cost estimate and each arc a cost.)
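The OR-versus-AND cost computation can be sketched recursively (a toy sketch; the heuristic values and the uniform arc cost of 1 are assumptions loosely based on the figure):

```python
def andor_cost(node, graph, h):
    """Cost estimate for an AND/OR graph node: take the cheapest connector,
    where an AND connector sums the costs of all its successors and every
    arc is charged a uniform cost of 1 (an assumed convention)."""
    connectors = graph.get(node)
    if not connectors:
        return h[node]                     # leaf: fall back on the heuristic
    return min(sum(andor_cost(succ, graph, h) + 1 for succ in group)
               for group in connectors)

# A is solved either by B alone (OR) or by C AND D together, with assumed
# heuristic estimates h(B)=5, h(C)=3, h(D)=4.
graph = {"A": [["B"], ["C", "D"]]}
h = {"B": 5, "C": 3, "D": 4}
cost = andor_cost("A", graph, h)           # min(5+1, (3+1)+(4+1)) = 6
```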

2.7 CONSTRAINT SATISFACTION

Constraint satisfaction :-

There are many constraints in the real world, and yet solutions are found without violating them, e.g. design problems in manufacturing, planning, and finding an optimum travel route.

Cryptarithmetic problems, for example:

    S E N D
  + M O R E
  ---------
  M O N E Y

Here the constraints are that all letters have different numeric values and that the ordinary rules of column addition must be adhered to. Although guessing may still be required, the number of allowable guesses is reduced and so the degree of search is curtailed.

A goal state is any state that has been constrained "enough", where "enough" must be defined for each problem; here it means that each letter is assigned a unique value.

Constraint satisfaction is a two-step process:

i) constraints are discovered and propagated as far as possible throughout the system;

ii) if there is still not a solution, search begins.

Initially, the rules for propagating constraints generate the following (Ci denotes the carry into column i, counting from the right):

M = 1, since S + M + C3 cannot be more than 19.

S = 8 or 9, since S + M + C3 > 9 (to generate a carry) and M = 1, so S + 1 + C3 > 9; C3 is at most 1, so S + C3 > 8.

O = 0, since S + 1 + C3 must be at least 10 to generate a carry and can be at most 11; but M is already 1, so O cannot be 1 and must be 0.

Partial assignment so far (carries shown above the columns):

  C4=1   C3=0   C2=1   C1=?
         S(8,9) E(2)   N(3)   D
         M(1)   O(0)   R(9)   E(2)
  M(1)   O(0)   N(3)   E(2)   Y


N = E or E + 1 depending on C2; but N cannot have the same value as E, so N = E + 1 and C2 = 1. In order for C2 to be 1, the sum N + R + C1 must be greater than 9, so N + R > 8. Also E ≠ 9: N + R cannot be greater than 18 even with a carry in, so E cannot be 9.

Now no more constraints are generated. To make progress from here we have to guess. Let E be assigned first, since it occurs three times; suppose E = 2.

The next cycle begins:

N = E + 1 = 2 + 1 = 3.

R = 8 or 9, since the tens column must give E = 2 with a carry out, i.e. N(3) + R + C1(0 or 1) = 12; R + 3 + (0 or 1) = 12 gives R = 8 or 9.

For Y, the units column gives 2 + D = Y, or 2 + D = 10 + Y if a carry C1 is generated.

Again no further constraints are generated and a guess is required. Trying C1 = 1 eventually reaches a dead end, and the system backtracks and tries C1 = 0. With C1 = 1:

2 + D = 10 + Y, so D = 8 + Y, i.e. D = 8 or 9; but S and R already need the values 8 and 9, and S, R and D cannot share values. So C1 cannot be 1; had this been realized initially, some search could have been avoided.

The constraint propagation here was not very sophisticated; it depends on reasoning. In more sophisticated constraint propagation, the specific cause of the inconsistency is identified and only the constraints that depend on that culprit are undone; the others are left intact. This approach is called dependency-directed backtracking. With C1 = 0: 2 + D = Y, and N + R = 10 + E gives R = 12 - 3 = 9, so S = 8; the remaining candidate pairs are D = 4, Y = 6 or D = 5, Y = 7.
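As a cross-check on the hand propagation, the whole puzzle can be solved by brute force once the propagated constraint M = 1 is wired in. Note that the unique solution, 9567 + 1085 = 10652, has E = 5, so the guess E = 2 explored above must eventually be retracted:

```python
from itertools import permutations

def solve_send_more_money():
    """SEND + MORE = MONEY with the propagated constraint M = 1 hard-wired
    (the carry out of the leftmost column forces M = 1); the remaining
    seven letters are searched exhaustively over the other nine digits."""
    M = 1
    others = "SENDORY"
    free_digits = [d for d in range(10) if d != M]
    for digits in permutations(free_digits, len(others)):
        a = dict(zip(others, digits))
        a["M"] = M
        if a["S"] == 0:                      # leading digit cannot be zero
            continue
        send = 1000*a["S"] + 100*a["E"] + 10*a["N"] + a["D"]
        more = 1000*a["M"] + 100*a["O"] + 10*a["R"] + a["E"]
        money = (10000*a["M"] + 1000*a["O"] + 100*a["N"]
                 + 10*a["E"] + a["Y"])
        if send + more == money:
            return send, more, money
    return None

solution = solve_send_more_money()
```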

2.8 Means End Analysis

Means – End Analysis:-


Most search strategies can reason either forward or backward, but for a given problem one direction or the other must be chosen. Often, however, a mixture of the two directions is appropriate. Such a mixed strategy would make it possible to solve the major parts of a problem first and then go back and solve the small problems that arise in "gluing" the big pieces together. A technique known as means-ends analysis allows us to do that.

Means-ends analysis centres on detecting differences between the current state and the goal state. Once such a difference is isolated, an operator that can reduce the difference is found. If the operator cannot be applied to the current state, a subproblem is set up of getting to a state to which it can be applied.

Example: the General Problem Solver (GPS).

This kind of backward chaining, in which operators are selected and then subgoals are set up to establish the preconditions of the operators, is called operator subgoaling.
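The difference-reduction loop with operator subgoaling can be sketched as follows (a minimal sketch, not GPS itself: states are sets of facts, the operators are hypothetical STRIPS-like records, and the difference is simply the set of goal facts not yet true):

```python
def means_ends(state, goal, operators):
    """Means-ends analysis sketch: find an operator that reduces the
    difference between state and goal; if its precondition does not hold,
    first solve the subproblem of reaching a state where it does."""
    plan = []
    while not goal <= state:
        diff = goal - state                    # facts still missing
        # pick any operator that adds a missing fact (raises StopIteration
        # if the problem is unsolvable with these operators)
        op = next(o for o in operators if o["adds"] & diff)
        if not op["pre"] <= state:             # operator subgoaling
            plan += means_ends(state, state | op["pre"], operators)
            state = state | op["pre"]          # subproblem assumed solved
        state = (state - op["dels"]) | op["adds"]
        plan.append(op["name"])
    return plan

# Hypothetical household-robot operators (names and facts are assumptions).
ops = [
    {"name": "push-table", "pre": {"at-table"},
     "adds": {"table-at-centre"}, "dels": set()},
    {"name": "walk-to-table", "pre": set(),
     "adds": {"at-table"}, "dels": set()},
]
plan = means_ends(set(), {"table-at-centre", "at-table"}, ops)
```

To apply push-table the robot must first be at the table, so walking is planned as a subgoal before pushing.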

Example: mathematical logic as the representation formalism. Consider the English sentence:

Spot is a dog.

This is represented in logic as dog(Spot). The logical representation of the fact that all dogs have tails is:

∀x: dog(x) → hastail(x)

Using the deductive mechanisms of logic we can generate the new representation object hastail(Spot), and using a backward mapping function generate the English sentence "Spot has a tail".

Mapping functions are not one-to-one; in fact they are not even functions, but rather many-to-many relations. For example, the sentence above could equally represent the fact that all dogs have at least one tail or the fact that each dog has several tails. The usual approach is to decide what facts the sentences represent and then convert those facts into the new representation.

UNIT III


CO.3 Explain the fundamentals of knowledge representation (logic-based, frame-based, semantic nets), inference and theorem proving, and know how to build simple knowledge-based systems.

3.1 KNOWLEDGE REPRESENTATION ISSUES

Knowledge Representation Issues

From examples such as best-first search it becomes clear that particular knowledge representation models allow for more specific, more powerful problem-solving mechanisms that operate on them. Here we examine specific techniques that can be used for representing and manipulating knowledge within programs.

Representation & Mapping:

Solving complex AI problems requires both a large amount of knowledge and mechanisms for manipulating that knowledge to create solutions to new problems. There are a variety of ways of representing knowledge (facts), and they involve two different kinds of entities:

Facts: truths in some relevant world. These are the things we want to represent.

Representations of facts in some chosen formalism. These are the things we actually manipulate.

One way of structuring these entities is as two levels:

The knowledge level, at which facts (including each agent's behaviour and current goals) are described.

The symbol level, at which representations of objects at the knowledge level are defined in terms of symbols that can be manipulated by programs.

(Figure: facts, internal representations and English representations, linked by English understanding and English generation.)

Reasoning programs manipulate internal representations of the facts they are given, and these manipulations produce new internal structures. The figure shows how the three kinds of objects relate to each other. We focus on facts, on representations, and on the two mappings that must exist between them, called representation mappings. The forward representation mapping maps from facts to representations; the backward representation mapping goes the other way, from representations to facts.

One representation of facts is natural language, particularly English sentences. English representations of facts facilitate getting information into and out of the system, so we need mapping functions from English sentences to the representation we actually use, and from it back to sentences.

In a slot-and-filler structure (a semantic network or a collection of frames): lines represent attributes; boxed nodes represent objects and values of attributes of objects; arrows point from an object to its value along the corresponding attribute line.

Viewing a node as a frame:

Baseball-Player
  isa:             Adult-Male
  bats:            (EQUAL handed)
  height:          6-1
  batting-average: .252

Answers to the following queries:

Team(Pee-Wee-Reese) = Brooklyn-Dodgers. This attribute had a value stored explicitly in the knowledge base.

Batting-average(Three-Finger-Brown) = .106. We follow the instance attribute to Pitcher and extract the value stored there. Such inherited values are best guesses in the face of a lack of more precise information; in fact, in 1906 Brown's batting average was .204.

Height(Pee-Wee-Reese) = 6-1, inherited through the isa hierarchy from Baseball-Player.

Bats(Three-Finger-Brown) = Right, computed by a rule, (EQUAL handed), which takes the person's handedness as input.


Inferential Knowledge:

Property inheritance is a powerful form of inference. Other inference procedures reason forward from given facts to conclusions; resolution, for example, exploits a proof-by-contradiction strategy. Logic provides a powerful structure in which to describe relationships among values, and it is often useful to combine it, or some other powerful description language, with an isa hierarchy.

Procedural Knowledge:

So far we have concentrated on relatively static, declarative facts. Another useful kind of knowledge is operational, or procedural, knowledge, and it can be represented in many ways. The simplest is code; code is powerful because it can exploit, for example, the name of the node whose value for handed is to be found. Another is the use of production rules, which are augmented with information on how they are to be used. The important difference between these forms lies in how the knowledge is used by the procedures that manipulate it.

Procedural knowledge as Rules.

If: ninth inning, and

Score is close, and

Less than 2 outs, and

First base is vacant, and

Batter is better hitter than next batter

Then: walk the batter.
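The same rule can be written as an executable production (a sketch; the dictionary keys are hypothetical names for the rule's conditions):

```python
def walk_the_batter_rule(situation):
    """The baseball production above as code: all conditions in the IF
    part must hold for the THEN part to fire."""
    if (situation["inning"] == 9
            and situation["score_close"]
            and situation["outs"] < 2
            and situation["first_base_vacant"]
            and situation["batter_better_than_next"]):
        return "walk the batter"       # THEN part fires
    return None                        # some condition failed: rule does not fire

action = walk_the_batter_rule({
    "inning": 9, "score_close": True, "outs": 1,
    "first_base_vacant": True, "batter_better_than_next": True,
})
```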

(Figure: the desired real reasoning takes the initial facts to the final facts. The program operates on an internal representation of the initial facts and produces an internal representation of the final facts; the two levels are connected by the forward and backward representation mappings. The figure depicts the abstract reasoning process that a program is intended to model.)


Programs provide concrete implementations of these abstract concepts.

Approaches to knowledge Representation.

A good system for the representation of knowledge in a particular domain should possess the

following four properties:-

Representational adequacy the ability to represent all of the kinds of knowledge that are needed

in that domain.

Inferential Adequacy: the ability to manipulate the representational structures in such a way as to derive new structures corresponding to new knowledge inferred from old.

Inferential Efficiency: - the ability to incorporate into the knowledge structure additional

information that can be used to focus the attention of the inference mechanism in the most

promising directions.

Acquisitional Efficiency: the ability to acquire new information easily. The simplest case involves direct insertion, by a person, of new knowledge into the database.

No single system that optimizes all of the capabilities for all kinds of knowledge has yet been

found. As a result multiple techniques for knowledge representations exist.

Simple Relational Knowledge:

  Player | Height | Weight | Bats-Throws

Such a table provides very weak inferential capabilities: it is not possible to answer directly "Who is the heaviest player?". But if an appropriate procedure is provided, these facts will enable the procedure to compute an answer; the relational knowledge provides the support from which the procedure works.


Inheritable Knowledge:

(Figure: a semantic network in which nodes are connected by instance, isa, height, bats and batting-average links; instance and isa denote class membership and class inclusion.)

A useful form of inference is property inheritance, in which elements of specific classes inherit attributes and values from the more general classes in which they are included. To support this, objects are organized into classes, and the classes must be arranged in a generalization hierarchy.
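Property inheritance amounts to climbing the isa chain until a filler for the attribute is found. A minimal sketch, with frames as plain dictionaries and slot values modelled on the baseball example (the specific values are assumptions for illustration):

```python
def get_attribute(frame_name, attr, frames):
    """Look attr up on the frame; if absent, climb the isa chain and
    inherit the value from the nearest more general class."""
    while frame_name is not None:
        slots = frames[frame_name]
        if attr in slots:
            return slots[attr]
        frame_name = slots.get("isa")   # move up the generalization hierarchy
    return None                         # attribute unknown anywhere up the chain

frames = {
    "Person":             {"handed": "Right"},
    "Adult-Male":         {"isa": "Person"},
    "Baseball-Player":    {"isa": "Adult-Male", "height": "6-1",
                           "batting-average": 0.252},
    "Pitcher":            {"isa": "Baseball-Player", "batting-average": 0.106},
    "Three-Finger-Brown": {"isa": "Pitcher"},
}

avg = get_attribute("Three-Finger-Brown", "batting-average", frames)   # 0.106
hand = get_attribute("Three-Finger-Brown", "handed", frames)           # "Right"
```

The batting average is inherited from Pitcher, which shadows the more general value on Baseball-Player, while handedness comes all the way from Person.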

Issues in knowledge Representation.

Are any attributes of objects so basic that they occur in almost every problem domain? If

there are, we need to make sure that they are handled appropriately in each of the mechanisms we

propose. If such attributes exist, what are they?

Are there any important relationships that exist among the attributes of objects?

At what level should knowledge be represented? Is there a good set of primitives into which

all knowledge can be broken down? Is it helpful to use such primitives?

How should sets of objects be represented?

Given a large amount of knowledge stored in a database how can relevant parts be accessed

when they are needed?

(Figure: the baseball semantic network. Person has handed = Right; Adult-Male, with height 5-10, isa Person; Baseball-Player, with height 6-1 and batting-average .252, isa Adult-Male; Pitcher, with batting-average .106, and Fielder, with batting-average .262, are each isa Baseball-Player; Three-Finger-Brown is an instance of Pitcher with team Chicago-Cubs; Pee-Wee-Reese is an instance of Fielder with team Brooklyn-Dodgers.)


Important Attributes: two attributes, instance and isa, are of general significance. They support property inheritance and represent class membership and class inclusion; these relations can also be expressed in predicate logic.

Relationships among Attributes:

Attributes are themselves entities. What properties do they have, independent of the specific knowledge they encode?

Inverses: one approach represents both directions of a relationship in a single representation that ignores focus, e.g. team(Pee-Wee-Reese, Brooklyn-Dodgers); how it is used depends on how assertions are indexed. A second approach is to use attributes that each focus on a single entity, but to use them in pairs, one the inverse of the other:

team = Brooklyn-Dodgers

team-members = Pee-Wee-Reese

An isa hierarchy of attributes: just as there are classes of objects, there are specializations of attributes, e.g. height is a specialization of physical-size, which is in turn a specialization of physical-attribute.

Techniques for reasoning about values:

Information about the type of the value, e.g. height is measured in a unit of length.

Constraints on the value, often stated in terms of related entities, e.g. the age of a person cannot be greater than the age of that person's parents.

Rules for computing the value when it is needed, as for the bats attribute; such backward rules are also called if-needed rules.

Rules that describe actions that should be taken if a value ever becomes known; these forward rules are also called if-added rules.

Single-valued attributes:

A baseball player can at any time have only a single height and be a member of only one team. One approach introduces an explicit notation for temporal intervals: if two different values are ever asserted for the same temporal interval, a contradiction is signalled automatically. Another assumes that the only temporal interval of interest is "now", so if a new value is asserted, the old value is simply replaced.


The Frame Problem:

How can sequences of problem states that arise from a search process be represented efficiently? Consider the world of a household robot, with facts like on(Plant12, Table34), under(Table34, Window13) and in(Table34, Room15).

(Fig. 4.11: a similarity net linking Chair, Stool, Table, Sideboard and Desk by difference labels such as "too big", "no back", "too wide", "too high", "drawers" and "no knee room".)

The whole problem of representing the facts that change as well as those that do not is known as the frame problem. Frame axioms address it: in the robot world, if a table with a plant on it stands under the window and we move the table to the centre of the room, we must also infer that the plant is now in the centre of the room too, but that the window is not.


3.2 FIRST ORDER LOGIC, PREDICATE LOGIC

Using Predicate logic

Representing facts in the language of logic uses the symbols "→" (material implication), "¬" (not), "∨" (or), "∧" (and), "∀" (for all) and "∃" (there exists).

5.1 Representing simple facts in logic:

We first explore the use of propositional logic as a way of representing the sort of world knowledge that an AI system might need. We can easily represent real-world facts as logical propositions written as well-formed formulas (wffs):

It is raining.  RAINING

It is sunny.  SUNNY

It is windy.  WINDY

If it is raining, then it is not sunny.  RAINING → ¬SUNNY

Now represent the fact stated by the classical sentence "Socrates is a man":

SOCRATESMAN

and "Plato is a man": PLATOMAN.

These are two totally separate assertions, and from them we could draw no conclusions about the similarities between Socrates and Plato. It would be much better to represent these facts as:

MAN(Socrates) and MAN(Plato)

since now the structure of the representation reflects the structure of the knowledge itself; but to do that we need to be able to use predicates applied to arguments. Similarly, representing "All men are mortal" as the single proposition MORTALMAN fails to capture the relationship between any individual being a man and that individual being a mortal. To do that we need variables and quantification, unless we are willing to write a separate statement about the mortality of every known man.

Representation as a set of wffs in predicate logic:

1. Marcus was a man.

man(Marcus)

This fails to capture some of the information in the English sentence, namely the notion of past tense.

2. Marcus was a Pompeian.

Pompeian(Marcus)

3. All Pompeians were Romans.

∀x: Pompeian(x) → Roman(x)


4. Caesar was a ruler.

ruler(Caesar)

5. All Romans were either loyal to Caesar or hated him.

∀x: Roman(x) → loyalto(x, Caesar) ∨ hate(x, Caesar)

(The English sentence probably intends an exclusive or, but the inclusive ∨ is used here.)

6. Everyone is loyal to someone.

∀x: ∃y: loyalto(x, y)

(The scope of the quantifiers matters: ∃y: ∀x: loyalto(x, y) would instead say that there is some one person to whom everyone is loyal.)

7. People only try to assassinate rulers they are not loyal to.

∀x: ∀y: person(x) ∧ ruler(y) ∧ tryassassinate(x, y) → ¬loyalto(x, y)

8. Marcus tried to assassinate Caesar.

tryassassinate(Marcus, Caesar)

Was Marcus loyal to Caesar? That is, can we prove ¬loyalto(Marcus, Caesar)?

Fig. An attempt to prove ¬loyalto(Marcus, Caesar):

¬loyalto(Marcus, Caesar)

↑ (7, substitution)

person(Marcus) ∧ ruler(Caesar) ∧ tryassassinate(Marcus, Caesar)

↑ (4)

person(Marcus) ∧ tryassassinate(Marcus, Caesar)

↑ (8)

person(Marcus)

9. All men are people.

∀x: man(x) → person(x)


Three important issues arise in converting English sentences into logical statements and then using those statements to deduce new ones:

Many English sentences are ambiguous; choosing the correct interpretation may be difficult.

There is often a choice of how to represent the knowledge; simple representations are preferable.

Even in very simple situations, a set of sentences is unlikely to contain all the information necessary to reason about the topic at hand.

Representing Instance and Isa Relationships:

The attributes instance and isa capture the relationships they are used to express, namely class membership and class inclusion.

The first set of representations below is the one already used: class membership is represented with unary predicates (such as Roman), each of which corresponds to a class; asserting that P(x) is true is equivalent to asserting that x is an instance (or element) of P.

The second set uses the instance predicate explicitly: instance is a binary predicate whose first argument is an object and whose second argument is the class to which the object belongs. It does not use an explicit isa predicate; instead, the subclass relationship is written out, as in statement 3.

The third set uses both the instance and the isa predicates explicitly. The use of isa simplifies statement 3, but it requires an extra axiom (6). This additional axiom describes how an instance relation and an isa relation can be combined to derive a new instance relation.

Fig. Three ways of representing class membership.

First:

1) man(Marcus)
2) Pompeian(Marcus)
3) ∀x: Pompeian(x) → Roman(x)
4) ruler(Caesar)
5) ∀x: Roman(x) → loyalto(x, Caesar) ∨ hate(x, Caesar)

Second:

1) instance(Marcus, man)
2) instance(Marcus, Pompeian)
3) ∀x: instance(x, Pompeian) → instance(x, Roman)
4) instance(Caesar, ruler)
5) ∀x: instance(x, Roman) → loyalto(x, Caesar) ∨ hate(x, Caesar)

Third:

1) instance(Marcus, man)
2) instance(Marcus, Pompeian)
3) isa(Pompeian, Roman)
4) instance(Caesar, ruler)
5) ∀x: instance(x, Roman) → loyalto(x, Caesar) ∨ hate(x, Caesar)
6) ∀x: ∀y: ∀z: instance(x, y) ∧ isa(y, z) → instance(x, z)
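Axiom (6) can be run as a simple forward-chaining closure (a sketch; the relations are represented as sets of pairs):

```python
def derive_instances(instances, isa):
    """Forward-chain axiom (6): instance(x, y) and isa(y, z) imply
    instance(x, z). Repeat until no new pair can be added."""
    derived = set(instances)
    changed = True
    while changed:
        changed = False
        for x, y in list(derived):
            for y2, z in isa:
                if y == y2 and (x, z) not in derived:
                    derived.add((x, z))
                    changed = True
    return derived

facts = derive_instances(
    {("Marcus", "man"), ("Marcus", "Pompeian"), ("Caesar", "ruler")},
    {("Pompeian", "Roman")},
)
# ("Marcus", "Roman") is now derivable, as the third representation intends.
```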

Forward versus backward reasoning:

- When the branching factor in one direction is great, use some heuristic rules for deciding which answer is more likely, and then try to prove that one first.
- If that attempt fails, the effort is lost.
- Alternatively, simply try both answers simultaneously and stop when one effort is successful.

3.3 Unification

UNIFICATION ALGORITHM

In propositional logic it is easy to determine that two literals cannot both be true at the same time: simply look for L and ¬L. In predicate logic this matching process is more complicated, since the bindings of variables must be considered.

For example, man(John) and ¬man(John) is a contradiction, while man(John) and ¬man(Himalayas) is not. Thus in order to determine contradictions we need a matching procedure that compares two literals and discovers whether there exists a set of substitutions that makes them identical. There is a recursive procedure that does this matching; it is called the unification algorithm. In the unification algorithm each literal is represented as a list, where the first element is the name of a predicate and the remaining elements are arguments. An argument may be a single element (an atom) or may itself be another list. For example, we can have literals such as

( tryassassinate Marcus Caesar)

( tryassassinate Marcus (ruler of Rome))

To unify two literals, first check whether their first elements are the same. If so, proceed; otherwise they cannot be unified. For example, the literals

( tryassassinate Marcus Caesar)

( hate Marcus Caesar)

cannot be unified. The unification algorithm recursively matches pairs of elements, one pair at a time. The matching rules are:

i) Different constants , functions or predicates can not match, whereas identical ones can.

ii) A variable can match another variable, any constant, or a function or predicate expression, subject to the condition that the expression must not contain any instance of the variable being matched (otherwise matching would lead to infinite recursion).

iii) The substitution must be consistent. Substituting y for x now and then z for x later is inconsistent. (A substitution of y for x is written y/x.)

The unification algorithm is listed below as a procedure UNIFY(L1, L2). It returns a list representing the composition of the substitutions that were performed during the match. An empty list, NIL, indicates that a match was found without any substitutions. If the list contains the single value F, the unification procedure failed.

UNIFY (L1, L2)

1. If L1 or L2 is an atom (a variable or constant), then:

(a) if L1 and L2 are identical, then return NIL

(b) else if L1 is a variable, then:

(i) if L1 occurs in L2, then return F, else return (L2/L1)

(c) else if L2 is a variable, then:

(i) if L2 occurs in L1, then return F, else return (L1/L2)

(d) else return F.

2. If length (L1) is not equal to length (L2), then return F.

3. Set SUBST to NIL.

(At the end of this procedure, SUBST will contain all the substitutions used to unify L1 and L2.)

4. For i = 1 to the number of elements in L1 do:

i) call UNIFY with the i-th element of L1 and the i-th element of L2, putting the result in S

ii) if S = F then return F

iii) if S is not equal to NIL then:

(A) apply S to the remainder of both L1 and L2

(B) SUBST := APPEND (S, SUBST)

5. Return SUBST.
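The UNIFY procedure above can be sketched in Python. This is a minimal illustration rather than the textbook's code: literals are nested lists as in the text, and, as a hypothetical convention, strings beginning with '?' denote variables; a substitution dict plays the role of SUBST, with the empty dict standing for NIL and None standing for F.

```python
def is_variable(term):
    # Hypothetical convention: strings starting with '?' are variables.
    return isinstance(term, str) and term.startswith('?')

def apply_subst(subst, term):
    """Apply a substitution dict {var: value} to a term."""
    if is_variable(term):
        return subst.get(term, term)
    if isinstance(term, list):
        return [apply_subst(subst, t) for t in term]
    return term

def occurs_in(var, term):
    """Occurs check for matching rule (ii)."""
    if term == var:
        return True
    if isinstance(term, list):
        return any(occurs_in(var, t) for t in term)
    return False

def unify(l1, l2, subst=None):
    """Return a substitution dict unifying l1 and l2, or None on failure."""
    if subst is None:
        subst = {}
    l1, l2 = apply_subst(subst, l1), apply_subst(subst, l2)
    if l1 == l2:
        return subst
    if is_variable(l1):
        if occurs_in(l1, l2):            # rule (ii): no infinite recursion
            return None
        subst[l1] = l2
        return subst
    if is_variable(l2):
        return unify(l2, l1, subst)
    if isinstance(l1, list) and isinstance(l2, list) and len(l1) == len(l2):
        for a, b in zip(l1, l2):         # step 4: element by element
            subst = unify(a, b, subst)
            if subst is None:
                return None
        return subst
    return None                          # rule (i): different constants fail

print(unify(['tryassassinate', 'Marcus', '?x'],
            ['tryassassinate', 'Marcus', ['ruler-of', 'Rome']]))
print(unify(['tryassassinate', 'Marcus', 'Caesar'],
            ['hate', 'Marcus', 'Caesar']))
```

The first call succeeds by binding ?x to (ruler-of Rome); the second fails because the predicates differ.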

3.4 STRUCTURED KNOWLEDGE REPRESENTATION

Strong slot-and-filler structures: there are no hard and fast rules about what kinds of objects and links are good in general for knowledge representation. Conceptual dependency (CD), scripts and CYC embody specific notions of what types of objects and relations are permitted.

10.1 Conceptual Dependency: a theory of how to represent the kind of knowledge about events that is usually contained in natural language sentences. The goal is to represent knowledge in a way that

Facilitates drawing inferences from the sentences

Is independent of the language in which the sentences were originally stated.

The theory provides a set of conceptual primitives that can be combined to form the meanings of words in any particular language, together with a dependency structure over those primitives.

Example: "I gave the man a book" is represented by linking I to the primitive act ATRANS, with the book as object and the man as recipient (from I). In such diagrams:

Arrows indicate the direction of dependency.

A double arrow indicates the two-way link between actor and action.

p indicates past tense.

ATRANS is one of the primitive acts used by the theory; it indicates transfer of possession.

o indicates the object case relation.

R indicates the recipient case relation.

The first set of building blocks is the set of primitive acts from which actions are built:

ATRANS Transfer of an abstract relationship (e.g. Give)

PTRANS Transfer of the physical location of an object (e.g. go)

PROPEL Application of physical force to an object (e.g. push)

MOVE Movement of a body part by its owner (e.g. kick)

GRASP Grasping of an object by an actor (e.g. clutch)

INGEST Ingestion of an object by an animal (e.g. eat)

EXPEL Expulsion of something from the body of an animal. (e.g. cry)

MTRANS Transfer of mental information (e.g. tell)

MBUILD Building new information out of old (e.g. decide)

SPEAK Production of sounds (e.g. say)

ATTEND Focusing of a sense organ toward a stimulus (e.g. listen)

The second set comprises the allowable dependencies among the conceptualizations described in a sentence:

ACTs - actions

PPs - objects (picture producers)

AAs - modifiers of actions (action aiders)

PAs - modifiers of PPs (picture aiders)

The set of conceptual tenses:

p - past

f - future

t - transition

ts - start of transition

tf - finish of transition

k - continuing

? - interrogative

/ - negative

nil - present

delta - timeless

c - conditional

Examples (the CD diagrams are summarized in words):

1) John ran : John PTRANS (past)

2) John is tall : John, height (> average)

3) John is a doctor : John, doctor

4) A nice boy : boy, modified by nice

5) John's dog : dog, possessed by John

6) John pushed the car : John PROPEL car

7) John took the book from Mary : John ATRANS book, recipient from Mary to John

8) John ate ice cream with a spoon : John INGEST ice cream, instrument: do with spoon

9) John fertilized the field : John PTRANS fertilizer to the field

10) The plants grew : plants, size = x changing to size > x

11) Bill shot Bob : Bill PROPEL bullet to Bob, with Bob's health (-10)

12) John ran yesterday : John PTRANS, time: yesterday

13) While going home I saw a frog : I PTRANS to home, while I MTRANS frog from eyes to conscious processor (CP)

14) I heard a frog in the woods : I MTRANS frog from ears, location: the woods

Ex :

1) Ram ate the hot cake : Ram INGEST hot cake

2) Rajiv will go to Delhi : Rajiv PTRANS (future) to Delhi

3) Arjun saw Laxmi on the hill with a telescope : Arjun MTRANS Laxmi, from eyes to conscious processor, using a telescope

4) Shyam qualified the exam in the first class : Shyam PTRANS exam, result: first class

3.5 BACKWARD CHAINING, RESOLUTION



3.6 RESOLUTION

Resolution:

Precisely one of winter and ¬winter will be true at any point; if winter is true, then cold must be true to guarantee the truth of the second clause.

Resolution is a proof procedure that carries out, in a single operation, the variety of processes involved in reasoning with statements in predicate logic.

It produces proofs by refutation. Given any two clauses A and B, if there is a literal P1 in A which has a complementary literal P2 in B, delete P1 and P2 from A and B and construct a disjunction of the remaining literals.

The clause so constructed is called the resolvent of A and B.

1) A : P ˅ Q ˅ R

B : ¬P ˅ Q ˅ R

C : ¬Q ˅ R

Resolving A with B on P gives Q ˅ R; resolving that with C on Q gives R.

2) A : P ˅ Q ˅ R

B : ¬P ˅ R

C : ¬Q

D : ¬R

Resolving A with B gives X : Q ˅ R; resolving X with C gives Y : R; resolving Y with D gives Z : nil, the empty clause.

Example: from winter ˅ summer and ¬winter ˅ cold, resolution yields the new clause summer ˅ cold.

* Theorem proving: new clauses are inferred using resolution from old ones.

Two methods: i) start with the given axioms, use the rules of inference, and prove the result; ii) prove that the negation of the result cannot be true.

Given that:

a) Physician (Bhaskar) ….. (1)

b) ∀x : Physician (x) → knows_surgery (x) ……(2)

Prove: knows_surgery (Bhaskar) ……(3)

Method 2 (refutation): assume the negation, ¬knows_surgery (Bhaskar) ……(4). From (2), substituting x = Bhaskar, we get Physician (Bhaskar) → knows_surgery (Bhaskar), so with (1) we derive knows_surgery (Bhaskar) …..(5).

This contradicts the assumption (4) that was made, so the result holds.
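The refutation idea can be sketched for the propositional case in Python. The clause encoding (frozensets of string literals, with '~' marking negation) and the literal names are hypothetical conventions; the knowledge base is the Bhaskar example with the rule already instantiated at x = Bhaskar.

```python
def negate(lit):
    """Complement a literal: p <-> ~p."""
    return lit[1:] if lit.startswith('~') else '~' + lit

def resolve(a, b):
    """Return all resolvents of clauses a and b (sets of literals)."""
    resolvents = []
    for lit in a:
        if negate(lit) in b:
            resolvents.append((a - {lit}) | (b - {negate(lit)}))
    return resolvents

def refute(clauses):
    """True iff the clause set is unsatisfiable (empty clause derivable)."""
    clauses = set(map(frozenset, clauses))
    while True:
        new = set()
        for a in clauses:
            for b in clauses:
                if a == b:
                    continue
                for r in resolve(a, b):
                    if not r:
                        return True   # empty clause: contradiction found
                    new.add(frozenset(r))
        if new <= clauses:
            return False              # nothing new can be derived
        clauses |= new

# Prove knows_surgery(Bhaskar): add the negated goal and refute.
kb = [
    {'physician_bhaskar'},                            # (1)
    {'~physician_bhaskar', 'knows_surgery_bhaskar'},  # (2) at x = Bhaskar
    {'~knows_surgery_bhaskar'},                       # negated goal (4)
]
print(refute(kb))
```

The call prints True: resolving (1) with (2) yields knows_surgery_bhaskar, which clashes with the negated goal to produce the empty clause.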

3.7 SEMANTIC NETS

Semantic Networks

A semantic network is a structure for representing knowledge as a pattern of interconnected nodes and arcs; it can also be defined as a graphical representation of knowledge. The objects under consideration serve as nodes, and the relationships of a node with other nodes give the arcs.

Nodes represent entities, attributes, states or events; arcs in the network give the relationships between the nodes, and labels on the arcs specify what type of relationship actually exists.

Semantic nets are weak slot-and-filler structures: knowledge is structured as a set of entities and their attributes. They are called "weak" because the structures themselves are knowledge-poor, relying mainly on "is a" and "instance" relations.

A semantic network

Question: what is the connection between the Brooklyn Dodgers and Blue?

The network (figure not reproduced) contains, for example: Pee-wee Reese -instance-> Person; Person -is a-> Mammal; Person -has part-> Nose; Pee-wee Reese -team-> Brooklyn Dodgers; Brooklyn Dodgers -uniform color-> Blue.

A second example network: Scooter -is a-> Two-wheeler; Motor-bike -is a-> Two-wheeler; Two-wheeler -is a-> Moving-vehicle; Moving-vehicle -has-> Brakes, Engine, Electrical system, Fuel system.

This kind of reasoning exploits one of the important advantages that slot-and-filler structures have over purely logical representations: their entity-based organization.

Further examples: "John gave the book to Mary" and "John is taller than Bill".

"John gave the book to Mary" is represented by an event node EV7, an instance of Give, with agent John, object BK23 (an instance of Book), and beneficiary Mary. "John is taller than Bill" is represented with height nodes: John has height H1, Bill has height H2, and H1 is greater than H2.

Partitioned Semantic Nets

"The dog bit the mail carrier": nodes d, b and m represent a particular dog, a particular biting and a particular mail carrier, with "is a" links to the classes Dogs, Bite and Mail-carrier; the biting b has assailant d and victim m. This is a single net with no partitioning.

Every dog has bitten a mail carrier …. (b)

∀x : Dog (x) → Ǝy : Mail-carrier (y) Ʌ Bite (x, y)

Node g stands for the assertion given above. g is an instance of the special class GS of general statements about the world (i.e., those with universal quantifiers). Every element of GS has two attributes: a form, which states the relation being asserted, and one or more connections, one for each of the universally quantified variables.

For every dog d there exists a biting event b and a mail carrier m. In "Every dog in town has bitten the constable", the constable node c lies outside the existential quantifier; thus it is not viewed as an existentially quantified variable whose value may depend on the value of d.

(The partitioned-network diagrams, showing the general statement g in space SA with its form in space S1, are not reproduced.)

Every dog has bitten every mail carrier

Space S1 is included in SA

3.8 FRAMES

Frames are a means of representing commonsense knowledge. Knowledge is organized into small packets called "frames", and all the frames for a given situation constitute a frame system.

A frame can be defined as a data structure that has slots for various objects; a collection of frames encodes the expectations for a given situation. Frames are used to represent two types of knowledge, declarative/factual and procedural.

Declarative frames: a frame that merely contains descriptions about objects is called a declarative-type (factual or situational) frame.

A declarative frame consists of the name of the frame and the slots in the frame, e.g.:

Name : Computer Centre

Slots : A/c, Stationery cupboard, Computer, Dumb terminals, Printer


Frames which have procedural knowledge embedded in them are called action (procedural) frames. An action frame has the following slots:

Actor slot, which holds information about who is performing the activity.

Object slot, which holds information about the thing being acted upon.

Source slot, which holds information about where the action has to begin.

Destination slot, which holds information about the place where the action has to end.

Task slot, which generates the necessary sub-frames required to perform the operation.

Linking of procedural sub-frames (the figures are summarized below):

Name : Cleaning the jet of the carburetor

Actor : Expert; Object : Carburetor; Source : Scooter; Destination : Scooter

Task 1 : Remove carburetor; Task 2 : Clean nozzle; Task 3 : Fix carburetor

Each task in turn generates its own sub-frame, e.g.:

Name : Remove carburetor

Actor : Expert; Object : Carburetor; Source : Scooter; Destination : Scooter
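A minimal sketch of these frames as Python dictionaries, with sub-frames linked by task name; the `subframes` helper is a hypothetical addition, not part of the frame theory itself:

```python
frames = {
    "Computer Centre": {                 # declarative/situational frame
        "type": "declarative",
        "slots": ["A/c", "Stationery cupboard", "Computer",
                  "Dumb terminals", "Printer"],
    },
    "Cleaning the jet of carburetor": {  # action/procedural frame
        "type": "action",
        "actor": "Expert",
        "object": "Carburetor",
        "source": "Scooter",
        "destination": "Scooter",
        "tasks": ["Remove carburetor", "Clean nozzle", "Fix carburetor"],
    },
    "Remove carburetor": {               # sub-frame generated by Task 1
        "type": "action",
        "actor": "Expert",
        "object": "Carburetor",
        "source": "Scooter",
        "destination": "Scooter",
        "tasks": [],
    },
}

def subframes(name):
    """Return the sub-frames that an action frame's tasks refer to."""
    return [t for t in frames[name].get("tasks", []) if t in frames]

print(subframes("Cleaning the jet of carburetor"))
```

Only "Remove carburetor" has been expanded into its own frame here, so it is the only linked sub-frame reported.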

3.9 SCRIPTS, ONTOLOGY

Scripts:

A script is a mechanism for representing knowledge about common sequences of events: a structure that describes a stereotyped sequence of events in a particular context, consisting of slots that contain values or default values.

Components of a script:

Entry conditions – conditions that must hold before the events described in the script can occur.

Result – conditions that will in general be true after the events described in the script have occurred.

Props – slots representing objects that are involved in the events described in the script.

Roles – slots representing people who are involved in the events described in the script.

Track – the specific variation on a more general pattern that is represented by this particular script.

Scenes – the actual sequences of events that occur.

Pseudo-form of a restaurant script:

Script : Going to a restaurant

Props : Food, Tables, Menu, Money

Roles : Owner, Customer, Waiter, Cashier

Entry conditions :

Customer is hungry.

Customer has money.

Owner has food.

Scene 1 : Entering the restaurant. The customer enters the restaurant, scans the tables, chooses the best one, decides to sit there, goes there and occupies the seat.

Scene 2 : Ordering the food. The customer asks for the menu; the waiter brings it; the customer glances at it, chooses what to eat and orders that item.

Scene 3 : Eating the food. The waiter brings the food; the customer eats it.

Results :

Customer is not hungry.

Customer has less money.

Owner has more money.

Owner has less food.
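The restaurant script above can be sketched as a plain Python structure; the `can_instantiate` helper (a hypothetical addition) checks the entry conditions before the script is applied:

```python
restaurant_script = {
    "track": "Going to a restaurant",
    "props": ["Food", "Tables", "Menu", "Money"],
    "roles": ["Owner", "Customer", "Waiter", "Cashier"],
    "entry_conditions": ["customer_hungry", "customer_has_money",
                         "owner_has_food"],
    "scenes": {
        "Entering": ["enter", "scan tables", "choose table",
                     "go there", "occupy seat"],
        "Ordering": ["ask for menu", "waiter brings menu",
                     "choose item", "order item"],
        "Eating":   ["waiter brings food", "customer eats"],
    },
    "results": ["customer_not_hungry", "owner_has_more_money",
                "customer_has_less_money", "owner_has_less_food"],
}

def can_instantiate(script, facts):
    """A script applies only when all of its entry conditions hold."""
    return all(c in facts for c in script["entry_conditions"])

print(can_instantiate(restaurant_script,
                      {"customer_hungry", "customer_has_money",
                       "owner_has_food"}))
```

When an entry condition is missing (say the customer has no money), the script cannot be instantiated and no default inferences about the scenes should be drawn.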


UNIT IV

CO.4 Demonstrate working knowledge of reasoning in the presence of incomplete and/or uncertain information by applying Bayesian Networks and Fuzzy Logic.

4.1 HANDLING UNCERTAIN KNOWLEDGE

Uncertainty

Definition: Uncertainty means that many of the simplifications that are possible with deductive inference are no longer valid.

Why does uncertainty arise? Agents almost never have access to the whole truth about their environment, and often cannot find a categorical answer. Uncertainty can also arise because of incompleteness or incorrectness in the agent's understanding of the properties of the environment.

To act rationally under uncertainty we must be able to evaluate how likely certain things are. In FOL a fact F is only useful if it is known to be true or false, but we need to be able to evaluate how likely it is that F is true. By weighing the likelihoods of events (probabilities) we can develop mechanisms for acting rationally under uncertainty.

4.2 RATIONAL DECISIONS, BASICS OF PROBABILITY

Probabilistic reasoning

Using logic to represent and reason, we can represent knowledge about the world with facts and rules, like the following ones:

bird(tweety). fly(X) :- bird(X).


We can also use a theorem prover to reason about the world and deduce new facts about it, e.g.,

?- fly(tweety). Yes

However, this often does not work outside of toy domains - non-tautologous certain rules are hard to find.

A way to handle knowledge representation in real problems is to extend logic by using certainty factors.

In other words, replace IF condition THEN fact with IF condition with certainty x THEN fact with certainty f(x)

Unfortunately we cannot really adapt logical inference to probabilistic inference, since the latter is not context-free.

Replacing rules with conditional probabilities makes inferencing simpler.

Replace rules such as smoking -> lung cancer (or lots-of-conditions, smoking -> lung cancer) with P(lung cancer | smoking) = 0.6

Uncertainty is represented explicitly and quantitatively within probability theory, a formalism that has been developed over centuries.

A probabilistic model describes the world in terms of a set S of possible states - the sample space. We don't know the true state of the world, so we (somehow) come up with a probability distribution over S which gives the probability of any state being the true one. The world is usually described by a set of variables or attributes.

Default Reasoning & the Closed World Assumption

-- Uncertainty as a result of incomplete knowledge is handled with plausible default assumptions. Told that Pat is a 20-year-old, you normally assume that Pat does not suffer from blackouts; having learned that Pat has suffered from blackouts, you will be forced to revise your beliefs.

A default rule is expressed as

a(x) : M b1(x) …… M bk(x)

c(x)


where a(x) is the precondition wff for the conclusion wff c(x), M is the consistency operator, and the bi(x) are conditions each of which must be separately consistent with the KB for the conclusion c(x) to hold.

Symbolic Reasoning under Uncertainty

Introduction to Nonmonotonic Reasoning: the ABC murder story. Let Abbott, Babbitt and Cabot be suspects in a murder case; the knowledge involved is uncertain, fuzzy and often changing.

- Nonmonotonic reasoning: the axioms and/or the rules of inference are extended to make it possible to reason with incomplete information. These systems preserve, however, the property that at any given moment a statement is either believed to be true, believed to be false, or not believed to be either.

- Statistical reasoning: the representation is allowed to have a numeric measure of certainty.

Abbott has an alibi, in the register of a respected hotel. Babbitt has an alibi, for his brother-in-law testified that Babbitt was visiting him in Brooklyn at the time. Cabot pleads alibi too, claiming to have been watching a ski meet in the Catskills.

Beliefs:

1) Abbott did not commit the crime.

2) Babbitt did not.

3) Abbott or Babbitt or Cabot did.

Good luck: Cabot documents his alibi, having been caught by television cameras on the sidelines at the ski meet. A new belief is thrust upon us:

4) Cabot did not.

- This motivates techniques for maintaining several parallel belief spaces.

- Conventional reasoning systems such as first-order logic are designed to work with information that has three important properties:

i) It is complete with respect to the domain of interest.

ii) It is consistent.

iii) New facts can be added as they become available without invalidating old conclusions – monotonicity.

Nonmonotonic reasoning systems, on the other hand, are designed to be able to solve problems in which all of these properties may be missing. With no reason to suspect someone of the crime, we assume by default that he didn't commit it.


Make clear the distinction between:

It is known that ¬P.

It is not known whether P.

Predicate logic can express the first; a nonmonotonic system can express the second as well. Any inference that depends on the lack of some piece of knowledge is a nonmonotonic inference.

- Inferences are based on lack of knowledge, so they may change as new assertions are made: nonmonotonic inferences are defeasible, i.e., they may be defeated (rendered invalid) by new information.

Nonmonotonic reasoning does not share the monotonicity property of first-order logic, in which if T entails W, then T combined with new assertions N also entails W.

- How can a knowledge base be updated properly? We need techniques for maintaining valid sets of justifications: Abbott is in town this week and so is available to testify, but if we wait until next week he may be out of town.

- How can knowledge be used to help resolve conflicts when there are several inconsistent nonmonotonic inferences that could be drawn? There may be a contradiction to resolve: belief sets that are locally consistent but globally inconsistent, with no option to believe all of them at once.

Statistical Reasoning

- Some problems involve genuine randomness in the world (e.g., playing cards): knowing the likelihood of the various outcomes, we can exploit it.

- A second class of problems involves no genuine randomness: the world behaves normally unless there is some kind of exception. These include commonsense tasks, for which statistical measures function as summaries of the world: a numerical summary tells us how often an exception of some sort can be expected to occur.

* Probability & Bayes' Theorem

An agent collects evidence and modifies its behavior on the basis of that evidence. Bayesian statistics is a statistical theory of evidence, built on the conditional probability P(H|E): the probability of hypothesis H given that evidence E has been observed.


For this we require the prior probability of H and the extent to which E provides evidence of H. Define the universe of possible hypotheses, and let

P(Hi|E) – probability that hypothesis Hi is true given evidence E

P(E|Hi) – probability that we will observe evidence E given that hypothesis Hi is true

P(Hi) – the a priori probability that hypothesis Hi is true in the absence of any specific evidence

k – the number of possible hypotheses.

Bayes' theorem then states that

P(Hi|E) = P(E|Hi) · P(Hi) / Σ(n=1 to k) P(E|Hn) · P(Hn)
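Bayes' theorem as stated above can be sketched in Python. The hypotheses and numbers below are made-up illustrative values, not data from the text:

```python
def bayes(priors, likelihoods):
    """Posterior P(Hi|E) for each hypothesis, via Bayes' theorem."""
    # Denominator: sum over all hypotheses of P(E|Hn) * P(Hn).
    evidence = sum(likelihoods[h] * priors[h] for h in priors)
    return {h: likelihoods[h] * priors[h] / evidence for h in priors}

priors = {"measles": 0.1, "flu": 0.3, "healthy": 0.6}        # P(Hi)
likelihoods = {"measles": 0.9, "flu": 0.2, "healthy": 0.05}  # P(E|Hi)
posterior = bayes(priors, likelihoods)
print(posterior)
```

The denominator is the same for every hypothesis, which is why the posteriors sum to 1; here the low-prior measles hypothesis ends up most probable because its likelihood is much higher.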

* Example: examining the geological evidence at a particular location to determine whether it would be a good place to dig to find a desired mineral (e.g., copper or uranium).

* Medical diagnosis problem:

S : patient has spots. M : patient has measles. F : patient has fever.

We must also handle the conditional probabilities that arise from their conjunction. Given a prior body of evidence e and some new observation E, we need to compute the joint probabilities in

P(H|E, e) = P(H|E) · P(e|E, H) / P(e|E)

Bayes' theorem is intractable in practice for several reasons. The knowledge acquisition problem is insurmountable: too many probabilities must be provided, and there is substantial empirical evidence that people are poor probability estimators. The space required to store all the probabilities is too large, and the time required to compute them is too large. Despite these problems, Bayes' theorem remains an attractive basis for an uncertain reasoning system.

* Certainty factors & rule-based systems

MYCIN attempts to recommend appropriate therapies for patients with bacterial infections. A certainty factor is attached to the fact in each rule's consequent.

Rule:

If

i) the stain of the organism is gram-positive, and

ii) the morphology is coccus, and

iii) the growth conformation is clumps,

Then

the identity of the organism is staphylococcus.

A certainty factor (CF) is a measure (here between 0 and 1) of the extent to which the evidence supports or denies a hypothesis; if the CF is 0, the evidence fails to support the hypothesis.

- The CF calculus makes strong independence assumptions that make it relatively easy to use.

- Those assumptions create dangers if rules are not written carefully.

S : the sprinkler was on last night. W : the grass is wet. R : it rained last night.

Rules:

If the sprinkler was on last night, then there is suggestive evidence (CF 0.8) that the grass will be wet this morning.

If the grass is wet this morning, then there is suggestive evidence that it rained last night; chaining the two rules yields CF 0.72 that it rained because we believe the sprinkler was on.

This is a danger whenever the justifications of a belief are important to determining its consequences: we need to know why we believe the grass is wet.
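The chaining of certainty factors described above can be sketched in Python. The 0.9 figure for "wet suggests rain" is an assumption chosen so that chaining reproduces the 0.72 in the text, and `combine` shows the standard MYCIN-style rule for two positive CFs supporting the same hypothesis:

```python
def chain(cf_rule, cf_premise):
    """CF of a conclusion reached through a rule whose premise is uncertain."""
    return cf_rule * max(cf_premise, 0.0)

def combine(cf1, cf2):
    """Combine two positive CFs for the same hypothesis from parallel rules."""
    return cf1 + cf2 * (1 - cf1)

cf_wet = chain(0.8, 1.0)      # sprinkler known to be on -> wet with CF 0.8
cf_rain = chain(0.9, cf_wet)  # wet suggests rain -> 0.8 * 0.9 = 0.72
print(cf_rain)
```

Note that the chaining multiplies CFs blindly: it cannot see that the belief in wet grass came from the sprinkler, which is exactly the danger the text points out.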

Bayesian Networks:

Certainty factors were a mechanism for reducing the complexity of a Bayesian reasoning system; Bayesian networks instead preserve the formalism and rely on the modularity of the world. They are related to constraint networks: ways of representing knowledge as sets of constraints.

There are two ways propositions can influence the likelihood of each other:

i) causes influence the likelihood of their symptoms;

ii) observing a symptom affects the likelihood of all of its possible causes.

Bayesian networks make a clear distinction between these two kinds of influence.


Representing causality uniformly: the graph contains an additional node corresponding to the propositional variable that tells us whether it is currently the rainy season.

- More information is needed to use the network for probabilistic reasoning, so probability tables are provided.

- For example, the probability of rain on a given night is 0.9, and of no rain, 0.1.

- We also need a mechanism for computing the influence of any arbitrary node on any other.

Fig.: conditional probabilities for a Bayesian network

P(Wet | Sprinkler, Rain) 0.95

P(Wet | Sprinkler, ¬Rain) 0.9

P(Wet | ¬Sprinkler, Rain) 0.8

P(Wet | ¬Sprinkler, ¬Rain) 0.1

4.3 AXIOMS OF PROBABILITY

Review of probability

Given a set U (the universe), a probability function is a function defined over the subsets of U that maps each subset to the real numbers and that satisfies the axioms of probability:

1. Pr(U) = 1

2. Pr(A) ∈ [0, 1]

3. Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B)

Note that if A ∩ B = {} then Pr(A ∪ B) = Pr(A) + Pr(B).

(Figures: the Sprinkler/Rain/Wet network, and the same network extended with a Rainy-season node, are not reproduced.)

The primitives in probabilistic reasoning are random variables, just as the primitives in propositional logic are propositions. A random variable is not in fact a variable, but a function from a sample space S to another space, often the real numbers. For example, let the random variable Sum (representing the outcome of two die throws) be defined thus: Sum(die1, die2) = die1 + die2.

Each random variable has an associated probability distribution determined by the underlying distribution on the sample space

Continuing our example: P(Sum = 2) = 1/36, P(Sum = 3) = 2/36, . . . , P(Sum = 12) = 1/36
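The distribution of the Sum random variable can be computed directly in Python; exact fractions avoid floating-point noise:

```python
from fractions import Fraction
from collections import Counter

# Sample space: all 36 equally likely outcomes of two die throws.
sample_space = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)]

# Sum maps each outcome to die1 + die2; count outcomes per value.
counts = Counter(d1 + d2 for d1, d2 in sample_space)
P_sum = {s: Fraction(c, 36) for s, c in counts.items()}

print(P_sum[2], P_sum[7], P_sum[12])
```

This reproduces the values quoted in the text: P(Sum = 2) = 1/36 up through P(Sum = 12) = 1/36, with P(Sum = 7) = 6/36 the largest.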

Consider the probabilistic model of the fictitious medical expert system mentioned before. The sample space is described by 8 binary-valued variables: Visit to Asia? (A), Tuberculosis? (T), Either tuberculosis or lung cancer? (E), Lung cancer? (L), Smoking? (S), Bronchitis? (B), Dyspnoea? (D), Positive X-ray? (X).

There are 2^8 = 256 events in the sample space. Each event is determined by a joint instantiation of all of the variables: S = {(A = f, T = f, E = f, L = f, S = f, B = f, D = f, X = f), (A = f, T = f, E = f, L = f, S = f, B = f, D = f, X = t), …, (A = t, T = t, E = t, L = t, S = t, B = t, D = t, X = t)}.

Since S is defined in terms of joint instantiations, any distribution defined on it is called a joint distribution. All underlying distributions will be joint distributions in this module. The variables {A, T, E, L, S, B, D, X} are in fact random variables, which 'project' values: L(A = f, T = f, E = f, L = f, S = f, B = f, D = f, X = f) = f; L(A = f, T = f, E = f, L = f, S = f, B = f, D = f, X = t) = f; L(A = t, T = t, E = t, L = t, S = t, B = t, D = t, X = t) = t.

Each of the random variables {A, T, E, L, S, B, D, X} has its own distribution, determined by the underlying joint distribution. This is known as the marginal distribution. For example, the distribution for L is denoted P(L), and this distribution is defined by the two probabilities P(L = f) and P(L = t). For example,

P(L = f) = P(A = f, T = f, E = f, L = f, S = f, B = f, D = f, X = f) + P(A = f, T = f, E = f, L = f, S = f, B = f, D = f, X = t) + P(A = f, T = f, E = f, L = f, S = f, B = f, D = t, X = f) + … + P(A = t, T = t, E = t, L = f, S = t, B = t, D = t, X = t)

P (L) is an example of a marginal distribution.
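Marginalization can be sketched on a tiny joint distribution; the numbers are made up for illustration:

```python
# Joint P(A, B) over two binary variables, keyed by (a, b).
joint = {
    ('t', 't'): 0.2, ('t', 'f'): 0.1,
    ('f', 't'): 0.3, ('f', 'f'): 0.4,
}

def marginal_B(joint):
    """Sum out A: P(B = b) = sum over a of P(A = a, B = b)."""
    p = {}
    for (a, b), pr in joint.items():
        p[b] = p.get(b, 0.0) + pr
    return p

print(marginal_B(joint))
```

The same summing-out works for any subset of variables, which is exactly the marginal computation described for the medical expert system below.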

Here is a joint distribution over two binary-valued variables A and B (table not reproduced). We get the marginal distribution over B by simply adding up the joint probabilities over the different possible values of A for each value of B (and putting the result in the "margin").

In general, given a joint distribution over a set of variables, we can get the marginal distribution over a subset by simply summing out those variables not in the subset.

In the medical expert system case, we can get the marginal distribution over, say, A, D by simply summing out the other variables:

However, computing marginals is not always an easy task. For example,

P(A = t, D = f) = P(A = t, T = f, E = f, L = f, S = f, B = f, D = f, X = f) + P(A = t, T = f, E = f, L = f, S = f, B = f, D = f, X = t) + P(A = t, T = f, E = f, L = f, S = f, B = t, D = f, X = f) + P(A = t, T = f, E = f, L = f, S = f, B = t, D = f, X = t) + … + P(A = t, T = t, E = t, L = t, S = t, B = t, D = f, X = t)

This has 64 summands, each of whose values needs to be estimated from empirical data. For the estimates to be of good quality, each of the instances that appears in the summands should occur a sufficiently large number of times in the empirical data. Often such a large amount of data is not available.

However, computation can be simplified for certain special but common conditions. This is the condition of independence of variables.

Two random variables A and B are independent iff

P(A,B) = P(A)P(B)

i.e. can get the joint from the marginals

This is quite a strong statement: It means for any value x of A and any value y of B

P(A = x, B = y) = P(A = x)P(B = y)
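The definition can be checked numerically on a small joint distribution; the numbers below are made up, and this particular joint happens to factor into its marginals:

```python
joint = {('t', 't'): 0.12, ('t', 'f'): 0.28,
         ('f', 't'): 0.18, ('f', 'f'): 0.42}

# Marginals by summing out the other variable.
pA = {a: sum(p for (x, b), p in joint.items() if x == a) for a in ('t', 'f')}
pB = {b: sum(p for (a, y), p in joint.items() if y == b) for b in ('t', 'f')}

# Independence: P(A = x, B = y) = P(A = x) P(B = y) for every x, y.
independent = all(abs(joint[(a, b)] - pA[a] * pB[b]) < 1e-9
                  for a in ('t', 'f') for b in ('t', 'f'))
print(independent)
```

Here P(A = t) = 0.4 and P(B = t) = 0.3, and every joint entry equals the product of its marginals, so the check reports True; perturbing any one entry would break the factorization.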

Note that the independence of two random variables is a property of the underlying probability distribution: the same pair of variables may be independent under one distribution and dependent under another.

Conditional probability is defined as P(A | B) = P(A, B) / P(B); that is, for any value x of A and any value y of B, P(A = x | B = y) = P(A = x, B = y) / P(B = y).


If A and B are independent, then P(A | B) = P(A) and P(B | A) = P(B).

Conditional probabilities can represent causal relationships in both directions: from cause to (probable) effect, and from effect to (probable) cause.

4.4 BAYES' RULE AND CONDITIONAL INDEPENDENCE, BAYESIAN NETWORKS

Bayesian Networks: Representation and Syntax

Bayes nets (BN) (also referred to as Probabilistic Graphical Models and Bayesian Belief Networks) are directed acyclic graphs (DAGs) where each node represents a random variable. The intuitive meaning of an arrow from a parent to a child is that the parent directly influences the child. These influences are quantified by conditional probabilities.

BNs are graphical representations of joint distributions. The BN for the medical expert system mentioned previously represents a joint distribution over 8 binary random variables {A,T,E,L,S,B,D,X}.

Fig.4.1 Bayesian Network for Medical Expert System

Conditional Probability Tables

Each node in a Bayesian net has an associated conditional probability table, or CPT. (Assume all random variables have only a finite number of possible values.) This gives the probability values for the random variable at the node conditional on values for its parents. Here is a part of one of the CPTs from the medical expert system network.

If a node has no parents, then the CPT reduces to a table giving the marginal distribution on that random variable.

Consider another example, in which all nodes are binary, i.e., have two possible values, which we will denote by T (true) and F (false).


Fig.4.2: Conditional Probability Tables

We see that the event "grass is wet" (W=true) has two possible causes: either the water sprinkler is on (S=true) or it is raining (R=true). The strength of this relationship is shown in the table. For example, we see that Pr(W=true | S=true, R=false) = 0.9 (second row), and hence Pr(W=false | S=true, R=false) = 1 − 0.9 = 0.1, since each row must sum to one. Since the C node has no parents, its CPT specifies the prior probability that it is cloudy (in this case, 0.5). (Think of C as representing the season: if it is a cloudy season, it is less likely that the sprinkler is on and more likely that it rains.)

Semantics of Bayesian Networks The simplest conditional independence relationship encoded in a Bayesian network can be

stated as follows: a node is independent of its ancestors given its parents, where the ancestor/parent relationship is with respect to some fixed topological ordering of the nodes. In the sprinkler example above, by the chain rule of probability, the joint probability of all the nodes in the graph above is,

P(C, S, R, W) = P(C) * P (S|C) * P(R|C,S) * P(W|C,S,R)

By using conditional independence relationships, we can rewrite this as

P(C, S, R, W) = P(C) * P(S|C) * P(R|C) * P(W|S,R)

where we were allowed to simplify the third term because R is independent of S given its parent C, and the last term because W is independent of C given its parents S and R. We can see that the conditional independence relationships allow us to represent the joint more compactly. Here the savings are minimal, but in general, if we had n binary nodes, the full joint would require O(2^n) space to represent, while the factored form would require O(n 2^k) space, where k is the maximum fan-in of a node. And fewer parameters make learning easier.
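The factored sprinkler joint P(C,S,R,W) = P(C) P(S|C) P(R|C) P(W|S,R) can be sketched in Python. P(C=t) = 0.5 and P(W=t | S=t, R=f) = 0.9 follow the text; the remaining CPT entries are illustrative assumptions:

```python
P_C = {True: 0.5, False: 0.5}
P_S_given_C = {True: {True: 0.1, False: 0.9},   # cloudy -> sprinkler less likely
               False: {True: 0.5, False: 0.5}}
P_R_given_C = {True: {True: 0.8, False: 0.2},   # cloudy -> rain more likely
               False: {True: 0.2, False: 0.8}}
P_W_given_SR = {(True, True): {True: 0.99, False: 0.01},
                (True, False): {True: 0.9, False: 0.1},
                (False, True): {True: 0.9, False: 0.1},
                (False, False): {True: 0.0, False: 1.0}}

def joint(c, s, r, w):
    """P(C=c, S=s, R=r, W=w) as the product of the four CPT entries."""
    return (P_C[c] * P_S_given_C[c][s] * P_R_given_C[c][r]
            * P_W_given_SR[(s, r)][w])

# The factored joint is still a proper distribution: it sums to 1.
total = sum(joint(c, s, r, w)
            for c in (True, False) for s in (True, False)
            for r in (True, False) for w in (True, False))
print(total)
```

Only 1 + 2 + 2 + 4 = 9 independent CPT rows determine all 16 joint entries, which is the compactness argument made above.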

The intuitive meaning of an arrow from a parent to a child is that the parent directly influences the child. The direction of this influence is often taken to represent causal influence. The conditional probabilities give the strength of the causal influence; a 0 or 1 in a CPT represents a deterministic influence.
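As a concrete illustration, the factored joint above can be evaluated directly from the CPTs. In this minimal Python sketch only P(C=true) = 0.5 and Pr(W=true | S=true, R=false) = 0.9 come from the text; the remaining CPT entries are assumed values for illustration.

```python
# Sketch: evaluating P(C,S,R,W) = P(C) P(S|C) P(R|C) P(W|S,R) from CPTs.
# Only P(C=T)=0.5 and P(W=T|S=T,R=F)=0.9 are from the text; the rest
# of the CPT entries below are illustrative assumptions.
from itertools import product

P_C = {True: 0.5, False: 0.5}                     # prior on Cloudy
P_S = {True: 0.1, False: 0.5}                     # P(S=T | C)  (assumed)
P_R = {True: 0.8, False: 0.2}                     # P(R=T | C)  (assumed)
P_W = {(True, True): 0.99, (True, False): 0.9,    # P(W=T | S, R); 0.9 from text
       (False, True): 0.9, (False, False): 0.0}   # other entries assumed

def joint(c, s, r, w):
    """P(C,S,R,W) via the factored form."""
    pc = P_C[c]
    ps = P_S[c] if s else 1 - P_S[c]
    pr = P_R[c] if r else 1 - P_R[c]
    pw = P_W[(s, r)] if w else 1 - P_W[(s, r)]
    return pc * ps * pr * pw

# The joint must sum to 1 over all 2^4 assignments.
total = sum(joint(*vals) for vals in product([True, False], repeat=4))
print(round(total, 10))  # → 1.0
```

Because each CPT row sums to one, the full joint sums to one regardless of the assumed numbers.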


Decomposing Joint Distributions

A joint distribution can always be broken down into a product of conditional probabilities using repeated applications of the product rule.

We can order the variables however we like:

Conditional Independence in a Bayes Net

A Bayes net represents the assumption that each node is conditionally independent of all its non-descendants given its parents. So, for example,

Fig.4.3: Example for Conditional Bayes Net

Note that a node is NOT independent of its descendants given its parents. Generally,

Variable Ordering in a Bayes Net

The conditional independence assumptions expressed by a Bayes net allow a compact representation of the joint distribution. First note that the Bayes net imposes a partial order on nodes: X <= Y iff X is a descendant of Y. We can always break down the joint so that the conditional probability factor for a node only has non-descendants in the condition.

Fig.4.4: Variable Ordering in Bayes Net


The Joint Distribution as a Product of CPTs

Because each node is conditionally independent of all its non-descendants given its parents, and because we can write the joint appropriately, we have:

So the CPTs determine the full joint distribution. In short,

Bayesian Networks allow a compact representation of probability distributions. An unstructured table representation of the “medical expert system” joint would require 2^8 − 1 = 255 numbers. With the structure imposed by the conditional independence assumptions, this reduces to 18 numbers. Structure also allows efficient inference, of which more later.

Conditional Independence and d-separation in a Bayesian Network

We can have conditional independence relations between sets of random variables. In the Medical Expert System Bayesian net, {X, D} is independent of {A, T, L, S} given {E, B}, which means:

P(X, D | E, B) = P(X, D | E, B, A, T, L, S)

or equivalently

P(X, D, A, T, L, S | E, B) = P(A, T, L, S | E, B) P(X, D | E, B)

We need a way of checking for these conditional independence relations. Conditional independence can be checked using the d-separation property of the Bayes net's directed acyclic graph. d-separation is short for direction-dependent separation.

Fig.4.5: Conditional Independence and d-separation in a Bayesian Network

If E d-separates X and Y then X and Y are conditionally independent given E.

E d-separates X and Y if every undirected path from a node in X to a node in Y is blocked given E.

Defining d-separation:


A path is blocked given a set of nodes E if there is a node Z on the path for which one of these three conditions holds:

1. Z is in E and Z has one arrow on the path coming in and one arrow going out.
2. Z is in E and Z has both path arrows leading out.
3. Neither Z nor any descendant of Z is in E, and both path arrows lead in to Z.

Building a Bayes Net: The Family Out? Example

We start with a natural language description of the situation to be modeled:

I want to know if my family is at home as I approach the house. Often my wife leaves the light on if she goes out, but also sometimes if she is expecting a guest. When nobody is at home, the dog is put in the backyard, but he is also put there when he has bowel trouble. If the dog is in the backyard, I will hear him barking, but I may be confused by some other dog barking.

Building the Bayes net involves the following steps.

We build a Bayes net to get probabilities concerning what we don't know given what we do know. What we don't know is not observable. These are called hypothesis events; we need to identify what the hypothesis events in the problem are.

Recall that a Bayesian network is composed of related (random) variables, and that a variable incorporates an exhaustive set of mutually exclusive events: exactly one of its events is true. How shall we represent the two hypothesis events in this problem?

Variables whose values are observable and which are relevant to the hypothesis events are called information variables. What are the information variables in this problem?

In this problem we have three variables; what is the causal structure between them? Actually, the whole notion of 'cause', let alone 'determining causal structure', is very controversial. Often (but not always) your intuitive notion of causality will help you.

Sometimes we need mediating variables which are neither information variables nor hypothesis variables to represent causal structures.

Learning of Bayesian Network Parameters

One needs to specify two things to describe a BN: the graph topology (structure) and the parameters of each CPT. It is possible to learn both of these from data. However, learning structure is much harder than learning parameters. Also, learning when some of the nodes are hidden, or when we have missing data, is much harder than when everything is observed.

This gives rise to 4 cases:

1. Known structure, full observability
2. Known structure, partial observability
3. Unknown structure, full observability
4. Unknown structure, partial observability

We discuss only the first case.


Known structure, full observability

We assume that the goal of learning in this case is to find the values of the parameters of each CPT which maximize the likelihood of the training data, which contains N cases (assumed to be independent). The normalized log-likelihood of the training set D is a sum of terms, one for each node.

We see that the log-likelihood scoring function decomposes according to the structure of the graph, and hence we can maximize the contribution to the log-likelihood of each node independently (assuming the parameters in each node are independent of those in the other nodes). In cases where N is small compared to the number of parameters that require fitting, we can use a numerical prior to regularize the problem. In this case, we call the estimates Maximum A Posteriori (MAP) estimates, as opposed to Maximum Likelihood (ML) estimates.

Consider estimating the Conditional Probability Table for the W node. If we have a set of training data, we can just count the number of times the grass is wet when it is raining and the sprinkler is on, N(W=1, S=1, R=1); the number of times the grass is wet when it is raining and the sprinkler is off, N(W=1, S=0, R=1); etc. Given these counts (which are the sufficient statistics), we can find the Maximum Likelihood Estimate of the CPT as follows:

where the denominator is N(S=s,R=r) = N(W=0,S=s,R=r) + N(W=1,S=s,R=r). Thus "learning" just amounts to counting (in the case of multinomial distributions). For Gaussian nodes, we can compute the sample mean and variance, and use linear regression to estimate the weight matrix. For other kinds of distributions, more complex procedures are necessary.
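The counting procedure can be sketched as follows; the training cases below are made up purely for illustration.

```python
# Sketch: ML estimation of P(W | S, R) by counting sufficient statistics.
# Each tuple is a training case (w, s, r); the data set is invented.
from collections import Counter

data = [(1, 1, 1), (1, 1, 1), (1, 1, 0), (0, 1, 0),
        (1, 0, 1), (1, 0, 1), (0, 0, 1), (0, 0, 0)]

n_wsr = Counter(data)                            # N(W=w, S=s, R=r)

def cpt(w, s, r):
    """ML estimate: N(W=w,S=s,R=r) / N(S=s,R=r)."""
    den = n_wsr[(0, s, r)] + n_wsr[(1, s, r)]    # N(S=s, R=r)
    return n_wsr[(w, s, r)] / den

print(cpt(1, 1, 1))  # → 1.0 (grass always wet when sprinkler on and raining)
print(cpt(1, 0, 1))  # 2 of 3 such cases are wet → 0.666...
```

"Learning" here really is just counting: no optimization routine is needed for multinomial CPTs.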

As is well known from the HMM literature, ML estimates of CPTs are prone to sparse data problems, which can be solved by using (mixtures of) Dirichlet priors (pseudo counts). This results in a Maximum A Posteriori (MAP) estimate. For Gaussians, we can use a Wishart prior, etc.

4.5 EXACT AND APPROXIMATE INFERENCE IN BAYESIAN NETWORKS

Inferencing in Bayesian Networks

Exact Inference

The basic inference problem in BNs is described as follows. Given:

1. A Bayesian network BN
2. Evidence e, an instantiation of some of the variables in BN (e can be empty)
3. A query variable Q

compute P(Q|e), the (marginal) conditional distribution over Q.

Given what we do know, compute a distribution over what we do not. Four categories of inferencing tasks are usually encountered:

1. Diagnostic Inference (from effects to causes): Given that John calls, what is the probability of burglary? i.e. find P(B|J).
2. Causal Inference (from causes to effects): Given burglary, what is the probability that John calls, i.e. P(J|B), or that Mary calls, i.e. P(M|B)?
3. Intercausal Inference (between causes of a common event): Given alarm, what is the probability of burglary? i.e. P(B|A). Now, also given earthquake, what is the probability of burglary? i.e. P(B|A,E).
4. Mixed Inference (some causes and some effects known): Given John calls and no earthquake, what is the probability of alarm, i.e. P(A|J,~E)?

We will demonstrate below the inferencing procedure for BNs. As an example, consider the following linear BN without any a priori evidence.

Consider computing all the marginals (with no evidence). P(A) is given, and

We don't need any conditional independence assumption for this.

For example, suppose A, B are binary then we have

Now,

P(B) (the marginal distribution over B) was not given originally, but we just computed it in the last step, so we’re OK (assuming we remembered to store P(B) somewhere).

If C were not independent of A given B, we would have a CPT for P(C|A,B), not P(C|B). Note that we had to wait for P(B) before P(C) was calculable.

If each node has k values and the chain has n nodes, this algorithm has complexity O(nk^2). Summing over the full joint instead has complexity O(k^n).
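The forward computation on a chain can be sketched on a three-node chain A → B → C; the CPT values below are illustrative.

```python
# Sketch: marginals on a chain A -> B -> C.
# P(B=b) = sum_a P(A=a) P(B=b|A=a); P(C=c) = sum_b P(B=b) P(C=c|B=b).
# All numbers are illustrative.
P_A = [0.3, 0.7]
P_B_given_A = [[0.9, 0.1],   # row a: [P(B=0|a), P(B=1|a)]
               [0.2, 0.8]]
P_C_given_B = [[0.5, 0.5],
               [0.1, 0.9]]

def marginal(prior, cpt):
    """Push the prior through one CPT: O(k^2) work per node."""
    k = len(cpt[0])
    return [sum(prior[x] * cpt[x][y] for x in range(len(prior)))
            for y in range(k)]

P_B = marginal(P_A, P_B_given_A)
P_C = marginal(P_B, P_C_given_B)
print(P_B)                   # → [0.41, 0.59]
print(round(sum(P_C), 10))   # → 1.0
```

Each node costs O(k^2), giving O(nk^2) overall, versus O(k^n) for brute-force summation over the joint.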

Complexity can be reduced by more efficient summation by “pushing sums into products”.

Approximate Inferencing in Bayesian Networks

Many real models of interest have a large number of nodes, which makes exact inference very slow (exact inference is NP-hard in the worst case). We must therefore resort to approximation techniques. Unfortunately, approximate inference is also #P-hard, but we can nonetheless come up with approximations which often work well in practice. Below is a list of the major techniques.

Variational methods. The simplest example is the mean-field approximation, which exploits the law of large numbers to approximate large sums of random variables by their means. In particular, we essentially decouple all the nodes, and introduce a new parameter, called a variational parameter, for each node, and iteratively update these parameters so as to minimize the cross-entropy (KL distance) between the approximate and true probability distributions. Updating the variational parameters becomes a proxy for inference. The mean-field approximation produces a lower bound on the likelihood. More sophisticated methods are possible, which give tighter lower (and upper) bounds.

Sampling (Monte Carlo) methods. The simplest kind is importance sampling, where we draw random samples x from P(X), the (unconditional) distribution on the hidden variables, and then weight the samples by their likelihood P(y|x), where y is the evidence. A more efficient approach in high dimensions is Markov Chain Monte Carlo (MCMC), which includes as special cases Gibbs sampling and the Metropolis-Hastings algorithm.
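Importance sampling (likelihood weighting) can be sketched on the sprinkler network from earlier. Only P(C=true) = 0.5 and Pr(W=true | S=true, R=false) = 0.9 come from the text; the other CPT entries are assumed.

```python
# Sketch: likelihood-weighted estimate of P(R=true | W=true) in the
# sprinkler network. CPT entries other than P(C)=0.5 and
# P(W=T|S=T,R=F)=0.9 are illustrative assumptions.
import random
random.seed(0)

def p_s(c): return 0.1 if c else 0.5     # P(S=true | C)  (assumed)
def p_r(c): return 0.8 if c else 0.2     # P(R=true | C)  (assumed)
P_W = {(True, True): 0.99, (True, False): 0.9,
       (False, True): 0.9, (False, False): 0.0}

num = den = 0.0
for _ in range(100_000):
    # sample the hidden variables from the prior ...
    c = random.random() < 0.5
    s = random.random() < p_s(c)
    r = random.random() < p_r(c)
    # ... and weight the sample by the likelihood of the evidence W=true
    w = P_W[(s, r)]
    den += w
    if r:
        num += w

estimate = num / den                     # approximates P(R=true | W=true)
print(round(estimate, 2))
```

With these assumed CPTs the exact answer is about 0.71, and the weighted-sample estimate converges to it as the sample count grows.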


Bounded cutset conditioning. By instantiating subsets of the variables, we can break loops in the graph. Unfortunately, when the cutset is large, this is very slow. By instantiating only a subset of values of the cutset, we can compute lower bounds on the probabilities of interest. Alternatively, we can sample the cutsets jointly, a technique known as block Gibbs sampling.

Parametric approximation methods. These express the intermediate summands in a simpler form, e.g., by approximating them as a product of smaller factors. "Minibuckets" and the Boyen-Koller algorithm fall into this category.

4.6 FUZZY LOGIC

Fuzzy…

• We start describing things in a slightly vague and fuzzy manner.

• For instance, after seeing a long list of names, you tell your friend that his name was cited somewhere near the middle of the list.

• The word near seems to be comprehended effortlessly by the human brain but what of computing systems?

• What does near mean in this context?

• Two names below the middle or five above or...?

• Is there a way we can make these number crunching systems understand this concept?

“Drive slowly”

• Does it mean you should drive at 10, 20 or 20.5 km/hour?

• The answer could be any value or a very different one depending on the context.

• If it is ascertained in a machine that any speed less than or equal to 20 km/hour means slow speed and anything above is fast, then does it mean that 20.1 km/hour (or 20.01 km/hour, for that matter) is fast?

• Such a sharp cutoff is unrealistic in the real world.

Well then, what’s fuzzy logic?

• Fuzzy logic deals with how we can capture this essence of comprehension and embed it in the system by allowing for a gradual transition from slow to high speeds.

• This comprehension, as per Lotfi Zadeh, the founder of the fuzzy logic concept, confers a higher machine intelligence quotient to computing systems.

Crisp Set

• The conventional machine uses crisp sets to take care of concepts like fast and slow speeds.

• It relates speed to crisp values thereby forming members that either belong to a group or do not belong to it.


• For example

Slow = {0, 5, 10, 15, 20, 25, 30, 35, 40} could be a crisp set that says that when the value of speed is equal to any of the members of the set, the speed is categorized as slow.

Problem with Crisp Sets

• A controller based on a crisp threshold would continuously keep jerking if the speed oscillates in the interval [39, 41].

• Situation could eventually cause harm and subsequent damage.

• Requirement: An alternative to a crisp set definition of speed.

Fuzzy Sets

• Fuzzy sets introduce a certain amount of vagueness to reduce the complexity of comprehension.

• It consists of elements that signify the degree or grade of membership to a fuzzy aspect.

• Membership values usually use closed intervals and denote the sense of belonging of a member of a crisp set to a fuzzy set.

• Consider a crisp set A comprising elements that signify the ages of a set of people in years:

• A = {2, 4, 10, 15, 21, 30, 35, 40, 45, 60, 70}

Ages and their membership to particular fuzzy sets:

Age | Infant | Child | Adolescent | Young | Adult | Old
  2 |   1    |  0    |    0       |  1    |  0    |  0
  4 |  0.1   |  0.5  |    0       |  1    |  0    |  0
 10 |   0    |  1    |   0.3      |  1    |  0    |  0
 15 |   0    |  0.8  |    1       |  1    |  0    |  0
 21 |   0    |  0    |   0.1      |  1    |  0.8  | 0.1
 30 |   0    |  0    |    0       |  0.6  |  1    | 0.3
 35 |   0    |  0    |    0       |  0.5  |  1    | 0.35
 40 |   0    |  0    |    0       |  0.4  |  1    | 0.4
 45 |   0    |  0    |    0       |  0.2  |  1    | 0.6
 60 |   0    |  0    |    0       |  0    |  1    | 0.8
 70 |   0    |  0    |    0       |  0    |  1    |  1

Fuzzy terminology


• Universe of Discourse (U): This is defined as the range of all possible values that comprise the input to the fuzzy system.

• Fuzzy Set: Any set that empowers its members to have different grades of membership (based on a membership function) in the interval [0,1] is a fuzzy set.

• Membership Function: The membership function μA which forms the basis of a fuzzy set is given by μA: U → [0,1], where the closed interval is one that holds real numbers.

• Support of a Fuzzy Set (Sf): The support S of a fuzzy set f, in a universal crisp set U, is the set which contains all elements of U that have a non-zero membership value in f. For instance, the support of the fuzzy set adult is Sadult = {21, 30, 35, 40, 45, 60, 70}.

• Depiction of a fuzzy set: A fuzzy set f in a universal crisp set U is written as

f = μ1/s1 + μ2/s2 + μ3/s3 + … + μn/sn

• where μi is the membership of si and si is the corresponding term in the support of f, i.e. Sf.

• This is, however, only a representation and has no algebraic implication (the slash and + signs do not carry any arithmetic meaning).

• Accordingly,

• Old = 0.1/21 +0.3 /30 +0.35/35 + 0.4/40 +0.6/45 +0.8/60 + 1/70

Fuzzy Set Operations

• Union: The membership function of the union of two fuzzy sets A and B is defined as the maximum of the two individual membership functions. It is equivalent to the Boolean OR operation:

μ(A∪B) = max(μA, μB)

• Intersection: The membership function of the intersection of two fuzzy sets A and B is defined as the minimum of the two individual membership functions and is equivalent to the Boolean AND operation:

μ(A∩B) = min(μA, μB)

• Complement: The membership function of the complement of a fuzzy set A is defined as the negation of the specified membership function. This is equivalent to the Boolean NOT operation:

μ(Ā) = 1 - μA


• It may further be noted here that the laws of Associativity, Commutativity, Distributivity and De Morgan’s laws hold in fuzzy set theory too.
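The three operations can be sketched directly on the Young and Old fuzzy sets from the age table above.

```python
# Sketch: fuzzy union (max), intersection (min) and complement (1-mu)
# on the Young and Old sets from the age/membership table.
young = {2: 1, 4: 1, 10: 1, 15: 1, 21: 1, 30: 0.6,
         35: 0.5, 40: 0.4, 45: 0.2, 60: 0, 70: 0}
old   = {2: 0, 4: 0, 10: 0, 15: 0, 21: 0.1, 30: 0.3,
         35: 0.35, 40: 0.4, 45: 0.6, 60: 0.8, 70: 1}

union        = {x: max(young[x], old[x]) for x in young}   # Boolean OR
intersection = {x: min(young[x], old[x]) for x in young}   # Boolean AND
complement   = {x: 1 - young[x] for x in young}            # Boolean NOT

print(union[30])         # → 0.6
print(intersection[30])  # → 0.3
print(complement[30])    # → 0.4
```

At age 30, Young has membership 0.6 and Old has 0.3, so union, intersection and complement come out to 0.6, 0.3 and 0.4 respectively.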

A Fuzzy System

• Uses the concepts of fuzzy logic.

• Such logic is used when a mathematical model is missing or difficult to construct.

A Fuzzy Room Cooler

Fuzzy Regions

• Temperature: Cold, Cool, Moderate, Warm and Hot

• Fan Speed: Slack, Low, Medium, Brisk, Fast

• Flow-rate: Strong-Negative (SN), Negative (N), Low-Negative (LN), Medium (M), Low-Positive (LP), Positive (P) and High-Positive (HP).

Fuzzy profiles


Some Fuzzy Rules

• R1: If temperature is HOT and fan motor speed is SLACK then flow-rate is HIGH-POSITIVE.

• R2: If temperature is HOT and fan motor speed is LOW then flow-rate is HIGH-POSITIVE.

• R3: If the temperature is HOT and fan motor speed is MEDIUM then the flow-rate is POSITIVE.

• R4: If the temperature is HOT and fan motor speed is BRISK then the flow-rate is HIGH-POSITIVE.

• R5: If the temperature is WARM and fan motor speed is MEDIUM then the flow-rate is LOW-POSITIVE.

• R6: If the temperature is WARM and fan motor speed is BRISK then the flow-rate is POSITIVE.
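A minimal sketch of how such rules might fire, using min for AND within a rule and max to combine rules sharing a consequent; the membership values below are assumed, not read from the actual fuzzy profiles.

```python
# Sketch: evaluating fuzzy rules R3 and R6 for the room cooler.
# Memberships are assumed values standing in for the fuzzy profiles.
mu_temp = {"HOT": 0.7, "WARM": 0.3}      # current temperature memberships
mu_fan  = {"MEDIUM": 0.6, "BRISK": 0.4}  # current fan-speed memberships

# R3: IF temp is HOT AND fan is MEDIUM THEN flow-rate is POSITIVE
r3 = min(mu_temp["HOT"], mu_fan["MEDIUM"])
# R6: IF temp is WARM AND fan is BRISK THEN flow-rate is POSITIVE
r6 = min(mu_temp["WARM"], mu_fan["BRISK"])

# Rules with the same consequent combine by max (OR):
mu_flow_positive = max(r3, r6)
print(mu_flow_positive)  # → 0.6
```

The resulting membership for POSITIVE flow-rate would then feed a defuzzification step to produce a crisp valve setting.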


UNIT V

CO.5 Ability to apply learning in problem solving and learning of probabilistic models.

5.1 WHAT IS LEARNING

What is Learning? “… changes in the system that are adaptive in the sense that they enable the system to do the same task or tasks drawn from the same population more efficiently and more effectively the next time.” [Simon, 1983]

Types of Learning

1. Rote Learning

The rote learning technique avoids understanding the inner complexities of the subject and focuses on memorizing the material so that it can be recalled by the learner exactly the way it was read or heard.

• Learning by memorization: avoids understanding the inner complexities of the subject being learned; rote learning instead focuses on memorizing the material so that it can be recalled exactly the way it was read or heard.

• Learning by repeating something over and over again: saying the same thing and trying to remember how to say it. It does not help us to understand; it helps us to remember, the way we learn a poem or a song by rote.

2. Learning from Example: Induction

A process of learning by example: the system tries to induce a general rule from a set of observed instances. Such learning methods extract rules and patterns out of massive data sets. This kind of learning belongs to supervised learning; it does classification and constructs class definitions, and is called induction or concept learning. A technique used for constructing class definitions (or concept learning) is:

• Winston's Learning Program

Winston (1975) described a Blocks World learning program. This program operated in a simple blocks domain. The goal is to construct representations of the definitions of concepts in the blocks domain, for example, concepts such as "house".


■ Start with input, a line drawing of a blocks world structure. The program learned concepts such as House, Tent and Arch: a house as a brick (rectangular block) with a wedge (triangular block) suitably placed on top of it; a tent as two wedges touching side by side; an arch as two non-touching bricks supporting a third wedge or brick.

■ Each concept is learned through near misses. A near miss is an object that is not an instance of the concept but is very similar to such instances.

■ The program uses procedures to analyze the drawing and construct a semantic net representation.

■ An example of such a structure for the house is shown below.

[Figure: semantic net for the object "house": node A (the house) has-part node B (a Wedge) and node C (a Brick); B is supported-by C; B isa Wedge, C isa Brick.]

■ Node A represents the entire structure, which is composed of two parts: node B, a Wedge, and node C, a Brick. Links in the network include supported-by, has-part, and isa.

• Winston's Program

■ Winston's program followed 3 basic steps in concept formulation:

1. Select one known instance of the concept. Call this the concept definition.

2. Examine definitions of other known instances of the concept. Generalize the definition to include them.

3. Examine descriptions of near misses. Restrict the definition to exclude these.

■ Both steps 2 and 3 of this procedure rely heavily on a comparison process by which similarities and differences between structures can be detected.

■ Winston's program can be similarly applied to learn other concepts such as "ARCH".
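The three steps can be sketched on simple attribute descriptions; the attribute vocabulary below is invented for illustration and is far simpler than Winston's semantic nets.

```python
# Sketch of Winston-style concept formulation on flat attribute
# descriptions (a simplification of semantic nets; names are invented).

def generalize(defn, positive):
    """Step 2: drop any requirement the new positive instance violates."""
    return {k: v for k, v in defn.items() if positive.get(k) == v}

def specialize(defn, near_miss, full_instance):
    """Step 3: add back a requirement that excludes the near miss."""
    for k, v in full_instance.items():
        if k not in defn and near_miss.get(k) != v:
            return {**defn, k: v}
    return defn

# Step 1: the first instance of "house" becomes the initial definition.
defn = {"top": "wedge", "bottom": "brick", "color": "red"}
# Another house, blue this time -> colour is not essential.
defn = generalize(defn, {"top": "wedge", "bottom": "brick", "color": "blue"})
# Near miss: wedge beside (not on top of) the brick -> support is essential.
house = {"top": "wedge", "bottom": "brick", "relation": "supported-by"}
defn = specialize(defn, {"top": "wedge", "bottom": "brick"}, house)

print(sorted(defn))  # → ['bottom', 'relation', 'top']
```

The learned definition keeps the structural requirements (wedge, brick, supported-by) and discards the incidental one (colour).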



4. Explanation Based Learning (EBL)

Humans appear to learn quite a lot from one example. Human learning is accomplished by examining particular situations and relating them to background knowledge in the form of known general principles. This kind of learning is called Explanation Based Learning (EBL).

4.1 General Approach

EBL abstracts a general concept from a particular training example. It is a technique to formulate general concepts on the basis of a specific training example. EBL analyses the specific training example in terms of domain knowledge and the goal concept. The result of EBL is an explanation structure that explains why the training example is an instance of the goal concept. The explanation structure is then used as the basis for formulating the general concept. Thus, EBL provides a way of generalizing a machine-generated explanation of a situation into rules that apply not only to the current situation but to similar ones as well.

5. Discovery

Simon (1966) first proposed the idea that we might explain scientific discovery in computational terms and automate the processes involved on a computer. Project DENDRAL (Feigenbaum 1971) demonstrated this by inferring structures of organic molecules from mass spectra, a problem previously solved only by experienced chemists. Later, a knowledge-based program called AM, the Automated Mathematician (Lenat 1977), discovered many mathematical concepts. After this, an equation discovery system called BACON (Langley 1981) discovered a wide variety of empirical laws, such as the ideal gas law. The research continued during the 1980s and 1990s but slowed, as computational biology, bioinformatics and scientific data mining convinced many researchers to focus on domain-specific methods. But the need for research on general principles of scientific reasoning and discovery very much exists. The discovery system AM relied strongly on theory-driven methods of discovery, whereas BACON employed data-driven heuristics to direct its search for empirical laws.

BACON.3:

BACON.3 is a knowledge-based production system that discovers empirical laws. Its main heuristics detect constancies and trends in data, leading to the formulation of hypotheses and the definition of theoretical terms. The program represents information at varying levels of description. The lowest levels correspond to direct observations, while the highest correspond to hypotheses that explain everything so far observed. BACON.3 is built on top of BACON.1.

■ It starts with a set of variables for a problem. For example, to derive the ideal gas law, it started with four variables: p (gas pressure), V (gas volume), T (gas temperature), and n (the number of moles).

■ Values from experimental data are inputted.

■ BACON holds some variables constant and tries to notice trends in the data.

■ Finally, it draws inferences. Recall pV/nT = k, where k is a constant.

■ BACON has also been applied to Kepler's 3rd law, Ohm's law, conservation of momentum and Joule's law.

• Example: Rediscovering the ideal gas law pV/nT = 8.32, where p is the pressure on a gas, n is the number of moles, T is the temperature and V the volume of the gas. [The step-by-step algorithm is not given as in the previous example, but the procedure is explained below.]

■ At the first level of description we hold n = 1 and T = 300 and vary p and V. Choose V to be the dependent variable.

■ At this level, BACON discovers the law pV = 2496.0.

■ Now the program examines this phenomenon: when n = 1 and T = 310, then pV = 2579.2. Similarly, when n = 1 and T = 320, then pV = 2662.4.

■ At this point, BACON has enough information to relate the values of pV and the temperature T. These terms are linearly related with an intercept of 0, making the ratio pV/T equal to 8.32.

■ Now the discovery system can vary its third independent term: while n = 2, pV/T is found to be 16.64; while n = 3, pV/T is found to be 24.96.

■ When it compares the values of n and pV/T, BACON finds another linear relation with a zero intercept. The resulting equation, pV/nT = 8.32, is equivalent to the ideal gas law.
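The BACON procedure above can be sketched as a small script over the quoted data points.

```python
# Sketch: BACON-style discovery on the ideal-gas data quoted above.
# Step 1: at fixed n=1, notice pV/T is constant (linear, zero intercept).
data = [(1, 300, 2496.0), (1, 310, 2579.2), (1, 320, 2662.4)]  # (n, T, pV)
ratios = {round(pv / T, 2) for (n, T, pv) in data}
print(ratios)  # → {8.32}, i.e. pV/T is constant at n=1

# Step 2: varying n, pV/T was 8.32, 16.64, 24.96 at n = 1, 2, 3,
# so pV/(nT) is constant: another zero-intercept linear relation.
for n, pv_over_T in [(1, 8.32), (2, 16.64), (3, 24.96)]:
    assert round(pv_over_T / n, 2) == 8.32
print("pV/nT = 8.32")  # the ideal gas law
```

Each "discovery" step is the same heuristic applied at a higher level: detect a linear relation with zero intercept and define the ratio as a new theoretical term.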


6. Analogy

Learning by analogy means acquiring new knowledge about an input entity by transferring it from a known similar entity. This technique transforms the solutions of problems in one domain to the solutions of problems in another domain by discovering analogous states and operators in the two domains. Example: infer by analogy the hydraulics laws that are similar to Kirchhoff's laws.

[Figure: hydraulic problem with flows Qa = 3 and Qb = 9 merging into an unknown flow Qc, drawn alongside Kirchhoff's First Law for currents: I3 = I1 + I2.]

Other similar examples are:

■ Pressure drop is like voltage drop.

■ The hydrogen atom is like our solar system: the Sun has a greater mass than the Earth and attracts it, causing the Earth to revolve around the Sun. The nucleus also has a greater mass than the electron and attracts it. Therefore it is plausible that the electron also revolves around the nucleus.


5.2 KNOWLEDGE AND LEARNING

Knowledge in Learning

This simple knowledge-free picture of inductive learning persisted until the early 1980s. The modern approach is to design agents that already know something and are trying to learn some more. This may not sound like a terrifically deep insight, but it makes quite a difference to the way we design agents. It might also have some relevance to our theories about how science itself works. The general idea is shown schematically in Fig.5.2.

A cumulative learning process uses, and adds to, its stock of background knowledge over time.

5.3 LEARNING IN PROBLEM SOLVING

Learning by Parameter Adjustment

Many programs rely on an evaluation procedure to summarise the state of a search; game playing programs provide many examples of this. However, many programs have a static evaluation function. For learning, a slight modification of the formulation of the evaluation of the problem is required: the problem has an evaluation function that is represented as a polynomial of the form

c1·t1 + c2·t2 + … + cn·tn

The t terms are values of features and the c terms are weights. In designing programs it is often difficult to decide on the exact value to give each weight initially. So the basic idea of parameter adjustment is to:

• Start with some estimate of the correct weight settings.
• Modify the weights in the program on the basis of accumulated experience.
• Features that appear to be good predictors will have their weights increased, and bad ones will have their weights decreased.

Samuel's Checkers program employed 16 such features at any one time, chosen from a pool of 38.

Learning by Macro-Operators

The basic idea here is similar to rote learning: avoid expensive recomputation. Macro-operators can be used to group a whole series of actions into one. For example, making dinner can be described as: lay the table, cook dinner, serve dinner. We could treat laying the table as one action even though it involves a sequence of actions. The STRIPS problem solver employed macro-operators in its learning phase. Consider a blocks world example in which ON(C,B) and ON(A,TABLE) are true. STRIPS can achieve ON(A,B) in four steps:

UNSTACK(C,B), PUTDOWN(C), PICKUP(A), STACK(A,B)

STRIPS now builds a macro-operator MACROP with preconditions ON(C,B), ON(A,TABLE), postconditions ON(A,B), ON(C,TABLE), and the four steps as its body. MACROP can now be used in future operation. But it is not very general; it can easily be generalised with variables used in place of the specific blocks.
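The macro-operator can be sketched as a data structure; the set-of-literals state representation below is a simplification of STRIPS.

```python
# Sketch: a STRIPS-style macro-operator for the four-step plan above,
# with states represented as sets of ground literals (a simplification).
MACROP = {
    "pre":  {"ON(C,B)", "ON(A,TABLE)"},
    "post": {"ON(A,B)", "ON(C,TABLE)"},
    "body": ["UNSTACK(C,B)", "PUTDOWN(C)", "PICKUP(A)", "STACK(A,B)"],
}

def applicable(state, op):
    """An operator applies when all its preconditions hold."""
    return op["pre"] <= state

state = {"ON(C,B)", "ON(A,TABLE)"}
if applicable(state, MACROP):
    # applying the macro replaces four planning steps with one
    state = (state - MACROP["pre"]) | MACROP["post"]

print(sorted(state))  # → ['ON(A,B)', 'ON(C,TABLE)']
```

Replacing block names with variables (ON(x,y), etc.) would give the generalised macro-operator the text mentions.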

Learning by Chunking

Chunking involves ideas similar to macro-operators and originates from psychological ideas on memory and problem solving. Its computational basis is in production systems (studied earlier). SOAR is a system that uses production rules to represent its knowledge. It also employs chunking to learn from experience.

Basic outline of SOAR's method:

• As SOAR solves problems, it fires productions; these are stored in long-term memory.
• Some firings turn out to be more useful than others.
• When SOAR detects a useful sequence of firings, it creates a chunk.
• A chunk is essentially a large production that does the work of an entire sequence of smaller ones.
• Chunks may be generalised before storing.

5.4 LEARNING FROM EXAMPLE

Learning by Example / Induction: This is similarity-based learning. A large number of examples are given and the machine learns to perform similar actions in similar situations. Human beings also frequently use this form of learning. When we are children, our teachers tell us many things by giving examples. Suppose there are two fruits, a green apple and a pear. As an adult it is easy to tell the difference; for a child, it might not be easy to differentiate between the two fruits. In such situations, various examples of both fruits are given to teach the difference. Similarly, in our daily life we see many examples of birds flying, and we observe that when there are clouds in the sky, it rains. Based on these examples we formulate rules like "all birds can fly" and "clouds bring rain". When we formulate such rules and use them to draw conclusions in given situations, we learn by induction. Induction means "the inferring of general laws from particular instances". Thus, inductive learning means generalization of knowledge gathered from real-world examples and use of the same for solving similar problems.

5.5 LEARNING PROBABILISTIC MODELS

Probabilistic Models

A probabilistic model of sensory inputs can:
- make optimal decisions under a given loss function
- make inferences about missing inputs
- generate predictions/fantasies/imagery
- communicate the data in an efficient way

Probabilistic modelling is equivalent to other views of learning:
- information theoretic: finding compact representations of the data
- physical analogies: minimising the free energy of a corresponding statistical mechanical system

Bayes' rule. Let D be a data set and M a model (or its parameters). The probability of a model given the data set is

P(M | D) = P(D | M) P(M) / P(D)

where P(D | M) is the evidence (or likelihood), P(M) is the prior probability of the model, and P(M | D) is the posterior probability. Under very weak and reasonable assumptions, Bayes' rule is the only rational and consistent way to manipulate uncertainties/beliefs (Polya, Cox axioms, etc.).

Bayes, MAP and ML:
- Bayesian learning assumes a prior over the model parameters and computes the full posterior distribution P(θ | D).
- Maximum a posteriori (MAP) learning assumes a prior P(θ) and finds a single parameter setting that maximises the posterior: P(θ | D) ∝ P(θ) P(D | θ).
- Maximum likelihood (ML) learning does not assume a prior over the model parameters; it finds the parameter setting that maximises the likelihood of the data, P(D | θ).
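The difference between ML and MAP can be made concrete with a coin-flip example. This is a sketch under stated assumptions: a Beta(a, b) prior on the heads probability, whose posterior mode gives the MAP estimate in closed form; the counts are invented.

```python
# ML vs MAP estimation of a coin's heads probability.

def ml_estimate(heads, tails):
    # Maximum likelihood: no prior, just the observed frequency.
    return heads / (heads + tails)

def map_estimate(heads, tails, a=2, b=2):
    # Maximum a posteriori with a Beta(a, b) prior: the posterior is
    # Beta(heads + a, tails + b), whose mode is taken below.
    return (heads + a - 1) / (heads + tails + a + b - 2)

print(ml_estimate(9, 1))   # 0.9
print(map_estimate(9, 1))  # (9 + 1) / (10 + 2) = 0.8333...
```

The prior pulls the MAP estimate towards 0.5, which matters most when the data set is small; full Bayesian learning would keep the whole Beta posterior instead of a single point.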

5.6 FORMAL LEARNING THEORY

Formal Learning Theory

A device learns a concept if, given positive and negative examples, it can produce an algorithm that classifies future examples correctly with high probability (error at most 1/h). The complexity of learning a concept is a function of three factors: the error tolerance (h), the number of binary features present in the examples (t), and the size of the rule necessary to make the discrimination (f). If the number of training examples required is polynomial in h, t, and f, then the concept is said to be learnable.

Formal Learning Theory (cont'd)

- Conjunctive learning requires about log(n) training examples, where n is the number of features.
- Conjunctive learning with positive examples only requires about n training examples.
- Finite automata are learnable only if the learner is allowed to perform experiments.

UNIT VI

CO.6 Apply the concept of knowledge engineering, learning, knowledge acquisition, and understanding natural language.

6.1 EXPERT SYSTEMS: FUNDAMENTAL BLOCKS

Expert Systems

The credibility of AI rose to new heights in the minds of individuals and critics when many Expert Systems (ES) were successfully planned, developed and implemented in challenging areas. As of today, quite a heavy investment is made in this area. The success of these programs in selected areas involving high technical expertise has led people to explore new avenues.

"Expert Systems (ES) are knowledge-intensive programs that solve problems in a domain that requires a considerable amount of technical expertise."

"An Expert System is a set of programs that manipulates embedded knowledge to solve problems in a specialized domain that normally requires human expertise."

Characteristics of an Expert System:

·        They should solve difficult problems in a domain as well as or better than human experts.
·        They should possess vast quantities of domain-specific knowledge, down to the minute details.
·        They permit the use of heuristic search processes.
·        They explain why they ask a question and justify their conclusions.
·        They deal with uncertain and irrelevant data.
·        They communicate with users in the users' own natural language.
·        They possess the capacity to cater to an individual's needs.
·        They provide extensive facilities for 'symbolic processing' rather than 'numeric processing'.
·        A final characteristic, from the point of view of economists and financial people: they should make money.

Expert Systems need heavy investment and there should be considerable ‘Return on Investment’ (ROI).

Architecture and Modules of Expert System

The fundamental modules of an expert system are:
·        Knowledge Base
·        User Interface
·        Inference Engine
·        Explanation Facility
·        Knowledge Acquisition Facility
·        External Interface

1. Knowledge Base: The core module of any expert system is its Knowledge-Base (KB). It is a warehouse of the domain-specific knowledge captured from the human expert via the knowledge acquisition module.   There are many ways of representing the knowledge in the knowledge-base such as logic, semantic nets, frames, scripts, production rules etc.

2. User Interface: The user interface provides the facilities needed for the user to communicate with the system. A user normally wants a consultation with the system for the following purposes:

·        To get remedies for his problem.
·        To know the private knowledge (heuristics) of the system.
·        To get explanations for specific queries.

Presenting a real-world problem to the system for a solution is what is meant by having a consultation. Here the user interface provides as many facilities as possible, such as menus and graphical interfaces, to make the dialogue user-friendly and lively.

3. Inference Engine: Also called the 'rule interpreter', the inference engine (IE) performs the task of matching antecedents against the responses given by the user and firing rules. Basically there are two approaches:

      Forward Chaining - This works by matching the existing conditions of the problem (the given facts) with the antecedents of the rules in the knowledge base. Forward chaining is also known as data-driven or antecedent-driven search.

      Backward Chaining - This is the reverse of forward chaining. Here the rule interpreter tries to match the 'THEN' part instead of the 'IF' part. Because of this, backward chaining is also called consequent-driven or goal-driven search.
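The two chaining strategies can be sketched over a tiny rule base. The rules and fact names below are invented for illustration; a real inference engine would also handle variables, conflict resolution and explanation.

```python
# Minimal sketch of forward and backward chaining over IF-THEN rules.
# Each rule is a (set-of-antecedents, consequent) pair.

RULES = [
    ({"machine_defective"}, "production_low"),
    ({"production_low"}, "profit_low"),
]

def forward_chain(facts):
    # Data-driven: fire any rule whose antecedents all hold, to fixpoint.
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for antecedents, consequent in RULES:
            if antecedents <= facts and consequent not in facts:
                facts.add(consequent)
                changed = True
    return facts

def backward_chain(goal, facts):
    # Goal-driven: match the THEN part, recursively prove the IF part.
    if goal in facts:
        return True
    return any(all(backward_chain(a, facts) for a in antecedents)
               for antecedents, consequent in RULES if consequent == goal)

print(forward_chain({"machine_defective"}))                 # derives both conclusions
print(backward_chain("profit_low", {"machine_defective"}))  # True
```

Forward chaining derives everything derivable from the facts; backward chaining only explores rules relevant to the queried goal.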

4. Explanation Facility: Answering specific queries forms the explanation mechanism of the expert system. Basically, any user would like to ask two basic questions: 'why' and 'how'. Conventional programs do not provide these facilities. The explanation facility helps the user in the following ways:

·        If the user is a domain expert, it helps in identifying what additional knowledge is needed.
·        It enhances the user's confidence in the system.
·        It serves as a tutor in sharing the system's knowledge with the user.

The explanation facility is the part of the user interface that carries out these tasks.

5. Knowledge Acquisition Facility: The major bottleneck in expert system development is knowledge acquisition. The Knowledge Acquisition Facility (KAF) creates a congenial atmosphere for the expert to share expertise with the system.

6. External Interface: This provides communication between the expert system and the external environment. A formal consultation is done via the user interface, but in real-time expert systems that form part of a closed-loop system, it is not proper to expect human intervention every time to feed in the prevailing conditions and get remedies; moreover, the time gap in a real-time system is too narrow. The external interface, with its sensors, gets minute-by-minute information about the situation and acts accordingly.

6.2 KNOWLEDGE ENGINEERING

Knowledge Engineering is generally known as the field responsible for the analysis and design of expert systems, and is thus concerned with representing and implementing the expertise of a chosen application domain in a computer system. Research on cognition, or cognitive science, on the other hand, is performed as a basic science, mostly within the disciplines of artificial intelligence, psychology and linguistics; it investigates the mental states and processes of humans by modelling them with a computer system, combining analytic and empirical viewpoints.

Early on, knowledge acquisition was known as the activity of making explicit the human knowledge that is relevant for performing a task, so that it can be represented and become operational in an expert system. Knowledge acquisition and the field of knowledge engineering are consequently closely related to human cognition, which is studied in cognitive science. The specific relationship between knowledge engineering and cognitive science has changed over the years and therefore needs to be reconsidered in future expert system developments.

Although knowledge acquisition activities are at most twenty years old, there is already a respectable history with noticeable successes and some initially disappointing failures to look back upon. Actually, more progress was made by analysing the failures than through the short-term successes.

6.3 KNOWLEDGE ACQUISITION

Knowledge Acquisition Strategies:

1. Protocol Analysis: In this method, the expert is asked to think aloud and express the mental process while solving the problem. The protocol, consisting of the knowledge engineer's observations and the expert's thought process, is analysed at a later stage for specific features of that type of problem. In this method, the knowledge engineer does not interrupt while the expert is at work.

2. Interviews & Introspection: This is another and the most commonly used method. The knowledge engineer familiarizes himself with the concepts of the domain and poses questions or problems to the experts, who in turn provide answers or solutions that help in revealing heuristic knowledge.

3. Observation at site: In this method, the elicitor acts as a passive element and watches the expert in actual action. Procedural knowledge is obtained by this method.

4. Discussion about the problem: In this category there are three methods:

    a) Problem Description - The expert is asked to give sample problems for each category of answer. This method helps in identifying the foundational characteristics of the problems.

    b) Problem Discussion- Problem discussion method involves discussion about a problem to the domain expert. The needed data, knowledge and procedures evolve by this method. Knowledge of finer granularity emerges from the discussion.

     c) Problem Analysis: The problem analysis part is similar to protocol analysis, wherein the expert is presented with a series of problems and asked to think and find solutions for the same.

5.  Discussion about the system: This method involves the prototype system that has been developed. Three major methods are:-

     a) Tuning the System: In tuning the system, the domain specific expert is asked to provide a set of classic problems and solutions are obtained from the system. The solutions are then compared with the solutions obtained by the human expert and the system is tuned by adding knowledge of high granularity.

     b) Verifying the system: In verifying the system, the intricacies of the system are fully explained to the expert, who is asked to verify its working. This is a tedious task.

     c) Validating the system: In validating the system, the results of the system and that of the expert are given to the outside experts to find out the validity of the solution.

Difficulties in Knowledge Acquisition

·        Domain experts store their private knowledge subconsciously and do not keep a written record of their heuristics. So, unless and until a problem comes along that needs that private piece of knowledge, it remains passive and hidden.
·        Domain experts have problems with effective communication. Most experts find it difficult to explain their reasoning process. Lack of proper communication makes the knowledge acquisition process tedious and inefficient.
·        Much human expertise is basically intuitive: the capability of skilled pattern recognition. Intuition is very hard to verbalize.

Major Application Areas of Expert Systems:
·        Analysis
·        Control
·        Design
·        Diagnosis
·        Monitoring
·        Planning
·        Prediction
·        Repair

Examples of Expert Systems

·        DENDRAL infers the structure elucidation of chemical compounds.
·        MYCIN is an expert system for the diagnosis of bacterial infections that effectively handles uncertain data.
·        XCON/R1 is a system in use at Digital Equipment Corporation for configuring VAX computers.

6.4 KNOWLEDGE BASED SYSTEMS

The typical architecture of a KBS is often described as follows:

The inference engine and knowledge base are separated because the reasoning mechanism needs to be as stable as possible, while the knowledge base must be able to grow and change as knowledge is added; this arrangement also enables the system to be built from, or converted to, a shell.

It is reasonable to produce a richer, more elaborate description of the typical KBS. A more elaborate description, which still includes the components found in almost any real-world system, looks like this:

The system holds a collection of general principles which can potentially be applied to any problem - these are stored in the knowledge base.

The system also holds a collection of specific details that apply to the current problem (including details of how the current reasoning process is progressing) - these are held in working memory.

Both sorts of information are processed by the inference engine.

Any practical expert system needs an explanatory facility: it is essential that an expert system should be able to explain its reasoning.

It is also reasonable to include an expert interface and a knowledge base editor, since any practical KBS needs a mechanism for efficiently building and modifying the knowledge base.

Expert systems have a substantial domain knowledge base; they are applied AI in a very broad sense. Expert systems are complete AI programs: all AI techniques are exploited, typically around a set of production rules.

Expert system shells: a shell supports adding new knowledge corresponding to a new problem domain. EMYCIN ("Empty MYCIN", derived from MYCIN) is the classic example. Shells typically support rules, frames and a variety of other reasoning mechanisms. A shell must also provide an easy-to-use interface between an expert system written with the shell and a larger, probably more conventional, programming environment.

Explanation: people must be able to interact easily with the system; to facilitate this interaction the ES must have explanation capabilities.

Introduction to Expert Systems (summary). Expert systems are knowledge-intensive programs that solve problems in a domain that requires a considerable amount of technical expertise: the machine can offer intelligent advice or take an intelligent decision about a processing function.

The knowledge base is a warehouse of the domain-specific knowledge captured from the human expert via the knowledge acquisition module. Apart from conceptual dependency, frames and scripts, production rules are an extremely popular knowledge representation structure today:

If   <condition 1>
And  <condition 2>
...
And  <condition n>
Then <action 1>
And  <action 2>

Inference engine: also called the rule interpreter, it performs the task of matching antecedents against the responses given by the user and firing rules. Forward chaining traces from the known facts to conclusions in the KB; backward chaining works from the goal back to the facts.

Conflict resolution: when several rules could fire, a sequencing technique decides which to perform first, e.g. perform the most specific rule, or follow a most-recent policy.
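The two conflict-resolution policies just named can be sketched directly. The rule records and their fields (`conditions`, `recency`) are invented for illustration; a production system would compute these from its working memory.

```python
# Conflict resolution sketch: choose among fireable rules either the
# most specific (most conditions) or the most recently added rule.

rules = [
    {"name": "R1", "conditions": {"bird"},            "recency": 1},
    {"name": "R2", "conditions": {"bird", "ostrich"}, "recency": 2},
]
facts = {"bird", "ostrich"}

# The conflict set: every rule whose conditions all hold.
fireable = [r for r in rules if r["conditions"] <= facts]

most_specific = max(fireable, key=lambda r: len(r["conditions"]))
most_recent   = max(fireable, key=lambda r: r["recency"])
print(most_specific["name"], most_recent["name"])  # R2 R2
```

Here both policies happen to pick R2; in general they can disagree, which is why a production system fixes one policy (or an ordered combination) in advance.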

User interface: provides the facilities needed for the user to communicate with the system, supported by the explanation facility and the knowledge acquisition facility.

External interface: communication between the ES and the external environment.

Expert system development involves the user, the domain expert and the knowledge engineer, and proceeds through these phases:

1. Problem identification
2. Decide on the vehicle for development
3. Prototype development
4. Plan the full-scale system
5. Implementation, maintenance and evaluation of the full system

[Figure: Personnel involved in ES development - the user, knowledge engineer, domain expert and maintenance personnel interact with the system through external interfaces and a historical database.]

6.6 UNDERSTANDING NATURAL LANGUAGE

Natural Language Processing (NLP)

Processing written text uses lexical, syntactic and semantic knowledge of the language as well as the required real-world information. Processing spoken language requires all of the above plus additional knowledge about phonology.

Problem: natural-language sentences give an incomplete description of the information they are intended to convey:

Some dogs are outside.
Some dogs are on the lawn.
Three dogs are on the lawn.
Rover, Tripp & Spot ...

Problem: the same expression means different things in different contexts:

Where's the water?

Problem: no natural language program can be complete, because new words, expressions and meanings can be generated quite freely:

I'll fax it to you.

Problem: there are lots of ways to say the same thing:

Mary was born on October 11.
Mary's birthday is October 11.

These features of language make it both powerful and hard to process, and they also affect translating from one natural language to another. Natural language processing includes both understanding and generation, as well as other tasks such as multilingual translation. What is language understanding, and what does a sentence mean?

Steps in natural language understanding:

Morphological Analysis: Individual words are analysed into their components, and non-word tokens such as punctuation are separated from the words.

Syntactic Analysis: Linear sequences of words are transformed into structures that show how the words relate to each other. An English syntactic analyser would reject the sentence "Boy the go to the store".

Semantic Analysis: A mapping is made between the syntactic structures and objects in the task domain.

Discourse Integration: The meaning of a sentence may depend on the prior discourse context. For example, "John wanted it" depends on what the earlier sentences said, and John's mention may influence the meaning of later sentences such as "He always had".

Pragmatic Analysis: The structure representing what was said is reinterpreted to determine what was actually meant. "Do you know what time it is?" is a request to be told the time.
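The first two stages above can be sketched as a toy pipeline over the "Bill's .init file" sentence that follows. The processing inside each stage is drastically simplified for illustration: real morphological analysis uses a lexicon, and real syntactic analysis builds a full parse tree.

```python
# Toy sketch of the first NLU stages: morphological analysis splits off
# the possessive suffix and punctuation; "syntactic" analysis here just
# tags each token rather than building a real parse tree.

def morphological(text):
    # Separate possessive "'s" and sentence-final "?" as their own tokens.
    return text.replace("'s", " 's").replace("?", " ?").split()

def syntactic(tokens):
    tags = {"'s": "POSS", "?": "PUNCT"}
    return [(t, tags.get(t, "WORD")) for t in tokens]

sentence = "I want to print Bill's .init file"
tokens = morphological(sentence)
print(tokens)            # 'Bill' and "'s" come out as separate tokens
print(syntactic(tokens))
```

Pulling "Bill's" apart into "Bill" plus the suffix "'s" is exactly the morphological step described in the worked example below; the later semantic, discourse and pragmatic stages would operate on the resulting structures.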

Consider an English interface to an operating system to which the following sentence is typed:

"I want to print Bill's .init file"

Morphological analysis:

o Pull apart the word "Bill's" into the proper noun "Bill" and the possessive suffix "'s".
o Recognize ".init" as a file-extension adjective.

Syntactic analysis (parsing) then builds unit noun phrases, attaches reference markers (shown in parentheses), and finds the constituents to which meaning can be assigned.

S(RM1)
  NP
    PRO  I (RM2)
  VP
    V  want
    S(RM3)
      VP
        V  print
        NP(RM4)
          ADJS  Bill's (RM5)
          NP
            ADJS  .init
            N  file

Fig. Syntactic Analysis

Perception and Action

Perception involves interpreting sights, sounds, smells and touch. Action includes the ability to navigate in the world and manipulate objects.

A robot's sensors connect it to the physical world, whereas classical AI programs (e.g. chess search/game playing over millions of nodes) operate in simulated worlds. Comparing classical AI programs with AI robotics:

Classical AI programs:
i) Input is symbolic information (an English sentence, an 8-puzzle).
ii) Require only general-purpose computers.
iii) Can compute an optimal plan, e.g. by best-first search.

AI robots:
i) Input is an analog signal (a 2-D video image or a speech waveform).
ii) Need special hardware for perceiving and affecting the world.
iii) Sensors and effectors are limited in precision; there is always some degree of uncertainty (e.g. where exactly an obstacle stands).
iv) Must react in real time; the real world is unpredictable, dynamic and uncertain, so there is a trade-off between devising a plan and executing it.

Searching and backtracking can be costly because robots operate in real time.

A Design for an Autonomous Robot

One approach is to attach sensors and effectors to an existing AI system: the robot perceives the physical world, cognition processes the percepts, and action affects the world in turn.

Perception: humans perceive through many channels - sight, sound, touch, smell and taste; robot sensors include cameras, laser range finders, speedometers and radar. Two important sensory channels for humans are vision and spoken language; through these two faculties we gather almost all of our knowledge.

Processing a video camera image involves:

1. Signal processing: enhancing the image.
2. Measurement analysis.
3. Pattern recognition: classifying an object into a category drawn from a finite set of possibilities.
4. Image understanding: classifying objects and building a 3-D model of the scene.

Speech Recognition: natural language understanding systems usually accept typed input, but that is not practical for a number of applications. Spoken language is a more natural form of communication in many human-computer interfaces. Design issues in speech systems include:

* Speaker dependence vs independence
* Continuous vs isolated-word speech
* Real-time vs offline processing
* Large vs small vocabulary
* Broad vs narrow grammar

Action: mobility and intelligence seem to have evolved together; intelligence puts mobility to effective use. The nature of mobility can be described in terms of how robots navigate through the world (planning routes from a start position around obstacles to a goal) and how they manipulate objects.

Dealing with Inconsistencies and Uncertainties (Patterson, ch. IV)

First-order logic provides no way to express uncertain, imprecise, hypothetical or vague knowledge, and no way to revise knowledge about the world once it is asserted. Yet intelligent beings are continuously required to make decisions under a veil of uncertainty, and the information available may be contradictory or even unbelievable.

Non-monotonic Reasoning

In predicate/propositional logic, knowledge grows monotonically. But new facts may become known which contradict and invalidate old knowledge; the resulting retractions lead to shrinkage, i.e. non-monotonic growth, of the knowledge at times.

Truth Maintenance Systems (TMS) are also known as Belief Revision or Revision Maintenance systems:

- A TMS maintains consistency of the knowledge.
- It gives the inference component the latitude to perform non-monotonic inferences.
- As new beliefs become available, the knowledge base continues to be consistent and current.

Architecture of a problem solver with a TMS: the inference engine tells the TMS what deductions it has made; the TMS, in turn, asks questions about current beliefs and reasons for failures.

Belief Revision: suppose the KB contains only the propositions P and P → Q, plus modus ponens. From these the inference engine concludes Q and adds this conclusion to the KB. Later it is learned that ¬P is actually the case; adding ¬P to the KB leads to a contradiction.

[Figure: the inference engine "tells" the TMS its deductions and "asks" it about current beliefs; both are connected to the knowledge base.]

Consequently it is necessary to remove P to eliminate the inconsistency. But with P removed, Q is no longer a justified belief, so it too should be removed. This type of belief revision is the job of the TMS. Q is removed but not erased: in a later context P may again be true, and then P → Q and the conclusion Q may be required.

Dependency-directed backtracking: the records are maintained in the form of a dependency network. Nodes in the network represent KB entries such as premises, conclusions and inference rules. Attached to the nodes are justifications, which represent the inference steps from which each node was derived. A premise is a fundamental belief: the base from which all other currently active nodes can be explained in terms of valid justifications.

There are two types of justifications:

- Support lists (SL)
- Conceptual dependencies (CD)

Example: Cybil is a non-flying bird (an ostrich).

n1  Cybil is a bird        (SL ( ) ( ))    - a premise
n2  Cybil can fly          (SL (n1) (n3))  - unjustified belief
n3  Cybil cannot fly       (SL (n5) (n4))  - justified belief
n4  Cybil has wings        (SL ( ) ( ))    - retracted premise
n5  Cybil is an ostrich    (SL ( ) ( ))    - a premise

Suppose it is discovered that Cybil is not an ostrich, causing n5 to be retracted. Then n3, which depends on n5, must also be retracted. This in turn changes the status of n2 to a justified node: the resultant belief is that the bird Cybil can fly.
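The Cybil example can be sketched mechanically. This is my own minimal rendering of support-list bookkeeping, not a full TMS: each node carries an (IN-list, OUT-list) pair, and a node is believed when every IN node is believed and no OUT node is.

```python
# TMS-style belief revision sketch for the Cybil example.
# A node's justification is (in_list, out_list); None marks a
# retracted premise.

nodes = {
    "n1": ([], []),          # Cybil is a bird      - premise
    "n2": (["n1"], ["n3"]),  # Cybil can fly
    "n3": (["n5"], ["n4"]),  # Cybil cannot fly
    "n4": None,              # Cybil has wings      - retracted premise
    "n5": ([], []),          # Cybil is an ostrich  - premise
}

def believed(n, seen=frozenset()):
    just = nodes[n]
    if just is None or n in seen:   # retracted, or circular support
        return False
    ins, outs = just
    seen = seen | {n}
    return (all(believed(i, seen) for i in ins) and
            not any(believed(o, seen) for o in outs))

print(believed("n3"))  # True: n5 is believed and n4 is retracted
print(believed("n2"))  # False: the believed n3 blocks it

nodes["n5"] = None     # discover that Cybil is not an ostrich
print(believed("n3"))  # False: its support is gone
print(believed("n2"))  # True: belief revised - Cybil can fly
```

Retracting n5 flips the status of n3 and, through n3's place in n2's OUT-list, revives n2, exactly as in the text.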

Belief network nodes may represent premises, assumptions or derived data.

Propositional logic is appealing because it is simple to deal with and a decision procedure for it exists. Predicate logic provides a way of deducing new statements from old ones and is a good way of reasoning with knowledge; unfortunately, unlike propositional logic, it does not possess a decision procedure, not even an exponential one.

A proposition in propositional logic takes only two values: the proposition is either TRUE or FALSE.

Examples:
1) Rubber is a good conductor of electricity - False
2) Diamond is a hard material - True

Let A be "the machine is defective" and B be "the production is less". The implication A → B is written: if the machine is defective, then production is less.

A      B      A → B
True   True   True    (the machine is defective and production is less)
False  True   True    (production can be less for a variety of other reasons)
True   False  False   (a defective machine with normal production contradicts the rule)
False  False  True
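The table above can be checked mechanically from the definition of implication, A → B ≡ ¬A ∨ B:

```python
# Implication defined as (not A) or B; printing all four rows of the
# truth table reproduces the table in the text.

def implies(a, b):
    return (not a) or b

for a in (True, False):
    for b in (True, False):
        print(a, b, implies(a, b))
# Only the row A=True, B=False makes the implication False.
```

This also shows why the last row (False, False) is True: a rule is not violated when its antecedent never arises.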

Predicate Logic (First-Order Logic)

Propositional logic works fine in situations where the result is TRUE or FALSE but not both. However, there are many real-life situations that cannot be treated this way. Consider: all mammals suckle their young; since an elephant is a mammal, it suckles its young. Propositional logic fails to express such reasoning. To overcome this deficiency, predicate logic uses three additional notions:

- Predicates
- Terms
- Quantifiers

* Predicate: a relation that binds two atoms together.

  Baskar likes aeroplanes:        likes(Baskar, aeroplanes)
  Ravi's father is Rani's father: FATHER(father(Ravi), Rani)

Here, who Ravi's father is, is not explicitly stated; father(Ravi) represents that person, and it is a term. A constant, variable or function is a TERM.

Quantifiers: a quantifier is a symbol that permits one to declare or identify the range or scope of the variables in a logical expression:

- the universal quantifier (∀)
- the existential quantifier (∃)

If a is a variable, then ∀a is read as: (i) for all a; (ii) for each a; (iii) for every a.

If b is a variable, then ∃b is read as: (i) there exists a b; (ii) for some b; (iii) for at least one b.

Conversion to Clause Form

Consider: "All Romans who know Marcus either hate Caesar or think that anyone who hates anyone is crazy."

∀x: [Roman(x) ∧ know(x, Marcus)] → [hate(x, Caesar) ∨ (∀y: ∃z: hate(y, z) → thinkcrazy(x, y))]

Conjunctive normal form:

¬Roman(x) ∨ ¬know(x, Marcus) ∨ hate(x, Caesar) ∨ ¬hate(y, z) ∨ thinkcrazy(x, y)

Algorithm: Convert to Clause Form

1. Eliminate →, using the fact that a → b is equivalent to ¬a ∨ b. Performing this on the example yields:

∀x: ¬[Roman(x) ∧ know(x, Marcus)] ∨ [hate(x, Caesar) ∨ (∀y: ¬(∃z: hate(y, z)) ∨ thinkcrazy(x, y))]

2. Reduce the scope of each ¬ to a single term, using the fact that ¬(¬p) = p, De Morgan's laws

¬(a ∧ b) = ¬a ∨ ¬b
¬(a ∨ b) = ¬a ∧ ¬b

and the standard correspondences between quantifiers:

¬∀x: P(x) = ∃x: ¬P(x)
¬∃x: P(x) = ∀x: ¬P(x)

Performing this on the result of step 1 yields:

∀x: [¬Roman(x) ∨ ¬know(x, Marcus)] ∨ [hate(x, Caesar) ∨ (∀y: ∀z: ¬hate(y, z) ∨ thinkcrazy(x, y))]

3. Standardize variables so that each quantifier binds a unique variable. For example,

∀x: P(x) ∨ ∀x: Q(x)    is converted to    ∀x: P(x) ∨ ∀y: Q(y)

4. Move all quantifiers to the left of the formula without changing their relative order:

∀x: ∀y: ∀z: [¬Roman(x) ∨ ¬know(x, Marcus)] ∨ [hate(x, Caesar) ∨ (¬hate(y, z) ∨ thinkcrazy(x, y))]

The formula is now in prenex normal form.

5. Eliminate existential quantifiers:

i) ∃y: president(y) is converted to president(S1).
ii) S1 is a function with no arguments that somehow produces a value satisfying president.
iii) ∀x: ∃y: fatherof(y, x) is transformed to ∀x: fatherof(S2(x), x).

Such a generated function is called a Skolem function; one with no arguments is called a Skolem constant.

6. Drop the prefix from step 4, i.e. remove ∀x: ∀y: ∀z:; from here on, any variable is assumed to be universally quantified.

7. Convert the matrix into a conjunction of disjuncts, exploiting the associative property

a ∨ (b ∨ c) = (a ∨ b) ∨ c

and the distributive property

(a ∧ b) ∨ c = (a ∨ c) ∧ (b ∨ c)

Example: (winter ∧ wearingboots) ∨ (summer ∧ wearingsandals) becomes

1) [winter ∨ (summer ∧ wearingsandals)] ∧ [wearingboots ∨ (summer ∧ wearingsandals)]
2) (winter ∨ summer) ∧ (winter ∨ wearingsandals) ∧ (wearingboots ∨ summer) ∧ (wearingboots ∨ wearingsandals)
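Step 7 can be sketched for propositional formulas. The tuple representation is my own choice for this sketch; it handles binary ∧/∨ over literals, which is enough to reproduce the winter/summer example.

```python
# Distribute OR over AND to reach conjunctive normal form.
# A formula is a literal (str), ("and", x, y) or ("or", x, y).

def to_cnf(f):
    if isinstance(f, str):
        return f
    op, x, y = f
    x, y = to_cnf(x), to_cnf(y)
    if op == "and":
        return ("and", x, y)
    # op == "or": apply (a and b) or c = (a or c) and (b or c).
    if isinstance(x, tuple) and x[0] == "and":
        return ("and", to_cnf(("or", x[1], y)), to_cnf(("or", x[2], y)))
    if isinstance(y, tuple) and y[0] == "and":
        return ("and", to_cnf(("or", x, y[1])), to_cnf(("or", x, y[2])))
    return ("or", x, y)

winter_case = ("or", ("and", "winter", "boots"),
                     ("and", "summer", "sandals"))
print(to_cnf(winter_case))  # a conjunction of the four disjuncts
```

Running this reproduces the four clauses of the worked example: (winter ∨ summer), (winter ∨ sandals), (boots ∨ summer) and (boots ∨ sandals), nested as binary conjunctions.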

5.4.3 Resolution in Propositional Logic

The procedure for producing a proof by resolution of a proposition P with respect to a set of axioms F is the following.

Algorithm: Propositional Resolution. A few facts in propositional logic:

Given axioms        Converted to clause form
P                   P              (1)
(P ∧ Q) → R         ¬P ∨ ¬Q ∨ R    (2)
(S ∨ T) → Q         ¬S ∨ Q         (3)
                    ¬T ∨ Q         (4)
T                   T              (5)

Fig 5.8 Resolution in propositional logic; we want to prove R.

The resolution process takes a set of clauses that are all assumed to be true and, based on that information, generates new clauses that represent restrictions on the way each of those original clauses can be made true. A contradiction occurs when a clause becomes so restricted that there is no way it can be true (the empty clause).

To prove R, assume ¬R and resolve:

¬P ∨ ¬Q ∨ R   with ¬R (the assumption)   gives   ¬P ∨ ¬Q
¬P ∨ ¬Q       with P (1)                 gives   ¬Q
¬Q            with ¬T ∨ Q (4)            gives   ¬T
¬T            with T (5)                 gives   the empty clause


This is a contradiction: the empty clause has been produced.

Hence the assumption ¬R is wrong, and R must be true.
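The whole refutation above can be sketched as a tiny propositional resolution prover. This is a hedged sketch, not the textbook's algorithm verbatim; the clause encoding (sets of string literals, with '~' marking negation) is an assumed convention.

```python
# A small sketch of proof by propositional resolution (refutation).
# Literals are strings; '~' marks negation. Clauses are frozensets of literals.
# The axioms below mirror Fig 5.8: P, ~P|~Q|R, ~S|Q, ~T|Q, T; goal: R.

def negate(lit):
    return lit[1:] if lit.startswith('~') else '~' + lit

def resolve(c1, c2):
    """Yield every resolvent of two clauses."""
    for lit in c1:
        if negate(lit) in c2:
            yield (c1 - {lit}) | (c2 - {negate(lit)})

def proves(axioms, goal):
    """Refutation: add the negated goal, resolve until the empty clause appears."""
    clauses = {frozenset(c) for c in axioms} | {frozenset([negate(goal)])}
    while True:
        new = set()
        for c1 in clauses:
            for c2 in clauses:
                for r in resolve(c1, c2):
                    if not r:            # empty clause: contradiction found
                        return True
                    new.add(frozenset(r))
        if new <= clauses:               # saturation: nothing new, not provable
            return False
        clauses |= new

axioms = [{'P'}, {'~P', '~Q', 'R'}, {'~S', 'Q'}, {'~T', 'Q'}, {'T'}]
print(proves(axioms, 'R'))   # → True
```

The prover finds the same chain as the hand proof: ¬R with (2) gives ¬P ˅ ¬Q, then (1) gives ¬Q, then (4) gives ¬T, then (5) gives the empty clause.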

Resolution in Predicate logic.

Two literals are contradictory if one of them can be unified with the negation of the other.

For example, man(x) and ¬man(Spot) are contradictory, since man(x) and man(Spot) can be unified.

This says that man(x) cannot be true for all x if there is known to be some x, say Spot, for which man(x)

is false. Thus, to use resolution on expressions in predicate logic, we use the unification algorithm

to locate pairs of literals that cancel out.

Ex: 1) man(Marcus)   2) ¬man(x1) ˅ mortal(x1)

Resolving with x1 = Marcus: because man(Marcus) is true, ¬man(Marcus) is false, so mortal(Marcus)

must be true for clause (2) to hold. (For some other value of x1, ¬man(x1) might be true, making mortal(x1) irrelevant to the truth of the complete clause.)
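The unification algorithm referred to here can be sketched as follows. This is a minimal illustration; the '?'-prefix convention for variables is my own, and the occurs check is omitted for brevity.

```python
# A minimal sketch of unification. Terms are strings (constants, or
# '?'-prefixed variables) or tuples like ('man', 'Marcus') for man(Marcus).

def is_var(t):
    """Variables are strings prefixed with '?'; everything else is constant."""
    return isinstance(t, str) and t.startswith('?')

def unify(a, b, subst=None):
    """Return a substitution (dict) making a and b identical, or None.
    No occurs check, for brevity."""
    if subst is None:
        subst = {}
    if a == b:
        return subst
    if is_var(a):
        return bind(a, b, subst)
    if is_var(b):
        return bind(b, a, subst)
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            subst = unify(x, y, subst)
            if subst is None:
                return None
        return subst
    return None

def bind(var, term, subst):
    """Bind var to term, following any existing binding first."""
    if var in subst:
        return unify(subst[var], term, subst)
    return {**subst, var: term}

# man(?x1) unifies with man(Marcus) under ?x1 = Marcus:
print(unify(('man', '?x1'), ('man', 'Marcus')))   # → {'?x1': 'Marcus'}
```

Resolution uses this to detect a contradictory pair: a literal cancels against another when its atom unifies with the other's atom and their signs differ.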

Fig 5.9 A Resolution Proof.

Axioms in clause form:

1) man(Marcus)

2) Pompeian(Marcus)

3) ¬Pompeian(x1) ˅ Roman(x1)

4) ruler(Caesar)

5) ¬Roman(x2) ˅ loyal to(x2, Caesar) ˅ hate(x2, Caesar)

6) loyal to(x3, f1(x3))

7) ¬man(x4) ˅ ¬ruler(y1) ˅ ¬try assassinate(x4, y1) ˅ loyal to(x4, y1)

8) try assassinate(Marcus, Caesar)

Prove: hate(Marcus, Caesar). We negate the goal and resolve:

¬hate(Marcus, Caesar)   resolved with (5), Marcus/x2:   ¬Roman(Marcus) ˅ loyal to(Marcus, Caesar)

resolved with (3), Marcus/x1:   ¬Pompeian(Marcus) ˅ loyal to(Marcus, Caesar)

resolved with (2):   loyal to(Marcus, Caesar)

resolved with (7), Marcus/x4, Caesar/y1:   ¬man(Marcus) ˅ ¬ruler(Caesar) ˅ ¬try assassinate(Marcus, Caesar)

resolved with (1):   ¬ruler(Caesar) ˅ ¬try assassinate(Marcus, Caesar)

resolved with (4):   ¬try assassinate(Marcus, Caesar)

resolved with (8):   the empty clause, a contradiction. Hence hate(Marcus, Caesar) is proved.
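The first step of the proof above, resolving the negated goal against clause (5) under the substitution Marcus/x2, can be sketched as follows. The literal encoding (sign, predicate, args) is an assumed convention, and only flat terms (constants and '?'-prefixed variables) are handled.

```python
# A sketch of one first-order resolution step: resolving the negated goal
# ~hate(Marcus, Caesar) against clause 5. A literal is (sign, predicate, args);
# sign is True for positive literals.

def unify_args(args1, args2):
    """Unify two equal-length argument lists of constants and '?'-variables."""
    s = {}
    for a, b in zip(args1, args2):
        a, b = s.get(a, a), s.get(b, b)   # follow existing bindings
        if a == b:
            continue
        if a.startswith('?'):
            s[a] = b
        elif b.startswith('?'):
            s[b] = a
        else:
            return None                   # two distinct constants: clash
    return s

def resolve(clause1, clause2):
    """Return the first resolvent of two clauses (lists of literals), or None."""
    for lit1 in clause1:
        for lit2 in clause2:
            if lit1[0] != lit2[0] and lit1[1] == lit2[1]:
                s = unify_args(lit1[2], lit2[2])
                if s is not None:
                    rest = [l for l in clause1 if l is not lit1] + \
                           [l for l in clause2 if l is not lit2]
                    # apply the substitution to the remaining literals
                    return [(sign, pred, tuple(s.get(a, a) for a in args))
                            for sign, pred, args in rest]
    return None

neg_goal = [(False, 'hate', ('Marcus', 'Caesar'))]
clause5  = [(False, 'Roman', ('?x2',)),
            (True,  'loyalto', ('?x2', 'Caesar')),
            (True,  'hate', ('?x2', 'Caesar'))]
print(resolve(neg_goal, clause5))
# → [(False, 'Roman', ('Marcus',)), (True, 'loyalto', ('Marcus', 'Caesar'))]
```

The result is the first resolvent of the hand proof, ¬Roman(Marcus) ˅ loyal to(Marcus, Caesar), with the binding Marcus/x2 applied throughout.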

Using Resolution with Equality & Reduce

Axioms in clause form:

1) man(Marcus)

2) Pompeian(Marcus)

3) Marcus was born in 40 A.D.: born(Marcus, 40)

4) All men are mortal: ∀x : man(x) → mortal(x), giving

¬man(x1) ˅ mortal(x1)

5) All Pompeians died when the volcano erupted in 79 A.D.: erupted(volcano, 79) Ʌ ∀x : [Pompeian(x) → died(x, 79)], giving

¬Pompeian(x2) ˅ died(x2, 79)

6) erupted(volcano, 79)

7) ¬mortal(x3) ˅ ¬born(x3, t1) ˅ ¬gt(t2 – t1, 150) ˅ dead(x3, t2)

8) now = 1991

9) Alive means not dead: ∀x : ∀t : [alive(x, t) → ¬dead(x, t)] Ʌ [¬dead(x, t) → alive(x, t)], giving

a) ¬alive(x4, t3) ˅ ¬dead(x4, t3)

b) dead(x5, t4) ˅ alive(x5, t4)

10) If someone dies, then he is dead at all later times: ∀x : ∀t1 : ∀t2 : [died(x, t1) Ʌ gt(t2, t1) → dead(x, t2)], giving

¬died(x6, t5) ˅ ¬gt(t6, t5) ˅ dead(x6, t6)

Prove: ¬alive(Marcus, now). We negate the goal and resolve:

alive(Marcus, now)   resolved with 9(a), Marcus/x4, now/t3:   ¬dead(Marcus, now)

resolved with (10), Marcus/x6, now/t6:   ¬died(Marcus, t5) ˅ ¬gt(now, t5)

resolved with (5), Marcus/x2, 79/t5:   ¬Pompeian(Marcus) ˅ ¬gt(now, 79)

substitute equals (now = 1991):   ¬Pompeian(Marcus) ˅ ¬gt(1991, 79)

reduce (gt(1991, 79) is true):   ¬Pompeian(Marcus)

resolved with (2):   the empty clause, a contradiction. Hence ¬alive(Marcus, now) is proved.
