Artificial Intelligence
UNIT I
CO.1 Explain the concept behind problem representation paradigms and their characteristics, production systems, and defining a problem as a state space representation.
1.1 WHAT IS AI? HISTORY & APPLICATIONS
1.2 ARTIFICIAL INTELLIGENCE AS REPRESENTATION & SEARCH
AI Problems
AI is the study of how to make computers do things which, at the moment, people do better. This definition fails to include some areas of potentially very large impact, namely problems that cannot now be solved well by either computers or people.
Samuel wrote a checkers-playing program that not only played games against opponents but also used its experience at those games to improve its later performance.
Computers could perform well at such tasks simply by being fast at exploring a large number of solution paths and then selecting the best one. It was thought that this process required very little knowledge and could therefore be programmed easily. This assumption later turned out to be false, since no computer is fast enough to overcome the combinatorial explosions generated by most problems.
Common sense reasoning: reasoning about objects, sequences of actions, and their consequences. For example, if you let go of something, it will fall to the floor and may break.
GPS (the General Problem Solver, by Newell, Shaw and Simon) was applied to several commonsense tasks.
Perception – Vision, Speech
Natural Language – understanding, generation, translation
AI is flourishing most as a practical discipline, as opposed to a purely research one, primarily in domains that require only specialized expertise without the assistance of commonsense knowledge. Before embarking on a study of specific AI problems and solution techniques, it is useful to discuss, if not to answer, the following questions:
1) What are our underlying assumptions about intelligence?
2) What kinds of techniques will be useful for solving AI problems?
3) At what level of detail, if at all, are we trying to model human intelligence?
4) How will we know when we have succeeded in building an intelligent program?
Task Domains of AI

Mundane tasks:
1) Perception
- Vision
- Speech
2) Natural language
- Understanding (understanding spoken language is also a perceptual problem)
- Generation
- Translation
3) Commonsense reasoning
4) Robot control

Formal tasks:
1) Games
- Chess
- Backgammon
- Checkers
- Go
2) Mathematics
- Geometry
- Logic
- Integral calculus
- Proving properties of programs

Expert tasks:
1) Engineering
- Design
- Fault finding
- Manufacturing planning
2) Scientific analysis
3) Medical diagnosis
4) Financial analysis
The Underlying Assumption:
The Physical Symbol System Hypothesis:
Newell and Simon describe a physical symbol system as consisting of a set of entities called symbols, which are physical patterns that can occur as components of another type of entity called an expression (or symbol structure). A symbol structure is composed of a number of instances (or tokens) of symbols related in some physical way. Besides these structures, the system also contains a collection of processes that operate on expressions to produce other expressions: processes of creation, modification, reproduction, and destruction. A physical symbol system is thus a machine that produces, through time, an evolving collection of symbol structures.
The Physical Symbol System Hypothesis:
A physical symbol system has the necessary and sufficient means for general intelligent action.
This hypothesis is only a hypothesis: there appears to be no way of proving or disproving it on logical grounds. We may find that it is false, although the bulk of the evidence suggests it is true; the only way to determine its truth is by experimentation.
Computers provide the perfect medium for this experimentation, since they can be programmed to perform selected tasks intelligently, as people do. As it has become increasingly easy to build computing machines, it has become increasingly possible to conduct empirical investigations of the physical symbol system hypothesis. One interesting such investigation is the attempt to reduce a particularly human activity, the understanding of jokes, to a process of symbol manipulation, described in the book Mathematics and Humor. It may turn out that physical symbol systems prove able to model some aspects of human intelligence and not others. The importance of the physical symbol system hypothesis is twofold.
What is an AI Technique?
AI problems span a very broad spectrum. Are there any techniques that are appropriate for the solution of a variety of these problems, besides the fact that they all manipulate symbols? How is such a technique useful in solving AI tasks?
Intelligence requires knowledge, and knowledge possesses some less desirable properties:
- It is voluminous.
- It is hard to characterize accurately.
- It is constantly changing.
- It differs from data by being organized in a way that corresponds to the ways it will be used.
From this we can conclude that an AI technique is a method that exploits knowledge, which should be represented in such a way that:
- The knowledge captures generalizations: situations that share important properties are grouped together. Otherwise inordinate amounts of memory and updating will be required.
- It can be understood by the people who must provide it.
- It can easily be modified to correct errors and to reflect changes in the world and in our world view.
- It can be used in a great many situations even if it is not totally accurate or complete.
- It can be used to help overcome its own sheer bulk by helping to narrow the range of possibilities that must usually be considered.
There is some degree of independence between problems and problem-solving techniques. Consider a series of problems and a series of approaches for solving each of them.
Example: Tic-Tac-Toe
A series of programs to play tic-tac-toe can be compared along four dimensions:
- Their complexity.
- Their use of generalizations.
- The clarity of their knowledge.
- The extensibility of their approach: the series moves increasingly toward being representative of what we call AI techniques.
The board is represented as a nine-element vector, with the squares numbered:
1 2 3
4 5 6
7 8 9
In the simplest program an element contains 0 if the square is blank, 1 if it holds an X, and 2 if it holds an O; a later program in the series uses 2 for blank, 3 for X, and 5 for O, so that lines of the board can be recognized by their products.
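The first board representation above can be sketched in code. This is an illustrative sketch, not the actual program from the series: the vector uses 0/1/2 as described, and the win test over the eight winning lines is an assumed helper.

```python
# Tic-tac-toe board as a nine-element vector, indexed 1..9
# (index 0 unused): 0 = blank, 1 = X, 2 = O.

LINES = [(1, 2, 3), (4, 5, 6), (7, 8, 9),   # rows
         (1, 4, 7), (2, 5, 8), (3, 6, 9),   # columns
         (1, 5, 9), (3, 5, 7)]              # diagonals

def winner(board):
    """Return 1 if X has three in a row, 2 if O has, else 0."""
    for a, b, c in LINES:
        if board[a] != 0 and board[a] == board[b] == board[c]:
            return board[a]
    return 0

board = [None, 1, 2, 1,
               2, 1, 2,
               0, 0, 1]    # X on squares 1, 3, 5, 9; O on 2, 4, 6
print(winner(board))       # -> 1 (X wins on the 1-5-9 diagonal)
```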
A similar series of increasingly sophisticated programs can be written for question answering. Across such series, three important AI techniques appear:
- Search: provides a framework in which any available direct techniques can be embedded.
- Use of knowledge: provides a way of solving complex problems by exploiting the structure of the objects involved.
- Abstraction: provides a way of separating important features and variations from the many unimportant ones that would otherwise overwhelm any process.
It is not possible to give a precise definition of an AI technique.
The Level of the Model
Before building a program we must decide what we are trying to do. Is our goal simply to produce a program that does the task in the easiest way, or one that models human intelligence? Programs of the first kind do things in whatever way works best, and some of the problems they solve are not AI problems at all. A second class of programs attempts to model human performance, for example programs that read newspaper stories and answer questions about them. Reasons for modeling human performance include:
1) To test psychological theories of human performance, e.g. a program that simulated the behavior of a paranoid person at a system terminal.
2) To enable computers to understand human reasoning.
3) To enable people to understand computer reasoning.
4) To exploit what knowledge we can glean from people, as clues for how to proceed.
Human performance can be modeled at many levels, from the level of individual neurons up to the level of human cognitive theories.
Whether our goal is simulating human performance or simply building an intelligent program, in either case we need a good model of the processes involved in intelligent reasoning. This need has spawned the field of cognitive science, in which psychologists, linguists, and computer scientists work together.
Criteria for Success
How will we know whether we have succeeded in building an intelligent program? What is intelligence? Alan Turing proposed the following method of determining whether a machine can think, now known as the Turing Test. Two people and the machine to be evaluated take part. One person, the interrogator, sits in a separate room and asks questions by typing; the interrogator knows the other two participants only as A and B, and aims to determine which is the person and which is the machine. The goal of the machine is to fool the interrogator into believing that it is the person; if the machine succeeds, we conclude that the machine can think.
Less sweeping standards are also used: a chess program can be rated in the same way as a human player, and DENDRAL, a program that analyzes organic compounds to determine their structure, is an example of a program that meets a performance standard for a particular task. Meeting such a standard is a pragmatic criterion for success.
1.3 PRODUCTION SYSTEM, BASICS OF PROBLEM SOLVING
Production System:
A production system structures AI problems in a way that facilitates describing and performing the search process. A production system consists of:
- A set of rules, each consisting of a left side (a pattern that determines the applicability of the rule) and a right side (which describes the operation to be performed if the rule is applied).
- One or more knowledge bases/databases containing whatever information is appropriate for the particular task, structured in any appropriate way.
- A control strategy that specifies the order in which the rules will be matched against the database, and a way of resolving conflicts when several rules apply at once.
There is a family of general production system interpreters, including basic production system languages such as OPS5 and ACT*, more complex hybrid systems called expert system shells, and general problem-solving architectures like SOAR, a system based on a specific set of cognitively motivated hypotheses about the nature of problem solving. The process of solving a problem can usefully be modeled as a production system.
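The components above can be sketched as a minimal interpreter. This is an illustrative sketch, not OPS5 or SOAR: rules are assumed to be (condition, action) pairs over a state, and conflict resolution is simply "fire the first applicable rule".

```python
# A minimal production system interpreter. A rule has a left side
# (a predicate over the current state/database) and a right side
# (a function producing the new state). The control strategy is
# naive: fire the first applicable rule on each cycle.

def run_production_system(state, rules, is_goal, max_steps=100):
    """Repeatedly match rules against the state and fire one."""
    for _ in range(max_steps):
        if is_goal(state):
            return state
        applicable = [rhs for lhs, rhs in rules if lhs(state)]
        if not applicable:
            return None               # no rule applies: the search fails
        state = applicable[0](state)  # naive conflict resolution
    return None

# Toy database: a counter we want to drive to 5.
rules = [
    (lambda s: s < 5, lambda s: s + 1),   # "increment" rule
]
print(run_production_system(0, rules, lambda s: s == 5))  # -> 5
```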
Control Strategies
The first requirement of a good control strategy is that it cause motion. In the water jug problem, a strategy that always chose the first rule would keep filling the 4-gallon jug indefinitely and never reach a solution.
The second requirement of a good control strategy is that it be systematic. A strategy may cause motion but arrive at the same state several times during the process, exploring a useless sequence of operators several times before a solution is finally found.
To build a system to solve a particular problem, we need to do four things:
1) Define the problem precisely, including the initial situation(s) and the final situations that constitute acceptable solutions.
2) Analyze the problem: a few important features can have an immense impact on the appropriateness of various techniques.
3) Isolate and represent the task knowledge that is necessary to solve the problem.
4) Choose the best problem-solving technique(s) and apply them to the particular problem.
Problem Classification:
Analyzing a problem along these dimensions makes it possible to select a generic control strategy that is appropriate for solving it.
Production System Characteristics:
Production systems are a good way to describe the operations that can be performed in a search for a solution to a problem. Two questions arise:
1. Can production systems, like problems, be described by a set of characteristics that shed some light on how they can easily be implemented?
2. If so, what relationships are there between problem types and the types of production systems best suited to solving the problems?
A monotonic production system is a production system in which the application of a rule never prevents the later application of another rule that could also have been applied at the time the first rule was selected. A non-monotonic production system is one in which this is not true.
A partially commutative production system is a production system with the property that if the application of a particular sequence of rules transforms state X into state Y, then any allowable permutation of those rules also transforms state X into state Y.
A commutative production system is a production system that is both monotonic and partially commutative.
For each kind of problem there is a kind of production system best suited to it, although, significantly, all problems can be solved by all kinds of systems.
                              Monotonic             Non-monotonic
Partially commutative         Theorem proving       Robot navigation
Not partially commutative     Chemical synthesis    Bridge
Partially commutative, monotonic production systems are useful for solving ignorable problems and can be implemented without the ability to backtrack.
Non-monotonic, partially commutative production systems are useful for problems in which changes occur but can be reversed, and in which the order of operations is not critical, e.g. robot navigation with moves north, south, east, and west: the order in which the path is executed is not important. The 8-puzzle and blocks world problems are also partially commutative.
Non-partially-commutative production systems are useful for problems in which irreversible changes occur, such as chemical synthesis: the order of the reactions is important, and in an irreversible process it is particularly important to do things right the first time.
1.4 EXAMPLE-WATER JUG PROBLEM
1.5 PROBLEM REPRESENTATION PARADIGMS
To play chess, we must specify:
- The starting position of the chess board.
- The rules that define the legal moves.
- The board positions that represent a win. The goal is implicit: not merely playing legal moves but winning, i.e. reaching a goal state in which the opponent's king is under attack and cannot escape.
The board can be represented as an 8x8 array. Moves can be described as a set of rules with two parts: a left side, a pattern to be matched against the current board position, and a right side that describes the change to be made to the board position to reflect the move. Writing a separate rule for each position would require roughly 10^120 rules for the possible board positions. So many rules cause practical difficulties:
- No person could ever supply a complete set of such rules; it would take too long and could certainly not be done without mistakes.
- No program could easily handle all those rules; merely storing them poses serious problems.
To minimize such problems, we should write the rules in as general a way as possible, using some convenient notation for describing patterns and substitutions. For example:

White pawn at Square(file e, rank 2)
AND Square(file e, rank 3) is empty
AND Square(file e, rank 4) is empty
-->
move pawn from Square(file e, rank 2) to Square(file e, rank 4)
We can think of the problem of playing chess as a problem of moving around in a state space, where each state corresponds to a legal position of the board. The state space representation forms the basis of most AI methods. Its structure corresponds to the structure of problem solving in two important ways:
- It allows for a formal definition of a problem as the need to convert some given situation into some desired situation using a set of permissible operations.
- It permits us to define the process of solving a particular problem as a combination of known techniques, each represented as a rule defining a single step in the space, and search, the general technique of exploring the space to find some path from the current state to a goal state.
Search is a very important process in the solution of hard problems for which no more direct techniques are available.
A state space representation example: you are given a 4-gallon jug and a 3-gallon jug. Neither has any measuring markers on it. There is a pump that can be used to fill the jugs with water. How can you get exactly 2 gallons of water into the 4-gallon jug?
The state space can be described as the set of ordered pairs of integers (x, y) such that x = 0, 1, 2, 3, or 4 and y = 0, 1, 2, or 3, where x is the number of gallons in the 4-gallon jug and y the number in the 3-gallon jug. The start state is (0, 0); the goal state is (2, n) for any value of n.
The operators used to solve the problem have left sides that are matched against the current state and right sides that describe the new state that results from applying the rule. A simple control structure loops through the rules; there are several ways of making the selection, some of which are wasteful, e.g. blindly choosing "fill the 4-gallon jug" when the jug is already full.
Production rules for the water jug problem:
1) (x,y) if x<4 --> (4,y)                     fill the 4-gallon jug
2) (x,y) if y<3 --> (x,3)                     fill the 3-gallon jug
3) (x,y) if x>0 --> (x-d,y)                   pour some water out of the 4-gallon jug
4) (x,y) if y>0 --> (x,y-d)                   pour some water out of the 3-gallon jug
5) (x,y) if x>0 --> (0,y)                     empty the 4-gallon jug on the ground
6) (x,y) if y>0 --> (x,0)                     empty the 3-gallon jug on the ground
7) (x,y) if x+y>=4 and y>0 --> (4, y-(4-x))   pour water from the 3-gallon jug into the 4-gallon jug until the 4-gallon jug is full
8) (x,y) if x+y>=3 and x>0 --> (x-(3-y), 3)   pour water from the 4-gallon jug into the 3-gallon jug until the 3-gallon jug is full
9) (x,y) if x+y<=4 and y>0 --> (x+y, 0)       pour all the water from the 3-gallon jug into the 4-gallon jug
10) (x,y) if x+y<=3 and x>0 --> (0, x+y)      pour all the water from the 4-gallon jug into the 3-gallon jug

One solution to the water jug problem:
Gallons in the 4-gallon jug   Gallons in the 3-gallon jug   Rule applied
0                             0
0                             3                             2
3                             0                             9
3                             3                             2
4                             2                             7
0                             2                             5 or 12
2                             0                             9 or 11

Rules 3 and 4, which pour out some arbitrary amount d, never get us any closer to a solution. The rules above define the problem; we can also add rules that capture special-case knowledge about its solution, e.g. rule 11: (0,2) --> (2,0), pour the 2 gallons from the 3-gallon jug into the 4-gallon jug, and rule 12: empty the 2 gallons in the 4-gallon jug on the ground. Such rules cannot add power to the system, since their effects are already achievable with the basic rules (rule 5 or 12, then rule 9 or 11); in fact, depending on the control strategy, the use of such rules can even degrade performance.
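Under a breadth-first control strategy, the rules above can be run mechanically. A minimal sketch (the function names are mine; rules 3 and 4 are omitted because pouring out an arbitrary amount never helps):

```python
from collections import deque

# Breadth-first search over the water jug state space (x, y):
# x = gallons in the 4-gallon jug, y = gallons in the 3-gallon jug.

def successors(state):
    x, y = state
    results = set()
    results.add((4, y))                    # rule 1: fill the 4-gallon jug
    results.add((x, 3))                    # rule 2: fill the 3-gallon jug
    results.add((0, y))                    # rule 5: empty the 4-gallon jug
    results.add((x, 0))                    # rule 6: empty the 3-gallon jug
    pour = min(y, 4 - x)
    results.add((x + pour, y - pour))      # rules 7/9: 3-gal -> 4-gal
    pour = min(x, 3 - y)
    results.add((x - pour, y + pour))      # rules 8/10: 4-gal -> 3-gal
    results.discard(state)                 # drop no-op applications
    return results

def solve(start=(0, 0)):
    frontier, seen = deque([[start]]), {start}
    while frontier:
        path = frontier.popleft()
        if path[-1][0] == 2:               # goal: 2 gallons in the 4-gallon jug
            return path
        for nxt in successors(path[-1]):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(path + [nxt])

print(solve())   # a 6-move solution, e.g. ending ... (4,2), (0,2), (2,0)
```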
Operationalization:
The first step toward designing a program to solve a problem must be the creation of a formal and manipulable description of the problem itself. Eventually we would like to be able to write programs that produce such formal descriptions from informal ones. Consider, for example, what it means to understand an English sentence; our ultimate goal is to be able to solve difficult, unstructured problems of this kind.
Summarizing, in order to provide a formal description of a problem we must:
1) Define a state space that contains all the possible configurations of the relevant objects (and perhaps some impossible ones).
2) Specify one or more states within that space from which the problem-solving process may start (initial states).
3) Specify one or more states that would be acceptable as solutions to the problem (goal states).
4) Specify a set of rules that describe the actions (operators) available.
Doing this will require giving thought to:
- What unstated assumptions are present in the informal problem description?
- How general should the rules be?
- How much of the work required to solve the problem should be precomputed and represented in the rules?
1.6 DEFINING PROBLEM AS A STATE SPACE REPRESENTATION
1.7 PROBLEM CHARACTERISTICS I
Problem Characteristics:
Heuristic search is a very general method applicable to a large class of problems. In order to choose the most appropriate method for a particular problem, it is necessary to analyze the problem along several key dimensions:
- Is the problem decomposable into a set of independent, smaller or easier subproblems?
- Can solution steps be ignored, or at least undone, if they prove unwise?
- Is the problem's universe predictable?
- Is a good solution to the problem obvious without comparison to all other possible solutions?
- Is the desired solution a state of the world or a path to a state?
- Is a large amount of knowledge absolutely required to solve the problem, or is knowledge important only to constrain the search?
- Can a computer that is simply given the problem return the solution, or will the solution of the problem require interaction between the computer and a person?
1.6.1 Is the Problem Decomposable?
Using the technique of problem decomposition, we can often solve very large problems easily.
Example: the blocks world problem. In the start state, block C is on block A, and blocks A and B rest on the table: ON(C,A). In the goal state, A is on B and B is on C: ON(B,C) and ON(A,B).
Operators:
1. CLEAR(x) [block x has nothing on it] --> ON(x, Table) [pick up x and put it on the table]
2. CLEAR(x) and CLEAR(y) --> ON(x, y) [put x on y]
A proposed solution decomposes the goal (in the original figure, goals are underlined; states that have been achieved are not):
ON(B,C) and ON(A,B) splits into the subgoals ON(B,C) and ON(A,B); achieving ON(A,B) in turn requires CLEAR(A) followed by ON(A,B).
Treating "put B on C" and "put A on B" as two separate problems:
1. Given the start state, putting B on C is simple.
2. The second subgoal is not quite so simple. Since the only operators we have allow us to pick up single blocks at a time, we have to clear off A by removing C before we can pick up A and put it on B; on its own, this can be done easily.
However, if we now try to combine the two subsolutions into one solution, we will fail: once B is on C, C still sits on A, and clearing A destroys the already-achieved ON(B,C). The two subproblems are not independent.
1.6.2 Can Solution Steps Be Ignored or Undone?
Theorem proving: suppose we proceed by first proving a lemma that we think will be useful, and eventually realize that the lemma is no help at all. Are we in trouble? No. Everything proved remains true, so we simply proceed; we have lost only the effort spent exploring the blind alley.
The 8-puzzle: in solving it, if we make a stupid move, we can change our mind. We cannot simply slide another tile into an already occupied space, but we can backtrack and undo the first move. Mistakes can be recovered, though not as easily as in theorem proving: an additional step must be performed to undo each incorrect step, whereas no action at all was required to "undo" a useless lemma, and the control structure must keep a record of all the steps taken.
Playing chess: no backup is possible; we simply try to make the best of the current situation and go on from there.
These examples illustrate three important classes of problems:
- Ignorable (e.g. theorem proving), in which solution steps can be ignored: solvable with a simple control structure.
- Recoverable (e.g. 8-puzzle), in which solution steps can be undone: requires a slightly more complicated control structure that sometimes makes mistakes.
- Irrecoverable (e.g. chess), in which solution steps cannot be undone: a great deal of care is required for each decision.
1.6.3 Is the Universe Predictable?
Certain outcome: in the 8-puzzle, we can plan an entire sequence of moves and be confident of the resulting state.
Uncertain outcome: in playing bridge, we cannot plan with certainty, since we cannot know exactly where all the cards are or what the other players will do on their turns. We can investigate several plans and use probabilities to choose among them. Planning, i.e. problem solving without feedback from the environment, works only for certain-outcome problems; with uncertain outcomes, plans must be revised as they are executed, which is expensive.
One of the hardest types of problems to solve is irrecoverable with uncertain outcome:
- Playing bridge, although fairly accurate estimates of the probabilities are available.
- Controlling a robot arm: the outcome is uncertain, since something could move into the arm's path.
- Helping a lawyer decide how to defend a client against a murder charge: here we cannot even list all the possible outcomes.
1.8 PROBLEM CHARACTERISTICS II
1.6.4 Is a Good Solution Absolute or Relative?
Consider the problem of answering questions based on a database of simple facts:
1. Marcus was a man.
2. Marcus was a Pompeian.
3. Marcus was born in 40 A.D.
4. All men are mortal.
5. All Pompeians died when the volcano erupted in 79 A.D.
6. No mortal lives longer than 150 years.
7. It is now 1991 A.D.
Question: Is Marcus alive?
By representing these facts in a formal language, such as predicate logic, and then using formal inference methods, we can derive the answer along more than one path. One path:
8. Marcus is mortal. (axioms 1, 4)
9. Marcus's age is 1951 years. (axioms 3, 7)
10. Marcus is dead. (statements 8, 9 and axiom 6)
OR another path:
It is now 1991 A.D. (axiom 7)
All Pompeians died in 79 A.D. (axiom 5)
All Pompeians are dead now. (from 7 and 5)
Marcus was a Pompeian. (axiom 2)
Marcus is dead. (from the two statements above)
Since all we are interested in is the answer to the question, it does not matter which path we follow; the path is not essential.
Contrast this with the traveling salesman problem, where the path itself is the answer and we want the shortest one. These two examples illustrate the difference between any-path problems and best-path problems; best-path problems are computationally harder to solve.
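The second reasoning chain can be mechanized by naive forward chaining over the fact base. This is a toy sketch, not a real predicate-logic prover; the tuple encoding of facts is an assumption of mine:

```python
# Naive forward chaining over the Marcus facts: repeatedly derive
# new facts until nothing changes, then check the query.

facts = {
    ("man", "Marcus"),
    ("pompeian", "Marcus"),
    ("now", 1991),
}

def rules(facts):
    """Yield facts derivable in one step from the current set."""
    for kind, who in list(facts):
        if kind == "pompeian" and ("now", 1991) in facts:
            # All Pompeians died in 79 A.D., and 79 < 1991,
            # so every Pompeian is dead now.
            yield ("dead", who)

def forward_chain(facts):
    changed = True
    while changed:
        new = set(rules(facts)) - facts
        changed = bool(new)
        facts |= new
    return facts

print(("dead", "Marcus") in forward_chain(set(facts)))  # -> True
```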
1.6.5 Is the Solution a State or a Path?
Consider the problem of finding a consistent interpretation for the sentence:
"The bank president ate a dish of pasta salad with the fork."
The sentence has several components, each of which, in isolation, may have more than one interpretation (ambiguity):
- "Bank" may mean a financial institution or the side of a river, but only a financial institution has a president.
- "Dish" is the object of the verb "eat". It is possible that a whole dish was eaten, but it is more likely that the pasta salad in the dish was eaten.
- "Pasta salad" is a salad containing pasta, but meanings cannot always be formed from pairs of nouns this way: "dog food", for example, does not normally contain dog.
- "With the fork" could modify several parts of the sentence, unlike "with vegetables" or "with her friends".
For this problem, only the interpretation itself is required; no record of the processing by which it was found is needed. In contrast, for the water jug problem it is not sufficient to report the final state (2, 0): the path is required, not just the final state.
Thus there are two kinds of problems: those, like natural language understanding, whose solution is a state of the world, and those, like the water jug problem, whose solution is a path to a state. In the natural language case, problem states correspond to situations in the world, not to sequences of operations.
1.6.6 What Is the Role of Knowledge?
Compare chess with understanding newspaper stories: for chess, relatively little knowledge (the legal moves plus a good control mechanism) is absolutely required, with knowledge serving mainly to constrain the search, whereas understanding newspaper stories requires a great deal of knowledge even to recognize a solution.
1.6.7 Does the Task Require Interaction with a Person?
- Solitary problems: the computer is given the problem and the data, and returns the solution with no intermediate interaction. Example: theorem proving.
- Conversational problems: there is intermediate communication between the computer and a person.
UNIT II
CO.2 Analyse various AI search algorithms (uninformed, informed, heuristic, constraint satisfaction, best-first search, problem reduction).
2.1 UNINFORMED SEARCH TECHNIQUES
Breadth-First Search
Generate all the offspring of the root by applying each of the applicable rules to the initial state; then, for each leaf node, generate all of its successors; continue level by level until some rule produces a goal state.
Algorithm:
1) Put the initial node on a list START.
2) If (START is empty) or (START = goal), terminate the search.
3) Remove the first node from START. Call this node A.
4) If (A = goal), terminate the search with success.
5) Else, if node A has successors, generate all of them and add them at the tail of START.
6) Go to step 2.
[Figure: a search tree with a root node, children A B C, and grandchildren D E F G H I J; the goal node is reached after all shallower nodes have been examined.]
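The six steps above can be sketched directly, with START kept as a queue. The example tree (Root, A through J) mirrors the figure and is otherwise illustrative:

```python
from collections import deque

# Breadth-first search over an explicit successor table.

successors = {
    "Root": ["A", "B", "C"],
    "A": ["D", "E"], "B": ["F", "G"], "C": ["H", "I", "J"],
}

def breadth_first_search(start, goal):
    start_list = deque([start])      # step 1: START holds the initial node
    while start_list:                # step 2: stop when START is empty
        node = start_list.popleft()  # step 3: remove the first node, call it A
        if node == goal:             # step 4: success
            return node
        # step 5: add successors at the TAIL of START
        start_list.extend(successors.get(node, []))
    return None                      # step 6: the loop repeats until exhausted

print(breadth_first_search("Root", "G"))  # -> G
```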
Depth-First Search
The goal is reached early if it lies on the left-hand side of the tree.
Algorithm:
1) Put the initial node on a list START.
2) If (START is empty) or (START = goal), terminate the search.
3) Remove the first node from START. Call this node A.
4) If (A = goal), terminate the search with success.
5) Else, if node A has successors, generate all of them and add them at the beginning of START.
6) Go to step 2.
[Figures: one level of the water jug search tree, with (0,0) expanding to (4,0) and (0,3), which in turn expand to (4,3), (0,0), (1,3) and (4,3), (0,0), (3,0); and a search tree with root and children A B C, in which depth-first search runs down the leftmost path through A, D, E before reaching the goal.]
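The same six steps become depth-first when successors are added at the beginning of START (step 5), so the leftmost path is explored to full depth first. A sketch over an illustrative tree:

```python
# Depth-first search: successors go at the FRONT of the START list.

successors = {
    "Root": ["A", "B", "C"],
    "A": ["D", "E"], "B": ["F", "G"], "C": ["H", "I", "J"],
}

def depth_first_search(start, goal):
    start_list = [start]                       # step 1
    while start_list:                          # step 2
        node = start_list.pop(0)               # step 3
        if node == goal:                       # step 4
            return node
        # step 5: add successors at the BEGINNING of START
        start_list = successors.get(node, []) + start_list
    return None

print(depth_first_search("Root", "E"))  # -> E
```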
Advantages of Depth-First Search
- It requires less memory, since only the nodes on the current path are stored; in breadth-first search, all of the tree generated so far must be stored.
- By chance, it may find a solution without examining much of the search space at all. In contrast, with breadth-first search all parts of the tree must be examined to level n before any nodes on level n+1 can be examined. This is particularly significant if many acceptable solutions exist: depth-first search can stop when one of them is found.
Advantages of Breadth-First Search
- It will not get trapped exploring a blind alley, in contrast to depth-first search, which may follow a single unfruitful path for a very long time.
- If there is a solution, breadth-first search is guaranteed to find it: a longer path is never explored until all shorter ones have already been examined. This contrasts with depth-first search, which may find a long path to a solution in one part of the tree when a shorter path exists elsewhere.
Example: the Traveling Salesman Problem
Combinatorial explosion: a simple motion-causing, systematic control strategy could in principle solve this problem, but only by spending far more time than we are willing to spend. If there are N cities, then the number of different paths among them is 1 x 2 x ... x (N-1), i.e. (N-1)!. The time to examine a single path is proportional to N, so the total time required to perform this search is proportional to N!, a very large number. Branch-and-bound techniques, which abandon any partial path as soon as it is longer than the shortest complete path found so far, help, but the search remains exponential.
2.2 INFORMED HEURISTIC BASED SEARCH
Heuristic Search:
A heuristic is a technique that improves the efficiency of a search process, possibly by sacrificing claims of completeness. Heuristics are like tour guides: they are good to the extent that they point in generally interesting directions, and bad to the extent that they may miss points of interest to particular individuals.
Nearest Neighbor Heuristic: works by selecting the locally superior alternative at each step. Applying it to the traveling salesman problem:
1) Arbitrarily select a starting city.
2) To select the next city, look at all cities not yet visited and select the one closest to the current city; go to it next.
3) Repeat step 2 until all cities have been visited.
This procedure executes in time proportional to N^2, a significant improvement over N!, and it is possible to prove an upper bound on the error it incurs.
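Steps 1 to 3 can be sketched as follows; the city names and coordinates are invented for illustration:

```python
import math

# Nearest-neighbor heuristic for the traveling salesman problem.

cities = {"P": (0, 0), "Q": (1, 0), "R": (1, 1), "S": (0, 2)}

def dist(a, b):
    (x1, y1), (x2, y2) = cities[a], cities[b]
    return math.hypot(x2 - x1, y2 - y1)

def nearest_neighbor_tour(start):
    tour, unvisited = [start], set(cities) - {start}
    while unvisited:                              # step 3: repeat
        # step 2: the unvisited city closest to the current one
        nxt = min(unvisited, key=lambda c: dist(tour[-1], c))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

print(nearest_neighbor_tour("P"))  # -> ['P', 'Q', 'R', 'S']
```

Each step examines at most N cities and there are N steps, which is where the N^2 bound comes from.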
Why are heuristics worth using?
- Rarely do we actually need the optimum solution; a good approximation will usually serve very well. In fact, there is some evidence that people, when they solve problems, are not optimizers but rather satisficers. Consider the search for a parking space: most people stop as soon as they find a fairly good space, even if there might be a slightly better space up ahead.
- Although the approximations produced by heuristics may not be very good in the worst case, worst cases rarely arise in the real world.
- Trying to understand why a heuristic works, or why it does not work, often leads to a deeper understanding of the problem.
One of the best descriptions of the importance of heuristics in problem solving is Polya's work, which serves as an excellent guide for people who want to become better problem solvers.
There are two major ways in which domain-specific heuristic knowledge can be incorporated into a search procedure:
- In the rules themselves; e.g., the rules for a chess-playing system might describe not simply the set of legal moves but the set of "sensible" moves, as determined by the rule writer.
- As a heuristic function that evaluates individual problem states and determines how desirable they are.
AI can be viewed as the study of techniques for solving exponentially hard problems in polynomial time by exploiting knowledge about the problem domain.
Heuristic Search:
Heuristics are approximations used to minimize the searching process. They apply to problems for which no exact algorithms are known and an approximate, satisfying solution is needed, and to problems for which exact solutions are known but are computationally infeasible.
Heuristic evaluation functions produce numbers that guide the search process. The following algorithms make use of heuristic evaluation functions:
1) Hill climbing 2) Constraint satisfaction 3) Best-first search 4) A* algorithm 5) AO* algorithm 6) Beam search
2.3 GENERATE AND TEST
Generate and Test:
Algorithm:
1. Generate a possible solution. For some problems this means generating a particular point in the problem space; for others it means generating a path from a start state.
2. Test to see if this is actually a solution by comparing the chosen point, or the endpoint of the chosen path, to the set of acceptable goal states.
3. If a solution has been found, quit; otherwise return to step 1.
If done systematically, this can take a long time when the problem space is large; it amounts to a depth-first, exhaustive search of the problem space, since complete solutions must be generated before they can be tested. Solutions can instead be generated randomly, but then there is no guarantee that a solution will ever be found; this form is also known as the British Museum algorithm, a reference to the chance of finding an object in the British Museum by wandering randomly. A more systematic variant is a depth-first tree search with backtracking, and for some problems it must traverse a graph rather than a tree.
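Exhaustive generate-and-test can be sketched on the 4-queens problem (an example of my choosing, not from the text above): generate every placement with one queen per row and per column, then test the diagonals.

```python
from itertools import permutations

# Generate-and-test for 4-queens. A candidate is a permutation
# placement[row] = column, so rows and columns never clash and
# only diagonal attacks need to be tested.

def no_attacks(placement):
    """A placement passes if no two queens attack diagonally."""
    return all(abs(placement[i] - placement[j]) != j - i
               for i in range(len(placement))
               for j in range(i + 1, len(placement)))

def generate_and_test(n=4):
    for candidate in permutations(range(n)):   # step 1: generate
        if no_attacks(candidate):              # step 2: test
            return candidate                   # step 3: quit on success
    return None

print(generate_and_test())  # -> (1, 3, 0, 2)
```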
2.4 HILL-CLIMBING
Hill Climbing
A variant of generate-and-test in which feedback from the test procedure is used to help the generator decide which direction to move in the search space. In pure generate-and-test the test procedure responds only yes or no, but if it is augmented with a heuristic function that provides an estimate of how close a given state is to a goal state, the generator can use that estimate. Hill climbing is used when a good heuristic function is available.
Example: you are in an unfamiliar city without a map and want to get downtown. You simply aim for the tall buildings. The heuristic function is just the distance between the current location and the location of the tall buildings, and the desirable states are those in which this distance is minimized.
Recall the question "Is a good solution absolute or relative?": when an absolute solution exists, we can recognize a goal state just by examining it; relative solutions arise in maximization or minimization problems such as the traveling salesman problem.
Algorithm: Simple Hill Climbing
1. Evaluate the initial state. If it is also a goal state, then return it and quit; otherwise continue with the initial state as the current state.
2. Loop until a solution is found or until there are no new operators left to be applied in the current state:
a. Select an operator that has not yet been applied to the current state and apply it to produce a new state.
b. Evaluate the new state:
i. If it is a goal state, then return it and quit.
ii. If it is not a goal state but is better than the current state, then make it the current state.
iii. If it is not better than the current state, then continue in the loop.
The key difference between this algorithm and generate-and-test is the use of an evaluation function as a way to inject task-specific knowledge into the control process. "Better" may mean a higher or a lower heuristic value, depending on the problem.
Steepest-Ascent Hill Climbing (Gradient Search)
Considers all the moves from the current state and selects the best one as the next state.
Algorithm:
1. Evaluate the initial state. If it is also a goal state, then return it and quit; otherwise continue with the initial state as the current state.
2. Loop until a solution is found or until a complete iteration produces no change to the current state:
a. Let SUCC be a state such that any possible successor of the current state will be better than SUCC.
b. For each operator that applies to the current state:
i. Apply the operator and generate a new state.
ii. Evaluate the new state. If it is a goal state, return it and quit. If not, compare it to SUCC: if it is better, set SUCC to this state; if it is not better, leave SUCC alone.
c. If SUCC is better than the current state, then set the current state to SUCC.
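Steepest-ascent hill climbing can be sketched on a toy one-dimensional maximization problem; the objective function and the two "move" operators are assumptions made for illustration:

```python
# Steepest-ascent hill climbing: maximize f(x) = -(x - 7)**2 over
# the integers, with the operators "move left" and "move right".

def f(x):
    return -(x - 7) ** 2

def steepest_ascent(current):
    while True:
        # step 2b: generate ALL successors of the current state
        candidates = [current - 1, current + 1]
        # SUCC becomes the best successor found this iteration
        succ = max(candidates, key=f)
        if f(succ) <= f(current):    # no successor improves: stop
            return current           # (this may be only a local maximum)
        current = succ               # step 2c: move to SUCC

print(steepest_ascent(0))  # -> 7
```

On this smooth objective the climb reaches the global maximum; on a function with several peaks it would stop at whichever local maximum is uphill from the start.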
Both algorithms may fail to find a solution: either may terminate not by finding a goal state but by reaching a state from which no better states can be generated. This will happen if the program has reached a local maximum, a plateau, or a ridge.
Local maximum: a state that is better than all its neighbors but is not better than some other states farther away. At a local maximum, all moves appear to make things worse. Local maxima are particularly frustrating because they often occur almost within sight of a solution; in this case they are called foothills.
Plateau: a flat area of the search space in which a whole set of neighboring states have the same value. It is difficult to determine the best direction in which to move by making local comparisons.
Ridge: a special kind of local maximum; an area of the search space that is higher than surrounding areas and that itself has a slope, but which cannot be traversed by single moves.
There are some ways of dealing with these problems, although these methods are by no means guaranteed:
- Backtrack to some earlier node and try going in a different direction. Maintain a list of paths almost taken, and go back to one of them if the path that was taken leads to a dead end. This is a good way of dealing with local maxima.
- Make a big jump in some direction, to try to get to a new section of the search space. This is a good way of dealing with plateaus.
- Apply two or more rules before doing the test; this corresponds to moving in several directions at once, and is a good way of dealing with ridges.
Hill Climbing (as a sorted search)
Algorithm:
1. Put the initial node on a list START.
2. If (START is empty) or (START = goal), terminate the search.
3. Remove the first node from START. Call this node A.
4. If (A = goal), terminate the search with success.
5. Else, if node A has successors, generate all of them, find out how far each is from the goal node, sort them by this remaining distance, and add them at the beginning of START.
6. Go to step 2.
[Figure: search tree for the hill climbing procedure.]
Examples: while listening to music, adjusting the tuning and volume until the sound is melodious; tuning the carburetor of a scooter with the accelerator raised to its maximum, until the engine keeps running for a considerably long period of time.
Problems: local maxima (dealt with by backtracking and trying a different path), plateaus, where all neighboring points have the same value (dealt with by making a big jump), and ridges (dealt with by applying several rules before testing).
2.5 BEST-FIRST SEARCH
Best First search In hill climbing one move is selected and all the others are rejected, never to be reconsidered.
This produces the straight line behavior that is characteristic of hill climbing.
In best first search one move is selected but the others are kept around so that they can be
revisited later if the selected path becomes less promising.
8 3 7
27 2 2
9
Further best available state is selected in best first search even if that state has a value i.e. is
lower than the value of the state that was just explored. This constrast with hill climbing, which will
stop if there are no successor or states with better values than the current state.
Depth-first search is good because it allows a solution to be found without all the competing branches having to be expanded. Breadth-first search is good because it does not get trapped on dead-end paths. Combining the two: follow a single path at a time, but switch paths whenever some competing path looks more promising than the current one. This is best-first search.
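A minimal sketch of best-first search with a priority queue, keeping every generated alternative on the queue so it can be revisited (the toy grid problem and Manhattan-distance heuristic are made-up illustrations):

```python
import heapq

def best_first(start, goal, successors, h):
    """Best-first search: always expand the open node with the lowest
    heuristic value, but keep every alternative on the queue so it can
    be revisited if the chosen path stops looking promising."""
    open_list = [(h(start), start, [start])]
    seen = {start}
    while open_list:
        _, node, path = heapq.heappop(open_list)
        if node == goal:
            return path
        for s in successors(node):
            if s not in seen:
                seen.add(s)
                heapq.heappush(open_list, (h(s), s, path + [s]))
    return None

# Hypothetical toy problem: walk right/up on a grid from (0,0) to (3,2),
# guided by Manhattan distance to the goal.
goal = (3, 2)
succ = lambda p: [(p[0] + 1, p[1]), (p[0], p[1] + 1)]
h = lambda p: abs(goal[0] - p[0]) + abs(goal[1] - p[1])
print(best_first((0, 0), goal, succ, h))
```

Replacing the heuristic priority `h(s)` with the path depth would turn this into breadth-first search; popping the most recently pushed node instead would give depth-first search, which is the sense in which best-first combines the two.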
Best-first search operates on an OR graph, since each of its branches represents an alternative problem-solving path.
The AO* algorithm is used for searching AND/OR graphs; this is the concept of problem reduction using AND/OR trees. The A* algorithm is not adequate for AND/OR graphs because, for an AND arc, all of its branches must be scanned to arrive at a solution.
Consider the figure:
Fig.: Successive snapshots of an AND/OR graph being expanded from root A, with children B, C, D and grandchildren E through J; the numbers beside the nodes are heuristic cost estimates, revised bottom-up after each expansion.
We find that B has the minimal value (4). But B forms part of an AND arc, and hence we have to take into account the other branch of the AND tree; the estimate now has the value 9. This forces us to rethink our options, and now we choose D because it has the lowest estimate.
Algorithm (AO*):
1) Create an initial graph GRAPH consisting of a single node NODE. Compute the evaluation-function value of NODE.
2) Repeat until NODE is solved or its cost reaches a value so high that it cannot be expanded:
2.1) Select an unexpanded node NODE1 from GRAPH, keeping track of the path to it.
2.2) Expand NODE1 by generating its children. For each child that is not an ancestor of NODE1, compute the evaluation-function value. If a child node is a terminal one, label it END_NODE.
2.3) Generate a set of nodes DIFF_NODES containing only NODE1.
2.4) Repeat until DIFF_NODES is empty:
2.4.1) Choose a node CHOOSE_NODE from DIFF_NODES such that none of the descendants of CHOOSE_NODE is in DIFF_NODES.
2.4.2) Estimate the cost of each connector emerging from CHOOSE_NODE. This cost is the total of the evaluation-function values of the successor nodes and the cost of the arc.
2.4.3) Find the minimal value and mark the connector through which the minimum is achieved, overwriting the previous marking if it is different.
2.4.4) If all the successor nodes of the marked connector are labeled END_NODE, label CHOOSE_NODE as SOLVED.
2.4.5) If CHOOSE_NODE has been marked SOLVED or its cost has changed, add all ancestors of CHOOSE_NODE to the set DIFF_NODES.

2.6 PROBLEM REDUCTION
Goal: Acquire TV set, decomposed into the alternative subgoals Steal TV set; Earn some money and Buy TV set (the latter two joined by an AND arc).
Fig.: A simple AND/OR graph. Root A has connectors to B, C and D; a second graph expands these to children E through J, with arc costs and heuristic estimates attached to the nodes.
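AO*'s bottom-up cost revision over connectors can be illustrated with a small recursive sketch (the graph, heuristic values and uniform arc cost of 1 below are hypothetical, not taken from the figure):

```python
def revise_cost(node, graph, h, arc_cost=1):
    """Cost of an AND/OR node: for each outgoing connector (an OR
    choice), sum arc cost plus the cost of every successor in the
    connector (the AND part); the node's cost is the cheapest
    connector. Unexpanded leaves fall back on the heuristic h."""
    if node not in graph:               # leaf: use its estimate
        return h[node]
    return min(
        sum(arc_cost + revise_cost(s, graph, h, arc_cost) for s in conn)
        for conn in graph[node]
    )

# Hypothetical graph: A has an OR choice between node B alone and the
# AND pair (C, D). Choosing B costs 1 + 4 = 5; the AND connector costs
# (1 + 2) + (1 + 3) = 7, so the connector through B would be marked.
graph = {"A": [["B"], ["C", "D"]]}
h = {"B": 4, "C": 2, "D": 3}
print(revise_cost("A", graph, h))  # 5
```

This captures why A* is inadequate here: the cost of an AND connector is the sum over all its successors, so every branch must be accounted for, not just the single cheapest child.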
2.7 CONSTRAINT SATISFACTION
Constraint satisfaction: the real world is full of constraints, and yet solutions must be found without violating them, e.g. design problems in manufacturing, planning, and finding an optimum travel route.
Cryptarithmetic problems. Ex.:
    S E N D
  + M O R E
  ---------
  M O N E Y
Here the constraints are that all letters have different numeric values and that the rules of column-wise addition (with carries) are adhered to. Although guessing may still be required, the number of allowable guesses is reduced, and so the degree of search is curtailed.
A goal state is any state that has been constrained "enough", where "enough" must be defined for each problem; here it means that each letter is assigned a unique value.
Constraint satisfaction is a two-step process:
i) constraints are discovered and propagated as far as possible throughout the system;
ii) if there is still not a solution, search begins.
Initially, rules propagate constraints (C1 to C4 denote the carries, from the units column leftward):
M = 1, since S + M + C3 cannot be more than 19.
S = 8 or 9, since S + M + C3 > 9 (to generate a carry) and M = 1, so S + 1 + C3 > 9; C3 is at most 1, so S + C3 > 8.
O = 0, since S + 1 + C3 must be at least 10 to generate a carry and can be at most 11; but M is already 1, so O cannot be 1.
The state of the solution so far, against the column layout:
  C4(1) C3(0) C2(1) C1(0)
        S(8,9) E(2)  N(3)  D
        M(1)   O(0)  R(9)  E(2)
  M(1)  O(0)   N(3)  E(2)  Y

  C4 C3 C2 C1
      S  E  N  D
      M  O  R  E
  M   O  N  E  Y
N = E or E + 1, depending on C2; but N cannot have the same value as E, so N = E + 1 and C2 = 1.
For C2 to be 1, the sum N + R + C1 must be greater than 9, so N + R > 8.
E ≠ 9: the column requires N + R + C1 = 10 + E, and N + R cannot be greater than 17, so even with a carry-in C1 = 1 the sum is at most 18; hence E cannot be 9.
Assume now that no more constraints can be generated; then to make progress from here we have to guess. Let E be assigned a value: suppose E = 2, since it occurs three times.
The next cycle begins:
N = E + 1 = 2 + 1 = 3.
R = 8 or 9: in the tens column, N(3) + R + C1(0 or 1) must end in 2, i.e. equal 2 or 12; but since N is already 3, the sum of these non-negative numbers cannot be less than 3, so it must be 12. From 3 + R + (0 or 1) = 12, R = 8 or 9.
In the units column, 2 + D = Y, or 2 + D = 10 + Y if a carry C1 is generated.
Again no further constraint is generated; a guess is required. Try C1 = 1 first; this eventually reaches a dead end, and the system backtracks and tries C1 = 0.
With C1 = 1: 2 + D = 10 + Y, so D = 8 + Y, giving D = 8 or 9 and Y = 0 or 1. But O = 0 and E = 2 are already assigned, and S, R and D cannot all take values from {8, 9}; this is a conflict.
So C1 cannot be 1. Had this been realized initially, some search could have been avoided.
The constraint propagation here was not very sophisticated; it depended on simple reasoning. A more sophisticated scheme, in which the specific cause of the inconsistency is identified and only the constraints that depend on that culprit are undone (others are left alone), is called dependency-directed backtracking.
With C1 = 0: 2 + D = Y; from N + R = 10 + E, R = 12 − 3 = 9, so S = 8.
The remaining possibilities are D = 4, Y = 6 or D = 5, Y = 7.
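The propagated constraints above can be checked mechanically: the sketch below brute-forces the remaining assignments, using M = 1 and O = 0 from the propagation step to prune the space (this is a search sketch, not the propagation procedure itself):

```python
from itertools import permutations

def solve_send_more_money():
    """Search assignments for SEND + MORE = MONEY, using the propagated
    constraints M = 1 and O = 0 to shrink the search to permutations of
    six letters over the eight remaining digits (8P6 = 20160 cases)."""
    M, O = 1, 0
    for S, E, N, D, R, Y in permutations([2, 3, 4, 5, 6, 7, 8, 9], 6):
        send = 1000 * S + 100 * E + 10 * N + D
        more = 1000 * M + 100 * O + 10 * R + E
        money = 10000 * M + 1000 * O + 100 * N + 10 * E + Y
        if send + more == money:
            return dict(S=S, E=E, N=N, D=D, M=M, O=O, R=R, Y=Y)
    return None

print(solve_send_more_money())
# S=9, E=5, N=6, D=7, M=1, O=0, R=8, Y=2 (9567 + 1085 = 10652)
```

Note that the full puzzle's unique solution has E = 5, not the E = 2 guessed in the walkthrough above; the walkthrough's E = 2 branch is one that real search would eventually have to abandon.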
2.8 MEANS-ENDS ANALYSIS
Means-ends analysis: the search strategies discussed so far can reason either forward or backward, but for a given problem one direction or the other must be chosen. Often, however, a mixture of the two directions is appropriate. Such a mixed strategy would make it possible to solve the major parts of a problem first and then go back and solve the small problems that arise in "gluing" the big pieces together. A technique known as means-ends analysis allows us to do that.
The procedure:
- Detect the differences between the current state and the goal state.
- Find an operator that can reduce the difference.
- If the operator cannot be applied to the current state, set up a subproblem of getting to a state to which it can be applied.
Ex.: the General Problem Solver (GPS).
This kind of backward chaining, in which operators are selected and then subgoals are set up to establish the preconditions of the operators, is called operator subgoaling.
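The procedure can be sketched as a toy GPS-style planner (the operator table is invented, and the sketch omits the goal protection and loop detection that a real planner needs):

```python
def mea(state, goals, operators):
    """Means-ends analysis sketch: for each unsatisfied goal, pick an
    operator whose add-list reduces the difference; if the operator's
    preconditions do not hold, first solve the subproblem of achieving
    them (operator subgoaling)."""
    state, plan = set(state), []
    def achieve(goals):
        for g in goals:
            if g in state:
                continue
            op = next(o for o in operators if g in o["adds"])
            achieve(op["pre"])              # subproblem: make op applicable
            state.difference_update(op["dels"])
            state.update(op["adds"])
            plan.append(op["name"])
    achieve(goals)
    return plan

# Hypothetical household-robot operators:
ops = [
    {"name": "walk-to-table", "pre": [], "adds": ["at-table"], "dels": []},
    {"name": "pick-up-plant", "pre": ["at-table"],
     "adds": ["holding-plant"], "dels": []},
]
print(mea([], ["holding-plant"], ops))  # ['walk-to-table', 'pick-up-plant']
```

The recursive call on `op["pre"]` is exactly operator subgoaling: the planner works backward from the chosen operator's preconditions before applying it forward.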
Ex.: mathematical logic as the representation formalism. Consider the English sentence "Spot is a dog", represented in logic as dog(Spot). The logical representation of the fact that all dogs have tails is
∀x: dog(x) → hastail(x).
Using the deductive mechanisms of logic, we can generate the new representation object hastail(Spot), and using the backward mapping function we can generate the English sentence "Spot has a tail".
Mapping functions are not one-to-one; in fact they are not even functions, but rather many-to-many relations. Ex.: "All dogs have tails" could represent the fact that every dog has at least one tail, or the fact that each dog has several tails. The issue is to decide what facts the sentences represent, and then to convert those facts into the new representation.
UNIT III
CO.3 Explain the fundamentals of knowledge representation (logic-based, frame-based, semantic nets), inference and theorem proving; know how to build simple knowledge-based systems.
3.1 KNOWLEDGE REPRESENTATION ISSUES
Knowledge Representation Issues
As the best-first search example showed, particular knowledge-representation models allow for more specific, more powerful problem-solving mechanisms that operate on them. We examine specific techniques that can be used for representing and manipulating knowledge within programs.
Representation and Mapping: solving AI problems requires a large amount of knowledge, together with mechanisms to manipulate that knowledge to create solutions to new problems. There are a variety of ways of representing knowledge (facts), involving two different kinds of entities:
- Facts: truths in some relevant world. These are the things we want to represent.
- Representations of facts in some chosen formalism. These are the things we actually manipulate.
Structuring these entities is done at two levels:
- the knowledge level, at which facts, including each agent's behavior and current goals, are described;
- the symbol level, at which representations of objects at the knowledge level are defined in terms of symbols that can be manipulated by programs.
Fig.: Facts, internal representations and English representations, linked by English understanding (English to internal representation) and English generation (internal representation to English).
Reasoning programs manipulate internal representations of the facts they are given; the results of this manipulation are themselves internal structures. The figure shows how these three kinds of objects relate to each other. We focus on facts, on representations, and on the two mappings that must exist between them, called representation mappings:
- the forward representation mapping maps from facts to representations;
- the backward representation mapping goes the other way, from representations to facts.
One common representation of facts is natural language, particularly English, sentences. Regardless of the representation used inside a program, an English representation of the facts is also needed in order to facilitate getting information into and out of the system, together with mapping functions from English sentences to the representation we are actually going to use, and from it back to sentences.
In the figures that follow, lines represent attributes, boxed nodes represent objects and values of attributes of objects, and arrows point from an object to its value along the corresponding attribute line. This is a slot-and-filler structure: a semantic network, or a collection of frames.
Viewing a node as a frame:
Baseball-Player
  isa: Adult-Male
  bats: (EQUAL handed)
  height: 6-1
  batting-average: .252
Answers to the following queries:
- team(Pee-Wee-Reese) = Brooklyn-Dodgers. This attribute had a value stored explicitly in the knowledge base.
- batting-average(Three-Finger-Brown) = .106. Follow the instance attribute to Pitcher and extract the value stored there. Such inherited values are best guesses in the face of a lack of more precise information; in fact, in 1906, Brown's batting average was .204.
- height(Pee-Wee-Reese) = 6-1, inherited via the isa hierarchy from Baseball-Player.
- bats(Three-Finger-Brown) = Right, computed by the rule (EQUAL handed), which takes the person as input and returns the value of that person's handed attribute: Right.
Inferential Knowledge: property inheritance is a powerful form of inference, but not the only one. Some procedures reason forward from given facts to conclusions; resolution, for example, exploits a proof-by-contradiction strategy. Logic provides a powerful structure in which to describe relationships among values, and it is often useful to combine it, or some other powerful description language, with an isa hierarchy.
Procedural Knowledge: so far we have concentrated on relatively static, declarative facts. Another useful kind of knowledge is operational, or procedural, knowledge. Procedural knowledge can be represented in many ways. The simplest is as code, which is powerful here since it can exploit the name of the node whose value for handed is to be found. Another is the use of production rules, which are augmented with information on how they are to be used. The important difference lies in how the knowledge is used by the procedures that manipulate it.
Procedural knowledge as Rules.
If: ninth inning, and
Score is close, and
Less than 2 outs, and
First base is vacant, and
Batter is better hitter than next batter
Then: walk the batter.
Fig.: The desired real reasoning is the abstract reasoning process that a program is intended to model, taking initial facts to final facts. The operation of the program maps the internal representation of the initial facts to the internal representation of the final facts; the forward and backward representation mappings connect the two levels. Programmers build concrete implementations of these abstract concepts.
Approaches to Knowledge Representation: a good system for the representation of knowledge in a particular domain should possess the following four properties:
- Representational adequacy: the ability to represent all of the kinds of knowledge that are needed in that domain.
- Inferential adequacy: the ability to manipulate the representational structures in such a way as to derive new structures corresponding to new knowledge inferred from old.
- Inferential efficiency: the ability to incorporate into the knowledge structure additional information that can be used to focus the attention of the inference mechanisms in the most promising directions.
- Acquisitional efficiency: the ability to acquire new information easily. The simplest case involves direct insertion, by a person, of new knowledge into the database.
No single system that optimizes all of these capabilities for all kinds of knowledge has yet been found. As a result, multiple techniques for knowledge representation exist.
Simple Relational Knowledge: knowledge stored as a set of relations, e.g. a table with columns Player, Height, Weight, Bats-Throws. This offers very weak inferential capabilities: it is not possible to answer "Who is the heaviest player?" directly, but if a procedure is provided, these facts will enable the procedure to compute an answer. The relations thus provide the support for such procedures.
Inheritable Knowledge:
Fig.: A semantic network in which nodes are connected by isa and instance links and carry attributes such as height, bats and batting-average; the bats value (EQUAL handed) is inherited down the hierarchy.
Instance and isa express class membership and class inclusion. A useful form of inference is property inheritance, in which elements of specific classes inherit attributes and values from the more general classes in which they are included. Objects are organized into classes, and the classes must be arranged in a generalization hierarchy.
Issues in Knowledge Representation:
- Are any attributes of objects so basic that they occur in almost every problem domain? If there are, we need to make sure that they are handled appropriately in each of the mechanisms we propose. If such attributes exist, what are they?
- Are there any important relationships that exist among attributes of objects?
- At what level should knowledge be represented? Is there a good set of primitives into which all knowledge can be broken down? Is it helpful to use such primitives?
- How should sets of objects be represented?
- Given a large amount of knowledge stored in a database, how can relevant parts be accessed when they are needed?
Fig.: An inheritable knowledge base. Person (handed: Right) generalizes Adult-Male (height: 5-10), which generalizes Baseball-Player (height: 6-1, batting-average: .252); Pitcher (batting-average: .106) and Fielder (batting-average: .262) specialize Baseball-Player; Three-Finger-Brown is an instance of Pitcher with team Chicago-Cubs, and Pee-Wee-Reese is an instance of Fielder with team Brooklyn-Dodgers.
Important Attributes: two attributes are of very general significance, instance and isa. They support property inheritance: they represent class membership and class inclusion respectively, and can equally be expressed in predicate logic.
Relationships among Attributes: attributes are themselves entities. What properties do they have, independent of the specific knowledge they encode?
- Inverses. One approach represents both directions of a relationship in a single assertion that ignores focus, e.g. team(Pee-Wee-Reese, Brooklyn-Dodgers); how it is used then depends on the other assertions in the system. A second approach is to use attributes that each focus on a single entity, but to use them in pairs, one the inverse of the other: team = Brooklyn-Dodgers on the player, and team-members = Pee-Wee-Reese on the team.
- An isa hierarchy of attributes. Just as there are specializations of objects, there are specializations of attributes: height is a specialization of physical-size, which is in turn a specialization of physical-attribute; generalization runs the other way.
Techniques for reasoning about values:
- Information about the type of the value: height is measured in a unit of length.
- Constraints on the value, often stated in terms of related entities: the age of a person cannot be greater than the age of that person's parents.
- Rules for computing the value when it is needed, as for the bats attribute. These backward rules are also called if-needed rules.
- Rules that describe actions that should be taken if a value ever becomes known, called forward rules or if-added rules.
Single-valued attributes: a baseball player can at any time have only a single height and be a member of only one team. One approach introduces an explicit notation for temporal intervals: if two different values are ever asserted for the same temporal interval, a contradiction is signalled automatically. Another assumes that the only temporal interval of interest is now, so if a new value is asserted, the old value is simply replaced.
The Frame Problem: how to represent efficiently the sequences of problem states that arise from a search process. Consider the world of a household robot, with facts like on(Plant12, Table34), under(Table34, Window13) and in(Table34, Room15).
Fig. 4.11: A similarity net relating chair, stool, table, desk and sideboard through differences such as "too big", "no back", "too high", "too wide", "no knee room" and "drawers".
The whole problem of representing the facts that change as well as those that do not is known as the frame problem, and frame axioms address it. In the robot world, suppose a table with a plant on it stands under the window, and we move the table to the centre of the room; we must also infer that the plant is now in the centre of the room too, but that the window is not.
3.2 FIRST ORDER LOGIC, PREDICATE LOGIC
Using Predicate logic
Representing facts – the language of logic
Logic symbols: → (material implication), ¬ (not), ∨ (or), ∧ (and), ∀ (for all) and ∃ (there exists).
Representing simple facts in logic: we first explore the use of propositional logic as a way of representing the sort of world knowledge that an AI system might need. We can easily represent real-world facts as logical propositions written as well-formed formulas (wffs) in propositional logic, as shown below:
It is raining.  RAINING
It is sunny.  SUNNY
It is windy.  WINDY
If it is raining, then it is not sunny.  RAINING → ¬SUNNY
Now represent the facts stated by the classical sentences:
Socrates is a man.  SOCRATESMAN
Plato is a man.  PLATOMAN
These are two totally separate assertions, and we would like to capture the similarity between Socrates and Plato. It would be much better to represent these facts as
MAN(Socrates) and MAN(Plato),
since now the structure of the representation reflects the structure of the knowledge itself. But to do that we need to be able to use predicates applied to arguments. Consider:
All men are mortal.  MORTALMAN
This fails to capture the relationship between any individual being a man and that individual being mortal; to express it we need variables and quantification, unless we are willing to write a separate statement about the mortality of every known man.
Representation as a set of wffs in predicate logic:
1. Marcus was a man.
   man(Marcus)
   This fails to capture some of the information in the English sentence, namely the notion of past tense.
2. Marcus was a Pompeian.
   Pompeian(Marcus)
3. All Pompeians were Romans.
   ∀x: Pompeian(x) → Roman(x)
4. Caesar was a ruler.
   ruler(Caesar)
5. All Romans were either loyal to Caesar or hated him.
   ∀x: Roman(x) → loyalto(x, Caesar) ∨ hate(x, Caesar)
   This uses inclusive or; to insist on exclusive or, write
   ∀x: Roman(x) → (loyalto(x, Caesar) ∧ ¬hate(x, Caesar)) ∨ (¬loyalto(x, Caesar) ∧ hate(x, Caesar)).
6. Everyone is loyal to someone.
   ∀x: ∃y: loyalto(x, y)
   Note the scope of the quantifiers: should it be ∀x: ∃y: loyalto(x, y) or ∃y: ∀x: loyalto(x, y)?
7. People only try to assassinate rulers they are not loyal to.
   ∀x: ∀y: person(x) ∧ ruler(y) ∧ tryassassinate(x, y) → ¬loyalto(x, y)
8. Marcus tried to assassinate Caesar.
   tryassassinate(Marcus, Caesar)
Was Marcus loyal to Caesar? That is, can we prove ¬loyalto(Marcus, Caesar)?
Fig.: An attempt to prove ¬loyalto(Marcus, Caesar):
¬loyalto(Marcus, Caesar)
  ↑ (7, substitution)
person(Marcus) ∧ ruler(Caesar) ∧ tryassassinate(Marcus, Caesar)
  ↑ (4)
person(Marcus) ∧ tryassassinate(Marcus, Caesar)
  ↑ (8)
person(Marcus)
9. All men are people.
   ∀x: man(x) → person(x)
Three important issues arise in converting English sentences into logical statements and then using those statements to deduce new ones:
- Many English statements are ambiguous; choosing the correct interpretation may be difficult.
- There is usually a choice among several representations; a simple one must be selected.
- Even in very simple situations, a set of sentences is unlikely to contain all the information necessary to reason about the topic at hand.
Representing Instance and Isa Relationships: the logical representations above already capture the relationships that instance and isa are used to express, namely class membership and class inclusion.
In the first form of the figure, class membership is represented with unary predicates (such as Roman), each of which corresponds to a class; asserting that P(x) is true is equivalent to asserting that x is an instance (or element) of P.
The second form uses the binary predicate instance, whose first argument is an object and whose second argument is the class to which the object belongs. It does not use an explicit isa predicate; instead, the subclass relationship is written out, as in statement (3).
The third form uses both the instance and the isa predicates explicitly. The use of isa simplifies statement (3), but it requires one extra axiom (6), which describes how an instance relation and an isa relation can be combined to derive a new instance relation.
Fig.: Three ways of representing class membership.
First form:
1) man(Marcus)
2) Pompeian(Marcus)
3) ∀x: Pompeian(x) → Roman(x)
4) ruler(Caesar)
5) ∀x: Roman(x) → loyalto(x, Caesar) ∨ hate(x, Caesar)
Second form:
1) instance(Marcus, man)
2) instance(Marcus, Pompeian)
3) ∀x: instance(x, Pompeian) → instance(x, Roman)
4) instance(Caesar, ruler)
5) ∀x: instance(x, Roman) → loyalto(x, Caesar) ∨ hate(x, Caesar)
Third form:
1) instance(Marcus, man)
2) instance(Marcus, Pompeian)
3) isa(Pompeian, Roman)
4) instance(Caesar, ruler)
5) ∀x: instance(x, Roman) → loyalto(x, Caesar) ∨ hate(x, Caesar)
6) ∀x: ∀y: ∀z: instance(x, y) ∧ isa(y, z) → instance(x, z)
Forward vs. backward reasoning: when the branching factor is great, either use some heuristic rules for deciding which answer is more likely and try to prove that one first (if it fails, the effort is lost), or simply try both answers simultaneously and stop when one effort is successful.
3.3 Unification
UNIFICATION ALGORITHM
In propositional logic it is easy to determine that two literals cannot both be true at the same time: simply look for L and ¬L. In predicate logic this matching process is more complicated, since the bindings of variables must be considered. For example, man(John) and ¬man(John) is a contradiction, while man(John) and ¬man(Himalayas) is not. Thus, in order to determine contradictions, we need a matching procedure that compares two literals and discovers whether there exists a set of substitutions that makes them identical. There is a recursive procedure that does this matching; it is called the unification algorithm. In the unification algorithm, each literal is represented as a list whose first element is the name of a predicate and whose remaining elements are arguments. An argument may be a single element (atom) or may be another list. For example, we can have literals such as
(tryassassinate Marcus Caesar)
(tryassassinate Marcus (ruler-of Rome))
To unify two literals, first check whether their first elements are the same. If so, proceed; otherwise they cannot be unified. For example, the literals
(tryassassinate Marcus Caesar)
(hate Marcus Caesar)
cannot be unified. The unification algorithm then recursively matches pairs of elements, one pair at a time. The matching rules are:
i) Different constants, functions or predicates cannot match, whereas identical ones can.
ii) A variable can match another variable, any constant, or a function or predicate expression, subject to the condition that the expression must not contain any instance of the variable being matched (the occurs check), since that would lead to infinite recursion.
iii) The substitution must be consistent: substituting y for x now and then z for x later is inconsistent. (A substitution of y for x is written y/x.)
The Unification algorithm is listed below as a procedure UNIFY (L1, L2). It returns a list representing the composition of the substitutions that were performed during the match. An
empty list NIL indicates that a match was found without any substitutions. If the list contains a single value F, it indicates that the unification procedure failed.
UNIFY(L1, L2)
1. If L1 or L2 are both variables or constants, then:
   (a) if L1 and L2 are identical, return NIL;
   (b) else if L1 is a variable, then if L1 occurs in L2 return F, else return (L2/L1);
   (c) else if L2 is a variable, then if L2 occurs in L1 return F, else return (L1/L2);
   (d) else return F.
2. If length(L1) is not equal to length(L2), return F.
3. Set SUBST to NIL. (At the end of this procedure, SUBST will contain all the substitutions used to unify L1 and L2.)
4. For i = 1 to the number of elements in L1:
   i) call UNIFY with the ith element of L1 and the ith element of L2, putting the result in S;
   ii) if S = F, return F;
   iii) if S is not equal to NIL, then:
        (a) apply S to the remainder of both L1 and L2;
        (b) SUBST := APPEND(S, SUBST).
5. Return SUBST.
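A Python rendering of the same procedure, using tuples for lists and "?"-prefixed strings for variables (these representation conventions are my own, not from the text):

```python
def is_variable(t):
    return isinstance(t, str) and t.startswith("?")

def substitute(t, subst):
    """Apply a substitution dict to a term or nested tuple of terms."""
    if is_variable(t):
        return subst.get(t, t)
    if isinstance(t, tuple):
        return tuple(substitute(x, subst) for x in t)
    return t

def occurs_in(var, t):
    return var == t or (isinstance(t, tuple) and any(occurs_in(var, x) for x in t))

def unify(l1, l2, subst=None):
    """Return a substitution dict unifying l1 and l2 (the empty dict is
    the pseudocode's NIL), or None on failure (the pseudocode's F)."""
    if subst is None:
        subst = {}
    l1, l2 = substitute(l1, subst), substitute(l2, subst)
    if l1 == l2:
        return subst
    if is_variable(l1):
        if occurs_in(l1, l2):
            return None                     # occurs check, rule (ii)
        return {**subst, l1: l2}
    if is_variable(l2):
        return unify(l2, l1, subst)
    if isinstance(l1, tuple) and isinstance(l2, tuple) and len(l1) == len(l2):
        for a, b in zip(l1, l2):            # match element pairs in turn
            subst = unify(a, b, subst)
            if subst is None:
                return None
        return subst
    return None

print(unify(("tryassassinate", "?x", "Caesar"),
            ("tryassassinate", "Marcus", "?y")))
# {'?x': 'Marcus', '?y': 'Caesar'}
```

Applying the accumulated substitution to both literals before each comparison is what keeps the substitution consistent, per rule (iii).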
3.4 STRUCTURED KNOWLEDGE REPRESENTATION
Strong Slot-and-Filler Structures: there are no hard and fast rules about what kinds of objects and links are good in general for knowledge representation. Conceptual dependency (CD), scripts and Cyc embody specific notions of what types of objects and relations are permitted.
Conceptual Dependency is a theory of how to represent the kind of knowledge about events that is usually contained in natural-language sentences.
The goal is to represent knowledge in a way that
- facilitates drawing inferences from the sentences, and
- is independent of the language in which the sentences were originally stated.
CD provides a structure and a specific set of conceptual primitives that can be combined to form the meanings of words in any particular language.
Ex.: "I gave the man a book" is represented as
I ⇐(p)⇒ ATRANS –O→ book, –R→ (man ← I)
where:
- arrows indicate the direction of dependency;
- the double arrow indicates a two-way link between actor and action;
- p indicates past tense;
- ATRANS is one of the primitive acts used by the theory; it indicates transfer of possession;
- O indicates the object case relation;
- R indicates the recipient case relation.
A set of primitive acts, from which actions are built:
ATRANS Transfer of an abstract relationship (e.g. Give)
PTRANS Transfer of the physical location of an object (e.g. go)
PROPEL Application of physical force to an object (e.g. push)
MOVE Movement of a body part by its owner (e.g. kick)
GRASP Grasping of an object by an actor (e.g. clutch)
INGEST ingestion of an object by an animal (e.g. eat)
EXPEL Expulsion of something from the body of an animal. (e.g. cry)
MTRANS Transfer of mental information (e.g. tell)
MBUILD Building new information out of old (e.g. decide)
SPEAK Production of sounds (e.g. say)
ATTEND Focusing of a sense organ toward a stimulus (e.g. listen)
A second set covers the allowable dependencies among the conceptualizations described in a sentence. The building blocks are:
ACTs – actions
PPs – objects (picture producers)
AAs – modifiers of actions (action aiders)
PAs – modifiers of PPs (picture aiders)
The set of conceptual tenses:
p – past                   ? – interrogative
f – future                 / – negative
t – transition             nil – present
ts – start transition      delta – timeless
tf – finished transition   c – conditional
k – continuing
(The CD diagrams for the examples below also use annotations such as D for direction, I for instrument, and cp for conscious processor, as well as state scales such as Health(−10).)
Examples of CD representations:
1) John ran.  John ⇐(p)⇒ PTRANS –O→ John
2) John is tall.  John ⇔ height (> average)
3) John is a doctor.  John ⇔ doctor
4) A nice boy.  boy ← nice
5) John's dog.  dog ← poss-by John
6) John pushed the car.  John ⇐(p)⇒ PROPEL –O→ car
7) John took the book from Mary.  John ⇐(p)⇒ ATRANS –O→ book, –R→ (John ← Mary)
   (compare "I gave the man a book", where the recipient relation runs the other way)
8) John ate ice cream with a spoon.  John ⇐(p)⇒ INGEST –O→ ice cream, –I→ (do with spoon)
9) John fertilized the field.  John ⇐(p)⇒ PTRANS –O→ fertilizer, –D→ field (from bag)
10) The plants grew.  plants ⇔ size (< x) → size (= x)
11) Bill shot Bob.  Bill ⇐(p)⇒ PROPEL –O→ bullet, –D→ Bob, –I→ gun; Bob: Health(−10)
12) John ran yesterday.  John ⇐(p)⇒ PTRANS, time: yesterday
13) While going home I saw a frog.  I ⇐(p)⇒ PTRANS –D→ home; I ⇐(p)⇒ MTRANS –O→ frog, –R→ cp (via eyes)
14) I heard a frog in the woods.  I ⇐(p)⇒ MTRANS –O→ frog, –R→ cp (via ears), location: woods
Ex.:
1) Ram ate the hot cake.  Ram ⇐(p)⇒ INGEST –O→ hot cake
2) Rajiv will go to Delhi.  Rajiv ⇐(f)⇒ PTRANS –D→ Delhi
3) Arjun saw Laxmi on the hill with a telescope.  Arjun ⇐(p)⇒ MTRANS –O→ Laxmi, –R→ cp (via eyes), –I→ (use telescope), location: hill
4) Shyam qualified the exam in the first class.  Shyam ⇐(p)⇒ PTRANS –O→ exam, –R→ first class
3.5 BACKWARD CHAINING, RESOLUTION
3.6 RESOLUTION
Resolution :
A proof procedure that carries out, in a single operation, the variety of processes involved in reasoning with statements in predicate logic. It produces proofs by refutation. Given any two clauses A and B, if there is a literal P1 in A which has a complementary literal P2 in B, delete P1 and P2 from A and B and construct the disjunction of the remaining clauses. The clause so constructed is called the resolvent of A and B. The intuition: precisely one of winter and ¬winter will be true at any point, so if winter is true, then cold must be true to guarantee the truth of a second clause ¬winter ∨ cold.
1) A: P ∨ Q ∨ R
   B: ¬P ∨ Q ∨ R
   C: ¬Q ∨ R
   Resolving A and B yields Q ∨ R; resolving this with C yields R.
2) A: P ∨ Q ∨ R
   B: ¬P ∨ R
   C: ¬Q
   D: ¬R
   Resolving A and B yields X: Q ∨ R; resolving X and C yields Y: R; resolving Y and D yields Z: nil, the empty clause, which completes the refutation.
Example: from winter ∨ summer and ¬winter ∨ cold, resolution yields the new clause summer ∨ cold.
Theorem proving: new clauses are inferred using resolution from old ones. Two methods: i) start with the given axioms and use the rules of inference to prove the result; ii) prove that the negation of the result cannot be true (refutation).
Given that:
a) physician(Bhaskar) .....(2)
b) ∀x: physician(x) → knows_surgery(x) .....(3)
prove knows_surgery(Bhaskar) .....(1).
Method 2 (refutation): assume ¬knows_surgery(Bhaskar). Equation (3) in clause form is ¬physician(x) ∨ knows_surgery(x) .....(4). Substituting x = Bhaskar and resolving with the assumption gives ¬physician(Bhaskar) .....(6), which contradicts (2). The assumption made is therefore false, and knows_surgery(Bhaskar) is proved.
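The refutation can be mechanized for the propositional case, with clauses as sets of literal strings and "~" marking negation (a sketch of propositional resolution only, without the unification needed for full predicate logic):

```python
from itertools import combinations

def resolve(c1, c2):
    """All resolvents of two clauses (frozensets of literal strings,
    where '~p' is the complement of 'p')."""
    out = []
    for lit in c1:
        comp = lit[1:] if lit.startswith("~") else "~" + lit
        if comp in c2:
            out.append(frozenset((c1 - {lit}) | (c2 - {comp})))
    return out

def refutes(clauses):
    """Saturate the clause set with resolvents; True iff the empty
    clause is derived, i.e. the set is unsatisfiable."""
    clauses = set(map(frozenset, clauses))
    while True:
        new = set()
        for c1, c2 in combinations(clauses, 2):
            for r in resolve(c1, c2):
                if not r:
                    return True             # empty clause: contradiction
                new.add(r)
        if new <= clauses:
            return False                    # nothing new: satisfiable
        clauses |= new

# The Bhaskar proof, propositionalized with x already bound to Bhaskar:
axioms = [{"physician"}, {"~physician", "knows_surgery"}]
print(refutes(axioms + [{"~knows_surgery"}]))  # True: the goal is proved
```

Adding the negated goal makes the set unsatisfiable, which is exactly what method (ii) above checks; without it, the axioms alone are satisfiable.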
3.7 SEMANTIC NETS
Semantic Networks.
A semantic network is a structure for representing knowledge as a pattern of interconnected nodes and arcs; it is also defined as a graphical representation of knowledge. The objects under consideration serve as nodes, and their relationships with other nodes give the arcs.
Nodes represent entities, attributes, states or events; arcs in the network give the relationships between the nodes; and labels on the arcs specify what type of relationship actually exists.
Semantic networks are weak slot-and-filler structures: knowledge is structured as a set of entities and their attributes. They are called "weak" because the structures themselves are "knowledge-poor", committing only to relations such as isa and instance.
Fig.: A semantic network. Question: what is the connection between the Brooklyn Dodgers and blue? Person –isa→ Mammal and –has-part→ Nose; Pee-Wee-Reese –instance→ Person and –team→ Brooklyn-Dodgers; Brooklyn-Dodgers –uniform-color→ Blue.
Fig.: Scooter –isa→ Two-wheeler –isa→ Motor-bike –isa→ Moving-vehicle; a moving vehicle –has→ Brakes and –has→ Engine; the engine –has→ Electrical-system and –has→ Fuel-system.
This kind of reasoning exploits one of the important advantages that slot-and-filler structures have over purely logical representations: their entity-based organization.
Further examples: "John gave the book to Mary"; "John is taller than Bill".
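The entity-based lookup can be mimicked with a dictionary-backed net and an isa-following retrieval (a toy sketch; the node and attribute names echo the Pee-Wee-Reese figure, and the encoding is my own):

```python
# Hypothetical encoding of the figure's net: one dict of slots per node.
net = {
    "Person":           {"isa": "Mammal", "has-part": "Nose"},
    "Pee-Wee-Reese":    {"isa": "Person", "team": "Brooklyn-Dodgers"},
    "Brooklyn-Dodgers": {"uniform-color": "Blue"},
}

def lookup(node, attr):
    """Return an attribute value, following isa links upward so that
    instances inherit the properties of their classes."""
    while node is not None:
        slots = net.get(node, {})
        if attr in slots:
            return slots[attr]
        node = slots.get("isa")
    return None

# The Brooklyn Dodgers/blue connection: follow team, then uniform-color.
print(lookup(lookup("Pee-Wee-Reese", "team"), "uniform-color"))  # Blue
print(lookup("Pee-Wee-Reese", "has-part"))  # Nose, inherited from Person
```

Answering the question takes two arc traversals starting from the Pee-Wee-Reese entity, which is the entity-based organization at work: everything known about a node is reachable from it directly.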
Partitioned Semantic Nets
"The dog bit the mail carrier": nodes d, b and m represent a particular dog, a particular biting and a particular mail carrier. This is a single net with no partitioning:
Fig.: d –isa→ Dogs, b –isa→ Bite, m –isa→ Mail-carrier; b –assailant→ d and b –victim→ m.
Fig.: "John gave the book to Mary": event EV7 is an instance of Give, with agent John, object BK23 (an instance of Book) and beneficiary Mary.
Fig.: "John is taller than Bill": John has height H1, Bill has height H2, and H1 is greater-than H2.
"Every dog has bitten a mail carrier" ..... (b)
∀x: Dog(x) → ∃y: Mail-carrier(y) ∧ Bite(x, y)
Node g stands for the assertion given above. g is an instance of the special class GS of general statements about the world (those with universal quantifiers). Every element of GS has two attributes: a form, which states the relation being asserted, and one or more ∀ connections, one for each universally quantified variable. For every dog d there exists a biting event b and a mail carrier m.
"Every dog in town has bitten the constable": here the constable node c lies outside the form space, i.e. outside the existential quantifier; thus it is not viewed as an existentially quantified variable whose value may depend on the value of d.
Fig.: Partitioned net for "Every dog has bitten a mail carrier": node g (isa GS) has a form arc into space S1, which contains d (isa Dogs), b (isa Bite) and m (isa Mail-carrier), with b –assailant→ d and b –victim→ m.
Fig.: Partitioned net for "Every dog in town has bitten the constable": d (isa Town-Dogs) and b (isa Bite) lie inside S1; the constable node c lies outside S1, in the surrounding space SA.
Every dog has bitten every mail carrier
Space S1 is included in SA
3.8 FRAMES
FRAMES: a means of representing commonsense knowledge. Knowledge is organized into small packets called frames; all the frames relevant to a given situation constitute a frame system. A frame can be defined as a data structure that has slots for various objects, and a collection of frames encodes the expectations for a given situation. Frames are used to represent two types of knowledge, declarative/factual and procedural.
Declarative and Procedural Frames: a frame that merely contains descriptions about objects is called a declarative-type (factual or situational) frame. Its structure is simply a frame name followed by slots, e.g.:
Name: Computer Centre
Slots: A/C, stationery cupboard, computers, dumb terminals, printer.
Frames which have procedural knowledge embedded in them are called action (procedure) frames. An action frame has the following slots:
- Actor slot: holds information about who is performing the activity.
- Object slot: holds information about the object to be acted upon.
- Source slot: holds information about where the action has to begin.
- Destination slot: holds information about the place where the action has to end.
- Task slot: generates the necessary subframes required to perform the operation.
Linking of procedural subframes:
Name: Cleaning the jet of the carburetor
  Actor: Expert    Object: Carburetor
  Source: Scooter  Destination: Scooter
  Task 1: Remove carburetor   Task 2: Clean nozzle   Task 3: Fix carburetor
Each task slot points to a subframe with the same slot structure, e.g.:
Name: Remove carburetor
  Actor: Expert    Object: Carburetor
  Source: Scooter  Destination: Scooter
  Tasks: further subtasks, until the tasks are primitive.
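The task-slot linking can be sketched as frames stored in a dictionary, where expanding an action frame recursively yields its primitive subtasks (the frame names follow the carburetor example, but the encoding and `expand` helper are my own illustration):

```python
# Hypothetical frame store: each frame is a dict of slots, and the
# "tasks" slot names the subframes it links to.
frames = {
    "clean-carburetor-jet": {
        "actor": "Expert", "object": "Carburetor",
        "source": "Scooter", "destination": "Scooter",
        "tasks": ["remove-carburetor", "clean-nozzle", "fix-carburetor"],
    },
    "remove-carburetor": {"actor": "Expert", "object": "Carburetor", "tasks": []},
    "clean-nozzle":      {"actor": "Expert", "object": "Nozzle", "tasks": []},
    "fix-carburetor":    {"actor": "Expert", "object": "Carburetor", "tasks": []},
}

def expand(name):
    """Depth-first expansion of an action frame into primitive tasks
    (frames whose own task slot is empty)."""
    subtasks = frames[name]["tasks"]
    if not subtasks:
        return [name]
    return [t for sub in subtasks for t in expand(sub)]

print(expand("clean-carburetor-jet"))
# ['remove-carburetor', 'clean-nozzle', 'fix-carburetor']
```

A deeper hierarchy would simply give the subframes their own non-empty task slots; `expand` flattens the whole procedure in execution order.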
3.9 SCRIPTS, ONTOLOGY.
Scripts : -
A mechanism for representing knowledge about common sequences of events.
A script is a structure that describes a stereotyped sequence of events in a particular context. It consists of slots, each of which may contain values or default values.
Components of a script
Entry conditions – conditions before the events described in the script can occur.
Result – conditions that will in general be true after the events described in the script have occurred.
Props - slots representing objects that are involved in the event described in the script.
Roles – Slots representing people who are involved in the events described in the script.
Track – The specific variation on a more general pattern that is represented by this particular script.
Scenes – The actual sequences of events that occur.
Pseudo form of a restaurant script
Script : Going to a restaurant
Props : Food
Tables
Menu
Money
Roles : Owner
Customer
Waiter
Cashier
Scene1 : Entering the restaurant.
Customer enters the restaurant,
scans the tables and chooses the best one,
decides to sit there,
goes there,
and occupies the seat.
Entry conditions
Customer is hungry
Customer has money
Owner has food.
Scene 2: Ordering the food.
Customer asks for menu.
Waiter brings it.
Customer glances it.
Chooses what to eat.
Orders that item.
Results :
Customer is not hungry.
Owner has more money.
Customer has less money.
Owner has less food.
Scene 3 :
Eating the food.
Waiter brings the food.
Customer eats it.
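The pseudo-form above can be captured as a simple data structure. A minimal sketch (the dictionary layout is invented for illustration; the contents come from the script above):

```python
# Restaurant script as a dictionary: props, roles, entry conditions,
# scenes (stereotyped event sequences) and results.
restaurant_script = {
    "name": "Going to a restaurant",
    "props": ["Food", "Tables", "Menu", "Money"],
    "roles": ["Owner", "Customer", "Waiter", "Cashier"],
    "entry_conditions": ["Customer is hungry", "Customer has money", "Owner has food"],
    "scenes": {
        "Entering": ["Customer enters", "scans tables", "chooses one", "sits down"],
        "Ordering": ["Customer asks for menu", "Waiter brings it", "Customer orders"],
        "Eating":   ["Waiter brings the food", "Customer eats it"],
    },
    "results": ["Customer is not hungry", "Owner has more money",
                "Customer has less money", "Owner has less food"],
}

# A script supports default inference: mentioning one scene lets us
# assume the surrounding events by default.
print(len(restaurant_script["scenes"]))  # 3
```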
UNIT IV
CO.4 Demonstrate working knowledge of reasoning in the presence of incomplete and/or uncertain information by applying Bayesian Networks and Fuzzy Logic.
4.1 HANDLING UNCERTAIN KNOWLEDGE
Uncertainty
Definition: Uncertainty means that many of the simplifications that are possible with deductive inference are no longer valid.
Why does uncertainty arise? Agents almost never have access to the whole truth about their environment, and cannot always find a categorical answer. Uncertainty can also arise from incompleteness or incorrectness in the agent's understanding of the properties of the environment.
To act rationally under uncertainty we must be able to evaluate how likely certain things are. With FOL a fact F is only useful if it is known to be true or false. But we need to be able to evaluate how likely it is that F is true. By weighing likelihoods of events (probabilities) we can develop mechanisms for acting
rationally under uncertainty.
4.2 RATIONAL DECISIONS, BASICS OF PROBABILITY
Probabilistic reasoning
Using logic to represent and reason, we can represent knowledge about the world with facts and rules, like the following ones:
bird(tweety). fly(X) :- bird(X).
We can also use a theorem-prover to reason about the world and deduct new facts about the world, for e.g.,
?- fly(tweety). Yes
However, this often does not work outside of toy domains - non-tautologous certain rules are hard to find.
A way to handle knowledge representation in real problems is to extend logic by using certainty factors.
In other words, replace IF condition THEN fact with IF condition with certainty x THEN fact with certainty f(x)
Unfortunately cannot really adapt logical inference to probabilistic inference, since the latter is not context-free.
Replacing rules with conditional probabilities makes inferencing simpler.
Replace rules like "smoking -> lung cancer" or "lots-of-conditions, smoking -> lung cancer" with P(lung cancer | smoking) = 0.6
Uncertainty is represented explicitly and quantitatively within probability theory, a formalism that has been developed over centuries.
A probabilistic model describes the world in terms of a set S of possible states, the sample space. We don't know the true state of the world, so we (somehow) come up with a probability distribution over S which gives the probability of any state being the true one. The world is usually described by a set of variables or attributes.
Default Reasoning & the closed world assumption
-- Uncertainty as a result of incomplete knowledge.
-- Plausible default assumptions: told that Pat is a 20-year-old, you would normally assume, by default, that Pat is in good health. On learning that Pat has suffered from blackouts, you will be forced to revise your beliefs.
A default rule is expressed as

  a(x) : M b1(x), ..., M bn(x)
  ----------------------------
             c(x)

where a(x) is the precondition wff for the conclusion wff c(x), M is the consistency operator, and the bi(x) are conditions each of which must be separately consistent with the KB for the conclusion c(x) to hold.
Symbolic Reasoning under Uncertainty
Introduction to Non-monotonic Reasoning
The ABC murder story: let Abbott, Babbitt and Cabot be the suspects in a murder case.
- Uncertain, fuzzy and often changing knowledge.
- Non-monotonic reasoning :- the axioms and/or the rules of inference are extended to make it possible to reason with incomplete information. These systems preserve, however, the property that at any given moment a statement is either believed to be true, believed to be false, or not believed to be either.
- Statistical reasoning :- the representation is allowed to have a numeric measure of certainty.
Abbot has an alibi, in the register of a respected hotel.
Babbit has a alibi, for his brother in law testified that babbit was visiting him in Brooklyn at the
time.
Cabot pleads alibi too, claiming to have been watching a ski meet in the cat skills.
Belief
1) The Abbott did not commit the crime.
2) The Babbitt did not.
3) That Abbott or Babbitt or Cabot did.
Goodluck. Cabot documents his alibi caught by television in the sidelines at the skimees.
A new belief thrust upon us.
4) That cabot did not.
- Technique for maintaining several parallel belief spaces.
- Conventional reasoning systems (first-order logic) are designed to work with information that has three important properties:
  i) It is complete with respect to the domain of interest.
  ii) It is consistent.
  iii) New facts can be added as they become available without invalidating old conclusions (monotonicity).
Non-monotonic reasoning systems, on the other hand, are designed to be able to solve problems in which all of these properties may be missing.
If there is no reason to suspect someone of the crime, then assume he did not commit it. Make clear the distinction between:
  "It is known that ¬P" and
  "It is not known whether P."
Predicate logic can express the first; a non-monotonic system can express the second as well. Any inference that depends on the lack of some piece of knowledge is a non-monotonic inference.
- Inferences based on lack of knowledge may have to be retracted as new assertions are made.
- Such inferences are defeasible: a non-monotonic inference may be defeated (rendered invalid) by new information.
Monotonicity says that if T entails W, then T combined with new knowledge N also entails W; non-monotonic reasoning does not share this property.
- How can a knowledge base be updated properly when new facts arrive?
- Valid sets of justifications: Abbott is in town this week and so is available to testify, but if we wait until next week he may be out of town. Techniques are needed for maintaining valid sets of justifications.
- How can knowledge be used to help resolve conflicts when there are several inconsistent non-monotonic inferences that could be drawn? There may be contradictions to resolve: belief sets that are locally consistent but globally inconsistent, with no option to believe all of them at once.
Statistical Reasoning
- Some problems involve genuine randomness in the world (e.g. playing cards): we can characterize the likelihood of various outcomes and exploit it.
- A second class of problems involves no genuine randomness: the world behaves normally unless there is some kind of exception. This covers many common-sense tasks, for which statistical measures function as summaries of the world: a numerical summary tells us how often an exception of some sort can be expected to occur.
* Probability & Bayes' Theorem
An agent should collect evidence and modify its behavior on the basis of that evidence. Bayesian statistics is a statistical theory of evidence. The conditional probability P(H|E) is the probability of hypothesis H given that evidence E has been observed. To compute it we require the prior probability of H and the extent to which E provides evidence for H.
Define the universe of hypotheses, and let
  P(Hi|E) - probability that hypothesis Hi is true given evidence E
  P(E|Hi) - probability that we will observe evidence E given that hypothesis Hi is true
  P(Hi)   - a priori probability that hypothesis Hi is true in the absence of any specific evidence
  k       - number of possible hypotheses.
Bayes' theorem then states that

  P(Hi|E) = P(E|Hi) . P(Hi) / Σ(n=1..k) P(E|Hn) . P(Hn)
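A numerical sketch of the theorem (the two hypotheses and all probability values below are invented for illustration, loosely following the measles/spots example later in this section):

```python
def bayes(priors, likelihoods, i):
    """P(Hi|E) = P(E|Hi) P(Hi) / sum_n P(E|Hn) P(Hn)."""
    evidence = sum(p * l for p, l in zip(priors, likelihoods))
    return likelihoods[i] * priors[i] / evidence

# Two exhaustive hypotheses: H0 = "measles", H1 = "no measles".
priors = [0.1, 0.9]        # P(Hi), assumed for illustration
likelihoods = [0.8, 0.05]  # P(spots | Hi), assumed for illustration

posterior = bayes(priors, likelihoods, 0)  # P(measles | spots)
print(round(posterior, 2))  # 0.64
```

Note how a rare hypothesis (prior 0.1) still dominates once the evidence is sixteen times more likely under it; the denominator normalizes over all k hypotheses.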
* Example: examining the geological evidence at a particular location to determine whether it would be a good place to dig for a desired mineral (e.g. copper or uranium).
* Medical diagnosis problem.
* Medical diagnosis problem.
S : Patient has spots.
M : Patient has measles.
F : Patient has fever.
We must also handle the conditional probabilities that arise from conjunctions of evidence. Given a prior body of evidence e and some new observation E, we need to compute

  P(H|E,e) = P(H|E) . P(e|E,H) / P(e|E)

In practice, Bayes' theorem is intractable for several reasons:
- The knowledge-acquisition problem is insurmountable: too many probabilities have to be provided, and there is substantial empirical evidence that people are poor probability estimators.
- The space required to store all the probabilities is too large.
- The time required to compute the probabilities is too large.
Despite these problems, Bayes' theorem is an attractive basis for an uncertain-reasoning system.
* Certainty Factors and Rule-Based Systems
MYCIN attempts to recommend appropriate therapies for patients with bacterial infections. A certainty factor is attached to the fact in each rule's consequent.
Rule:
If
  i) the stain of the organism is gram-positive, and
  ii) the morphology is coccus, and
  iii) the growth conformation is clumps,
Then
  there is suggestive evidence that the identity of the organism is staphylococcus.
A certainty factor (CF) measures the extent to which the evidence supports or denies a hypothesis; a CF of 0 means the evidence neither supports nor denies it.
- Certainty factors make strong independence assumptions, which make them relatively easy to use.
- These assumptions create dangers if rules are not written carefully. Consider:
S : Sprinkler was on last night
W : Grass is wet
R : It rained last night
Rules :-
If the sprinkler was on last night, then there is suggestive evidence (0.8) that the grass will be wet this morning.
If the grass is wet this morning, then there is suggestive evidence (0.9) that it rained last night.
Chaining the two rules gives CF 0.8 x 0.9 = 0.72 that it rained: we come to believe that it rained because we believe the sprinkler was on. This danger arises whenever the justifications of a belief are important in determining its consequences; here we need to know why we believe the grass is wet.
Bayesian Networks :-
Certainty factors were a mechanism for reducing the complexity of a Bayesian reasoning system; Bayesian networks instead preserve the probabilistic formalism and rely on the modularity of the world being described. They are related to constraint networks, which are ways of representing knowledge as sets of constraints.
There are two ways propositions can influence the likelihood of each other:
i) Causes influence the likelihood of their symptoms.
ii) Observing a symptom affects the likelihood of all of its possible causes.
A Bayesian network makes a clear distinction between these two kinds of influence, representing causality uniformly. The graph contains an additional node corresponding to the propositional variable that tells us whether it is currently the rainy season.
- More information is needed to use the network for probabilistic reasoning: probability tables are provided. For example, the probability of rain on a given night is 0.9 in the rainy season and 0.1 otherwise.
- We also need a mechanism for computing the influence of any arbitrary node on any other.
Fig: conditional probabilities for a Bayesian network
  P(Wet | Sprinkler, Rain)   = 0.95
  P(Wet | Sprinkler, ¬Rain)  = 0.9
  P(Wet | ¬Sprinkler, Rain)  = 0.8
  P(Wet | ¬Sprinkler, ¬Rain) = 0.1
4.3 AXIOMS OF PROBABILITY
Review of probability
Given a set U (the universe), a probability function is a function defined over the subsets of U that maps each subset to the real numbers and that satisfies the Axioms of Probability:
  1. Pr(U) = 1
  2. Pr(A) ∈ [0,1]
  3. Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B)
Note that if A ∩ B = {} then Pr(A ∪ B) = Pr(A) + Pr(B).
[Figure: two Bayesian networks over the nodes Rainy season, Rain, Sprinkler and Wet; Sprinkler and Rain are parents of Wet, and Rainy season is a parent of Rain.]
The primitives in probabilistic reasoning are random variables, just as the primitives in propositional logic are propositions. A random variable is not in fact a variable, but a function from a sample space S to another space, often the real numbers. For example, let the random variable Sum (representing the outcome of two die throws) be defined thus: Sum(die1, die2) = die1 + die2
Each random variable has an associated probability distribution determined by the underlying distribution on the sample space
Continuing our example: P(Sum = 2) = 1/36, P(Sum = 3) = 2/36, . . . , P(Sum = 12) = 1/36
Consider the probabilistic model of the fictitious medical expert system mentioned before. The sample space is described by 8 binary-valued variables:
  A : Visit to Asia?
  T : Tuberculosis?
  E : Either tuberculosis or lung cancer?
  L : Lung cancer?
  S : Smoking?
  B : Bronchitis?
  D : Dyspnoea?
  X : Positive X-ray?
There are 2^8 = 256 events in the sample space. Each event is determined by a joint instantiation of all of the variables:
  S = {(A = f, T = f, E = f, L = f, S = f, B = f, D = f, X = f),
       (A = f, T = f, E = f, L = f, S = f, B = f, D = f, X = t),
       ...,
       (A = t, T = t, E = t, L = t, S = t, B = t, D = t, X = t)}
Since S is defined in terms of joint instantiations, any distribution defined on it is called a joint distribution. All underlying distributions will be joint distributions in this module. The variables {A, T, E, L, S, B, D, X} are in fact random variables, which "project" values:
  L(A = f, T = f, E = f, L = f, S = f, B = f, D = f, X = f) = f
  L(A = f, T = f, E = f, L = f, S = f, B = f, D = f, X = t) = f
  L(A = t, T = t, E = t, L = t, S = t, B = t, D = t, X = t) = t
Each of the random variables {A, T, E, L, S, B, D, X} has its own distribution, determined by the underlying joint distribution. This is known as the marginal distribution. For example, the distribution for L is denoted P(L), and this distribution is defined by the two probabilities P(L = f) and P(L = t). For example,
P(L = f) = P(A = f, T = f,E = f,L = f,S = f,B = f,D = f,X = f) + P(A = f, T = f,E = f,L = f,S = f,B = f,D = f,X = t) + P(A = f, T = f,E = f,L = f, S = f,B = f,D = t,X = f) . . . P (A = t, T = t,E = t,L = f, S = t,B = t,D = t,X = t)
P (L) is an example of a marginal distribution.
Here is a joint distribution over two binary-valued variables A and B.
We get the marginal distribution over B by simply adding up the different possible values of A for any value of B (and put the result in the “margin”).
In general, given a joint distribution over a set of variables, we can get the marginal distribution over a subset by simply summing out those variables not in the subset.
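Summing out works mechanically. A tiny illustration with made-up numbers for two binary variables A and B:

```python
# Joint distribution over two binary variables A and B
# (the four probabilities are invented for illustration; they sum to 1).
joint = {
    (0, 0): 0.2, (0, 1): 0.3,   # (A, B) -> P(A, B)
    (1, 0): 0.1, (1, 1): 0.4,
}

# Marginal over B: sum out A ("put the result in the margin").
p_b = {b: sum(joint[(a, b)] for a in (0, 1)) for b in (0, 1)}
print(round(p_b[0], 2), round(p_b[1], 2))  # 0.3 0.7
```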
In the medical expert system case, we can get the marginal distribution over, say, A, D by simply summing out the other variables:
However, computing marginals is not always an easy task. For example,
P(A = t, D = f) = P(A = t, T = f,E = f,L = f,S = f,B = f,D = f,X = f) + P(A = t, T = f,E = f,L = f,S = f,B = f,D = f,X = t) + P(A = t, T = f,E = f,L = f, S = f,B = t,D = f,X = f) + P(A = t, T = f,E = f,L = f, S = f,B = t,D = f,X = t) . . . P(A = t, T = t,E = t,L = t, S = t,B = t,D = f,X = t)
This has 64 summands! Each of whose value needs to be estimated from empirical data. For the estimates to be of good quality, each of the instances that appear in the summands should appear sufficiently large number of times in the empirical data. Often such a large amount of data is not available.
However, computation can be simplified for certain special but common conditions. This is the condition of independence of variables.
Two random variables A and B are independent iff
P(A,B) = P(A)P(B)
i.e. can get the joint from the marginals
This is quite a strong statement: It means for any value x of A and any value y of B
P(A = x, B = y) = P(A = x)P(B = y)
Note that the independence of two random variables is a property of the underlying probability distribution.
Conditional probability is defined as

  P(A|B) = P(A,B) / P(B)

i.e. for any value x of A and any value y of B,

  P(A = x | B = y) = P(A = x, B = y) / P(B = y)

If A and B are independent then P(A|B) = P(A).
Conditional probabilities can represent causal relationships in both directions: from cause to (probable) effect, e.g. P(symptom | disease), and from effect to (probable) cause, e.g. P(disease | symptom).
4.4 BAYES' RULE AND CONDITIONAL INDEPENDENCE, BAYESIAN NETWORKS
Bayesian Networks
Representation and Syntax
Bayes nets (BN) (also referred to as Probabilistic Graphical Models and Bayesian Belief Networks) are directed acyclic graphs (DAGs) where each node represents a random variable. The intuitive meaning of an arrow from a parent to a child is that the parent directly influences the child. These influences are quantified by conditional probabilities.
BNs are graphical representations of joint distributions. The BN for the medical expert system mentioned previously represents a joint distribution over 8 binary random variables {A,T,E,L,S,B,D,X}.
Fig.4.1 Bayesian Network for Medical Expert System
Conditional Probability Tables
Each node in a Bayesian net has an associated conditional probability table or CPT. (Assume
all random variables have only a finite number of possible values). This gives the probability values for the random variable at the node conditional on values for its parents. Here is a part of one of the CPTs from the medical expert system network.
If a node has no parents, then the CPT reduces to a table giving the marginal distribution on that random variable.
Consider another example, in which all nodes are binary, i.e., have two possible values, which we will denote by T (true) and F (false).
Fig.4.2: Conditional Probability Tables
We see that the event "grass is wet" (W=true) has two possible causes: either the water
sprinkler is on (S=true) or it is raining (R=true). The strength of this relationship is shown in the table. For example, we see that Pr(W=true | S=true, R=false) = 0.9 (second row), and hence, Pr(W=false | S=true, R=false) = 1 - 0.9 = 0.1, since each row must sum to one. Since the C node has no parents, its CPT specifies the prior probability that it is cloudy (in this case, 0.5). (Think of C as representing the season: if it is a cloudy season, it is less likely that the sprinkler is on and more likely that the rain is on.)
Semantics of Bayesian Networks
The simplest conditional independence relationship encoded in a Bayesian network can be
stated as follows: a node is independent of its ancestors given its parents, where the ancestor/parent relationship is with respect to some fixed topological ordering of the nodes. In the sprinkler example above, by the chain rule of probability, the joint probability of all the nodes in the graph above is,
P(C, S, R, W) = P(C) * P (S|C) * P(R|C,S) * P(W|C,S,R)
By using conditional independence relationships, we can rewrite this as
P(C, S, R, W) = P(C) * P(S|C) * P(R|C) * P(W|S,R)
Where we were allowed to simplify the third term because R is independent of S given its parent C, and the last term because W is independent of C given its parents S and R. We can see that the conditional independence relationships allow us to represent the joint more compactly. Here the savings are minimal, but in general, if we had n binary nodes, the full joint would require O(2^n) space to represent, but the factored form would require O(n 2^k) space to represent, where k is the maximum fan-in of a node. And fewer parameters make learning easier.
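The factorization P(C,S,R,W) = P(C) P(S|C) P(R|C) P(W|S,R) can be checked numerically. The CPT numbers below are illustrative (chosen to match the style of the sprinkler example; they are not taken from a real model):

```python
# CPTs for the sprinkler network (illustrative numbers).
P_C = {True: 0.5, False: 0.5}                    # P(C)
P_S = {True: {True: 0.1, False: 0.5}}            # P(S=T | C)
P_R = {True: {True: 0.8, False: 0.2}}            # P(R=T | C)
P_W = {(True, True): 0.99, (True, False): 0.9,   # P(W=T | S, R)
       (False, True): 0.9, (False, False): 0.0}

def p_joint(c, s, r, w):
    """Joint probability via the factored form P(C)P(S|C)P(R|C)P(W|S,R)."""
    ps = P_S[True][c] if s else 1 - P_S[True][c]
    pr = P_R[True][c] if r else 1 - P_R[True][c]
    pw = P_W[(s, r)] if w else 1 - P_W[(s, r)]
    return P_C[c] * ps * pr * pw

# The factored joint sums to 1 over all 16 assignments.
total = sum(p_joint(c, s, r, w)
            for c in (True, False) for s in (True, False)
            for r in (True, False) for w in (True, False))
print(round(total, 6))  # 1.0
```

Note the parameter count: 1 + 2 + 2 + 4 = 9 numbers instead of the 2^4 − 1 = 15 an unstructured joint table would need.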
The intuitive meaning of an arrow from a parent to a child is that the parent directly influences the child. The direction of this influence is often taken to represent causal influence. The conditional probabilities give the strength of causal influence. A 0 or 1 in a CPT represents a deterministic influence.
Decomposing Joint Distributions
A joint distribution can always be broken down into a product of conditional probabilities using repeated applications of the product rule:

  P(X1, X2, ..., Xn) = P(X1) P(X2|X1) P(X3|X1,X2) ... P(Xn|X1,...,Xn-1)

We can order the variables however we like.
Conditional Independence in Bayes Nets
A Bayes net represents the assumption that each node is conditionally independent of all its non-descendants given its parents. So, for example,
Fig.4.3: Example for Conditional Bayes Net
Note that a node is NOT in general independent of its descendants given its parents.
Variable Ordering in Bayes Nets
The conditional independence assumptions expressed by a Bayes net allow a compact
representation of the joint distribution. First note that the Bayes net imposes a partial order on nodes: X <= Y iff X is a descendant of Y. We can always break down the joint so that the conditional probability factor for a node only has non-descendants in the condition.
Fig.4.4: Variable Ordering in Bayes Net
The Joint Distribution as a Product of CPTs
Because each node is conditionally independent of all its non-descendants given its parents,
and because we can write the joint appropriately we have:
So the CPTs determine the full joint distribution. In short,
Bayesian networks allow a compact representation of probability distributions. An unstructured table representation of the "medical expert system" joint would require 2^8 − 1 = 255 numbers. With the structure imposed by the conditional independence assumptions, this reduces to 18 numbers. Structure also allows efficient inference, of which more later.
Conditional Independence and d-separation in a Bayesian Network
We can have conditional independence relations between sets of random variables. In the Medical Expert System Bayesian net, {X, D} is independent of {A, T, L, S} given {E, B}, which means:
  P(X, D | E, B) = P(X, D | E, B, A, T, L, S)
or equivalently
  P(X, D, A, T, L, S | E, B) = P(A, T, L, S | E, B) P(X, D | E, B)
We need a way of checking for these conditional independence relations
Conditional independence can be checked using the d-separation property of the Bayes net directed acyclic graph. d-separation is short for direction-dependent separation.
Fig.4.5: Conditional Independence and d-separation in a Bayesian Network
If E d-separates X and Y then X and Y are conditionally independent given E.
E d-separates X and Y if every undirected path from a node in X to a node in Y is blocked given E.
Defining d-separation:
A path is blocked given a set of nodes E if there is a node Z on the path for which one of these three conditions holds:
1. Z is in E and Z has one arrow on the path coming in and one arrow going out. 2. Z is in E and Z has both path arrows leading out. 3. Neither Z nor any descendant of Z is in E, and both path arrows lead in to Z.
Building a Bayes Net: The Family Out? Example
We start with a natural language description of the situation to be modeled:
I want to know if my family is at home as I approach the house. Often my wife leaves the light on if she goes out, but she also sometimes leaves it on if she is expecting a guest. When nobody is home the dog is put in the backyard, but he is also put there when he has bowel trouble. If the dog is in the backyard, I will hear him barking, but I may be confused if some other dog is barking.
Building the Bayes net involves the following steps.
We build a Bayes net to get probabilities concerning what we don't know, given what we do know. What we don't know is not observable; such events are called hypothesis events, and we need to identify the hypothesis events in the problem.
Recall that a Bayesian network is composed of related (random) variables, and that a variable incorporates an exhaustive set of mutually exclusive events - one of its events is true. How shall we represent two hypothesis events in a problem?
Variables whose values are observable and which are relevant to the hypothesis events are called information variables. What are the information variables in a problem?
In this problem we have three variables, what is the causal structure between them? Actually, the whole notion of ‘cause’ let alone ‘determining causal structure’ is very controversial. Often (but not always) your intuitive notion of causality will help you.
Sometimes we need mediating variables which are neither information variables nor hypothesis variables to represent causal structures.
Learning of Bayesian Network Parameters
One needs to specify two things to describe a BN: the graph topology (structure) and the parameters of each CPT. It is possible to learn both of these from data. However, learning structure is much harder than learning parameters. Also, learning when some of the nodes are hidden, or when we have missing data, is much harder than when everything is observed.
This gives rise to four cases: known structure with full observability, known structure with partial observability, unknown structure with full observability, and unknown structure with partial observability. We discuss only the first case.
Known structure, full observability
We assume that the goal of learning in this case is to find the values of the parameters of each CPT which maximize the likelihood of the training data, which contains N cases (assumed to be independent). The normalized log-likelihood of the training set D is a sum of terms, one for each node.
We see that the log-likelihood scoring function decomposes according to the structure of the graph, and hence we can maximize the contribution to the log-likelihood of each node independently (assuming the parameters in each node are independent of the other nodes). In cases where N is small compared to the number of parameters that require fitting, we can use a numerical prior to regularize problem. In this case, we call the estimates Maximum A Posterori (MAP) estimates, as opposed to Maximum Likelihood (ML) estimates.
Consider estimating the Conditional Probability Table for the W node. If we have a set of training data, we can just count the number of times the grass is wet when it is raining and the sprinkler is on, N(W=1,S=1,R=1), the number of times the grass is wet when it is raining and the sprinkler is off, N(W=1,S=0,R=1), etc. Given these counts (which are the sufficient statistics), we can find the Maximum Likelihood Estimate of the CPT as follows:

  P̂(W=w | S=s, R=r) = N(W=w, S=s, R=r) / N(S=s, R=r)
where the denominator is N(S=s,R=r) = N(W=0,S=s,R=r) + N(W=1,S=s,R=r). Thus "learning" just amounts to counting (in the case of multinomial distributions). For Gaussian nodes, we can compute the sample mean and variance, and use linear regression to estimate the weight matrix. For other kinds of distributions, more complex procedures are necessary.
As is well known from the HMM literature, ML estimates of CPTs are prone to sparse data problems, which can be solved by using (mixtures of) Dirichlet priors (pseudo counts). This results in a Maximum A Posteriori (MAP) estimate. For Gaussians, we can use a Wishart prior, etc.
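The "learning amounts to counting" point can be sketched in a few lines (the training cases below are invented toy data):

```python
from collections import Counter

# Toy fully observed training cases as (W, S, R) tuples (invented data).
data = [(1, 1, 1), (1, 1, 1), (0, 1, 1), (1, 0, 1),
        (0, 0, 1), (1, 1, 0), (0, 0, 0), (0, 0, 0)]

joint = Counter(data)                          # N(W=w, S=s, R=r)
parent = Counter((s, r) for _, s, r in data)   # N(S=s, R=r)

def mle_cpt(w, s, r):
    """ML estimate P^(W=w | S=s, R=r) = N(w,s,r) / N(s,r)."""
    return joint[(w, s, r)] / parent[(s, r)]

# 2 of the 3 cases with S=1, R=1 have wet grass.
print(round(mle_cpt(1, 1, 1), 3))  # 0.667
```

With sparse data some counts are zero, which is exactly the problem the Dirichlet-prior (pseudo-count) MAP estimate mentioned above addresses.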
4.5 EXACT AND APPROXIMATE INFERENCE IN BAYESIAN NETWORKS
Inferencing in Bayesian Networks
Exact Inference
The basic inference problem in BNs is described as follows. Given:
1. A Bayesian network BN
2. Evidence e, an instantiation of some of the variables in BN (e can be empty)
3. A query variable Q
compute P(Q|e), the (marginal) conditional distribution over Q. In other words: given what we do know, compute a distribution over what we do not.
Four categories of inferencing tasks are usually encountered (illustrated with the classic burglary-alarm network):
1. Diagnostic inferences (from effects to causes): given that John calls, what is the probability of burglary? i.e. find P(B|J).
2. Causal inferences (from causes to effects): given burglary, what is the probability that John calls, P(J|B), or that Mary calls, P(M|B)?
3. Intercausal inferences (between causes of a common event): given the alarm, what is the probability of burglary, P(B|A)? Now given also an earthquake, what is the probability of burglary, P(B|A,E)?
4. Mixed inferences (some causes and some effects known): given that John calls and there is no earthquake, what is the probability of the alarm, P(A|J,¬E)?
We will demonstrate below the inferencing procedure for BNs. As an example, consider the following linear BN (a chain A -> B -> C) without any a priori evidence.
Consider computing all the marginals (with no evidence). P(A) is given, and

  P(B) = Σa P(B | A = a) P(A = a)

We don't need any conditional independence assumption for this. For example, suppose A and B are binary; then

  P(B = t) = P(B = t | A = t) P(A = t) + P(B = t | A = f) P(A = f)

Continuing down the chain,

  P(C) = Σb P(C | B = b) P(B = b)
P(B) (the marginal distribution over B) was not given originally. . . but we just computed it in the last step, so we’re OK (assuming we remembered to store P(B) somewhere).
If C were not independent of A given B, we would have a CPT for P(C|A,B), not P(C|B). Note that we had to wait for P(B) before P(C) was calculable.
If each node has k values and the chain has n nodes, this algorithm has complexity O(n k^2). Summing over the full joint has complexity O(k^n).
Complexity can be reduced by more efficient summation by “pushing sums into products”.
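The chain computation above can be sketched directly (all CPT numbers invented for illustration):

```python
# Chain Bayesian network A -> B -> C, all nodes binary (illustrative CPTs).
P_A = [0.6, 0.4]             # P(A=0), P(A=1)
P_B_given_A = [[0.7, 0.3],   # row a: P(B=0|A=a), P(B=1|A=a)
               [0.2, 0.8]]
P_C_given_B = [[0.9, 0.1],
               [0.5, 0.5]]

def marginal(prior, cpt):
    """P(child=c) = sum_p P(child=c | parent=p) P(parent=p)."""
    return [sum(cpt[p][c] * prior[p] for p in range(2)) for c in range(2)]

P_B = marginal(P_A, P_B_given_A)   # computed once, then stored
P_C = marginal(P_B, P_C_given_B)   # reuses P_B, as in the text
print([round(x, 3) for x in P_B], [round(x, 3) for x in P_C])
```

Each `marginal` call costs O(k^2) work per node, which is where the O(n k^2) total for the chain comes from; it is a minimal instance of "pushing sums into products".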
Approximate Inferencing in Bayesian Networks
Many real models of interest have a large number of nodes, which makes exact inference very slow (exact inference is NP-hard in the worst case). We must therefore resort to approximation techniques. Unfortunately, approximate inference is #P-hard, but we can nonetheless come up with approximations which often work well in practice. Below is a list of the major techniques.
Variational methods. The simplest example is the mean-field approximation, which exploits the law of large numbers to approximate large sums of random variables by their means. In particular, we essentially decouple all the nodes, and introduce a new parameter, called a variational parameter, for each node, and iteratively update these parameters so as to minimize the cross-entropy (KL distance) between the approximate and true probability distributions. Updating the variational parameters becomes a proxy for inference. The mean-field approximation produces a lower bound on the likelihood. More sophisticated methods are possible, which give tighter lower (and upper) bounds.
Sampling (Monte Carlo) methods. The simplest kind is importance sampling, where we draw random samples x from P(X), the (unconditional) distribution on the hidden variables, and then weight the samples by their likelihood, P(y|x), where y is the evidence. A more efficient approach in high dimensions is Markov Chain Monte Carlo (MCMC), which includes as special cases Gibbs sampling and the Metropolis-Hastings algorithm.
Bounded cutset conditioning. By instantiating subsets of the variables, we can break loops in the graph. Unfortunately, when the cutset is large, this is very slow. By instantiating only a subset of values of the cutset, we can compute lower bounds on the probabilities of interest. Alternatively, we can sample the cutsets jointly, a technique known as block Gibbs sampling.
Parametric approximation methods. These express the intermediate summands in a simpler form, e.g., by approximating them as a product of smaller factors. "Minibuckets" and the Boyen-Koller algorithm fall into this category.
4.6 FUZZY LOGIC
Fuzzy…
• We start describing things in a slightly vague and fuzzy manner.
• For instance, after seeing a long list of names, you tell your friend that his name was cited somewhere near the middle of the list.
• The word near seems to be comprehended effortlessly by the human brain but what of computing systems?
• What does near mean in this context?
• Two names below the middle or five above or...?
• Is there a way we can make these number crunching systems understand this concept?
“Drive slowly”
• Does it mean you should drive at 10, 20 or 20.5 km/hour?
• The answer could be any value or a very different one depending on the context.
• If it is ascertained in a machine that any speed less than or equal to 20km/ hour means slow speed and anything above is fast, then does it mean that 20.1 km/hour (or 20.01 km/hour for that matter) is fast?
• Such a sharp boundary is absurd in the real world.
Well then, what’s fuzzy logic?
• Fuzzy logic deals with how we can capture this essence of comprehension and embed it in a system by allowing for a gradual transition from slow to high speeds.
• This comprehension, as per Lotfi Zadeh, the founder of the fuzzy logic concept, confers a higher machine intelligence quotient to computing systems.
Crisp Set
• The conventional machine uses crisp sets to take care of concepts like fast and slow speeds.
• It relates speed to crisp values thereby forming members that either belong to a group or do not belong to it.
• For example
Slow = {0,5,10,15,20,25,30,35,40} could mean a crisp set that says that when the value of speed is equal to either of those belonging to the set then the speed is categorized as slow.
Problem in Crisp Sets
• The vehicle would continuously keep jerking if the speed oscillates in the interval [39,41].
• Situation could eventually cause harm and subsequent damage.
• Requirement: An alternative to a crisp set definition of speed.
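A graded alternative can be sketched as a membership function for "slow" (the breakpoints 30 and 50 km/h are invented for illustration):

```python
def slow_membership(speed):
    """Graded membership in 'slow': 1 up to 30 km/h, falling
    linearly to 0 at 50 km/h (breakpoints assumed for illustration)."""
    if speed <= 30:
        return 1.0
    if speed >= 50:
        return 0.0
    return (50 - speed) / 20

print(slow_membership(20), slow_membership(40), slow_membership(49))
```

A controller driven by this function changes its output smoothly as the speed crosses 40 km/h, instead of flipping between "slow" and "fast" and causing the jerking described above.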
Fuzzy Sets
• Fuzzy sets introduce a certain amount of vagueness to reduce the complexity of comprehension.
• It consists of elements that signify the degree or grade of membership to a fuzzy aspect.
• Membership values usually use closed intervals and denote the sense of belonging of a member of a crisp set to a fuzzy set.
• Consider a crisp set A comprising elements that signify the ages of a set of people in years:
• A= {2,4,10,15,21,30,35,40,45,60,70}
Ages and their membership to particular fuzzy sets:

Age  Infant  Child  Adolescent  Young  Adult  Old
2    1       0      0           1      0      0
4    0.1     0.5    0           1      0      0
10   0       1      0.3         1      0      0
15   0       0.8    1           1      0      0
21   0       0      0.1         1      0.8    0.1
30   0       0      0           0.6    1      0.3
35   0       0      0           0.5    1      0.35
40   0       0      0           0.4    1      0.4
45   0       0      0           0.2    1      0.6
60   0       0      0           0      1      0.8
70   0       0      0           0      1      1
Fuzzy terminology
• Universe of Discourse (U):
The range of all possible values that comprise the input to the fuzzy system.
• Fuzzy Set:
Any set that empowers its members to have different grades of membership (based on a membership function) in an interval [0,1] is a fuzzy set.
• Membership function:
The membership function μA, which forms the basis of a fuzzy set, is given by μA: U → [0,1], where the closed interval is one that holds real numbers.
• Support of a fuzzy set (Sf):
The support S of a fuzzy set f, in a universal crisp set U, is the set that contains all elements of U that have a non-zero membership value in f. For instance, the support of the fuzzy set Adult is
Sadult = {21, 30, 35, 40, 45, 60, 70}
• Depiction of a fuzzy set:
• A fuzzy set f in a universal crisp set U, is written as
• f = μ1 /s1 + μ2 /s2 + μ3 /s3 + … + μn /sn
• where μi is the membership and si is the corresponding term in the support of f i.e. Sf.
• This is however only a representation and has no algebraic implication (the slash and + signs do not have any meaning).
• Accordingly,
• Old = 0.1/21 +0.3 /30 +0.35/35 + 0.4/40 +0.6/45 +0.8/60 + 1/70
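As a minimal sketch, the fuzzy set Old above can be held as a mapping from elements to membership grades, and the support falls out directly:

```python
# The fuzzy set "Old" from the table, as element -> membership grade.
old = {21: 0.1, 30: 0.3, 35: 0.35, 40: 0.4, 45: 0.6, 60: 0.8, 70: 1.0}

def support(fuzzy_set):
    """Elements of the universe with non-zero membership."""
    return {x for x, mu in fuzzy_set.items() if mu > 0}

print(sorted(support(old)))  # [21, 30, 35, 40, 45, 60, 70]
```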
Fuzzy Set Operations
• Union: The membership function of the union of two fuzzy sets A and B is defined as the maximum of the two individual membership functions. It is equivalent to the Boolean OR operation:
μA∪B = max(μA, μB)
• Intersection: The membership function of the intersection of two fuzzy sets A and B is defined as the minimum of the two individual membership functions and is equivalent to the Boolean AND operation:
μA∩B = min(μA, μB)
• Complement: The membership function of the complement of a fuzzy set A is defined as the negation of the specified membership function. This is equivalent to the Boolean NOT operation:
μĀ = 1 − μA
• It may be further noted here that the laws of Associativity, Commutativity , Distributivity and De Morgan’s laws hold in fuzzy set theory too.
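The max/min/complement definitions translate directly into code; a sketch over dictionary-backed fuzzy sets, with membership grades borrowed from the age table:

```python
def f_union(a, b):
    """Membership of A union B: max of the two memberships."""
    return {x: max(a.get(x, 0.0), b.get(x, 0.0)) for x in set(a) | set(b)}

def f_intersection(a, b):
    """Membership of A intersect B: min of the two memberships."""
    return {x: min(a.get(x, 0.0), b.get(x, 0.0)) for x in set(a) | set(b)}

def f_complement(a):
    """Membership of the complement: 1 - membership."""
    return {x: 1.0 - mu for x, mu in a.items()}

# Grades for ages 21-35 taken from the table above.
adult = {21: 0.8, 30: 1.0, 35: 1.0}
old = {21: 0.1, 30: 0.3, 35: 0.35}

print(f_union(adult, old)[21])                  # 0.8 (max of 0.8 and 0.1)
print(f_intersection(adult, old)[21])           # 0.1 (min of 0.8 and 0.1)
print(round(f_complement(adult)[21], 2))        # 0.2 (1 - 0.8)
```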
A Fuzzy System
• Uses the concepts of fuzzy logic.
• The logic is used when a mathematical model is missing or the system is difficult to model.
A Fuzzy Room Cooler
Fuzzy regions
• Temperature: Cold, Cool, Moderate, Warm and Hot
• Fan Speed: Slack, Low, Medium, Brisk, Fast
• Flow-rate: Strong-Negative (SN), Negative (N), Low-Negative (LN), Medium (M), Low-Positive (LP), Positive (P) and High-Positive (HP).
Fuzzy profiles: [Figure: membership profiles for the temperature, fan-speed and flow-rate regions]
Some Fuzzy Rules
• R1: If temperature is HOT and fan motor speed is SLACK then flow-rate is HIGH-POSITIVE.
• R2: If temperature is HOT and fan motor speed is LOW then flow-rate is HIGH-POSITIVE
• R3: If the temperature is HOT and fan motor speed is MEDIUM then the flow-rate is POSITIVE.
• R4: If the temperature is HOT and fan motor speed is BRISK then the flow-rate is HIGH-POSITIVE.
• R5: If the temperature is WARM and fan motor speed is MEDIUM then the flow-rate is LOW-POSITIVE.
• R6: If the temperature is WARM and fan motor speed is BRISK then the flow-rate is POSITIVE.
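A minimal Mamdani-style sketch of how such rules could be evaluated: AND is taken as the minimum over the antecedent memberships, and each flow-rate region keeps the maximum strength over the rules that conclude it. The fuzzified membership values below are assumed for illustration.

```python
# Rule strength = min of antecedent memberships (fuzzy AND).
def rule_strength(*memberships):
    return min(memberships)

# Suppose fuzzification of the current reading gave these grades:
mu_hot, mu_warm = 0.7, 0.3       # temperature regions
mu_medium, mu_brisk = 0.6, 0.4   # fan-speed regions

flow = {}
def conclude(region, strength):
    # Aggregate with max: a region is as strong as its strongest rule.
    flow[region] = max(flow.get(region, 0.0), strength)

conclude("POSITIVE",      rule_strength(mu_hot, mu_medium))   # R3
conclude("HIGH-POSITIVE", rule_strength(mu_hot, mu_brisk))    # R4
conclude("LOW-POSITIVE",  rule_strength(mu_warm, mu_medium))  # R5
conclude("POSITIVE",      rule_strength(mu_warm, mu_brisk))   # R6

print(flow)  # {'POSITIVE': 0.6, 'HIGH-POSITIVE': 0.4, 'LOW-POSITIVE': 0.3}
```

A real controller would then defuzzify the aggregated regions (e.g. by centre of gravity) to get a single flow-rate value.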
UNIT V
CO.5 Ability to apply learning in problem solving and learning probabilistic models.
5.1 WHAT IS LEARNING
What is Learning?
“… changes in the system that are adaptive in the sense that they enable the system to do the same task or tasks drawn from the same population more efficiently and more effectively the next time.” [Simon, 1983]
Types of Learning
1. Rote Learning
Rote learning avoids understanding the inner complexities of the subject being learned; it focuses instead on memorizing the material so that it can be recalled by the learner exactly the way it was read or heard.
• Learning by memorization: the material is stored verbatim, without understanding.
• Learning by repeating something over and over again: saying the same thing and trying to remember how to say it. It does not help us to understand; it helps us to remember, the way we learn a poem or a song by rote.
2. Learning from Example : Induction
A process of learning by example: the system tries to induce a general rule from a set of observed instances. These learning methods extract rules and patterns out of massive data sets. The learning process belongs to supervised learning; it performs classification and constructs class definitions, which is called induction or concept learning. The techniques used for constructing class definitions (concept learning) include:
• Winston's learning program
3. Winston's Learning
Winston (1975) described a Blocks World learning program. This program operated in a simple blocks domain. The goal is to construct representations of the definitions of concepts in the blocks domain, for example concepts such as "house".
■ Start with input: a line drawing of a blocks-world structure. The program learned the concepts House, Tent and Arch as: a house, a brick (rectangular block) with a wedge (triangular block) suitably placed on top of it; a tent, two wedges touching side by side; an arch, two non-touching bricks supporting a third wedge or brick.
■ Each concept is learned through near misses. A near miss is an object that is not an instance of the concept but is very similar to such instances.
■ The program uses procedures to analyze the drawing and construct a semantic net representation.
■ An example of such a structure for the house is shown below.
[Figure: semantic net for the object "house", with nodes for the Wedge and the Brick]
■ Node A represents the entire structure, which is composed of two parts: node B, a Wedge, and node C, a Brick. Links in the network include supported-by, has-part, and isa.
• Winston's Program
■ Winston's program followed 3 basic steps in concept formulation:
1. Select one known instance of the concept. Call this the concept definition.
2. Examine definitions of other known instances of the concept. Generalize the definition to include them.
3. Examine descriptions of near misses. Restrict the definition to exclude these.
■ Both steps 2 and 3 of this procedure rely heavily on comparison process by which similarities and differences between structures can be detected.
■ Winston's program can be similarly applied to learn other concepts such as "ARCH".
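A toy sketch of the three steps, using flat property sets instead of Winston's semantic nets; the property names and the MUST-marking scheme are invented for illustration:

```python
# Step 2: generalize over further instances by keeping shared properties.
def generalize(definition, instance):
    return definition & instance

# Step 3: a near miss lacks something essential; mark it as required.
def specialize(definition, near_miss, required):
    missing = required - near_miss
    return definition | {f"MUST-{p}" for p in missing}

house1 = {"has-brick", "has-wedge", "wedge-on-brick"}
house2 = {"has-brick", "has-wedge", "wedge-on-brick", "painted-red"}
near_miss = {"has-brick", "has-wedge"}  # wedge not placed on the brick

concept = house1                                   # step 1: one instance
concept = generalize(concept, house2)              # drops "painted-red"
concept = specialize(concept, near_miss,
                     required={"wedge-on-brick"})  # marks the essential link
print(concept)
```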
4. Explanation Based Learning (EBL)
Humans appear to learn quite a lot from one example. Human learning is accomplished by examining particular situations and relating them to background knowledge in the form of known general principles. This kind of learning is called Explanation Based Learning (EBL).
4.1 General Approach
EBL abstracts a general concept from a particular training example. It analyses the specific training example in terms of domain knowledge and the goal concept. The result of EBL is an explanation structure that explains why the training example is an instance of the goal concept. The explanation structure is then used as the basis for formulating the general concept. Thus, EBL provides a way of generalizing a machine-generated explanation of a situation into rules that apply not only to the current situation but to similar ones as well.
5. Discovery
Simon (1966) first proposed the idea that we might explain scientific discovery in computational terms and automate the processes involved on a computer. Project DENDRAL (Feigenbaum, 1971) demonstrated this by inferring structures of organic molecules from mass spectra, a problem previously solved only by experienced chemists. Later, a knowledge-based program called AM, the Automated Mathematician (Lenat, 1977), discovered many mathematical concepts. After this, an equation-discovery system called BACON (Langley, 1981) discovered a wide variety of empirical laws such as the ideal gas law. The research continued during the 1980s and 1990s but slowed, as computational biology, bioinformatics and scientific data mining convinced many researchers to focus on domain-specific methods; the need for research on general principles for scientific reasoning and discovery very much remains. The discovery system AM relied strongly on theory-driven methods of discovery, whereas BACON employed data-driven heuristics to direct its search for empirical laws.
BACON.3:
BACON.3 is a knowledge-based production system that discovers empirical laws. Its main heuristics detect constancies and trends in data, leading to the formulation of hypotheses and the definition of theoretical terms. The program represents information at varying levels of description: the lowest levels correspond to direct observations, while the highest correspond to hypotheses that explain everything so far observed. BACON.3 is built on top of BACON.1.
■ It starts with a set of variables for a problem. For example, to derive the ideal gas law, it started with four variables:
  p - gas pressure,
  V - gas volume,
  T - gas temperature,
  n - the number of moles.
■ Values from experimental data are inputted.
■ BACON holds some constant and try to notice trends in the data.
■ Finally draws inferences. Recall pV/nT = k where k is a constant.
■ BACON has also been applied to Kepler's third law, Ohm's law, conservation of momentum and Joule's law.
• Example: Rediscovering the ideal gas law pV/nT = 8.32, where p is the pressure on a gas, n is the number of moles, T is the temperature and V is the volume of the gas. [The step-by-step algorithm is not given as in the previous example, but the procedure is explained below.]
■ At the first level of description we hold n = 1 and T = 300 and vary p and V. Choose V to be the dependent variable.
■ At this level, BACON discovers the law pV = 2496.0.
■ Now the program examines this phenomenon: when n = 1 and T = 310, then pV = 2579.2. Similarly, when n = 1 and T = 320, then pV = 2662.4.
■ At this point, BACON has enough information to relate the values of pV and the temperature T. These terms are linearly related with an intercept of 0, making the ratio pV/T equal to 8.32.
■ Now the discovery system can vary its third independent term: when n = 2, pV/T is found to be 16.64; when n = 3, pV/T is found to be 24.96.
■ When it compares the values of n and pV/T, BACON finds another linear relation with a zero intercept. The resulting equation, pV/nT = 8.32, is equivalent to the ideal gas law.
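The walkthrough above can be mimicked in a few lines; the pV values are the ones quoted in the text:

```python
# Mimicking BACON's steps with the observed pV values for n = 1.
data = {300: 2496.0, 310: 2579.2, 320: 2662.4}  # T -> observed p*V

# BACON notices that pV and T are linearly related with intercept 0,
# so it defines the theoretical term pV/T.
ratios = {T: pv / T for T, pv in data.items()}
print(ratios)
# Every ratio comes out to 8.32; varying n then shows pV/T is itself
# proportional to n, yielding the ideal gas law pV/nT = 8.32.
```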
6. Analogy
Learning by analogy means acquiring new knowledge about an input entity by transferring it from a known similar entity. This technique transforms the solutions of problems in one domain into solutions of problems in another domain by discovering analogous states and operators in the two domains.
Example: Infer by analogy the hydraulics laws that are similar to Kirchhoff's laws.
[Figure: a hydraulic problem with flows Qa = 3, Qb = 9, Qc = ?, shown alongside Kirchhoff's first law for currents: I3 = I1 + I2]
Other similar examples are:
■ Pressure drop is like voltage drop.
■ The hydrogen atom is like our solar system: the Sun has a greater mass than the Earth and attracts it, causing the Earth to revolve around the Sun. The nucleus also has a greater mass than the electron and attracts it. Therefore it is plausible that the electron also revolves around the nucleus.
5.2 KNOWLEDGE AND LEARNING
Knowledge in learning
This simple knowledge-free picture of inductive learning persisted until the early 1980s. The modern approach is to design agents that already know something and are trying to learn some more. This may not sound like a terrifically deep insight, but it makes quite a difference to the way we design agents. It might also have some relevance to our theories about how science itself works. The general idea is shown schematically in Fig. 5.2.
A cumulative learning process uses, and adds to, its stock of background knowledge over time.
5.3 LEARNING IN PROBLEM SOLVING
Learning by Parameter Adjustment
Many programs rely on an evaluation procedure to summarise the state of a search; game-playing programs provide many examples of this. However, many programs have a static evaluation function. For learning, a slight modification of the formulation of the evaluation function is required: the evaluation function is represented as a polynomial of the form

c1·t1 + c2·t2 + … + cn·tn

The t terms are values of features and the c terms are weights. In designing programs it is often difficult to decide on the exact value to give each weight initially. So the basic idea of parameter adjustment is to:
• Start with some estimate of the correct weight settings.
• Modify the weights in the program on the basis of accumulated experience.
• Features that appear to be good predictors will have their weights increased, and bad ones will be decreased.
Samuel's checkers program employed 16 such features at any one time, chosen from a pool of 38.
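A minimal sketch of parameter adjustment on such a polynomial evaluation function. The error-proportional update rule below is an assumption for illustration; Samuel's actual scheme was more elaborate:

```python
# Evaluation function: weighted sum c1*t1 + ... + cn*tn.
def evaluate(weights, features):
    return sum(c * t for c, t in zip(weights, features))

# Nudge each weight in proportion to its feature and the observed error,
# so features that predicted well gain weight. (Assumed update rule.)
def adjust(weights, features, error, rate=0.1):
    return [c + rate * error * t for c, t in zip(weights, features)]

weights = [0.5, 0.5, 0.5]
features = [1.0, 0.0, 2.0]
target = 3.0                       # desired evaluation for this position
for _ in range(50):
    error = target - evaluate(weights, features)
    weights = adjust(weights, features, error)

print(round(evaluate(weights, features), 3))  # 3.0
```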
Learning by Macro-Operators
The basic idea here is similar to rote learning: avoid expensive recomputation. Macro-operators can be used to group a whole series of actions into one. For example, making dinner can be described as: lay the table, cook dinner, serve dinner. We could treat laying the table as one action even though it involves a sequence of actions. The STRIPS problem-solver employed macro-operators in its learning phase. Consider a blocks-world example in which ON(C,B) and ON(A,TABLE) are true. STRIPS can achieve ON(A,B) in four steps:
UNSTACK(C,B), PUTDOWN(C), PICKUP(A), STACK(A,B)
STRIPS now builds a macro-operator MACROP with preconditions ON(C,B), ON(A,TABLE), postconditions ON(A,B), ON(C,TABLE), and the four steps as its body. MACROP can now be used in future operation. But it is not very general; it can easily be generalised with variables used in place of the blocks.
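The MACROP example can be sketched as data plus an applicability check; the dictionary representation is an assumption for illustration, not STRIPS's actual encoding:

```python
# A recorded plan packaged as one macro-operator.
macrop = {
    "name": "MACROP",
    "preconditions": ["ON(C,B)", "ON(A,TABLE)"],
    "postconditions": ["ON(A,B)", "ON(C,TABLE)"],
    "body": ["UNSTACK(C,B)", "PUTDOWN(C)", "PICKUP(A)", "STACK(A,B)"],
}

def applicable(op, state):
    """A macro-operator applies when all its preconditions hold."""
    return all(p in state for p in op["preconditions"])

state = {"ON(C,B)", "ON(A,TABLE)"}
if applicable(macrop, state):
    print(macrop["body"])  # the four primitive steps replayed as one action
```

Generalising the operator amounts to replacing the block names C, B, A with variables so the same body applies to any blocks in the same configuration.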
Learning by Chunking
Chunking involves ideas similar to macro-operators and originates from psychological ideas on memory and problem solving. Its computational basis is in production systems (studied earlier). SOAR is a system that uses production rules to represent its knowledge; it also employs chunking to learn from experience.
Basic outline of SOAR's method:
• As SOAR solves problems, it fires productions; these are stored in long-term memory.
• Some firings turn out to be more useful than others. When SOAR detects a useful sequence of firings, it creates a chunk.
• A chunk is essentially a large production that does the work of an entire sequence of smaller ones.
• Chunks may be generalised before storing.
5.4 LEARNING FROM EXAMPLE
Learning by Example / Induction: This is similarity-based learning. A large number of examples are given, and the machine learns to perform similar actions in similar situations. Human beings also use this form of learning frequently. When we are children, our teacher tells us many things by giving examples. Suppose there are two fruits, one a green apple and the other a pear. As an adult it is easy to tell the difference; for a child, it might not be easy to differentiate between the two fruits. In such situations, various examples of both fruits are given to teach the difference.
Similarly, in our daily life we see many examples of birds flying, and we observe that when there are clouds in the sky, it rains. Based on these examples we formulate rules like "all birds can fly" and "clouds bring rain". When we formulate such rules and use them to draw conclusions in given situations, we learn things by induction.
Induction means "the inferring of general laws from particular instances". Thus, inductive learning means generalization of knowledge gathered from real-world examples, and use of the same for solving similar problems.
5.5 LEARNING PROBABILISTIC MODELS
Probabilistic Models
A probabilistic model of sensory inputs can:
– make optimal decisions under a given loss function
– make inferences about missing inputs
– generate predictions/fantasies/imagery
– communicate the data in an efficient way
Probabilistic modeling is equivalent to other views of learning:
– information theoretic: finding compact representations of the data
– physical analogies: minimising the free energy of a corresponding statistical mechanical system
Bayes rule: for a data set D and a model (or parameters) M, the probability of a model given the data set is

P(M|D) = P(D|M) P(M) / P(D)

where P(D|M) is the evidence (or likelihood), P(M) is the prior probability of M, and P(M|D) is the posterior probability of M. Under very weak and reasonable assumptions, Bayes rule is the only rational and consistent way to manipulate uncertainties/beliefs (Pólya, Cox axioms, etc.).
Bayes, MAP and ML
• Bayesian Learning: Assumes a prior over the model parameters and computes the posterior distribution of the parameters, P(θ|D).
• Maximum a Posteriori (MAP) Learning: Assumes a prior over the model parameters, P(θ), and finds a parameter setting that maximises the posterior: P(θ|D) ∝ P(θ) P(D|θ).
• Maximum Likelihood (ML) Learning: Does not assume a prior over the model parameters; finds a parameter setting that maximises the likelihood of the data, P(D|θ).
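A small numerical sketch of ML versus MAP on a toy model-selection problem; the two coin models, the prior and the data are all invented for illustration:

```python
# Likelihood of a sequence of coin flips under a given heads probability.
def likelihood(p_heads, heads, tails):
    return p_heads**heads * (1 - p_heads)**tails

models = {"fair": 0.5, "biased": 0.8}   # hypothetical coin models
prior = {"fair": 0.9, "biased": 0.1}    # assumed prior belief

heads, tails = 8, 2                     # observed data: 8 heads in 10 flips

# Bayes rule: posterior is proportional to likelihood * prior.
unnorm = {m: likelihood(p, heads, tails) * prior[m] for m, p in models.items()}
evidence = sum(unnorm.values())
posterior = {m: v / evidence for m, v in unnorm.items()}

ml = max(models, key=lambda m: likelihood(models[m], heads, tails))
map_ = max(posterior, key=posterior.get)
print(ml, map_)  # ML picks "biased"; the strong prior makes MAP pick "fair"
```

The point of the contrast: ML looks only at the data, while MAP weighs the data against the prior, so the two can disagree when the prior is strong.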
5.6 FORMAL LEARNING THEORY
Formal Learning Theory
A device learns a concept if, given positive and negative examples, it can produce an algorithm that will classify future examples correctly with probability 1/h.
The complexity of learning a concept is a function of three factors: the error tolerance (h), the number of binary features present in the examples (t), and the size of the rule necessary to make the discrimination (f).
If the number of training examples required is polynomial in h, t, and f, then the concept is said to be learnable.
Formal Learning Theory (Cont’d)
Conjunctive learning requires log(n) training examples, where n is the number of features.
Conjunctive learning with positive examples only requires about n training examples.
Finite automata are learnable only if the learner is allowed to perform experiments.
UNIT VI
CO.6 Apply the concepts of knowledge engineering, learning, knowledge acquisition, and understanding natural language.
6.1 EXPERT SYSTEMS: FUNDAMENTAL BLOCKS
Expert Systems
The credibility of AI rose to new heights in the minds of individuals and critics when many Expert Systems (ES) were successfully planned, developed and implemented in many challenging areas. As of today, quite a heavy investment is made in this area. The success of these programs in very selected areas involving high technical expertise has led people to explore new avenues.
“Expert Systems (ES) are knowledge-intensive programs that solve problems in a domain that requires a considerable amount of technical expertise.”
“An Expert System is a set of programs that manipulates embedded knowledge to solve problems in a specialized domain that normally requires human expertise.”
Characteristics of an Expert System:
· They should solve difficult problems in a domain as well as or better than human experts.
· They should possess vast quantities of domain-specific knowledge, down to the minute details.
· These systems permit the use of heuristic search processes.
· They explain why they ask a question and justify their conclusions.
· They deal with uncertain and irrelevant data.
· They communicate with the users in their own natural language.
· They possess the capacity to cater to the individual’s desire.
· They provide extensive facilities for ‘symbolic processing’ rather than ‘numeric processing’.
· A final characteristic, from the point of view of economists and financial people: they should mint money. Expert systems need heavy investment, and there should be a considerable ‘Return on Investment’ (ROI).
Architecture and Modules of Expert System
The fundamental modules of an expert system are:
· Knowledge Base
· User Interface
· Inference Engine
· Explanation Facility
· Knowledge Acquisition Facility
· External Interface
1. Knowledge Base: The core module of any expert system is its Knowledge-Base (KB). It is a warehouse of the domain-specific knowledge captured from the human expert via the knowledge acquisition module. There are many ways of representing the knowledge in the knowledge-base such as logic, semantic nets, frames, scripts, production rules etc.
2. User Interface: The user interface provides the facilities needed for the user to communicate with the system. A user would normally like to have a consultation with the system for the following purposes:
· To get remedies for his problem.
· To know the private knowledge (heuristics) of the system.
· To get explanations for specific queries.
Presenting a real-world problem to the system for a solution is what is meant by having a consultation. Here, the user interface provides as many facilities as possible, such as menus and graphical interfaces, to make the dialogue user-friendly and lively.
3. Inference Engine: Also called a ‘rule interpreter’, an inference engine (IE) performs the task of matching antecedents from the responses given by the user and firing rules. Basically there are two approaches:
- Forward Chaining: This works by matching the existing conditions of the problem (given facts) with the antecedents of the rules in the knowledge base. Forward chaining is also known as data-driven or antecedent-driven search.
- Backward Chaining: This is the reverse of forward chaining. Here the rule interpreter tries to match the ‘THEN condition’ instead of the ‘IF condition’. Because of this, backward chaining is also called consequent-driven or goal-driven search.
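Forward chaining can be sketched as repeatedly firing any rule whose antecedents are all in working memory; the toy medical-flavoured rules below are invented for illustration:

```python
# Each rule: (set of antecedent facts, consequent fact).
rules = [
    ({"fever", "cough"}, "flu-suspected"),
    ({"flu-suspected", "high-temp"}, "see-doctor"),
]

def forward_chain(facts):
    """Fire rules until no new facts can be added (data-driven search)."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for antecedents, consequent in rules:
            if antecedents <= facts and consequent not in facts:
                facts.add(consequent)  # fire the rule
                changed = True
    return facts

print(forward_chain({"fever", "cough", "high-temp"}))
```

Backward chaining would instead start from the goal "see-doctor" and work back through the THEN-parts of the rules to the facts that must be established.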
4. Explanation Facility: Getting answers to specific queries forms the explanation mechanism of the expert system. Basically, any user would like to ask two basic questions: ‘why’ and ‘how’.
Conventional programs do not provide these facilities. The explanation facility helps the user in the following ways:
· If the user is a domain expert, it helps in identifying what additional knowledge is needed.
· It enhances the user’s confidence in the system.
· It serves as a tutor in sharing the system’s knowledge with the user.
The explanation facility is a part of the user interface that carries out the above tasks.
5. Knowledge Acquisition Facility: The major bottleneck in expert system development is knowledge acquisition. The knowledge acquisition facility creates a congenial atmosphere for the expert to share expertise with the system.
6. External Interface: This provides the communication between the expert system and the external environment. When there is a formal consultation, it is done via the user interface. In real-time expert systems, when they form part of a closed-loop system, it is not proper to expect human intervention every time to feed in the prevailing conditions and get remedies. Moreover, the time gap is too narrow in a real-time system. The external interface, with its sensors, gets minute-by-minute information about the situation and acts accordingly.
6.2 KNOWLEDGE ENGINEERING
Knowledge Engineering is generally known as the field that is responsible for the analysis and design of expert systems and is thus concerned with representing and implementing the expertise of a chosen application domain in a computer system. Research on cognition or cognitive science, on the other hand, is performed as a basic science, mostly within the disciplines of artificial intelligence, psychology and linguistics. It investigates the mental states and processes of humans by modelling them with a computer system and combining analytic and empirical viewpoints. Early on, knowledge acquisition was known as the activity of making explicit the human knowledge that is relevant for performing a task, so that it can be represented and become operational in an expert system. Knowledge acquisition and the field of knowledge engineering are consequently closely related to human cognition, which is studied in cognitive science. The specific relationship between knowledge engineering and cognitive science has changed over the years and therefore needs to be reconsidered in future expert system developments. Although knowledge acquisition activities are at most twenty years old, there is already a respectable history with noticeable successes and some initially disappointing failures to be looked back upon. Actually, more progress was made by the analysis of the failures than with the short term successes.
6.3 KNOWLEDGE ACQUISITION
Knowledge Acquisition Strategies:
1. Protocol Analysis: In this method, the expert is asked to think aloud and express the mental process while solving the problem. The protocol, consisting of the knowledge engineer’s observations and the expert’s thought process, is analyzed at a later stage for specific features of the type of problem. In this method, the knowledge engineer does not interrupt while the expert is at work.
2. Interviews & Introspection: This is another and most commonly used method. The knowledge engineer familiarizes himself with the concepts of the domain and poses questions or problems to the experts, who in turn provide answers or solutions that help in revealing heuristic knowledge.
3. Observation at site: In this method, the elicitor acts as a passive element and watches the expert in actual action. Procedural knowledge is obtained by this method.
4. Discussion about the problem: In this category there are three methods:
a) Problem Description: The expert is asked to give sample problems for each category of answer. This method helps in identifying the foundational characteristics of the problems.
b) Problem Discussion- Problem discussion method involves discussion about a problem to the domain expert. The needed data, knowledge and procedures evolve by this method. Knowledge of finer granularity emerges from the discussion.
c) Problem Analysis: The problem analysis part is similar to protocol analysis, wherein the expert is presented with a series of problems and asked to think and find solutions for the same.
5. Discussion about the system: This method involves the prototype system that has been developed. Three major methods are:-
a) Tuning the System: In tuning the system, the domain specific expert is asked to provide a set of classic problems and solutions are obtained from the system. The solutions are then compared with the solutions obtained by the human expert and the system is tuned by adding knowledge of high granularity.
b) Verifying the system: In verifying the system, the intricacies of the system are explained in full to the expert, who is asked to verify the working of the system. This is a tedious task.
c) Validating the system: In validating the system, the results of the system and that of the expert are given to the outside experts to find out the validity of the solution.
Difficulties in Knowledge Acquisition
· Domain experts store their private knowledge subconsciously. They do not keep a written record of their heuristics. So, unless and until a problem comes along that needs that private piece of knowledge, it remains passive and hidden.
· Domain experts have the problem of effective communication. Most experts find it difficult to explain their reasoning process. Lack of proper communications makes knowledge acquisition process very tedious and inefficient.
· Much of the human expertise is basically intuitive which is the capability of skilled pattern recognition. Intuition is very hard to verbalize.
Major Application Areas of Expert Systems:
Ø Analysis
Ø Control
Ø Design
Ø Diagnosis
Ø Monitoring
Ø Planning
Ø Prediction
Ø Repair
Examples of Expert Systems
· DENDRAL performs structure elucidation of chemical compounds.
· MYCIN, an expert system for diagnosis of bacterial infections, effectively handles uncertain data.
· XCON/R1, a system in use at Digital Equipment Corporation for configuring VAX computers.
6.4 KNOWLEDGE BASED SYSTEMS
The typical architecture of a KBS is often described as follows:
The inference engine and knowledge base are separated because the reasoning mechanism needs to be as stable as possible, while the knowledge base must be able to grow and change as knowledge is added; this arrangement also enables the system to be built from, or converted to, a shell. It is reasonable to produce a richer, more elaborate description of the typical KBS. A more elaborate description, which still includes the components found in almost any real-world system, would look like this:
• The system holds a collection of general principles which can potentially be applied to any problem; these are stored in the knowledge base.
• The system also holds a collection of specific details that apply to the current problem (including details of how the current reasoning process is progressing); these are held in working memory.
• Both these sorts of information are processed by the inference engine.
Any practical expert system needs an explanatory facility: it is essential that an expert system should be able to explain its reasoning. It is also not unreasonable to include an expert interface and a knowledge base editor, since any practical KBS is going to need a mechanism for efficiently building and modifying the knowledge base.
Expert systems have a substantial domain knowledge base and are applied AI in a very broad sense. Expert systems are complete AI programs: all techniques are exploited, organised around a set of production rules.
Expert system shells allow new knowledge corresponding to a new problem domain to be added. EMYCIN (Empty MYCIN), derived from MYCIN, typically supports rules, frames and a variety of other reasoning mechanisms. A shell must provide an easy-to-use interface between an expert system written with the shell and a larger, probably more conventional, programming environment.
Explanation: people must be able to interact with the system easily; to facilitate this interaction, the ES must have explanation capabilities.
Introduction to Expert Systems
Expert systems are knowledge-intensive programs that solve problems in a domain that requires a considerable amount of technical expertise. The machine can offer intelligent advice or take an intelligent decision about a processing function.
The knowledge base is a warehouse of the domain-specific knowledge captured from the human expert via the knowledge acquisition module. Apart from conceptual dependency, frames and scripts, production rules are an extremely popular KR structure today:
If   <condition 1>
And  <condition 2>
…
And  <condition n>
Then <action 1>
And  <action 2>
Inference engine: also called the rule interpreter, it performs the task of matching antecedents from the responses given by the user and firing rules. Forward chaining traces from the known facts through the knowledge base to a conclusion; backward chaining works from the goal back towards the facts. When several rules match, conflict resolution decides which to fire, using strategies such as:
- perform the first matching rule
- a fixed sequencing technique
- perform the most specific rule
- the most-recent policy
User interface: provides the facilities needed for the user to communicate with the system.
Explanation facility.
Knowledge acquisition facility.
External interface: communication between the ES and the external environment.
[Figure: stages in expert system development — problem identification; deciding on the vehicle for development; prototype development; planning the full-scale system; implementation, maintenance and evaluation of the full system.]
[Figure: personnel involved in ES development — the user, the domain expert, the knowledge engineer and maintenance personnel, connected to the system through external interfaces and a historical database.]
6.6 UNDERSTANDING NATURAL LANGUAGE
Natural Language Processing (NLP)
Processing written text, using lexical, syntactic & semantic knowledge of the language as well
as the required real world information.
Processing spoken language requires all of the above, plus additional knowledge about phonology.
Problem: Sentences give an incomplete description of the information they are intended to convey:
Some dogs are outside
Some dogs are on the lawn
Three dogs are on the lawn
Rover, Tripp & Spot ……..
Problem : The same expression means different things in different contexts.
Where’s the water?
Problem: No natural language program can be complete, because new words, expressions and meanings can be generated quite freely:
I’ll fax it to you.
Problem : There are lots of way to say same thing .
Mary was born on October 11
Mary Birthday is October 11
* These are features of language that make it both useful & hard to process.
* Translating from one natural language to another.
* Natural language processing includes both understanding & generation, as well as other tasks such as multilingual translation.
What is language understanding, & what does a sentence mean?
Steps in natural language understanding:
Morphological analysis: individual words are analyzed into their components, & non-word tokens such as punctuation are separated from the words.
Syntactic analysis: linear sequences of words are transformed into structures that show how the words relate to each other. An English syntactic analyzer would reject the sentence "Boy the go to the store".
Semantic analysis: a mapping is made between the syntactic structures & objects in the task domain.
Discourse integration: the meaning of an individual sentence may depend on the prior discourse context. Ex: what "it" refers to in "John wanted it" depends on earlier sentences, while "John" may influence the meaning of later sentences such as "He always had".
Pragmatic analysis: the structure representing what was said is reinterpreted to determine what was actually meant. "Do you know what time it is?" is really a request to be told the time.
Consider an English interface to an operating system, where the following sentence is typed:
"I want to print Bill's .init file."
Morphological analysis:
o Pull apart the word "Bill's" into the proper noun "Bill" & the possessive suffix "'s".
o Recognize ".init" as a file extension functioning as an adjective.
Syntactic analysis (parsing): unit noun phrases are identified, and reference markers (shown in parentheses) record the constituents to which meaning can be assigned.
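The morphological step can be sketched in code. The token categories and the suffix-stripping rule below are assumptions made for illustration; a real morphological analyzer is far richer:

```python
# Sketch of morphological analysis for "I want to print Bill's .init file".
# Splits the possessive suffix "'s" and tags ".init"-style tokens as
# file extensions; the category names are invented.

def morph_analyze(word):
    """Return a list of (component, category) pairs for one word."""
    if word.endswith("'s"):
        return [(word[:-2], "NOUN"), ("'s", "POSSESSIVE-SUFFIX")]
    if word.startswith("."):
        return [(word, "FILE-EXTENSION")]
    return [(word, "WORD")]

tokens = []
for w in "I want to print Bill's .init file".split():
    tokens.extend(morph_analyze(w))
```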
S (RM1)
  NP
    PRO  I (RM2)
  VP
    V  want
    S (RM3)
      VP
        V  print
        NP (RM4)
          ADJS  Bill's (RM5)  .init
          N  file
Fig. Syntactic Analysis

Perception & Action
Perception involves interpreting sights, sounds, smells & touch. Action includes the ability to navigate in the world & manipulate objects.
A robot uses sensors to perceive the physical world & effectors to act on it.
* Real & simulated worlds
A classical AI program (ex: choosing chess moves by search/game playing over millions of nodes) operates in a simulated world; a robot operates in the real one.

AI programs vs. robots:
i) Input is symbolic information (an English sentence, the 8-puzzle) vs. input is an analog signal (a 2-D video image or a speech waveform).
ii) Require only general-purpose computers vs. require special hardware for perceiving & affecting the world.
iii) An AI program can come up with an optimal plan (ex: best-first search) vs. sensors/effectors are limited in precision, so there is always some degree of uncertainty about where an obstacle stands.
iv) Can trade off time between devising & executing a plan vs. must react in real time, because the real world is unpredictable, dynamic & uncertain.

Searching & backtracking can be costly because robots operate in real time.
A design for an autonomous robot: attach sensors & effectors to an existing AI system. The control cycle runs
the physical world → Perception → Cognition → Action → the physical world.
Perception: occurs through many channels (sight, sound, touch, smell & taste); sensors include laser range finders, speedometers & radar.
The two most important sensory channels for humans are vision & spoken language; through these two faculties we gather almost all of our knowledge.
Processing a video camera image:
1. Signal processing: enhancing the image.
2. Measurement analysis.
3. Pattern recognition: classifying an object into a category drawn from a finite set of possibilities.
4. Image understanding: classifying objects & building a 3-D model of the scene.
Speech recognition: natural language understanding systems usually accept typed input, which is not possible for a number of applications.
- Spoken language is a more natural form of communication in many human-computer interfaces.
- Design issues in speech systems:
* Speaker dependence vs. independence
* Continuous vs. isolated-word speech
* Real-time vs. offline processing
* Large vs. small vocabulary
* Broad vs. narrow grammar
- Action
Mobility & intelligence seem to have evolved together; it is intelligence that puts mobility to effective use.
The nature of mobility is described in terms of how robots navigate through the world & manipulate objects.
Planning routes:
Start ── obstacle ── Goal
Manipulation
IV. Dealing with Inconsistencies & Uncertainties (Patterson)
In first-order logic there is:
- no way to express uncertain, imprecise, hypothetical or vague knowledge;
- no way to produce new knowledge about the world as facts change.
Intelligent beings are continuously required to make decisions under a veil of uncertainty; the information available may be contradictory or even unbelievable.
Non-monotonic Reasoning:
In predicate/propositional logic, knowledge grows monotonically. But new facts may become known which contradict & invalidate old knowledge, so the old knowledge must be retracted. Retractions lead to a shrinkage, or non-monotonic growth, of the knowledge at times.
Truth Maintenance Systems (TMS), also known as Belief Revision or Revision Maintenance systems:
- A TMS maintains consistency of the knowledge.
- It gives the inference component the latitude to perform non-monotonic inferences.
- As new beliefs become available, the knowledge continues to be consistent & current.
Architecture of the problem solver with a TMS
The inference engine (IE) tells the TMS what deductions it has made; the TMS, in turn, is asked questions about current beliefs & reasons for failures.

  Inference Engine  --tell-->  TMS
  Inference Engine  <--ask--   TMS
  (the IE works against the Knowledge Base)

Belief revision: suppose the KB contains only the proposition P, the rule P → Q, and modus ponens. From these the IE concludes Q & adds this conclusion to the KB. Later it is learned that ¬P is appropriate. Adding ¬P to the KB leads to a contradiction.
Consequently it is necessary to remove P to eliminate the inconsistency. But with P removed, Q is no longer a justified belief; it too should be removed. This type of belief revision is the job of the TMS. Q is removed but not erased: at a later time P may again be true & P → Q may be required.
Dependency-directed backtracking:
The records are maintained in the form of a dependency network. Nodes in the network represent KB entries such as premises, conclusions, inference rules and the like. Attached to the nodes are justifications, which represent the inference steps from which the node was derived.
A premise is a fundamental belief; premises form the base from which all other currently active nodes can be explained in terms of valid justifications.
Two types of justifications:
- Support lists (SL)
- Conditional proofs (CP)
Ex: Cybil as a non-flying bird (an ostrich)
n1  Cybil is a bird        (SL ( ) ( ))    – a premise
n2  Cybil can fly          (SL (n1) (n3))  – unjustified belief
n3  Cybil cannot fly       (SL (n5) (n4))  – justified belief
n4  Cybil has wings        (SL ( ) ( ))    – retracted premise
n5  Cybil is an ostrich    (SL ( ) ( ))    – a premise
Suppose it is discovered that Cybil is not an ostrich, thereby causing n5 to be retracted. Then n3, which depends on n5, must also be retracted. This in turn changes the status of n2 to a justified node. The resultant belief is now that the bird Cybil can fly.
Belief network node types: premises, assumptions & datum nodes.
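The Cybil example can be sketched as code. Node statuses (IN/OUT) are recomputed from the SL justifications by simple relaxation; this encoding is an illustrative assumption, not a full TMS:

```python
# Sketch of SL-justified belief statuses for the Cybil network.
# A node is IN when every node in its in-list is IN and every node in its
# out-list is OUT; premises are IN; a node with an empty SL that is not a
# premise (a retracted premise) is OUT.

def statuses(premises, justified):
    status = {n: (n in premises) for n in justified}
    for _ in range(len(justified)):          # relax until a fixpoint is reached
        for node, (in_list, out_list) in justified.items():
            if node in premises:
                status[node] = True
            elif not in_list and not out_list:
                status[node] = False         # retracted premise
            else:
                status[node] = (all(status[i] for i in in_list)
                                and not any(status[o] for o in out_list))
    return status

justified = {
    "n1": ([], []),          # Cybil is a bird
    "n2": (["n1"], ["n3"]),  # Cybil can fly
    "n3": (["n5"], ["n4"]),  # Cybil cannot fly
    "n4": ([], []),          # Cybil has wings (retracted premise)
    "n5": ([], []),          # Cybil is an ostrich
}
before = statuses({"n1", "n5"}, justified)   # while n5 is believed
after = statuses({"n1"}, justified)          # after n5 is retracted
```

Before the retraction n3 is IN and n2 OUT; after it, the statuses flip, matching the example.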
Propositional logic is appealing because it is simple to deal with & a decision procedure for it exists.
Predicate logic provides a way of deducing new statements from old ones (a good way of reasoning with knowledge). However, unlike propositional logic, it does not possess a decision procedure, not even an exponential-time one.
* A proposition in propositional logic takes only two values, i.e. the proposition is either TRUE or FALSE.
Example:
1) Rubber is a good conductor of electricity – False
2) Diamond is a hard material – True
A: The machine is defective.
B: The production is less.
The implication A → B is written as: if the machine is defective then production is less.

A      B      A → B
True   True   True    i.e. the machine is defective & production is less.
False  True   True    production can be less for a variety of other reasons, so the implication still holds.
True   False  False   this case cannot be admitted; hence the implication is false.
False  False  True
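The table can be checked mechanically, using the standard equivalence A → B ≡ ¬A ˅ B:

```python
# Material implication: A -> B is (not A) or B.

def implies(a, b):
    return (not a) or b

# Enumerate the four rows of the truth table.
table = [(a, b, implies(a, b)) for a in (True, False) for b in (True, False)]
```

Only the row A = True, B = False makes the implication false.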
Predicate logic (first-order logic)
Propositional logic works fine in situations where the result is TRUE or FALSE, but not both. However, there are many real-life situations that cannot be treated this way.
Consider: all mammals suckle their young ones; since an elephant is a mammal, it suckles its young one. Propositional logic fails to express such statements. To overcome this deficiency, predicate logic uses three additional notions:
- Predicates
- Terms
- Quantifiers
* Predicate: a relation that binds two atoms together.
* Baskar likes aeroplanes:
likes (Baskar, aeroplanes)
* Ravi's father is Rani's father:
FATHER (father (Ravi), Rani)
Here who Ravi's father is is not explicitly stated, but father (Ravi) represents a person: it is a term.
A constant, a variable or a function is a TERM.
Quantifier: a symbol that permits one to declare or identify the range or scope of the variables in a logical expression.
Universal quantifier (∀)
Existential quantifier (Ǝ)
If a is a variable, then ∀a is read as:
i) for all a
ii) for each a
iii) for every a
If b is a variable, then Ǝb is read as:
i) there exists a b
ii) for some b
iii) for at least one b
Conversion to clause form
All Romans who know Marcus either hate Caesar or think that anyone who hates anyone is crazy:
∀x : [Roman(x) Ʌ know(x, Marcus)] → [hate(x, Caesar) ˅ (∀y : (Ǝz : hate(y,z)) → thinkcrazy(x,y))]
Conjunctive normal form:
¬Roman(x) ˅ ¬know(x, Marcus) ˅ hate(x, Caesar) ˅ ¬hate(y,z) ˅ thinkcrazy(x,y)
Algorithm: Convert to clause form.
1. Eliminate →, using the fact that a → b is equivalent to ¬a ˅ b.
Ex:
∀x : ¬[Roman(x) Ʌ know(x, Marcus)] ˅ [hate(x, Caesar) ˅ (∀y : ¬(Ǝz : hate(y,z)) ˅ thinkcrazy(x,y))]
2. Reduce the scope of each ¬ to a single term, using the fact that ¬(¬p) = p,
De Morgan's laws, which say that
¬(a Ʌ b) = ¬a ˅ ¬b
¬(a ˅ b) = ¬a Ʌ ¬b
and the standard correspondences between quantifiers:
¬∀x : P(x) = Ǝx : ¬P(x)    ¬Ǝx : P(x) = ∀x : ¬P(x)
Performing this on step (1) yields:
∀x : {[¬Roman(x) ˅ ¬know(x, Marcus)] ˅ [hate(x, Caesar) ˅ (∀y : ∀z : ¬hate(y,z) ˅ thinkcrazy(x,y))]}
3) Standardize variables so that each quantifier binds a unique variable.
For ex: ∀x : P(x) ˅ ∀x : Q(x) is converted to ∀x : P(x) ˅ ∀y : Q(y)
4) Move all quantifiers to the left of the formula without changing their relative order:
∀x : ∀y : ∀z : [¬Roman(x) ˅ ¬know(x, Marcus)] ˅ [hate(x, Caesar) ˅ (¬hate(y,z) ˅ thinkcrazy(x,y))]
The formula is now in prenex normal form.
5) Eliminate existential quantifiers.
i) Ǝy : president(y) is converted to president(S1), where
ii) S1 is a function with no arguments that somehow produces a value that satisfies president.
iii) ∀x : Ǝy : fatherof(y,x) is transformed to ∀x : fatherof(S2(x), x).
Such a generated function is called a Skolem function. One with no arguments is called a Skolem constant.
6) Drop the prefix from step (4): remove ∀x : ∀y : ∀z : ; from now on, any variable that appears is assumed to be universally quantified.
7) Convert the matrix into a conjunction of disjuncts.
* Exploit the associative property: a ˅ (b ˅ c) = (a ˅ b) ˅ c
* and the distributive property: (a Ʌ b) ˅ c = (a ˅ c) Ʌ (b ˅ c)
Ex: (winter Ʌ wearingboots) ˅ (summer Ʌ wearingsandals) becomes
1) [winter ˅ (summer Ʌ wearingsandals)] Ʌ [wearingboots ˅ (summer Ʌ wearingsandals)]
2) (winter ˅ summer) Ʌ (winter ˅ wearingsandals) Ʌ (wearingboots ˅ summer) Ʌ (wearingboots ˅ wearingsandals)
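Step 7's distribution of ˅ over Ʌ can be sketched on the winter/summer example. Formulas are represented as nested ("and", l, r)/("or", l, r) tuples with string atoms; this encoding is an assumption made for illustration:

```python
# Sketch of distributing OR over AND to reach a conjunction of disjuncts.

def distribute(f):
    """Push ORs inside ANDs: (a & b) | c  ==>  (a | c) & (b | c)."""
    if isinstance(f, str):
        return f
    op, l, r = f
    l, r = distribute(l), distribute(r)
    if op == "or" and isinstance(l, tuple) and l[0] == "and":
        return ("and", distribute(("or", l[1], r)), distribute(("or", l[2], r)))
    if op == "or" and isinstance(r, tuple) and r[0] == "and":
        return ("and", distribute(("or", l, r[1])), distribute(("or", l, r[2])))
    return (op, l, r)

# (winter AND boots) OR (summer AND sandals)
f = ("or", ("and", "winter", "boots"), ("and", "summer", "sandals"))
cnf = distribute(f)
```

The result contains exactly the four disjuncts listed in step 2) above.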
5.4.3 Resolution in Propositional Logic
The procedure for producing a proof by resolution of proposition P with respect to a set of axioms F is the following.
Algorithm: Propositional Resolution.
A few facts in propositional logic:

Given axioms        Converted to clause form
P                   P              (1)
(P Ʌ Q) → R         ¬P ˅ ¬Q ˅ R    (2)
(S ˅ T) → Q         ¬S ˅ Q         (3)
                    ¬T ˅ Q         (4)
T                   T              (5)

Fig 5.8 Resolution in propositional logic. We want to prove R.
Resolution process: it takes a set of clauses that are all assumed to be true &, based on that information, generates new clauses that represent restrictions on the way each of those original clauses can be made true.
A contradiction occurs when a clause becomes so restricted that there is no way it can be true (the empty clause).
For (2) to be true, one of three things must be true: ¬P, ¬Q or R. To prove R, we assume ¬R:

¬P ˅ ¬Q ˅ R   resolved with ¬R (assumed)    gives   ¬P ˅ ¬Q
¬P ˅ ¬Q       resolved with P (1)           gives   ¬Q
¬Q            resolved with ¬T ˅ Q (4)      gives   ¬T
¬T            resolved with T (5)           gives   the empty clause – contradiction

Hence the assumption that ¬R is true is wrong; therefore R is true.
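The refutation above can be sketched in code. Clauses are frozensets of literal strings with "~" marking negation; the loop is a naive saturation search, adequate only for tiny examples like this one:

```python
# Sketch of propositional resolution by refutation.

def negate(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolve(c1, c2):
    """All resolvents of two clauses on complementary literals."""
    out = []
    for lit in c1:
        if negate(lit) in c2:
            out.append((c1 - {lit}) | (c2 - {negate(lit)}))
    return out

def refutes(clauses):
    """True iff saturation derives the empty clause (a contradiction)."""
    clauses = set(clauses)
    while True:
        new = set()
        for a in clauses:
            for b in clauses:
                if a != b:
                    for r in resolve(a, b):
                        if not r:
                            return True      # empty clause found
                        new.add(r)
        if new <= clauses:
            return False                     # nothing new: no refutation
        clauses |= new

# Clauses (1)-(5) from Fig 5.8, plus the negated goal ~R.
clauses = [frozenset(c) for c in
           [{"P"}, {"~P", "~Q", "R"}, {"~S", "Q"}, {"~T", "Q"}, {"T"}, {"~R"}]]
proved_R = refutes(clauses)
```

Deriving the empty clause from the axioms plus ¬R shows, as in the hand proof, that R must be true.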
Resolution in Predicate Logic
Two literals are contradictory if one of them can be unified with the negation of the other.
For ex: man(x) & ¬man(Spot) are contradictory, since man(x) & man(Spot) can be unified.
This says that man(x) cannot be true for all x if there is known to be some x, say Spot, for which man(x) is false. Thus, to use resolution for expressions in predicate logic, we use the unification algorithm to locate pairs of literals that cancel out.
Ex: 1) man(Marcus)  2) ¬man(x1) ˅ mortal(x1)
With x1 = Marcus, ¬man(Marcus) is false, so we conclude mortal(Marcus). For some other value of x1, ¬man(x1) might be true, making mortal(x1) irrelevant to the truth of the complete clause.
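The unification step can be sketched as follows. Variables are written with a leading "?" and literals are (predicate, arg…) tuples; this encoding, and the omission of the occurs check, are simplifications made for illustration:

```python
# Minimal unification for first-order literals. Variables start with "?";
# constants and predicate names are plain strings; compound terms are tuples.
# No occurs check is performed (a textbook simplification).

def is_var(t):
    return isinstance(t, str) and t.startswith("?")

def unify(x, y, subst=None):
    """Return a substitution dict that unifies x and y, or None on failure."""
    if subst is None:
        subst = {}
    if x == y:
        return subst
    if is_var(x):
        return unify_var(x, y, subst)
    if is_var(y):
        return unify_var(y, x, subst)
    if isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y):
        for a, b in zip(x, y):
            subst = unify(a, b, subst)
            if subst is None:
                return None
        return subst
    return None

def unify_var(var, term, subst):
    if var in subst:
        return unify(subst[var], term, subst)
    new = dict(subst)
    new[var] = term
    return new

# man(?x) unifies with man(Spot) under the substitution ?x = Spot.
s = unify(("man", "?x"), ("man", "Spot"))
```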
Fig 5.9 A Resolution Proof
Axioms in clause form:
1) man(Marcus)
2) Pompeian(Marcus)
3) ¬Pompeian(x1) ˅ Roman(x1)
4) ruler(Caesar)
5) ¬Roman(x2) ˅ loyalto(x2, Caesar) ˅ hate(x2, Caesar)
6) loyalto(x3, f1(x3))
7) ¬man(x4) ˅ ¬ruler(y1) ˅ ¬tryassassinate(x4, y1) ˅ ¬loyalto(x4, y1)
8) tryassassinate(Marcus, Caesar)
Prove: hate(Marcus, Caesar). Start from its negation:
¬hate(Marcus, Caesar)                                              with (5), Marcus/x2
¬Roman(Marcus) ˅ loyalto(Marcus, Caesar)                           with (3), Marcus/x1
¬Pompeian(Marcus) ˅ loyalto(Marcus, Caesar)                        with (2)
loyalto(Marcus, Caesar)                                            with (7), Marcus/x4, Caesar/y1
¬man(Marcus) ˅ ¬ruler(Caesar) ˅ ¬tryassassinate(Marcus, Caesar)    with (1)
¬ruler(Caesar) ˅ ¬tryassassinate(Marcus, Caesar)                   with (8)
¬ruler(Caesar)                                                     with (4)
empty clause – contradiction; hence hate(Marcus, Caesar) holds.
Using Resolution with Equality & Reduce
Axioms in clause form:
1) man(Marcus)
2) Pompeian(Marcus)
3) Marcus was born in 40 A.D.: born(Marcus, 40)
All men are mortal: ∀x : man(x) → mortal(x)
4) ¬man(x1) ˅ mortal(x1)
5) All Pompeians died when the volcano erupted in 79 A.D.:
erupted(volcano, 79) Ʌ ∀x : [Pompeian(x) → died(x, 79)]
¬Pompeian(x2) ˅ died(x2, 79)
6) erupted(volcano, 79)
7) ¬mortal(x3) ˅ ¬born(x3, t1) ˅ ¬gt(t2 − t1, 150) ˅ dead(x3, t2)
8) now = 1991
9) a) Alive means not dead:
∀x : ∀t : [alive(x,t) → ¬dead(x,t)] Ʌ [¬dead(x,t) → alive(x,t)]
¬alive(x4, t3) ˅ ¬dead(x4, t3)
b) dead(x5, t4) ˅ alive(x5, t4)
10) If someone dies, then he is dead at all later times:
∀x : ∀t1 : ∀t2 : died(x, t1) Ʌ gt(t2, t1) → dead(x, t2)
¬died(x6, t5) ˅ ¬gt(t6, t5) ˅ dead(x6, t6)
Prove: ¬alive(Marcus, now). Start from its negation:
alive(Marcus, now)                       with 9(a), Marcus/x4, now/t3
¬dead(Marcus, now)                       with (10), Marcus/x6, now/t6
¬died(Marcus, t5) ˅ ¬gt(now, t5)         with (5), Marcus/x2, 79/t5
¬Pompeian(Marcus) ˅ ¬gt(now, 79)         substitute equals (now = 1991)
¬Pompeian(Marcus) ˅ ¬gt(1991, 79)        reduce (gt(1991, 79) is true)
¬Pompeian(Marcus)                        with (2)
empty clause – contradiction; hence ¬alive(Marcus, now) holds.