Upload
others
View
7
Download
0
Embed Size (px)
Citation preview
CS344: Introduction to Artificial Intelligence
(associated lab: CS386)
Pushpak BhattacharyyaCSE Dept., IIT Bombay
Lecture–1 to 6: Introduction; Fuzzy sets and logic; inverted pendulum
7th, 8th, 9th 14th, 15th , 21st Jan, 2013
Basic Facts Faculty instructor: Dr. Pushpak Bhattacharyya
(www.cse.iitb.ac.in/~pb) TAship: Kashyap, Bibek, Samiulla, Lahari, Jayaprakash, Nikhil,
Kritika and [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
Course home page www.cse.iitb.ac.in/~cs344-2013 (will be up soon)
Venue: LCC 02, opp KR bldg 1 hour lectures 3 times a week: Mon-10.30, Tue-11.30, Thu-
8.30 (slot 2)
Perspective
AI Perspective (post-web)
Planning
ComputerVision
NLP
ExpertSystems
Robotics
Search, Reasoning,Learning
IR
From WikipediaArtificial intelligence (AI) is the intelligence of machines and the branch of
computer science that aims to create it. Textbooks define the field as "the study and design of intelligent agents"[1] where an intelligent agent is a system that perceives its environment and takes actions that maximize its chances of success.[2] John McCarthy, who coined the term in 1956,[3] defines it as "the science and engineering of making intelligent machines."[4]
The field was founded on the claim that a central property of humans, intelligence—the sapience of Homo sapiens—can be so precisely described that it can be simulated by a machine.[5] This raises philosophical issues about the nature of the mind and limits of scientific hubris, issues which have been addressed by myth, fiction and philosophy since antiquity.[6] Artificial intelligence has been the subject of optimism,[7] but has also suffered setbacks[8] and, today, has become an essential part of the technology industry, providing the heavy lifting for many of the most difficult problems in computer science.[9]
AI research is highly technical and specialized, deeply divided into subfields that often fail to communicate with each other.[10] Subfields have grown up around particular institutions, the work of individual researchers, the solution of specific problems, longstanding differences of opinion about how AI should be done and the application of widely differing tools. The central problems of AI include such traits as reasoning, knowledge, planning, learning, communication, perception and the ability to move and manipulate objects.[11] General intelligence (or "strong AI") is still a long-term goal of (some) research.[12]
Topics to be covered (1/2) Search
General Graph Search, A*, Admissibility, Monotonicity Iterative Deepening, α-β pruning, Application in game playing
Logic Formal System, axioms, inference rules, completeness, soundness and
consistency Propositional Calculus, Predicate Calculus, Fuzzy Logic, Description
Logic, Web Ontology Language Knowledge Representation
Semantic Net, Frame, Script, Conceptual Dependency Machine Learning
Decision Trees, Neural Networks, Support Vector Machines, Self Organization or Unsupervised Learning
Topics to be covered (2/2) Evolutionary Computation
Genetic Algorithm, Swarm Intelligence Probabilistic Methods
Hidden Markov Model, Maximum Entropy Markov Model, Conditional Random Field
IR and AI Modeling User Intention, Ranking of Documents, Query Expansion,
Personalization, User Click Study Planning
Deterministic Planning, Stochastic Methods Man and Machine
Natural Language Processing, Computer Vision, Expert Systems Philosophical Issues
Is AI possible, Cognition, AI and Rationality, Computability and AI, Creativity
Foundational Points
Church Turing Hypothesis Anything that is computable is computable
by a Turing Machine Conversely, the set of functions computed
by a Turing Machine is the set of ALL and ONLY computable functions
Turing MachineFinite State Head (CPU)
Infinite Tape (Memory)
Foundational Points (contd)
Physical Symbol System Hypothesis (Newel and Simon) For Intelligence to emerge it is enough to
manipulate symbols
Foundational Points (contd)
Society of Mind (Marvin Minsky) Intelligence emerges from the interaction
of very simple information processing units Whole is larger than the sum of parts!
Foundational Points (contd)
Limits to computability Halting problem: It is impossible to
construct a Universal Turing Machine that given any given pair <M, I> of Turing Machine M and input I, will decide if M halts on I
What this has to do with intelligent computation? Think!
Foundational Points (contd)
Limits to Automation Godel Theorem: A “sufficiently powerful”
formal system cannot be BOTH complete and consistent
“Sufficiently powerful”: at least as powerful as to be able to capture Peano’s Arithmetic
Sets limits to automation of reasoning
Foundational Points (contd)
Limits in terms of time and Space NP-complete and NP-hard problems: Time
for computation becomes extremely large as the length of input increases
PSPACE complete: Space requirement becomes extremely large
Sets limits in terms of resources
Two broad divisions of Theoretical CS
Theory A Algorithms and Complexity
Theory B Formal Systems and Logic
AI as the forcing function
Time sharing system in OS Machine giving the illusion of attending
simultaneously with several people Compilers
Raising the level of the machine for better man machine interface
Arose from Natural Language Processing (NLP) NLP in turn called the forcing function for AI
Allied DisciplinesPhilosophy Knowledge Rep., Logic, Foundation of
AI (is AI possible?)Maths Search, Analysis of search algos, logic
Economics Expert Systems, Decision Theory, Principles of Rational Behavior
Psychology Behavioristic insights into AI programs
Brain Science Learning, Neural Nets
Physics Learning, Information Theory & AI, Entropy, Robotics
Computer Sc. & Engg. Systems for AI
Goal of Teaching the course
Concept building: firm grip on foundations, clear ideas
Coverage: grasp of good amount of material, advances
Inspiration: get the spirit of AI, motivation to take up further work
Resources Main Text:
Artificial Intelligence: A Modern Approach by Russell & Norvik, Pearson, 2003.
Other Main References: Principles of AI - Nilsson AI - Rich & Knight Knowledge Based Systems – Mark Stefik
Journals AI, AI Magazine, IEEE Expert, Area Specific Journals e.g, Computational Linguistics
Conferences IJCAI, AAAI
Positively attend lectures!
Grading
Midsem Endsem Paper reading (possibly seminar) Quizzes
Modeling Human Reasoning
Fuzzy Logic
Fuzzy Logic tries to capture the human ability of reasoning with imprecise information
Works with imprecise statements such as:In a process control situation, “If the
temperature is moderate and the pressure is high, then turn the knob slightly right”
The rules have “Linguistic Variables”, typically adjectives qualified by adverbs (adverbs are hedges).
Alternatives to fuzzy logic model human reasoning (1/2)
Non-numerical Non monotonic Logic
Negation by failure (“innocent unless proven guilty”)
Abduction (PQ AND Q gives P)
Modal Logic New operators beyond AND, OR, IMPLIES,
Quantification etc.
Naïve Physics
Abduction Example
Ifthere is rain (P)
Thenthere will be no picnic (Q)
Abductive reasoning:Observation: There was no picnic(Q)Conclude : There was rain(P); in absence of any other evidence
Alternatives to fuzzy logic model human reasoning (2/2)
Numerical Fuzzy Logic Probability Theory
Bayesian Decision Theory
Possibility Theory Uncertainty Factor based on
Dempster Shafer Evidence Theory (e.g. yellow_eyesjaundice; 0.3)
Linguistic Variables Fuzzy sets are named
by Linguistic Variables (typically adjectives).
Underlying the LV is a numerical quantityE.g. For ‘tall’ (LV), ‘height’ is numerical quantity.
Profile of a LV is the plot shown in the figure shown alongside.
µtall(h)
1 2 3 4 5 60
height h
1
0.4
4.5
Example Profiles
µrich(w)
wealth w
µpoor(w)
wealth w
Example Profiles
µA (x)
x
µA (x)
x
Profile representingmoderate (e.g. moderately rich)
Profile representingextreme
Concept of Hedge Hedge is an intensifier Example:
LV = tall, LV1 = very tall, LV2 = somewhat tall
‘very’ operation: µvery tall(x) = µ2
tall(x) ‘somewhat’ operation:µsomewhat tall(x) = √(µtall(x))
1
0h
µtall(h)
somewhat tall tall
very tall
An ExampleControlling an inverted pendulum:
θ dtd /.
= angular velocity
Motor i=current
The goal: To keep the pendulum in vertical position (θ=0)in dynamic equilibrium. Whenever the pendulum departs from vertical, a torque is produced by sending a current ‘i’
Controlling factors for appropriate current
Angle θ, Angular velocity θ.
Some intuitive rules
If θ is +ve small and θ. is –ve small
then current is zero
If θ is +ve small and θ. is +ve small
then current is –ve medium
-ve med
-ve small
Zero
+ve small
+ve med
-ve med
-ve small Zero +ve
small+ve med
+ve med
+ve small
-ve small
-ve med
-ve small
+ve small
Zero
Zero
Zero
Region of interest
Control Matrix
θ. θ
Each cell is a rule of the form
If θ is <> and θ. is <>
then i is <>
4 “Centre rules”
1. if θ = = Zero and θ. = = Zero then i = Zero
2. if θ is +ve small and θ. = = Zero then i is –ve small
3. if θ is –ve small and θ.= = Zero then i is +ve small
4. if θ = = Zero and θ. is +ve small then i is –ve small
5. if θ = = Zero and θ. is –ve small then i is +ve small
Linguistic variables
1. Zero
2. +ve small
3. -ve small
Profiles
-ε +εε2-ε2
-ε3 ε3
+ve small-ve small
1
Quantity (θ, θ., i)
zero
Inference procedure
1. Read actual numerical values of θ and θ.
2. Get the corresponding µ values µZero, µ(+ve small), µ(-ve small). This is called FUZZIFICATION
3. For different rules, get the fuzzy I-values from the R.H.S of the rules.
4. “Collate” by some method and get ONE current value. This is called DEFUZZIFICATION
5. Result is one numerical value of ‘i’.
if θ is Zero and dθ/dt is Zero then i is Zeroif θ is Zero and dθ/dt is +ve small then i is –ve smallif θ is +ve small and dθ/dt is Zero then i is –ve smallif θ +ve small and dθ/dt is +ve small then i is -ve medium
-ε +εε2-ε2
-ε3 ε3
+ve small-ve small
1
Quantity (θ, θ., i)
zero
Rules Involved
Suppose θ is 1 radian and dθ/dt is 1 rad/secµzero(θ =1)=0.8 (say)Μ+ve-small(θ =1)=0.4 (say)µzero(dθ/dt =1)=0.3 (say)µ+ve-small(dθ/dt =1)=0.7 (say)
-ε +εε2-ε2
-ε3 ε3
+ve small-ve small
1
Quantity (θ, θ., i)
zero
Fuzzification
1rad
1 rad/sec
Suppose θ is 1 radian and dθ/dt is 1 rad/secµzero(θ =1)=0.8 (say)µ +ve-small(θ =1)=0.4 (say)µzero(dθ/dt =1)=0.3 (say)µ+ve-small(dθ/dt =1)=0.7 (say)
Fuzzification
if θ is Zero and dθ/dt is Zero then i is Zeromin(0.8, 0.3)=0.3
hence µzero(i)=0.3if θ is Zero and dθ/dt is +ve small then i is –ve small
min(0.8, 0.7)=0.7hence µ-ve-small(i)=0.7
if θ is +ve small and dθ/dt is Zero then i is –ve smallmin(0.4, 0.3)=0.3
hence µ-ve-small(i)=0.3if θ +ve small and dθ/dt is +ve small then i is -ve medium
min(0.4, 0.7)=0.4hence µ-ve-medium(i)=0.4
-ε +ε-ε2
-ε3
-ve small
1zero
Finding i
0.4
0.3Possible candidates:
i=0.5 and -0.5 from the “zero” profile and µ=0.3i=-0.1 and -2.5 from the “-ve-small” profile and µ=0.3i=-1.7 and -4.1 from the “-ve-small” profile and µ=0.3
-4.1-2.5
-ve small-ve medium
0.7
-ε +ε
-ve smallzero
Defuzzification: Finding iby the centroid method
Possible candidates:i is the x-coord of the centroid of the areas given by the
blue trapezium, the green trapeziums and the black trapezium
-4.1-2.5
-ve medium
Required i valueCentroid of three trapezoids
Fuzzy Sets
Theory of Fuzzy Sets
Intimate connection between logic and set theory. Given any set ‘S’ and an element ‘e’, there is a very
natural predicate, µs(e) called as the belongingness predicate.
The predicate is such that,µs(e) = 1, iff e ∈ S
= 0, otherwise For example, S = {1, 2, 3, 4}, µs(1) = 1 and µs(5) =
0 A predicate P(x) also defines a set naturally.
S = {x | P(x) is true}For example, even(x) defines S = {x | x is even}
Fuzzy Set Theory (contd.)
Fuzzy set theory starts by questioning the fundamental assumptions of set theory viz., the belongingness predicate, µ, value is 0 or 1.
Instead in Fuzzy theory it is assumed that,µs(e) = [0, 1]
Fuzzy set theory is a generalization of classical set theory aka called Crisp Set Theory.
In real life, belongingness is a fuzzy concept.Example: Let, T = “tallness”
µT (height=6.0ft ) = 1.0µT (height=3.5ft) = 0.2
An individual with height 3.5ft is “tall” with a degree 0.2
Representation of Fuzzy setsLet U = {x1,x2,…..,xn}|U| = n
The various sets composed of elements from U are presented as points on and inside the n-dimensional hypercube. The crisp sets are the corners of the hypercube.
(1,0)(0,0)
(0,1) (1,1)
x1
x2
x1
x2
(x1,x2)
A(0.3,0.4)
µA(x1)=0.3
µA(x2)=0.4
Φ
U={x1,x2}
A fuzzy set A is represented by a point in the n-dimensional space as the point {µA(x1), µA(x2),……µA(xn)}
Degree of fuzzinessThe centre of the hypercube is the most fuzzyset. Fuzziness decreases as one nears the corners
Measure of fuzzinessCalled the entropy of a fuzzy set
),(/),()( farthestSdnearestSdSE
Entropy
Fuzzy set Farthest corner
Nearest corner
(1,0)(0,0)
(0,1) (1,1)
x1
x2
d(A, nearest)
d(A, farthest)
(0.5,0.5)A
Definition
Distance between two fuzzy sets
|)()(|),(21
121 is
n
iis xxSSd
L1 - norm
Let C = fuzzy set represented by the centre point
d(c,nearest) = |0.5-1.0| + |0.5 – 0.0|
= 1
= d(C,farthest)
=> E(C) = 1
Definition
Cardinality of a fuzzy set
n
iis xsm
1)()( (generalization of cardinality of
classical sets)
Union, Intersection, complementation, subset hood
)(1)( xx ssc
Uxxxx ssss )),(),(max()(2121
Uxxxx ssss )),(),(min()(2121
Example of Operations on Fuzzy Set Let us define the following:
Universe U={X1 ,X2 ,X3} Fuzzy sets
A={0.2/X1 , 0.7/X2 , 0.6/X3} and
B={0.7/X1 ,0.3/X2 ,0.5/X3}
Then Cardinality of A and B are computed as follows:Cardinality of A=|A|=0.2+0.7+0.6=1.5Cardinality of B=|B|=0.7+0.3+0.5=1.5
While distance between A and B d(A,B)=|0.2-0.7)+|0.7-0.3|+|0.6-0.5|=1.0What does the cardinality of a fuzzy set mean? In crisp sets it
means the number of elements in the set.
Example of Operations on Fuzzy Set (cntd.)
Universe U={X1 ,X2 ,X3} Fuzzy sets A={0.2/X1 ,0.7/X2 ,0.6/X3} and B={0.7/X1 ,0.3/X2
,0.5/X3}
A U B= {0.7/X1, 0.7/X2, 0.6/X3}
A ∩ B= {0.2/X1, 0.3/X2, 0.5/X3}
Ac = {0.8/X1, 0.3/X2, 0.4/X3}
Laws of Set Theory
• The laws of Crisp set theory also holds for fuzzy set theory (verify them)
• These laws are listed below:– Commutativity: A U B = B U A– Associativity: A U ( B U C )=( A U B ) U C– Distributivity: A U ( B ∩ C )=( A ∩ C ) U ( B ∩ C)
A ∩ ( B U C)=( A U C) ∩( B U C)– De Morgan’s Law: (A U B) C= AC ∩ BC
(A ∩ B) C= AC U BC
Distributivity Property Proof Let Universe U={x1,x2,…xn}
pi =µAU(B∩C)(xi)=max[µA(xi), µ(B∩C)(xi)]= max[µA(xi), min(µB(xi),µC(xi))]
qi =µ(AUB) ∩(AUC)(xi)=min[max(µA(xi), µB(xi)), max(µA(xi), µC(xi))]
Distributivity Property Proof Case I: 0<µC<µB<µA<1
pi = max[µA(xi), min(µB(xi),µC(xi))]= max[µA(xi), µC(xi)]=µA(xi)
qi =min[max(µA(xi), µB(xi)), max(µA(xi), µC(xi))]= min[µA(xi), µA(xi)]=µA(xi)
Case II: 0<µC<µA<µB<1pi = max[µA(xi), min(µB(xi),µC(xi))]
= max[µA(xi), µC(xi)]=µA(xi)qi =min[max(µA(xi), µB(xi)), max(µA(xi), µC(xi))]
= min[µB(xi), µA(xi)]=µA(xi)Prove it for rest of the 4 cases.
Note on definition by extension and intension
S1 = {xi|xi mod 2 = 0 } – Intension
S2 = {0,2,4,6,8,10,………..} – extension
How to define subset hood?
Meaning of fuzzy subset
Suppose, following classical set theory we say
if
Consider the n-hyperspace representation of A and B
AB
xxx AB )()(
(1,1)
(1,0)(0,0)
(0,1)
x1
x2
A. B1
.B2
.B3
Region where )()( xx AB
This effectively means
CRISPLY
P(A) = Power set of A
Eg: Suppose
A = {0,1,0,1,0,1,…………….,0,1} – 104 elements
B = {0,0,0,1,0,1,……………….,0,1} – 104 elements
Isn’t with a degree? (only differs in the 2nd element)
)(APB
AB
Subset operator is the “odd man” out AUB, A∩B, Ac are all “Set Constructors” while
A B is a Boolean Expression or predicate. According to classical logic
In Crisp Set theory A B is defined as x xA xB
So, in fuzzy set theory A B can be defined as x µA(x) µB(x)
Zadeh’s definition of subsethood goes against the grain of fuzziness theory
Another way of defining A B is as follows:
x µA(x) µB(x)
But, these two definitions imply that µP(B)(A)=1where P(B) is the power set of B
Thus, these two definitions violate the fuzzy principle that every belongingness except Universe is fuzzy
Fuzzy definition of subsetMeasured in terms of “fit violation”, i.e. violating the condition
Degree of subset hood S(A,B)= 1- degree of superset
=
m(B) = cardinality of B
=
)()( xx AB
)(
))()(,0max(1
Bm
xxx
AB
x
B x)(
We can show that
Exercise 1:
Show the relationship between entropy and subset hood
Exercise 2:
Prove that
),()( cc AAAASAE
)(/)(),( BmBAmABS
Subset hood of B in A
Fuzzy sets to fuzzy logic
Forms the foundation of fuzzy rule based system or fuzzy expert system
Expert System
Rules are of the form
If
then
Ai
Where Cis are conditions
Eg: C1=Colour of the eye yellow
C2= has fever
C3=high bilurubin
A = hepatitis
nCCC ...........21
In fuzzy logic we have fuzzy predicates
Classical logic
P(x1,x2,x3…..xn) = 0/1
Fuzzy Logic
P(x1,x2,x3…..xn) = [0,1]
Fuzzy OR
Fuzzy AND
Fuzzy NOT
))(),(max()()( yQxPyQxP
))(),(min()()( yQxPyQxP
)(1)(~ xPxP
Fuzzy Implication Many theories have been advanced and many
expressions exist The most used is Lukasiewitz formula t(P) = truth value of a proposition/predicate. In
fuzzy logic t(P) = [0,1] t( ) = min[1,1 -t(P)+t(Q)]QP
Lukasiewitz definition of implication
Fuzzy Inferencing
Two methods of inferencing in classical logic Modus Ponens
Given p and pq, infer q
Modus Tolens Given ~q and pq, infer ~p
How is fuzzy inferencing done?
A look at reasoning
Deduction: p, pq|- q Induction: p1, p2, p3, …|- for_all p Abduction: q, pq|- p Default reasoning: Non-monotonic
reasoning: Negation by failure If something cannot be proven, its
negation is asserted to be true E.g., in Prolog
Fuzzy Modus Ponens in terms of truth values
Given t(p)=1 and t(pq)=1, infer t(q)=1 In fuzzy logic,
given t(p)>=a, 0<=a<=1 and t(p>q)=c, 0<=c<=1 What is t(q)
How much of truth is transferred over the channel
p q
Lukasiewitz formulafor Fuzzy Implication
t(P) = truth value of a proposition/predicate. In fuzzy logic t(P) = [0,1]
t( ) = min[1,1 -t(P)+t(Q)]QP
Lukasiewitz definition of implication
Use Lukasiewitz definition t(pq) = min[1,1 -t(p)+t(q)] We have t(p->q)=c, i.e., min[1,1 -t(p)+t(q)]=c Case 1: c=1 gives 1 -t(p)+t(q)>=1, i.e., t(q)>=a Otherwise, 1 -t(p)+t(q)=c, i.e., t(q)>=c+a-1 Combining, t(q)=max(0,a+c-1) This is the amount of truth transferred over the
channel pq
Two equations consistent
These two equations are consistent with each other
1 2
( , ) 1 ( , )max(0, ( ) ( ))
1 where { , ,..., }( )
( ( ) ( )) min(1,1 ( ( )) ( ( )))
i
i
B i A ix U
nB i
x U
B i A i B i A i
Sub B A Sup B Ax x
U x x xx
t x x t x t x
Proof
Let us consider two crisp sets A and B
1 U
AB
2 U
BA
3 U
A B
4 U
AB
Proof (contd…)
Case I:
So,
( ) 1 only when ( ) 1 , ( ) ( ) 0A i B i B i A ix x So x x
max(0, ( ) ( ))( , ) 1
( )
01 1( )
i
i
i
B i A ix U
B ix U
B ix U
x xSub B A
x
x
Proof (contd…)
Thus, in case I these two equations are consistent with each other (prove for other cases)
( ) ( ) 0( ( ) ( )) min(1,1 ( ( ( )) ( ( ))))
min(1,1 ( )) 1
B i A i
B i A i B i A i
Since x xL t x x t x t x
ve
Proof of the fact that S(B,A) is consistent with crisp set theory
Case II (of the figure):
So,
( ) 1 if, ( ) 1 , ( ) ( ) 0B i A i B i A ix x So x x
max(0, ( ) ( ))( , ) 1
( )i
i
B i A ix U
B ix U
x xSub B A
x
( ) ( ) 0( )( , ) 1 1 ,( )
0 ( , ) 1
B i A i
A i
B i
x xxSub B Ax
Sub B A
Proof (contd…)
Thus, in case II also, these two equations are consistent with each other.
Proof of case III
Case III:
So,
In This case, ( ) ( ) 0 for ( )and ( ) ( ) 0 otherwise
B i A i i
B i A i
x x x B Ax x
max(0, ( ) ( ))( , ) 1
( )i
i
B i A ix U
B ix U
x xSub B A
x
Proof (contd…)( ) ( ) 0 only for ( )
| | | |( , ) 1| | | |
B i A i ix x x B AB A A BS B A
B B
Hence case III is also consistent across classical set theory and fuzzy set theory
Proof of case IV Case IV:
So,
In This case, ( ) ( ) 1 for , ( ) ( ) 1 for
( ) ( ) 0 otherwise
B i A i i
B i A i i
B i A i
x x x Ax x x B
and x x
max(0, ( ) ( ))( , ) 1
( )i
i
B i A ix U
B ix U
x xSub B A
x
Proof (contd…)( ) ( ) 1 only for
<=0 otherwise,| |( , ) 1 0| |
B i A i ix x x Band
BS B AB
Thus, in case IV also, these two equations are consistent with each other.
Hence we can say that these two equations are consistent with each other in general.