CS344: Introduction to Artificial Intelligence (associated ...pb/cs344-2013/cs344-lect1to6-introduction-and-fuzzy-logic...Artificial intelligence (AI) is the intelligence of machines

CS344: Introduction to Artificial Intelligence

(associated lab: CS386)

Pushpak BhattacharyyaCSE Dept., IIT Bombay

Lecture–1 to 6: Introduction; Fuzzy sets and logic; inverted pendulum

7th, 8th, 9th 14th, 15th , 21st Jan, 2013

Basic Facts Faculty instructor: Dr. Pushpak Bhattacharyya

(www.cse.iitb.ac.in/~pb) TAship: Kashyap, Bibek, Samiulla, Lahari, Jayaprakash, Nikhil,

Kritika and [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]

Course home page www.cse.iitb.ac.in/~cs344-2013 (will be up soon)

Venue: LCC 02, opp KR bldg 1 hour lectures 3 times a week: Mon-10.30, Tue-11.30, Thu-

8.30 (slot 2)

Perspective

AI Perspective (post-web)

Planning

ComputerVision

NLP

ExpertSystems

Robotics

Search, Reasoning,Learning

IR

From WikipediaArtificial intelligence (AI) is the intelligence of machines and the branch of

computer science that aims to create it. Textbooks define the field as "the study and design of intelligent agents"[1] where an intelligent agent is a system that perceives its environment and takes actions that maximize its chances of success.[2] John McCarthy, who coined the term in 1956,[3] defines it as "the science and engineering of making intelligent machines."[4]

The field was founded on the claim that a central property of humans, intelligence—the sapience of Homo sapiens—can be so precisely described that it can be simulated by a machine.[5] This raises philosophical issues about the nature of the mind and limits of scientific hubris, issues which have been addressed by myth, fiction and philosophy since antiquity.[6] Artificial intelligence has been the subject of optimism,[7] but has also suffered setbacks[8] and, today, has become an essential part of the technology industry, providing the heavy lifting for many of the most difficult problems in computer science.[9]

AI research is highly technical and specialized, deeply divided into subfields that often fail to communicate with each other.[10] Subfields have grown up around particular institutions, the work of individual researchers, the solution of specific problems, longstanding differences of opinion about how AI should be done and the application of widely differing tools. The central problems of AI include such traits as reasoning, knowledge, planning, learning, communication, perception and the ability to move and manipulate objects.[11] General intelligence (or "strong AI") is still a long-term goal of (some) research.[12]

Topics to be covered (1/2) Search

General Graph Search, A*, Admissibility, Monotonicity Iterative Deepening, α-β pruning, Application in game playing

Logic Formal System, axioms, inference rules, completeness, soundness and

consistency Propositional Calculus, Predicate Calculus, Fuzzy Logic, Description

Logic, Web Ontology Language Knowledge Representation

Semantic Net, Frame, Script, Conceptual Dependency Machine Learning

Decision Trees, Neural Networks, Support Vector Machines, Self Organization or Unsupervised Learning

Topics to be covered (2/2) Evolutionary Computation

Genetic Algorithm, Swarm Intelligence Probabilistic Methods

Hidden Markov Model, Maximum Entropy Markov Model, Conditional Random Field

IR and AI Modeling User Intention, Ranking of Documents, Query Expansion,

Personalization, User Click Study Planning

Deterministic Planning, Stochastic Methods Man and Machine

Natural Language Processing, Computer Vision, Expert Systems Philosophical Issues

Is AI possible, Cognition, AI and Rationality, Computability and AI, Creativity

Foundational Points

Church Turing Hypothesis Anything that is computable is computable

by a Turing Machine Conversely, the set of functions computed

by a Turing Machine is the set of ALL and ONLY computable functions

Turing MachineFinite State Head (CPU)

Infinite Tape (Memory)

Foundational Points (contd)

Physical Symbol System Hypothesis (Newel and Simon) For Intelligence to emerge it is enough to

manipulate symbols


Society of Mind (Marvin Minsky) Intelligence emerges from the interaction

of very simple information processing units Whole is larger than the sum of parts!


Limits to computability Halting problem: It is impossible to

construct a Universal Turing Machine that given any given pair <M, I> of Turing Machine M and input I, will decide if M halts on I

What this has to do with intelligent computation? Think!


Limits to Automation Godel Theorem: A “sufficiently powerful”

formal system cannot be BOTH complete and consistent

“Sufficiently powerful”: at least as powerful as to be able to capture Peano’s Arithmetic

Sets limits to automation of reasoning


Limits in terms of time and Space NP-complete and NP-hard problems: Time

for computation becomes extremely large as the length of input increases

PSPACE complete: Space requirement becomes extremely large

Sets limits in terms of resources

Two broad divisions of Theoretical CS

Theory A Algorithms and Complexity

Theory B Formal Systems and Logic

AI as the forcing function

Time sharing system in OS Machine giving the illusion of attending

simultaneously with several people Compilers

Raising the level of the machine for better man machine interface

Arose from Natural Language Processing (NLP) NLP in turn called the forcing function for AI

Allied DisciplinesPhilosophy Knowledge Rep., Logic, Foundation of

AI (is AI possible?)Maths Search, Analysis of search algos, logic

Economics Expert Systems, Decision Theory, Principles of Rational Behavior

Psychology Behavioristic insights into AI programs

Brain Science Learning, Neural Nets

Physics Learning, Information Theory & AI, Entropy, Robotics

Computer Sc. & Engg. Systems for AI

Goal of Teaching the course

Concept building: firm grip on foundations, clear ideas

Coverage: grasp of good amount of material, advances

Inspiration: get the spirit of AI, motivation to take up further work

Resources Main Text:

Artificial Intelligence: A Modern Approach by Russell & Norvik, Pearson, 2003.

Other Main References: Principles of AI - Nilsson AI - Rich & Knight Knowledge Based Systems – Mark Stefik

Journals AI, AI Magazine, IEEE Expert, Area Specific Journals e.g, Computational Linguistics

Conferences IJCAI, AAAI

Positively attend lectures!

Grading

Midsem Endsem Paper reading (possibly seminar) Quizzes

Modeling Human Reasoning

Fuzzy Logic

Fuzzy Logic tries to capture the human ability of reasoning with imprecise information

Works with imprecise statements such as:In a process control situation, “If the

temperature is moderate and the pressure is high, then turn the knob slightly right”

The rules have “Linguistic Variables”, typically adjectives qualified by adverbs (adverbs are hedges).

Alternatives to fuzzy logic model human reasoning (1/2)

Non-numerical Non monotonic Logic

Negation by failure (“innocent unless proven guilty”)

Abduction (PQ AND Q gives P)

Modal Logic New operators beyond AND, OR, IMPLIES,

Quantification etc.

Naïve Physics

Abduction Example

Ifthere is rain (P)

Thenthere will be no picnic (Q)

Abductive reasoning:Observation: There was no picnic(Q)Conclude : There was rain(P); in absence of any other evidence

Alternatives to fuzzy logic model human reasoning (2/2)

Numerical Fuzzy Logic Probability Theory

Bayesian Decision Theory

Possibility Theory Uncertainty Factor based on

Dempster Shafer Evidence Theory (e.g. yellow_eyesjaundice; 0.3)

Linguistic Variables Fuzzy sets are named

by Linguistic Variables (typically adjectives).

Underlying the LV is a numerical quantityE.g. For ‘tall’ (LV), ‘height’ is numerical quantity.

Profile of a LV is the plot shown in the figure shown alongside.

µtall(h)

1 2 3 4 5 60

height h

1

0.4

4.5

Example Profiles

µrich(w)

wealth w

µpoor(w)

wealth w

Example Profiles

µA (x)

x

µA (x)

x

Profile representingmoderate (e.g. moderately rich)

Profile representingextreme

Concept of Hedge Hedge is an intensifier Example:

LV = tall, LV1 = very tall, LV2 = somewhat tall

‘very’ operation: µvery tall(x) = µ2

tall(x) ‘somewhat’ operation:µsomewhat tall(x) = √(µtall(x))

1

0h

µtall(h)

somewhat tall tall

very tall

An ExampleControlling an inverted pendulum:

θ dtd /.

= angular velocity

Motor i=current

The goal: To keep the pendulum in vertical position (θ=0)in dynamic equilibrium. Whenever the pendulum departs from vertical, a torque is produced by sending a current ‘i’

Controlling factors for appropriate current

Angle θ, Angular velocity θ.

Some intuitive rules

If θ is +ve small and θ. is –ve small

then current is zero

If θ is +ve small and θ. is +ve small

then current is –ve medium

-ve med

-ve small

Zero

+ve small

+ve med

-ve med

-ve small Zero +ve

small+ve med

+ve med

+ve small

-ve small

-ve med

-ve small

+ve small

Zero

Zero

Zero

Region of interest

Control Matrix

θ. θ

Each cell is a rule of the form

If θ is <> and θ. is <>

then i is <>

4 “Centre rules”

1. if θ = = Zero and θ. = = Zero then i = Zero

2. if θ is +ve small and θ. = = Zero then i is –ve small

3. if θ is –ve small and θ.= = Zero then i is +ve small

4. if θ = = Zero and θ. is +ve small then i is –ve small

5. if θ = = Zero and θ. is –ve small then i is +ve small

Linguistic variables

1. Zero

2. +ve small

3. -ve small

Profiles

-ε +εε2-ε2

-ε3 ε3

+ve small-ve small

1

Quantity (θ, θ., i)

zero

Inference procedure

1. Read actual numerical values of θ and θ.

2. Get the corresponding µ values µZero, µ(+ve small), µ(-ve small). This is called FUZZIFICATION

3. For different rules, get the fuzzy I-values from the R.H.S of the rules.

4. “Collate” by some method and get ONE current value. This is called DEFUZZIFICATION

5. Result is one numerical value of ‘i’.

if θ is Zero and dθ/dt is Zero then i is Zeroif θ is Zero and dθ/dt is +ve small then i is –ve smallif θ is +ve small and dθ/dt is Zero then i is –ve smallif θ +ve small and dθ/dt is +ve small then i is -ve medium

-ε +εε2-ε2

-ε3 ε3

+ve small-ve small

1


zero

Rules Involved

Suppose θ is 1 radian and dθ/dt is 1 rad/secµzero(θ =1)=0.8 (say)Μ+ve-small(θ =1)=0.4 (say)µzero(dθ/dt =1)=0.3 (say)µ+ve-small(dθ/dt =1)=0.7 (say)

-ε +εε2-ε2

-ε3 ε3

+ve small-ve small

1


zero

Fuzzification

1rad

1 rad/sec

Suppose θ is 1 radian and dθ/dt is 1 rad/secµzero(θ =1)=0.8 (say)µ +ve-small(θ =1)=0.4 (say)µzero(dθ/dt =1)=0.3 (say)µ+ve-small(dθ/dt =1)=0.7 (say)

Fuzzification

if θ is Zero and dθ/dt is Zero then i is Zeromin(0.8, 0.3)=0.3

hence µzero(i)=0.3if θ is Zero and dθ/dt is +ve small then i is –ve small

min(0.8, 0.7)=0.7hence µ-ve-small(i)=0.7

if θ is +ve small and dθ/dt is Zero then i is –ve smallmin(0.4, 0.3)=0.3

hence µ-ve-small(i)=0.3if θ +ve small and dθ/dt is +ve small then i is -ve medium

min(0.4, 0.7)=0.4hence µ-ve-medium(i)=0.4

-ε +ε-ε2

-ε3

-ve small

1zero

Finding i

0.4

0.3Possible candidates:

i=0.5 and -0.5 from the “zero” profile and µ=0.3i=-0.1 and -2.5 from the “-ve-small” profile and µ=0.3i=-1.7 and -4.1 from the “-ve-small” profile and µ=0.3

-4.1-2.5

-ve small-ve medium

0.7

-ε +ε

-ve smallzero

Defuzzification: Finding iby the centroid method

Possible candidates:i is the x-coord of the centroid of the areas given by the

blue trapezium, the green trapeziums and the black trapezium

-4.1-2.5

-ve medium

Required i valueCentroid of three trapezoids

Fuzzy Sets

Theory of Fuzzy Sets

Intimate connection between logic and set theory. Given any set ‘S’ and an element ‘e’, there is a very

natural predicate, µs(e) called as the belongingness predicate.

The predicate is such that,µs(e) = 1, iff e ∈ S

= 0, otherwise For example, S = {1, 2, 3, 4}, µs(1) = 1 and µs(5) =

0 A predicate P(x) also defines a set naturally.

S = {x | P(x) is true}For example, even(x) defines S = {x | x is even}

Fuzzy Set Theory (contd.)

Fuzzy set theory starts by questioning the fundamental assumptions of set theory viz., the belongingness predicate, µ, value is 0 or 1.

Instead in Fuzzy theory it is assumed that,µs(e) = [0, 1]

Fuzzy set theory is a generalization of classical set theory aka called Crisp Set Theory.

In real life, belongingness is a fuzzy concept.Example: Let, T = “tallness”

µT (height=6.0ft ) = 1.0µT (height=3.5ft) = 0.2

An individual with height 3.5ft is “tall” with a degree 0.2

Representation of Fuzzy setsLet U = {x1,x2,…..,xn}|U| = n

The various sets composed of elements from U are presented as points on and inside the n-dimensional hypercube. The crisp sets are the corners of the hypercube.

(1,0)(0,0)

(0,1) (1,1)

x1

x2

x1

x2

(x1,x2)

A(0.3,0.4)

µA(x1)=0.3

µA(x2)=0.4

Φ

U={x1,x2}

A fuzzy set A is represented by a point in the n-dimensional space as the point {µA(x1), µA(x2),……µA(xn)}

Degree of fuzzinessThe centre of the hypercube is the most fuzzyset. Fuzziness decreases as one nears the corners

Measure of fuzzinessCalled the entropy of a fuzzy set

),(/),()( farthestSdnearestSdSE

Entropy

Fuzzy set Farthest corner

Nearest corner

(1,0)(0,0)

(0,1) (1,1)

x1

x2

d(A, nearest)

d(A, farthest)

(0.5,0.5)A

Definition

Distance between two fuzzy sets

|)()(|),(21

121 is

n

iis xxSSd

L1 - norm

Let C = fuzzy set represented by the centre point

d(c,nearest) = |0.5-1.0| + |0.5 – 0.0|

= 1

= d(C,farthest)

=> E(C) = 1

Definition

Cardinality of a fuzzy set

n

iis xsm

1)()( (generalization of cardinality of

classical sets)

Union, Intersection, complementation, subset hood

)(1)( xx ssc

Uxxxx ssss )),(),(max()(2121

Uxxxx ssss )),(),(min()(2121

Example of Operations on Fuzzy Set Let us define the following:

Universe U={X1 ,X2 ,X3} Fuzzy sets

A={0.2/X1 , 0.7/X2 , 0.6/X3} and

B={0.7/X1 ,0.3/X2 ,0.5/X3}

Then Cardinality of A and B are computed as follows:Cardinality of A=|A|=0.2+0.7+0.6=1.5Cardinality of B=|B|=0.7+0.3+0.5=1.5

While distance between A and B d(A,B)=|0.2-0.7)+|0.7-0.3|+|0.6-0.5|=1.0What does the cardinality of a fuzzy set mean? In crisp sets it

means the number of elements in the set.

Example of Operations on Fuzzy Set (cntd.)

Universe U={X1 ,X2 ,X3} Fuzzy sets A={0.2/X1 ,0.7/X2 ,0.6/X3} and B={0.7/X1 ,0.3/X2

,0.5/X3}

A U B= {0.7/X1, 0.7/X2, 0.6/X3}

A ∩ B= {0.2/X1, 0.3/X2, 0.5/X3}

Ac = {0.8/X1, 0.3/X2, 0.4/X3}

Laws of Set Theory

• The laws of Crisp set theory also holds for fuzzy set theory (verify them)

• These laws are listed below:– Commutativity: A U B = B U A– Associativity: A U ( B U C )=( A U B ) U C– Distributivity: A U ( B ∩ C )=( A ∩ C ) U ( B ∩ C)

A ∩ ( B U C)=( A U C) ∩( B U C)– De Morgan’s Law: (A U B) C= AC ∩ BC

(A ∩ B) C= AC U BC

Distributivity Property Proof Let Universe U={x1,x2,…xn}

pi =µAU(B∩C)(xi)=max[µA(xi), µ(B∩C)(xi)]= max[µA(xi), min(µB(xi),µC(xi))]

qi =µ(AUB) ∩(AUC)(xi)=min[max(µA(xi), µB(xi)), max(µA(xi), µC(xi))]

Distributivity Property Proof Case I: 0<µC<µB<µA<1

pi = max[µA(xi), min(µB(xi),µC(xi))]= max[µA(xi), µC(xi)]=µA(xi)

qi =min[max(µA(xi), µB(xi)), max(µA(xi), µC(xi))]= min[µA(xi), µA(xi)]=µA(xi)

Case II: 0<µC<µA<µB<1pi = max[µA(xi), min(µB(xi),µC(xi))]

= max[µA(xi), µC(xi)]=µA(xi)qi =min[max(µA(xi), µB(xi)), max(µA(xi), µC(xi))]

= min[µB(xi), µA(xi)]=µA(xi)Prove it for rest of the 4 cases.

Note on definition by extension and intension

S1 = {xi|xi mod 2 = 0 } – Intension

S2 = {0,2,4,6,8,10,………..} – extension

How to define subset hood?

Meaning of fuzzy subset

Suppose, following classical set theory we say

if

Consider the n-hyperspace representation of A and B

AB

xxx AB )()(

(1,1)

(1,0)(0,0)

(0,1)

x1

x2

A. B1

.B2

.B3

Region where )()( xx AB

This effectively means

CRISPLY

P(A) = Power set of A

Eg: Suppose

A = {0,1,0,1,0,1,…………….,0,1} – 104 elements

B = {0,0,0,1,0,1,……………….,0,1} – 104 elements

Isn’t with a degree? (only differs in the 2nd element)

)(APB

AB

Subset operator is the “odd man” out AUB, A∩B, Ac are all “Set Constructors” while

A B is a Boolean Expression or predicate. According to classical logic

In Crisp Set theory A B is defined as x xA xB

So, in fuzzy set theory A B can be defined as x µA(x) µB(x)

Zadeh’s definition of subsethood goes against the grain of fuzziness theory

Another way of defining A B is as follows:

x µA(x) µB(x)

But, these two definitions imply that µP(B)(A)=1where P(B) is the power set of B

Thus, these two definitions violate the fuzzy principle that every belongingness except Universe is fuzzy

Fuzzy definition of subsetMeasured in terms of “fit violation”, i.e. violating the condition

Degree of subset hood S(A,B)= 1- degree of superset

=

m(B) = cardinality of B

=

)()( xx AB

)(

))()(,0max(1

Bm

xxx

AB

x

B x)(

We can show that

Exercise 1:

Show the relationship between entropy and subset hood

Exercise 2:

Prove that

),()( cc AAAASAE

)(/)(),( BmBAmABS

Subset hood of B in A

Fuzzy sets to fuzzy logic

Forms the foundation of fuzzy rule based system or fuzzy expert system

Expert System

Rules are of the form

If

then

Ai

Where Cis are conditions

Eg: C1=Colour of the eye yellow

C2= has fever

C3=high bilurubin

A = hepatitis

nCCC ...........21

In fuzzy logic we have fuzzy predicates

Classical logic

P(x1,x2,x3…..xn) = 0/1

Fuzzy Logic

P(x1,x2,x3…..xn) = [0,1]

Fuzzy OR

Fuzzy AND

Fuzzy NOT

))(),(max()()( yQxPyQxP

))(),(min()()( yQxPyQxP

)(1)(~ xPxP

Fuzzy Implication Many theories have been advanced and many

expressions exist The most used is Lukasiewitz formula t(P) = truth value of a proposition/predicate. In

fuzzy logic t(P) = [0,1] t( ) = min[1,1 -t(P)+t(Q)]QP

Lukasiewitz definition of implication

Fuzzy Inferencing

Two methods of inferencing in classical logic Modus Ponens

Given p and pq, infer q

Modus Tolens Given ~q and pq, infer ~p

How is fuzzy inferencing done?

A look at reasoning

Deduction: p, pq|- q Induction: p1, p2, p3, …|- for_all p Abduction: q, pq|- p Default reasoning: Non-monotonic

reasoning: Negation by failure If something cannot be proven, its

negation is asserted to be true E.g., in Prolog

Fuzzy Modus Ponens in terms of truth values

Given t(p)=1 and t(pq)=1, infer t(q)=1 In fuzzy logic,

given t(p)>=a, 0<=a<=1 and t(p>q)=c, 0<=c<=1 What is t(q)

How much of truth is transferred over the channel

p q

Lukasiewitz formulafor Fuzzy Implication

t(P) = truth value of a proposition/predicate. In fuzzy logic t(P) = [0,1]

t( ) = min[1,1 -t(P)+t(Q)]QP

Lukasiewitz definition of implication

Use Lukasiewitz definition t(pq) = min[1,1 -t(p)+t(q)] We have t(p->q)=c, i.e., min[1,1 -t(p)+t(q)]=c Case 1: c=1 gives 1 -t(p)+t(q)>=1, i.e., t(q)>=a Otherwise, 1 -t(p)+t(q)=c, i.e., t(q)>=c+a-1 Combining, t(q)=max(0,a+c-1) This is the amount of truth transferred over the

channel pq

Two equations consistent

These two equations are consistent with each other

1 2

( , ) 1 ( , )max(0, ( ) ( ))

1 where { , ,..., }( )

( ( ) ( )) min(1,1 ( ( )) ( ( )))

i

i

B i A ix U

nB i

x U

B i A i B i A i

Sub B A Sup B Ax x

U x x xx

t x x t x t x

Proof

Let us consider two crisp sets A and B

1 U

AB

2 U

BA

3 U

A B

4 U

AB

Proof (contd…)

Case I:

So,

( ) 1 only when ( ) 1 , ( ) ( ) 0A i B i B i A ix x So x x

max(0, ( ) ( ))( , ) 1

( )

01 1( )

i

i

i

B i A ix U

B ix U

B ix U

x xSub B A

x

x

Proof (contd…)

Thus, in case I these two equations are consistent with each other (prove for other cases)

( ) ( ) 0( ( ) ( )) min(1,1 ( ( ( )) ( ( ))))

min(1,1 ( )) 1

B i A i

B i A i B i A i

Since x xL t x x t x t x

ve

Proof of the fact that S(B,A) is consistent with crisp set theory

Case II (of the figure):

So,

( ) 1 if, ( ) 1 , ( ) ( ) 0B i A i B i A ix x So x x

max(0, ( ) ( ))( , ) 1

( )i

i

B i A ix U

B ix U

x xSub B A

x

( ) ( ) 0( )( , ) 1 1 ,( )

0 ( , ) 1

B i A i

A i

B i

x xxSub B Ax

Sub B A

Proof (contd…)

Thus, in case II also, these two equations are consistent with each other.

Proof of case III

Case III:

So,

In This case, ( ) ( ) 0 for ( )and ( ) ( ) 0 otherwise

B i A i i

B i A i

x x x B Ax x

max(0, ( ) ( ))( , ) 1

( )i

i

B i A ix U

B ix U

x xSub B A

x

Proof (contd…)( ) ( ) 0 only for ( )

| | | |( , ) 1| | | |

B i A i ix x x B AB A A BS B A

B B

Hence case III is also consistent across classical set theory and fuzzy set theory

Proof of case IV Case IV:

So,

In This case, ( ) ( ) 1 for , ( ) ( ) 1 for

( ) ( ) 0 otherwise

B i A i i

B i A i i

B i A i

x x x Ax x x B

and x x

max(0, ( ) ( ))( , ) 1

( )i

i

B i A ix U

B ix U

x xSub B A

x

Proof (contd…)( ) ( ) 1 only for

<=0 otherwise,| |( , ) 1 0| |

B i A i ix x x Band

BS B AB

Thus, in case IV also, these two equations are consistent with each other.

Hence we can say that these two equations are consistent with each other in general.