Chapter 17: Learning



  • 8/8/2019 Chapter17 Learning

    1/30

    Learning

    Chapter 17: Rich & Knight


    Learning

    What is Learning?

    Rote learning

    Learning by taking advice

    Learning in problem solving

    Learning from examples

    Induction

    Explanation based learning

    Discovery

    Analogy

    Formal learning theory

    Neural net learning and genetic learning


    What is Learning?

    One of the most often heard criticisms of AI is that machines cannot be called intelligent until they are able to learn to do new things and adapt to new situations, rather than simply doing as they are told to do.

    Some critics of AI have been saying that computers cannot learn!

    Definition of learning: changes in the system that are adaptive in the sense that they enable the system to do the same task, or tasks drawn from the same population, more efficiently and more effectively the next time.

    Learning covers a wide range of phenomena:

    Skill refinement: practice makes skills improve. The more you play tennis, the better you get.

    Knowledge acquisition: knowledge is generally acquired through experience.


    Various learning mechanisms

    Simple storing of computed information, or rote learning, is the most basic learning activity. Many computer programs, e.g. database systems, can be said to learn in this sense, although most people would not call such simple storage learning.

    Another way we learn is through taking advice from others. Advice taking is similar to rote learning, but high-level advice may not be in a form simple enough for a program to use directly in problem solving.

    People also learn through their own problem-solving experience.

    Learning from examples: we often learn to classify things in the world without being given explicit rules. Learning from examples usually involves a teacher who helps us classify things by correcting us when we are wrong.


    Rote Learning

    When a computer stores a piece of data, it is performing a rudimentary form of learning.

    In the case of data caching, we store computed values so that we do not have to recompute them later.

    When computation is more expensive than recall, this strategy can save a significant amount of time.

    Caching has been used in AI programs to produce some surprising performance improvements. Such caching is known as rote learning.

    Rote learning does not involve any sophisticated problem-solving capabilities.

    It shows the need for some capabilities required of complex learning systems, such as:

    Organized storage of information

    Generalization
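Rote learning as caching can be sketched in a few lines; the cache dictionary below plays the role of the "organized storage of information" mentioned above (the function names are illustrative):

```python
# Rote learning as caching: a value is computed once, stored, and then
# recalled instead of recomputed. Function names are illustrative.
def make_rote_learner(f):
    cache = {}                        # organized storage of results

    def learned_f(x):
        if x not in cache:            # not yet learned: compute and store
            cache[x] = f(x)
        return cache[x]               # learned: pure recall
    return learned_f

call_count = 0

def expensive_square(x):
    """Stand-in for a computation more expensive than a table lookup."""
    global call_count
    call_count += 1
    return x * x

square = make_rote_learner(expensive_square)
first, second = square(12), square(12)   # second call hits the cache
```

When recall is cheaper than computation, the second call costs only a dictionary lookup.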


    Learning by taking Advice

    A computer can do very little without a program for it to run.

    When a programmer writes a series of instructions into a computer, a rudimentary kind of learning is taking place: the programmer is a sort of teacher and the computer is a sort of student.

    After being programmed, the computer is now able to do something it previously could not.

    Executing a program may not be such a simple matter. Suppose the program is written in a high-level language such as Prolog; some interpreter or compiler must intervene to change the teacher's instructions into code that the machine can execute directly.

    People process advice in an analogous way. In chess, the advice "fight for control of the center of the board" is useless unless the player can translate the advice into concrete moves and plans. A computer program might make use of the advice by adjusting its static evaluation function to include a factor based on the number of center squares attacked by its own pieces.


    Learning by advice

    FOO is a program that accepts advice for playing hearts, a card game. A human user first translates the advice from English into a representation that FOO can understand.

    A human can watch FOO play, detect new mistakes, and correct them through yet more advice, such as "play high cards when it is safe to do so".

    The ability to operationalize knowledge is critical for systems that learn from a teacher's advice.


    Learning In Problem solving

    Can a program get better without the aid of a teacher?

    It can, by generalizing from its own experiences.


    Learning by Parameter Adjustment

    Many programs rely on an evaluation procedure that combines information from several sources into a single summary statistic.

    Game-playing programs do this in their static evaluation functions, in which a variety of factors such as piece advantage and mobility are combined into a single score reflecting the desirability of a particular board position.

    Pattern classification programs often combine several features to determine the correct category into which a given stimulus should be placed.

    In designing such programs, it is often difficult to know a priori how much weight should be attached to each feature being used.

    One way of finding the correct weights is to begin with some estimate of the correct settings and then to let the program modify the settings on the basis of its experience.

    Features that appear to be good predictors of overall success will have their weights increased, while those that do not will have their weights decreased.

    Samuel's checkers program uses a static evaluation function in polynomial form: c1t1 + c2t2 + ... + c16t16

    The t terms are the values of the sixteen features that contribute to the evaluation.

    The c terms are the coefficients that are attached to each of these values. As learning progresses, the c values will change.
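Samuel-style parameter adjustment can be sketched as follows; three features stand in for Samuel's sixteen, and the feature values, outcome and learning rate are all illustrative, not the checkers program's actual settings:

```python
# Samuel-style parameter adjustment on an evaluation polynomial
# c1*t1 + ... + cn*tn: weights of features that predicted the observed
# outcome are raised, the others lowered, after each experience.
def evaluate(coeffs, features):
    return sum(c * t for c, t in zip(coeffs, features))

def adjust(coeffs, features, observed, rate=0.01):
    """Nudge each coefficient toward reducing the prediction error."""
    error = observed - evaluate(coeffs, features)
    return [c + rate * error * t for c, t in zip(coeffs, features)]

coeffs = [0.0, 0.0, 0.0]
# one training experience: feature values from a position that turned out well
features, outcome = [1.0, 2.0, 0.5], 1.0
for _ in range(200):
    coeffs = adjust(coeffs, features, outcome)
# the c values have changed so the evaluation now predicts the outcome
```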


    Learning by Macro-operators

    Sequences of actions that can be treated as a whole are called macro-operators.

    Example: suppose you are faced with the problem of getting to the downtown post office. Your solution may involve getting in your car, starting it, and driving along a certain route. Substantial planning may go into choosing the appropriate route, but you need not plan about how to go about starting the car. You are free to treat START-CAR as an atomic action, even though it really consists of several actions: sitting down, adjusting the mirror, inserting the key, and turning the key.

    Macro-operators were used in the early problem-solving system STRIPS. After each problem-solving episode, the learning component takes the computed plan and stores it away as a macro-operator, or MACROP.

    A MACROP is just like a regular operator, except that it consists of a sequence of actions, not just a single one.
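A MACROP can be sketched as an operator that stores and replays a whole action sequence computed in an earlier episode; the action names follow the START-CAR example above, and this is an illustration, not STRIPS syntax:

```python
# A MACROP sketched as an operator whose body is a stored sequence of
# primitive actions (the START-CAR example from the text; the action
# names are illustrative, not the STRIPS implementation).
class MacroOperator:
    def __init__(self, name, actions):
        self.name = name
        self.actions = list(actions)   # the plan from an earlier episode

    def expand(self):
        """Replay the stored sequence as if it were one atomic action."""
        return list(self.actions)

start_car = MacroOperator(
    "START-CAR",
    ["sit-down", "adjust-mirror", "insert-key", "turn-key"],
)
# later planning treats START-CAR as a single step, then expands it
plan = ["walk-to-car"] + start_car.expand() + ["drive-route"]
```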


    Learning by Chunking

    Chunking is a process similar in flavor to macro-operators.

    The idea of chunking comes from the psychological literature on memory and problem solving. Its computational basis is in production systems.

    When a system detects a useful sequence of production firings, it creates a chunk, which is essentially a large production that does the work of an entire sequence of smaller ones.

    SOAR is an example of a production system which uses chunking.

    Chunks learned during the initial stages of solving a problem are applicable in the later stages of the same problem-solving episode. After a solution is found, the chunks remain in memory, ready for use in the next problem.

    At present, chunking is inadequate for duplicating the contents of large directly-computed macro-operator tables.


    The Utility Problem

    While new search-control knowledge can be of great benefit in solving future problems efficiently, there are also some drawbacks.

    The learned control rules can take up large amounts of memory, and the search program must take the time to consider each rule at each step during problem solving.

    Considering a control rule amounts to seeing if its postconditions are desirable and seeing if its preconditions are satisfied. This is a time-consuming process.

    While learned rules may reduce problem-solving time by directing the search more carefully, they may also increase problem-solving time by forcing the problem solver to consider them.

    If we only want to minimize the number of node expansions in the search space, then the more control rules we learn, the better. But if we want to minimize the total CPU time required to solve a problem, we must consider this trade-off.


    Learning from Examples: Induction

    Classification is the process of assigning, to a particular input, the name of a class to which it belongs.

    The classes from which the classification procedure can choose can be described in a variety of ways.

    Their definition will depend on the use to which they are put.

    Classification is an important component of many problem-solving tasks.

    Before classification can be done, the classes it will use must be defined:

    Isolate a set of features that are relevant to the task domain. Define each class by a weighted sum of values of these features. Ex: if the task is weather prediction, the parameters can be measurements such as rainfall, location of cold fronts, etc.

    Isolate a set of features that are relevant to the task domain. Define each class as a structure composed of these features. Ex: in classifying animals, various features can be such things as color, length of neck, etc.

    The idea of producing a classification program that can evolve its own class definitions is called concept learning or induction.
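The first class-definition style above (a weighted sum of feature values) can be sketched like this; the two weather features and all the weights are invented for illustration:

```python
# Classes defined by weighted sums of feature values: the input is
# assigned to the class whose score is highest. The features (rainfall
# reading, cold-front count) and weights are made-up illustrations.
def classify(feature_values, class_weights):
    scores = {
        name: sum(w * v for w, v in zip(weights, feature_values))
        for name, weights in class_weights.items()
    }
    return max(scores, key=scores.get)

class_weights = {
    "rain":  [0.9, 0.6],
    "clear": [-0.5, -0.2],
}
forecast = classify([3.0, 1.0], class_weights)   # high rainfall reading
```

Concept learning, in these terms, is the problem of evolving the weight table itself rather than having a designer fix it a priori.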


    Winston's Learning Program

    An early structural concept-learning program.

    This program operates in a simple blocks-world domain.

    Its goal was to construct representations of the definitions of concepts in the blocks domain.

    For example, it learned the concepts House, Tent and Arch.

    A near miss is an object that is not an instance of the concept in question but that is very similar to such instances.


    Basic Approach of Winston's Program

    1. Begin with a structural description of one known instance of the concept. Call that description the concept definition.

    2. Examine descriptions of other known instances of the concept. Generalize the definition to include them.

    3. Examine the descriptions of near misses of the concept. Restrict the definition to exclude these.


    Version spaces

    The goal of version spaces is to produce a description that is consistent with all positive examples but no negative examples in the training set.

    This is another approach to concept learning.

    Version spaces work by maintaining a set of possible descriptions and evolving that set as new examples and near misses are presented.

    The version space is simply a set of descriptions, so an initial idea is to keep an explicit list of those descriptions.

    The version space consists of two subsets of the concept space.

    One subset, called G, contains the most general descriptions consistent with the training examples. The other subset, called S, contains the most specific descriptions consistent with the training examples.

    The algorithm for narrowing the version space is called the candidate elimination algorithm.


    Algorithm: Candidate Elimination

    Given: A representation language and a set of positive and negative examples expressed in that language.

    Compute: A concept description that is consistent with all the positive examples and none of the negative examples.

    1. Initialize G to contain one element: the null description (all features are variables).

    2. Initialize S to contain one element: the first positive example.

    3. Accept a new training example. If it is a positive example, first remove from G any descriptions that do not cover the example. Then update the set S to contain the most specific set of descriptions in the version space that cover the example and the current elements of the S set. Take the inverse actions for a negative example.

    4. If S and G are both singleton sets, then if they are identical, output their value and halt; if they are different, the training cases were inconsistent. Otherwise, go to step 3.
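The steps above can be sketched for conjunctive descriptions over fixed attributes, where "?" matches anything. In this simplified hypothesis language the most specific set is always a single description, so S is kept as one tuple; the two-attribute car examples are invented, and the first example is assumed positive:

```python
# A simplified candidate elimination sketch. S is the single most
# specific description; G holds the most general ones. "?" is a
# wildcard. Assumes consistent data whose first example is positive.
def matches(h, x):
    return all(a == "?" or a == v for a, v in zip(h, x))

def candidate_elimination(examples, n_attrs):
    S = None                          # most specific description
    G = [("?",) * n_attrs]            # most general descriptions
    for x, positive in examples:
        if positive:
            G = [g for g in G if matches(g, x)]    # drop non-covering g
            if S is None:
                S = tuple(x)                        # first positive example
            else:                                   # minimal generalization
                S = tuple(s if s == v else "?" for s, v in zip(S, x))
        else:
            new_G = []
            for g in G:
                if not matches(g, x):
                    new_G.append(g)                 # already excludes x
                    continue
                # specialize g just enough to exclude x yet still cover S
                for i, a in enumerate(g):
                    if a == "?" and S[i] != "?" and S[i] != x[i]:
                        new_G.append(g[:i] + (S[i],) + g[i + 1:])
            G = new_G
    return S, G

examples = [
    (("japan", "economy"), True),
    (("usa", "economy"), False),
    (("japan", "sports"), False),
]
S, G = candidate_elimination(examples, 2)   # S and G converge here
```

After the three examples, S and G have met at the same description, so the concept is fully determined.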


    Decision Trees

    This is a third approach to concept learning.

    To classify a particular input, we start at the top of the tree and answer questions until we reach a leaf, where the classification is stored.

    ID3 is an example program for decision trees.

    ID3 uses an iterative method to build up decision trees, preferring simple trees over complex ones, on the theory that simple trees are more accurate classifiers of future inputs.

    It begins by choosing a random subset of the training examples. This subset is called the window.

    The algorithm builds a decision tree that correctly classifies all examples in the window.


    Decision tree for "Japanese economy car"

    Origin?
      India -> (-)
      USA   -> (-)
      UK    -> (-)
      Aus   -> (-)
      Japan -> Type?
                 Sports  -> (-)
                 Economy -> (+)
                 Luxury  -> (-)
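ID3 chooses the question at each node by an information-theoretic measure. The sketch below computes entropy and information gain on a made-up sample mirroring the tree above; it shows the attribute-selection step only, not the full iterative windowing algorithm:

```python
# Entropy and information gain: ID3's criterion for picking the
# attribute whose answers best separate positives from negatives.
import math

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    total = len(labels)
    return -sum(
        (labels.count(l) / total) * math.log2(labels.count(l) / total)
        for l in set(labels)
    )

def information_gain(examples, attr):
    """Entropy reduction from splitting the examples on one attribute."""
    labels = [label for _, label in examples]
    remainder = 0.0
    for value in {x[attr] for x, _ in examples}:
        subset = [label for x, label in examples if x[attr] == value]
        remainder += len(subset) / len(examples) * entropy(subset)
    return entropy(labels) - remainder

# (origin, type) -> is this a Japanese economy car? (invented sample)
cars = [
    (("japan", "economy"), True),
    (("japan", "economy"), True),
    (("japan", "sports"), False),
    (("usa", "economy"), False),
    (("uk", "economy"), False),
]
best = max([0, 1], key=lambda a: information_gain(cars, a))  # 0 = origin
```

On this sample the origin attribute yields the larger gain, so it becomes the root question, matching the tree above.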


    Explanation-Based Learning

    Learning complex concepts using induction procedures typically requires a substantial number of training instances.

    But people seem to be able to learn quite a bit from single examples.

    We don't need to see dozens of positive and negative examples of fork (chess) positions in order to learn to avoid this trap in the future and perhaps use it to our advantage.

    What makes such single-example learning possible? The answer is knowledge.

    Much of the recent work in machine learning has moved away from the empirical, data-intensive approach described in the last section toward this more analytical, knowledge-intensive approach.

    A number of independent studies led to the characterization of this approach as explanation-based learning (EBL).

    An EBL system attempts to learn from a single example x by explaining why x is an example of the target concept.

    The explanation is then generalized, and the system's performance is improved through the availability of this knowledge.


    EBL

    We can think of EBL programs as accepting the following as input:

    A training example

    A goal concept: a high-level description of what the program is supposed to learn

    An operationality criterion: a description of which concepts are usable

    A domain theory: a set of rules that describe relationships between objects and actions in a domain

    From this, EBL computes a generalization of the training example that is sufficient to describe the goal concept, and also satisfies the operationality criterion.

    Explanation-based generalization (EBG) is an algorithm for EBL and has two steps: (1) explain, (2) generalize.

    During the explanation step, the domain theory is used to prune away all the unimportant aspects of the training example with respect to the goal concept. What is left is an explanation of why the training example is an instance of the goal concept. This explanation is expressed in terms that satisfy the operationality criterion.

    The next step is to generalize the explanation as far as possible while still describing the goal concept.


    Discovery

    Learning is the process by which one entity acquires knowledge. Usually that knowledge is already possessed by some number of other entities who may serve as teachers.

    Discovery is a restricted form of learning in which one entity acquires knowledge without the help of a teacher.

    Theory-driven discovery

    Data-driven discovery

    Clustering


    AM: Theory-Driven Discovery

    Discovery is certainly learning. But it is also, more clearly than other kinds of learning, problem solving.

    Suppose that we want to build a program to discover things in mathematics; such a program would have to rely heavily on problem-solving techniques.

    AM was written by Lenat, and it worked from a few basic concepts of set theory to discover a good deal of standard number theory.

    AM exploited a variety of general-purpose AI techniques. It used a frame system to represent mathematical concepts. One of the major activities of AM is to create new concepts and fill in their slots.

    AM uses heuristic search, guided by a set of 250 heuristic rules representing hints about activities that are likely to lead to interesting discoveries.

    In one run, AM discovered the concept of prime numbers. How did it do it? Having stumbled onto the natural numbers, AM explored operations such as addition, multiplication and their inverses. It created the concept of divisibility and noticed that some numbers had very few divisors.


    Bacon: Data Driven Discovery

    AM showed how discovery might occur in a theoretical setting.

    Scientific discovery has inspired several computer models.

    Langley et al. presented a model of data-driven scientific discovery that has been implemented as a program called BACON (named after Sir Francis Bacon, a philosopher of science).

    BACON begins with a set of variables for a problem.

    For example, in the study of the behavior of gases, some variables are p, the pressure on the gas; V, the volume of the gas; n, the amount of gas in moles; and T, the temperature of the gas.

    Physicists have long known a law, called the ideal gas law, that relates these variables.

    BACON is able to derive this law on its own.

    First, BACON holds the variables n and T constant, performing experiments at different pressures p1, p2 and p3.

    BACON notices that as the pressure increases, the volume V decreases.

    For all values of n, p, V and T, pV/nT = 8.32, which is the ideal gas law as shown by BACON.

    BACON has been used to discover a wide variety of scientific laws, such as Kepler's third law, Ohm's law, the conservation of momentum and Joule's law.

    BACON's discovery procedure is state-space search.

    A better understanding of the science of scientific discovery may lead one day to programs that display true creativity.

    Much more work must be done in areas of science that BACON does not model.
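BACON's strategy of varying one quantity while holding others fixed, then hunting for a combination of the observations that stays constant, can be sketched as follows; the gas readings are synthetic points generated to satisfy pV = 8.32 nT:

```python
# Synthetic gas readings generated from pV = 8.32 * nT; n and T are held
# fixed while the pressure is varied, as in BACON's experiments.
def find_invariant(observations):
    """Mean and spread of pV / nT across all observations."""
    ratios = [p * V / (n * T) for p, V, n, T in observations]
    return sum(ratios) / len(ratios), max(ratios) - min(ratios)

data = [  # (p, V, n, T)
    (1000.0, 2.496, 1.0, 300.0),
    (2000.0, 1.248, 1.0, 300.0),
    (3000.0, 0.832, 1.0, 300.0),
]
constant, spread = find_invariant(data)
# a near-zero spread means pV/nT is invariant: the ideal gas law
```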


    Clustering

    Clustering is very similar to induction. In inductive learning, a program learns to classify objects based on the labelings provided by a teacher.

    In clustering, no class labelings are provided.

    The program must discover for itself the natural classes that exist for the objects, in addition to a method for classifying instances.

    AUTOCLASS is one program that accepts a number of training cases and hypothesizes a set of classes.

    For any given case, the program provides a set of probabilities that predict into which classes the case is likely to fall.

    In one application, AUTOCLASS found meaningful new classes of stars from their infrared spectral data. This was an instance of true discovery by computer, since the facts it discovered were previously unknown to astronomy.

    AUTOCLASS uses statistical Bayesian reasoning of the type discussed earlier.
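AUTOCLASS itself uses Bayesian reasoning, but the core idea of discovering classes without labels can be illustrated with a much simpler iterative clustering sketch (the one-dimensional data points are made up):

```python
# Discovering classes without labels: a toy iterative clustering pass
# over one-dimensional data. Far simpler than AUTOCLASS's Bayesian
# method, but the same spirit: the program invents the groups itself.
def cluster(points, centers, rounds=10):
    for _ in range(rounds):
        groups = [[] for _ in centers]
        for p in points:                     # assign to nearest center
            nearest = min(range(len(centers)),
                          key=lambda i: abs(p - centers[i]))
            groups[nearest].append(p)
        centers = [sum(g) / len(g) if g else c   # move centers to means
                   for g, c in zip(groups, centers)]
    return centers

points = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]      # two obvious natural classes
centers = sorted(cluster(points, [0.0, 5.0]))
```

Starting from arbitrary centers, the procedure settles on the two natural groupings in the data without ever being told a class label.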


    Analogy

    Analogy is a powerful inference tool. Our language and reasoning are laden with analogies:

    "Last month, the stock market was a roller coaster."

    "Bill is like a fire engine."

    "Problems in electromagnetism are just like problems in fluid flow."

    Underlying each of these examples is a complicated mapping between what appear to be dissimilar concepts.

    For example, to understand the first sentence above, it is necessary to do two things:

    1. Pick out one key property of a roller coaster, namely that it travels up and down rapidly.

    2. Realize that physical travel is itself an analogy for numerical fluctuations.

    This is no easy trick.

    The space of possible analogies is very large.

    An AI program that is unable to grasp analogy will be difficult to talk to and consequently difficult to teach.

    Thus analogical reasoning is an important factor in learning by advice taking.

    Humans often solve problems by making analogies to things they already understand how to do.


    Formal Learning Theory

    Learning has attracted the attention of mathematicians and theoretical computer scientists.

    Inductive learning in particular has received considerable attention.

    Formally, a device learns a concept if it can, given positive and negative examples, produce an algorithm that will classify future examples correctly with probability 1/h.

    The complexity of learning a concept is a function of three factors: the error tolerance (h), the number of binary features present in the examples (t), and the size of the rule necessary to make the discrimination (f).

    If the number of training examples required is polynomial in h, t, and f, then the concept is said to be learnable.


    Formal Learning Theory

    For example, given positive and negative examples of strings in some regular language, can we efficiently induce the finite automaton that produces all and only the strings in the language? The answer is no; an exponential number of computational steps is required.

    It is difficult to tell how such mathematical studies of learning will affect the ways in which we solve AI problems in practice.

    After all, people are able to solve many exponentially hard problems by using knowledge to constrain the space of possible solutions.

    Perhaps mathematical theory will one day be used to quantify the use of such knowledge, but this prospect seems far off.


    Neural Net Learning and Genetic Learning

    Collections of idealized neurons were presented with stimuli and prodded into changing their behaviour via forms of reward and punishment.

    Researchers hoped that by imitating the learning mechanisms of animals, they might build learning machines from very simple parts.

    Such hopes proved elusive.

    However, the field of neural network learning has seen a resurgence in recent years, partly as a result of the discovery of powerful new learning algorithms.

    While neural network models are based on a computational brain metaphor, a number of other learning techniques make use of a metaphor based on evolution.

    In this work, learning occurs through a selection process that begins with a large population of random programs.
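The reward-and-punishment weight changes described above can be illustrated with the classic perceptron rule for a single idealized neuron; the training task, logical OR, is a standard toy example:

```python
# A single idealized neuron trained by reward and punishment: the
# perceptron rule. After each mistake, weights are nudged toward the
# correct answer; correct responses leave them unchanged.
def train_perceptron(samples, epochs=20):
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, target in samples:
            predicted = 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0
            error = target - predicted        # +1 reward, -1 punishment
            w = [wi + error * xi for wi, xi in zip(w, x)]
            b += error
    return w, b

def predict(w, b, x):
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

# learn logical OR, a linearly separable function
samples = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
w, b = train_perceptron(samples)
```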


    Summary

    The most important thing to conclude from our study of automated learning is that learning itself is a problem-solving process:

    Learning by taking advice

    Learning from examples

    Learning in problem solving

    Discovery

    A learning machine is the dream system of AI.