
Page 1: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

1

Instance Based Learning

Soongsil University

Intelligent Systems Lab.

Page 2: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

2

Content

Motivation Eager Learning Lazy Learning Instance-Based Learning

k-Nearest Neighbour Learning (kNN) Distance-Weighted k-NN Locally Weighted Regression (LWR) Radial Basis Functions (RBF) Case-Based Reasoning (CBR) Summary

Page 3: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

3

Instance-based learning

One way of solving tasks of approximating discrete or real valued target functions

Have training examples: (xn, f(xn)), n=1..N. Key idea:

just store the training examples; when a test example is given, find the closest matches

Page 4: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

4

Motivation: Eager Learning

The Learning Task: Try to approximate a target function through a hypothesis on the basis of training examples

EAGER Learning: As soon as the training examples and the hypothesis space are received, the search for the first hypothesis begins.

Training phase:
given: training examples D = <xi, f(xi)>, hypothesis space H
search: best hypothesis

Processing phase: for every new instance xq return f̂(xq)

Examples: Radial basis function networks, …

Page 5: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

5

Motivation: Lazy Algorithms

LAZY ALGORITHMS: Training examples are stored and kept "asleep". Generalisation beyond these examples is postponed until new instances must be classified. Every time a new query instance is encountered, its relationship to the previously stored examples is examined in order to compute the value of the target function for this new instance.

Page 6: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

6

Motivation: Instance-Based Learning

Instance-Based Algorithms can establish a new local approximation for every new instance.

Training phase:
given: training sample D = <xi, f(xi)>

Processing phase:
given: instance xq
search: best local hypothesis
return f̂(xq)

Examples: Nearest Neighbour Algorithm, Distance Weighted Nearest Neighbour, Locally Weighted Regression, …

Page 7: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

7

Motivation: Instance-Based Learning

How are the instances represented? How can we measure the similarity of the instances? How can f̂(xq) be computed?

Page 8: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

Motivation Eager Learning Lazy Learning Instance-Based Learning

k-Nearest Neighbour Learning (kNN) Distance-Weighted k-NN Locally Weighted Regression (LWR) Radial Basis Functions (RBF) Case-Based Reasoning (CBR) Summary

Content

Page 9: 1 Instance Based Learning Soongsil University Intelligent Systems Lab


Page 10: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

Nearest Neighbour Algorithm

Idea: All instances correspond to points in the n-dimensional space ℝⁿ. Assign the value of the nearest neighbouring instance to the new instance.

Representation: Let xi = <a1(xi), a2(xi), …, an(xi)> be an instance, where ar(xi) denotes the value of the r-th attribute of instance xi.

Target Function: Discrete valued or real valued.

We may also use xir instead of ar(xi).

Page 11: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

1-Nearest Neighbor

Four things make a memory based learner:

1. A distance metric

Euclidean

2. How many nearby neighbors to look at?

One

3. A weighting function (optional)

Unused

4. How to fit with the local points?

Just predict the same output as the nearest neighbor.

11

Page 12: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

Nearest Neighbour Algorithm

HOW IS f̂(xq) FORMED?

Discrete target function: f: ℝⁿ → V, where V = {v1, v2, …, vs} is a set of s classes (e.g., red, black, yellow…)

Continuous target function: f: ℝⁿ → ℝ

Let xn be the nearest neighbour of xq, i.e. d(xn, xq) = mini d(xi, xq). Then

f̂(xq) = f(xn)

Page 13: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

13

1-Nearest neighbour: Given a query instance xq,
• first locate the nearest training example xn
• then f̂(xq) := f(xn)

k-Nearest neighbour: Given a query instance xq,
• first locate the k nearest training examples
• If discrete valued target function, take a vote among its k nearest neighbours (e.g., X, X, O, O, X, O, X, X → X)
• Else if real valued target function, take the mean of the f values of the k nearest neighbours:

f̂(xq) := (1/k) Σi=1..k f(xi)

Page 14: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

How to choose “k”

Average of k points more reliable when: noise in attributes noise in class labels classes partially overlap

Large k: less sensitive to noise (particularly class noise) better probability estimates for discrete classes larger training sets allow larger values of k

Small k: captures fine structure of problem space better may be necessary with small training sets

Balance must be struck between large and small k. As the training set approaches infinity and k grows large, kNN becomes Bayes optimal: if p(x) > .5 then predict 1, else 0

Page 15: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

k-Nearest Neighbor

Four things make a memory based learner:

1. A distance metric

Euclidean

2. How many nearby neighbors to look at?

k

3. A weighting function (optional)

Unused

4. How to fit with the local points?

Just predict the average output among the k nearest neighbors.

15

Page 16: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

k-Nearest Neighbour

Idea:

If we choose k = 1, then the algorithm assigns to f̂(xq) the value f(xi), where xi is the nearest training instance to xq.

For larger values of k, the algorithm assigns the most common value among the k nearest training examples.

How can f̂(xq) be established? Let x1, …, xk denote the k instances from the training examples that are nearest to xq. Then

f̂(xq) ← argmax v∈V Σi=1..k δ(v, f(xi))

where δ(a, b) = 1 if a = b, δ(a, b) = 0 otherwise.

Page 17: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

k-Nearest Neighbour Algorithm

Training algorithm: For each training example < x, f(x) >, add the example to the list

training_examples

[Classification algorithm]: Given a query instance xq to be classified,

Let x1, …, xk denote the k instances from training_examples that are nearest to xq

Return

f̂(xq) ← argmax v∈V Σi=1..k δ(v, f(xi))

where δ(a, b) = 1 if a = b, δ(a, b) = 0 otherwise.
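To make the stored-examples-plus-vote idea concrete, here is a minimal Python sketch of the training and classification steps above. The function names and toy data are ours (not from the slides), and plain Euclidean distance is used.

```python
import math
from collections import Counter

# "Training" is just memorising the (x, f(x)) pairs.
training_examples = [([1.0, 2.0], "X"), ([2.0, 1.5], "X"),
                     ([6.0, 5.0], "O"), ([7.0, 6.5], "O")]

def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_classify(xq, examples, k=3):
    # locate the k training examples nearest to the query instance xq
    nearest = sorted(examples, key=lambda ex: euclidean(xq, ex[0]))[:k]
    # return the most common target value among them (the argmax over the sum of deltas)
    votes = Counter(fx for _, fx in nearest)
    return votes.most_common(1)[0][0]

print(knn_classify([2.0, 2.0], training_examples, k=3))  # -> "X"
```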

Page 18: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

18

The distance between examples

We need a measure of distance in order to know who are the neighbours

Assume that we have T attributes for the learning problem. Then one example point x has elements xt , t=1,…T.

The distance between two points xi , xj is often defined as the Euclidean distance:

d(xi, xj) = √( Σt=1..T (xit − xjt)² )

Page 19: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

Similarity and Dissimilarity Between Objects

Distances are the most commonly used measures. The Minkowski distance is a generalization:

d(xi, xj) = ( |xi1 − xj1|^q + |xi2 − xj2|^q + … + |xip − xjp|^q )^(1/q),  q > 0

If q = 2, d is the Euclidean distance; if q = 1, d is the Manhattan distance.

Weighted distance:

d(xi, xj) = ( w1|xi1 − xj1|^q + w2|xi2 − xj2|^q + … + wp|xip − xjp|^q )^(1/q),  q > 0
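A small sketch of the (weighted) Minkowski distance just defined; q = 2 gives the Euclidean case and q = 1 the Manhattan case. The helper name and defaults are ours.

```python
def minkowski(xi, xj, q=2, weights=None):
    # d(xi, xj) = ( sum_t w_t * |xi_t - xj_t|^q )^(1/q), with w_t = 1 if no weights are given
    if weights is None:
        weights = [1.0] * len(xi)
    return sum(w * abs(a - b) ** q
               for w, a, b in zip(weights, xi, xj)) ** (1.0 / q)

d2 = minkowski([7, 2, 9, 4], [6, 1, 10, 4], q=2)  # Euclidean distance
d1 = minkowski([7, 2, 9, 4], [6, 1, 10, 4], q=1)  # Manhattan distance
```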

Page 20: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

20

Voronoi Diagram

Example:  1-NN: f̂(xq) = +   5-NN: f̂(xq) = −

Voronoi Diagram: the decision surface induced by a 1-Nearest Neighbour algorithm for a typical set of training examples. The convex region surrounding each training example indicates the region of query points whose classification will be completely determined by that training example.

Page 21: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

21

Characteristics of Inst-b-Learning

An instance-based learner is a lazy-learner and does all the work when the test example is presented. This is opposed to so-called eager-learners, which build a parameterised compact model of the target.

It produces a local approximation to the target function (different for each test instance)

Page 22: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

22

When to consider Nearest Neighbour algorithms?

Instances map to points in ℝⁿ
Not more than say 20 attributes per instance
Lots of training data

Advantages:
Training is very fast
Can learn complex target functions
Don't lose information

Disadvantages: ? (will see them shortly…)

Page 23: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

23

Figure: eight example drawings, labelled one to seven (training examples) and eight (the test instance, ?).

Page 24: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

24

Training data

Number  Lines  Line types  Rectangles  Colours  Mondrian?
1       6      1           10          4        No
2       4      2           8           5        No
3       5      2           7           4        Yes
4       5      1           8           4        Yes
5       5      1           10          5        No
6       6      1           8           6        Yes
7       7      1           14          5        No

Test instance

Number  Lines  Line types  Rectangles  Colours  Mondrian?
8       7      2           9           4        ?

d(xi, xj) = √( Σt=1..T (xit − xjt)² )

Distances of the test instance from examples 1–7 (as given on the slide): 3, 11, 8, 6, 7, 4, 27

Page 25: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

25

Distances of test instance from training data

Example  Distance of test from example  Mondrian?
1        3                              No
2        11                             No
3        8                              Yes
4        6                              Yes
5        7                              No
6        4                              Yes
7        27                             No

Classification
1-NN   No
3-NN   Yes
5-NN   Yes
7-NN   No

THINK a moment! Does this seem sensible to you?

Isn't the calculation being skewed by the large values of the rectangle data relative to the other data?

Page 26: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

26

Keep data in normalised form

One way to normalise the data ar(x) to a´r(x) is

x't = (xt − x̄t) / σt

where x̄t is the mean of the t-th attribute and σt is the standard deviation of the t-th attribute.
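In code this z-score normalisation is one line per attribute; the sketch below (numpy assumed) applies it column-wise and, run on the Mondrian training attributes, produces values like those in the table on the next slide.

```python
import numpy as np

def normalise(X):
    # x'_t = (x_t - mean_t) / std_t, applied per attribute (column)
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0)

# Mondrian training attributes: Lines, Line types, Rectangles, Colours
X_train = [[6, 1, 10, 4], [4, 2, 8, 5], [5, 2, 7, 4], [5, 1, 8, 4],
           [5, 1, 10, 5], [6, 1, 8, 6], [7, 1, 14, 5]]
print(normalise(X_train).round(3))
```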

Page 27: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

27

Normalised training data

Number  Lines   Line types  Rectangles  Colours  Mondrian?
1       0.632   -0.632      0.327       -1.021   No
2       -1.581  1.581       -0.588      0.408    No
3       -0.474  1.581       -1.046      -1.021   Yes
4       -0.474  -0.632      -0.588      -1.021   Yes
5       -0.474  -0.632      0.327       0.408    No
6       0.632   -0.632      -0.588      1.837    Yes
7       1.739   -0.632      2.157       0.408    No

Test instance

Number  Lines   Line types  Rectangles  Colours  Mondrian?
8       1.739   1.581       -0.131      -1.021   ?

Page 28: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

28

Distances of test instance from training data

Example  Distance of test from example  Mondrian?
1        2.517                          No
2        3.644                          No
3        2.395                          Yes
4        3.164                          Yes
5        3.472                          No
6        3.808                          Yes
7        3.490                          No

Classification  After normalisation  Before normalisation
1-NN            Yes                  No
3-NN            Yes                  Yes
5-NN            No                   Yes
7-NN            No                   No

Page 29: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

Motivation Eager Learning Lazy Learning Instance-Based Learning

k-Nearest Neighbour Learning (kNN) Distance-Weighted k-NN Locally Weighted Regression (LWR) Radial Basis Functions (RBF) Case-Based Reasoning (CBR) Summary

Content

Page 30: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

30

Difficulties with k-nearest neighbour algorithms

Have to calculate the distance of the test case from all training cases

There may be irrelevant attributes amongst the attributes – curse of dimensionality

Page 31: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

31

What if the target function is real valued?

The k-nearest neighbour algorithm would just calculate the mean of the k nearest neighbours

Page 32: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

Distance-Weighted kNN

The weights of the neighbours are taken into account relative to their distance to the query point. We might want to give nearer neighbours a heavier weight:

wi = 1 / d(xq, xi)²

To accommodate the case where the query point xq exactly matches one of the training instances xi and the denominator d(xq, xi)² therefore is zero, we assign f̂(xq) to be f(xi) in this case. (If a training example exactly matches the query point, then f̂(xq) ← f(xi).)

Page 33: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

Distance-Weighted KNN

33

For discrete-valued target functions:

f̂(xq) ← argmax v∈V Σi=1..k wi δ(v, f(xi))

where wi = 1 / d(xq, xi)² is the distance weight, V is the set of s classes (e.g., red, black, yellow…), and δ(a, b) = 1 if a = b, δ(a, b) = 0 otherwise.

For real-valued target functions (Shepard method):

f̂(xq) ← ( Σi=1..k wi f(xi) ) / ( Σi=1..k wi )
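A minimal sketch of both distance-weighted variants (function names and data layout are ours). The exact-match case returns f(xi) directly, as noted on the previous slide.

```python
import math
from collections import defaultdict

def weight(xq, xi):
    d = math.dist(xq, xi)
    return None if d == 0 else 1.0 / d ** 2      # None signals an exact match

def dw_knn_classify(xq, examples, k):
    nearest = sorted(examples, key=lambda ex: math.dist(xq, ex[0]))[:k]
    votes = defaultdict(float)
    for xi, fxi in nearest:
        w = weight(xq, xi)
        if w is None:
            return fxi                           # query coincides with a training instance
        votes[fxi] += w                          # accumulates sum_i w_i * delta(v, f(x_i))
    return max(votes, key=votes.get)

def dw_knn_regress(xq, examples, k):
    nearest = sorted(examples, key=lambda ex: math.dist(xq, ex[0]))[:k]
    num = den = 0.0
    for xi, fxi in nearest:
        w = weight(xq, xi)
        if w is None:
            return fxi
        num += w * fxi                           # Shepard: sum w_i f(x_i) / sum w_i
        den += w
    return num / den
```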

Page 34: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

Remarks on k-Nearest Neighbour Algorithm

PROBLEM: The measurement of the distance between two instances considers every attribute, so even irrelevant attributes can influence the approximation.

EXAMPLE: n = 20 but only 2 attributes are relevant.

SOLUTION: Weight each attribute differently when calculating the distance between two neighbours, stretching the relevant axes in Euclidean space:
shortening the axes that correspond to less relevant attributes
lengthening the axes that correspond to more relevant attributes

PROBLEM: How to determine which weight belongs to which attribute automatically? Cross-validation, leave-one-out (see next lecture; a sketch follows below).

Page 35: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

Remarks on k-Nearest Neighbour Algorithm 2

ADVANTAGE: The training phase is processed very fast Can learn complex target function Robust to noisy training data Quite effective when a sufficiently large set of training data is provided

DISADVANTAGE: The algorithm delays all processing until a new query is received => significant computation can be required to process a query; efficient memory indexing is needed. Processing is slow. Sensitive to the curse of dimensionality.

BIAS: The inductive bias corresponds to an assumption that the classification of an instance xq will be most similar to the classification of other instances that are nearby in Euclidean distance.

Page 36: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

Motivation Eager Learning Lazy Learning Instance-Based Learning

k-Nearest Neighbour Learning (kNN) Distance-Weighted k-NN Locally Weighted Regression (LWR) Radial Basis Functions (RBF) Case-Based Reasoning (CBR) Summary

Content

Page 37: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

Generalizing k-nearest neighbor to continuous outputs

The version of k-nearest neighbors we have already seen works well for discrete outputs.

How would we generalize this to predict continuous outputs ?

Ideas?

37

Page 38: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

Locally Weighted Regression

Local means using nearby points (i.e. a nearest-neighbors approach), based solely on the training data near the query point

Weighted means we value points based upon how far away they are from the query point

Regression means approximating a function This is an instance-based learning method

The idea: whenever you want to classify a sample: Build a local model of the function (using a linear function,

quadratic, neural network, etc.) Use the model to predict the output value Throw the model away

Page 39: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

Locally Weighted Regression

IDEA: Generalization of the Nearest Neighbour algorithm. It constructs an explicit approximation to f over a local region surrounding xq. It uses nearby or distance-weighted training examples to form the local approximation to f.

Local: the function is approximated based solely on the training data near the query point
Weighted: the contribution of each training example is weighted by its distance from the query point
Regression: means approximating a real-valued target function

Page 40: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

How Locally Weighted Regression Works

40

• Unweighted averaging using springs.

• Locally weighted averaging using springs.

The strengths of the springs are equal in the unweighted case, and the position of the horizontal line minimizes the sum of the stored energy in the springs.

In the weighted case the springs are not equal, and the spring constant of each spring is given by K(d(xi, q)). Note that the locally weighted average emphasizes points close to the query point, and produces an answer (the height of the horizontal line) that is closer to the height of points near the query point than the unweighted case.

Page 41: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

Example of Locally Weighted Learn-ing

The upper graphic contains the set of data points (x, y) (blue dots), the query point (green line), the local linear model (red line) and the prediction (yellow dot). The graphic in the middle shows the activation area of the model. The corresponding weighting kernel (receptive field) is shown in the bottom graphic.

Page 42: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

How Locally Weighted Regression Works

42

Page 43: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

Fits using different types of local models for three and five data points.

43

Nearest neighbor

Weighted average

Locally Weighted regression

Page 44: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

Locally weighted linear regression

In the following, x is an instance, D is the set of training instances, D = <xi, f(xi)>;
ai(x) is the value of the i-th attribute of instance x;
the weights wi form our hypothesis;
f is the target function; f̂ is our approximation to the target function.

44

Page 45: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

Locally weighted linear regression

In this case, we use a linear model to do the local approximation:

f̂(x) = w0 + w1 a1(x) + … + wn an(x)

Suppose we aim to minimize the total squared error:

E = ½ Σ x∈D ( f(x) − f̂(x) )²

Recall the gradient descent we used in checkers for this purpose:

Δwj = η Σ x∈D ( f(x) − f̂(x) ) aj(x)

where η is a small number (the learning rate).

Page 46: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

Locally weighted linear regression

Now we adjust it to work with the present situation. Define the error for the query instance xq, minimising the squared error over the kNN set and using some kernel function K to weight each error term by distance:

E(xq) = ½ Σ x∈kNN(xq) ( f(x) − f̂(x) )² K(d(xq, x))

And the new version of the gradient descent becomes:

Δwj = η Σ x∈kNN(xq) K(d(xq, x)) ( f(x) − f̂(x) ) aj(x)

Page 47: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

Locally Weighted Linear Regression

We might approximate the target function in the neighborhood surrounding xq using a linear function, a quadratic function, a multilayer neural network, or some other functional form. Using a linear function to approximate f:

f̂(x) = w0 + w1 a1(x) + … + wn an(x)

Recall chapter 4 (gradient descent rule):

E = ½ Σ x∈D ( f(x) − f̂(x) )²

Δwj = η Σ x∈D ( f(x) − f̂(x) ) aj(x)
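Here is a rough Python sketch of this locally weighted gradient-descent fit (our own function names and hyperparameters; a Gaussian kernel K is assumed). A local linear model is trained around the query, used once for the prediction, and then thrown away.

```python
import numpy as np

def gaussian_kernel(d, width=1.0):
    return np.exp(-(d ** 2) / (2 * width ** 2))

def lwr_predict(xq, X, y, k=8, eta=0.01, epochs=200):
    X, y, xq = np.asarray(X, float), np.asarray(y, float), np.asarray(xq, float)
    dists = np.linalg.norm(X - xq, axis=1)
    idx = np.argsort(dists)[:k]                    # the k nearest neighbours of xq
    Xk, yk, Kk = X[idx], y[idx], gaussian_kernel(dists[idx])
    Xk1 = np.hstack([np.ones((len(idx), 1)), Xk])  # constant attribute a0(x) = 1 for w0
    w = np.zeros(Xk1.shape[1])
    for _ in range(epochs):
        err = yk - Xk1 @ w                         # f(x) - f_hat(x) over the neighbourhood
        w += eta * Xk1.T @ (Kk * err)              # delta w_j = eta * sum K(d) (f - f_hat) a_j(x)
    return float(np.append(1.0, xq) @ w)           # predict, then discard the local model

# toy usage: noisy samples of y = x
X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0], [8.0]]
y = [1.1, 1.8, 3.1, 4.0, 5.2, 5.9, 7.1, 7.9]
print(lwr_predict([4.5], X, y))                    # close to 4.5
```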

Page 48: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

Locally Weighted Regression

PROCEDURE: Given a new query xq, construct an approximation f̂ that fits the training examples in the neighbourhood surrounding xq.

This approximation is used to calculate f̂(xq), which is the estimated target value assigned to the query instance.

The description of f̂ may change, because a different local approximation will be calculated for each instance.

Page 49: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

Evaluation Locally Weighted Regression

ADVANTAGE Pointwise approximation of a complex target function Earlier data has no influence on the new ones

DISADVANTAGE: The quality of the result depends on
the choice of the local function
the choice of the kernel function K
the choice of the hypothesis space H
Sensitivity to relevant and irrelevant attributes

Page 50: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

Motivation Eager Learning Lazy Learning Instance-Based Learning

k-Nearest Neighbour Learning (kNN) Distance-Weighted k-NN Locally Weighted Regression (LWR) Radial Basis Functions (RBF) Case-Based Reasoning (CBR) Summary

Content

Page 51: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

51

Radial Basis Function (RBF) Networks

RBF neural network has an input layer, a hidden layer, an output layer.

The neurons in the hidden layer contain Gaussian transfer functions whose outputs are inversely proportional to the

distance from the center of the neuron. (That is, the farther a data point is from the neuron's center, the less influence it should have on the result.)

Page 52: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

Similar to K-Means clustering and PNN (Probabilistic Neural Network) / GRNN (Generalized Regression Neural Network), in terms of method.

The main difference: PNN/GRNN networks have one neuron for each

point in the training file, RBF networks have a variable number of neurons

that is usually much less than the number of training points.

For problems with small to medium size training sets, PNN/GRNN are usually more accurate than RBF

PNN/GRNN networks are impractical for large training sets.

52

Page 53: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

How RBF networks work Although the implementation is very different,

RBF neural networks are conceptually similar to K-Nearest Neighbor (k-NN) models, in terms of strategy.

The basic idea is that a predicted target value of an item is likely to be about the same as other items that have close values of the predictor variables.

Page 54: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

Radial-Basis Function Networks

RBFs represent local receptors, as illustrated below, where each green point is a stored vector used in one RBF.

In an RBF network one hidden layer uses neurons with RBF activation functions describing local receptors. Then one output node is used to linearly combine the outputs of the hidden neurons.

Figure: the output of the red vector is "interpolated" using the three green vectors, where each vector gives a contribution that depends on its weight (w1, w2, w3) and on its distance from the red point.

Page 55: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

In MLP

Page 56: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

in RBFN

MLP vs RBFN

Page 57: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

Radial Basis Function Network

A kind of supervised neural networks Design of NN as curve-fitting problem Learning

find surface in multidimensional space best fit to training data

Generalization Use of this multidimensional surface to interpolate

the test data

Page 58: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

New model: f(x) = w1h1(x) + w2h2(x) + w3h3(x), where h1(x) = 1, h2(x) = x, h3(x) = x²

Page 59: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

Radial Basis Function Network

Approximate function with linear combination of Radial basis functions

h(x) is mostly Gaussian function

f(x) = Σj=1..m wj hj(x)

hj(x) = exp( −(x − cj)² / rj² )

Where cj is center of a region,

rj is width of the receptive field
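A small sketch of this model in Python; the centers, radii and weights below are made-up values purely for illustration.

```python
import numpy as np

def rbf_output(x, centers, radii, weights):
    # f(x) = sum_j w_j * h_j(x),  with  h_j(x) = exp(-||x - c_j||^2 / r_j^2)
    x = np.asarray(x, float)
    h = np.array([np.exp(-np.sum((x - c) ** 2) / r ** 2)
                  for c, r in zip(centers, radii)])
    return float(weights @ h)

centers = [np.array([0.0, 0.0]), np.array([1.0, 1.0]), np.array([2.0, 0.0])]  # c_j
radii   = [0.5, 0.5, 0.5]                                                     # r_j
weights = np.array([1.0, -0.5, 2.0])                                          # w_j
print(rbf_output([1.0, 0.5], centers, radii, weights))
```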

Page 60: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

HIDDEN NEURON MODEL

Hidden unit j computes hj( || x − cj || ) from the inputs x1, x2, …, xm.

cj is called the center of a region, σj is called the spread; center and spread are parameters.

The output hj( || x − cj || ) depends on the distance of the input x from the center cj.

Page 61: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

RBF ARCHITECTURE

One hidden layer with RBF activation functions

Output layer with linear activation function.

||x − c|| is the distance of x = (x1, …, xm) from the vector c

y = w11 h1(||x − c1||) + … + wm1 hm(||x − cm||)

Page 62: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

Three layers

Input layer: source nodes that connect the network to its environment
Hidden layer: hidden units provide a set of basis functions; high dimensionality

Output layer Linear combination of hidden functions

Page 63: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

RBF Network Architecture

Page 64: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

Weight = RBF(distance). The further a neuron is from the point being evaluated, the less influence it has.

Page 65: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

Radial Basis Function

Different types of radial basis functions could be used, but the most common is the Gaussian function:

Page 66: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

66

If there is more than one predictor variable, then the RBF function has as many dimensions as there are variables.

Radial Basis Function

Z is the value coming out of the RBF functions

two predictor variables, X and Y

Three neurons in a space

Page 67: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

The best predicted value for the new point is found by summing the output values of the RBF functions multiplied by weights computed for each neuron.

Page 68: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

The radial basis function for a neuron has a center and a radius (also called a spread). The radius may be different for each neuron and, in RBF networks, may differ in each dimension.

Page 69: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

Training RBF Networks

The following parameters are determined by the training process: The number of neurons in the hidden layer. The coordinates of the center of each hidden-layer

RBF function. The radius (spread) of each RBF function in each dimension.

The weights applied to the RBF function outputs as they are passed to the summation layer.

Page 70: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

designing

Requires: selection of the radial basis function width parameter; number of radial basis neurons

Page 71: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

Number of radial basis neurons

Chosen by the designer.
Max number of neurons = number of inputs
Min number of neurons = (experimentally determined)
More neurons: more complex, but smaller tolerance

designing

Page 72: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

Various learning strategies

How the centers of the radial-basis functions of the network are specified. Fixed centers selected at random Self-organized selection of centers Supervised selection of centers

learning strategies

Page 73: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

Fixed centers selected at random(1)

Fixed RBFs of the hidden units The locations of the centers may be chosen

randomly from the training data set. We can use different values of centers and

widths for each radial basis function -> experimentation with training data is needed.

learning strategies

Page 74: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

Fixed centers selected at random(2)

Only the output-layer weights need to be learned.

Main problem Require a large training set for a satisfactory level of

performance

learning strategies

Page 75: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

Self-organized selection of centers(1)

By means of clustering. Supervised learning of output weights by the LMS (Least Mean Square) algorithm.

Hybrid learning:
self-organized learning to estimate the centers of the RBFs in the hidden layer
supervised learning to estimate the linear weights of the output layer
※ The centers are determined by clustering, but the output weights are learned by supervised learning!

learning strategies

Page 76: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

76

Self-organized selection of centers(2)

k-means clustering: 1. Initialization

2. Sampling

3. Similarity matching

4. Updating

5. Continuation

learning strategies

Page 77: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

Supervised selection of centers

All free parameters of the network are changed by supervised learning process.

Error-correction learning using LMS algorithm.

learning strategies

Page 78: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

Radial functions

Gaussian RBF (c: center, r: radius):

h(x) = exp( −(x − c)² / r² )

• monotonically decreases with distance from the center

Multiquadric RBF:

h(x) = √( r² + (x − c)² ) / r

• monotonically increases with distance from the center

Page 79: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

Gaussian RBF, multiquadric RBF

Page 80: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

Least Squares

• Model: f(x) = Σj=1..m wj hj(x)

• Training data: {(x1, y1), (x2, y2), …, (xp, yp)}

• Minimize the sum-squared error: S = Σi=1..p ( yi − f(xi) )²

Page 81: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

Example

Sample points (noisy) from the curve y = x : {(1, 1.1), (2, 1.8), (3, 3.1)}

linear model :

f(x) = w1h1(x) + w2h2(x),

where h1(x) = 1, h2(x) = x

Estimate the coefficients w1, w2

Page 82: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

f(x) = x

Page 83: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

New model: f(x) = w1h1(x) + w2h2(x) + w3h3(x), where h1(x) = 1, h2(x) = x, h3(x) = x²

Page 84: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

If the model absorbs all the noise : overfit If it is too flexible, it will fit the noise

If it is too inflexible, it will miss the target

Page 85: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

The optimal weight vector

• model: f(x) = Σj=1..m wj hj(x)

• sum-squared error: S = Σi=1..p ( yi − f(xi) )²

• cost function (minimized): a weight penalty term is added

C = Σi=1..p ( yi − f(xi) )² + Σj=1..m λj wj²

λj: regularization parameters

Page 86: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

Differentiating the cost function with respect to wj:

½ ∂C/∂wj = Σi=1..p ( f(xi) − yi ) ∂f(xi)/∂wj + λj wj

Since ∂f(xi)/∂wj = hj(xi), this becomes

½ ∂C/∂wj = Σi=1..p ( f(xi) − yi ) hj(xi) + λj wj

Setting ∂C/∂wj = 0:

Σi=1..p f(xi) hj(xi) + λj wj = Σi=1..p yi hj(xi)

Page 87: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

In vector notation, let

hj = ( hj(x1), hj(x2), …, hj(xp) )ᵀ,  f = ( f(x1), f(x2), …, f(xp) )ᵀ,  y = ( y1, y2, …, yp )ᵀ

Then the optimality condition becomes

hjᵀ f + λj wj = hjᵀ y,  for all j = 1, 2, …, m

Page 88: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

Collecting the m equations hjᵀ f + λj wj = hjᵀ y (j = 1, 2, …, m) into matrix form:

Hᵀ f + Λ w = Hᵀ y

where H = [ h1 h2 … hm ],  w = ( w1, w2, …, wm )ᵀ,  and Λ = diag( λ1, λ2, …, λm )

Page 89: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

Since f(xi) = Σj=1..m wj hj(xi), the vector of model outputs f = ( f(x1), f(x2), …, f(xp) )ᵀ can be written as f = H w, where H is the design matrix

H = [ h1(x1)  h2(x1)  …  hm(x1)
      h1(x2)  h2(x2)  …  hm(x2)
       …
      h1(xp)  h2(xp)  …  hm(xp) ]

Substituting f = H w (and dropping the penalty term, λj = 0) gives Hᵀ H w = Hᵀ y, so the optimal weight vector is

w = A⁻¹ Hᵀ y,  where A = Hᵀ H
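In code the closed-form solution is a single linear solve; the sketch below (our own helper) also accepts the regularisation parameters λj from the penalised cost function, with λj = 0 reproducing A = HᵀH as above.

```python
import numpy as np

def rbf_weights(H, y, lambdas=None):
    # w = A^-1 H^T y,  with  A = H^T H + diag(lambdas)   (lambdas omitted -> A = H^T H)
    H, y = np.asarray(H, float), np.asarray(y, float)
    Lam = np.diag(lambdas) if lambdas is not None else 0.0
    A = H.T @ H + Lam
    return np.linalg.solve(A, H.T @ y)
```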

Page 90: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

Example

Sample points (noisy) from the curve y = x : {(1, 1.1), (2, 1.8), (3, 3.1)}

linear model :

f(x) = w1h1(x) + w2h2(x),

where h1(x) = 1, h2(x) = x

Estimate the coefficients w1, w2

Page 91: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

With the sample points {(1, 1.1), (2, 1.8), (3, 3.1)} and h1(x) = 1, h2(x) = x:

H = [ h1(x1)  h2(x1)     [ 1  1
      h1(x2)  h2(x2)  =    1  2
      h1(x3)  h2(x3) ]     1  3 ],      y = ( 1.1, 1.8, 3.1 )ᵀ

Hᵀ H = [ 3   6
         6  14 ],      A⁻¹ = (Hᵀ H)⁻¹ = [  7/3   −1
                                           −1    1/2 ]

w = A⁻¹ Hᵀ y = ( 0, 1 )ᵀ

f(x) = w1 h1(x) + w2 h2(x) = 0·1 + 1·x
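The same numbers can be checked in a few lines of numpy:

```python
import numpy as np

H = np.array([[1.0, 1.0],   # rows are [h1(xi), h2(xi)] = [1, xi] for xi = 1, 2, 3
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([1.1, 1.8, 3.1])

print(H.T @ H)                              # [[ 3.  6.] [ 6. 14.]]
w = np.linalg.solve(H.T @ H, H.T @ y)       # w = (H^T H)^-1 H^T y
print(w)                                    # [0. 1.]  ->  f(x) = 0*1 + 1*x = x
```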

Page 92: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

f(x) = x

Page 93: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

It should have an extra term, x².

New model: f(x) = w1h1(x) + w2h2(x) + w3h3(x), where h3(x) = x²

The design matrix H gains a third column with entries h3(xi) = xi²:

( h3(x1), h3(x2), h3(x3) ) = ( 1², 2², 3² ) = ( 1, 4, 9 )

Page 94: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

94

New model: f(x) = w1h1(x) + w2h2(x) + w3h3(x), where h1(x) = 1, h2(x) = x, h3(x) = x²

Page 95: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

Radial Basis Function (RBF) Networks

Each prototype node computes a distance based kernel function (Gaussian is common)

Prototype nodes form a hidden layer in a neural network Train top layer with simple delta rule to get outputs Thus, prototype nodes learn weightings for each class

95

blend of instance-based method and neural network method.

Page 96: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

Radial Basis Function

96

Function to be learned: f̂(x) = w0 + Σu=1..k wu Ku( d(xu, x) )

One common choice for Ku( d(xu, x) ) is the Gaussian kernel: Ku( d(xu, x) ) = exp( −d²(xu, x) / (2σu²) )

Global approximation to the target function, in terms of a linear combination of local approximations.

Related to distance-weighted regression, but "eager" instead of "lazy".

Page 97: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

Radial Basis Function Networks

97

ai(x) are attributes describing instance x. The first layer computes various Ku(d(xu, x)). The second layer computes a linear combination of the first-layer unit values.

Hidden unit activation is close to 0 if x isn’t near xu

Page 98: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

Approximation

MLP : Global network All inputs cause an output

RBF: Local network. Only inputs near a receptive field produce an activation. Can give a "don't know" output.

MLP vs RBFN

Page 99: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

MLP vs RBFN

MLP                               RBFN
Global hyperplane                 Local receptive field
EBP (Error Back-Propagation)      LMS
Local minima                      Serious local minima
Smaller number of hidden neurons  Larger number of hidden neurons
Shorter computation time          Longer computation time
Longer learning time              Shorter learning time

Page 100: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

Motivation Eager Learning Lazy Learning Instance-Based Learning

k-Nearest Neighbour Learning (kNN) Distance-Weighted k-NN Locally Weighted Regression (LWR) Radial Basis Functions (RBF) Case-Based Reasoning (CBR) Summary

Content

Page 101: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

Case-based reasoning (CBR)

CBR shares two principles with instance-based methods and locally weighted regression (lazy learning; classifying a new query by analysing similar stored instances), but the instances are represented using a richer symbolic description, and the methods used for retrieval are correspondingly more elaborate.

CBR is an advanced instance-based learning method applied to more complex instance objects.

Objects may include complex structural descriptions of cases & adaptation rules.

Page 102: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

CBR cannot use Euclidean distance measures Must define distance measures for those complex

objects instead (e.g. semantic nets)

CBR tries to model human problem-solving uses past experience (cases) to solve new problems retains solutions to new problems

CBR is an ongoing area of machine learning research with many applications

Page 103: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

103

Applications of CBR

Design: landscape, building, mechanical, conceptual design of aircraft sub-systems
Planning: repair schedules
Diagnosis: medical
Adversarial reasoning: legal

Page 104: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

CBR process

Flowchart: a New Case is matched against the Case Base to Retrieve Matched Cases; the Closest Case is Reused and, if adaptation is needed (Adapt?), Revised using Knowledge and Adaptation rules to Suggest a solution; the solution is then Retained (Learn) in the Case Base.

Page 105: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

105

CBR example: Property pricing

Case  Location code  Bedrooms  Recep rooms  Type      Floors  Condition  Price £
1     8              2         1            terraced  1       poor       20,500
2     8              2         2            terraced  1       fair       25,000
3     5              1         2            semi      2       good       48,000
4     5              1         2            terraced  2       good       41,000

Test instance

Case  Location code  Bedrooms  Recep rooms  Type      Floors  Condition  Price £
5     7              2         2            semi      1       poor       ???

Page 106: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

106

How rules are generated

There is no unique way of doing it. Here is one possibility: examine cases and look for ones that are almost identical.

Case 1 and case 2:
Rule 1: If recep-rooms changes from 2 to 1 then reduce price by £5,000

Case 3 and case 4:
Rule 2: If Type changes from semi to terraced then reduce price by £7,000

Page 107: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

107

Matching

Comparing the test instance with each case:
matches(5, 1) = 3
matches(5, 2) = 3 (MAX COST: £25,000)
matches(5, 3) = 2
matches(5, 4) = 1

Estimated price of case 5 is £25,000

Page 108: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

108

Adapting

Reverse rule 2 if type changes from terraced to semi then increase

price by £7,000 Apply reversed rule 2

new estimate of price of property 5 is £32,000
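A toy sketch of the matching and adaptation steps in Python; the case representation, the tie-break on price, and the rule encoding are our own simplification of the slides' example.

```python
cases = [
    {"loc": 8, "bed": 2, "recep": 1, "type": "terraced", "floors": 1, "cond": "poor", "price": 20500},
    {"loc": 8, "bed": 2, "recep": 2, "type": "terraced", "floors": 1, "cond": "fair", "price": 25000},
    {"loc": 5, "bed": 1, "recep": 2, "type": "semi",     "floors": 2, "cond": "good", "price": 48000},
    {"loc": 5, "bed": 1, "recep": 2, "type": "terraced", "floors": 2, "cond": "good", "price": 41000},
]
query = {"loc": 7, "bed": 2, "recep": 2, "type": "semi", "floors": 1, "cond": "poor"}

def matches(query, case):
    # count the attributes on which the query and the stored case agree
    return sum(query[a] == case[a] for a in query)

# retrieve: closest case, ties broken by the higher price (as on the matching slide)
best = max(cases, key=lambda c: (matches(query, c), c["price"]))
estimate = best["price"]

# adapt: reversed rule 2 -- type changes from terraced (case) to semi (query) => add 7,000
if best["type"] == "terraced" and query["type"] == "semi":
    estimate += 7000

print(matches(query, best), estimate)        # 3 32000
```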

Page 109: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

109

Learning

So far we have a new case and an estimated price; nothing is added yet to the case base.

If later we find a house of location code 8 sold for £35,000, then the case would be added, and we could add a new rule:

if location changes from 7 to 8, increase price by £3,000

Case  Location code  Bedrooms  Recep rooms  Type      Floors  Condition  Price £
5     7              2         2            terraced  1       poor       £32,000

Case  Location code  Bedrooms  Recep rooms  Type      Floors  Condition  Price £
#5    8              2         2            terraced  1       poor       £35,000

Page 110: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

110

Problems with CBR

How should cases be represented? How should cases be indexed for fast retrieval? How can good adaptation heuristics be developed? When should old cases be removed?

Page 111: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

111

Advantages

A local approximation is found for each test case. Knowledge is in a form understandable to human beings. Fast to train.

Page 112: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

Lazy and Eager Learning

112

Lazy: wait for query before generalizing KNN, locally weighted regression, CBR

Eager: generalize before seeing query RBF networks

Differences: computation time; global vs. local approximations to the target function. Using the same hypothesis space H, a lazy learner can represent more complex functions (e.g., consider H = linear functions).

Page 113: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

113

Summary

Differences and advantages KNN algorithm:

the most basic instance-based method. Locally weighted regression: generalization of

KNN. RBF networks:

blend of instance-based method and neural network method.

Case-based reasoning

Page 114: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

114

Lazy and Eager Learning

Lazy: wait for query before generalizing k-Nearest Neighbour, Case based reasoning

Eager: generalize before seeing query RBF Networks, ID3, …

Does it matter? Eager learner must create global approximation Lazy learner can create many local approximations

Page 115: 1 Instance Based Learning Soongsil University Intelligent Systems Lab

The End