# Regression III


• CSCI 5521: Paul Schrater


## Introduction

In this lecture we will look at RBFs: networks where the activation of a hidden unit is based on the distance between the input vector and a prototype vector.

Radial Basis Functions have a number of interesting properties:

- Strong connections to other disciplines: function approximation, regularization theory, density estimation, and interpolation in the presence of noise [Bishop, 1995]
- RBFs allow for a straightforward interpretation of the internal representation produced by the hidden layer
- RBFs have training algorithms that are significantly faster than those for MLPs
- RBFs can be implemented as support vector machines


## Radial Basis Functions

- Discriminant function flexibility: NON-linear, but with sets of linear parameters at each layer. Provably general function approximators given sufficient nodes.
- Error function: mixed. Different error criteria are typically used for the hidden vs. output layers (hidden: input approximation; output: training error).
- Optimization: simple least squares. With hybrid training the solution is unique.


## Two-Layer Non-linear NN

The non-linear functions are radially symmetric kernels with free parameters.


## Input-to-Hidden Mapping

Each hidden unit covers a region of feature space by means of a radially symmetric function. The activation of a hidden unit is determined by the DISTANCE between the input vector x and a prototype vector.

Choice of radial basis function: although several forms of radial basis may be used, Gaussian kernels are most commonly used. The Gaussian kernel may have a full-covariance structure, which requires D(D+3)/2 parameters to be learned, or a diagonal structure, with only (D+1) independent parameters. In practice, a trade-off exists between using a small number of basis functions with many parameters or a larger number of less flexible functions [Bishop, 1995].
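A hidden unit's activation can be sketched in a few lines (a minimal NumPy sketch; the function name `gaussian_rbf` and the values of `mu` and `sigma` are illustrative, not from the lecture):

```python
import numpy as np

def gaussian_rbf(x, mu, sigma):
    """Activation of one hidden unit: a Gaussian kernel of the
    Euclidean DISTANCE between input x and prototype mu."""
    d2 = np.sum((x - mu) ** 2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

x = np.array([1.0, 2.0])
mu = np.array([1.0, 2.0])                 # prototype vector
print(gaussian_rbf(x, mu, sigma=1.0))     # x == mu, so the activation is 1.0
```

The activation depends on x only through its distance to the prototype, which is what makes the kernel radially symmetric.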


## Builds Up a Function out of Spheres


## RBF vs. NN


RBFs have their origins in techniques for performing exact interpolation. These techniques place a basis function at each of the training examples and compute the coefficients $w_k$ so that the mixture model has zero error at the examples.


## Formally: Exact Interpolation

Goal: find a function $h(\mathbf{x})$ such that $h(\mathbf{x}_i) = t_i$ for inputs $\mathbf{x}_i$ and targets $t_i$:

$$y = h(\mathbf{x}) = \sum_{i=1}^{n} w_i\,\phi\left(\lVert \mathbf{x} - \mathbf{x}_i \rVert\right)$$

In matrix form, with $\phi_{ij} = \phi\left(\lVert \mathbf{x}_i - \mathbf{x}_j \rVert\right)$:

$$\begin{pmatrix} \phi_{11} & \phi_{12} & \cdots & \phi_{1n} \\ \phi_{21} & \phi_{22} & \cdots & \phi_{2n} \\ \vdots & \vdots & & \vdots \\ \phi_{n1} & \phi_{n2} & \cdots & \phi_{nn} \end{pmatrix} \begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{pmatrix} = \begin{pmatrix} t_1 \\ t_2 \\ \vdots \\ t_n \end{pmatrix}$$

Form $\Phi \mathbf{w} = \mathbf{t}$ and solve for $\mathbf{w}$:

$$\mathbf{w} = \left(\Phi^{T}\Phi\right)^{-1}\Phi^{T}\mathbf{t}$$

Recall the kernel trick.


## Solving Conditions

Micchelli's Theorem: if the points $\mathbf{x}_i$ are distinct, then the matrix $\Phi$ will be nonsingular.

Mhaskar and Micchelli, "Approximation by superposition of sigmoidal and radial basis functions," Advances in Applied Mathematics, 13, 350-373, 1992.
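The exact-interpolation recipe is short in code: place one Gaussian basis at every training point, build the square matrix $\Phi$, and solve $\Phi\mathbf{w} = \mathbf{t}$ (a sketch for 1-D inputs with an arbitrary width `sigma`; since $\Phi$ is square and, by Micchelli's theorem, nonsingular for distinct points, a direct solve suffices):

```python
import numpy as np

def exact_interpolation(X, t, sigma=1.0):
    """One Gaussian basis per training point; solve Phi w = t exactly."""
    D = np.abs(X[:, None] - X[None, :])        # pairwise distances (1-D inputs)
    Phi = np.exp(-D ** 2 / (2 * sigma ** 2))   # Phi_ij = phi(|x_i - x_j|)
    w = np.linalg.solve(Phi, t)                # zero error at the examples
    return w, Phi

X = np.array([0.0, 1.0, 2.0, 3.0])
t = np.array([0.0, 1.0, 0.0, -1.0])
w, Phi = exact_interpolation(X, t)
print(np.allclose(Phi @ w, t))                 # True: interpolates exactly
```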


With $n$ outputs and $m$ basis functions:

$$y_k = h_k(\mathbf{x}) = \sum_{i=1}^{m} w_{ki}\,\phi_i(\mathbf{x}) + w_{k0}, \qquad \text{e.g. } \phi_i(\mathbf{x}) = \exp\left(-\frac{\lVert \mathbf{x} - \boldsymbol{\mu}_i \rVert^{2}}{2\sigma_i^{2}}\right)$$

$$= \sum_{i=0}^{m} w_{ki}\,\phi_i(\mathbf{x}), \qquad \phi_0(\mathbf{x}) = 1$$

Rewrite this as:

$$\mathbf{y} = W\,\boldsymbol{\phi}(\mathbf{x}), \qquad \boldsymbol{\phi}(\mathbf{x}) = \begin{pmatrix} \phi_0(\mathbf{x}) \\ \vdots \\ \phi_m(\mathbf{x}) \end{pmatrix}$$
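The vector form $\mathbf{y} = W\boldsymbol{\phi}(\mathbf{x})$, with $\phi_0 = 1$ absorbing the bias, can be sketched as (a NumPy sketch; the centers, widths, and weights are illustrative values):

```python
import numpy as np

def rbf_forward(x, centers, sigmas, W):
    """y_k = sum_{i=0..m} w_ki * phi_i(x), with phi_0(x) = 1 as the bias term.
    W has shape (n_outputs, m + 1)."""
    d2 = np.sum((centers - x) ** 2, axis=1)                        # squared distances
    phi = np.concatenate(([1.0], np.exp(-d2 / (2 * sigmas ** 2))))
    return W @ phi

centers = np.array([[0.0, 0.0], [1.0, 1.0]])   # m = 2 prototype vectors
sigmas = np.array([1.0, 1.0])
W = np.array([[0.5, 1.0, -1.0]])               # one output: [bias, w_1, w_2]
print(rbf_forward(np.zeros(2), centers, sigmas, W))   # 0.5 + 1 - exp(-1)
```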


## Learning

RBFs are commonly trained following a hybrid procedure that operates in two stages:

1. Unsupervised selection of RBF centers. RBF centers are selected so as to match the distribution of training examples in the input feature space. This is the critical step in training, normally performed in a slow iterative manner; a number of strategies are used to solve this problem.
2. Supervised computation of output vectors. Hidden-to-output weight vectors are determined so as to minimize the sum-squared error between the RBF outputs and the desired targets. Since the outputs are linear, the optimal weights can be computed with fast linear least-squares methods.


## Least-Squares Solution for Output Weights

Given $L$ input vectors $\mathbf{x}_l$ with labels $\mathbf{t}_l$, form the least-squares error function:

$$E = \sum_{l=1}^{L} \sum_{k=1}^{n} \left\{ y_k(\mathbf{x}_l) - t_{lk} \right\}^{2}, \qquad t_{lk} \text{ is the } k\text{th value of the target vector}$$

$$E = \sum_{l=1}^{L} \left( \mathbf{y}(\mathbf{x}_l) - \mathbf{t}_l \right)^{T} \left( \mathbf{y}(\mathbf{x}_l) - \mathbf{t}_l \right) = \sum_{l=1}^{L} \left( W\boldsymbol{\phi}(\mathbf{x}_l) - \mathbf{t}_l \right)^{T} \left( W\boldsymbol{\phi}(\mathbf{x}_l) - \mathbf{t}_l \right)$$

Stacking the $\boldsymbol{\phi}(\mathbf{x}_l)^{T}$ as the rows of $\Phi$ and the $\mathbf{t}_l^{T}$ as the rows of $T$:

$$E = \operatorname{tr}\left[ \left( \Phi W^{T} - T \right)^{T} \left( \Phi W^{T} - T \right) \right] = \operatorname{tr}\left[ W\Phi^{T}\Phi W^{T} - 2W\Phi^{T}T + T^{T}T \right]$$

$$\frac{\partial E}{\partial W^{T}} = 2\Phi^{T}\Phi W^{T} - 2\Phi^{T}T = 0 \quad\Rightarrow\quad W^{T} = \left(\Phi^{T}\Phi\right)^{-1}\Phi^{T}T$$
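The closed form $W^{T} = (\Phi^{T}\Phi)^{-1}\Phi^{T}T$ is one linear solve. A sketch with a made-up design matrix, using `np.linalg.lstsq` rather than the explicit inverse (the numerically preferred route to the same solution):

```python
import numpy as np

rng = np.random.default_rng(0)
Phi = rng.normal(size=(50, 5))        # L = 50 examples, 5 basis-function outputs
T = rng.normal(size=(50, 2))          # n = 2 target values per example

# W^T = (Phi^T Phi)^{-1} Phi^T T, computed stably
Wt, *_ = np.linalg.lstsq(Phi, T, rcond=None)

# Agrees with the normal-equations form from the derivation
Wt_normal = np.linalg.inv(Phi.T @ Phi) @ Phi.T @ T
print(np.allclose(Wt, Wt_normal))     # True
```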


## Solving for Input Parameters

We now have a linear solution, provided we know the input parameters. How do we set the input parameters?

- Gradient descent, like back-propagation
- Density-estimation viewpoint: by looking at the meaning of the RBF network for classification, the input parameters can be estimated via density estimation and/or clustering techniques

We will skip these in lecture.


## Unsupervised Methods Overview

Random selection of centers

- The simplest approach is to randomly select a number of training examples as RBF centers.
- This method has the advantage of being very fast, but the network will likely require an excessive number of centers.
- Once the center positions have been selected, the spread parameters $\sigma_j$ can be estimated, for instance, from the average distance between neighboring centers.

Clustering

- Alternatively, RBF centers may be obtained with a clustering procedure such as the k-means algorithm.
- The spread parameters can be computed as before, or from the sample covariance of the examples of each cluster.

Density estimation

- The positions of the RBF centers may also be obtained by modeling the feature-space density with a Gaussian Mixture Model, using the Expectation-Maximization algorithm.
- The spread parameters for each center are automatically obtained from the covariance matrices of the corresponding Gaussian components.
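The clustering route can be sketched with a bare-bones k-means plus a nearest-neighbor spread estimate (a minimal sketch; the helper names and the toy two-cluster data are made up for illustration):

```python
import numpy as np

def kmeans_centers(X, k, iters=20, seed=0):
    """Select RBF centers with a few iterations of k-means."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # assign each example to its nearest center, then recompute means
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def spreads_from_neighbors(centers):
    """Estimate sigma_j from the distance to the nearest other center."""
    d = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    return d.min(axis=1)

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, size=(30, 2))
               for c in ([0.0, 0.0], [5.0, 5.0])])
centers = kmeans_centers(X, k=2)
print(spreads_from_neighbors(centers))
```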


## Probabilistic Interpretation of RBFs for Classification

Introduce a mixture over feature components $j$ into the class posterior:

$$P(C_k \mid \mathbf{x}) = \sum_j P(C_k \mid j)\, P(j \mid \mathbf{x})$$

where the basis function outputs $\phi_j(\mathbf{x}) = P(j \mid \mathbf{x})$ are the posterior probabilities of the $j$th feature component, and the weights $w_{kj} = P(C_k \mid j)$ are the class probabilities given the $j$th feature value.


## Regularized Spline Fits

Given

$$\Phi = \begin{pmatrix} \phi_1(X_1) & \cdots & \phi_N(X_1) \\ \vdots & & \vdots \\ \phi_1(X_M) & \cdots & \phi_N(X_M) \end{pmatrix}, \qquad \mathbf{y}_{\text{pred}} = \Phi\mathbf{w}$$

minimize

$$L = \left(\mathbf{y} - \Phi\mathbf{w}\right)^{T}\left(\mathbf{y} - \Phi\mathbf{w}\right) + \lambda\,\mathbf{w}^{T}\Omega\,\mathbf{w}$$

where $\Omega$ makes a penalty on large curvature:

$$\Omega_{ij} = \int \left(\frac{d^{2}\phi_i(x)}{dx^{2}}\right)\left(\frac{d^{2}\phi_j(x)}{dx^{2}}\right) dx$$

Solution (generalized ridge regression):

$$\mathbf{w} = \left(\Phi^{T}\Phi + \lambda\Omega\right)^{-1}\Phi^{T}\mathbf{y}$$
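The generalized ridge solution can be sketched end to end; here $\Omega$ is approximated numerically by finite-differencing the basis functions on a grid (a sketch: the Gaussian basis on [0, 1], the target function, and $\lambda$ are all illustrative choices):

```python
import numpy as np

centers = np.linspace(0, 1, 8)        # N = 8 Gaussian basis functions on [0, 1]
sigma = 0.15

def basis(x):
    """Design-matrix rows: phi_1(x) .. phi_N(x)."""
    return np.exp(-(x[:, None] - centers[None, :]) ** 2 / (2 * sigma ** 2))

# Omega_ij = integral of phi_i'' phi_j'' dx, via finite differences on a grid
grid = np.linspace(0, 1, 400)
B = basis(grid)
d2B = np.gradient(np.gradient(B, grid, axis=0), grid, axis=0)
Omega = d2B.T @ d2B * (grid[1] - grid[0])

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 40)
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=40)   # noisy samples

Phi = basis(x)
lam = 1e-6
w = np.linalg.solve(Phi.T @ Phi + lam * Omega, Phi.T @ y)   # generalized ridge
print(np.mean((Phi @ w - y) ** 2))    # small training error, curvature-penalized
```

Raising `lam` trades training error for smoothness of the fitted curve.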


## Equivalent Kernel

Linear regression solutions admit a kernel interpretation:

$$\mathbf{w} = \left(\Phi^{T}\Phi + \lambda\Omega\right)^{-1}\Phi^{T}\mathbf{y}$$

$$y_{\text{pred}}(\mathbf{x}) = \boldsymbol{\phi}(\mathbf{x})^{T}\mathbf{w} = \boldsymbol{\phi}(\mathbf{x})^{T}\left(\Phi^{T}\Phi + \lambda\Omega\right)^{-1}\Phi^{T}\mathbf{y}$$

Let $S_\lambda = \left(\Phi^{T}\Phi + \lambda\Omega\right)^{-1}$, so that

$$y_{\text{pred}}(\mathbf{x}) = \boldsymbol{\phi}(\mathbf{x})^{T} S_\lambda \sum_i \boldsymbol{\phi}(\mathbf{x}_i)\, y_i = \sum_i \boldsymbol{\phi}(\mathbf{x})^{T} S_\lambda\, \boldsymbol{\phi}(\mathbf{x}_i)\, y_i = \sum_i K(\mathbf{x}, \mathbf{x}_i)\, y_i$$


## Eigenanalysis and Degrees of Freedom

An eigenanalysis of the effective kernel gives the effective degrees of freedom (DF), i.e. the number of basis functions. Evaluate $K$ at all pairs of points:

$$K_{ij} = K(\mathbf{x}_i, \mathbf{x}_j)$$

Eigenanalysis:

$$K = V D V^{-1} = U S U^{T}$$
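One common reading of this: the effective DF is the trace of the smoother matrix, i.e. the sum of the eigenvalues of the effective kernel evaluated at the training points (a sketch using plain ridge, taking $\Omega = I$; the matrix sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
Phi = rng.normal(size=(30, 6))        # 30 training points, 6 basis functions

def effective_df(Phi, lam):
    """df(lambda) = trace of S = Phi (Phi^T Phi + lam I)^{-1} Phi^T,
    i.e. the sum of the eigenvalues of the effective kernel matrix."""
    S = Phi @ np.linalg.inv(Phi.T @ Phi + lam * np.eye(Phi.shape[1])) @ Phi.T
    return np.trace(S)

print(effective_df(Phi, lam=0.0))     # ~6: no penalty, df = number of basis fns
print(effective_df(Phi, lam=10.0))    # shrinks below 6 as the penalty grows
```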


## Equivalent Basis Functions


## Computing Error Bars on Predictions

It is important to be able to compute error bars on linear regression predictions. We can do that by propagating the error in the measured values to the predicted values.

Given the Gram matrix

$$\Phi = \begin{pmatrix} \phi_1(X_1) & \cdots & \phi_N(X_1) \\ \vdots & & \vdots \\ \phi_1(X_M) & \cdots & \phi_N(X_M) \end{pmatrix}$$

the simple least-squares predictions are

$$\mathbf{w} = \left(\Phi^{T}\Phi\right)^{-1}\Phi^{T}\mathbf{y}, \qquad \mathbf{y}_{\text{pred}} = \Phi\mathbf{w} = \Phi\left(\Phi^{T}\Phi\right)^{-1}\Phi^{T}\mathbf{y} = S\,\mathbf{y}$$

and the measurement error propagates as

$$\operatorname{cov}\left[\mathbf{y}_{\text{pred}}\right] = \operatorname{cov}\left[S\mathbf{y}\right] = S \operatorname{cov}\left[\mathbf{y}\right] S^{T} = S\left(\sigma^{2} I\right)S^{T} = \sigma^{2}\, S S^{T}$$
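The propagation $\operatorname{cov}[\mathbf{y}_{\text{pred}}] = \sigma^{2} S S^{T}$ is direct to compute (a sketch; the design matrix and noise variance are made-up values, and note that for plain least squares $S$ is a projection, so $S S^{T} = S$):

```python
import numpy as np

rng = np.random.default_rng(0)
Phi = rng.normal(size=(20, 4))                 # 20 points, 4 basis functions
sigma2 = 0.5                                   # assumed noise variance of y

S = Phi @ np.linalg.inv(Phi.T @ Phi) @ Phi.T   # smoother: y_pred = S y
cov_pred = sigma2 * S @ S.T                    # cov[y_pred] = sigma^2 S S^T
error_bars = np.sqrt(np.diag(cov_pred))        # one-std error bar per prediction

print(np.allclose(S @ S.T, S))                 # True: S is a projection here
```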

