Linear and Nonlinear Regression_Manesh Meena

COURSE PRESENTATION EE -671

PRESENTED BY- MANESH MEENA

Linear and Nonlinear Regression and Estimation Techniques

Content

- Introduction
- General Regression model
- Parameter estimation in LR
- Nonlinear and weighted Regression
- NARMAX modeling
- Route training in mobile robotics through system identification
- Application of Regression analysis

Introduction

Regression analysis is used to extract parameters from measured data in order to define the physical characteristics of a system.

Goal: Express the relationship between two (or more) variables by a mathematical formula.

- x is the predictor (independent) variable
- y is the response (dependent) variable

We specifically want to indicate how y varies as a function of x.

Example

Consider a car loan company that wants to predict the behaviour of its customers based on their historical behaviour. The data is distributed as a narrow strip, so it is possible to draw a curve that best "fits" the data. This curve is considered a satisfactory approximation of the true data distribution.

- This curve is called the regression function of the variable "Budget" on the variable "Age".

- Using this regression function, the company predicts the value of the "Budget" attribute for a new customer.

Additional variables may be considered in order to reduce the prediction error on y. For example, Revenue, Gender, Annual mileage, Number of children, etc. could be included in the regression function. So, quite generally, doing regression means looking for the "best" function:

y = f(x1, x2, ..., xp)

The regression function f(x1, x2, ..., xp) therefore has to be defined so as to make the prediction errors as small as possible.

Calculating Parameters

There are two main ways of calculating the parameters of a regression model:

- The "Least Squares" method, which minimizes the sum of the squares of the prediction errors of the model on the design data. Simple and Multiple Regression models are adjusted by the Least Squares method.

- The "Maximum Likelihood" method, which tunes the model so as to maximize the likelihood of the sample. Logistic Regression models are adjusted by the Maximum Likelihood method.

General Regression Model

Assume the true model is of the form:

y(x) = m(x) + ɛ(x)

- The systematic part, m(x), is deterministic.
- The error, ɛ(x), is a random variable, arising from:
  - Measurement error
  - Natural variations due to exogenous factors
- Therefore, y(x) is also a random variable.
- The error is additive.

We want to estimate m(x) and possibly the distribution of ɛ(x).

The Standard Assumptions

A1: E[ɛ(x)] = 0 ∀x (Mean 0)
A2: Var[ɛ(x)] = σ^2 ∀x (Constant variance)
A3: Cov[ɛ(x), ɛ(x’)] = 0 ∀x ≠ x’ (Uncorrelated)

These assumptions are only on the error term, ɛ(x) = y(x) − m(x).

Residuals

The residuals can be used to check the estimated model m’(x). If the model fit is good, the residuals should satisfy the three assumptions above.

Parameter Estimation

How do we estimate m(x)?

Example: Relating shoe size to height using footprint impressions.

How can we estimate m(x) for the shoe example?

(Non-parametric): For each shoe size, take the mean of the observed heights.

(Parametric): Assume the trend is linear.
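The non-parametric estimate described above can be sketched in a few lines. The (shoe size, height) pairs below are made up for illustration and are not data from the presentation; `mean_by_x` is a hypothetical helper name.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (shoe size, height in cm) observations, for illustration only
data = [(8, 170.0), (8, 174.0), (9, 176.0), (9, 180.0), (10, 183.0)]

def mean_by_x(pairs):
    """Non-parametric estimate of m(x): the mean response at each observed x."""
    groups = defaultdict(list)
    for x, y in pairs:
        groups[x].append(y)
    return {x: mean(ys) for x, ys in groups.items()}

m_hat = mean_by_x(data)
print(m_hat)
```

This estimate needs no assumption about the shape of m(x), but it says nothing about shoe sizes that were never observed, which is one reason to consider the parametric (linear) alternative.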

Linear Regression

Simple linear regression assumes that m(x) is of the parametric form

m(x) = β0 + β1x

which is the equation for a line.

Which line is the best estimate?

Write the observed data:

yi = β0 + β1*xi + ɛi (i = 1, 2, . . . , n)

where
- yi ≡ y(xi) is the response value for observation i,
- β0 and β1 are the unknown parameters (regression coefficients),
- xi is the predictor value for observation i,
- ɛi ≡ ɛ(xi) is the random error for observation i.

Let g(x) ≡ g(x; β) be an estimator for y(x).

Define a loss function L(y(x), g(x)), which describes how far g(x) is from y(x).

The risk, or expected loss, is

R(x) = E[L(y(x), g(x))]

The best predictor minimizes the risk (or expected loss):

g∗(x) = arg min_{g ∈ G} E[L(y(x), g(x))]
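As a minimal sketch, assume the loss is the squared error (the text above leaves L general) and that G is the family of straight lines. Minimizing the sum of squared errors over the sample then has the familiar closed-form solution. The data below is made up and lies exactly on y = 1 + 2x, so the residuals are zero up to rounding.

```python
from statistics import mean

def fit_simple_linear(xs, ys):
    """Least-squares estimates (b0, b1) for the line y = b0 + b1 * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # intercept
    return b0, b1

# Illustrative data only (not from the presentation)
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
b0, b1 = fit_simple_linear(xs, ys)

# Residuals: observed response minus fitted value
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
print(b0, b1, residuals)
```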

Residuals in the above example

Model Comparison

Coefficient of determination
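The formula from this slide did not survive extraction. The sketch below uses the standard definition R^2 = 1 − SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the total sum of squares about the mean. The numbers are illustrative only.

```python
def r_squared(ys, y_hats):
    """Coefficient of determination for observed ys and fitted values y_hats."""
    y_bar = sum(ys) / len(ys)
    ss_res = sum((y - yh) ** 2 for y, yh in zip(ys, y_hats))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

# Illustrative observations and model predictions
observed = [1.0, 2.0, 3.0, 4.0]
fitted = [1.1, 1.9, 3.2, 3.8]
r2 = r_squared(observed, fitted)
print(r2)
```

A value close to 1 means the model explains most of the variation in the response; a value near 0 means it does little better than predicting the mean.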

Nonlinear Regression

Nonlinear regression takes the general form

y(x) = m(x; β) + ɛ(x)

for some specified function m(x; β) with unknown parameters β.

Under the same assumptions as in linear regression (A1-A3), the least squares solution is still valid.

Nonlinear regression is an iterative procedure in which the number of iterations depends on how quickly the parameter estimates converge.
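One common iterative scheme is Gauss-Newton, sketched here for the hypothetical model m(x; a, b) = a * exp(b * x). The presentation does not say which algorithm or model it has in mind, so both are assumptions, as are the starting values and the noise-free data.

```python
import math

def gauss_newton_exp(xs, ys, a, b, max_iter=50, tol=1e-12):
    """Fit y = a * exp(b * x) by Gauss-Newton; returns (a, b, iterations used)."""
    for it in range(1, max_iter + 1):
        e = [math.exp(b * x) for x in xs]
        r = [y - a * ei for y, ei in zip(ys, e)]      # current residuals
        ja = e                                        # d m / d a
        jb = [a * x * ei for x, ei in zip(xs, e)]     # d m / d b
        # Normal equations (J^T J) d = J^T r for the two parameters
        saa = sum(u * u for u in ja)
        sab = sum(u * v for u, v in zip(ja, jb))
        sbb = sum(v * v for v in jb)
        ra = sum(u * ri for u, ri in zip(ja, r))
        rb = sum(v * ri for v, ri in zip(jb, r))
        det = saa * sbb - sab * sab
        da = (sbb * ra - sab * rb) / det
        db = (saa * rb - sab * ra) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:                   # parameters have converged
            return a, b, it
    return a, b, max_iter

# Illustrative noise-free data generated from a = 2, b = 0.5
xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]
a_hat, b_hat, n_iter = gauss_newton_exp(xs, ys, a=1.5, b=0.6)
print(a_hat, b_hat, n_iter)
```

The iteration count depends on the starting point: a start far from the solution may need many more steps, or a damped variant, to converge.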

Weighted Regression

Consider the risk function we have used so far:

R(β) = ∑(yi − m(xi; β))^2

Each observation contributes equally to the risk. Weighted regression uses the risk function

R(β) = ∑ wi (yi − m(xi; β))^2

so observations with larger weights are more important.
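A minimal sketch of weighted least squares for a straight line, assuming the usual weighted risk R(β) = ∑ wi (yi − β0 − β1 xi)^2 with non-negative weights wi. The data and weights are made up: the last point is an outlier that is given weight 0, so it has no influence on the fit.

```python
def fit_weighted_line(xs, ys, ws):
    """Minimize sum(w * (y - b0 - b1 * x)**2) over b0 and b1."""
    sw = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / sw   # weighted mean of x
    y_bar = sum(w * y for w, y in zip(ws, ys)) / sw   # weighted mean of y
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Illustrative data: three points on y = 1 + 2x plus one outlier
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 20.0]
ws = [1.0, 1.0, 1.0, 0.0]   # weight 0 removes the outlier's influence
wb0, wb1 = fit_weighted_line(xs, ys, ws)
print(wb0, wb1)
```

Setting all weights to 1 recovers the ordinary least squares fit.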

Nonlinear modeling

To represent nonlinear models, the NARMAX (nonlinear autoregressive moving average with exogenous input) representation is used.

For multiple input, single output noiseless systems, this model takes the form

where y(n) and u(n) are the sampled output and input signals at time n respectively, Ny and Nu are the regression orders of the output and input respectively, and f() is a nonlinear function.

Autoregressive moving average models

The notation ARMA(p, q) refers to the model with p autoregressive terms and q moving average terms.

This model contains AR(p) and MA(q) models.
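As an illustration, the sketch below simulates an ARMA(p, q) process under the common convention x(t) = ∑ φi x(t−i) + e(t) + ∑ θj e(t−j). The slide's own equation is not preserved, so this sign convention and the coefficient values are assumptions. Feeding in a unit impulse as the noise sequence shows how the AR and MA parts each shape the response.

```python
def simulate_arma(phi, theta, noise):
    """Simulate x(t) = sum(phi[i] * x(t-1-i)) + e(t) + sum(theta[j] * e(t-1-j)).

    phi holds the p autoregressive coefficients, theta the q moving average
    coefficients, and noise is the driving sequence e(0), e(1), ...
    """
    x = []
    for t, e in enumerate(noise):
        ar = sum(p * x[t - 1 - i] for i, p in enumerate(phi) if t - 1 - i >= 0)
        ma = sum(q * noise[t - 1 - j] for j, q in enumerate(theta) if t - 1 - j >= 0)
        x.append(ar + e + ma)
    return x

# ARMA(1, 1) with illustrative coefficients, driven by a unit impulse
impulse_response = simulate_arma([0.5], [0.4], [1.0, 0.0, 0.0, 0.0])
print(impulse_response)
```

With an empty `theta` the function reduces to an AR(p) model, and with an empty `phi` to an MA(q) model.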

Nonlinear autoregressive moving average with exogenous input (NARMAX) modeling

The NARMAX methodology breaks the modeling problem into the following steps:

1. Structure Detection
2. Parameter Estimation
3. Model Validation
4. Prediction
5. Analysis

NARMAX

Determine the model structure and parameters based on the estimation dataset.

Validate the model using the validation dataset.

The initial structure of the NARMAX polynomial is determined by the inputs u, the output y, and the input and output time lags Nu and Ny.

The general rule in choosing suitable inputs for the model is that at least some of them should cause the output.

However, not all of them contribute significantly to the computation of the output.

The final structure of the estimated NARMAX model will contain only the significant inputs.


Before any model terms are removed, an equivalent auxiliary model is computed from the original NARMAX model. The model terms of the auxiliary model are orthogonal.

The calculation of the auxiliary model parameters and the refinement of the model’s structure is an iterative process.

Each iteration involves three steps:

1. Estimation of model parameters using the estimation dataset.
2. Model validation using the validation dataset.
3. Removal of non-contributing terms.


After the model validation step, if there is no significant error between the model-predicted output and the actual output, non-contributing terms are removed in order to reduce the size of the polynomial.

To determine the contribution of a model term to the output, the Error Reduction Ratio (ERR) is computed for each term. The ERR is the percentage reduction in the total mean-squared error that results from including the term under consideration.

Model terms with an ERR below a certain threshold are removed from the model polynomial during the refinement process.

If, in the following iteration, the error is higher as a result of the last removal, the removed terms are reinserted into the model and the model equation is considered final.

Finally, the NARMAX model parameters are computed from the auxiliary model.
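The ERR computation can be sketched with classical Gram-Schmidt orthogonalization, which plays the role of the orthogonal auxiliary model. This is a simplified illustration and not the authors' implementation: the candidate terms are processed in a fixed order, the ERR is returned as a fraction of the output energy rather than a percentage, and the threshold value and data are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def error_reduction_ratios(columns, y):
    """ERR of each candidate term (one column per term), in the given order.

    Each column is orthogonalized against the previously processed ones;
    the ERR of a term is g^2 * <w, w> / <y, y>, where w is its orthogonal
    version and g = <w, y> / <w, w> is the auxiliary-model coefficient.
    Assumes the columns are linearly independent (otherwise <w, w> is zero).
    """
    ortho, errs = [], []
    yy = dot(y, y)
    for p in columns:
        w = list(p)
        for q in ortho:
            c = dot(q, p) / dot(q, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        ww = dot(w, w)
        g = dot(w, y) / ww
        errs.append(g * g * ww / yy)
        ortho.append(w)
    return errs

# Illustrative data: the output depends only on the first candidate term
u = [1.0, 2.0, 3.0, 4.0]
other = [1.0, -1.0, 1.0, -1.0]
y = [3.0 * v for v in u]

errs = error_reduction_ratios([u, other], y)
threshold = 0.01   # hypothetical cut-off
kept = [i for i, e in enumerate(errs) if e >= threshold]
print(errs, kept)
```

Here the first term explains all of the output and the second contributes nothing, so only the first survives the threshold.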

Route Training in mobile robotics through System Identification - Ulrich Nehmzow

Purpose: to demonstrate how well the NARMAX model can represent route learning tasks.

Experimental procedure:

The robot is equipped with 16 sonar, 16 infra-red and 16 tactile sensors distributed uniformly around its circumference.

A SICK laser range finder is also present, which scans the front semicircle of the robot with a radial resolution of 1 degree and a distance resolution of 1 cm.

During the experiment, the inputs from all sensors, together with the robot's position, orientation, and translational and rotational velocities, are recorded every 250 ms.

Position and orientation are obtained by placing point targets on top of the robot and tracking them continuously with an overhead camera.

Four separate route learning experiments were conducted.


In each case:

1. Initially, the robot was driven manually several times through the specific route to be learned.
2. During this time, the robot’s sensor values and rotational velocities were logged.
3. The data collected was then used for estimation and validation of the NARMAX model.
4. The model was then put on the robot and executed in order to record a further set of data, which was used to test the model’s performance.

Route-1


After manual control for 1 hour all the sonar and laser measurements were taken.

The values delivered by the laser scanner were averaged in 12 sectors of 15 degrees each (laser bins) to obtain a 12-dimensional vector of laser readings.

These laser bins, as well as the 16 sonar values, were inverted so that large values indicate close-by objects.

Finally, the sonar and laser readings at each instant were normalized by the minimum sonar and laser readings, respectively, at that instant.

All these values are input into the model.
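The binning and inversion steps can be sketched as follows. Several details are assumptions because the slides do not give them: the scan is taken to hold 180 range readings (one per degree), inversion is taken to be the reciprocal, and the normalization step is left out since its exact formula is not stated. The scan values are made up.

```python
def laser_bins(scan, n_bins=12):
    """Average a laser scan into n_bins equal sectors (15 readings each for 180)."""
    width = len(scan) // n_bins
    return [sum(scan[i * width:(i + 1) * width]) / width for i in range(n_bins)]

def invert(values):
    """Assumed inversion: reciprocal, so that nearby objects give large values."""
    return [1.0 / v for v in values]

# Hypothetical scan: 180 distance readings, one per degree of the front semicircle
scan = [float(d) for d in range(1, 181)]

bins = laser_bins(scan)      # 12-dimensional vector of sector averages
inverted = invert(bins)      # large value = close object
print(len(bins), bins[0], inverted[0])
```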

The parameters of the NARMAX model that the author obtained were: Nu=0, Ny=0, Ne=0, degree=2.

The initial model had 496 terms, but after the removal of non-contributing terms only 70 remained.

To compare the two trajectories quantitatively, the difference between the distribution of values (x- x) under manual control and the distribution of the same values under the NARMAX model was calculated; the difference was not significant (~0.05).

Route-2

This time only the laser sensor was used, and its readings were pre-processed in the same way as in route-1.

The characteristics of the NARMAX model obtained are Nu=0, Ny=0, Ne=0 and degree=3.

The initial model had 573 terms, but just 94 remained after the removal of non-contributing terms.

Statistical space occupancy tests along the x and y axes confirm that there is no significant difference between the two trajectories (~0.05).

[Figure: comparison of the manual route-2 and NARMAX route-2 trajectories]

Route-3

This time the robot had to go through two narrow passes and many symmetries.

To obtain the model, normalized and inverted bins were used.

The best NARMAX model uses Nu=0, Ny=0 and a polynomial of degree 2.

The initial model had 97 terms, but after the removal of non-contributing ones just 73 remained.

Once again, the model was able to learn the trajectory properly, with no significant difference in the space occupancy.

[Figure: comparison of the manual route-3 and NARMAX route-3 trajectories]

Route-4

In this route the robot had to start from the position labelled A and reach the point labelled B.

To model the route’s behaviour, ARMAX modeling was used, which is the linear polynomial equivalent of NARMAX, i.e. the degree of the polynomial is one.

In this experiment the regression order of the output (Ny) was 0 and that of the input (Nu) was 8.

The model successfully learned the route from A to B, which was again confirmed using statistical analysis.
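With Ny = 0 and Nu = 8 and a polynomial of degree one, the model output is a linear combination of current and past inputs only. The sketch below fits such a model by least squares for a single input signal. This is a simplification: the robot has many sensor inputs, and whether the model includes the lag-0 term or a constant is not stated in the slides, so those choices, the helper names, and the synthetic data are all assumptions.

```python
import random

def lagged_rows(u, n_u):
    """Row for time n holds [u(n), u(n-1), ..., u(n-n_u)], for n >= n_u."""
    return [[u[n - k] for k in range(n_u + 1)] for n in range(n_u, len(u))]

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_linear_lag_model(u, y, n_u):
    """Least-squares coefficients b_k in y(n) = sum_k b_k * u(n - k), k = 0..n_u."""
    rows = lagged_rows(u, n_u)
    targets = y[n_u:]
    p = n_u + 1
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    aty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(p)]
    return solve(ata, aty)

# Synthetic check: generate y from known coefficients, then recover them
rng = random.Random(0)
u = [rng.uniform(-1.0, 1.0) for _ in range(200)]
true_b = [0.5, -0.3, 0.2, 0.0, 0.1, 0.0, 0.0, -0.1, 0.05]
y = [0.0] * 8 + [sum(b * u[n - k] for k, b in enumerate(true_b))
                 for n in range(8, len(u))]
b_hat = fit_linear_lag_model(u, y, n_u=8)
print(b_hat)
```

Because the model is linear in its coefficients, no iteration is needed: a single least-squares solve gives the estimates.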

Applications of regression analysis

- Trend line analysis
- Risk analysis for investment
- Market forecasting
- Business planning
- System identification

References

- http://www.wikipedia.org/
- http://dynsys.uml.edu/tutorials/regressionanalysis.htm
- “Route Training in mobile robotics: System Identification” - Ulrich Nehmzow and S. Billings

THANK YOU!!

Questions

• Is the type of the model linear or nonlinear?
• How is regression different from correlation?
• What will be the effect of adding polynomial terms to the linear model?
• What is the significance of the coefficient of determination (R^2)?
• What are the difficulties in regression analysis?
