Nonlinear Regression - statvision.com

STATGRAPHICS – Rev. 9/16/2013

© 2013 by StatPoint Technologies, Inc.

Nonlinear Regression

Summary
Analysis Summary
Plot of Fitted Model
Response Surface Plots
Analysis Options
Reports
Correlation Matrix
Observed versus Predicted
Residual Plots
Unusual Residuals
Influential Points
Save Results
Calculations

Summary

The Nonlinear Regression procedure fits a user-specified function relating a single dependent

variable Y to one or more independent variables X. The model is estimated using nonlinear least

squares. The fitted model may be plotted, forecasts generated from it, and unusual residuals

identified.

Sample StatFolio: nonlinear reg.sgp

Sample Data

The file nonlin.sgd contains data on the amount of available chlorine in samples of a product as a

function of the number of weeks since it was produced. The data, from Draper and Smith (1998),

consist of n = 44 samples, a portion of which is shown below:

Weeks Chlorine

8 0.49

8 0.49

10 0.48

10 0.47

10 0.48

10 0.47

12 0.46

12 0.46

12 0.45

12 0.43

… …

It is desired to fit the following model to the data:

$\text{chlorine} = a + (0.49 - a)\,e^{-b(\text{weeks} - 8)}$ (1)

This model, suggested by a subject matter expert, contains two unknowns: a, the asymptotic

baseline value reached at large values of weeks, and b, the exponential rate of decay.
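
As a rough illustration of how model (1) behaves, the sketch below evaluates it in Python rather than as a STATGRAPHICS expression. The function name chlorine_model is hypothetical, and the parameter values are the estimates reported in the Analysis Summary output.

```python
import math

def chlorine_model(weeks, a, b):
    """Model (1): decays from 0.49 at weeks = 8 toward the baseline a."""
    return a + (0.49 - a) * math.exp(-b * (weeks - 8))

# Estimates taken from the Analysis Summary output.
a_hat, b_hat = 0.390144, 0.101644

print(chlorine_model(8, a_hat, b_hat))    # starting value at weeks = 8
print(chlorine_model(12, a_hat, b_hat))   # about 0.4566
print(chlorine_model(500, a_hat, b_hat))  # essentially the baseline a
```

At weeks = 8 the exponential term equals 1, so the curve passes through 0.49 regardless of a and b; for large weeks the exponential term vanishes and only a remains.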



Data Input

The first of two data input dialog boxes requests the name of the dependent variable and the

model to be fit:

Dependent Variable: numeric column containing the n values of Y.

Function: a STATGRAPHICS expression representing the function to be fit. It must include

one or more names of numeric columns, representing the independent variables. It may also

include functions such as SQRT or EXP. Any unrecognized names are considered to

represent model parameters that need to be estimated.

Weight: an optional numeric column containing weights to be applied to the squared

residuals when performing a weighted least squares fit.

Select: subset selection.



The second dialog box requests initial estimates for each of the unknown model parameters:

Enter an initial estimate for each parameter. The program will begin with the initial estimates and

perform a numerical search to find estimates that minimize the residual sum of squares.

Depending upon the complexity of the model, poor estimates may or may not lead to an optimal

solution. In all but the simplest cases, intelligent selection of initial estimates can greatly improve

the chances of obtaining a good solution. Typically, it is important to at least give estimates with

the proper sign (positive or negative), since the search procedure might otherwise move in an

entirely wrong direction.



Analysis Summary

The Analysis Summary shows the results of the fit.

Nonlinear Regression - chlorine

Dependent variable: chlorine

Independent variables:

weeks

Function to be estimated: a+(0.49-a)*exp(-b*(weeks-8))

Initial parameter estimates:

a = 0.1

b = 0.1

Number of observations: 44

Estimation method: Marquardt

Estimation stopped due to convergence of residual sum of squares.

Number of iterations: 4

Number of function calls: 14

Estimation Results

                                                   Asymptotic 95.0% Confidence Interval
Parameter   Estimate   Asymptotic Standard Error   Lower       Upper
a           0.390144   0.00501534                  0.380022    0.400265
b           0.101644   0.0133628                   0.0746763   0.128611

Analysis of Variance

Source          Sum of Squares   Df   Mean Square
Model           7.982            2    3.991
Residual        0.00500168       42   0.000119088
Total           7.987            44
Total (Corr.)   0.0395           43

R-Squared = 87.3375 percent

R-Squared (adjusted for d.f.) = 87.036 percent

Standard Error of Est. = 0.0109127

Mean absolute error = 0.00769665

Durbin-Watson statistic = 1.98378

Lag 1 residual autocorrelation = 0.00702451

Residual Analysis

        Estimation      Validation
n       44
MSE     0.000119088
MAE     0.00769665
MAPE    1.82283
ME      -0.000097621
MPE     -0.0826224

Included in the output are:

Data Summary: a summary of the input data.

Function to be Estimated: the function to be estimated and the initial parameter estimates.

Estimation Statistics: the method of estimation used and the number of iterations and

function calls performed.



Parameter Estimates: the estimated parameters with approximate confidence intervals.

Confidence intervals that do not contain 0 indicate that the model parameter is statistically

significant at the stated confidence level.

Analysis of Variance: decomposition of the variability of the dependent variable Y into a

model sum of squares and a residual or error sum of squares.

Statistics: summary statistics for the fitted model, including:

R-squared - represents the percentage of the variability in Y which has been explained by the

fitted regression model, ranging from 0% to 100%. For the sample data, the regression has

accounted for about 87.3% of the variability amongst the observed chlorine concentrations.

Adjusted R-Squared – the R-squared statistic, adjusted for the number of coefficients in the

model. This value is often used to compare models with different numbers of coefficients.

Standard Error of Est. – the estimated standard deviation of the residuals (the deviations

around the model). This value is used to create prediction limits for new observations.

Mean Absolute Error – the average absolute value of the residuals.

Durbin-Watson Statistic – a measure of serial correlation in the residuals. If the residuals

vary randomly, this value should be close to 2. A small P-value indicates a non-random

pattern in the residuals. For data recorded over time, a small P-value could indicate that some

trend over time has not been accounted for.

Lag 1 Residual Autocorrelation – the estimated correlation between consecutive residuals, on

a scale of –1 to 1. Values far from 0 indicate that significant structure remains unaccounted

for by the model.

Residual Analysis – if a subset of the rows in the datasheet have been excluded from the

analysis using the Select field on the data input dialog box, the fitted model is used to make

predictions of the Y values for those rows. This table shows statistics on the prediction

errors, defined by

$e_i = y_i - \hat{y}_i$ (2)

Included are the mean squared error (MSE), the mean absolute error (MAE), the mean

absolute percentage error (MAPE), the mean error (ME), and the mean percentage error

(MPE). These validation statistics can be compared to the statistics for the fitted model to

determine how well that model predicts observations outside of the data used to fit it.
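
The error statistics listed above can be sketched as follows. This is a minimal illustration, not the program's own code: the function name and the small data set are hypothetical, and dividing the squared errors by n is an assumption (in the sample output, the estimation-period MSE equals the residual mean square, which divides by the residual degrees of freedom instead).

```python
def prediction_error_stats(observed, predicted):
    """Return MSE, MAE, MAPE, ME and MPE for paired observed/predicted values."""
    errors = [y - yhat for y, yhat in zip(observed, predicted)]
    n = len(errors)
    return {
        "MSE": sum(e * e for e in errors) / n,   # assumed divisor: n
        "MAE": sum(abs(e) for e in errors) / n,
        "MAPE": 100.0 * sum(abs(e / y) for e, y in zip(errors, observed)) / n,
        "ME": sum(errors) / n,
        "MPE": 100.0 * sum(e / y for e, y in zip(errors, observed)) / n,
    }

# Hypothetical held-out rows, used only to exercise the function.
observed = [0.49, 0.47, 0.45, 0.43]
predicted = [0.48, 0.47, 0.46, 0.44]
stats = prediction_error_stats(observed, predicted)
for name, value in stats.items():
    print(name, value)
```

ME and MPE keep the sign of the errors, so values far from zero suggest that the model predicts systematically high or low, while MAE and MAPE measure only the size of the errors.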

For the sample data, the fitted model is

chlorine = 0.390144 + (0.49-0.390144)exp(-0.101644(weeks-8)) (3)

The model begins with chlorine = 0.49 at weeks = 8 and drops exponentially to a baseline at

approximately 0.39 as weeks increase.
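
The summary statistics and confidence intervals in the Analysis Summary can be checked by hand from the printed tables. The sketch below does this in Python; the Student's t critical value is taken from standard tables rather than from the program, and the variable names are illustrative.

```python
import math

# Values copied from the Analysis of Variance table.
sse, df_resid = 0.00500168, 42      # residual sum of squares and d.f.
sst_corr, df_total = 0.0395, 43     # corrected total sum of squares and d.f.

r_squared = 100.0 * (1.0 - sse / sst_corr)
adj_r_squared = 100.0 * (1.0 - (sse / df_resid) / (sst_corr / df_total))
std_error_est = math.sqrt(sse / df_resid)

# Two-sided 95% critical value of Student's t with 42 d.f. (from tables).
T_CRIT = 2.018082

def asymptotic_interval(estimate, std_error, t_crit=T_CRIT):
    """Estimate plus or minus t times its asymptotic standard error."""
    half_width = t_crit * std_error
    return estimate - half_width, estimate + half_width

print(round(r_squared, 2), round(adj_r_squared, 2), round(std_error_est, 5))
print(asymptotic_interval(0.390144, 0.00501534))  # parameter a
print(asymptotic_interval(0.101644, 0.0133628))   # parameter b
```

Because the corrected total sum of squares is printed to only three significant digits, the recomputed percentages agree with the output only to about that precision.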



Plot of Fitted Model

The Plot of Fitted Model pane plots the fitted model versus any one of the independent

variables, with the other variables set equal to values specified on the Pane Options dialog box.

[Figure: Plot of Fitted Model, showing chlorine (vertical axis) versus weeks (horizontal axis)]

Pane Options

Select any one variable to plot on the horizontal axis, together with its range. For the other

variables, enter values to be substituted into the fitted model.



Response Surface Plots

If more than one independent variable is included in the model, surface and contour plots can be

created. For example, Draper and Smith (1998) report on an experiment in which the fraction of

material Y remaining after a chemical reaction was described by the model

$Y = \exp\left\{-\theta_1 X_1 \exp\left[-\theta_2\left(\frac{1}{X_2} - \frac{1}{620}\right)\right]\right\}$ (4)

where X1 was the reaction time in minutes and X2 was the reaction temperature in degrees

Kelvin. The data is saved in the file nlreact.sgd and the analysis in nlreact.sgp. A surface plot of

the fitted model is shown below:

[Figure: Estimated Response Surface, showing material (height) versus time and temperature]

In a surface plot, the height of the surface represents the predicted value of Y. The second option

labeled Response Surface Plots on the Graphical Options menu creates a contour plot:

[Figure: Contours of Estimated Response Surface, showing contour lines of material from 0.1 to 0.9 over time (horizontal axis) and temperature (vertical axis)]

In a contour plot of the above form, each line represents combinations of X1 and X2 that result in

the same predicted value for Y.

Various other formats are available using Pane Options.



Pane Options

Type: choose from a 3-D Surface Plot, where the height of the surface represents the value

of Y versus any two independent variables; a 2-D Contour Plot, where lines or colored

regions represent the value of Y as a function of any two independent variables; a 2-D Square

Plot, where the predicted value of Y is shown at different combinations of 2 independent

variables; or a 3-D Cube Plot, in which the predicted value of Y is shown at different

combinations of 3 independent variables.

Contours: the limits and spacing of the contour lines or regions. The contours may be drawn

as solid Lines representing a single value of Y, Painted Regions representing intervals, or

using a Continuous range of colors.

Resolution: the number of divisions along each axis at which the value of Y is plotted.

Increasing the resolution may improve the quality of the plot, but it can also increase the

length of time required to draw it.

Surface: for a surface plot, the number of divisions along each axis between the lines used to

draw the surface. The surface may be drawn as a Wire Frame (transparent mesh), as a solid

colored surface, or contoured (colored according to values of Y). Contours below puts a

contour plot in the bottom of the cube. Show Points plots the observations with lines drawn

to the surface.

Factors: press this button to select the factors to be plotted. A dialog box similar to that

described for the Plot of Fitted Model will be displayed.
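
Surface and contour plots of this kind amount to evaluating the fitted function on a grid of the two plotted factors. The sketch below shows the idea, taking model (4) to be Y = exp(-theta1 * X1 * exp(-theta2 * (1/X2 - 1/620))). The parameter values are hypothetical (the estimates are not given here), the function names are illustrative, and treating Resolution as the number of intervals along each axis is an assumption.

```python
import math

def fraction_remaining(time_min, temp_k, theta1, theta2):
    """Assumed form of model (4) for the fraction of material remaining."""
    rate = theta1 * math.exp(-theta2 * (1.0 / temp_k - 1.0 / 620.0))
    return math.exp(-rate * time_min)

def response_grid(x1_range, x2_range, resolution, theta1, theta2):
    """Evaluate the model at (resolution + 1) points along each axis."""
    (x1_lo, x1_hi), (x2_lo, x2_hi) = x1_range, x2_range
    x1s = [x1_lo + (x1_hi - x1_lo) * i / resolution for i in range(resolution + 1)]
    x2s = [x2_lo + (x2_hi - x2_lo) * j / resolution for j in range(resolution + 1)]
    grid = [[fraction_remaining(x1, x2, theta1, theta2) for x1 in x1s]
            for x2 in x2s]
    return x1s, x2s, grid

# Axis ranges follow the plots; theta values are hypothetical.
x1s, x2s, grid = response_grid((0, 150), (600, 640), 10,
                               theta1=0.004, theta2=27000.0)
print(len(grid), len(grid[0]))   # rows (temperature) by columns (time)
print(grid[0][0], grid[-1][-1])  # corner values of the surface
```

A finer grid gives smoother contours at the cost of more function evaluations, which is the trade-off described under Resolution.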



Example – Contour Plot with Continuous Colors

[Figure: Contours of Estimated Response Surface, with a continuous color scale for material from 0.0 to 1.0 over time (horizontal axis) and temperature (vertical axis)]

Example – Surface Plot with Contour Below and Show Points Selected

[Figure: Estimated Response Surface, showing material versus time and temperature, with a contour plot in the bottom of the cube and the observations drawn with lines to the surface]



Analysis Options

The Analysis Options dialog box controls the algorithm used to fit the model:

Method: method used to estimate the model parameters. The Gauss-Newton method uses a

linearization technique that fits a sequence of linear regression models to locate the minimum

residual sum of squares. The Steepest-Descent method follows the gradient of the residual

sum of squares surface. Marquardt’s method, the default, is a fast and reliable compromise

between the other two.

Stopping Criterion 1: The algorithm is assumed to have converged when the relative

change in the residual sum of squares from one iteration to the next is less than this value.

Stopping Criterion 2: The algorithm is assumed to have converged when the relative

change in all parameter estimates from one iteration to the next is less than this value.

Maximum Iterations: Estimation stops if convergence is not achieved within this many

iterations.

Maximum Function Calls: Estimation stops if convergence is not achieved when the

function being fit has been evaluated this many times. Multiple function evaluations are done

during each iteration.

Marquardt Parameter: The magnitude of the Marquardt parameter controls the extent to

which the other two methods are traded off against each other. For details on the Marquardt

algorithm, see Box, Jenkins and Reinsel (1994).

Confidence Level: the percentage used to calculate the asymptotic confidence intervals for

the model coefficients.
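
To make the options above more concrete, the following is a minimal sketch of a Marquardt-type search for model (1). It is not the STATGRAPHICS implementation: it uses one common variant in which the diagonal of the normal equations is inflated by a factor (1 + lam), it applies only a stopping rule like Stopping Criterion 1, and it is run on synthetic noise-free data generated from the fitted model, because the full data set is not reproduced in this document. All function names are illustrative.

```python
import math

def model(w, a, b):
    return a + (0.49 - a) * math.exp(-b * (w - 8))

def sse(data, a, b):
    return sum((y - model(w, a, b)) ** 2 for w, y in data)

def marquardt_fit(data, a, b, lam=0.01, tol=1e-10, max_iter=50):
    """Damped Gauss-Newton search for the two parameters a and b."""
    current = sse(data, a, b)
    for _ in range(max_iter):
        # Accumulate J'J and J'r, where J holds the partial derivatives.
        saa = sab = sbb = ga = gb = 0.0
        for w, y in data:
            decay = math.exp(-b * (w - 8))
            da = 1.0 - decay                     # d(model)/da
            db = -(0.49 - a) * (w - 8) * decay   # d(model)/db
            r = y - model(w, a, b)
            saa += da * da
            sab += da * db
            sbb += db * db
            ga += da * r
            gb += db * r
        # lam near 0 gives a Gauss-Newton step; a large lam gives a short
        # step along a scaled steepest-descent direction.
        maa, mbb = saa * (1.0 + lam), sbb * (1.0 + lam)
        det = maa * mbb - sab * sab
        step_a = (ga * mbb - gb * sab) / det
        step_b = (gb * maa - ga * sab) / det
        trial = sse(data, a + step_a, b + step_b)
        if trial < current:
            a, b = a + step_a, b + step_b
            converged = (current - trial) <= tol * current or trial < 1e-24
            current, lam = trial, lam / 10.0
            if converged:
                break
        else:
            lam *= 10.0   # reject the step and damp more heavily
    return a, b, current

# Synthetic data from the fitted model; weeks grid is hypothetical.
data = [(w, model(w, 0.390144, 0.101644)) for w in range(8, 44, 2)]
a_hat, b_hat, final_sse = marquardt_fit(data, a=0.1, b=0.1)
print(a_hat, b_hat, final_sse)
```

Starting from the same initial estimates as the sample analysis (a = 0.1, b = 0.1), the search recovers the parameters used to generate the data. With real, noisy data the residual sum of squares would level off at a positive value and the relative-change rule would end the search.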



Reports

The Reports pane creates predictions using the fitted model. By default, the table includes a line

for each row in the datasheet that has complete information on the X variables and a missing

value for the Y variable. This allows you to add rows to the bottom of the datasheet

corresponding to levels at which you want predictions without affecting the fitted model.

For example, suppose a prediction is desired at Weeks = 50 (admittedly an extrapolation of the

model). In row #45 of the datasheet, the value 50 would be added to the Weeks column but the

Chlorine column would be left blank. The resulting table is shown below:

Regression Results for chlorine

       Fitted     Stnd. Error    Lower 95.0% CL   Upper 95.0% CL   Lower 95.0% CL   Upper 95.0% CL
Row    Value      for Forecast   for Forecast     for Forecast     for Mean         for Mean
45     0.392467   0.0115998      0.369057         0.415876         0.38453          0.400403

Included in the table are:

Row - the row number in the data sheet containing the values of the independent

variables.

Fitted Value - the predicted value of the dependent variable using the fitted model.

Standard Error for Forecast - the estimated standard error for predicting a single new

observation.

Confidence Limits for Forecast - prediction limits for new observations.

Confidence Limits for Mean - confidence limits for the mean value of Y at the settings

of the independent variables.

For row #45, the predicted chlorine level is approximately 0.392. A new sample at Weeks = 50

would be expected to be between 0.369 and 0.416 with 95% confidence (provided the

extrapolation held). The mean chlorine level at 50 weeks is estimated to be somewhere between

0.385 and 0.400.
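
The limits in the table can be reproduced from the fitted value and its standard error. The sketch below is a hand check in Python, not the program's own calculation: the t critical value comes from standard tables, and the relationship used to recover the standard error of the mean (forecast variance equals residual variance plus variance of the mean) is an assumption that happens to agree with the printed limits.

```python
import math

T_CRIT = 2.018082        # two-sided 95% Student's t value, 42 d.f. (from tables)
fitted = 0.392467        # Fitted Value for row 45
se_forecast = 0.0115998  # Stnd. Error for Forecast
s = 0.0109127            # Standard Error of Est. from the Analysis Summary

# Prediction limits for a single new observation.
forecast_limits = (fitted - T_CRIT * se_forecast, fitted + T_CRIT * se_forecast)

# Assumed relationship: se_forecast**2 = s**2 + se_mean**2.
se_mean = math.sqrt(se_forecast ** 2 - s ** 2)
mean_limits = (fitted - T_CRIT * se_mean, fitted + T_CRIT * se_mean)

print(forecast_limits)
print(mean_limits)
```

The limits for the mean are much narrower than those for a new observation because they exclude the variability of an individual sample around the curve.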

Using Pane Options, additional information about the predicted values and residuals for the data

used to fit the model can also be included in the table.



Pane Options

You may include:

Observed Y – the observed values of the dependent variable.

Fitted Y – the predicted values from the fitted model.

Residuals – the ordinary residuals (observed minus predicted).

Studentized Residuals – the Studentized deleted residuals, as described under Pane Options in the Residual Plots section.

Standard Errors for Forecasts – the standard errors for new observations at values of the

independent variables corresponding to each row of the datasheet.

Confidence Limits for Individual Forecasts – confidence intervals for new observations.

Confidence Limits for Forecast Means – confidence intervals for the mean value of Y at

values of the independent variables corresponding to each row of the datasheet.

Correlation Matrix

The Correlation Matrix displays estimates of the correlation between the estimated coefficients.

Asymptotic correlation matrix for coefficient estimates

a b

a 1.0000 0.8864

b 0.8864 1.0000

This table can be helpful in determining how well the effects of different independent variables

have been separated from each other.



Observed versus Predicted

The Observed versus Predicted plot shows the observed values of Y on the vertical axis and the

predicted values of Y on the horizontal axis.

[Figure: Plot of chlorine, showing observed (vertical axis) versus predicted (horizontal axis) values]

If the model fits well, the points should be randomly scattered around the diagonal line. It is

sometimes possible to see curvature in this plot, which would indicate the need for a curvilinear

model rather than a linear model. Any change in variability from low values of Y to high values

of Y might also indicate the need to transform the dependent variable before fitting a model to

the data.

Residual Plots

As with all statistical models, it is good practice to examine the residuals. In a regression, the

residuals are defined by

$e_i = y_i - \hat{y}_i$ (5)

i.e., the residuals are the differences between the observed data values and the fitted model.

The Nonlinear Regression procedure creates various types of residual plots, depending on Pane

Options.



Scatterplot versus X

This plot is helpful in visualizing any need for a different model.

[Figure: Residual Plot, showing Studentized residual (vertical axis) versus predicted chlorine (horizontal axis)]

Normal Probability Plot

This plot can be used to determine whether or not the deviations around the line follow a normal

distribution, which is the assumption used to form the prediction intervals.

[Figure: Normal Probability Plot for chlorine, showing percentage (vertical axis) versus Studentized residual (horizontal axis)]

If the deviations follow a normal distribution, they should fall approximately along a straight

line. In the above plot, the data deviate quite a bit from the straight line, indicating that the

deviations follow a distribution with longer tails than that of a normal distribution.



Residual Autocorrelations

This plot calculates the autocorrelation between residuals as a function of the number of rows

between them in the datasheet.

[Figure: Residual Autocorrelations for chlorine, showing autocorrelation (vertical axis, -1 to 1) versus lag (horizontal axis, 0 to 12)]

This plot is only relevant if the data have been collected sequentially. Any bars extending beyond the

probability limits would indicate significant dependence between residuals separated by the

indicated “lag”, which would violate the assumption of independence made when fitting the

regression model.
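
As a sketch of the quantities involved, the code below computes the sample autocorrelation of a residual series at a given lag, together with the Durbin-Watson statistic reported in the Analysis Summary. These are the usual textbook definitions; whether the program centers the residuals in exactly this way is an assumption, and the residual values are hypothetical.

```python
def autocorrelation(residuals, lag):
    """Sample autocorrelation of the residuals at the given lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [e - mean for e in residuals]
    denom = sum(d * d for d in dev)
    return sum(dev[t] * dev[t - lag] for t in range(lag, n)) / denom

def durbin_watson(residuals):
    """Durbin-Watson statistic: near 2 when consecutive residuals are uncorrelated."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    return num / sum(e * e for e in residuals)

# Hypothetical residuals that alternate in sign (strong negative dependence).
resid = [0.01, -0.01, 0.01, -0.01, 0.01, -0.01]
print(autocorrelation(resid, 1))   # strongly negative
print(durbin_watson(resid))        # well above 2
```

Residuals that alternate in sign give a negative lag 1 autocorrelation and a Durbin-Watson value above 2, while residuals that drift slowly give a positive autocorrelation and a value below 2.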

Pane Options

Plot: the type of residuals to plot:

1. Residuals – the residuals from the least squares fit.

2. Studentized residuals – the difference between the observed values $y_i$ and the predicted

values $\hat{y}_i$ when the model is fit using all observations except the i-th, divided by the

estimated standard error. These residuals are sometimes called externally deleted

residuals, since they measure how far each value is from the fitted model when that

model is fit using all of the data except the point being considered. This is important,

since a large outlier might otherwise affect the model so much that it would not appear to

be unusually far away from the fitted model.

Type: the type of plot to be created. A Scatterplot is used to test for curvature. A Normal

Probability Plot is used to determine whether the model residuals come from a normal

distribution. An Autocorrelation Function is used to test for dependence between consecutive

residuals.

Plot Versus: for a Scatterplot, the quantity to plot on the horizontal axis.

Number of Lags: for an Autocorrelation Function, the maximum number of lags. For small

data sets, the number of lags plotted may be less than this value.

Confidence Level: for an Autocorrelation Function, the level used to create the probability

limits.

Unusual Residuals

Once the model has been fit, it is useful to study the residuals to determine whether any outliers

exist that should be removed from the data. The Unusual Residuals pane lists all observations

that have Studentized residuals of 2.0 or greater in absolute value.

Unusual Residuals for chlorine

Row   Y      Predicted Y   Residual     Studentized Residual
10    0.43   0.456641      -0.0266407   -2.67
17    0.46   0.42628       0.0337201    3.59
18    0.45   0.42628       0.0237201    2.35
35    0.38   0.400815      -0.0208151   -2.02

Studentized residuals greater than 3 in absolute value correspond to points more than 3 standard

deviations from the fitted model, which is a rare event for a normal distribution. Row #17 is

more than 3.5 standard deviations from the fitted model, which is a very rare event if the

deviations follow a normal distribution.

Note: Points can be removed from the fit while examining the Plot of the Fitted Model by

clicking on a point and then pressing the Exclude/Include button on the analysis toolbar.

Excluded points are marked with an X.
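
The Studentized residuals in the table above can be reproduced without refitting the model, using the standard deleted-residual formula from linear regression. Applying that formula here is an assumption about what the program computes, although it does agree with the printed values. The leverages are taken from the Influential Points table, and the function name is illustrative.

```python
import math

def studentized_deleted_residual(e, h, sse, df_resid):
    """Residual scaled by a standard error estimated without the point.

    e: ordinary residual, h: leverage, sse: residual sum of squares,
    df_resid: residual degrees of freedom (n - p).
    """
    # Residual variance with observation i removed from the fit.
    s2_deleted = (sse - e * e / (1.0 - h)) / (df_resid - 1)
    return e / math.sqrt(s2_deleted * (1.0 - h))

SSE, DF_RESID = 0.00500168, 42   # from the Analysis of Variance table

# Row 17: residual from the table above, leverage from Influential Points.
t17 = studentized_deleted_residual(0.0337201, 0.051007, SSE, DF_RESID)
# Row 10
t10 = studentized_deleted_residual(-0.0266407, 0.0407876, SSE, DF_RESID)
print(round(t17, 2), round(t10, 2))   # compare with 3.59 and -2.67
```

Because the variance estimate excludes the point being examined, a large outlier cannot inflate its own standard error and thereby hide itself.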



Influential Points

In fitting a regression model, not all observations have an equal influence on the parameter

estimates in the fitted model. In a simple regression, points located at very low or very high

values of X have greater influence than those located nearer to the mean of X. The Influential

Points pane displays any observations that have high influence on the fitted model:

Influential Points for chlorine

Row   Leverage    Mahalanobis Distance   DFITS       Cook's Distance
10    0.0407876   0.80918                -0.550164   0.00516818
17    0.051007    1.2807                 0.833184    0.0130882
18    0.051007    1.2807                 0.544379    0.00647643
40    0.0752918   2.44299                -0.440596   0.00654216

Average leverage of single data point = 0.0454545

Points are placed on this list for one of the following reasons:

Leverage – measures how distant an observation is from the mean of all n observations in

the space of the independent variables. The higher the leverage, the greater the impact of the

point on the fitted values $\hat{y}$. Points are placed on the list if their leverage is more than 3 times

that of an average data point.

Mahalanobis Distance – measures the distance of a point from the center of the collection of

points in the multivariate space of the independent variables. Since this distance is related to

leverage, it is not used to select points for the table.

DFITS – measures the difference between the predicted values $\hat{y}_i$ when the model is fit with

and without the i-th data point. Points are placed on the list if the absolute value of DFITS

exceeds $2\sqrt{p/n}$, where p is the number of coefficients in the fitted model.
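
The selection rules above can be checked against the sample output. In the sketch below, the cutoffs follow the text (3 times the average leverage, and 2 times the square root of p/n for DFITS), while the dfits function uses the standard linear-regression relationship between DFITS, the Studentized deleted residual and the leverage, which is an assumption about the program's calculation.

```python
import math

n, p = 44, 2                          # observations and fitted coefficients

average_leverage = p / n              # matches "Average leverage" in the output
leverage_cutoff = 3 * average_leverage
dfits_cutoff = 2 * math.sqrt(p / n)

def dfits(t_deleted, h):
    """DFITS from a Studentized deleted residual and its leverage."""
    return t_deleted * math.sqrt(h / (1.0 - h))

# (row, leverage, DFITS) copied from the Influential Points table.
rows = [(10, 0.0407876, -0.550164), (17, 0.051007, 0.833184),
        (18, 0.051007, 0.544379), (40, 0.0752918, -0.440596)]
flagged = [row for row, h, d in rows
           if h > leverage_cutoff or abs(d) > dfits_cutoff]

print(round(average_leverage, 7), round(dfits_cutoff, 4))
print(flagged)
print(dfits(3.59, 0.051007))   # row 17, from its rounded Studentized residual
```

For the sample data no leverage exceeds the cutoff of about 0.136, so all four rows appear in the table because of their DFITS values.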



Save Results

The following results may be saved to the datasheet:

1. Predicted Values – the predicted value of Y corresponding to each of the n observations.

2. Standard Errors of Predictions - the standard errors for the n predicted values.

3. Lower Limits for Predictions – the lower prediction limits for each predicted value.

4. Upper Limits for Predictions – the upper prediction limits for each predicted value.

5. Standard Errors of Means - the standard errors for the mean value of Y at each of the n

values of X.

6. Lower Limits for Forecast Means – the lower confidence limits for the mean value of Y

at each of the n values of X.

7. Upper Limits for Forecast Means – the upper confidence limits for the mean value of Y at

each of the n values of X.

8. Residuals – the n residuals.

9. Studentized Residuals – the n Studentized residuals.

10. Leverages – the leverage values corresponding to the n values of X.

11. DFITS Statistics – the value of the DFITS statistic corresponding to the n values of X.

12. Mahalanobis Distances – the Mahalanobis distance corresponding to the n values of X.

13. Coefficients – the estimated model coefficients.

14. Function – a text string containing the STATGRAPHICS expression for the function that

was fit.

Calculations

Parameter estimates are found by numerically minimizing the residual sum of squares. The

variance-covariance matrix of the coefficients is estimated from the partial derivatives in the

neighborhood of the least squares solution.
