
Seminar on Robust Regression Methods



Page 1: Seminar on Robust Regression Methods

ROBUST REGRESSION METHOD

By, SUMON JOSE

A Seminar Presentation

Under the Guidance of Dr. Jessy John

February 24, 2015

SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 1 / 69

Page 2: Seminar on Robust Regression Methods

CONTENTS

1. INTRODUCTION
2. REVIEW
3. ROBUSTNESS & RESISTANCE
4. APPROACH
5. STRENGTHS & WEAKNESSES
6. M-ESTIMATORS
7. DELIVERY TIME PROBLEM
8. ANALYSIS
9. PROPERTIES
10. SURVEY OF OTHER ROBUST REGRESSION ESTIMATORS
11. REFERENCE

Page 3: Seminar on Robust Regression Methods

INTRODUCTION

Performance Evaluation - Geethu Anna Jose

Page 4: Seminar on Robust Regression Methods

REVIEW

The classical linear regression model relates the dependent or response variables y_i to the independent explanatory variables x_i1, x_i2, ..., x_ip for i = 1, ..., n, such that

y_i = x_i^T β + ε_i ,  (1)

where x_i^T = (x_i1, x_i2, ..., x_ip), the ε_i denote the error terms and β = (β_1, β_2, ..., β_p)^T.

Page 5: Seminar on Robust Regression Methods

REVIEW

The expected value of y_i, called the fitted value, is

ŷ_i = x_i^T β  (2)

and one can use this to calculate the residual for the i-th case,

r_i = y_i − ŷ_i  (3)

In the case of the simple linear regression model we may calculate the values of β_0 and β_1 using the following formulae:

Page 6: Seminar on Robust Regression Methods

REVIEW

β_1 = [ Σ_{i=1}^n y_i x_i − (Σ_{i=1}^n y_i)(Σ_{i=1}^n x_i)/n ] / [ Σ_{i=1}^n x_i² − (Σ_{i=1}^n x_i)²/n ]  (4)

β_0 = ȳ − β_1 x̄  (5)

The vector of fitted values ŷ_i corresponding to the observed values y_i may be expressed as follows:

ŷ = X β  (6)

Page 7: Seminar on Robust Regression Methods

REVIEW

Limitations of Least Square Estimator

Extremely sensitive to deviations from the model assumptions (a normal distribution is assumed for the errors).

Drastically changed by the effect of outliers.

Page 8: Seminar on Robust Regression Methods

REVIEW

What About Deleting Outliers Before Analysis?

Not all outliers need be erroneous data; they could be exceptional occurrences.

Some such outliers could be the result of factors not considered in the current study.

So in general, unusual observations are not always bad observations. Moreover, in large data sets it is often very difficult to spot the outlying data.

Page 9: Seminar on Robust Regression Methods

ROBUSTNESS AND RESISTANCE

Resistant Regression Estimators

Definition

The resistant regression estimators are primarily concerned with robustness of validity: their main concern is to prevent unusual observations from affecting the estimates produced.

Page 10: Seminar on Robust Regression Methods

ROBUSTNESS AND RESISTANCE

Robust Regression Estimators

Definition

They are concerned with both robustness of efficiency and robustness of validity, meaning that they should also maintain a small sampling variance even when the data do not fit the assumed distribution.

Page 11: Seminar on Robust Regression Methods

ROBUSTNESS AND RESISTANCE

⇒ In general, robust regression estimators aim to fit a model that describes the majority of a sample.

⇒ Their robustness is achieved by giving the data different weights.

⇒ In least squares estimation, by contrast, all data are treated equally.

Page 12: Seminar on Robust Regression Methods

APPROACH

Robust estimation methods are powerful tools for the detection of outliers in complicated data sets.

But unless the data are very well behaved, different estimators will give different estimates.

On their own, they do not provide a final model.

A healthy approach is to employ both robust regression methods and the least squares method and compare the results.

Page 13: Seminar on Robust Regression Methods

STRENGTHS & WEAKNESSES

Finite Sample Breakdown Point

Definition

The breakdown point (BDP) is a measure of the resistance of an estimator. The BDP of a regression estimator is the smallest fraction of contamination that can cause the estimator to 'break down' and no longer represent the trend of the data.

Page 14: Seminar on Robust Regression Methods

STRENGTHS & WEAKNESSES

When an estimator breaks down, the estimate it produces from the contaminated data can be arbitrarily far from the estimate it would give were the data uncontaminated.

Page 15: Seminar on Robust Regression Methods

STRENGTHS & WEAKNESSES

In order to describe the BDP mathematically, define T as a regression estimator, Z as a sample of n data points and T(Z) = β̂. Let Z′ be the corrupted sample in which m of the original data points are replaced with arbitrary values. The maximum effect that could be caused by such contamination is

effect(m; T, Z) = sup_{Z′} |T(Z′) − T(Z)|  (7)

Page 16: Seminar on Robust Regression Methods

STRENGTHS & WEAKNESS

When (7) is infinite, an outlier can have an arbitrarily large effect on T. The BDP of T at the sample Z is therefore defined as:

BDP(T, Z) = min{ m/n : effect(m; T, Z) is infinite }  (8)

Page 17: Seminar on Robust Regression Methods

STRENGTH & WEAKNESSES

The least squares estimator, for example, has a breakdown point of 1/n, because just one leverage point can cause it to break down. As the number of data points increases, this breakdown point tends to 0, and so the least squares estimator is said to have a BDP of 0%.
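The 1/n breakdown of least squares is easy to demonstrate numerically. The following sketch (illustrative data, not from the slides) corrupts a single one of n = 10 observations and watches the fitted slope move arbitrarily far:

```python
import numpy as np

def ols_slope(x, y):
    # Simple least squares slope: Sxy / Sxx
    x, y = np.asarray(x, float), np.asarray(y, float)
    return ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()

x = np.arange(10.0)
y = 2.0 + 3.0 * x           # clean data lying exactly on a line
slope_clean = ols_slope(x, y)

y_bad = y.copy()
y_bad[-1] += 1000.0         # corrupt just one of the n = 10 points
slope_bad = ols_slope(x, y_bad)

print(slope_clean)          # 3 (up to floating point)
print(slope_bad)            # dragged far away from 3
```

Making the corruption larger moves the slope without bound, which is exactly the effect(m; T, Z) = ∞ behaviour at m = 1 described above.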

Page 18: Seminar on Robust Regression Methods

STRENGTH & WEAKNESS

Remark

The highest breakdown point one can hope for is 50%, for if more than half the data are contaminated then one cannot differentiate between 'good' and 'bad' data.

Page 19: Seminar on Robust Regression Methods

STRENGTH & WEAKNESSES

Relative Efficiency of an Estimator

Definition

The efficiency of an estimator for a particular parameter is defined as the ratio of its minimum possible variance to its actual variance. Strictly, an estimator is considered 'efficient' only when this ratio is one.

Page 20: Seminar on Robust Regression Methods

STRENGTH & WEAKNESSES

High efficiency is crucial for an estimator if the intention is to use an estimate from sample data to make inference about the larger population from which the sample was drawn.

Page 21: Seminar on Robust Regression Methods

STRENGTH & WEAKNESSES

Relative Efficiency

Relative efficiency compares the efficiency of an estimator to that of a well-known method.

In the context of regression, estimators are compared to the least squares estimator, which is the most efficient estimator when the errors are truly normal.

Page 22: Seminar on Robust Regression Methods

STRENGTH & WEAKNESSES

Given two estimators T_1 and T_2 for a population parameter β, where T_1 is the most efficient estimator possible and T_2 is less efficient, the relative efficiency of T_2 is calculated as the ratio of the mean squared error of T_1 to the mean squared error of T_2:

Efficiency(T_1, T_2) = E[(T_1 − β)²] / E[(T_2 − β)²]  (9)
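Equation (9) can be checked with a small Monte Carlo. As a hypothetical illustration (the estimators and sample size here are assumed choices, not from the slides), take T_1 to be the sample mean and T_2 the sample median of a normal location parameter; the ratio should come out near the known asymptotic value 2/π ≈ 0.64:

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 50, 20000
samples = rng.normal(loc=0.0, scale=1.0, size=(trials, n))

# Mean squared errors of the two location estimators around beta = 0
mse_mean = np.mean(samples.mean(axis=1) ** 2)        # T1: efficient under normality
mse_median = np.mean(np.median(samples, axis=1) ** 2)  # T2: less efficient

efficiency = mse_mean / mse_median   # equation (9); near 2/pi for normal data
print(round(efficiency, 2))
```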

Page 23: Seminar on Robust Regression Methods

M-ESTIMATORS

Introduction

1. First proposed by Huber (1973).
2. But the early ones were weak in terms of one or more of the desired properties.
3. From them the modern methods developed.

Page 24: Seminar on Robust Regression Methods

M-ESTIMATORS

Maximum Likelihood Type Estimators

M-estimation is based on the idea that while we still want a maximum likelihood estimator, the errors might be better represented by a different, heavier-tailed distribution.

Page 25: Seminar on Robust Regression Methods

M-ESTIMATORS

If the probability density function of the errors is f(ε_i), then the maximum likelihood estimator for β is that which maximizes the likelihood function

∏_{i=1}^n f(ε_i) = ∏_{i=1}^n f(y_i − x_i^T β)  (10)

Page 26: Seminar on Robust Regression Methods

M-ESTIMATORS

This means it also maximizes the log-likelihood function

Σ_{i=1}^n ln f(ε_i) = Σ_{i=1}^n ln f(y_i − x_i^T β)  (11)

When the errors are normally distributed it has been shown that this leads to minimising the sum of squared residuals, which is the ordinary least squares method.

Page 27: Seminar on Robust Regression Methods

M-ESTIMATORS

Assuming the errors are distributed differently leads the maximum likelihood estimator to minimise a different function. Using this idea, an M-estimator β̂ minimizes

Σ_{i=1}^n ρ(ε_i) = Σ_{i=1}^n ρ(y_i − x_i^T β)  (12)

where ρ(u) is a continuous, symmetric function called the objective function with a unique minimum at 0.

Page 28: Seminar on Robust Regression Methods

M-ESTIMATORS

1. Knowing the appropriate ρ(u) to use requires knowledge of how the errors are really distributed.
2. Functions are usually chosen by considering how the resulting estimator down-weights the larger residuals.
3. A robust M-estimator achieves this by minimizing the sum of a less rapidly increasing objective function than the ρ(u) = u² of least squares.

Page 29: Seminar on Robust Regression Methods

M-ESTIMATORS

Constructing a Scale Equivariant Estimator

The M-estimator is not necessarily scale equivariant, i.e. if the errors y_i − x_i^T β were multiplied by a constant, the new solution to the above equation might not be the scaled version of the old one.

Page 30: Seminar on Robust Regression Methods

M-ESTIMATORS

To obtain a scale equivariant version of this estimator we usually solve

Σ_{i=1}^n ρ(ε_i / s) = Σ_{i=1}^n ρ((y_i − x_i^T β) / s)  (13)

where s is a robust estimate of scale.

Page 31: Seminar on Robust Regression Methods

M-ESTIMATORS

A popular choice for s is the re-scaled median absolute deviation

s = 1.4826 × MAD  (14)

where MAD is the median absolute deviation

MAD = median|y_i − x_i^T β| = median|ε_i|  (15)

Page 32: Seminar on Robust Regression Methods

M-ESTIMATORS

s is highly resistant to outlying observations, with BDP 50%, as it is based on the median rather than the mean. The estimator rescales the MAD by the factor 1.4826 so that, when the sample is large and the ε_i are really distributed as N(0, σ²), s estimates the standard deviation σ.

Page 33: Seminar on Robust Regression Methods

M-ESTIMATORS

With a large sample and ε_i ~ N(0, σ²):

P(|ε_i| < MAD) ≈ 0.5

⇒ P(|ε_i − 0|/σ < MAD/σ) ≈ 0.5

⇒ P(|Z| < MAD/σ) ≈ 0.5

⇒ MAD/σ ≈ Φ⁻¹(0.75)

Page 34: Seminar on Robust Regression Methods

M-ESTIMATORS

⇒ MAD / Φ⁻¹(0.75) ≈ σ

⇒ 1.4826 × MAD ≈ σ

Thus the tuning constant 1.4826 makes s an approximately unbiased estimator of σ if n is large and the error distribution is normal.
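This calibration can be verified numerically. A minimal sketch, assuming a large N(0, σ²) sample with σ = 2 (the sample size and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 2.0
errors = rng.normal(0.0, sigma, size=100_000)

# MAD as in (15), computed about the sample median
mad = np.median(np.abs(errors - np.median(errors)))
s = 1.4826 * mad            # rescaled MAD, equation (14)

print(round(s, 2))          # close to sigma = 2.0
```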

Page 35: Seminar on Robust Regression Methods

M-ESTIMATORS

Finding an M-Estimator

To obtain an M-estimate we solve

min_β Σ_{i=1}^n ρ(ε_i / s) = min_β Σ_{i=1}^n ρ((y_i − x_i^T β) / s)  (16)

To do so we equate the first partial derivatives of ρ with respect to β_j (j = 0, 1, ..., k) to zero, yielding a necessary condition for a minimum.

Page 36: Seminar on Robust Regression Methods

M-ESTIMATORS

This gives a system of p = k + 1 equations

Σ_{i=1}^n x_ij ψ((y_i − x_i^T β) / s) = 0,  j = 0, 1, ..., k  (17)

where ψ = ρ′, x_ij is the i-th observation on the j-th regressor and x_i0 = 1. In general ψ is a non-linear function, so equation (17) must be solved iteratively. The most widely used method is iteratively reweighted least squares (IRLS).

Page 37: Seminar on Robust Regression Methods

M-ESTIMATORS

To use iteratively reweighted least squares, suppose that an initial estimate β̂_0 is available and that s is an estimate of scale. Then we write the p = k + 1 equations as:

Σ_{i=1}^n x_ij ψ((y_i − x_i^T β) / s) = Σ_{i=1}^n x_ij { ψ[(y_i − x_i^T β)/s] / [(y_i − x_i^T β)/s] } (y_i − x_i^T β) / s = 0  (18)

Page 38: Seminar on Robust Regression Methods

M-ESTIMATORS

or, equivalently,

Σ_{i=1}^n x_ij W_i^0 (y_i − x_i^T β) = 0,  j = 0, 1, 2, ..., k  (19)

where

W_i^0 = ψ[(y_i − x_i^T β̂_0)/s] / [(y_i − x_i^T β̂_0)/s]   if y_i ≠ x_i^T β̂_0
W_i^0 = 1                                                if y_i = x_i^T β̂_0  (20)

Page 39: Seminar on Robust Regression Methods

M-ESTIMATORS

We may write the above equations in matrix form as follows:

X′ W_0 X β = X′ W_0 y  (21)

where W_0 is an n × n diagonal matrix of weights with diagonal elements given by the expression

W_i^0 = ψ[(y_i − x_i^T β̂_0)/s] / [(y_i − x_i^T β̂_0)/s]   if y_i ≠ x_i^T β̂_0
W_i^0 = 1                                                if y_i = x_i^T β̂_0  (22)

Page 40: Seminar on Robust Regression Methods

M-ESTIMATORS

From the matrix form we see that the expression is the same as the usual weighted least squares normal equations. Consequently the one-step estimator is

β̂_1 = (X′ W_0 X)⁻¹ X′ W_0 y  (23)

At the next step we recompute the weights from the equation for W, but using β̂_1 instead of β̂_0.

Page 41: Seminar on Robust Regression Methods

M-ESTIMATORS

NOTE:

Usually only a few iterations are required to obtain convergence.

It can easily be implemented as a computer program.
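The iteration in equations (17)-(23) can be sketched in a few lines. This is an illustrative implementation (the function name is hypothetical; it uses Huber weights with t = 2 and the rescaled MAD of (14) as the scale s), not the program used to produce the tables that follow:

```python
import numpy as np

def irls_huber(X, y, t=2.0, n_iter=50):
    """Iteratively reweighted least squares with Huber weights.

    X: n x p design matrix (include a column of ones for the intercept).
    Returns the robust coefficient vector beta.
    """
    X, y = np.asarray(X, float), np.asarray(y, float)
    # Step 0: ordinary least squares as the initial estimate beta_0
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(n_iter):
        r = y - X @ beta
        s = 1.4826 * np.median(np.abs(r))      # rescaled MAD, eq. (14)
        s = max(s, 1e-8)                       # guard against zero scale
        u = r / s
        # Huber weights, eq. (20): 1 inside |u| <= t, t/|u| outside
        w = np.where(np.abs(u) <= t, 1.0, t / np.abs(u))
        # Weighted least squares step, eq. (23): beta = (X'WX)^-1 X'Wy
        WX = X * w[:, None]
        beta = np.linalg.solve(X.T @ WX, WX.T @ y)
    return beta

# Tiny demo: a line y = 2 + 3x with one gross outlier
x = np.arange(10.0)
X = np.column_stack([np.ones_like(x), x])
y = 2.0 + 3.0 * x
y[-1] += 50.0                                  # corrupt one observation

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
beta_rob = irls_huber(X, y)
print(beta_ols)   # intercept and slope dragged by the outlier
print(beta_rob)   # close to (2, 3)
```

The outlier's weight shrinks at each pass, so the robust fit converges to the line supported by the other nine points, while the least squares fit stays distorted.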

Page 42: Seminar on Robust Regression Methods

M-ESTIMATORS

Re-Descending Estimators

Re-descending M-estimators are those whose influence functions are non-decreasing near the origin but decrease towards zero far from the origin.

Their ψ can be chosen to re-descend smoothly to zero, so that they usually satisfy ψ(x) = 0 for all |x| > r, where r is referred to as the minimum rejection point.

Pages 43-45: Seminar on Robust Regression Methods

M-ESTIMATORS

[Pages 43-45 contained plots of re-descending ψ functions; the figures are not recoverable from the transcript.]

Page 46: Seminar on Robust Regression Methods

M-ESTIMATORS

Robust Criterion Functions

Criterion                 ρ(z)               ψ(z)         w(z)             range
Least squares             z²/2               z            1.0              |z| < ∞
Huber's t-function,       z²/2               z            1.0              |z| ≤ t
  t = 2                   |z|t − t²/2        t·sign(z)    t/|z|            |z| > t
Andrew's wave function    a(1 − cos(z/a))    sin(z/a)     sin(z/a)/(z/a)   |z| ≤ aπ
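The weight functions w(z) in the table can be written out directly. A small sketch (function names are illustrative; t and a are the tuning constants, with defaults matching the examples below):

```python
import numpy as np

def w_least_squares(z):
    # Every observation gets full weight: w(z) = 1
    return np.ones_like(np.asarray(z, float))

def w_huber(z, t=2.0):
    # w(z) = 1 for |z| <= t, t/|z| beyond: large residuals are down-weighted
    z = np.asarray(z, float)
    return np.where(np.abs(z) <= t, 1.0, t / np.abs(z))

def w_andrews(z, a=1.48):
    # w(z) = sin(z/a)/(z/a) for |z| <= a*pi, and 0 beyond (full rejection)
    z = np.asarray(z, float)
    u = z / a
    with np.errstate(invalid="ignore", divide="ignore"):
        w = np.where(np.abs(z) <= a * np.pi,
                     np.where(u == 0.0, 1.0, np.sin(u) / u),
                     0.0)
    return w

z = np.array([0.0, 1.0, 4.0, 10.0])
print(w_huber(z))    # 1, 1, 0.5, 0.2
print(w_andrews(z))  # tapers towards 0; exactly 0 once |z| > 1.48*pi
```

Note how Huber's weight never reaches zero (bounded influence), while Andrew's wave rejects sufficiently large residuals outright, which is the re-descending behaviour described earlier.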

Page 47: Seminar on Robust Regression Methods

DELIVERY TIME PROBLEM

Problem
A soft drink bottler is analyzing the vending machine service routes in his distribution system. He is interested in predicting the amount of time required by the route driver to service the vending machines in an outlet. This service activity includes stocking the machine with beverage products and minor maintenance or housekeeping. The industrial engineer responsible for the study has suggested that the two most important variables affecting the delivery time (y) are the number of cases of product stocked (x1) and the distance walked by the route driver (x2). The engineer has collected 25 observations on delivery time, which are shown in the following table. Fit a regression model to it.

Page 48: Seminar on Robust Regression Methods

DELIVERY TIME PROBLEM

Table of Data

Observation i   Delivery time y (minutes)   Number of cases x1   Distance x2 (feet)
1               16.68                       7                    560
2               11.50                       3                    320
3               12.03                       3                    340
4               14.88                       4                    80
5               13.75                       6                    150
6               18.11                       7                    330
7               8.00                        2                    110
8               17.83                       7                    210
9               79.24                       30                   1460
10              21.50                       5                    605
11              40.33                       16                   688
12              21.00                       10                   215
13              13.50                       4                    255

Page 49: Seminar on Robust Regression Methods

DELIVERY TIME PROBLEM

Observation i   Delivery time y (minutes)   Number of cases x1   Distance x2 (feet)
14              19.75                       6                    462
15              24.00                       9                    448
16              29.00                       10                   776
17              15.35                       6                    200
18              19.00                       7                    132
19              9.50                        3                    36
20              35.10                       17                   770
21              17.90                       10                   140
22              52.32                       26                   810
23              18.75                       9                    450
24              19.83                       8                    635
25              10.75                       4                    150

Page 50: Seminar on Robust Regression Methods

DELIVERY TIME PROBLEM

Least Squares Fit of the Delivery Time Data

Obs.   y_i             ŷ_i             e_i              Weight
1      .166800E+02     .217081E+02     -.502808E+01     .100000E+01
2      .115000E+02     .103536E+02     .114639E+01      .100000E+01
3      .120300E+02     .120798E+02     -.497937E-01     .100000E+01
4      .148800E+02     .995565E+01     .492435E+01      .100000E+01
5      .137500E+02     .141944E+02     -.444398E+00     .100000E+01
6      .181100E+02     .183996E+02     -.289574E+00     .100000E+01
7      .800000E+01     .715538E+01     .844624E+00      .100000E+01
8      .178300E+02     .166734E+02     .115660E+01      .100000E+01
9      .792400E+02     .718203E+02     .741971E+01      .100000E+01
10     .215000E+02     .191236E+02     .237641E+01      .100000E+01
11     .403300E+02     .380925E+02     .223749E+01      .100000E+01
12     .210000E+02     .215930E+02     -.593041E+00     .100000E+01
13     .135000E+02     .124730E+02     .102701E+01      .100000E+01

Page 51: Seminar on Robust Regression Methods

DELIVERY TIME PROBLEM

Obs.   y_i             ŷ_i             e_i              Weight
14     .197500E+02     .186825E+02     .106754E+01      .100000E+01
15     .240000E+02     .233288E+02     .671202E+00      .100000E+01
16     .290000E+02     .296629E+02     -.662928E+00     .100000E+01
17     .153500E+02     .149136E+02     .436360E+00      .100000E+01
18     .190000E+02     .155514E+02     .344862E+01      .100000E+01
19     .950000E+01     .770681E+01     .179319E+01      .100000E+01
20     .351000E+02     .408880E+02     -.578797E+01     .100000E+01
21     .179000E+02     .205142E+02     -.261418E+01     .100000E+01
22     .523200E+02     .560065E+02     -.368653E+01     .100000E+01
23     .187500E+02     .233576E+02     -.460757E+01     .100000E+01
24     .198300E+02     .244029E+02     -.457285E+01     .100000E+01
25     .107500E+02     .109626E+02     -.212584E+00     .100000E+01

Page 52: Seminar on Robust Regression Methods

DELIVERY TIME PROBLEM

Accordingly we have the following values for the parameters:

β̂_0 = 2.3412
β̂_1 = 1.6159
β̂_2 = 0.014385

Thus we have the fitted regression equation:

ŷ = 2.3412 + 1.6159 x1 + 0.014385 x2  (24)

Page 53: Seminar on Robust Regression Methods

DELIVERY TIME PROBLEM

Huber's t-Function, t = 2

Obs.   y_i             ŷ_i             e_i              Weight
1      .166800E+02     .217651E+02     -.508511E+01     .639744E+00
2      .115000E+02     .109809E+02     .519115E+00      .100000E+01
3      .120300E+02     .126296E+02     -.599594E+00     .100000E+01
4      .148800E+02     .105856E+02     .429439E+01      .757165E+00
5      .137500E+02     .146038E+02     -.853800E+00     .100000E+01
6      .181100E+02     .186051E+02     -.495085E+00     .100000E+01
7      .800000E+01     .794135E+01     .586521E-01      .100000E+01
8      .178300E+02     .169564E+02     .873625E+00      .100000E+01
9      .792400E+02     .692795E+02     .996050E+01      .327017E+00
10     .215000E+02     .193269E+02     .217307E+01      .100000E+01
11     .403300E+02     .372777E+02     .305228E+01      .100000E+01
12     .210000E+02     .216097E+02     -.609734E+00     .100000E+01
13     .135000E+02     .129900E+02     .510021E+00      .100000E+01

Page 54: Seminar on Robust Regression Methods

DELIVERY TIME PROBLEM

Obs.   y_i             ŷ_i             e_i              Weight
14     .197500E+02     .188904E+02     .859556E+00      .100000E+01
15     .240000E+02     .232828E+02     .717244E+00      .100000E+01
16     .290000E+02     .293174E+02     -.317449E+00     .100000E+01
17     .153500E+02     .152908E+02     .592377E-01      .100000E+01
18     .190000E+02     .158847E+02     .311529E+01      .100000E+01
19     .950000E+01     .845286E+01     .104714E+01      .100000E+01
20     .351000E+02     .399326E+02     -.483256E+01     .672828E+00
21     .179000E+02     .205793E+02     -.267929E+01     .100000E+01
22     .523200E+02     .542361E+02     -.191611E+01     .100000E+01
23     .187500E+02     .233102E+02     -.456023E+01     .713481E+00
24     .198300E+02     .243238E+02     -.449377E+01     .723794E+00
25     .107500E+02     .115474E+02     -.797359E+00     .100000E+01

Page 55: Seminar on Robust Regression Methods

DELIVERY TIME PROBLEM

Accordingly we get the values of the parameters as follows:

β̂_0 = 3.3736
β̂_1 = 1.5282
β̂_2 = 0.013739

Thus we get the fitted regression equation:

ŷ = 3.3736 + 1.5282 x1 + 0.013739 x2  (25)

Page 56: Seminar on Robust Regression Methods

DELIVERY TIME PROBLEM

Andrew's Wave Function with a = 1.48

Obs.   y_i             ŷ_i             e_i              Weight
1      .166800E+02     .216430E+02     -.496300E+01     .427594E+00
2      .115000E+02     .116923E+02     -.192338E+00     .998944E+00
3      .120300E+02     .131457E+02     -.111570E+01     .964551E+00
4      .148800E+02     .114549E+02     .342506E+01      .694894E+00
5      .137500E+02     .152191E+02     -.146914E+01     .939284E+00
6      .181100E+02     .188574E+02     -.747381E+00     .984039E+00
7      .800000E+01     .890189E+01     -.901888E+00     .976864E+00
8      .178300E+02     .174040E+02     .425984E+00      .994747E+00
9      .792400E+02     .660818E+02     .131582E+02      .000000E+00
10     .215000E+02     .192716E+02     .222839E+01      .863633E+00
11     .403300E+02     .363170E+02     .401296E+01      .597491E+00
12     .210000E+02     .218392E+02     -.839167E+00     .980003E+00
13     .135000E+02     .135744E+02     -.744338E-01     .999843E+00

Page 57: Seminar on Robust Regression Methods

DELIVERY TIME PROBLEM

Obs.   y_i             ŷ_i             e_i              Weight
14     .197500E+02     .189979E+02     .752115E+00      .983877E+00
15     .240000E+02     .232029E+02     .797080E+00      .981854E+00
16     .290000E+02     .286336E+02     .366350E+00      .996228E+00
17     .153500E+02     .158247E+02     -.474704E+00     .993580E+00
18     .190000E+02     .164593E+02     .254067E+01      .824146E+00
19     .950000E+01     .946384E+01     .361558E-01      .999936E+00
20     .351000E+02     .387684E+02     -.366837E+01     .655336E+00
21     .179000E+02     .209308E+02     -.303081E+01     .756603E+00
22     .523200E+02     .523766E+02     -.566063E-01     .999908E+00
23     .187500E+02     .232271E+02     -.447714E+01     .515506E+00
24     .198300E+02     .240095E+02     -.417955E+01     .567792E+00
25     .107500E+02     .123027E+02     -.155274E+01     .932266E+00

Page 58: Seminar on Robust Regression Methods

DELIVERY TIME PROBLEM

Thus we have the estimates as follows:

β̂_0 = 4.6532
β̂_1 = 1.4582
β̂_2 = 0.012111

Thus we get the fitted regression equation:

ŷ = 4.6532 + 1.4582 x1 + 0.012111 x2  (26)

Page 59: Seminar on Robust Regression Methods

ANALYSIS

Computing M-Estimators

Robust regression methods are not an option in most statistical software today.

SAS PROC NLIN, among others, can be used to implement the iteratively reweighted least squares procedure.

There are also robust procedures available in S-Plus.

Page 60: Seminar on Robust Regression Methods

ANALYSIS

Robust Regression Methods...

Robust regression methods have much to offer a data analyst.

They can be extremely helpful in locating outliers and highly influential observations.

Whenever a least squares analysis is performed, it would be useful to perform a robust fit also.

Page 61: Seminar on Robust Regression Methods

ANALYSIS

If the results of both fits are in substantial agreement, the use of the least squares procedure offers a good estimation of the parameters.

If the results of the two procedures are not in agreement, the reason for the difference should be identified and corrected.

Special attention should be given to observations that are down-weighted in the robust fit.

Page 62: Seminar on Robust Regression Methods

PROPERTIES

Breakdown Point. The finite sample breakdown point is the smallest fraction of anomalous data that can cause the estimator to be useless. The smallest possible breakdown point is 1/n, i.e. a single observation can distort the estimator so badly that it is of no practical use to the regression model builder. The breakdown point of OLS is 1/n.

Page 63: Seminar on Robust Regression Methods

PROPERTIES

M-estimators can be affected by x-space outliers in an identical manner to OLS.

Consequently, the breakdown point of the class of M-estimators is 1/n as well.

We would generally want the breakdown point of an estimator to exceed 10%.

This has led to the development of high breakdown point estimators.

Page 64: Seminar on Robust Regression Methods

PROPERTIES

Efficiency

The M-estimators have high efficiency, i.e. they behave well even as the size of the sample increases to ∞.

Page 65: Seminar on Robust Regression Methods

SURVEY OF OTHER ROBUSTREGRESSION ESTIMATORS

High Break Down Point Estimators Because both the

OLS and M-estimator suffer from a low breakdown point

1n , considerable effort has been devoted to finding

estimators that perform better with respect to this

property. Often a break down point of 50% is desirable.

SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 65 / 69

Page 66: Seminar on Robust Regression Methods

SURVEY OF OTHER ROBUSTREGRESSION ESTIMATORS

There are various other estimation procedures like

Least Median of Squares

Least Trimmed Sum of Squres

S Estimators

R and L Estimators

Robust Ridge regression

MM Estimation etc.

SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 66 / 69

Page 67: Seminar on Robust Regression Methods

ABSTRACT & CONCLUSION

Review ⇒ Robustness and Resistance ⇒ Our Approach ⇒ Strengths and Weaknesses ⇒ M-Estimators ⇒ Delivery Time Problem ⇒ Analysis ⇒ Properties ⇒ Survey of Other Robust Regression Estimators

Page 68: Seminar on Robust Regression Methods

REFERENCE

1. Draper, Norman R. & Smith, Harry. "Applied Regression Analysis", 3rd edn., John Wiley and Sons, New York, 1998.

2. Montgomery, Douglas C., Peck, Elizabeth A. & Vining, G. Geoffrey. "Introduction to Linear Regression Analysis", 3rd edn., Wiley India, 2003.

3. Brook, Richard J. "Applied Regression Analysis and Experimental Design", Chapman & Hall, London, 1985.

4. Rawlings, John O. "Applied Regression Analysis: A Research Tool", Springer, New York, 1989.

5. Pedhazur, Elazar J. "Multiple Regression in Behavioural Research: Explanation and Prediction", Wadsworth, Australia, 1997.

Page 69: Seminar on Robust Regression Methods

THANK YOU
