
ST 102: Elementary Statistical Theory
Lecture 36 – Linear Regression: Prediction and Diagnostics

Piotr Fryzlewicz
[email protected]

Department of Statistics, LSE


Objectives:

Confidence intervals for E(y)
Predictive intervals for y
Regression diagnostics: a summary

Based on the observations $\{(x_i, y_i) : i = 1, \ldots, n\}$, we fit a regression model $y = \beta_0 + \beta_1 x$.

Goal. Predict the (unobserved) y corresponding to the (known) x.

Point prediction: $\hat{y} = \hat\beta_0 + \hat\beta_1 x$.


For the analysis to be more informative, we would like to have some ‘error bars’ for our prediction. We introduce two methods:

Confidence interval for $\mu(x) \equiv E(y) = \beta_0 + \beta_1 x$
Predictive interval for y

Remark. A confidence interval is an interval estimator for an unknown parameter (i.e. for a constant), while a predictive interval is for a random variable. They are different and serve different purposes.

We assume the model is normal, i.e. $\varepsilon = y - \beta_0 - \beta_1 x \sim N(0, \sigma^2)$.


Confidence interval for $\mu(x) = Ey$

Let $\hat\mu(x) = \hat\beta_0 + \hat\beta_1 x$. Then $\hat\mu(x)$ is an unbiased estimator for $\mu(x)$.

Theorem. $\hat\mu(x)$ is normally distributed with mean $\mu(x)$ and variance
$$\mathrm{Var}\{\hat\mu(x)\} = \frac{\sigma^2}{n} \cdot \frac{\sum_{i=1}^n (x_i - x)^2}{\sum_{j=1}^n (x_j - \bar{x})^2}.$$

Proof. Note that both $\hat\beta_0$ and $\hat\beta_1$ are linear estimators. Therefore $\hat\mu(x)$ may be written in the form $\hat\mu(x) = \sum_{i=1}^n b_i y_i$, where $b_1, \ldots, b_n$ are some constants. Hence $\hat\mu(x)$ is normally distributed. To determine its distribution entirely, we only need to find its mean and variance.
$$E\{\hat\mu(x)\} = E(\hat\beta_0) + E(\hat\beta_1)x = \beta_0 + \beta_1 x = \mu(x).$$


$$\mathrm{Var}\{\hat\mu(x)\} = E[\{(\hat\beta_0 - \beta_0) + (\hat\beta_1 - \beta_1)x\}^2] = \mathrm{Var}(\hat\beta_0) + x^2\,\mathrm{Var}(\hat\beta_1) + 2x\,\mathrm{Cov}(\hat\beta_0, \hat\beta_1).$$

In Lecture 33 we derived
$$\mathrm{Var}(\hat\beta_0) = \frac{\sigma^2 \sum_{i=1}^n x_i^2}{n \sum_{j=1}^n (x_j - \bar{x})^2}, \qquad \mathrm{Var}(\hat\beta_1) = \frac{\sigma^2}{\sum_{j=1}^n (x_j - \bar{x})^2}.$$

In Workshop 18 we showed
$$\mathrm{Cov}(\hat\beta_0, \hat\beta_1) = \frac{-\sigma^2 \bar{x}}{\sum_{j=1}^n (x_j - \bar{x})^2}.$$

Hence
$$\frac{\sum_{j=1}^n (x_j - \bar{x})^2}{\sigma^2}\,\mathrm{Var}\{\hat\mu(x)\} = \frac{1}{n}\sum_{i=1}^n x_i^2 + x^2 - 2x\bar{x} = \frac{1}{n}\Big(\sum_{i=1}^n x_i^2 + n x^2 - 2x \sum_{i=1}^n x_i\Big) = \frac{1}{n}\sum_{i=1}^n (x_i - x)^2,$$

i.e. $\mathrm{Var}\{\hat\mu(x)\} = \dfrac{\sigma^2}{n}\,\dfrac{\sum_{i=1}^n (x_i - x)^2}{\sum_{j=1}^n (x_j - \bar{x})^2}$.
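As an aside (not part of the original slides), the variance formula just derived can be checked by simulation; all parameter values in the sketch below are arbitrary illustrative choices.

import numpy as np

# Monte Carlo check of Var{mu_hat(x)} = (sigma^2/n) * sum_i (x_i - x)^2 / sum_j (x_j - xbar)^2.
# beta0, beta1, sigma, the design points and x_new are illustrative choices, not from the slides.
rng = np.random.default_rng(0)
beta0, beta1, sigma = 1.0, 2.0, 0.5
xs = np.linspace(0, 10, 20)           # fixed design points x_1, ..., x_n
x_new = 7.0                           # the x at which we estimate mu(x) = beta0 + beta1*x
n = len(xs)

mu_hats = []
for _ in range(20000):
    ys = beta0 + beta1 * xs + rng.normal(0, sigma, n)
    b1 = np.sum((xs - xs.mean()) * (ys - ys.mean())) / np.sum((xs - xs.mean()) ** 2)
    b0 = ys.mean() - b1 * xs.mean()
    mu_hats.append(b0 + b1 * x_new)

empirical = np.var(mu_hats)
theoretical = sigma**2 / n * np.sum((xs - x_new) ** 2) / np.sum((xs - xs.mean()) ** 2)
print(empirical, theoretical)         # the two values should be close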


Now
$$\frac{\hat\mu(x) - \mu(x)}{\left[\dfrac{\sigma^2}{n}\,\dfrac{\sum_{i=1}^n (x_i - x)^2}{\sum_{j=1}^n (x_j - \bar{x})^2}\right]^{1/2}} \sim N(0, 1),$$
and $(n-2)\hat\sigma^2/\sigma^2 \sim \chi^2_{n-2}$, where $\hat\sigma^2 = \frac{1}{n-2}\sum_{i=1}^n (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2$.

Furthermore, $\hat\mu(x)$ and $\hat\sigma^2$ are independent. Hence
$$\frac{\hat\mu(x) - \mu(x)}{\left[\dfrac{\hat\sigma^2}{n}\,\dfrac{\sum_{i=1}^n (x_i - x)^2}{\sum_{j=1}^n (x_j - \bar{x})^2}\right]^{1/2}} \sim t_{n-2}.$$

A $(1-\alpha)$ confidence interval for $\mu(x)$ is
$$\hat\mu(x) \pm t_{\alpha/2,\,n-2}\,\hat\sigma\left[\frac{\sum_{i=1}^n (x_i - x)^2}{n\sum_{j=1}^n (x_j - \bar{x})^2}\right]^{1/2}.$$

Recall: the above interval contains the true expectation $Ey = \mu(x)$ with probability $1-\alpha$. It does not cover y with probability $1-\alpha$.
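As a sketch of how this interval could be computed outside Minitab, the following Python function implements the formula above directly; the function name and code organisation are illustrative, not taken from the course materials.

import numpy as np
from scipy import stats

def mean_response_ci(xs, ys, x_new, alpha=0.05):
    """(1 - alpha) confidence interval for mu(x_new) = E(y) at x = x_new,
    computed directly from the formula on this slide."""
    xs, ys = np.asarray(xs, float), np.asarray(ys, float)
    n = len(xs)
    b1 = np.sum((xs - xs.mean()) * (ys - ys.mean())) / np.sum((xs - xs.mean()) ** 2)
    b0 = ys.mean() - b1 * xs.mean()
    resid = ys - b0 - b1 * xs
    sigma_hat = np.sqrt(np.sum(resid ** 2) / (n - 2))        # hat(sigma)
    mu_hat = b0 + b1 * x_new                                  # hat(mu)(x_new)
    half_width = (stats.t.ppf(1 - alpha / 2, n - 2) * sigma_hat
                  * np.sqrt(np.sum((xs - x_new) ** 2) / (n * np.sum((xs - xs.mean()) ** 2))))
    return mu_hat - half_width, mu_hat + half_width

Called on the used-car data introduced later (e.g. mean_response_ci(mileage, price, 40.0)), it should reproduce Minitab's 95% CI up to rounding.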


Predictive interval – an interval that contains y with probability $1-\alpha$

We may assume that the y to be predicted is independent of $y_1, \ldots, y_n$ used in the estimation.

Hence $y - \hat\mu(x)$ is normal with mean 0 and variance
$$\mathrm{Var}(y) + \mathrm{Var}\{\hat\mu(x)\} = \sigma^2 + \frac{\sigma^2}{n}\,\frac{\sum_{i=1}^n (x_i - x)^2}{\sum_{j=1}^n (x_j - \bar{x})^2}.$$

Therefore
$$\frac{y - \hat\mu(x)}{\left[\hat\sigma^2\left(1 + \dfrac{\sum_{i=1}^n (x_i - x)^2}{n\sum_{j=1}^n (x_j - \bar{x})^2}\right)\right]^{1/2}} \sim t_{n-2}.$$

An interval covering y with probability $1-\alpha$ is
$$\hat\mu(x) \pm t_{\alpha/2,\,n-2}\,\hat\sigma\left[1 + \frac{\sum_{i=1}^n (x_i - x)^2}{n\sum_{j=1}^n (x_j - \bar{x})^2}\right]^{1/2}.$$
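A corresponding sketch for the predictive interval (again purely illustrative; only the '1 +' term inside the square root changes relative to the confidence interval):

import numpy as np
from scipy import stats

def prediction_interval(xs, ys, x_new, alpha=0.05):
    """(1 - alpha) predictive interval for a new observation y at x = x_new,
    using the formula on this slide."""
    xs, ys = np.asarray(xs, float), np.asarray(ys, float)
    n = len(xs)
    b1 = np.sum((xs - xs.mean()) * (ys - ys.mean())) / np.sum((xs - xs.mean()) ** 2)
    b0 = ys.mean() - b1 * xs.mean()
    sigma_hat = np.sqrt(np.sum((ys - b0 - b1 * xs) ** 2) / (n - 2))
    mu_hat = b0 + b1 * x_new
    half_width = (stats.t.ppf(1 - alpha / 2, n - 2) * sigma_hat
                  * np.sqrt(1 + np.sum((xs - x_new) ** 2) / (n * np.sum((xs - xs.mean()) ** 2))))
    return mu_hat - half_width, mu_hat + half_width

The extra '1 +' inside the square root is what makes the predictive interval wider than the confidence interval for E(y).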


Remark. (i) It holds that
$$P\left(y \in \hat\mu(x) \pm t_{\alpha/2,\,n-2}\,\hat\sigma\left[1 + \frac{\sum_{i=1}^n (x_i - x)^2}{n\sum_{j=1}^n (x_j - \bar{x})^2}\right]^{1/2}\right) = 1 - \alpha.$$

(ii) The predictive interval for y is longer than the confidence interval for E(y). The former contains the unobserved random variable y with probability $1-\alpha$; the latter contains the unknown constant E(y) with probability $1-\alpha$.


Example. The data set ‘usedFord.mtw’ contains the prices (y, in $1,000) of 100 three-year-old Ford Tauruses together with their mileages (x, in 1,000 miles) when they were sold at auction. Based on those data, a car dealer needs to make two decisions:

1. to prepare cash for bidding on a three-year-old Ford Taurus with a mileage of x = 40;
2. to prepare for buying several three-year-old Ford Tauruses with mileages close to x = 40 from a rental company.

For the first task, a predictive interval would be more appropriate. For the second task, he needs to know the average price and, therefore, a confidence interval.

This can be done easily using Minitab.


MTB > regr c1 1 c2;
SUBC> predict 40.

Price = 17.2 - 0.0669 Mileage

Predictor   Coef       SE Coef    T        P
Constant    17.2487    0.1821     94.73    0.000
Mileage     -0.066861  0.004975   -13.44   0.000

S = 0.326489   R-Sq = 64.8%   R-Sq(adj) = 64.5%

Analysis of Variance

Source          DF  SS      MS      F       P
Regression       1  19.256  19.256  180.64  0.000
Residual Error  98  10.446   0.107
Total           99  29.702

... ...

Predicted Values for New Observations

New Obs  Fit      SE Fit   95% CI                95% PI
      1  14.5743  0.0382   (14.4985, 14.6501)    (13.9220, 15.2266)

New Obs  Mileage
      1  40.0

We predict that a Ford Taurus will sell for between $13,922 and $15,227. The average selling price of several 3-year-old Ford Tauruses is estimated to be between $14,499 and $14,650. Because predicting the selling price for one car is more difficult, the corresponding interval is wider.
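For readers without Minitab, the same fit and intervals can be reproduced with Python's statsmodels; the sketch below assumes the data have been exported from usedFord.mtw to a CSV file with columns Price and Mileage (the file name and column names are assumptions, not taken from the slides).

import pandas as pd
import statsmodels.api as sm

# Assumed export of usedFord.mtw to CSV with columns 'Price' and 'Mileage'.
cars = pd.read_csv("usedFord.csv")
X = sm.add_constant(cars["Mileage"])
fit = sm.OLS(cars["Price"], X).fit()
print(fit.summary())                          # coefficients, R-sq, F statistic

# 95% CI for E(y) and 95% PI for y at Mileage = 40 (compare with the Minitab output above).
new = sm.add_constant(pd.DataFrame({"Mileage": [40.0]}), has_constant="add")
pred = fit.get_prediction(new)
print(pred.summary_frame(alpha=0.05))         # mean_ci_* columns: CI; obs_ci_* columns: PI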


To produce the plots with both confidence intervals for E(y) and predictive intervals for y:

MTB > Fitline c1 c2;
SUBC> Confidence 95;
SUBC> Ci;
SUBC> Pi.

[Figure: fitted line plot with 95% confidence and prediction intervals]


Regression Diagnostics

The usefulness of a fitted regression model rests on a basic assumption:
$$Ey = \beta_0 + \beta_1 x.$$

Furthermore, inference such as the tests, the confidence intervals and the predictive intervals only makes sense if $\varepsilon_1, \ldots, \varepsilon_n$ are (approximately) independent and normal with constant variance $\sigma^2$.

Therefore it is important to check that those conditions are met in practice; this task is called regression diagnostics.

Basic idea: look into the residuals $\hat\varepsilon_i$ or the normalised residuals $\hat\varepsilon_i / \hat\sigma$.


What to look for?

Do the residuals manifest i.i.d. normal behaviour?
Is the scatter plot of $\hat\varepsilon_i$ versus $x_i$ patternless?
Is the scatter plot of $\hat\varepsilon_i$ versus $\hat{y}_i$ patternless?
Is the scatter plot of $\hat\varepsilon_i$ versus $i$ patternless?

If you see trends, periodic patterns or increasing variation in any one of the above scatter plots, it is very likely that at least one assumption is not met.


The various residual plots can be obtained in Minitab as follows (using the same example):

MTB > Fitline c1 c2;
SUBC> gfourpack;
SUBC> gvars c2.
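A rough Python equivalent of Minitab's four-in-one residual display, assuming the statsmodels fit from the earlier sketch, might look as follows (the panel layout is chosen to mirror Minitab and is not prescribed by the slides):

import matplotlib.pyplot as plt
from scipy import stats

# Assumes 'fit' (statsmodels OLS results) from the earlier sketch.
resid = fit.resid
fitted = fit.fittedvalues

fig, ax = plt.subplots(2, 2, figsize=(9, 7))
stats.probplot(resid, dist="norm", plot=ax[0, 0])        # normal probability plot
ax[0, 0].set_title("Normal probability plot of residuals")
ax[0, 1].scatter(fitted, resid)
ax[0, 1].axhline(0)
ax[0, 1].set_title("Residuals versus fitted values")
ax[1, 0].hist(resid, bins=20)
ax[1, 0].set_title("Histogram of residuals")
ax[1, 1].plot(resid.values, marker="o", linestyle="-")
ax[1, 1].set_title("Residuals versus observation order")
plt.tight_layout()
plt.show()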


 

[Figure: residual plots for the regression of Price on Mileage]


Two other issues in regression diagnostics: outliers and influential observations.

Outlier: an unusually small or unusually large $y_i$ which lies outside the majority of observations.

An outlier is often caused by an error in either sampling or recording data. If so, we should correct it before proceeding with the regression analysis.

If an observation which looks like an outlier indeed belongs to the sample and no errors in sampling or recording were discovered, we may use a more complex model or distribution to accommodate this ‘outlier’. For example, stock returns often exhibit extreme values and they often cannot be modelled satisfactorily by a normal regression model.

Remark. Strictly speaking, outliers are defined with respect to the model: y is very unlikely to be more than $2\sigma$ away from $Ey = \beta_0 + \beta_1 x$ under the normal regression model. This is how Minitab identifies potential outliers.


Influential observation: an $x_i$ which is far away from the other x's. Such an observation may have a large influence on the fitted regression line.

 

[Figure: effect of an influential observation on the fitted regression line]


Remark. (i) Minitab output marks both outliers and influential observations.

MTB > regr c1 1 c2;
SUBC> predict 40.

Price = 17.2 - 0.0669 Mileage

... ...

Unusual Observations

Obs  Mileage  Price    Fit      SE Fit  Residual  St Resid
  8  19.1     15.7000  15.9717  0.0902  -0.2717   -0.87 X
 14  34.5     15.6000  14.9420  0.0335   0.6580    2.03R
 19  48.6     14.7000  13.9993  0.0706   0.7007    2.20R
 63  21.2     15.4000  15.8313  0.0806  -0.4313   -1.36 X
 74  21.0     16.4000  15.8446  0.0815   0.5554    1.76 X
 78  44.3     13.6000  14.2868  0.0526  -0.6868   -2.13R

R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large influence.

... ...
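A rough Python analogue of these 'R' and 'X' flags, assuming the statsmodels fit from the earlier sketch, is to list observations with |standardised residual| greater than 2 and observations with large leverage; the 2(k+1)/n leverage cut-off is a common rule of thumb, not a threshold stated on these slides.

import numpy as np

# Assumes 'fit' (statsmodels OLS results) from the earlier sketch.
influence = fit.get_influence()
std_resid = influence.resid_studentized_internal   # standardised residuals
leverage = influence.hat_matrix_diag               # h_ii, large when x_i is far from the other x's

n, k = len(std_resid), 1                            # k = number of predictors
outliers = np.where(np.abs(std_resid) > 2)[0]             # 'R'-type points
high_leverage = np.where(leverage > 2 * (k + 1) / n)[0]   # 'X'-type points (rule of thumb)
print("Large standardised residuals:", outliers)
print("High-leverage x values:", high_leverage)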


(ii) To mitigate the impact of both outliers and influential observations, we could use robust regression, i.e. estimate $\beta_0$ and $\beta_1$ by minimising the sum of absolute deviations:
$$SAD(\beta_0, \beta_1) = \sum_{i=1}^n |y_i - \beta_0 - \beta_1 x_i|.$$

However, note that since the function $f(x) = |x|$ is not differentiable at the point where it attains its minimum, we would not be able to find $\hat\beta_0$ and $\hat\beta_1$ by differentiating $SAD(\beta_0, \beta_1)$ with respect to $\beta_0$ and $\beta_1$ and equating the partial derivatives to zero. More complex minimisation techniques would have to be used. This may be viewed as a drawback of this approach.
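One such technique is direct numerical minimisation. Below is a minimal sketch using scipy's derivative-free Nelder–Mead method, started from the least-squares estimates; the set-up is illustrative, and least-absolute-deviations regression can equally be solved by linear programming or quantile regression.

import numpy as np
from scipy.optimize import minimize

def sad(params, xs, ys):
    """Sum of absolute deviations SAD(beta0, beta1)."""
    b0, b1 = params
    return np.sum(np.abs(ys - b0 - b1 * xs))

def lad_fit(xs, ys):
    """Estimate (beta0, beta1) by minimising SAD with a derivative-free method,
    since SAD is not differentiable at its minimum."""
    xs, ys = np.asarray(xs, float), np.asarray(ys, float)
    # Start from the ordinary least-squares estimates.
    b1_ls = np.sum((xs - xs.mean()) * (ys - ys.mean())) / np.sum((xs - xs.mean()) ** 2)
    b0_ls = ys.mean() - b1_ls * xs.mean()
    res = minimize(sad, x0=[b0_ls, b1_ls], args=(xs, ys), method="Nelder-Mead")
    return res.x  # (beta0_hat, beta1_hat)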


Workshop 19

In this workshop we apply the simple linear regression method to study the relationship between two financial return series: a regression of Cisco Systems stock returns y on S&P500 Index returns x. This regression model is an example of the CAPM (Capital Asset Pricing Model).

Stock returns:
$$\text{return} = \frac{\text{current price} - \text{previous price}}{\text{previous price}} \approx \log\frac{\text{current price}}{\text{previous price}}$$
when the difference between the two prices is small.
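A quick numerical check of this approximation (the two prices below are arbitrary illustrative values):

import math

prev, curr = 100.0, 101.5                  # illustrative prices
simple_return = (curr - prev) / prev       # 0.0150
log_return = math.log(curr / prev)         # about 0.0149, close when the change is small
print(simple_return, log_return)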

Dataset: “return4.mtw” (on Moodle). Daily returns, 3 January – 29 December 2000 (n = 252 observations). The dataset has 5 columns: c1 – date, c2 – 100×(S&P500 return), c3 – 100×(Cisco return), and c4 and c5 are two other stock returns.


Remark. Daily prices are definitely not independent. However, daily returns may be seen as a sequence of uncorrelated random variables.

MTB > describe c2 c3.

Descriptive Statistics: S&P500, Cisco

Variable  N    N*  Mean     SE Mean  StDev   Minimum  Q1       Median   Q3      Maximum
S&P500    252  0   -0.0424  0.0882   1.4002  -6.0045  -0.8543  -0.0379  0.8021  4.6546
Cisco     252  0   -0.134   0.267    4.234   -13.439  -3.104   -0.115   2.724   15.415

For the S&P500, the average daily return is -0.04%, the maximum daily return is 4.65%, the minimum daily return is -6.00%, and the standard deviation is 1.40.

For Cisco, the average daily return is -0.13%, the maximum daily return is 15.42%, the minimum daily return is -13.44%, and the standard deviation is 4.23.


Remark. Cisco is much more volatile than the S&P500.

MTB > tsplot c2 c3;
SUBC> overlay.

[Figure: overlaid time series plots of the daily S&P500 and Cisco returns]


There is clear synchronisation between the movements of the two return series.

MTB > corr c2 c3
Pearson correlation of S&P500 and Cisco = 0.687
P-Value = 0.000


We fit a regression model: $\text{Cisco} = \beta_0 + \beta_1\,\text{S\&P500} + \varepsilon$.

Rationale: part of the fluctuation in Cisco returns was driven by the fluctuation of the S&P500 return.

MTB > regr c3 1 c2

The regression equation is Cisco = - 0.045 + 2.08 S&P500

Predictor   Coef     SE Coef  T      P
Constant    -0.0455  0.1943   -0.23  0.815
S&P500      2.0771   0.1390   14.94  0.000

S = 3.08344   R-Sq = 47.2%   R-Sq(adj) = 47.0%

Analysis of Variance

Source          DF   SS      MS      F       P
Regression       1   2123.1  2123.1  223.31  0.000
Residual Error  250  2376.9     9.5
Total           251  4500.0


Unusual Observations

Obs  S&P500   Cisco     Fit     SE Fit  Residual  St Resid
  2   -3.91    -5.771   -8.167   0.572     2.396    0.79 X
 27   -2.10     2.357   -4.415   0.346     6.772    2.21R
 36    0.63    11.208    1.259   0.215     9.949    3.23R
 51    2.40    -2.396    4.936   0.391    -7.332   -2.40R
 52    4.65     2.321    9.623   0.681    -7.302   -2.43RX
... ...
210    1.37    -5.328    2.808   0.277    -8.135   -2.65R
211    2.17    11.431    4.470   0.364     6.961    2.27R
234    0.74    -5.706    1.487   0.222    -7.193   -2.34R
235    3.82    12.924    7.886   0.571     5.038    1.66 X
244    0.80   -11.493    1.624   0.227   -13.117   -4.27R
246   -3.18   -13.439   -6.650   0.477    -6.789   -2.23RX

R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large influence.


The estimated slope is $\hat\beta_1 = 2.077$. The null hypothesis $H_0: \beta_1 = 0$ is rejected with p-value 0.000: extremely significant.

Attempted interpretation: when the market index goes up by 1%, the Cisco stock goes up by 2.077% on average. However, the error term ε in the model is large, with estimated $\hat\sigma = 3.08\%$.

The p-value for testing $H_0: \beta_0 = 0$ is 0.815, so we cannot reject the hypothesis $\beta_0 = 0$. Recall that $\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}$, and both $\bar{y}$ and $\bar{x}$ are very close to 0.

There are many standardised residual values ≥ 2 or ≤ −2, indicating a non-normal error distribution.

$R^2 = 47.2\%$ of the variation of the Cisco stock may be explained by the variation of the S&P500 index; in other words, 47.2% of the risk in the Cisco stock is market-related risk — see CAPM below.


CAPM — a simple asset pricing model in finance:
$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i,$$
where $y_i$ is a stock return and $x_i$ is a market return at time $i$.

Total risk of the stock:
$$\frac{1}{n}\sum_{i=1}^n (y_i - \bar{y})^2 = \frac{1}{n}\sum_{i=1}^n (\hat{y}_i - \bar{y})^2 + \frac{1}{n}\sum_{i=1}^n (y_i - \hat{y}_i)^2.$$

Market-related (or systematic) risk:
$$\frac{1}{n}\sum_{i=1}^n (\hat{y}_i - \bar{y})^2 = \frac{1}{n}\,\hat\beta_1^2 \sum_{i=1}^n (x_i - \bar{x})^2.$$

Firm-specific risk:
$$\frac{1}{n}\sum_{i=1}^n (y_i - \hat{y}_i)^2.$$

Remark. (i) $\beta_1$ measures the market-related (or systematic) risk of the stock.


(ii) Market-related risk is unavoidable, while firm-specific risk may be “diversified away” through hedging.

(iii) Variance is a simple and one of the most frequently used measures of risk in finance.
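The risk decomposition above can be verified numerically from a fitted CAPM regression. The sketch below assumes the S&P500 and Cisco return columns have been loaded into arrays x and y (how the .mtw worksheet is exported is not specified in the slides):

import numpy as np

def capm_risk_decomposition(x, y):
    """Split (1/n) * sum (y_i - ybar)^2 into market-related and firm-specific risk."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(y)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    fitted = b0 + b1 * x
    total = np.sum((y - y.mean()) ** 2) / n
    market = b1 ** 2 * np.sum((x - x.mean()) ** 2) / n   # = (1/n) sum (yhat_i - ybar)^2
    firm = np.sum((y - fitted) ** 2) / n
    return total, market, firm                           # total == market + firm

# Example (with x = S&P500 returns, y = Cisco returns):
# total, market, firm = capm_risk_decomposition(x, y)
# market / total equals R^2, so it should be about 0.472 for these data.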

To plot the data with the fitted regression line together with confidence bounds for E(y) and predictive bounds for y:

MTB > Fitline c3 c2;
SUBC> gfourpack;
SUBC> confidence 95;
SUBC> ci;
SUBC> pi.


[Figure: four-in-one residual plots for the CAPM regression]

Top-left panel: points below the line in the top-right corner, above the line in the bottom-left corner — the residual distribution has heavier tails than $N(0, \sigma^2)$.