7/31/2019 Lecture36 2012 Full
http://slidepdf.com/reader/full/lecture36-2012-full 1/30
ST 102: Elementary Statistical Theory
Lecture 36 – Linear Regression: Prediction and Diagnostics

Piotr [email protected]
Department of Statistics, LSE
Objectives:
- Confidence intervals for $E(y)$
- Predictive intervals for $y$
- Regression diagnostics: a summary

Based on the observations $\{(x_i, y_i) : i = 1, \dots, n\}$, we fit a regression model $\hat y = \hat\beta_0 + \hat\beta_1 x$.

Goal. Predict (unobserved) $y$ corresponding to (known) $x$.

Point prediction: $\hat y = \hat\beta_0 + \hat\beta_1 x$.
For the analysis to be more informative, we would like to have some 'error bars' for our prediction. We introduce two methods:

- Confidence interval for $\mu(x) \equiv E(y) = \beta_0 + \beta_1 x$
- Predictive interval for $y$

Remark. A confidence interval is an interval estimator for an unknown parameter (i.e. for a constant), while a predictive interval is for a random variable. They are different and serve different purposes.

We assume the model is normal, i.e. $\varepsilon = y - \beta_0 - \beta_1 x \sim N(0, \sigma^2)$.
Confidence interval for $\mu(x) = E(y)$

Let $\hat\mu(x) = \hat\beta_0 + \hat\beta_1 x$. Then $\hat\mu(x)$ is an unbiased estimator for $\mu(x)$.

Theorem. $\hat\mu(x)$ is normally distributed with mean $\mu(x)$ and variance
$$\operatorname{Var}\{\hat\mu(x)\} = \frac{\sigma^2}{n} \cdot \frac{\sum_{i=1}^n (x_i - x)^2}{\sum_{j=1}^n (x_j - \bar x)^2}.$$

Proof. Note both $\hat\beta_0$ and $\hat\beta_1$ are linear estimators. Therefore $\hat\mu(x)$ may be written in the form $\hat\mu(x) = \sum_{i=1}^n b_i y_i$, where $b_1, \dots, b_n$ are some constants. Hence $\hat\mu(x)$ is normally distributed. To determine its distribution entirely, we only need to find its mean and variance.
$$E\{\hat\mu(x)\} = E(\hat\beta_0) + E(\hat\beta_1)x = \beta_0 + \beta_1 x = \mu(x).$$
$$\operatorname{Var}\{\hat\mu(x)\} = E\big[\{(\hat\beta_0 - \beta_0) + (\hat\beta_1 - \beta_1)x\}^2\big] = \operatorname{Var}(\hat\beta_0) + x^2 \operatorname{Var}(\hat\beta_1) + 2x \operatorname{Cov}(\hat\beta_0, \hat\beta_1).$$

In Lecture 33 we derived
$$\operatorname{Var}(\hat\beta_0) = \frac{\sigma^2 \sum_{i=1}^n x_i^2}{n \sum_{j=1}^n (x_j - \bar x)^2}, \qquad \operatorname{Var}(\hat\beta_1) = \frac{\sigma^2}{\sum_{j=1}^n (x_j - \bar x)^2}.$$

In Workshop 18 we showed
$$\operatorname{Cov}(\hat\beta_0, \hat\beta_1) = -\frac{\sigma^2 \bar x}{\sum_{j=1}^n (x_j - \bar x)^2}.$$

Hence
$$\frac{\sum_{j=1}^n (x_j - \bar x)^2}{\sigma^2} \operatorname{Var}\{\hat\mu(x)\} = \frac{1}{n} \sum_{i=1}^n x_i^2 + x^2 - 2x\bar x = \frac{1}{n} \Big( \sum_{i=1}^n x_i^2 + n x^2 - 2x \sum_{i=1}^n x_i \Big) = \frac{1}{n} \sum_{i=1}^n (x_i - x)^2,$$

i.e. $\operatorname{Var}\{\hat\mu(x)\} = (\sigma^2/n) \sum_{i=1}^n (x_i - x)^2 \big/ \sum_{j=1}^n (x_j - \bar x)^2$.
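The variance formula can be sanity-checked numerically. The sketch below (not from the slides; the design points, prediction point and error variance are arbitrary) verifies that the form derived here agrees with the more familiar expression $\sigma^2\{1/n + (x - \bar x)^2 / \sum_j (x_j - \bar x)^2\}$.

```python
# Numerical check that the two forms of Var{mu_hat(x)} agree:
# (sigma^2/n) * sum_i (x_i - x)^2 / sum_j (x_j - xbar)^2
# equals sigma^2 * (1/n + (x - xbar)^2 / sum_j (x_j - xbar)^2).
import numpy as np

rng = np.random.default_rng(0)
xs = rng.normal(size=20)     # arbitrary design points
x0 = 1.7                     # arbitrary prediction point
sigma2 = 2.5                 # arbitrary error variance
n = len(xs)
xbar = xs.mean()
Sxx = ((xs - xbar) ** 2).sum()

form1 = sigma2 / n * ((xs - x0) ** 2).sum() / Sxx
form2 = sigma2 * (1.0 / n + (x0 - xbar) ** 2 / Sxx)
print(abs(form1 - form2))    # zero up to floating-point error
```

The equivalence follows from $\sum_i (x_i - x)^2 = \sum_j (x_j - \bar x)^2 + n(\bar x - x)^2$.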
Now
$$\frac{\hat\mu(x) - \mu(x)}{\left\{ \dfrac{\sigma^2}{n} \sum_{i=1}^n (x_i - x)^2 \Big/ \sum_{j=1}^n (x_j - \bar x)^2 \right\}^{1/2}} \sim N(0, 1),$$
and $(n-2)\hat\sigma^2/\sigma^2 \sim \chi^2_{n-2}$, where $\hat\sigma^2 = \frac{1}{n-2} \sum_{i=1}^n (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2$.

Furthermore, $\hat\mu(x)$ and $\hat\sigma^2$ are independent. Hence
$$\frac{\hat\mu(x) - \mu(x)}{\left\{ \dfrac{\hat\sigma^2}{n} \sum_{i=1}^n (x_i - x)^2 \Big/ \sum_{j=1}^n (x_j - \bar x)^2 \right\}^{1/2}} \sim t_{n-2}.$$

A $(1 - \alpha)$ confidence interval for $\mu(x)$ is
$$\hat\mu(x) \pm t_{\alpha/2,\, n-2}\, \hat\sigma \left\{ \frac{\sum_{i=1}^n (x_i - x)^2}{n \sum_{j=1}^n (x_j - \bar x)^2} \right\}^{1/2}.$$

Recall: the above interval contains the true expectation $E(y) = \mu(x)$ with probability $1 - \alpha$. It does not cover $y$ with probability $1 - \alpha$.
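The confidence interval above can be computed directly from data. The function below is a minimal sketch (the function name and example data are illustrative, not from the slides), assuming a simple linear regression fitted by least squares.

```python
# Sketch: (1 - alpha) confidence interval for mu(x0) = beta0 + beta1 * x0.
import numpy as np
from scipy import stats

def mean_ci(xs, ys, x0, alpha=0.05):
    n = len(xs)
    xbar, ybar = xs.mean(), ys.mean()
    Sxx = ((xs - xbar) ** 2).sum()
    b1 = ((xs - xbar) * (ys - ybar)).sum() / Sxx   # beta1 hat
    b0 = ybar - b1 * xbar                          # beta0 hat
    resid = ys - b0 - b1 * xs
    s2 = (resid ** 2).sum() / (n - 2)              # sigma^2 hat
    fit = b0 + b1 * x0
    # Var{mu_hat(x0)} = (s2/n) * sum_i (x_i - x0)^2 / Sxx
    se = np.sqrt(s2 / n * ((xs - x0) ** 2).sum() / Sxx)
    t = stats.t.ppf(1 - alpha / 2, n - 2)
    return fit - t * se, fit + t * se
```

As a degenerate check, if the data lie exactly on a line, $\hat\sigma^2 = 0$ and the interval collapses to the point prediction.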
Predictive interval – an interval that contains $y$ with probability $1 - \alpha$

We may assume that the $y$ to be predicted is independent of $y_1, \dots, y_n$ used in estimation.

Hence $y - \hat\mu(x)$ is normal with mean 0 and variance
$$\operatorname{Var}(y) + \operatorname{Var}\{\hat\mu(x)\} = \sigma^2 + \frac{\sigma^2}{n} \cdot \frac{\sum_{i=1}^n (x_i - x)^2}{\sum_{j=1}^n (x_j - \bar x)^2}.$$

Therefore
$$\frac{y - \hat\mu(x)}{\left\{ \hat\sigma^2 \left( 1 + \dfrac{\sum_{i=1}^n (x_i - x)^2}{n \sum_{j=1}^n (x_j - \bar x)^2} \right) \right\}^{1/2}} \sim t_{n-2}.$$

An interval covering $y$ with probability $1 - \alpha$ is
$$\hat\mu(x) \pm t_{\alpha/2,\, n-2}\, \hat\sigma \left\{ 1 + \frac{\sum_{i=1}^n (x_i - x)^2}{n \sum_{j=1}^n (x_j - \bar x)^2} \right\}^{1/2}.$$
Remark. (i) It holds that
$$P\left( y \in \hat\mu(x) \pm t_{\alpha/2,\, n-2}\, \hat\sigma \left\{ 1 + \frac{\sum_{i=1}^n (x_i - x)^2}{n \sum_{j=1}^n (x_j - \bar x)^2} \right\}^{1/2} \right) = 1 - \alpha.$$

(ii) The predictive interval for $y$ is longer than the confidence interval for $E(y)$. The former contains the unobserved random variable $y$ with probability $1 - \alpha$; the latter contains the unknown constant $E(y)$ with probability $1 - \alpha$.
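The predictive interval differs from the confidence interval only through the extra "$1 +$" term, which prevents its width from shrinking to zero as $n$ grows. A sketch, with illustrative names and under the same least-squares setup:

```python
# Sketch: (1 - alpha) predictive interval for a new y at x0; the extra
# "1 +" term makes it wider than the confidence interval for E(y).
import numpy as np
from scipy import stats

def predictive_interval(xs, ys, x0, alpha=0.05):
    n = len(xs)
    xbar, ybar = xs.mean(), ys.mean()
    Sxx = ((xs - xbar) ** 2).sum()
    b1 = ((xs - xbar) * (ys - ybar)).sum() / Sxx
    b0 = ybar - b1 * xbar
    s2 = ((ys - b0 - b1 * xs) ** 2).sum() / (n - 2)   # sigma^2 hat
    fit = b0 + b1 * x0
    se = np.sqrt(s2 * (1.0 + ((xs - x0) ** 2).sum() / (n * Sxx)))
    t = stats.t.ppf(1 - alpha / 2, n - 2)
    return fit - t * se, fit + t * se
```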
Example. The data set 'usedFord.mtw' contains the prices ($y$, in $1,000s) of 100 three-year-old Ford Tauruses together with their mileages ($x$, in 1,000s of miles) when they were sold at auction. Based on those data, a car dealer needs to make two decisions:

1. to prepare cash for bidding on a three-year-old Ford Taurus with a mileage of $x = 40$;
2. to prepare for buying several three-year-old Ford Tauruses with mileages close to $x = 40$ from a rental company.

For the first task, a predictive interval would be more appropriate. For the second task, he needs to know the average price and, therefore, a confidence interval.

This can be done easily using Minitab.
MTB > regr c1 1 c2;
SUBC> predict 40.

Price = 17.2 - 0.0669 Mileage

Predictor  Coef       SE Coef   T       P
Constant   17.2487    0.1821    94.73   0.000
Mileage    -0.066861  0.004975  -13.44  0.000

S = 0.326489   R-Sq = 64.8%   R-Sq(adj) = 64.5%

Analysis of Variance

Source          DF  SS      MS      F       P
Regression      1   19.256  19.256  180.64  0.000
Residual Error  98  10.446  0.107
Total           99  29.702

... ...

Predicted Values for New Observations

NewObs  Fit      SE Fit  95% CI              95% PI
1       14.5743  0.0382  (14.4985, 14.6501)  (13.9220, 15.2266)

NewObs  Mileage
1       40.0
We predict that a Ford Taurus will sell for between $13,922 and $15,227. The average selling price of several three-year-old Ford Tauruses is estimated to be between $14,499 and $14,650. Because predicting the selling price for one car is more difficult, the corresponding interval is wider.
To produce the plots with both confidence intervals for $E(y)$ and predictive intervals for $y$:

MTB > Fitline c1 c2;
SUBC> Confidence 95;
SUBC> Ci;
SUBC> Pi.
Regression Diagnostics

The usefulness of a fitted regression model rests on a basic assumption:
$$E(y) = \beta_0 + \beta_1 x.$$

Furthermore, inference such as tests, confidence intervals and predictive intervals only makes sense if $\varepsilon_1, \dots, \varepsilon_n$ are (approximately) independent and normal with constant variance $\sigma^2$.

Therefore it is important to check that those conditions are met in practice — this task is called Regression Diagnostics.

Basic idea: look into the residuals $\hat\varepsilon_i$ or the normalized residuals $\hat\varepsilon_i / \hat\sigma$.
What to look for?

Do the residuals manifest i.i.d. normal behaviour?

- Is the scatter plot of $\hat\varepsilon_i$ versus $x_i$ patternless?
- Is the scatter plot of $\hat\varepsilon_i$ versus $\hat y_i$ patternless?
- Is the scatter plot of $\hat\varepsilon_i$ versus $i$ patternless?

If you see trends, periodic patterns, or increasing variation in any one of the above scatter plots, it is very likely that at least one assumption is not met.
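The checks above can be sketched outside Minitab as well. The function below (names and layout are illustrative, not from the slides) produces a four-panel residual display analogous to the one discussed here.

```python
# Sketch: four residual diagnostics for a simple linear regression,
# saved to a PNG so no display is needed.
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

def residual_plots(xs, ys, fname="residuals.png"):
    b1 = np.cov(xs, ys, bias=True)[0, 1] / xs.var()   # OLS slope
    b0 = ys.mean() - b1 * xs.mean()
    fitted = b0 + b1 * xs
    resid = ys - fitted
    fig, ax = plt.subplots(2, 2, figsize=(8, 6))
    ax[0, 0].hist(resid, bins=20)            # i.i.d. normal behaviour?
    ax[0, 0].set_title("Histogram of residuals")
    ax[0, 1].scatter(xs, resid)              # residuals vs x
    ax[0, 1].set_title("Residuals vs x")
    ax[1, 0].scatter(fitted, resid)          # residuals vs fitted values
    ax[1, 0].set_title("Residuals vs fitted")
    ax[1, 1].plot(resid, marker=".")         # residuals vs observation order
    ax[1, 1].set_title("Residuals vs order")
    fig.savefig(fname)
    plt.close(fig)
    return resid
```

Because the fit includes an intercept, the returned residuals sum to zero, which is a quick internal consistency check.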
The various residual plots can be obtained in Minitab as follows (using the same example):

MTB > Fitline c1 c2;
SUBC> gfourpack;
SUBC> gvars c2.
Two other issues in regression diagnostics: outliers and influential observations.

Outlier: an unusually small or unusually large $y_i$ which lies outside the majority of observations.

An outlier is often caused by an error in either sampling or recording data. If so, we should correct it before proceeding with the regression analysis.

If an observation which looks like an outlier indeed belongs to the sample and no errors in sampling or recording were discovered, we may use a more complex model or distribution to accommodate this 'outlier'. For example, stock returns often exhibit extreme values which often cannot be modelled satisfactorily by a normal regression model.

Remark. Strictly speaking, outliers are defined with respect to the model: $y$ is very unlikely to be more than $2\sigma$ away from $E(y) = \beta_0 + \beta_1 x$ under the normal regression model. This is how Minitab identifies potential outliers.
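A crude version of this flagging rule is easy to sketch. Note this is a simplification (an assumption of this sketch): Minitab's "St Resid" column additionally adjusts for each observation's leverage, whereas here residuals are simply scaled by $\hat\sigma$.

```python
# Sketch: flag observations whose residual exceeds 2 sigma-hat in
# absolute value (a simplified version of Minitab's R flag).
import numpy as np

def flag_outliers(xs, ys, threshold=2.0):
    n = len(xs)
    xbar = xs.mean()
    Sxx = ((xs - xbar) ** 2).sum()
    b1 = ((xs - xbar) * (ys - ys.mean())).sum() / Sxx
    b0 = ys.mean() - b1 * xbar
    resid = ys - b0 - b1 * xs
    s = np.sqrt((resid ** 2).sum() / (n - 2))       # sigma hat
    return np.where(np.abs(resid / s) > threshold)[0]
```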
Influential observation: an $x_i$ which is far away from the other $x$'s. Such an observation may have a large influence on the fitted regression line.
Remark. (i) Minitab output marks both outliers and influential observations.

MTB > regr c1 1 c2;
SUBC> predict 40.

Price = 17.2 - 0.0669 Mileage

... ...

Unusual Observations

Obs  Mileage  Price    Fit      SE Fit  Residual  St Resid
8    19.1     15.7000  15.9717  0.0902  -0.2717   -0.87 X
14   34.5     15.6000  14.9420  0.0335  0.6580    2.03R
19   48.6     14.7000  13.9993  0.0706  0.7007    2.20R
63   21.2     15.4000  15.8313  0.0806  -0.4313   -1.36 X
74   21.0     16.4000  15.8446  0.0815  0.5554    1.76 X
78   44.3     13.6000  14.2868  0.0526  -0.6868   -2.13R

R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large influence.

... ...
(ii) To mitigate the impact of both outliers and influential observations, we could use robust regression, i.e. estimate $\beta_0$ and $\beta_1$ by minimising the sum of absolute deviations:
$$SAD(\beta_0, \beta_1) = \sum_{i=1}^n |y_i - \beta_0 - \beta_1 x_i|.$$

However, since the function $f(x) = |x|$ is not differentiable where it attains its minimum, we would not be able to find $\hat\beta_0$ and $\hat\beta_1$ by differentiating $SAD(\beta_0, \beta_1)$ w.r.t. $\beta_0$ and $\beta_1$ and equating the partial derivatives to zero. More complex minimisation techniques would have to be used, which may be viewed as a drawback of this approach.
Workshop 19

In this workshop we apply the simple linear regression method to study the relationship between two financial return series: a regression of Cisco Systems stock returns $y$ on S&P500 Index returns $x$. This regression model is an example of the CAPM (Capital Asset Pricing Model).

Stock returns:
$$\text{return} = \frac{\text{current price} - \text{previous price}}{\text{previous price}} \approx \log \frac{\text{current price}}{\text{previous price}}$$
when the difference between the two prices is small.

Dataset: "return4.mtw" (on Moodle). Daily returns 3 January – 29 December 2000 ($n = 252$ observations). The dataset has 5 columns: c1 – date, c2 – 100×(S&P500 return), c3 – 100×(Cisco return), and c4 and c5 are two other stock returns.
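The approximation in the return formula above is easy to check numerically; the sketch below (prices are arbitrary, not from the dataset) compares the simple and log return for a small price move.

```python
# Check that the simple return approximates the log return for small moves.
import numpy as np

prev, curr = 100.0, 101.3            # a 1.3% move (arbitrary prices)
simple = (curr - prev) / prev
log_ret = np.log(curr / prev)
print(simple, log_ret)               # close for small price changes
```

The gap grows with the size of the move, which is why the approximation is stated only for small price differences.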
Remark. Daily prices are definitely not independent. However, daily returns may be seen as a sequence of uncorrelated random variables.

MTB > describe c2 c3.

Descriptive Statistics: S&P500, Cisco

Variable  N    N*  Mean     SE Mean  StDev   Minimum  Q1       Median   Q3
S&P500    252  0   -0.0424  0.0882   1.4002  -6.0045  -0.8543  -0.0379  0.8021
Cisco     252  0   -0.134   0.267    4.234   -13.439  -3.104   -0.115   2.724

Variable  Maximum
S&P500    4.6546
Cisco     15.415

For S&P500, the average daily return is -0.04%, the maximum daily return is 4.65%, the minimum daily return is -6.00%, and the standard deviation is 1.40.

For Cisco, the average daily return is -0.13%, the maximum daily return is 15.42%, the minimum daily return is -13.44%, and the standard deviation is 4.23.
Remark. Cisco is much more volatile than S&P500.

MTB > tsplot c2 c3;
SUBC> overlay.
There is clear synchronisation between the movements of the two return series.

MTB > corr c2 c3
Pearson correlation of S&P500 and Cisco = 0.687
P-Value = 0.000
We fit a regression model: Cisco = $\beta_0$ + $\beta_1$ S&P500 + $\varepsilon$.

Rationale: part of the fluctuation in Cisco returns was driven by the fluctuation of the S&P500 return.

MTB > regr c3 1 c2

The regression equation is Cisco = - 0.045 + 2.08 S&P500

Predictor  Coef     SE Coef  T      P
Constant   -0.0455  0.1943   -0.23  0.815
S&P500     2.0771   0.1390   14.94  0.000

S = 3.08344   R-Sq = 47.2%   R-Sq(adj) = 47.0%

Analysis of Variance

Source          DF   SS      MS      F       P
Regression      1    2123.1  2123.1  223.31  0.000
Residual Error  250  2376.9  9.5
Total           251  4500.0
Unusual Observations

Obs  S&P500  Cisco    Fit     SE Fit  Residual  St Resid
2    -3.91   -5.771   -8.167  0.572   2.396     0.79 X
27   -2.10   2.357    -4.415  0.346   6.772     2.21R
36   0.63    11.208   1.259   0.215   9.949     3.23R
51   2.40    -2.396   4.936   0.391   -7.332    -2.40R
52   4.65    2.321    9.623   0.681   -7.302    -2.43RX
... ...
210  1.37    -5.328   2.808   0.277   -8.135    -2.65R
211  2.17    11.431   4.470   0.364   6.961     2.27R
234  0.74    -5.706   1.487   0.222   -7.193    -2.34R
235  3.82    12.924   7.886   0.571   5.038     1.66 X
244  0.80    -11.493  1.624   0.227   -13.117   -4.27R
246  -3.18   -13.439  -6.650  0.477   -6.789    -2.23RX

R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large influence.
The estimated slope: $\hat\beta_1 = 2.077$. The null hypothesis $H_0: \beta_1 = 0$ is rejected with $p$-value 0.000: extremely significant.

Attempted interpretation: when the market index goes up by 1%, the Cisco stock goes up by 2.077% on average. However, the error term $\varepsilon$ in the model is large, with estimated $\hat\sigma = 3.08\%$.

The $p$-value for testing $H_0: \beta_0 = 0$ is 0.815, so we cannot reject the hypothesis $\beta_0 = 0$. Recall $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$, and both $\bar y$ and $\bar x$ are very close to 0.

There are many standardised residual values $\geq 2$ or $\leq -2$, indicating a non-normal error distribution.

$R^2 = 47.2\%$ of the variation of the Cisco stock may be explained by the variation of the S&P500 index; in other words, 47.2% of the risk in the Cisco stock is market-related risk — see CAPM below.
CAPM — a simple asset pricing model in finance:
$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i,$$
where $y_i$ is a stock return and $x_i$ is a market return at time $i$.

Total risk of the stock:
$$\frac{1}{n} \sum_{i=1}^n (y_i - \bar y)^2 = \frac{1}{n} \sum_{i=1}^n (\hat y_i - \bar y)^2 + \frac{1}{n} \sum_{i=1}^n (y_i - \hat y_i)^2.$$

Market-related (or systematic) risk:
$$\frac{1}{n} \sum_{i=1}^n (\hat y_i - \bar y)^2 = \frac{\hat\beta_1^2}{n} \sum_{i=1}^n (x_i - \bar x)^2.$$

Firm-specific risk:
$$\frac{1}{n} \sum_{i=1}^n (y_i - \hat y_i)^2.$$

Remark. (i) $\beta_1$ measures the market-related (or systematic) risk of the stock.
(ii) Market-related risk is unavoidable, while firm-specific risk may be "diversified away" through hedging.

(iii) Variance is a simple and one of the most frequently used measures of risk in finance.
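The risk decomposition above can be verified numerically. The sketch below uses synthetic returns with roughly the volatilities and slope seen in the workshop output (an assumption — the actual dataset is not reproduced here).

```python
# Sketch: CAPM risk decomposition — total risk equals market-related
# risk plus firm-specific risk, exactly, for a least-squares fit.
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(0, 1.4, size=252)             # "market" returns
y = 2.08 * x + rng.normal(0, 3.1, size=252)  # "stock" returns

xbar = x.mean()
b1 = ((x - xbar) * (y - y.mean())).sum() / ((x - xbar) ** 2).sum()
b0 = y.mean() - b1 * xbar
fitted = b0 + b1 * x
n = len(y)

total = ((y - y.mean()) ** 2).sum() / n                 # total risk
market = b1 ** 2 * ((x - xbar) ** 2).sum() / n          # systematic risk
specific = ((y - fitted) ** 2).sum() / n                # firm-specific risk
print(total, market + specific)                         # identical
```

The identity holds exactly (not just approximately) because, with an intercept in the model, the residuals are orthogonal to the fitted values.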
To plot the data with the fitted regression line together with confidence bounds for $E(y)$ and predictive bounds for $y$:

MTB > Fitline c3 c2;
SUBC> gfourpack;
SUBC> confidence 95;
SUBC> ci;
SUBC> pi.
Top-left panel: points lie below the line in the top-right corner and above the line in the bottom-left corner — the residual distribution has heavier tails than $N(0, \sigma^2)$.