
Page 1:

Part 10: Prediction

Econometrics I
Professor William Greene

Stern School of Business

Department of Economics

Page 2:

Econometrics I

Part 10 - Prediction

Page 3:

Forecasting

Objective: Forecast.
Distinction: ex post vs. ex ante forecasting.
  Ex post: RHS data are observed.
  Ex ante: RHS data must be forecasted.
Prediction vs. model validation: within-sample prediction vs. a "hold out sample."

Page 4:

Prediction Intervals

Given x0, predict y0. Two cases:
  Estimate E[y|x0] = β'x0
  Predict y0 = β'x0 + ε0

The obvious predictor is b'x0 + an estimate of ε0. Forecast ε0 as 0, but allow for its variance.

Alternative view: when we predict y0 with b'x0, what is the "forecast error?" It is b'x0 - β'x0 - ε0, so the variance of the forecast error is

  x0'Var[b - β]x0 + σ²

How do we estimate this? Form a confidence interval. Two cases:
  If x0 is a vector of constants, the variance is just x0'Var[b]x0 + σ². Form the confidence interval as usual.
  If x0 had to be estimated, then x0 is a random variable. What is the variance of the product? (Ouch!) One possibility: use bootstrapping.
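As a concrete illustration of the first case (x0 a vector of constants), here is a minimal Python sketch using simulated data; the design, coefficients, and x0 below are invented for illustration, not the slides' data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated sample (illustrative only -- not the slides' data)
N, K = 100, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
beta = np.array([1.0, 0.5, -0.25])
y = X @ beta + rng.normal(scale=2.0, size=N)

# OLS: b = (X'X)^{-1} X'y
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ (X.T @ y)
e = y - X @ b
s2 = e @ e / (N - K)                       # estimate of sigma^2

# Predict y0 at a fixed x0
x0 = np.array([1.0, 0.3, -0.8])
y0_hat = x0 @ b

# Estimated forecast error variance: s^2 * (1 + x0'(X'X)^{-1} x0)
var_forecast = s2 * (1.0 + x0 @ XtX_inv @ x0)
se_forecast = np.sqrt(var_forecast)

# 95% prediction interval, "as usual"
lo, hi = y0_hat - 1.96 * se_forecast, y0_hat + 1.96 * se_forecast
print(y0_hat, lo, hi)
```

The extra "1 +" inside the variance, relative to the variance of the mean prediction, is the contribution of ε0.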

Page 5:

Forecast Variance

The variance of the forecast error is

  σ² + x0'Var[b]x0 = σ² + σ²[x0'(X'X)⁻¹x0]

If the model contains a constant term, this is

  Var[e0] = σ²[ 1 + 1/N + Σj Σk (xj0 - x̄j)(xk0 - x̄k)(Z'M0Z)^jk ]

in terms of squares and cross products of deviations from means, where Z holds the K - 1 nonconstant regressors and (Z'M0Z)^jk is the jk-th element of the inverse of the deviations-from-means cross-product matrix. Interpretation: the forecast variance is smallest in the middle of our "experience" and increases as we move outside it.
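The equivalence of the matrix form and the deviations-from-means form can be verified numerically; the sketch below uses simulated data (all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 50
Z = rng.normal(size=(N, 2))               # nonconstant regressors
X = np.column_stack([np.ones(N), Z])      # model with a constant term

x0 = np.array([1.0, 0.4, -1.2])
z0 = x0[1:]

# Matrix form: x0'(X'X)^{-1} x0
direct = x0 @ np.linalg.inv(X.T @ X) @ x0

# Deviations form: 1/N + (z0 - zbar)'(Z'M0Z)^{-1}(z0 - zbar)
zbar = Z.mean(axis=0)
Zd = Z - zbar                             # deviations from means
S_inv = np.linalg.inv(Zd.T @ Zd)          # inverse of Z'M0Z
d = z0 - zbar
devform = 1.0 / N + d @ S_inv @ d

print(direct, devform)                    # the two forms agree
```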

Page 6:

Butterfly Effect

Page 7:

Internet Buzz Data

Page 8:

A Prediction Interval

Prediction includes a range of uncertainty.

Point estimate: ŷ = a + bx*

The range of uncertainty around the prediction:

  a + bx* ± 1.96 se √[ 1 + 1/N + (x* - x̄)² / Σi (xi - x̄)² ]

where se² = Σi ei² / (N - 2). The 1.96 gives the usual 95% coverage; the "1" under the square root is due to ε; the remaining terms are due to estimating α and β with a and b.

Page 9:

Slightly Simpler Formula for Prediction

Prediction includes a range of uncertainty.

Point estimate: ŷ = a + bx*

The range of uncertainty around the prediction:

  a + bx* ± 1.96 √[ se²(1 + 1/N) + (x* - x̄)² SE(b)² ]

Page 10:

Prediction from Internet Buzz Regression

Mean(Buzz) = 0.48242

Max(Buzz) = 0.79

Page 11:

Prediction Interval for Buzz = .8

Predict Box Office for Buzz = .8:

  a + bx* = -14.36 + 72.72(.8) = 43.82

Forecast standard error:

  √[ se²(1 + 1/N) + (x* - x̄)² SE(b)² ]
    = √[ 13.3863²(1 + 1/62) + (.8 - .48242)²(10.94)² ]
    = 13.93

Interval = 43.82 ± 1.96(13.93) = 16.52 to 71.12
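A short script using the slide's reported figures reproduces these numbers; small differences in the last digit come from rounding at intermediate steps:

```python
import math

# Figures reported on the slide (N = 62 observations)
a, b = -14.36, 72.72
s_e, se_b = 13.3863, 10.94
xbar, xstar, N = 0.48242, 0.8, 62

point = a + b * xstar                                  # point estimate
se_f = math.sqrt(s_e**2 * (1 + 1/N)
                 + (xstar - xbar)**2 * se_b**2)        # forecast std. error

lo = point - 1.96 * se_f
hi = point + 1.96 * se_f
print(round(point, 2), round(se_f, 2), round(lo, 2), round(hi, 2))
```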

Page 12:

Dummy Variable for One Observation

A dummy variable that isolates a single observation. What does this do? Define d to be the dummy variable in question and Z = all the other regressors, so X = [Z, d]. Run the multiple regression of y on X. We know that X'e = 0, where e is the column vector of residuals. That implies d'e = 0, which says that ej = 0 for that particular observation: it will be predicted perfectly.

This is a fairly important result, and one worth knowing.
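The result is easy to verify numerically; a minimal sketch with simulated data (all names illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 30
Z = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
y = Z @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=N)

# Dummy variable that isolates observation j
j = 7
d = np.zeros(N)
d[j] = 1.0
X = np.column_stack([Z, d])

# Regress y on X = [Z, d]; the normal equations force d'e = 0, i.e. e[j] = 0
b = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b

print(e[j])   # observation j is predicted perfectly (residual ~ 0)
```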

Page 13:

Oaxaca Decomposition

Two groups, two regression models (two time periods, men vs. women, two countries, etc.):

  y1 = X1β1 + ε1 and y2 = X2β2 + ε2

Consider mean values:

  y1* = E[y1|mean x1] = x̄1'β1
  y2* = E[y2|mean x2] = x̄2'β2

Now, explain why y1* is different from y2*. (I.e., departing from y2, why is y1 different? We could also reverse the roles of 1 and 2.)

  y1* - y2* = x̄1'β1 - x̄2'β2
            = x̄1'(β1 - β2) + (x̄1 - x̄2)'β2
              (change in model)  (change in conditions)

Page 14:

The Oaxaca Decomposition

Two groups (e.g., men = 1, women = 2). Regression predictions (e.g., wage equations):

  ŷ1 = x̄1'b1,  ŷ2 = x̄2'b2

Explain ŷ1 - ŷ2:

  ŷ1 - ŷ2 = x̄1'(b1 - b2) + (x̄1 - x̄2)'b2
          = discrimination + qualifications

  Var[x̄1'(b1 - b2)] = x̄1'{ s1²(X1'X1)⁻¹ + s2²(X2'X2)⁻¹ }x̄1

  Wald: W = (x̄1'(b1 - b2))² / x̄1'{ s1²(X1'X1)⁻¹ + s2²(X2'X2)⁻¹ }x̄1

What is the hypothesis?

Page 15:

Application - Income

German Health Care Usage Data: 7,293 individuals, varying numbers of periods. Data downloaded from the Journal of Applied Econometrics Archive. This is an unbalanced panel with 7,293 individuals and 27,326 observations in total; the number of observations per individual ranges from 1 to 7 (frequencies: 1=1525, 2=2158, 3=825, 4=926, 5=1051, 6=1000, 7=987). The data can be used for regression, count models, binary choice, ordered choice, and bivariate binary choice.

Variables used in this application:
  HHNINC  = household nominal monthly net income in German marks / 10000 (4 observations with income = 0 were dropped)
  HHKIDS  = 1 if children under age 16 in the household; 0 otherwise
  EDUC    = years of schooling
  AGE     = age in years
  MARRIED = 1 if married, 0 if not
  FEMALE  = 1 if female, 0 if male

Page 16:

Regression: Female=0 (Men)

Page 17:

Regression: Female=1 (Women)

Page 18:

Pooled Regression

Page 19:

Application

namelist ; X = one,age,educ,married,hhkids$
? Get results for females
include ; new ; female=1$                  Subsample females
regr    ; lhs=hhninc ; rhs=x$              Regression
matrix  ; bf=b ; vf=varb ; xbarf=mean(x)$  Coefficients, variance, mean X
calc    ; meanincf = bf'xbarf $            Mean prediction for females
? Get results for males
include ; new ; female=0$                  Subsample males
regr    ; lhs=hhninc ; rhs=x$              Regression
matrix  ; bm=b ; vm=varb ; xbarm=mean(x)$  Coefficients, etc.
calc    ; meanincm = bm'xbarm $            Mean prediction for males
? Examine difference in mean predicted income
calc    ; list ; meanincm ; meanincf       Display means
        ; diff = xbarm'bm - xbarf'bf $     Difference in means
matrix  ; vdiff = xbarm'[vm]xbarm + xbarf'[vf]xbarf $  Variance of difference
calc    ; list ; diffwald = diff^2 / vdiff $           Wald test of difference = 0
? "Discrimination" component of difference
matrix  ; db = bm-bf ; discrim = xbarm'db              Difference in coeffs., discrimination
        ; vdb = vm+vf ; vdiscrim = xbarm'[vdb]xbarm $  Variance of discrimination
calc    ; list ; discrim ; dwald = discrim^2 / vdiscrim $  Wald test that D = 0
? "Difference due to difference in X"
matrix  ; dx = xbarm - xbarf $                         Difference in characteristics
matrix  ; qual = dx'bf ; vqual = dx'[vf]dx $           Contribution to total difference
calc    ; list ; qual ; qualwald = qual^2/vqual $      Wald test
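For readers without the software used above, here is a Python sketch of the same computations on synthetic stand-in data; the group designs, coefficients, and sample sizes below are invented for illustration and are not the German health care data:

```python
import numpy as np

def ols(X, y):
    """OLS coefficients b and estimated Var[b] = s^2 (X'X)^{-1}."""
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ (X.T @ y)
    e = y - X @ b
    s2 = e @ e / (X.shape[0] - X.shape[1])
    return b, s2 * XtX_inv

rng = np.random.default_rng(3)

def make_group(n, beta):
    # Synthetic subsample (illustrative only)
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
    y = X @ beta + 0.1 * rng.normal(size=n)
    return X, y

Xm, ym = make_group(200, np.array([0.36, 0.05, 0.02]))   # "men"
Xf, yf = make_group(180, np.array([0.34, 0.04, 0.02]))   # "women"

bm, vm = ols(Xm, ym)
bf, vf = ols(Xf, yf)
xbarm, xbarf = Xm.mean(axis=0), Xf.mean(axis=0)

# Total difference in mean predictions, with Wald statistic
diff = xbarm @ bm - xbarf @ bf
vdiff = xbarm @ vm @ xbarm + xbarf @ vf @ xbarf
wald_diff = diff**2 / vdiff

# "Discrimination" component: xbarm'(bm - bf)
db = bm - bf
discrim = xbarm @ db
vdiscrim = xbarm @ (vm + vf) @ xbarm
wald_discrim = discrim**2 / vdiscrim

# "Qualifications" component: (xbarm - xbarf)'bf
dx = xbarm - xbarf
qual = dx @ bf
vqual = dx @ vf @ dx
wald_qual = qual**2 / vqual

print(diff, discrim, qual)
```

By construction, the two components sum exactly to the total difference.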

Page 20:

Results

+------------------------------------+
| Listed Calculator Results          |
+------------------------------------+
MEANINCM =    .359054
MEANINCF =    .344495
DIFF     =    .014559
DIFFWALD =  52.006502
DISCRIM  =   -.005693
DWALD    =   7.268757
QUAL     =    .020252
QUALWALD = 1071.053640
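A quick check on these results confirms the decomposition identity: the "discrimination" and "qualifications" components sum to the total difference in mean predicted income.

```python
# Figures from the results listing above
meanincm, meanincf = 0.359054, 0.344495
diff = 0.014559
discrim = -0.005693
qual = 0.020252

# MEANINCM - MEANINCF = DIFF, and DISCRIM + QUAL = DIFF
print(round(meanincm - meanincf, 6), round(discrim + qual, 6))
```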

Page 21:

Decompositions