Class 10 (And some of 9) Line fitting OLS regression Discussion of time series and panel models Project check in

Class 10 (And some of 9)

Line fitting

OLS regression

Discussion of time series and panel models

Project check in

Statistical Significance

• Research and null hypotheses

– A hypothesis states the relationship between two variables.

– The null hypothesis states that there is NO (or only a random) relationship between two variables.

• H: Democracies trade more with each other than with non-democracies.

• H0: Status as a democracy is not related to trade volume.

– You are testing whether you can reject H0, not whether you can accept H.

Types of Error

                            State of Nature
Decision based on Sample    H0 true                       H0 untrue
Reject H0                   Type 1 error (false alarm)    Correct
Do not reject H0            Correct                       Type 2 error

Alpha level

α = .05: if H0 is true, there is a 5% chance of rejecting it anyway (a Type 1 error) and a 95% chance of correctly not rejecting it.
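A small simulation can make the alpha level concrete. The sketch below is an illustration, not part of the original slides: it assumes a simple two-sided z-test on samples drawn from a population in which H0 is true (mean 0, known standard deviation 1), and the sample size, number of trials, and seed are arbitrary choices. With alpha = .05, roughly 5% of the tests should reject H0, and each of those rejections is a Type 1 error.

```python
import math
import random


def type1_error_rate(trials=4000, n=30, z_crit=1.96, seed=1):
    """Share of trials that reject a TRUE null hypothesis (mean = 0).

    Each trial draws n values from a normal population with mean 0 and
    standard deviation 1, so H0 is true by construction and every
    rejection is a false alarm (Type 1 error).
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mean = sum(sample) / n
        z = mean * math.sqrt(n)   # z statistic when sigma is known to be 1
        if abs(z) > z_crit:       # two-sided test at alpha = .05
            rejections += 1
    return rejections / trials


rate = type1_error_rate()
print(f"Share of true nulls rejected: {rate:.3f}")
```

Raising `z_crit` (a stricter alpha) lowers the share of false alarms, at the cost of making Type 2 errors more likely when H0 is actually untrue.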

Causality

• In establishing causality there is a dependent variable, which you are trying to explain, and one or more independent variables that are assumed to be factors in the variation of the dependent variable.

• You need a logical model to “explain” this relationship or causality

Thinking in Models (again)

• What is a model?

– Explains which elements relate to each other and how.

– Describing relationships in a model

• Covariation – the variables move together

– Direct or positive (same direction)

– Inverse or negative (opposite directions)

– Nonlinear

• False or spurious

– Control (confounding) variables

• Are you looking for the best model or testing someone else’s?

Developing models

• Where does a model come from?

– From your own assessment and observation of the problem, or from talking to others.

– From the literature.

• Elements others include or consider important

• Definitions of these elements

• Descriptions of the “expected” relationships among variables

• Results and explanations

• Sources and strategies for data

• Suggestions of models or variations to be tested in the future

Types of Models

1. Schematic

[Diagram: Capital and Labor each point to Econ Growth]

2. Symbolic

a) Economic growth is a function of changes to the amount of capital (K) and changes to the amount of labor (L).

b) G = f(K, L)

The basic linear model (equation)

You can express many relationships as the linear equation:

y = a + bx, where

• y is the dependent variable

• x is the independent variable

• a is a constant

• b is the slope of the line

• For every increase of 1 in x, y changes by an amount equal to b

A perfectly linear relationship is one where each change in x results in exactly the same change in y, e.g. a strict ad valorem tariff.
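As a quick illustration of the equation above, the sketch below evaluates y = a + bx for an ad valorem tariff. The 10% rate and the import values are made-up numbers, and `linear` is a hypothetical helper, not something from the slides.

```python
def linear(x, a, b):
    """Return y = a + b*x for constant a and slope b."""
    return a + b * x


# Hypothetical ad valorem tariff: duty owed = 10% of the import's value,
# so the constant a is 0 and the slope b is 0.10.
RATE = 0.10

for value in (100, 101, 102):
    print(value, round(linear(value, a=0.0, b=RATE), 2))

# Every increase of 1 in x changes y by the same amount, b.
step = linear(101, 0.0, RATE) - linear(100, 0.0, RATE)
```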

Line Fitting

Other relationships may not be so exact.

Weight is only to some degree a function of height.

If you take a sample of actual heights and weights, you might see something like the graph to the right.

[Graph: scatter plot of Weight (vertical axis, 100 to 220) against Height (horizontal axis, 60 to 75)]

Source: http://www.tennessee.gov/tacir/Fiscal%20Capacity/Workshop/Regression%20Analysis%20Handout%20(Methodology%20Part%201).ppt

Line Fitting (cont.)

The line is the “average” relationship described by the equation:

y = a + bx + e

The difference between the line and any individual observation is the error (e).

The observations that contributed to this analysis were all for heights between 5’ and 6’4”. You cannot extrapolate the results to heights outside of those observed. The regression results are only valid for the range of actual observations.

[Graph: Weight (vertical axis, 100 to 220) against Height (horizontal axis, 60 to 75), with the fitted line]

Regression

Regression is the method by which we find the line that best fits the observations, i.e. has the lowest error.

Since the line describes the mean of the effects of the independent variables, by definition, the sum of the actual errors will be zero.

If you add up all of the values of the dependent variable and you add up all the values predicted by the model, the sums are the same: the sum of the negative errors (for points below the line) exactly offsets the sum of the positive errors (for points above the line).

Because the errors always sum to zero, regression must find another way to measure the scale of the error. An Ordinary Least Squares (OLS) regression finds the line that results in the lowest sum of squared errors.
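Both claims can be checked numerically. The sketch below uses the standard closed-form OLS formulas for one independent variable (b = Sxy / Sxx, a = mean of y minus b times mean of x) on made-up data. The function names are hypothetical. It shows that the errors sum to zero (up to floating-point rounding) and that nudging the line in any direction raises the sum of squared errors.

```python
def ols_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def sum_squared_errors(xs, ys, a, b):
    """Sum of squared differences between observed and fitted values."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Made-up observations.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

a, b = ols_line(xs, ys)
errors = [y - (a + b * x) for x, y in zip(xs, ys)]
best = sum_squared_errors(xs, ys, a, b)

print(f"sum of errors: {sum(errors):.10f}")
print(f"sum of squared errors, OLS line: {best:.4f}")

# Any other line has a larger sum of squared errors.
shifts = [(0.5, 0.0), (0.0, 0.1), (-0.5, 0.1), (0.2, -0.05)]
others = [sum_squared_errors(xs, ys, a + da, b + db) for da, db in shifts]
for (da, db), sse in zip(shifts, others):
    print(f"shift a by {da:+.2f}, b by {db:+.2f}: {sse:.4f}")
```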

Multiple Regression

• What if we have multiple factors contributing to a result or a prediction?

– For example, basic economic theory suggests that capital and labor contribute to economic growth.

– Hard to “see” how these two factors contribute to growth.

The multiple regression equation

Each of these factors has a separate relationship with economic growth. The equation that describes a multiple regression relationship is:

y = a + b1L + b2K + e

This equation separates each individual independent variable from the rest, allowing each to have its own coefficient describing its relationship to the dependent variable. If Labor and Capital have the same coefficient, then a one-unit change in either contributes equally to economic growth (the comparison is only meaningful if both are measured in comparable units).
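To show how separate coefficients are estimated, here is a minimal sketch of multiple regression using only the Python standard library. It solves the normal equations (X'X)b = X'y by Gaussian elimination. The data are made up, the function names are hypothetical, and a statistics package would normally do this step for you.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - known) / aug[r][r]
    return solution


def ols(rows, ys):
    """Least-squares coefficients [a, b1, b2, ...] via the normal equations."""
    X = [[1.0] + list(row) for row in rows]   # leading 1 gives the constant a
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(X, ys)) for i in range(k)]
    return solve(xtx, xty)


# Made-up observations: (labor L, capital K) pairs and growth y.
inputs = [(1, 4), (2, 1), (3, 5), (4, 2), (5, 6), (6, 3), (7, 8), (8, 7)]
growth = [3.1, 2.4, 5.2, 4.1, 7.3, 5.8, 9.9, 10.2]

a, b1, b2 = ols(inputs, growth)
errors = [y - (a + b1 * L + b2 * K) for (L, K), y in zip(inputs, growth)]

print(f"y = {a:.3f} + {b1:.3f} L + {b2:.3f} K")
```

As in the one-variable case, the errors sum to zero, and they are also uncorrelated with each independent variable in the sample.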

In a statistics software program you will enter your dependent variable first and then your independent variables.

You will need to make sure the data and the variables conform to the assumptions of the model

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.803316
R Square             0.645316
Adjusted R Square    0.639305
Standard Error       1.788119
Observations         121

ANOVA
              df     SS          MS          F           Significance F
Regression    2      686.446     343.223     107.3455    2.76E-27
Residual      118    377.2895    3.197368
Total         120    1063.735

                     Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept            -15.0184       2.064116         -7.27595   4.12E-11   -19.1059    -10.9309    -19.1059      -10.9309
ln pop2001           0.656415       0.106162         6.183167   9.31E-09   0.446186    0.866643    0.446186      0.866643
ln GDP per capita    1.490166       0.104154         14.30738   1.53E-27   1.283913    1.696418    1.283913      1.696418

How good is the model?

• The R2 value tells you what proportion of the differences is explained by the model. An R2 of .68, for example, means that 68% of the variance in the observed values of the dependent variable is explained by the model, and 32% of those differences remains unexplained in the error term.
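The R Square figure in the summary output can be reproduced from its ANOVA block. The sketch below uses the standard definitions, R2 = SS regression / SS total and adjusted R2 = 1 - (1 - R2)(n - 1)/(n - k - 1). The variable names are my own, and the numbers are copied from the output.

```python
# Sums of squares from the ANOVA block of the summary output.
ss_regression = 686.446   # variation explained by the model
ss_residual = 377.2895    # variation left in the error term
ss_total = 1063.735
n_obs = 121               # Observations
k = 2                     # number of independent variables

r_squared = ss_regression / ss_total
unexplained = ss_residual / ss_total

# Adjusted R2 penalizes the model for each added independent variable.
adjusted_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - k - 1)

print(f"R Square:          {r_squared:.6f}")
print(f"Unexplained share: {unexplained:.6f}")
print(f"Adjusted R Square: {adjusted_r_squared:.6f}")
```

Here the model explains about 65% of the variance and leaves about 35% in the error term.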

Returning to the model of economic growth… Is explaining 50% of the causes good enough?

How much should you explain?

• Random error need not be a problem.

– There is always error; a larger R-square is not a goal in and of itself.

• Some error is due to latent variables that cannot be observed.

– There may be additional variables that can be logically assumed to measure these causes of variation indirectly in some way.

– But even if they empirically appear to “explain” the variation within the regression model, variables should not necessarily be added unless there appears to be a logical way in which they might explain variation in the dependent variable.

Statistical Significance

• Each independent variable has a “p-value” or significance level in the results. Sometimes it is given explicitly; sometimes only the test statistic is given, from which significance can be derived.

• The p-value is a probability. It tells you how likely it is that a coefficient at least this far from zero would emerge by chance if there were no real relationship between that independent variable and the dependent variable. It is tied to the risk of a type I error.

• A p-value of .05 means that, if there were no real relationship, a coefficient this large would emerge randomly only 5% of the time. It does not mean that there is a 95% chance that the relationship is real.

• It is generally accepted practice to consider variables with a p-value of less than .1 (or, more strictly, less than the .05 alpha level) as significant, though the only basis for these cutoffs is convention.
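When only the coefficient and its standard error are reported, the test statistic is their ratio, and a 95% confidence interval is the coefficient plus or minus a critical value times the standard error. The sketch below recomputes the t Stat and Lower/Upper 95% columns of the summary output. The critical value 1.9803 (two-sided 95%, 118 residual degrees of freedom) is an assumption taken from a t table, not from the slides. Turning a t statistic into an exact p-value needs the t distribution, which is not attempted here.

```python
# (coefficient, standard error) pairs copied from the summary output.
estimates = {
    "Intercept": (-15.0184, 2.064116),
    "ln pop2001": (0.656415, 0.106162),
    "ln GDP per capita": (1.490166, 0.104154),
}

# Assumption: two-sided 95% critical value of the t distribution with
# 118 residual degrees of freedom, read from a t table (about 1.98).
T_CRIT = 1.9803

results = {}
for name, (coef, se) in estimates.items():
    t_stat = coef / se                 # test statistic
    lower = coef - T_CRIT * se         # lower bound of the 95% interval
    upper = coef + T_CRIT * se         # upper bound of the 95% interval
    results[name] = (t_stat, lower, upper)
    significant = abs(t_stat) > T_CRIT  # same decision as p-value < .05
    print(f"{name}: t = {t_stat:.3f}, 95% CI = ({lower:.4f}, {upper:.4f}), "
          f"significant at .05: {significant}")
```

A coefficient is significant at the .05 level exactly when its 95% interval excludes zero, which is the case for all three rows.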

Direction and Size

• Look at the signs of the b coefficients. Do they have the expected signs?

– Your model and hypothesis should give you an expectation of the direction of each independent variable’s influence.

• Is the effect large or small?

– Even if it is significant and in the right direction, does a change in the independent variable yield a large or small change in the dependent variable?

F-Test

There is also a significance level for the model as a whole.

• The F-test or “Significance F” value in Excel is the probability of getting an F statistic this large if none of the independent variables were actually related to the dependent variable. As with the p-value, the lower the Significance F value, the stronger the evidence that the model as a whole describes a real relationship.
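The F statistic itself comes straight from the ANOVA block of the summary output: each sum of squares is divided by its degrees of freedom to give a mean square, and F is the ratio of the two mean squares. The sketch below reproduces those figures. The variable names are my own, and converting F into the Significance F probability requires the F distribution, which is left to the statistics software.

```python
# Sums of squares and degrees of freedom from the ANOVA block.
ss_regression, df_regression = 686.446, 2
ss_residual, df_residual = 377.2895, 118

ms_regression = ss_regression / df_regression  # mean square explained by the model
ms_residual = ss_residual / df_residual        # mean square left in the error term

f_stat = ms_regression / ms_residual

print(f"MS regression: {ms_regression:.3f}")
print(f"MS residual:   {ms_residual:.6f}")
print(f"F:             {f_stat:.4f}")
```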

Other Errors or Problems

• Multicollinearity

• Omitted Variables

• Endogeneity

• Other

Presenting Regression Results
