Lecture 3-3
Summarizing relationships among variables
Topics covered in this lecture note
We will cover several topics in ordinary least squares (OLS) estimation.
1. Testing the statistical significance of the estimated coefficient using t-statistics (i.e., testing whether advertisement spending has any effect on revenue).
2. OLS estimation when there is more than one explanatory variable.
3. An introduction to panel data (repeated observations over time).
1. Testing the statistical significance of the estimated coefficient: Example
[Figure: “Advertisement and revenue, Product II”. Scatter plot of revenue (in 1000 yen, vertical axis, 0 to 35000) against advertisement spending (in 1000 yen, horizontal axis, 0 to 120), with the fitted line y = 13.451x + 15440.]
•The graph above shows a relationship between advertisement spending and revenue along with the estimated linear equation.
•The estimated slope coefficient is about 13.45. This means that for every additional 1000 yen spent on advertisement, revenue is estimated to increase by about 13.45 thousand yen.
However, the graph also seems to indicate that there is not much relationship between advertisement spending and revenue.
When we estimate a linear equation, we typically would like to know whether advertisement has any effect on revenue. To answer such a question, just estimating β0 and β1 is not enough. We need more information.
The following slides describe the procedure to answer the following question: “Would the advertisement have any impact on the revenue?”
To test if advertisement spending has any impact on the revenue, we need to test whether the slope coefficient is “significantly” different from zero.
1. If the slope coefficient is significantly different from zero, we may conclude that advertisement spending has some effect on revenue.
2. If the slope coefficient is not significantly different from zero, we conclude that we found no evidence that advertisement spending has an effect on revenue.
Then, what would be the criterion to decide whether the slope coefficient is “significantly” different from zero?
To decide whether the slope coefficient is significantly different from zero, we use the “t-statistic”.
The OLS estimation procedure produces much more than β0 and β1; it also reports the t-statistic. We will now obtain some of this extra information from OLS estimation using Excel.
Open Data set “OLS Exercise 2-Advertisement and Revenue”.
This is the data set used to produce the graph in the previous slides.
Now use Excel's “Data Analysis” tool to estimate the following model:
(Revenue)= β0+β1(Advertisement Spending)
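For readers who want to see what the estimation does, the following is a minimal Python sketch of the same one-variable OLS fit. The data values and the function name `fit_simple_ols` are made up for illustration; they are not the course data set and not what Excel runs internally.

```python
# Hypothetical data: advertisement spending and revenue (both in 1000 yen).
# These numbers are invented for illustration; they are NOT the course data.
ad_spending = [10.0, 20.0, 30.0, 40.0, 50.0]
revenue = [120.0, 135.0, 160.0, 170.0, 190.0]


def fit_simple_ols(x, y):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Slope = sum of cross-deviations / sum of squared deviations of x.
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    # The fitted line passes through the point of means.
    intercept = y_mean - slope * x_mean
    return intercept, slope


beta0, beta1 = fit_simple_ols(ad_spending, revenue)
print(f"(Revenue) = {beta0:.2f} + {beta1:.2f} * (Advertisement Spending)")
```

The slope and intercept returned here play the roles of β1 and β0 in the model above.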
• The table below is the result of the OLS regression.
1. Intercept coefficient (β0) = 15440.18
2. Slope coefficient (β1) = 13.45
3. We have some extra information, such as the standard error and the t-statistic (“t Stat” in the table). These are the pieces of information needed to test whether the slope coefficient is significantly different from zero.
                        Coefficients  Standard Error  t Stat    P-value   Lower 95%  Upper 95%  Lower 95.0%  Upper 95.0%
Intercept               15440.18      2796.811        5.520639  5.87E-05  9478.923   21401.45   9478.923     21401.45
Advertisement Spending  13.45107      60.32826        0.222965  0.826571  -115.136   142.0377   -115.136     142.0377
-Standard Error-
Since data contain a lot of noise (unexpected rises and falls in revenue, etc), the effect of advertisement on revenue (β1) is estimated with some error.
Standard errors show the expected error in the estimation of the coefficients.
For example, the standard error for the slope coefficient is 60.3. Roughly speaking, this means that the estimate of the slope coefficient (β1) is typically off by about ±60.3.
Thus, the smaller the standard error for β1 is, the more precise the estimate of the impact of advertisement is.
Coefficien
tsStandard
Errort Stat
P-value
Lower 95%
Upper 95%
Lower 95.0%
Upper 95.0%
Intercept15440.1
82796.81
1
5.520639
5.87E-05
9478.923
21401.45
9478.923
21401.45
Advertisement Spending
13.45107
60.32826
0.222965
0.826571
-115.13
6
142.0377
-115.13
6
142.0377
-t statistic-
•The t-statistic is obtained by dividing the coefficient by its standard error. For example, the t-statistic for the slope coefficient is
13.45107/60.32826=0.222965
•Our confidence that advertisement spending has some impact on revenue increases as the absolute value of the t-statistic increases (this happens when the standard error decreases or the coefficient moves further from zero).
•We use the t-statistic to test whether the slope coefficient is significantly different from zero.
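The division described above can be checked with a few lines of Python. This is only a sketch: the coefficient and standard error values are copied from the Excel regression output for the advertisement example, and the function name `t_statistic` is an illustrative choice.

```python
def t_statistic(coefficient, standard_error):
    """t-statistic for testing whether a coefficient differs from zero."""
    return coefficient / standard_error


# Values taken from the regression output for the advertisement example.
t_slope = t_statistic(13.45107, 60.32826)      # Advertisement Spending
t_intercept = t_statistic(15440.18, 2796.811)  # Intercept

print(f"slope t-stat:     {t_slope:.6f}")
print(f"intercept t-stat: {t_intercept:.6f}")
```

Both results agree with the “t Stat” column that Excel reports, up to rounding of the inputs.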
The procedure to test the statistical significance of the estimated coefficient
The following is the procedure to test if a coefficient is significantly different from zero.
1. Obtain the t-statistic.
2. Check whether the absolute value of the t-statistic is greater than or equal to 2 (that is, t-stat≤‒2 or t-stat≥+2).
3. If the absolute value of the t-statistic is greater than (or equal to) 2, the coefficient is statistically significantly different from zero.
4. If the absolute value of the t-statistic is smaller than 2, then the coefficient is not statistically significantly different from zero.
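The four steps above can be written as a small Python function. This is a sketch of the rule of thumb only; the function name and the default criterion argument are illustrative, and the t-statistics are those reported for the advertisement example.

```python
def is_statistically_significant(t_stat, criterion=2.0):
    """Rule of thumb: significant if |t| is at least the criterion value."""
    return abs(t_stat) >= criterion


# t-statistics from the advertisement regression output.
results = {
    "Intercept": 5.520639,
    "Advertisement Spending": 0.222965,
}

for name, t in results.items():
    verdict = "significant" if is_statistically_significant(t) else "not significant"
    print(f"{name}: t = {t:.3f} -> {verdict}")
```

Taking the absolute value means that a large negative t-statistic counts as significant in the same way as a large positive one.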
A note on the test of statistical significance of the estimated coefficient 1
When the coefficient is statistically significantly different from zero, we simply say “the coefficient is statistically significant”.
1. If the coefficient is statistically significant, we conclude that advertisement spending has some impact on revenue.
2. If the coefficient is not statistically significant, we conclude that we found no evidence that advertisement spending has an impact on revenue.
A note on the test of statistical significance of the estimated coefficient 2 (Optional)
The criterion value for the t-statistic that we used for testing statistical significance was 2. More precisely, this criterion value depends on the number of observations and the number of parameters to be estimated. This topic will be discussed in more detail later in the class. When you use the criterion value of 2, roughly speaking, you are testing the statistical significance of the slope coefficient at the 5% significance level.
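One way to see that 2 is only an approximation is to back out the criterion value that Excel itself used for the advertisement regression. The sketch below assumes the usual construction of a 95% interval, coefficient ± (criterion value) × (standard error); the numbers are the slope's row of the regression output.

```python
# Slope row of the advertisement regression output.
coef = 13.45107
se = 60.32826
lower_95 = -115.136
upper_95 = 142.0377

# Assuming interval = coef +/- criterion * se, solve for the criterion.
criterion_from_upper = (upper_95 - coef) / se
criterion_from_lower = (coef - lower_95) / se

print(f"implied criterion value: {criterion_from_upper:.3f} / {criterion_from_lower:.3f}")
```

Both calculations give a value slightly above 2, which is what one would expect for a data set with a small number of observations.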
Exercise
Exercise 1: Open the data set “Statistical Significance Exercise”. Use the Product A data to estimate the effect of promotion on revenue by estimating the following model. Pay particular attention to the statistical significance of the slope coefficient.
(Revenue)=β0+β1(Number of promotion)
Exercise 2: Use the data set “Statistical Significance Exercise”. Use the Product C data to estimate the same model.
Exercise 1 Answer
The estimated effect of promotion on revenue is 99060.15, with a t-statistic equal to 5.07. Since the t-statistic is greater than 2, we conclude that the effect of promotion on revenue is statistically significant. Given the statistical significance of the coefficient, the estimated slope coefficient of 99060 indicates that, if we increase the number of promotions by one, revenue is likely to increase by 99060 yen.
Product A
                      Coefficients  Standard Error  t Stat    P-value   Lower 95%  Upper 95%  Lower 95.0%  Upper 95.0%
Intercept             105827.1      254311.3        0.416132  0.689775  -495524    707177.8   -495524      707177.8
Number of promotions  99060.15      19523.94        5.073779  0.001441  52893.37   145226.9   52893.37     145226.9
Exercise 2 Answer
The estimated effect of promotion on revenue is -11751.1, with a t-statistic equal to -1.3. Since the absolute value of the t-statistic is smaller than 2, we conclude that the slope coefficient is not statistically significant. In other words, we did not find evidence that promotion has any impact on the revenue from Product C.
Product C
                      Coefficients  Standard Error  t Stat    P-value   Lower 95%  Upper 95%  Lower 95.0%  Upper 95.0%
Intercept             341540.1      111203.4        3.07131   0.018034  78585.82   604494.4   78585.82     604494.4
Number of promotions  -11751.1      8970.74         -1.30993  0.231567  -32963.5   9461.373   -32963.5     9461.373
2. OLS with multiple explanatory variables
Introduction
So far, we have considered a model with only one explanatory variable.
Y=β0+β1X
Often, we have more than one explanatory variable. For example, in addition to promotion, the company may increase the number of sales persons. If we have data about the number of sales persons, we can also incorporate such a variable.
OLS with multiple regressors
-Example: Returns on Education-
Suppose you are considering pursuing more education (going to graduate school, etc.). You may then want to know whether it is worth the effort.
To investigate by how much the extra education increases your future salary we can utilize OLS regression.
Open the data set “Returns on education”. It contains three variables collected for 935 persons: weekly wage in dollars, number of years of education, and number of years of work experience.
As an exercise, find the mean, variance and standard deviation for the three variables.
To investigate the effect of education on wage, we may estimate the OLS regression: (wage)=β0+β1(education).
However, wage is affected not only by education, but also the number of years of work experience. Therefore, it seems better to incorporate “work experience” in the model.
The simplest way to incorporate experience in the model is the following:
(wage)=β0+β1(education)+β2(experience)
Notice that this OLS equation has two explanatory variables on the right-hand side.
Excel estimates coefficients β0, β1 and β2 automatically
(wage)=β0+β1(education)+β2(experience)
The estimated β1 is the effect of education on wage, holding experience constant. This is the big advantage of OLS with multiple explanatory variables. In the data, education and experience vary at the same time, so it is difficult to see the effect of education separately from the effect of experience just by looking at the data. By including both variables, we can separate the effect of experience from the effect of education.
Exercise: Estimate the model above using Excel.
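As a rough illustration of what the software computes, the sketch below fits a two-regressor model in plain Python by solving the normal equations. Everything here is an assumption made for illustration: the five observations are invented (wages are generated exactly from 100 + 50·education + 10·experience so the expected answer is known), and the helper names are hypothetical. It is not the course data and not Excel's internal routine.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows, y):
    """rows: regressor values per observation (no constant). Returns [b0, b1, ...]."""
    design = [[1.0] + [float(v) for v in r] for r in rows]  # add the intercept column
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Invented observations: (years of education, years of experience).
people = [(12, 5), (16, 2), (14, 10), (18, 1), (12, 12)]
# Wages generated exactly from known coefficients, so OLS should recover them.
wage = [100 + 50 * educ + 10 * exper for educ, exper in people]

b0, b1, b2 = fit_ols(people, wage)
print(f"(wage) = {b0:.2f} + {b1:.2f}(education) + {b2:.2f}(experience)")
```

Because education and experience move separately in the invented data, the fit can attribute the wage differences to each variable individually, which is the point made above about holding experience constant.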
•The estimates are β0=-272.5, β1=76.2 and β2=17.6.
•The t-statistic for β1 is 12.1, which is greater than 2. Therefore, the estimated β1 is statistically significant: education does have an impact on wage.
•Given the statistical significance of β1, we can say that, holding experience constant, one additional year of education would increase the weekly wage by $76.2.
•This also means that if you go to graduate school for 2 years, your annual salary would increase by $76.2*(52 weeks)*(2 years)=$7924.8
                            Coefficients  Standard Error  t Stat    P-value      Lower 95%  Upper 95%  Lower 95.0%  Upper 95.0%
Intercept                   -272.528      107.2627094     -2.54075  0.01122266   -483.032   -62.0234   -483.032     -62.0234
education (in years)        76.21639      6.296603998     12.10436  1.98778E-31  63.85922   88.57355   63.85922     88.57355
work experience (in years)  17.63777      3.1617754       5.578439  3.18016E-08  11.43275   23.84279   11.43275     23.84279
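The interpretation of the estimates can be checked by plugging them into the estimated equation. The sketch below uses the coefficients from the table; the function name and the example person (16 years of education, 5 years of experience) are hypothetical choices for illustration.

```python
# Estimated coefficients from the regression output.
B0 = -272.528   # intercept
B1 = 76.21639   # education (in years)
B2 = 17.63777   # work experience (in years)


def predicted_weekly_wage(education_years, experience_years):
    """Predicted weekly wage in dollars from the estimated equation."""
    return B0 + B1 * education_years + B2 * experience_years


# Hypothetical person: compare 16 vs 18 years of education, experience held at 5.
base = predicted_weekly_wage(16, 5)
with_grad_school = predicted_weekly_wage(18, 5)

weekly_gain = with_grad_school - base   # equals 2 * B1
annual_gain = weekly_gain * 52

print(f"weekly gain: ${weekly_gain:.2f}, annual gain: ${annual_gain:.2f}")
```

With the unrounded coefficient the annual gain comes out at roughly $7,926, slightly above the $7,924.8 obtained with the rounded value 76.2. The gain does not depend on the chosen experience level, because experience is held constant in the comparison.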
Exercise 2
Open the data set “Returns on education 2”.
This is the same data set as “Returns on education 1”, except that it has more variables. This data set contains information about the age of the person, and IQ test score of the person.
Exercise: Add IQ to the model. Does this change the results?
OLS with multiple variables: Application
-Making a model more flexible-
When you specify a model for OLS estimation, the first criterion is simplicity.
(Revenue)=β0+β1(Promotion)
Such a simple equation gives a clear idea of the effect of promotion on revenue.
However, simplicity comes with a cost: It is often not flexible.
The model implicitly assumes that the effect of increasing the number of promotions by one does not depend on how many promotions are already being run. That is, the model assumes that the effect of increasing the number of promotions from 10 to 11 is the same as the effect of increasing it from 40 to 41.
However, it is reasonable to think that the effect of promotion would diminish, due to the law of diminishing marginal returns.
See the next example.
-Making a model more flexible: An example-
Open the data set “Making a model more flexible”. These data show the relationship between the number of promotions and revenue for product D.
Plot the relationship between the number of promotions and revenue, then describe the relationship.
[Figure: “Product D: Number of promotions and revenue”. Scatter plot of revenue in yen (vertical axis, 0 to 3,500,000) against number of promotions (horizontal axis, 0 to 45).]
•The relationship seems to be a curve, not a straight line.
•The effectiveness of promotion seems to diminish as the number of promotions increases.
•How do we incorporate the “diminishing effectiveness” of promotion in the model?
To incorporate the “diminishing effectiveness” in the model, we need to specify a model that can “curve”.
A simple way to achieve this is to estimate the following model:
(Revenue)=β0+β1(Number of promotion)+β2(Number of promotion)^2
-Making a model more flexible: Exercise-
Use the data “Making a model more flexible” and estimate the following model:
(Revenue)=β0+β1(Number of promotion)+β2(Number of promotion)^2
Exercise: Answer
•The estimated equation is
(Revenue)=-295299.7+181554.72(Number of promotion)‒2629.838(Number of promotion)^2
•Note that both β1 and β2 are statistically significant.
                         Coefficients  Standard Error  t Stat       P-value   Lower 95%  Upper 95%  Lower 95.0%  Upper 95.0%
Intercept                -295299.7     166846.4598     -1.76988887  0.093683  -645831    55231.71   -645831      55231.71
Number of promotions     181554.72     16497.65368     11.00488133  2.01E-09  146894.4   216215     146894.4     216215
(Number of promotion)^2  -2629.838     359.2650349     -7.3200508   8.48E-07  -3384.63   -1875.05   -3384.63     -1875.05
More exercises
Exercise 1: Using the estimated equation compute “predicted” revenue for each observation.
Exercise 2: Now plot the predicted revenue and the number of promotions. Also plot the actual revenue and promotions, on the same graph. See how well the model predicts the outcome.
Exercise 3: Using the estimated results, compute the expected increase in revenue when you increase the number of promotions from 10 to 11, and from 25 to 26.
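One possible way to carry out these computations outside Excel is sketched below in Python. The coefficients are taken from the regression output for the quadratic model; the function names are illustrative.

```python
# Estimated coefficients of the quadratic model (from the regression output).
B0 = -295299.7   # intercept
B1 = 181554.72   # Number of promotions
B2 = -2629.838   # (Number of promotion)^2


def predicted_revenue(promotions):
    """Predicted revenue in yen for a given number of promotions."""
    return B0 + B1 * promotions + B2 * promotions ** 2


def increase_from_one_more(promotions):
    """Expected change in revenue when adding one promotion."""
    return predicted_revenue(promotions + 1) - predicted_revenue(promotions)


gain_10_to_11 = increase_from_one_more(10)
gain_25_to_26 = increase_from_one_more(25)

print(f"10 -> 11 promotions: {gain_10_to_11:,.0f} yen")
print(f"25 -> 26 promotions: {gain_25_to_26:,.0f} yen")
```

The expected gain is about 126,328 yen when going from 10 to 11 promotions but only about 47,433 yen when going from 25 to 26, which is the diminishing effectiveness that the squared term captures. Applying `predicted_revenue` to every observed number of promotions gives the predicted series asked for in Exercise 1.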
OLS with multiple variables: Application 2
-Dummy Variables-
Often, our data contain qualitative variables. For example, if you have data about your clients, for each client you may have data about whether the person is male or female. Such data (about gender) is not a quantitative variable but a qualitative variable.
However, such a qualitative variable is also important in analyzing data. For example, you would like to answer the following question: “which gender consumes more?”
To incorporate such a qualitative variable into the OLS equation, we first convert qualitative information into a quantitative variable called a “dummy variable”.
A dummy variable is a variable that takes the value 1 if a particular criterion is satisfied, and 0 otherwise.
If you would like to incorporate gender information in your model, create the following dummy variable:
Female =1 if the client is female
       =0 if the client is male
Then you can estimate
(Consumer spending)=β0+β1(Number of promotion)+β2(Female)
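Creating such a variable amounts to recoding a text column as 0/1. A minimal Python sketch follows; the list of clients is invented for illustration.

```python
# Invented qualitative data: the recorded gender of each client.
client_gender = ["female", "male", "female", "male", "male"]

# Dummy variable: 1 if the client is female, 0 if the client is male.
female = [1 if gender == "female" else 0 for gender in client_gender]

print(female)
```

The resulting column of zeros and ones can then be used as the (Female) regressor alongside the quantitative variables.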
A dummy variable is very versatile. Suppose you would like to know whether there are wage differentials between races (for example, between white and black workers). Then you can use a dummy variable that takes the value 1 if the person is black, and 0 otherwise.
A dummy variable can be created for many other occasions. The use of a dummy variable is one of the most important techniques in regression analysis.
Dummy variable exercise
Open the data set “Dummy variable Exercise”. This data set contains four dummy variables.
Black =1 if the person is black, =0 otherwise
Married =1 if the person is married, =0 otherwise
South =1 if the person lives in the South of the USA, =0 otherwise
Urban =1 if the person lives in an urban area, =0 otherwise
Exercise 1: Estimate the following model:
(Wage)=β0+β1(Education)+β2(Experience)+β3(Age)+β4(IQ)+β5(Black)
Then interpret the results.
Dummy variable exercise: Answer
The coefficient for the dummy variable for a black person is -124.6. The t-statistic is -3.19; its absolute value is greater than 2. Therefore, the coefficient is statistically significant. The results indicate that, holding education, experience, age, and IQ constant, the weekly wage is lower for a black person by $124.6. There seems to be a large wage gap between black and white workers.
            Coefficients  Standard Error  t Stat    P-value   Lower 95%  Upper 95%  Lower 95.0%  Upper 95.0%
Intercept   -726.121      165.4365        -4.38912  1.27E-05  -1050.79   -401.448   -1050.79     -401.448
education   52.70889      7.266112        7.25407   8.52E-13  38.44899   66.96879   38.44899     66.96879
experience  11.27217      3.699921        3.046597  0.00238   4.010995   18.53334   4.010995     18.53334
age         13.38011      4.646612        2.879541  0.004074  4.261035   22.49918   4.261035     22.49918
IQ          4.119113      0.997874        4.127889  3.99E-05  2.160765   6.077462   2.160765     6.077462
black       -124.653      39.04528        -3.19253  0.001458  -201.28    -48.0259   -201.28      -48.0259
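To see why the dummy coefficient is read as a gap “holding the other variables constant”, the estimated equation can be evaluated twice for the same characteristics, changing only the dummy. The sketch below uses the coefficients from the table; the example characteristics (14 years of education, 8 of experience, age 30, IQ 100) are hypothetical.

```python
# Estimated coefficients from the regression output.
COEF = {
    "intercept": -726.121,
    "education": 52.70889,
    "experience": 11.27217,
    "age": 13.38011,
    "iq": 4.119113,
    "black": -124.653,
}


def predicted_weekly_wage(education, experience, age, iq, black):
    """Predicted weekly wage in dollars; black is the 0/1 dummy variable."""
    return (COEF["intercept"]
            + COEF["education"] * education
            + COEF["experience"] * experience
            + COEF["age"] * age
            + COEF["iq"] * iq
            + COEF["black"] * black)


# Same hypothetical characteristics, only the dummy differs.
wage_dummy_0 = predicted_weekly_wage(14, 8, 30, 100, black=0)
wage_dummy_1 = predicted_weekly_wage(14, 8, 30, 100, black=1)
gap = wage_dummy_1 - wage_dummy_0

print(f"predicted gap: {gap:.3f} dollars per week")
```

Whatever values are chosen for the other variables, the difference between the two predictions equals the dummy coefficient, because every other term cancels.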
Dummy variable: More exercises
Use data “Dummy Variable Exercise”. Specify your own model, estimate, and interpret the results.