Single Variable Regression. Farrokh Alemi, Ph.D. Kashif Haqqi, M.D.


Single Variable Regression

Farrokh Alemi, Ph.D.

Kashif Haqqi M.D.


Additional Reading

• For additional reading see Chapter 15 and Chapter 14 in Michael R. Middleton’s Data Analysis Using Excel, Duxbury Thomson Publishers, 2000.

• The example described in this lecture is based in part on Chapter 17 and Chapter 18 of Keller and Warrack’s Statistics for Management and Economics, Fifth Edition, Duxbury Thomson Learning Publisher, 2000.

• Read any introductory statistics book about single and multiple variable regression.


Which Approach Is Appropriate When?

• Choosing the right method for the data is the key statistical expertise that you need to have.

• You might want to review a decision tool that we have organized for you to help you in choosing the right statistical method.


Do I Need to Know the Formulas?

• You do not need to know exact formulas.

• You do need to know where they are in your reference book.

• You do need to understand the concept behind them and the general statistical concepts imbedded in the use of the formulas.

• You do not need to be able to do correlation and regression by hand. You must be able to do it on a computer using Excel or other software.


Table of Content

• Objectives

• Purpose of Regression

• Correlation or Regression?

• First Order Linear Model

• Probabilistic Linear Relationship

• Estimating Regression Parameters

• Assumptions

• Sum of squares

• Tests

• Percent of variation explained

• Example

• Regression Analysis in Excel

• Normal Probability Plot

• Residual Plot

• Goodness of Fit

• ANOVA For Regression


Objectives

• To learn the assumptions behind and the interpretation of single and multiple variable regression.

• To use Excel to calculate regressions and test hypotheses.


Purpose of Regression

• To determine whether values of one or more variables are related to the response variable.

• To predict the value of one variable based on the value of one or more variables.

• To test hypotheses.


Correlation or Regression?

• Use correlation if you are interested only in whether a relationship exists.

• Use regression if you are interested in building a mathematical model that can predict the response variable.

• Use regression if you are interested in the relative effectiveness of several variables in predicting the response variable.


First Order Linear Model

• A deterministic mathematical model between y and x:

y = β0 + β1 * x

β0 is the intercept with the y axis, the value of y at the point where x = 0

β1 is the slope of the line, the ratio of the rise divided by the run in the figure. It measures the change in y for one unit of change in x.

[Figure: a straight line relating y to x, with the rise and the run marked.]


Probabilistic Linear Relationship

• But the relationship between x and y is not always exact. Observations do not always fall on a straight line.

• To accommodate this, we introduce a random error term referred to as epsilon: y = β0 + β1 * x + ε

• The task of regression analysis then is to estimate the parameters b0 and b1 in the equation:

ŷ = b0 + b1 * x

so that the difference between y and ŷ is minimized


Estimating Regression Parameters

• Red dots show the observations

• The solid line shows the estimated regression line

• The vertical distance between each observation and the solid line is called a residual

• Minimize the sum of the squared residuals (differences between line and observations).

[Figure: scatter plot of Y (20 to 50) against X (1 to 5) showing the observations, the regression line, and a residual.]
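The idea of minimizing the sum of squared residuals can be sketched in a few lines of Python. This is only an illustration: the data are hypothetical, and the names `fit_line` and `sum_squared_residuals` are made up for this sketch. It uses the standard closed-form least-squares solution for one predictor.

```python
def fit_line(x, y):
    """Return (b0, b1) that minimize the sum of squared residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Closed-form least-squares solution for a single predictor
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def sum_squared_residuals(x, y, b0, b1):
    """Sum of squared vertical distances between observations and the line."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical observations, not taken from the lecture
x = [1, 2, 3, 4, 5]
y = [24, 31, 33, 42, 45]

b0, b1 = fit_line(x, y)
best = sum_squared_residuals(x, y, b0, b1)

# Moving the line away from the fitted one makes the sum larger
shifted = sum_squared_residuals(x, y, b0 + 1, b1)
tilted = sum_squared_residuals(x, y, b0, b1 + 0.5)

print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, sum of squared residuals = {best:.2f}")
print(f"shifted line: {shifted:.2f}, tilted line: {tilted:.2f}")
```

Shifting or tilting the fitted line increases the sum of squared residuals, which is what "best fit" means here.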


Assumptions

• The dependent (response) variable is measured on an interval scale

• The probability distribution of the error is Normal with mean zero

• The standard deviation of error is constant and does not depend on values of x

• The error term associated with any particular value of Y is independent of the error terms associated with other values of Y


Sum of Squares

• Variation in y = SSR + SSE

• MSR divided by MSE is the test statistic for the ability of regression to explain the data

Source            Sum of squares of differences between        Degrees of freedom
Regression (SSR)  Predicted values and mean of observations    1
Error (SSE)       Predicted values and observations            n-2
Variation in Y    Observations and mean of observations        n-1

Mean sum of squares is obtained by dividing SS by degrees of freedom
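The table above can be checked numerically. The sketch below uses hypothetical data (not from the lecture) and plain Python. It computes the three sums of squares, confirms that the variation in y equals SSR + SSE, and forms the test statistic MSR / MSE.

```python
# Hypothetical observations, used only to illustrate the decomposition
x = [1, 2, 3, 4, 5]
y = [24, 31, 33, 42, 45]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares estimates for one predictor
b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
    (xi - mean_x) ** 2 for xi in x
)
b0 = mean_y - b1 * mean_x
predicted = [b0 + b1 * xi for xi in x]

# Sums of squares as defined in the table
ssr = sum((p - mean_y) ** 2 for p in predicted)            # regression
sse = sum((yi - p) ** 2 for yi, p in zip(y, predicted))    # error
sst = sum((yi - mean_y) ** 2 for yi in y)                  # variation in y

# Mean sums of squares: SS divided by degrees of freedom
msr = ssr / 1
mse = sse / (n - 2)
f_stat = msr / mse

print(f"SSR = {ssr:.2f}, SSE = {sse:.2f}, SSR + SSE = {ssr + sse:.2f}, SST = {sst:.2f}")
print(f"MSR = {msr:.2f}, MSE = {mse:.2f}, F = {f_stat:.2f}")
```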


Tests

• The hypothesis that the regression equation does not explain variation in Y can be tested using the F test.

• The hypothesis that the coefficient for x is zero can be tested using the t statistic.

• The hypothesis that the intercept is 0 can be tested using the t statistic.


Percent of Variation Explained

• R² is the coefficient of determination.

• The minimum R² is zero. The maximum is 1.

• 1 - R² is the proportion of variation left unexplained.

• If Y is not related to X or related in a non-linear fashion, then R² will be small.

• Adjusted R² shows the value of R² after adjustment for degrees of freedom. It protects against an artificially high R² that comes from increasing the number of variables in the model.
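For reference, the usual textbook definitions can be written with the sums of squares from the earlier slide. The symbols n (number of observations) and k (number of explanatory variables, so k = 1 in single variable regression) are notation assumed here, not taken from the slides.

```latex
R^2 = \frac{SSR}{SSR + SSE} = 1 - \frac{SSE}{SSR + SSE},
\qquad
R^2_{\text{adj}} = 1 - \left(1 - R^2\right)\frac{n - 1}{n - k - 1}
```

Adding a variable can never lower R², but it raises k, so adjusted R² goes up only when the new variable explains enough variation to offset the lost degree of freedom.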


Example

• Is waiting time related to satisfaction ratings?

• What will happen to satisfaction ratings if waiting time reaches 15 minutes?

Patient  Waiting time  Satisfaction ratings
1        9             80
2        7             90
3        5             90
4        6             100
5        8             85
6        5             100
7        7             85
8        8             75
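Both questions can be answered outside Excel as well. The following sketch uses only the eight observations in the table above and plain Python. The variable names are made up for illustration. It computes the correlation, fits the line, and predicts satisfaction at a 15 minute wait.

```python
import math

# Data from the example table
waiting = [9, 7, 5, 6, 8, 5, 7, 8]
satisfaction = [80, 90, 90, 100, 85, 100, 85, 75]
n = len(waiting)

mean_x = sum(waiting) / n
mean_y = sum(satisfaction) / n

sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(waiting, satisfaction))
sxx = sum((x - mean_x) ** 2 for x in waiting)
syy = sum((y - mean_y) ** 2 for y in satisfaction)

# Correlation answers "is there a relationship?"
r = sxy / math.sqrt(sxx * syy)

# Regression gives an equation that can be used for prediction
b1 = sxy / sxx
b0 = mean_y - b1 * mean_x
predicted_at_15 = b0 + b1 * 15

print(f"correlation r = {r:.3f}")
print(f"satisfaction = {b0:.2f} + ({b1:.2f}) * waiting time")
print(f"predicted satisfaction at 15 minutes = {predicted_at_15:.2f}")
```

Note that 15 minutes lies outside the observed range of 5 to 9 minutes, so the prediction assumes the straight-line relationship continues to hold.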


Regression Analysis in Excel

• Select tools

• Select data analysis

• Select regression analysis

• Identify the x and y data of equal length

• Ask for residual plots to test the assumptions

• Ask for a normal probability plot to test the Normality assumption


Normal Probability Plot

• The Normal Probability Plot compares the percent of errors falling in particular bins to the percentage expected from a Normal distribution.

• If the assumption is met, the plot should look like a straight line.

[Figure: Normal Probability Plot. Satisfaction ratings (60 to 110) plotted against Sample Percentile (0 to 100).]
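A related check can be sketched in Python. This is not Excel's exact output: it is a common variant that pairs the sorted residuals with the values expected from a Normal distribution, using plotting positions (i - 0.5) / n, which is one convention among several and is assumed here. If the Normality assumption holds, the pairs fall close to a straight line, so their correlation is close to 1.

```python
import math
from statistics import NormalDist

# Data from the example
waiting = [9, 7, 5, 6, 8, 5, 7, 8]
satisfaction = [80, 90, 90, 100, 85, 100, 85, 75]
n = len(waiting)

mean_x = sum(waiting) / n
mean_y = sum(satisfaction) / n
b1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(waiting, satisfaction)) / sum(
    (x - mean_x) ** 2 for x in waiting
)
b0 = mean_y - b1 * mean_x

# Sorted residuals (observed minus predicted)
residuals = sorted(y - (b0 + b1 * x) for x, y in zip(waiting, satisfaction))

# Expected standard Normal values at plotting positions (i - 0.5) / n (assumed convention)
expected_z = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]


def correlation(a, b):
    """Pearson correlation between two equal-length lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


straightness = correlation(expected_z, residuals)

for z, res in zip(expected_z, residuals):
    print(f"expected z = {z:6.2f}   residual = {res:6.2f}")
print(f"correlation between the two columns = {straightness:.3f}")
```

With only eight observations this is a rough check at best. Plotting the two columns against each other gives the visual version.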


Residual Plot

• Tests that residuals have mean of zero and constant standard deviation

• Tests that residuals are not dependent on values of x

[Figure: Waiting time Residual Plot. Residuals (-10 to 10) plotted against Waiting time (4 to 10).]
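The numbers behind a residual plot can be listed with a short Python sketch using the example data. The variable names are illustrative. A least-squares fit always gives residuals that average zero and have no linear relationship with x, so the plot is mainly useful for spotting curvature or a spread that changes with waiting time.

```python
# Data from the example
waiting = [9, 7, 5, 6, 8, 5, 7, 8]
satisfaction = [80, 90, 90, 100, 85, 100, 85, 75]
n = len(waiting)

mean_x = sum(waiting) / n
mean_y = sum(satisfaction) / n
b1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(waiting, satisfaction)) / sum(
    (x - mean_x) ** 2 for x in waiting
)
b0 = mean_y - b1 * mean_x

# Residual = observed satisfaction minus predicted satisfaction
residuals = [y - (b0 + b1 * x) for x, y in zip(waiting, satisfaction)]

mean_residual = sum(residuals) / n
# Zero by construction for a least-squares line
cross_product = sum(res * (x - mean_x) for res, x in zip(residuals, waiting))

# Listing residuals by waiting time shows what the plot displays
for x, res in sorted(zip(waiting, residuals)):
    print(f"waiting time = {x}   residual = {res:6.2f}")
print(f"mean residual = {mean_residual:.6f}")
```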


Linear Equation

• Satisfaction = 121.3 - 4.8 * Waiting time

• At 15 minutes waiting time, satisfaction is predicted to be:

121.34 - 4.83 * 15 ≈ 48.9 (48.87 when the unrounded coefficients are used)

• The t statistics for both the intercept and the waiting time coefficient are statistically significant.

• The hypotheses that the coefficients are zero are rejected.

              Coefficients  Standard Error  t Stat  P-value
Intercept     121.34        10.48           11.58   0.00
Waiting time  -4.83         1.50            -3.23   0.02
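The standard errors, t statistics, and P-values in the table can be reproduced approximately in plain Python. This is a sketch, not Excel's method: the P-value is obtained by integrating the t density numerically with Simpson's rule, and the helper names `t_pdf` and `two_sided_p` are made up for illustration.

```python
import math

# Data from the example
waiting = [9, 7, 5, 6, 8, 5, 7, 8]
satisfaction = [80, 90, 90, 100, 85, 100, 85, 75]
n = len(waiting)
df = n - 2  # degrees of freedom for error

mean_x = sum(waiting) / n
mean_y = sum(satisfaction) / n
sxx = sum((x - mean_x) ** 2 for x in waiting)
b1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(waiting, satisfaction)) / sxx
b0 = mean_y - b1 * mean_x

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(waiting, satisfaction))
mse = sse / df

# Standard errors of the two coefficients
se_b1 = math.sqrt(mse / sxx)
se_b0 = math.sqrt(mse * (1 / n + mean_x ** 2 / sxx))

t_b0 = b0 / se_b0
t_b1 = b1 / se_b1


def t_pdf(t, df):
    """Density of Student's t distribution."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def two_sided_p(t_stat, df, steps=2000):
    """Two-sided P-value via Simpson's rule on the t density over [0, |t|]."""
    upper = abs(t_stat)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return max(0.0, 1 - 2 * area)


p_b0 = two_sided_p(t_b0, df)
p_b1 = two_sided_p(t_b1, df)

print(f"Intercept:    {b0:8.2f} {se_b0:6.2f} {t_b0:7.2f} {p_b0:.4f}")
print(f"Waiting time: {b1:8.2f} {se_b1:6.2f} {t_b1:7.2f} {p_b1:.4f}")
```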


Goodness of Fit

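• 64% of variation in satisfaction ratings is explained by the equation (R Square = 0.635); after adjustment for degrees of freedom the figure is 57% (Adjusted R Square = 0.574)

• 36% of variation in satisfaction ratings is left unexplained (43% after adjustment)

Regression Statistics
Multiple R          0.796902768
R Square            0.635054022
Adjusted R Square   0.574229692
Standard Error      5.7674349
Observations        8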
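The regression statistics above follow directly from the sums of squares. A minimal Python sketch with the example data reproduces them. The variable names are illustrative, and k = 1 is the number of explanatory variables.

```python
import math

# Data from the example
waiting = [9, 7, 5, 6, 8, 5, 7, 8]
satisfaction = [80, 90, 90, 100, 85, 100, 85, 75]
n = len(waiting)
k = 1  # one explanatory variable

mean_x = sum(waiting) / n
mean_y = sum(satisfaction) / n
b1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(waiting, satisfaction)) / sum(
    (x - mean_x) ** 2 for x in waiting
)
b0 = mean_y - b1 * mean_x

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(waiting, satisfaction))
sst = sum((y - mean_y) ** 2 for y in satisfaction)

r_square = 1 - sse / sst
multiple_r = math.sqrt(r_square)
adjusted_r_square = 1 - (1 - r_square) * (n - 1) / (n - k - 1)
standard_error = math.sqrt(sse / (n - k - 1))

print(f"Multiple R        {multiple_r:.4f}")
print(f"R Square          {r_square:.4f}")
print(f"Adjusted R Square {adjusted_r_square:.4f}")
print(f"Standard Error    {standard_error:.4f}")
print(f"Observations      {n}")
```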


ANOVA For Regression

• The regression model has a mean sum of squares of 347.

• The mean sum of squares for error is 33. Note that the error term is called residual in Excel.

• The F statistic is 10.44. The probability of observing a statistic this large when the regression explains nothing is 0.02.

• The hypothesis that the MSR and MSE are equal is rejected. Significant variation is explained by regression.

ANOVA
            df  SS      MS      F      Significance F
Regression  1   347.30  347.30  10.44  0.02
Residual    6   199.58  33.26
Total       7   546.88
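The ANOVA table can be rebuilt from the example data with a short Python sketch. Instead of computing Significance F, the sketch compares the F statistic with 5.99, the critical value at the 0.05 level for 1 and 6 degrees of freedom taken from a standard F table. That value is an assumption brought in for illustration and does not come from the slides.

```python
# Data from the example
waiting = [9, 7, 5, 6, 8, 5, 7, 8]
satisfaction = [80, 90, 90, 100, 85, 100, 85, 75]
n = len(waiting)

mean_x = sum(waiting) / n
mean_y = sum(satisfaction) / n
b1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(waiting, satisfaction)) / sum(
    (x - mean_x) ** 2 for x in waiting
)
b0 = mean_y - b1 * mean_x
predicted = [b0 + b1 * x for x in waiting]

ss_regression = sum((p - mean_y) ** 2 for p in predicted)
ss_residual = sum((y - p) ** 2 for y, p in zip(satisfaction, predicted))
ss_total = sum((y - mean_y) ** 2 for y in satisfaction)

df_regression, df_residual, df_total = 1, n - 2, n - 1
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_stat = ms_regression / ms_residual

# Critical value from a standard F table (0.05 level; 1 and 6 degrees of freedom)
F_CRITICAL_05 = 5.99
significant = f_stat > F_CRITICAL_05

print(f"Regression  {df_regression}  {ss_regression:7.2f}  {ms_regression:7.2f}  {f_stat:6.2f}")
print(f"Residual    {df_residual}  {ss_residual:7.2f}  {ms_residual:7.2f}")
print(f"Total       {df_total}  {ss_total:7.2f}")
print("Reject the hypothesis of no explained variation:", significant)
```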


Take Home Lesson

• Regression is based on the sum of squares (SS) approach, similar to ANOVA

• Regression assumptions can be examined by looking at residuals

• Several hypotheses can be tested using regression analysis