Page 1: Multiple Regression Analysis The principles of Simple Regression Analysis can be extended to two or more explanatory variables. With two explanatory variables

Multiple Regression Analysis

The principles of Simple Regression Analysis can be extended to two or more explanatory variables.

With two explanatory variables we get the equation

Y = α + β₁X₁ + β₂X₂.

It is customary to write it as

Y = β₀ + β₁X₁ + β₂X₂.

As an example, if a hypotensive agent is administered prior to surgery, the time taken for blood pressure to recover to its normal value will depend on the dose of the hypotensive and on the blood pressure during surgery.

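This can be modelled as Recovery time = β₀ + β₁ × log dose + β₂ × Surgery B.P., where β₁ is expected to be positive (a larger dose means a longer recovery) and β₂ negative (a lower pressure during surgery means a longer recovery).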
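To make the idea of fitting an equation with two explanatory variables concrete, here is a minimal sketch of ordinary least squares using only the Python standard library. The function names and the data are illustrative assumptions: the numbers are synthetic, generated without noise from Y = 2 + 3 X₁ - 0.5 X₂, and are not taken from the recovery-time study. In practice a statistics package would do this work.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations update both sides together.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_multiple_regression(x1, x2, y):
    """Return (b0, b1, b2) minimising the sum of squared residuals."""
    # Design matrix: a column of ones for the intercept, then X1 and X2.
    rows = [[1.0, u, v] for u, v in zip(x1, x2)]
    # Normal equations: (X'X) b = X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(3)]
    return solve_linear_system(xtx, xty)


# Synthetic, noise-free data (an assumption for illustration only).
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [10.0, 8.0, 12.0, 7.0, 11.0, 9.0]
y = [2 + 3 * u - 0.5 * v for u, v in zip(x1, x2)]

b0, b1, b2 = fit_multiple_regression(x1, x2, y)
print(b0, b1, b2)  # close to the generating values 2, 3 and -0.5
```

Because the data contain no noise, the fitted coefficients recover the generating values up to rounding error; with real observations the fit would only approximate the underlying relationship.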

Categorical Explanatory Variables

Binary variables are coded 0, 1. For example a binary variable x₁ ('Gender') is coded male = 0, female = 1.

Recovery time for Blood Pressure and dose of hypotensive

[Figure: scatter plot "Recovery time for Blood Pressure and dose of hypotensive". Y axis: RecvTime (0 to 70). X axis: Logdose (2.5 to 6.5). Fitted regression line with 95% CI.]

RecvTime = -14.2576 + 8.00772 Logdose

S = 14.7103, R-Sq = 15.5 %, R-Sq(adj) = 13.8 %

The scatter plot shows a linear relationship. Blood Pressure takes longer to come back to normal value the larger the dose of the hypotensive.

There are many outliers because of individual variability of subjects and because of different types of surgical operations.

Recovery time for Blood Pressure and lowest Blood Pressure reading during surgery

[Figure: scatter plot "Recovery time for Blood Pressure and lowest B.P. reading during surgery". Y axis: RecvTime (0 to 70). X axis: Bpsurg (50 to 90). Fitted regression line with 95% CI.]

RecvTime = 34.4692 - 0.183546 Bpsurg

S = 15.9386, R-Sq = 0.8 %, R-Sq(adj) = 0.0 %

The lower the blood pressure achieved during surgery, the longer the time for it to reach its normal value during recovery from anaesthesia.

Multiple Regression Analysis

The effect of the two explanatory variables acting jointly is described by the equation

Recov. Time = 22.3 + 10.6 Log dose – 0.740 Surg. B.P.

As noted on the scatter plots, several observations were outliers or had larger than expected X values.
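As a rough illustration of how the fitted equation is read, the sketch below evaluates it for a few input values. The inputs are hypothetical, chosen only to fall inside the ranges shown on the scatter-plot axes, and the function name is an assumption. The slides do not state the units of recovery time or blood pressure, so the results are in whatever units the study used.

```python
def predicted_recovery_time(log_dose, surgery_bp):
    """Evaluate the fitted equation quoted on the slide."""
    return 22.3 + 10.6 * log_dose - 0.740 * surgery_bp


# Hypothetical inputs, not observations from the study.
baseline = predicted_recovery_time(log_dose=4.5, surgery_bp=70)
higher_dose = predicted_recovery_time(log_dose=5.5, surgery_bp=70)
lower_bp = predicted_recovery_time(log_dose=4.5, surgery_bp=60)

# Each coefficient is the change in the prediction when its variable
# changes by one unit and the other variable is held fixed.
print(round(baseline, 1))                # 18.2
print(round(higher_dose - baseline, 1))  # 10.6 (one more unit of log dose)
print(round(lower_bp - baseline, 1))     # 7.4  (ten units lower B.P.)
```

Holding one variable fixed while changing the other is what makes each coefficient a partial regression coefficient.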

Categorical Explanatory Variables

Binary variables are coded 0, 1. For example a variable x₁ (Gender) is coded male = 0, female = 1. Then in the regression equation Y = β₀ + β₁x₁ + β₂x₂, when x₁ = 1 the value of Y indicates what is obtained for the female gender; and when x₁ = 0 the value of Y indicates what is obtained for males.

If we have a nominal variable with more than two categories we have to create a number of new dummy (also called indicator) binary variables.
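A small sketch of dummy coding may help here. The slide does not say how many dummy variables to create; the usual convention, assumed below, is one fewer than the number of categories, with the omitted category acting as the reference level. The variable `surgery_type`, its categories and the function name are hypothetical.

```python
def dummy_code(values, reference):
    """Return one 0/1 indicator list per category, omitting the reference."""
    categories = sorted(set(values) - {reference})
    return {c: [1 if v == c else 0 for v in values] for c in categories}


# Hypothetical nominal variable with three categories.
surgery_type = ["abdominal", "cardiac", "orthopaedic", "cardiac", "abdominal"]
dummies = dummy_code(surgery_type, reference="abdominal")

for name, column in dummies.items():
    print(name, column)
# cardiac [0, 1, 0, 1, 0]
# orthopaedic [0, 0, 1, 0, 0]
```

A subject in the reference category has 0 in every dummy column, so each dummy coefficient is read as the difference from the reference category.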

How many Explanatory Variables?

As a rule of thumb multiple regression analysis should not be performed if the total number of variables is greater than the number of subjects ÷ 10.
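The rule of thumb is simple arithmetic, sketched below. The function names are illustrative, and the slide leaves open whether "total number of variables" counts the response as well as the explanatory variables, so the caller has to decide what to pass in.

```python
def max_variables(n_subjects):
    """Largest number of variables the rule of thumb allows."""
    return n_subjects // 10


def regression_is_advisable(n_variables, n_subjects):
    """False when the number of variables exceeds subjects / 10."""
    return n_variables <= n_subjects / 10


# 61 is the number of towns in the water hardness example in this deck.
print(max_variables(61))               # 6
print(regression_is_advisable(6, 61))  # True
print(regression_is_advisable(7, 61))  # False
```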

Analysis

In the computer output look for:

Adjusted R². It represents the proportion of variability of Y explained by the X's. R² is adjusted so that models with different numbers of variables can be compared.

The F-test in the ANOVA table. A significant F indicates a linear relationship between Y and at least one of the X's.

The t-test of each partial regression coefficient. A significant t indicates that the variable in question influences the Y response while controlling for the other explanatory variables.
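The slides do not give the formulas behind these quantities. As a hedged sketch, the standard textbook expressions for adjusted R² and the overall F statistic can be written in terms of R², the number of subjects n and the number of explanatory variables p. The function names are assumptions. The check uses the figures reported on the water hardness plot in this deck (R-Sq = 42.9 %, 61 towns, one explanatory variable).

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 = 1 - (1 - R^2)(n - 1) / (n - p - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


def overall_f_statistic(r2, n, p):
    """F = (R^2 / p) / ((1 - R^2) / (n - p - 1)), on p and n - p - 1 d.f."""
    return (r2 / p) / ((1 - r2) / (n - p - 1))


# Water hardness and mortality: R-Sq = 42.9 %, n = 61 towns, p = 1.
adj = adjusted_r_squared(0.429, n=61, p=1)
f_stat = overall_f_statistic(0.429, n=61, p=1)

print(round(100 * adj, 1))  # 41.9, matching the R-Sq(adj) on the plot
print(f_stat)
```

The penalty grows with p, which is why adjusted R² can be used to compare models with different numbers of variables. Turning F into a p-value needs the F distribution, which a statistics package would supply.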

Usefulness of Scatter Plots - I

The scatter plot on the right illustrates the relationship between water hardness and mortality in 61 large towns in England and Wales.

The regression line indicates an inverse relationship between water hardness and mortality rates.

[Figure: scatter plot "Mortality and Water Hardness". Y axis: Mortal (1000 to 2000). X axis: Calcium (0 to 140). Fitted regression line with 95% CI.]

Mortal = 1676.36 - 3.22609 Calcium

S = 143.029, R-Sq = 42.9 %, R-Sq(adj) = 41.9 %

Usefulness of Scatter Plots - II

[Figure: scatter plot "Mortality and Water Hardness in Towns in the North". Y axis: MortalN (1400 to 2000). X axis: CalciumN (0 to 100). Fitted regression line with 95% CI.]

MortalN = 1692.31 - 1.93134 CalciumN

S = 129.209, R-Sq = 13.6 %, R-Sq(adj) = 11.0 %

The inverse relationship between water hardness and mortality is still maintained. But for towns in the North the regression line is less steep than for towns in the South, indicating that other causes of mortality are stronger in the North compared to the South.

[Figure: scatter plot "Mortality and Water Hardness in Towns in the South". Y axis: MortalS (1100 to 1600). X axis: CalciumS (0 to 140). Fitted regression line with 95% CI.]

MortalS = 1522.82 - 2.09272 CalciumS

S = 114.297, R-Sq = 36.3 %, R-Sq(adj) = 33.6 %
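To see what the two regional lines imply, the sketch below evaluates the North and South equations reported on the plots at the same calcium value. The value 50 is a hypothetical input chosen to lie inside both plotted ranges, and the function names are assumptions. The slides do not state the units of mortality or calcium.

```python
def mortality_north(calcium):
    """Fitted line for towns in the North, as reported on the plot."""
    return 1692.31 - 1.93134 * calcium


def mortality_south(calcium):
    """Fitted line for towns in the South, as reported on the plot."""
    return 1522.82 - 2.09272 * calcium


calcium = 50  # hypothetical value within both plotted ranges
north = mortality_north(calcium)
south = mortality_south(calcium)

print(round(north, 3))          # 1595.743
print(round(south, 3))          # 1418.184
print(round(north - south, 3))  # 177.559

# Predicted fall in mortality for 10 more units of calcium.
print(round(10 * 1.93134, 1))   # 19.3 in the North
print(round(10 * 2.09272, 1))   # 20.9 in the South
```

At the same water hardness the Northern line sits well above the Southern one, and it falls slightly more slowly as calcium increases. Both regional slopes are shallower than the pooled slope of -3.22609, which illustrates why plotting subgroups separately is useful.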