ANOVA and Regression
Brian Healy, PhD


Page 1: ANOVA and Regression

ANOVA and Regression
Brian Healy, PhD

Page 2: ANOVA and Regression

Objectives

• ANOVA
  – Multiple comparisons
• Introduction to regression
  – Relationship to correlation/t-test

Page 3: ANOVA and Regression

Comments from reviews

• Please fill them out because I read them
• More examples and not just MS
• More depth on technical details/statistical theory/equations
  – First time ever!!
  – I have made slides from more in-depth courses available online so that you have access to formulas for the t-test, ANOVA, etc.
• Talks too fast for non-native speakers

Page 4: ANOVA and Regression

Review

• Types of data
• p-value
• Steps for hypothesis test
  – How do we set up a null hypothesis?
• Choosing the right test
  – Continuous outcome variable / dichotomous explanatory variable: two-sample t-test

Page 5: ANOVA and Regression

Steps for hypothesis testing

1) State null hypothesis
2) State type of data for explanatory and outcome variable
3) Determine appropriate statistical test
4) State summary statistics
5) Calculate p-value (stat package)
6) Decide whether to reject or not reject the null hypothesis
   • NEVER accept the null
7) Write conclusion

Page 6: ANOVA and Regression

Example

• In the previous class, two groups were compared on a continuous outcome
• What if we have more than two groups?
• Ex. A recent study compared the intensity of structures on MRI in normal controls, benign MS patients, and secondary progressive MS patients
• Question: Is there any difference among these groups?

Page 7: ANOVA and Regression
Page 8: ANOVA and Regression

Two approaches

• Compare each group to each other group using a t-test
  – Problem with multiple comparisons
• Complete a global comparison to see if there is any difference
  – Analysis of variance (ANOVA)
  – Good first step even if eventually completing pairwise comparisons

Page 9: ANOVA and Regression

Types of analysis - independent samples

Outcome        Explanatory    Analysis
Continuous     Dichotomous    t-test, Wilcoxon test
Continuous     Categorical    ANOVA, linear regression
Continuous     Continuous     Correlation, linear regression
Dichotomous    Dichotomous    Chi-square test, logistic regression
Dichotomous    Continuous     Logistic regression
Time to event  Dichotomous    Log-rank test

Page 10: ANOVA and Regression

Global test - ANOVA

• As a first step, we can compare across all groups at once
• The null hypothesis for ANOVA is that the means in all of the groups are equal
• ANOVA compares the within-group variance and the between-group variance
  – If the patients within a group are very alike and the groups are very different, the groups are likely different
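As a concrete illustration of the global test, here is a minimal sketch of a one-way ANOVA in Python with SciPy; the three arrays of intensities are made-up stand-ins for the normal control, benign MS, and SPMS groups, not the study data.

```python
import numpy as np
from scipy import stats

# Hypothetical MRI intensities for the three groups
normal = np.array([0.42, 0.39, 0.44, 0.40, 0.41])
benign_ms = np.array([0.35, 0.33, 0.36, 0.32, 0.34])
spms = np.array([0.31, 0.29, 0.30, 0.28, 0.32])

# Null hypothesis: the three group means are equal
f_stat, p_value = stats.f_oneway(normal, benign_ms, spms)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```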

Page 11: ANOVA and Regression
Page 12: ANOVA and Regression

Hypothesis test

1) H0: mean_normal = mean_BMS = mean_SPMS
2) Outcome variable: continuous
   Explanatory variable: categorical
3) Test: ANOVA
4) mean_normal = 0.41; mean_BMS = 0.34; mean_SPMS = 0.30
5) Results: p = 0.011
6) Reject null hypothesis
7) Conclusion: At least one of the groups is significantly different than the others

Page 13: ANOVA and Regression
Page 14: ANOVA and Regression

Technical aside

• Our F-statistic is the ratio of the between-group variance and the within-group variance
• This ratio of variances has a known distribution (the F-distribution)
• If our calculated F-statistic is high, the between-group variance is higher than the within-group variance, meaning the differences between the groups are not likely due to chance
• Therefore, the probability of the observed result or something more extreme will be low (low p-value)

$$F = \frac{s^2_{\text{between}}}{s^2_{\text{within}}} = \frac{\sum_{i=1}^{k} n_i(\bar{x}_i-\bar{x})^2 \,/\, (k-1)}{\sum_{i=1}^{k}(n_i-1)s_i^2 \,/\, (n_1+n_2+\cdots+n_k-k)}$$
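To make the formula concrete, the following sketch computes the between-group and within-group variances directly and compares the resulting F-statistic and p-value with SciPy; the group values are the same hypothetical data as above.

```python
import numpy as np
from scipy import stats

groups = [np.array([0.42, 0.39, 0.44, 0.40, 0.41]),
          np.array([0.35, 0.33, 0.36, 0.32, 0.34]),
          np.array([0.31, 0.29, 0.30, 0.28, 0.32])]

k = len(groups)
n_i = np.array([len(g) for g in groups])
grand_mean = np.concatenate(groups).mean()

# Between-group variance: spread of the group means around the grand mean
s2_between = sum(n * (g.mean() - grand_mean) ** 2
                 for n, g in zip(n_i, groups)) / (k - 1)
# Within-group variance: pooled variance inside the groups
s2_within = sum((n - 1) * g.var(ddof=1)
                for n, g in zip(n_i, groups)) / (n_i.sum() - k)

F = s2_between / s2_within
p = stats.f.sf(F, k - 1, n_i.sum() - k)   # upper tail of the F-distribution
print(F, p)                               # matches stats.f_oneway(*groups)
```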

Page 15: ANOVA and Regression

[Figure: the F-distribution under the null hypothesis. The small shaded region in the upper tail, equal to or more extreme than the observed value, is the p-value.]

Page 16: ANOVA and Regression

Now what

• The question often becomes which groups are different
• Possible comparisons
  – All pairs
  – All groups to a specific control
  – Pre-specified comparisons
• If we do many tests, we should account for multiple comparisons

Page 17: ANOVA and Regression

Type I error

• A Type I error is when you reject the null hypothesis even though it is true (α = P(reject H0 | H0 is true))
• We accept making this error 5% of the time
• If we run a large experiment with 100 tests and the null hypothesis was true in each case, how many times would we expect to reject the null?

Page 18: ANOVA and Regression

Multiple comparisons

• For this problem, three comparisons
  – NC vs. BMS; NC vs. SPMS; BMS vs. SPMS
• If we complete each test at the 0.05 level, what is the chance that we make a Type I error?
  – P(reject at least 1 | H0 is true) = 1 - P(fail to reject all three | H0 is true) = 1 - 0.95^3 = 0.143
• Inflated Type I error rate
• Can correct the p-value for each test to maintain the experiment-wide Type I error
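A quick numeric check of this calculation, and of how fast the family-wise error rate grows with the number of tests (m = 3 matches the example above):

```python
# Family-wise Type I error when each of m independent tests is run at 0.05
alpha = 0.05
for m in (1, 3, 10, 100):
    family_wise = 1 - (1 - alpha) ** m
    print(f"{m:>3} tests: P(at least one false rejection) = {family_wise:.3f}")
```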

Page 19: ANOVA and Regression

Bonferroni correction

• The Bonferroni correction multiplies all p-values by the number of comparisons completed
  – In our experiment, there were 3 comparisons, so we multiply by 3
  – Any p-value that remains less than 0.05 is significant
• The Bonferroni correction is conservative (it is more difficult to obtain a significant result than it should be), but it is an extremely easy way to account for multiple comparisons
  – Can be a very harsh correction with many tests
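A minimal sketch of a Bonferroni adjustment using statsmodels' multipletests; the three raw p-values below are placeholders rather than the values from the example.

```python
from statsmodels.stats.multitest import multipletests

raw_p = [0.03, 0.2, 0.008]   # hypothetical p-values from 3 pairwise tests
reject, adj_p, _, _ = multipletests(raw_p, alpha=0.05, method="bonferroni")
print(adj_p)    # each raw p-value multiplied by 3 (capped at 1)
print(reject)   # which comparisons remain significant at 0.05
```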

Page 20: ANOVA and Regression

Other corrections

• All pairwise comparisons
  – Tukey's test
• All groups to a control
  – Dunnett's test
• MANY others
  – False discovery rate

Page 21: ANOVA and Regression

Example

• For our three-group comparison, we compare each pair and get the following results from Tukey's test:

Groups         Mean diff   p-value   Significant
NC vs. BMS     0.075       0.10
NC vs. SPMS    0.114       0.012     *
BMS vs. SPMS   0.039       0.60
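A sketch of how Tukey's test could be run in Python with statsmodels; the intensity values and group labels are hypothetical stand-ins for the three-group comparison.

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

values = np.concatenate([
    [0.42, 0.39, 0.44, 0.40, 0.41],    # NC
    [0.35, 0.33, 0.36, 0.32, 0.34],    # BMS
    [0.31, 0.29, 0.30, 0.28, 0.32]])   # SPMS
groups = ["NC"] * 5 + ["BMS"] * 5 + ["SPMS"] * 5

# All pairwise comparisons with a family-wise error rate of 0.05
result = pairwise_tukeyhsd(values, groups, alpha=0.05)
print(result.summary())
```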

Page 22: ANOVA and Regression
Page 23: ANOVA and Regression

Questions to ask yourself

• What is the null hypothesis?
• We would like to test the null hypothesis at the 0.05 level
• If well defined prior to the experiment, the correction for multiple comparisons, if necessary, will be clear
• Hypothesis generating vs. hypothesis testing

Page 24: ANOVA and Regression

Conclusions

• If you are doing a multiple-group comparison, always specify before the experiment which comparisons are of interest, if possible
• If the null hypothesis is that all the groups are the same, test the global null using ANOVA
• Complete appropriate additional comparisons with corrections if necessary
• No single right answer for every situation

Page 25: ANOVA and Regression

Types of analysis - independent samples

Outcome        Explanatory    Analysis
Continuous     Dichotomous    t-test, Wilcoxon test
Continuous     Categorical    ANOVA, linear regression
Continuous     Continuous     Correlation, linear regression
Dichotomous    Dichotomous    Chi-square test, logistic regression
Dichotomous    Continuous     Logistic regression
Time to event  Dichotomous    Log-rank test

Page 26: ANOVA and Regression

Correlation

• Is there a linear relationship between IL-10 expression and IL-6 expression?
• The best graphical display for this data is a scatter plot

Page 27: ANOVA and Regression

Correlation

• Definition: the degree to which two continuous variables are linearly related
  – Positive correlation: as one variable goes up, the other goes up (positive slope)
  – Negative correlation: as one variable goes up, the other goes down (negative slope)
• Correlation (ρ) ranges from -1 (perfect negative correlation) to 1 (perfect positive correlation)
• A correlation of 0 means that there is no linear relationship between the two variables

Page 28: ANOVA and Regression

[Figure: example scatter plots showing positive correlation, negative correlation, no correlation, and no correlation with a quadratic relationship]

Page 29: ANOVA and Regression

Hypothesis test

1) H0: correlation between IL-10 expression and IL-6 expression = 0
2) Outcome variable: IL-6 expression (continuous)
   Explanatory variable: IL-10 expression (continuous)
3) Test: correlation
4) Summary statistic: correlation = 0.51
5) Results: p = 0.011
6) Reject null hypothesis
7) Conclusion: A statistically significant correlation was observed between the two variables

Page 30: ANOVA and Regression
Page 31: ANOVA and Regression

Technical aside - correlation

• The formal definition of the correlation is given by:

$$\text{Corr}(x,y) = \frac{\text{Cov}(x,y)}{\sqrt{\text{Var}(x)\,\text{Var}(y)}}$$

• Note that this is a dimensionless quantity
• This equation shows that if the covariance between the two variables is the same as the variance in the two variables, we have perfect correlation, because all of the variability in x and y is explained by how the two variables change together

Page 32: ANOVA and Regression

How can we estimate the correlation?

• The most common estimator of the correlation is the Pearson correlation coefficient, given by:

$$r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2\,\sum_{i=1}^{n}(y_i-\bar{y})^2}}$$

• This is an estimate that requires both x and y to be normally distributed. Since we use the mean in the calculation, the estimate is sensitive to outliers.
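The following sketch evaluates the Pearson formula directly and checks it against scipy.stats.pearsonr; the x and y arrays are hypothetical expression values, not the IL-10/IL-6 data from the slides.

```python
import numpy as np
from scipy import stats

x = np.array([1.2, 2.5, 3.1, 4.8, 5.0, 6.3, 7.7])   # e.g. IL-10 expression
y = np.array([2.0, 2.9, 2.7, 5.1, 4.6, 6.8, 7.2])   # e.g. IL-6 expression

# Pearson's r computed from the formula
num = np.sum((x - x.mean()) * (y - y.mean()))
den = np.sqrt(np.sum((x - x.mean()) ** 2) * np.sum((y - y.mean()) ** 2))
r_manual = num / den

# Same estimate (plus a p-value) from SciPy
r_scipy, p_value = stats.pearsonr(x, y)
print(r_manual, r_scipy, p_value)
```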

Page 33: ANOVA and Regression

Distribution of the test statistic

• The standard error of the sample correlation coefficient is given by

$$\widehat{se}(r) = \sqrt{\frac{1-r^2}{n-2}}$$

• The resulting test statistic is

$$t = \frac{r-0}{\sqrt{(1-r^2)/(n-2)}}$$

which follows a t-distribution with n-2 degrees of freedom, where n is the number of patients (not the number of measurements)
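A small sketch of this test statistic; the sample size n = 24 is an assumption chosen only so that r = 0.51 gives a p-value near the 0.011 quoted on the earlier slide, not a value taken from the study.

```python
import numpy as np
from scipy import stats

r, n = 0.51, 24                         # r from the slide, n assumed
se_r = np.sqrt((1 - r**2) / (n - 2))    # standard error of r
t = r / se_r                            # (r - 0) / se(r)
p = 2 * stats.t.sf(abs(t), df=n - 2)    # two-sided p-value, n-2 df
print(t, p)
```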

Page 34: ANOVA and Regression

Regression - everything in one place

• All analyses we have done to this point can be completed using regression!!!

Page 35: ANOVA and Regression

Quick math review

• As you remember, the equation of a line is y = mx + b
• For every one-unit increase in x, there is an m-unit increase in y
• b is the value of y when x is equal to zero

[Figure: plot of the line y = 1.5x + 4]

Page 36: ANOVA and Regression

Picture

• Does there seem to be a linear relationship in the data?
• Is the data perfectly linear?
• Could we fit a line to this data?

[Figure: scatter plot of the example data]

Page 37: ANOVA and Regression

How do we find the best line?

• Linear regression tries to find the best line (curve) to fit the data
• Let's look at three candidate lines
• Which do you think is the best?
• What is a way to determine the best line to use?

Page 38: ANOVA and Regression

What is linear regression?

• The method of finding the best line (curve) is least squares, which minimizes the distance from the line for each of the points
• The equation of the line is y = 1.5x + 4

[Figure: scatter plot with the fitted line y = 1.5x + 4]
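A minimal sketch of a least-squares fit with NumPy; the data are simulated around the line y = 1.5x + 4 used on the slide, so the recovered slope and intercept should be close to those values.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 20)
y = 4 + 1.5 * x + rng.normal(0, 1.5, size=x.size)   # noisy points around the line

# polyfit with degree 1 minimizes the sum of squared vertical distances
slope, intercept = np.polyfit(x, y, deg=1)
print(f"fitted line: y = {slope:.2f}x + {intercept:.2f}")
```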

Page 39: ANOVA and Regression

Example

• For our investigation of the relationship between IL-10 and IL-6, we can set up a regression equation:

$$IL6_i = \beta_0 + \beta_1 \cdot IL10_i + \varepsilon_i$$

• β0 is the expression of IL-6 when IL-10 = 0 (intercept)
• β1 is the change in IL-6 for every 1-unit increase in IL-10 (slope)
• εi is the residual from the line

Page 40: ANOVA and Regression

• The final regression equation is

$$\widehat{IL6} = 26.4 + 0.63 \cdot IL10$$

• The coefficients mean:
  – 26.4 is the estimate of the mean expression of IL-6 for a patient with IL-10 expression = 0 (β0)
  – an increase of one unit in IL-10 expression leads to an estimated increase of 0.63 in the mean expression of IL-6 (β1)
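A sketch of how this regression could be fit with the statsmodels formula API; the data frame below is hypothetical, so the fitted coefficients will not reproduce 26.4 and 0.63.

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "IL10": [1.2, 2.5, 3.1, 4.8, 5.0, 6.3, 7.7, 8.1],
    "IL6":  [27.0, 28.1, 27.9, 30.2, 29.6, 31.1, 31.5, 32.0],
})

fit = smf.ols("IL6 ~ IL10", data=df).fit()
print(fit.params)    # Intercept (beta0) and IL10 slope (beta1)
print(fit.pvalues)   # the test of beta1 = 0 mirrors the correlation test
```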

Page 41: ANOVA and Regression

Tough question

• In our correlation hypothesis test, we wanted to know if there was an association between the two measures
• If there was no relationship between IL-10 and IL-6 in our system, what would happen to our regression equation?
  – No effect means that the change in IL-6 is not related to the change in IL-10
  – β1 = 0
• Is β1 significantly different from zero?

Page 42: ANOVA and Regression

Hypothesis test

1) H0: no relationship between IL-6 expression and IL-10 expression, β1 = 0
2) Outcome variable: IL-6 (continuous)
   Explanatory variable: IL-10 (continuous)
3) Test: linear regression
4) Summary statistic: β1 = 0.63
5) Results: p = 0.011
6) Reject null hypothesis
7) Conclusion: A significant correlation was observed between the two variables

Page 43: ANOVA and Regression
Page 44: ANOVA and Regression

Wait a second!!

• Let's check something
  – p-value from correlation analysis = 0.011
  – p-value from regression analysis = 0.011
  – They are the same!!
• Regression leads to the same conclusion as correlation analysis
• Other similarities between the models as well

Page 45: ANOVA and Regression

Technical aside - Estimates of regression coefficients

• Once we have solved the least squares equation, we obtain estimates for the β's, which we refer to as $\hat{\beta}_0, \hat{\beta}_1$:

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}$$

• To test if this estimate is significantly different than 0, we use the following equation:

$$t = \frac{\hat{\beta}_1 - \beta_1}{\widehat{se}(\hat{\beta}_1)}$$
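The closed-form estimates and the t-test for β1 can be computed directly from these formulas; the x and y values below are hypothetical.

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([5.3, 7.1, 8.0, 10.4, 11.2, 12.9, 14.5, 15.8])
n = len(x)

# Least-squares estimates
beta1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0 = y.mean() - beta1 * x.mean()

# Standard error of beta1 from the residual variance
resid = y - (beta0 + beta1 * x)
s2 = np.sum(resid ** 2) / (n - 2)
se_beta1 = np.sqrt(s2 / np.sum((x - x.mean()) ** 2))

t = beta1 / se_beta1                    # test of H0: beta1 = 0
p = 2 * stats.t.sf(abs(t), df=n - 2)
print(beta0, beta1, t, p)
```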

Page 46: ANOVA and Regression

Assumptions of linear regression

• Linearity
  – Linear relationship between outcome and predictors
  – E(Y|X=x) = β0 + β1x1 + β2x2² is still a linear regression equation because each of the β's is to the first power
• Normality of the residuals
  – The residuals, εi, are normally distributed, N(0, σ²)
• Homoscedasticity of the residuals
  – The residuals, εi, have the same variance
• Independence
  – All of the data points are independent
  – Correlated data points can be taken into account using multivariate and longitudinal data methods
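A sketch of simple checks for two of these assumptions (normality and roughly constant variance of the residuals) on simulated data; real diagnostics would usually also include residual plots.

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 50)
y = 4 + 1.5 * x + rng.normal(0, 1, 50)

model = sm.OLS(y, sm.add_constant(x)).fit()
resid = model.resid

# Normality of the residuals (Shapiro-Wilk test)
print(stats.shapiro(resid))
# Crude homoscedasticity check: residual spread in the lower vs. upper half of x
lower, upper = resid[x < np.median(x)], resid[x >= np.median(x)]
print(lower.std(ddof=1), upper.std(ddof=1))
```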

Page 47: ANOVA and Regression

Linear regression with dichotomous predictor

• Linear regression can also be used for dichotomous predictors, like sex
• Last class we compared relapsing MS patients to progressive MS patients
• To do this, we use an indicator variable, which equals 1 for relapsing and 0 for progressive. The resulting regression equation for expression is

$$\text{expression}_i = \beta_0 + \beta_1 \cdot R_i + \varepsilon_i$$

Page 48: ANOVA and Regression

Interpretation of model

• The meaning of the coefficients in this case:
  – β0 is the estimate of the mean expression when R = 0, in the progressive group
  – β0 + β1 is the estimate of the mean expression when R = 1, in the relapsing group
  – β1 is the estimate of the mean increase in expression between the two groups
• The difference between the two groups is β1
• If there was no difference between the groups, what would β1 equal?

Page 49: ANOVA and Regression

[Figure: expression by group, annotated with the mean in the reference (R = 0) group (β0) and the difference between groups (β1)]

Page 50: ANOVA and Regression

Hypothesis test

1) Null hypothesis: mean_progressive = mean_relapsing (β1 = 0)
2) Explanatory variable: group membership (dichotomous)
   Outcome variable: cytokine production (continuous)
3) Test: linear regression
4) β1 = 6.87
5) p-value = 0.199
6) Fail to reject null hypothesis
7) Conclusion: The difference between the groups is not statistically significant

Page 51: ANOVA and Regression

T-test

• As hopefully you remember, you could have tested this same null hypothesis using a two-sample t-test
• Very similar result to previous class
• If we had assumed equal variance for our t-test, we would have gotten the same result!!!
• ANOVA results can also be obtained using regression with more than one indicator variable
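A sketch verifying the equivalence on hypothetical data: an equal-variance two-sample t-test and a regression on a 0/1 group indicator return the same p-value.

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

# Hypothetical expression values for the two groups
progressive = np.array([12.1, 15.3, 9.8, 14.2, 11.5])
relapsing = np.array([18.4, 21.0, 16.7, 25.2, 19.9])

# Two-sample t-test assuming equal variances
t_res = stats.ttest_ind(relapsing, progressive, equal_var=True)

# Regression of expression on the indicator (1 = relapsing, 0 = progressive)
y = np.concatenate([progressive, relapsing])
R = np.array([0] * 5 + [1] * 5)
ols_res = sm.OLS(y, sm.add_constant(R)).fit()

print(t_res.pvalue, ols_res.pvalues[1])   # the two p-values agree
```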

Page 52: ANOVA and Regression

Multiple regression

• A large advantage of regression is the ability to include multiple predictors of an outcome in one analysis
• A multiple regression equation looks just like a simple regression equation:

$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n + \varepsilon$$

Page 53: ANOVA and Regression

Example

• Brain parenchymal fraction (BPF) is a measure of disease severity in MS
• We would like to know if gender has an effect on BPF in MS patients
• We also know that BPF declines with age in MS patients
• Is there an effect of sex on BPF if we control for age?

Page 54: ANOVA and Regression

[Figure: scatter plot of BPF by sex (blue = males, red = females)]

Page 55: ANOVA and Regression

[Figure: scatter plot of BPF against age (blue = males, red = females)]

Page 56: ANOVA and Regression

Is age a potential confounder?

• We know that age has an effect on BPF from previous research
• We also know that male patients have a different disease course than female patients, so the age at time of sampling may also be related to sex

[Diagram: age as a potential confounder of the relationship between sex and BPF]

Page 57: ANOVA and Regression

Model

• The multiple linear regression model includes a term for both age and sex:

$$BPF_i = \beta_0 + \beta_1 \cdot \text{gender}_i + \beta_2 \cdot \text{age}_i + \varepsilon_i$$

• What values does gender_i take on?
  – gender_i = 0 if the patient is female
  – gender_i = 1 if the patient is male

Page 58: ANOVA and Regression

Expression

• Females: BPF_i = β0 + β2·age_i + εi
• Males: BPF_i = (β0 + β1) + β2·age_i + εi
• What is different about the equations?
  – Intercept
• What is the same?
  – Slope
• This model allows an effect of gender on the intercept, but not on the change with age

Page 59: ANOVA and Regression

Interpretation of coefficients

• The meaning of each coefficient:
  – β0: the average BPF when age is 0 and the patient is female
  – β1: the average difference in BPF between males and females, HOLDING AGE CONSTANT
  – β2: the average change in BPF for a one-unit increase in age, HOLDING GENDER CONSTANT
• Note that the interpretation of each coefficient requires mention of the other variables in the model

Page 60: ANOVA and Regression

Estimated coefficients

• Here is the estimated regression equation:

$$\widehat{BPF}_i = 0.942 + 0.017 \cdot \text{sex}_i - 0.0026 \cdot \text{age}_i$$

• The average difference between males and females is 0.017, holding age constant
• For every one-unit increase in age, the mean BPF decreases 0.0026 units, holding sex constant
• Are either of these effects statistically significant?
  – What is the null hypothesis?
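A sketch of this model with the statsmodels formula API; the data frame below is hypothetical, so the fitted coefficients will not reproduce 0.942, 0.017, and -0.0026.

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "BPF": [0.91, 0.88, 0.86, 0.84, 0.90, 0.87, 0.83, 0.82],
    "sex": [1, 1, 1, 1, 0, 0, 0, 0],          # 1 = male, 0 = female
    "age": [25, 35, 45, 55, 28, 38, 48, 58],
})

fit = smf.ols("BPF ~ sex + age", data=df).fit()
print(fit.params)    # intercept, sex effect holding age constant, age effect holding sex constant
print(fit.pvalues)
```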

Page 61: ANOVA and Regression

Hypothesis test

1) H0: No effect of sex, controlling for age (β1 = 0)
2) Continuous outcome, dichotomous predictor
3) Linear regression controlling for age
4) Summary statistic: β1 = 0.017
5) p-value = 0.37
6) Since the p-value is more than 0.05, we fail to reject the null hypothesis
7) We conclude that there is no significant association between sex and BPF controlling for age

Page 62: ANOVA and Regression

Hypothesis test

1) H0: No effect of age, controlling for sex (β2 = 0)
2) Continuous outcome, continuous predictor
3) Linear regression controlling for sex
4) Summary statistic: β2 = -0.0026
5) p-value = 0.004
6) Since the p-value is less than 0.05, we reject the null hypothesis
7) We conclude that there is a significant association between age and BPF controlling for sex

Page 63: ANOVA and Regression

[Figure: regression output highlighting the estimated effect and p-value for age and for sex]

Page 64: ANOVA and Regression

[Figure: scatter plot of BPF against age]

Page 65: ANOVA and Regression

Conclusions

• Although there was a marginally significant association of sex and BPF, this association was not significant after controlling for age
• The significant association between age and BPF remained statistically significant after controlling for sex

Page 66: ANOVA and Regression

What we learned (hopefully)

• ANOVA
• Correlation
• Basics of regression