Statistics and Data Analysis for Nursing Research, Second Edition
Denise F. Polit

Chapter 12: Logistic Regression

Copyright ©2010 by Pearson Education, Inc., Upper Saddle River, New Jersey 07458. All rights reserved.

Logistic Regression

• Logistic regression (logit analysis) analyzes the relationship between one or more predictor variables and a categorical dependent variable:
– Binary logistic regression, used when the outcome is dichotomous (e.g., sepsis, absence of sepsis)
– Multinomial logistic regression, used when the outcome has three or more categories (e.g., live birth, miscarriage, abortion)

• This chapter focuses on binary logistic regression

Maximum Likelihood Estimation

• In logistic regression, parameter estimation is based on maximum likelihood estimation (MLE)

• Maximum likelihood estimators are those that estimate the parameters most likely to have generated the observed sample data
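
As a rough illustration of what maximum likelihood estimation does, the sketch below fits a one-predictor logistic model by Newton-Raphson iteration in plain Python. The data, the function names, and the choice of Newton-Raphson are assumptions made for this example; statistical packages use their own, more robust routines.

```python
import math

def log_likelihood(x, y, b0, b1):
    """Log of the probability of the observed outcomes under (b0, b1)."""
    total = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        total += math.log(p) if yi == 1 else math.log(1.0 - p)
    return total

def fit_logistic(x, y, iterations=25):
    """Newton-Raphson MLE for logit(p) = b0 + b1 * x (one predictor only)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        # Gradient of the log-likelihood with respect to b0 and b1
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Information matrix (negative Hessian), a symmetric 2 x 2 matrix
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: add (information matrix inverse) times gradient
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical toy data: predictor x and a 0/1 outcome y
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic(x, y)
print(round(b0, 3), round(b1, 3), round(log_likelihood(x, y, b0, b1), 3))
```

The fitted pair (b0, b1) is the one under which the observed 0/1 pattern is most probable; any other pair gives a lower log-likelihood for these data.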

The Logit

• Logistic regression predicts the odds that an outcome will occur
– The odds of an outcome is the ratio of the probability that an event will occur to the probability that it will not

• In logistic regression, the dependent variable is transformed to be the natural log of the odds of the outcome, which is called a logit
– A logit ranges from minus infinity to plus infinity
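
The probability, odds, and logit scales can be compared with a few lines of Python. This is only an illustrative sketch; the function names `odds` and `logit` are chosen for this example.

```python
import math

def odds(p):
    """Odds = probability of the event / probability of no event."""
    return p / (1.0 - p)

def logit(p):
    """Logit = natural log of the odds."""
    return math.log(odds(p))

# Probabilities near 0 give large negative logits, near 1 large positive ones
for p in (0.10, 0.50, 0.90):
    print(p, round(odds(p), 3), round(logit(p), 3))
```

A probability of .50 corresponds to odds of 1 and a logit of 0, and the logits for p and 1 − p are equal in size and opposite in sign.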

Logistic Regression Equation

• The predicted value or left side of the equation is the logit

• The right side of the equation is the constant plus predictor variables weighted by b regression coefficients

• The equation:

$\ln\left[\frac{\text{Prob(event)}}{\text{Prob(no event)}}\right] = b_0 + b_1X_1 + b_2X_2 + \dots + b_kX_k$

Logistic Regression Equation (cont’d)

• Log odds are hard to interpret directly, so the equation is modified so that the left-hand expression is the odds, not the log odds

• The right-hand expression involves raising e (the base of natural logarithms) to the power of the right side of the equation:

$\frac{\text{Prob(event)}}{\text{Prob(no event)}} = e^{b_0 + b_1X_1 + b_2X_2 + \dots + b_kX_k}$
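
To make the step from log odds to odds (and on to a probability) concrete, here is a small sketch. The coefficients are borrowed from the Variables in the Equation panel shown later in this deck; the case values (age 30, 2 births, married) and all function names are made up for illustration.

```python
import math

def predicted_logit(b0, coefs, xs):
    """Right side of the equation: b0 + b1*X1 + ... + bk*Xk."""
    return b0 + sum(b * x for b, x in zip(coefs, xs))

def predicted_odds(b0, coefs, xs):
    """Odds = e raised to the power of the right side of the equation."""
    return math.exp(predicted_logit(b0, coefs, xs))

def predicted_probability(b0, coefs, xs):
    """Probability = odds / (1 + odds)."""
    o = predicted_odds(b0, coefs, xs)
    return o / (1.0 + o)

b0 = -4.05                   # constant
coefs = [0.07, 0.27, 0.13]   # age, births, married
case = [30, 2, 1]            # hypothetical woman: age 30, 2 births, married

print(round(predicted_logit(b0, coefs, case), 3))
print(round(predicted_odds(b0, coefs, case), 3))
print(round(predicted_probability(b0, coefs, case), 3))
```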

Odds Ratio

• The factor by which the odds change for a one-unit change in a given predictor is the odds ratio, which equals e raised to the power of that predictor's b coefficient

• The odds ratio (OR) compares the odds of the event occurring given one condition with the odds of it occurring given another condition, when other predictors in the equation are held constant

Odds Ratio Example

• The odds ratio for having a tubal ligation, predicted on the basis of whether or not a woman in this population has a child with a disability, is 1.41

• That is, the odds of having a tubal ligation were 41% higher for women who had a child with a disability than for those who did not, holding constant other factors such as age and number of prior births
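
The "41% higher" reading follows directly from the odds ratio, as this short sketch shows. The function names are illustrative, and the b coefficient it recovers is implied by OR = 1.41 rather than reported in the slides.

```python
import math

def percent_change_in_odds(odds_ratio):
    """Percent by which the odds differ between the two conditions."""
    return (odds_ratio - 1.0) * 100.0

def coefficient_from_odds_ratio(odds_ratio):
    """The b coefficient implied by an odds ratio, since OR = e**b."""
    return math.log(odds_ratio)

print(round(percent_change_in_odds(1.41), 1))       # percent higher odds
print(round(coefficient_from_odds_ratio(1.41), 3))  # implied b coefficient
```

An odds ratio below 1.0 would yield a negative percentage, meaning lower odds for the group coded 1.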

Classification

• The logistic regression equation produces predicted probabilities of the outcome for each case
– Probabilities range from .00 to 1.00

• Predicted probabilities can be used to classify cases

• The default for the cut value is .50:
– Those whose predicted probability > .50 are predicted to have the outcome; others are classified as not having the outcome
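
The classification rule amounts to a single comparison per case. In this sketch the function name `classify` and the probabilities are hypothetical.

```python
def classify(probabilities, cut=0.50):
    """Return 1 (has the outcome) when the predicted probability exceeds the cut value."""
    return [1 if p > cut else 0 for p in probabilities]

predicted = [0.20, 0.50, 0.51, 0.90]   # hypothetical predicted probabilities
print(classify(predicted))             # default cut value of .50
print(classify(predicted, cut=0.30))   # a lower cut value classifies more cases as 1
```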

Classification Table Example

Observed:             Predicted: Tubal Ligation?    Percent
Tubal Ligation?       No (0)        Yes (1)         Correct
No (0)                2017          366             84.6%
Yes (1)               946           435             31.5%
Overall Percentage                                  65.1%
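
The percentages in the classification table can be recomputed from the four cell counts. The sketch below uses the counts shown in the table; the variable names are chosen for this example.

```python
# Cell counts from the classification table: (observed, predicted) -> frequency
table = {
    ("no", "no"): 2017, ("no", "yes"): 366,
    ("yes", "no"): 946, ("yes", "yes"): 435,
}

observed_no = table[("no", "no")] + table[("no", "yes")]
observed_yes = table[("yes", "no")] + table[("yes", "yes")]

# Percent correct within each observed group, then overall
pct_correct_no = 100.0 * table[("no", "no")] / observed_no
pct_correct_yes = 100.0 * table[("yes", "yes")] / observed_yes
pct_correct_overall = (
    100.0 * (table[("no", "no")] + table[("yes", "yes")]) / (observed_no + observed_yes)
)

print(round(pct_correct_no, 1), round(pct_correct_yes, 1), round(pct_correct_overall, 1))
```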

Dependent Variable Coding

• The dependent variable is usually coded 1 (to represent an event or characteristic) or 0 (to represent the absence of the event or characteristic)

• By default, SPSS predicts membership in the category coded 1

Predictor Variables

• In logistic regression, predictors can be:
– Continuous variables (e.g., age)
– Dichotomous variables (e.g., sex)
– Categorical variables (e.g., employment status: not working (1), working part time (2), working full time (3))
– Interaction terms

Categorical Predictor Variables

• In SPSS, categorical predictors can be defined in terms of the type of contrast desired and which category to use as the reference category

• SPSS will create C - 1 new variables, where C = number of categories

Indicator Coding

• Indicator coding
– C - 1 new “dummy” variables are created, all with codes of 1 or 0
– Reference group is coded 0 on all new variables
– Coefficients for new variables represent the effect of each category compared to the reference category

Deviation Coding

• Deviation coding
– C - 1 new variables are created
– Presence of attribute coded 1, absence coded 0, BUT
– Reference group is coded -1 on all new variables
– Coefficients for new variables represent the effect of each category compared to average effects for all categories

Deviation Coding Example

                                  Parameter Coding
Marital Status        Frequency   Category 1   Category 2
Married               300         1            0
Divorced, Widowed     100         0            1
Never Married         100         -1           -1
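
The two coding schemes differ only in how the reference category is coded, as this sketch shows for the marital status example. The helper names are illustrative and "Never Married" is taken as the reference category, matching the table; this is not SPSS's own implementation.

```python
CATEGORIES = ["Married", "Divorced, Widowed", "Never Married"]

def indicator_codes(category, categories, reference):
    """C - 1 dummy variables; the reference group is coded 0 on all of them."""
    others = [c for c in categories if c != reference]
    return [1 if category == c else 0 for c in others]

def deviation_codes(category, categories, reference):
    """C - 1 variables; the reference group is coded -1 on all of them."""
    others = [c for c in categories if c != reference]
    if category == reference:
        return [-1] * len(others)
    return [1 if category == c else 0 for c in others]

for cat in CATEGORIES:
    print(cat,
          indicator_codes(cat, CATEGORIES, "Never Married"),
          deviation_codes(cat, CATEGORIES, "Never Married"))
```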

Entering Predictors in Logistic Regression

• Same entry options as in least-squares multiple regression:
– Simultaneous (direct), all predictors entered at once
– Hierarchical (sequential), researcher controls order of entry in blocks
– Stepwise, forward selection or backward elimination of variables using statistical criteria (likelihood ratio criterion preferred)

Testing the Overall Model

• There are several approaches to testing the overall goodness of fit of the data to the hypothesized model of predictors

• The standard approach involves calculating the likelihood index, which is the probability of the observed results
– MLE seeks to maximize the likelihood; iterations stop when the likelihood no longer increases appreciably

• The likelihood is most often shown in transformed form, as -2 times its natural log (-2LL)
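
For a concrete sense of -2LL, the sketch below computes it from 0/1 outcomes and predicted probabilities. The five cases and both sets of probabilities are hypothetical; the null-model probability is simply the observed proportion of events.

```python
import math

def neg2_log_likelihood(y, p):
    """-2 times the natural log of the likelihood of the observed outcomes."""
    log_likelihood = sum(
        math.log(pi) if yi == 1 else math.log(1.0 - pi)
        for yi, pi in zip(y, p)
    )
    return -2.0 * log_likelihood

y = [1, 0, 1, 1, 0]                    # hypothetical outcomes
p_null = [0.6] * 5                     # constant-only model: 3 of 5 cases had the event
p_model = [0.8, 0.3, 0.7, 0.9, 0.2]    # hypothetical probabilities from a model with predictors

neg2ll_null = neg2_log_likelihood(y, p_null)
neg2ll_model = neg2_log_likelihood(y, p_model)
print(round(neg2ll_null, 3), round(neg2ll_model, 3))
```

Smaller values of -2LL indicate a better fit, so a useful set of predictors should bring -2LL below its null-model value.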

Likelihood Ratio Test

• The likelihood ratio test (chi-square goodness-of-fit test) involves subtracting -2LL for the model with predictors from -2LL for the null model (constant-only model)

• The null model estimates the outcome without any predictors:
– In the absence of other information, the null model predicts that everyone has the outcome that is most prevalent

Likelihood Ratio Test (cont’d)

• Basic likelihood ratio statistic:

$\chi^2 = (-2LL_{\text{reduced model}}) - (-2LL_{\text{larger model}})$

• The null hypothesis is rejected if the chi-square value is statistically significant

• The likelihood ratio test can also be used to evaluate the significance of improvements to the model when new predictors are added (e.g., in hierarchical regression)
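
The likelihood ratio statistic is a simple subtraction, sketched here. The two -2LL values are hypothetical, the function names are illustrative, and significance is judged against standard .05 critical values of chi-square, with degrees of freedom equal to the number of parameters added to the reduced model.

```python
# Standard critical values of chi-square at alpha = .05, keyed by degrees of freedom
CHI2_CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}

def likelihood_ratio_chi2(neg2ll_reduced, neg2ll_larger):
    """Chi-square = (-2LL for the reduced model) - (-2LL for the larger model)."""
    return neg2ll_reduced - neg2ll_larger

def is_significant(chi2, df):
    """True when chi-square exceeds the .05 critical value for df."""
    return chi2 > CHI2_CRITICAL_05[df]

# Hypothetical example: adding two predictors lowers -2LL from 520.0 to 500.0
chi2 = likelihood_ratio_chi2(520.0, 500.0)
print(chi2, is_significant(chi2, df=2))

# The omnibus chi-square of 450.567 with 5 df reported later in this deck
print(is_significant(450.567, df=5))
```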

SPSS Omnibus Model Test

• SPSS presents the likelihood ratio test results in a panel called Omnibus Tests of Model Coefficients

• There is a statistic for the overall model, and for individual steps and blocks (although these are not always relevant)

          Chi-Square   df   Sig.
Step      450.567      5    .000
Block     450.567      5    .000
Model     450.567      5    .000

Hosmer-Lemeshow Test

• An alternative approach to testing the overall model: Comparing the prediction model to a hypothetically “perfect” model

• One such test is the Hosmer-Lemeshow test, which involves sorting people in the two outcome categories into deciles of risk, based on their predicted probability values

Hosmer-Lemeshow Test (cont’d)

• The 10 deciles for the two outcome categories result in a 2 × 10 matrix, for which observed and expected frequencies are compared; then, a chi-square statistic is computed

• The desirable outcome is nonsignificance, which supports the inference that the model being tested is not reliably different from the perfect model
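
The decile-of-risk comparison can be sketched as follows. This is a simplified version for illustration, not SPSS's exact algorithm: cases are sorted by predicted probability, cut into equal-sized groups, and observed and expected counts are compared in each of the 2 × 10 cells. The data are simulated with a fixed seed, and the statistic is referred to chi-square with (number of groups - 2) degrees of freedom.

```python
import random

def hosmer_lemeshow(y, p, groups=10):
    """Return (chi-square, df) comparing observed and expected counts by risk group."""
    pairs = sorted(zip(p, y), key=lambda pair: pair[0])  # order cases by predicted risk
    n = len(pairs)
    chi2 = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        observed_events = sum(yi for _, yi in chunk)
        expected_events = sum(pi for pi, _ in chunk)
        observed_nonevents = len(chunk) - observed_events
        expected_nonevents = len(chunk) - expected_events
        chi2 += (observed_events - expected_events) ** 2 / expected_events
        chi2 += (observed_nonevents - expected_nonevents) ** 2 / expected_nonevents
    return chi2, groups - 2

# Simulated data: outcomes drawn from the predicted probabilities themselves,
# so the "model" is well calibrated by construction
random.seed(0)
p = [(i + 0.5) / 200 for i in range(200)]
y = [1 if random.random() < pi else 0 for pi in p]

chi2, df = hosmer_lemeshow(y, p)
print(round(chi2, 2), df)   # compare with 15.507, the .05 critical value for 8 df
```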

Hosmer-Lemeshow Test (cont’d)

• The Hosmer-Lemeshow test is preferred by some to the likelihood ratio goodness-of-fit test, BUT:
– The test is not recommended for use with small samples (< 400 cases) or when small cell frequencies are expected
– Also, it can result in significance with large samples even when the model fits well

Wald Statistic

• Individual predictors are most often evaluated using the Wald statistic

• When predictors have 1 degree of freedom, the squared Wald statistic, which is distributed as chi-square, is:

$\left(\frac{b}{SE_b}\right)^2$

– Where b = regression coefficient and $SE_b$ = its standard error
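
The Wald computation for a 1-df predictor can be sketched in a few lines. The b and SE values are those shown for Births in the Variables in the Equation panel later in this deck; because that panel reports rounded values, the recomputed Wald statistic and odds ratio will only approximate the tabled ones. The function names and the 1.96 multiplier for a 95% confidence interval are choices made for this example.

```python
import math

def wald_chi2(b, se):
    """Squared Wald statistic for a predictor with 1 degree of freedom."""
    return (b / se) ** 2

def p_value_df1(chi2):
    """Upper-tail probability of a chi-square value with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))

def odds_ratio_ci(b, se, z=1.96):
    """Odds ratio, Exp(B), with an approximate 95% confidence interval."""
    return math.exp(b - z * se), math.exp(b), math.exp(b + z * se)

b, se = 0.27, 0.025   # Births row of the output panel (rounded values)
wald = wald_chi2(b, se)
lower, odds_ratio, upper = odds_ratio_ci(b, se)
print(round(wald, 2), p_value_df1(wald))
print(round(lower, 3), round(odds_ratio, 3), round(upper, 3))
```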

Wald Statistic (cont’d)

• A Wald statistic is computed for each predictor or interaction term, indicating whether the b coefficient is statistically different from zero

• When there are categorical variables, a Wald statistic is computed for the overall variable, and for each new variable representing the desired contrast

Wald Statistic Problems

• When the absolute value of a b coefficient is large, its standard error tends to be inflated, which lowers the Wald statistic and results in a heightened risk of a Type II error

Wald Statistic Problems (cont’d)

• Thus, some prefer the likelihood ratio improvement test to evaluate individual predictors:
– This requires each predictor to be added in successive blocks, to evaluate whether reductions to -2LL are significant
– This approach should always be used if the absolute value of a b coefficient is large

Wald Statistic Output

• SPSS presents b, SE, Wald, and odds ratios [Exp(B)] in the panel called “Variables in the Equation”

             b       SE      Wald     df   Sig.   Exp(B)
Age          .07     .006    180.2    1    .000   1.077
Births       .27     .025    118.5    1    .000   1.32
Married      .13     .101    1.65     1    .241   .98
Constant     -4.05   .212    370.4    1    .000

Classification Success

• Another way to consider the success of the model: Its ability to correctly classify sample members for whom the outcome is known

• Compare the overall percent correctly classified with predictors to the percent correctly classified with the null model

• Also, compare improvement in classification for those in an important outcome group

Effect Size

• There is no ideal measure of overall effect size in logistic regression

• Several pseudo $R^2$ statistics have been proposed, including:
– Cox and Snell $R^2$: Not ideal, can never achieve the value of 1.0
– Nagelkerke $R^2$: Can range from .00 to 1.00, and is a preferred index

• These indexes are approximations to $R^2$ but should not be interpreted as a proportion of the variance explained
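
Both pseudo R² indexes can be computed from -2LL for the null model, -2LL for the fitted model, and the sample size. The sketch below uses the standard formulas (Cox and Snell as 1 minus the likelihood ratio raised to the power 2/n, Nagelkerke as Cox and Snell divided by its maximum attainable value); the input values are hypothetical.

```python
import math

def cox_snell_r2(neg2ll_null, neg2ll_model, n):
    """Cox and Snell R2 = 1 - exp(-(improvement in -2LL) / n)."""
    return 1.0 - math.exp((neg2ll_model - neg2ll_null) / n)

def nagelkerke_r2(neg2ll_null, neg2ll_model, n):
    """Cox and Snell R2 rescaled by its maximum so that 1.0 is attainable."""
    maximum = 1.0 - math.exp(-neg2ll_null / n)
    return cox_snell_r2(neg2ll_null, neg2ll_model, n) / maximum

# Hypothetical values: -2LL of 520.0 (null) and 500.0 (model), 400 cases
cs = cox_snell_r2(520.0, 500.0, 400)
nk = nagelkerke_r2(520.0, 500.0, 400)
print(round(cs, 4), round(nk, 4))
```

Because the maximum of the Cox and Snell index is below 1.0, the Nagelkerke value is always the larger of the two for the same model.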

Sample Size

• For stability of the parameter estimates, there should be at least 15 (but preferably 20+) cases for each predictor in the model

• Power analysis in logistic regression can be done, but is complex:
– A crude approximation to achieve adequate power: Base sample size estimation on the expected relationship between the outcome and a single important predictor, preferably the one with the most modest relationship to the outcome

Assumptions in Logistic Regression

• Assumptions in logistic regression are much less restrictive than in least-squares regression

• Does not assume:
– Linear relationship between dependent variable and predictors
– Normal distribution (multivariate normality)
– Homoscedasticity (homogeneity of variances)

Assumptions in Logistic Regression (cont’d)

• BUT, there are two important assumptions:
– Independence of observations (i.e., data are not from a repeated measures or pair-matched design)
– Linearity between the continuous predictors and the logit

• Violation increases risk of a Type II error

Linearity Assumption

• Linearity assumption is not easy to test

• One approach is a logit step test:
– Divide a continuous variable into equal-interval categories
– Enter this new variable as a categorical variable, with indicator coding
– Examine resulting b coefficients to see if the increase or decrease in magnitude of the coefficients is approximately linear
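
A simplified version of the logit step test can be sketched without fitting software. It relies on one assumption worth stating plainly: when the binned variable is the only predictor, the indicator-coded b coefficients equal the difference between each category's observed logit and the reference category's logit. With other predictors in the model, the coefficients would have to come from an actual logistic regression. The data and function names are hypothetical.

```python
import math

def equal_interval_bin(value, lower, width, n_bins):
    """Index of the equal-interval category that contains value."""
    return min(int((value - lower) // width), n_bins - 1)

def step_coefficients(x, y, lower, width, n_bins):
    """Indicator-coded b coefficients (first bin as reference), single-predictor case."""
    events = [0] * n_bins
    totals = [0] * n_bins
    for xi, yi in zip(x, y):
        k = equal_interval_bin(xi, lower, width, n_bins)
        totals[k] += 1
        events[k] += yi
    # Observed logit in each bin (each bin needs both events and non-events)
    logits = [math.log(e / (t - e)) for e, t in zip(events, totals)]
    return [lg - logits[0] for lg in logits[1:]]

# Hypothetical data: 100 women per decade of age, with 10, 18, 30, and 46 events
x, y = [], []
for decade_start, n_events in [(20, 10), (30, 18), (40, 30), (50, 46)]:
    for i in range(100):
        x.append(decade_start + i % 10)
        y.append(1 if i < n_events else 0)

coefs = step_coefficients(x, y, lower=20, width=10, n_bins=4)
steps = [coefs[0]] + [b - a for a, b in zip(coefs, coefs[1:])]
print([round(c, 3) for c in coefs])
print([round(s, 3) for s in steps])   # roughly equal steps suggest linearity in the logit
```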

Other Potential Problems

• Use predictors with minimal level of measurement error

• Avoid multicollinearity (high correlations among predictors)

• Avoid outliers:
– An outlier in logistic regression is a case for which the absolute value of the standardized residual is large (greater than 2.58 or, less conservatively, 3.0)
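
Flagging outliers by this rule can be sketched as below. The slides do not define the standardized residual, so the sketch assumes the usual form for logistic regression: the raw residual (observed outcome minus predicted probability) divided by the square root of p(1 - p). The outcomes, probabilities, and function names are hypothetical.

```python
import math

def standardized_residual(y, p):
    """Residual divided by its estimated standard deviation (assumed definition)."""
    return (y - p) / math.sqrt(p * (1.0 - p))

def flag_outliers(ys, ps, threshold=2.58):
    """Indexes of cases whose absolute standardized residual exceeds the threshold."""
    return [
        i for i, (y, p) in enumerate(zip(ys, ps))
        if abs(standardized_residual(y, p)) > threshold
    ]

ys = [1, 0, 1, 0]               # hypothetical observed outcomes
ps = [0.05, 0.30, 0.60, 0.90]   # hypothetical predicted probabilities
print(flag_outliers(ys, ps))    # cases the model predicted badly
```

Cases are flagged when the observed outcome is one the model considered very unlikely, such as an event for a case with a predicted probability of .05.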

SPSS and Logistic Regression

• Analyze → Regression → Binary Logistic

• Move outcome into Dependent slot

• Move predictors into slot for Covariates

SPSS and Logistic Regression (cont’d)

• Hierarchical entry: define new blocks using “Next” button

• To define categorical variable contrasts, push Categorical button

• To select statistical and display options, push Options button

SPSS Logistic Regression: Categorical Dialog Box

• Move categorical variables into slot for Categorical variables

• Select type of contrast

• Select reference category

• Remember to click Change

SPSS Logistic Regression: Options Dialog Box

• Useful options:
– Hosmer-Lemeshow test
– Iteration history (to see -2LL for null model)
– Listing of outliers via residuals
– Confidence intervals around odds ratios

SPSS Logistic Regression: Save Dialog Box

• Use the Save dialog box to add new variables to the original data file for each case

• Especially useful:
– Predicted probabilities
– Predicted classification on the outcome (group membership)