Unit 4b: Fitting the Logistic Model to Data
http://xkcd.com/953/
• Building the Logistic Regression Model
• Likelihood Ratio Chi-Square and Pseudo-R²
• Interpreting Logistic Regression Model Coefficients
• Probabilities, Odds, Log Odds, and Log Odds Ratios
Course Roadmap: Unit 4b

Multiple Regression Analysis (MRA): $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$

Do your residuals meet the required assumptions? Test for residual normality; use influence statistics to detect atypical data points.
• If your residuals are not independent, replace OLS with GLS regression analysis, use individual growth modeling, or specify a multilevel model.
• If time is a predictor, you need discrete-time survival analysis.
• If your outcome is categorical, you need to use binomial logistic regression analysis (dichotomous outcome) or multinomial logistic regression analysis (polytomous outcome). This is today's topic area.
• If you have more predictors than you can deal with, create taxonomies of fitted models and compare them; form composites of the indicators of any common construct; conduct a Principal Components Analysis; use Cluster Analysis; or use Factor Analysis (EFA or CFA?).
• If your outcome vs. predictor relationship is non-linear, use non-linear regression analysis or transform the outcome or predictor.
[Figure: Is Woman a Homemaker? (0/1) vs. Husband's Annual Salary (in $1,000)]
The Bivariate Distribution of HOME on HUBSAL
[Figure: HOME vs. Husband's Annual Salary (in $1,000); points labeled "Homemaker" and "In Labor Force"]

RQ: In 1976, were married Canadian women who had children at home and husbands with higher salaries more likely to work at home rather than join the labor force (when compared to their married peers with no children at home and husbands who earned less)?
The Logistic Regression Model

$$P(HOME = 1) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 HUBSAL)}}$$

We consider the non-linear Logistic Regression Model for representing the hypothesized population relationship between the dichotomous outcome, HOME, and predictors. This will be our statistical model for relating a categorical outcome to predictors. We will fit it to data using Nonlinear Regression Analysis.

The outcome being modeled is the underlying probability that the value of the outcome HOME equals 1. Parameter $\beta_1$ determines the slope of the curve, but is not equal to it (in fact, the slope is different at every point on the curve). Parameter $\beta_0$ determines the intercept of the curve, but is not equal to it.
Building the Logistic Regression Model: The Unconditional Model
To gain our footing, we can fit an unconditional logistic model:

. logit HOME

Iteration 0:   log likelihood = -263.22441
Iteration 1:   log likelihood = -263.22441

Logistic regression                               Number of obs   =        434
                                                  LR chi2(0)      =       0.00
                                                  Prob > chi2     =          .
Log likelihood = -263.22441                       Pseudo R2       =     0.0000

------------------------------------------------------------------------------
        HOME |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |   .8715548   .1052638     8.28   0.000     .6652415    1.077868
------------------------------------------------------------------------------

This should look familiar: it corresponds to our unconditional percentage of women who are homemakers in our sample:

. summarize HOME

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        HOME |       434    .7050691     .456538          0          1
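As a quick check (a minimal sketch, do-file style; logit() and invlogit() are built-in Stata functions), the constant is simply the sample proportion expressed on the logit scale:

quietly summarize HOME
display logit(r(mean))        // .8715548: the sample proportion on the logit scale
quietly logit HOME
display invlogit(_b[_cons])   // .7050691: back to the sample proportion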
We recall from multilevel modeling that we estimate by maximizing the likelihood: "maximum likelihood."
Because the likelihood is a product of many, many small probabilities, we instead maximize the sum of log-likelihoods, trying to bring a negative number as close to zero (as "positive") as possible.
Later, we’ll use the difference in -2*loglikelihoods (the deviance) in a statistical test to compare models.
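The deviance can be computed directly from Stata's stored results after estimation (a small sketch using e(ll)):

quietly logit HOME
display -2*e(ll)              // ≈ 526.45, the deviance of the unconditional model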
Building the Logistic Regression Model
. logit HOME HUBSAL

Iteration 0:   log likelihood = -263.22441
Iteration 1:   log likelihood = -252.20292
Iteration 2:   log likelihood = -252.02492
Iteration 3:   log likelihood = -252.02479
Iteration 4:   log likelihood = -252.02479

Logistic regression                               Number of obs   =        434
                                                  LR chi2(1)      =      22.40
                                                  Prob > chi2     =     0.0000
Log likelihood = -252.02479                       Pseudo R2       =     0.0425

------------------------------------------------------------------------------
        HOME |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      HUBSAL |   .0808408   .0184165     4.39   0.000     .0447451    .1169364
       _cons |  -.2371923   .2626906    -0.90   0.367    -.7520565    .2776718
------------------------------------------------------------------------------
Our fitted model:
$$\hat{P}(HOME = 1) = \frac{1}{1 + e^{-(-.237 + .081\,HUBSAL)}}$$
Before we interpret these coefficients directly, it is generally easiest to visualize the fitted model graphically.
We notice that our log likelihood is less negative than before (a better fit, from -263 to -252), but the model took a bit longer to converge (increased complexity given the predictor).
We can show that the deviance (-2 × log likelihood) decreases from 526 to 504.
[Figure: the fitted logistic curve, Is Woman a Homemaker? (0/1) vs. Husband's Annual Salary (in $1,000)]
Graphical Interpretation of the Logistic Regression Model

$$\hat{P}(HOME = 1) = \frac{1}{1 + e^{-(-.237 + .081\,HUBSAL)}}$$

[Figure: Is Woman a Homemaker? (0/1) vs. Husband's Annual Salary (in $1,000)] Comparing local polynomial, linear, and logistic fits to the data.
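One way to build such a comparison plot in Stata (a sketch, do-file style; the overlay choices and the variable name phat are ours):

quietly logit HOME HUBSAL
predict phat, pr                      // fitted probabilities from the logistic model
twoway (scatter HOME HUBSAL)      ///
       (lpoly HOME HUBSAL)        /// local polynomial fit
       (lfit HOME HUBSAL)         /// linear (OLS) fit
       (line phat HUBSAL, sort)       // logistic fit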
The Likelihood Ratio Chi-Square
. logit HOME

Iteration 0:   log likelihood = -263.22441
Iteration 1:   log likelihood = -263.22441

Logistic regression                               Number of obs   =        434
                                                  LR chi2(0)      =       0.00
                                                  Prob > chi2     =          .
Log likelihood = -263.22441                       Pseudo R2       =     0.0000

------------------------------------------------------------------------------
        HOME |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |   .8715548   .1052638     8.28   0.000     .6652415    1.077868
------------------------------------------------------------------------------
Our Log Likelihood from our baseline model, with no predictors, is -263.22.
Deviance = -2 × log likelihood = 526.44
. logit HOME HUBSAL

Iteration 0:   log likelihood = -263.22441
Iteration 1:   log likelihood = -252.20292
Iteration 2:   log likelihood = -252.02492
Iteration 3:   log likelihood = -252.02479
Iteration 4:   log likelihood = -252.02479

Logistic regression                               Number of obs   =        434
                                                  LR chi2(1)      =      22.40
                                                  Prob > chi2     =     0.0000
Log likelihood = -252.02479                       Pseudo R2       =     0.0425

------------------------------------------------------------------------------
        HOME |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      HUBSAL |   .0808408   .0184165     4.39   0.000     .0447451    .1169364
       _cons |  -.2371923   .2626906    -0.90   0.367    -.7520565    .2776718
------------------------------------------------------------------------------
Our log likelihood from our 1-predictor model is -252.02. The log likelihood of the data is less negative (more likely) given the model parameter estimates.
Deviance = -2 × log likelihood = 504.04. The deviance has dropped (and will always drop as terms are added).
The drop in deviance is 526.44 − 504.04 = 22.40. This drop in deviance is chi-square distributed. A difference in logs is a log of a ratio, hence "likelihood ratio chi-square." Degrees of freedom are equal to the difference in the number of terms in the models (in this case, from 0 to 1 predictors, so 1 degree of freedom). Because we are comparing this to the baseline model, this is an omnibus test of the null hypothesis that all coefficients are 0, which we can reject: $\chi^2(1) = 22.40$, $p < .0001$. We can generalize the likelihood-ratio test to compare any nested models.
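In Stata, the same comparison generalizes via lrtest (a sketch, do-file style; the stored-estimates name null is ours):

quietly logit HOME
estimates store null                  // baseline (unconditional) model
quietly logit HOME HUBSAL
lrtest null .                         // LR chi2(1) = 22.40, Prob > chi2 = 0.0000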
The Pseudo-R² Statistic

[Figure: Is Woman a Homemaker? (0/1) vs. Husband's Annual Salary (in $1,000)]
We could calculate our usual $R^2$ statistic: one minus the sum of squared residuals over the original variation of $Y$,
$$R^2 = 1 - \frac{\sum_i (Y_i - \hat{Y}_i)^2}{\sum_i (Y_i - \bar{Y})^2}$$
This statistic isn't that meaningful when $Y$ is constrained to be dichotomous, as you can see from the graph.
Instead, we define another statistic, the "Pseudo-$R^2$," as the proportional reduction in deviance from the unconditional model, or the proportional increase in log likelihood over the unconditional model:
$$R^2_{\text{pseudo}} = 1 - \frac{\log L_{\text{model}}}{\log L_{\text{null}}} = 1 - \frac{-252.02}{-263.22} = .0425$$
4.25% of the unconditional model deviance has been accounted for by the predictors.
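A sketch reproducing the Pseudo-R² from Stata's stored results (e(ll) and e(ll_0) are the fitted and constant-only log likelihoods):

quietly logit HOME HUBSAL
display 1 - e(ll)/e(ll_0)             // .0425, matching the reported Pseudo R2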
Interpreting Model Results Graphically, Formulaically
We have emphasized graphical interpretation of results throughout this course, particularly for interactions and nonlinear relationships.
We can always pick a handful of prototypical points and describe model implications.
Husband's income in       Estimated probability that
1976 Canadian Dollars     the wife is a homemaker
$10,000                   64%
$20,000                   80%
$30,000                   90%
$40,000                   95%
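These prototypical values can be generated in one command with margins (a sketch, do-file style):

quietly logit HOME HUBSAL
margins, at(HUBSAL=(10(10)40))        // predicted probabilities: .64, .80, .90, .95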
Interpreting Logistic Model Parameter Estimates – Interpreting Sign
. logit HOME HUBSAL, nolog

Logistic regression                               Number of obs   =        434
                                                  LR chi2(1)      =      22.40
                                                  Prob > chi2     =     0.0000
Log likelihood = -252.02479                       Pseudo R2       =     0.0425

------------------------------------------------------------------------------
        HOME |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      HUBSAL |   .0808408   .0184165     4.39   0.000     .0447451    .1169364
       _cons |  -.2371923   .2626906    -0.90   0.367    -.7520565    .2776718
------------------------------------------------------------------------------
But we must also be able to interpret parameter estimates directly, given their prominent placement in tables.
Direct interpretation of parameter estimates can be difficult with interactions and nonlinear relationships. It’s certainly difficult here.
Positive coefficients imply that positive increments in $X$ predict greater probabilities that $Y = 1$, if all else in the model can be held constant.
Positive constants imply that the predicted probability that $Y = 1$ exceeds 50% when all $X = 0$.
Probability, Odds, and Log-Odds: Formulaically

Ten objects; is each an Easter Egg? (0 = no; 1 = yes): 1 0 1 0 1 1 0 1 0 1

Probability of picking an Easter Egg at random:
$$p = \frac{6}{10} = .6$$

Odds of picking an Easter Egg (vs. not an Easter Egg):
$$\frac{p}{1-p} = \frac{.6}{.4} = \frac{3}{2} = 1.5$$

Log-odds of picking an Easter Egg (vs. not an Easter Egg):
$$\log_e(\text{Odds}) = \log_e(1.5) = 0.405$$
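The same arithmetic, checked in Stata (a trivial sketch):

display 6/10                          // probability = .6
display .6/.4                         // odds = 1.5
display ln(1.5)                       // log-odds = .40546511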
Probability, Odds, and Log-Odds: By Range

One issue with probabilities is that their range of admissible values is restricted to falling between 0 and 1. This was one of our clues that a linear model would be inappropriate. The logit transformation stretches the probability scale, facilitating a linear relationship.

Quantity                 Formula             Theoretical Minimum    Theoretical Maximum
Probability              p                   0                      1
Odds                     p/(1-p)             0                      +∞
Log(Odds) or "logit"     log_e(p/(1-p))      -∞                     +∞

Notice that a log-odds transformation of a probability leads to a scale with an unrestricted range.
From Probabilities to Odds
Odds = p/(1-p): How much more likely is it that Y = 1 than that Y = 0?
Percentage   Probability   Odds (fraction)   Odds (decimal)
10%          0.10          1/9               0.11
25%          0.25          1/3               0.33
50%          0.50          1/1               1
75%          0.75          3/1               3
90%          0.90          9/1               9
From Probabilities to Log-Odds (Logits)
$$\text{Logits} = \log_e\left(\frac{p}{1-p}\right)$$
Percentage Probability Logits
10% 0.10 -2.2
25% 0.25 -1.1
50% 0.50 0
75% 0.75 1.1
90% 0.90 2.2
Try to remember that logits of ±1 correspond to probabilities around 25%/75%, and logits of ±2 to probabilities around 10%/90%.
Note that the logit transformation stretches extreme probabilities and compresses central probabilities.
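Stata's built-in logit() and invlogit() functions make these conversions easy to verify (a quick sketch):

display logit(.75)                    // 1.0986123: probability .75 on the logit scale
display invlogit(2)                   // .88079708: a logit of 2 is roughly 90%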
The Logistic Function as the Inverse of the Logit Function
If $\log_e\left(\frac{p}{1-p}\right) = x$, then what is $p$? Solving for $p$ gives $p = \frac{1}{1+e^{-x}}$: the logistic function is the inverse of the logit function.

Recall that our logistic regression model is:
$$P(HOME = 1) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 HUBSAL)}}$$

If $P(HOME=1)$ is like $p$, and $\beta_0 + \beta_1 HUBSAL$ is like $x$, then we can reexpress the logistic regression model in terms of log-odds.

Our logistic regression model is a linear model for the log-odds that $HOME = 1$:
$$\log_e\left(\frac{P(HOME=1)}{1 - P(HOME=1)}\right) = \beta_0 + \beta_1 HUBSAL$$
. logit HOME

Iteration 0:   log likelihood = -263.22441
Iteration 1:   log likelihood = -263.22441

Logistic regression                               Number of obs   =        434
                                                  LR chi2(0)      =       0.00
                                                  Prob > chi2     =          .
Log likelihood = -263.22441                       Pseudo R2       =     0.0000

------------------------------------------------------------------------------
        HOME |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |   .8715548   .1052638     8.28   0.000     .6652415    1.077868
------------------------------------------------------------------------------
Revisiting our baseline, unconditional model, we can interpret our constant, 0.87, on the logit scale.
We recall that a logit of 1 is a probability around 75%, so we aren't surprised that:
$$\hat{p} = \frac{1}{1 + e^{-0.8716}} = .705$$

A Linear Model for the Log Odds that HOME = 1
[Figure: two panels. Left: Fitted Probability that HOME = 1 vs. Husband's Annual Salary (in $1,000). Right: Logit-Transformed Fitted Probability that HOME = 1 vs. Husband's Annual Salary (in $1,000); on the logit scale, the fitted relationship is a straight line.]
General Relationship: $\log_e\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 X$
Our Model: $\log_e\left(\frac{\hat{p}}{1-\hat{p}}\right) = -.237 + .081\,HUBSAL$
. logit HOME HUBSAL, nolog

Logistic regression                               Number of obs   =        434
                                                  LR chi2(1)      =      22.40
                                                  Prob > chi2     =     0.0000
Log likelihood = -252.02479                       Pseudo R2       =     0.0425

------------------------------------------------------------------------------
        HOME |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      HUBSAL |   .0808408   .0184165     4.39   0.000     .0447451    .1169364
       _cons |  -.2371923   .2626906    -0.90   0.367    -.7520565    .2776718
------------------------------------------------------------------------------
Interpreting Coefficients in Terms of Logits (Log-Odds)
Our constant term is the estimated log-odds that $HOME = 1$ (the woman is a homemaker) when all predictors equal 0 (when the husband's salary is 0).
We remember that a logit of 0 is a probability of 50%, and a logit of -1 is around 25%.
If we want the exact probability: $\hat{p} = \frac{1}{1 + e^{-(-.237)}} = .44$.
For two observations that differ by 1 unit on $X$, $\hat{\beta}_1$ is the estimated difference in their log-odds that $Y = 1$.
For two women whose husbands' salaries differ by $1,000, the estimated difference in the log-odds that they are homemakers is .081.
Recall that shifting logits from 0 to 1 takes you from 50% to around 75%, and from 1 to 2 from around 75% to around 90%.
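Both exact probabilities can be recovered from the stored coefficients (a sketch using invlogit(), do-file style):

quietly logit HOME HUBSAL
display invlogit(_b[_cons])                   // .44: fitted probability when HUBSAL = 0
display invlogit(_b[_cons] + 10*_b[HUBSAL])   // .64: fitted probability when HUBSAL = 10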
Interpreting Model Results in Terms of Odds
Let’s take a look at the fitted odds that the woman is a homemaker when her husband’s annual salary is $10K.
Odds = $\frac{\hat{p}}{1-\hat{p}} = e^{\hat{\beta}_0 + \hat{\beta}_1 X}$; in this case, $e^{-.237 + .081(10)} = e^{.571} = 1.77$.
When the husband earns $10K/year, the fitted odds that the woman is a homemaker is 1.77 to 1.
When the husband earns $10K/year, for every woman in the workforce, we estimate that 1.77 are homemakers.
When the husband earns $10K/year, the estimated probability that the woman is a homemaker is 1.77 times the estimated probability that the woman works outside the home.
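Because the fitted odds are the exponentiated linear predictor, they can be computed directly (a sketch, do-file style):

quietly logit HOME HUBSAL
display exp(_b[_cons] + 10*_b[HUBSAL])        // 1.77: fitted odds when HUBSAL = 10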
Interpreting Model Results in Terms of Odds Ratios
Husband's income in       Estimated probability that   Estimated odds that       Estimated
1976 Canadian Dollars     the wife is a homemaker      the wife is a homemaker   Odds Ratio
$10,000                   64%                          1.77                      2.248
$20,000                   80%                          3.99                      2.248
$30,000                   90%                          8.96                      2.248
$40,000                   95%                          20.15
We can calculate the ratio of odds at regular intervals: How much greater are the odds that a wife is a homemaker when the husband's salary is $20,000 vs. $10,000? This odds ratio is 3.99/1.77 ≈ 2.248.
$$\frac{p}{1-p} = \frac{.64}{.36} = 1.77 \qquad\qquad \frac{p}{1-p} = \frac{.80}{.20} = 3.99$$
This is not a typo! Successive odds ratios are constant!
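A check of the constant odds ratio per $10,000 (a sketch; note the table's 2.248 uses the rounded slope .081, while the unrounded slope gives 2.244):

quietly logit HOME HUBSAL
display exp(10*_b[HUBSAL])                    // 2.2443: odds ratio per $10,000 increment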
From Log-Odds to Odds Ratios
Let's say that $\log_e\left(\frac{p}{1-p}\right) = \log_e(\text{Odds}_X) = \beta_0 + \beta_1 X$.
Let's try to add 1 to $X$ and see what happens:
$$\log_e(\text{Odds}_{X+1}) = \beta_0 + \beta_1(X+1) = \beta_0 + \beta_1 X + \beta_1$$
Subtracting the first equation from the second:
$$\log_e(\text{Odds}_{X+1}) - \log_e(\text{Odds}_X) = \beta_1$$
And, since the difference in logs is a log of a ratio:
$$\log_e\left(\frac{\text{Odds}_{X+1}}{\text{Odds}_X}\right) = \beta_1 \quad\Rightarrow\quad \frac{\text{Odds}_{X+1}}{\text{Odds}_X} = e^{\beta_1}$$
Thus, exponentiating the slope, $e^{\hat{\beta}_1}$, gives a constant "odds ratio," the multiplicative factor by which the odds increment for a unit increment in $X$.
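Stata reports odds ratios directly with the or option; exponentiating the stored slope gives the same factor (a sketch, do-file style):

logit HOME HUBSAL, or nolog                   // coefficients displayed as odds ratios
display exp(_b[HUBSAL])                       // 1.0842: the odds ratio per $1,000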
. logit HOME HUBSAL, nolog

Logistic regression                               Number of obs   =        434
                                                  LR chi2(1)      =      22.40
                                                  Prob > chi2     =     0.0000
Log likelihood = -252.02479                       Pseudo R2       =     0.0425

------------------------------------------------------------------------------
        HOME |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      HUBSAL |   .0808408   .0184165     4.39   0.000     .0447451    .1169364
       _cons |  -.2371923   .2626906    -0.90   0.367    -.7520565    .2776718
------------------------------------------------------------------------------
Four Ways to Interpret Slope Coefficients in a Logistic Regression Model
Odds Ratios: Two wives whose husbands' 1976 salaries differ by $1,000 have fitted odds of being a homemaker that differ by a factor of $e^{.081} = 1.084$, or a predicted 8.4% increment in fitted odds for a unit increment in HUBSAL.
Pick Prototypical Odds: Estimated odds of being a homemaker across prototypical husbands' income levels:
Husband's income in       Estimated probability that   Estimated odds that       Estimated
1976 Canadian Dollars     the wife is a homemaker      the wife is a homemaker   Odds Ratio
$10,000                   64%                          1.77                      2.248
$20,000                   80%                          3.99                      2.248
$30,000                   90%                          8.96                      2.248
$40,000                   95%                          20.15
Log-Odds/Logits: Two women whose husbands' 1976 salaries differ by $1,000 differ by .081 in their fitted log-odds of being a homemaker.
Pick Prototypical Probabilities: Estimated probabilities of being a homemaker across prototypical husbands' income levels: 64% at $10,000, 80% at $20,000, 90% at $30,000, and 95% at $40,000.