Logistic Regression
• Linear regression – numerical response
• Logistic regression – binary categorical response
• e.g. has the disease, or unaffected by the disease
• Interested to find the attributes that are associated with the onset of the disease
• Or interested to predict the probability of getting the disease, given a set of attributes
Theory
• Fits the model:

$$\log\frac{p}{1-p} = a + b_1 X_1 + b_2 X_2 + b_3 X_3 + \dots$$

• Effectively a linear model for log odds
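To make the link between log odds and probability concrete, here is a minimal Python sketch of the model form above. The function names (`logit`, `inv_logit`, `predict_probability`) are illustrative choices, not part of the original slides; the only assumption is the standard inversion p = 1 / (1 + exp(-log odds)).

```python
import math


def logit(p: float) -> float:
    """Log odds of a probability p, where 0 < p < 1."""
    return math.log(p / (1.0 - p))


def inv_logit(eta: float) -> float:
    """Probability that corresponds to the log odds eta."""
    return 1.0 / (1.0 + math.exp(-eta))


def predict_probability(a: float, b: list[float], x: list[float]) -> float:
    """Probability for intercept a, coefficients b and attribute values x."""
    # The linear predictor a + b1*X1 + b2*X2 + ... is on the log-odds scale.
    eta = a + sum(bi * xi for bi, xi in zip(b, x))
    return inv_logit(eta)
```

Because the linear part lives on the log-odds scale, it can take any real value while the resulting probability always stays between 0 and 1.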
Lung Cancer
• An age effect?
• Associated with smoking?
Logistic Regression
• Assess whether a variable is significantly associated with the response
• Quantify the association, in terms of odds ratio
• Consider the equation

$$\log\frac{p}{1-p} = 0.03 + 1.4\,\text{Smoke} + 0.02\,\text{Gender} + 0.01\,(\text{Age} - 20)$$

where p = probability of getting lung cancer, with baseline of a non-smoking female of age 20
• Keep everything else constant to interpret the effects of each variable
• Non-smoking male of age 20 has exp(0.02) = 1.02 times the odds of getting lung cancer of a non-smoking female of age 20
• Smoking female of age 20 has exp(1.4) = 4.06 times the odds of a non-smoking female of age 20
• Non-smoking female of age 50 has exp(30 × 0.01) = 1.35 times the odds of a non-smoking female of age 20
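Why exponentiating a coefficient gives an odds ratio follows from subtracting the model equation for the two groups. A short worked step for the gender coefficient, assuming Gender is coded 1 for male and 0 for female (an encoding inferred from the stated baseline, not given explicitly), with p_M and p_F the probabilities for a non-smoking male and a non-smoking female of age 20:

$$
\begin{aligned}
\log\frac{p_M}{1-p_M} - \log\frac{p_F}{1-p_F} &= (0.03 + 0.02) - 0.03 = 0.02 \\
\frac{p_M/(1-p_M)}{p_F/(1-p_F)} &= e^{0.02} \approx 1.02
\end{aligned}
$$

The intercept cancels, so the ratio compares odds, not probabilities.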
Combining the effects
• Smoking male of age 50 has exp(1.4 + 0.02 + 0.01 × 30) = 5.58 times the odds of getting lung cancer of a non-smoking female of age 20
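The combined effect can be checked numerically. The sketch below uses the coefficients from the equation above and assumes the encodings Smoke = 1 for a smoker and Gender = 1 for a male (inferred from the baseline of a non-smoking female of age 20); the function names are illustrative only.

```python
import math

# Coefficients taken from the example equation.
INTERCEPT = 0.03
B_SMOKE, B_GENDER, B_AGE = 1.4, 0.02, 0.01


def log_odds(smoke: int, male: int, age: float) -> float:
    """Log odds of lung cancer; assumes smoke=1 for smoker, male=1 for male."""
    return INTERCEPT + B_SMOKE * smoke + B_GENDER * male + B_AGE * (age - 20)


def odds_ratio_vs_baseline(smoke: int, male: int, age: float) -> float:
    """Odds relative to the baseline: non-smoking female of age 20."""
    return math.exp(log_odds(smoke, male, age) - log_odds(0, 0, 20))


def probability(smoke: int, male: int, age: float) -> float:
    """Predicted probability, obtained by inverting the log odds."""
    return 1.0 / (1.0 + math.exp(-log_odds(smoke, male, age)))


# Smoking male of age 50 versus the baseline.
combined = odds_ratio_vs_baseline(1, 1, 50)

# Adding coefficients on the log scale multiplies the individual odds ratios.
product = math.exp(B_SMOKE) * math.exp(B_GENDER) * math.exp(B_AGE * 30)

print(round(combined, 2))  # 5.58
```

Comparing `probability(1, 1, 50)` with `probability(0, 0, 20)` shows that the ratio of probabilities is smaller than the odds ratio, which is why the effects are quoted in terms of odds.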
Note the encodings!
• Interpret each coefficient based on how its variable is encoded
Summary
• Large suite of statistical tools for analysing data
• Important to choose the appropriate tools for the kind of data available.
• Most statistical tests require particular assumptions to be valid – need to check these assumptions.