Intro to Logistic Regression

Page 1: Intro to Logistic Regression

Logistic Regression

Jacquelyn Victoria & Tamer Wahba


Page 2: Intro to Logistic Regression

Slide Ownership

Jacquelyn Victoria: slides 3 to 9

Tamer Wahba: slides 10 to 15


Page 3: Intro to Logistic Regression

Regression Analysis + Classification

How can we predict a nominal class using regression analysis?

Consider a binary class:

Each instance x is a vector of feature values

Our output values or class labels are restricted to 0 or 1, i.e. f(x) ∈ {0, 1}

We need an h(x) where: 0 < h(x) < 1

We need a function which exhibits this behavior


Page 4: Intro to Logistic Regression

Logistic Functions

The sigmoid function σ(z) = 1 / (1 + e^(−z)):

Asymptotes at y = 1 and y = 0

Easy to specify a threshold (σ(0) = 0.5)

Output is interpreted as P(y = 1)

As a result, our hypothesis is hθ(x) = σ(θᵀx) = 1 / (1 + e^(−θᵀx)), where θ is a vector of weights.
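As a concrete sketch, the sigmoid and the hypothesis might be written in Python with NumPy (the names sigmoid and h are illustrative, not from the slides):

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: squashes any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def h(theta, x):
    """Hypothesis h_theta(x) = sigmoid(theta^T x), read as P(y = 1 | x)."""
    return sigmoid(np.dot(theta, x))
```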


Page 5: Intro to Logistic Regression

Cost Function

Need to find an hθ(x), a logistic function that represents our data, i.e. we need to find θ to fit our data.

The cost of a single instance is −log(hθ(x)) when y = 1 and −log(1 − hθ(x)) when y = 0, giving:

J(θ) = −(1/m) Σ [ y log(hθ(x)) + (1 − y) log(1 − hθ(x)) ], summed over all m training instances
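A minimal NumPy sketch of this cost, assuming X is an (m, n) matrix of instances and y an array of 0/1 labels:

```python
import numpy as np

def cost(theta, X, y):
    """Cross-entropy cost J(theta): averages -log(p) over y=1 instances
    and -log(1-p) over y=0 instances."""
    p = 1.0 / (1.0 + np.exp(-X @ theta))  # h_theta(x) for every instance
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
```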


Page 6: Intro to Logistic Regression

Gradient Descent

In order to find the minimum of J(θ), we can follow its partial derivatives:

do {
    θj := θj − α · ∂J(θ)/∂θj    (simultaneously for every j, where ∂J(θ)/∂θj = (1/m) Σ (hθ(x) − y) xj)
} until θ converges

Where α is the learning rate (almost always between 0 and 1; 0.1 to 0.3 is usually a good range)
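A minimal NumPy sketch of the loop (alpha, tol, and max_iter are illustrative parameter names):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, tol=1e-7, max_iter=100_000):
    """Minimize J(theta) by batch gradient descent; alpha is the learning rate."""
    theta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ theta))    # h_theta(x) for every instance
        grad = X.T @ (p - y) / len(y)           # partial J / partial theta_j
        theta -= alpha * grad
        if np.linalg.norm(alpha * grad) < tol:  # "until theta converges"
            break
    return theta
```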


Page 7: Intro to Logistic Regression

Maximum Likelihood Estimation

Equivalently, θ can be chosen to maximize the log-likelihood ℓ(θ) = Σ [ y log(hθ(x)) + (1 − y) log(1 − hθ(x)) ] by gradient ascent:

do {
    θj := θj + α Σ (y − hθ(x)) xj    (simultaneously for every j)
} until θ converges

Can also be calculated using Iteratively Reweighted Least Squares

Multinomial data uses Softmax Regression
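A minimal NumPy sketch of an IRLS (Newton's method) step, assuming the same X and y as before; note that unregularized Newton steps can fail on perfectly separable data:

```python
import numpy as np

def irls(X, y, n_iter=10):
    """Iteratively Reweighted Least Squares for logistic regression."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ theta))        # current predictions
        w = p * (1 - p)                             # per-instance weights
        H = X.T @ (w[:, None] * X)                  # X^T W X, the Hessian
        theta += np.linalg.solve(H, X.T @ (y - p))  # Newton update step
    return theta
```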

Page 8: Intro to Logistic Regression

Interpreting the Hypothesis

Recall that σ(0) = 0.5 and that hθ(x) = σ(θᵀx). With a 0.5 threshold we therefore predict y = 1 whenever θᵀx ≥ 0, so the decision boundary is the hyperplane θᵀx = 0.

[Figure: linear decision boundary θᵀx = 0 in the (x1, x2) feature plane]
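A minimal sketch of thresholding the hypothesis into class labels (the predict name is illustrative):

```python
import numpy as np

def predict(theta, X, threshold=0.5):
    """Label y = 1 when h_theta(x) >= threshold.

    For the default 0.5 this reduces to theta^T x >= 0, since sigma(0) = 0.5.
    """
    return (X @ theta >= np.log(threshold / (1 - threshold))).astype(int)
```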

Page 9: Intro to Logistic Regression

Interpreting hθ

I want to create a model to give me the probability that I will pass a test given how many hours I have studied

Hours: 0.50 0.75 1.00 1.25 1.50 1.75 1.75 2.00 2.25 2.50 2.75 3.00 3.25 3.50 4.00 4.25 4.50 4.75 5.00 5.50
Pass:  0    0    0    0    0    0    1    0    1    0    1    0    1    0    1    1    1    1    1    1

Using this generated model, calculate my probability of passing given I have studied 3 hours

P(passing | study time = 3) = 0.61
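A sketch of reproducing this with scikit-learn, assuming it is available; C is set large so the fit approximates the unregularized maximum-likelihood solution:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

hours = np.array([0.50, 0.75, 1.00, 1.25, 1.50, 1.75, 1.75, 2.00, 2.25, 2.50,
                  2.75, 3.00, 3.25, 3.50, 4.00, 4.25, 4.50, 4.75, 5.00, 5.50])
passed = np.array([0, 0, 0, 0, 0, 0, 1, 0, 1, 0,
                   1, 0, 1, 0, 1, 1, 1, 1, 1, 1])

# Very weak regularization (large C) to approximate the unregularized MLE.
model = LogisticRegression(C=1e9).fit(hours.reshape(-1, 1), passed)
p = model.predict_proba([[3.0]])[0, 1]  # P(y = 1 | 3 hours studied)
print(f"P(pass | 3 hours studied) = {p:.2f}")  # ~0.61
```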


Page 10: Intro to Logistic Regression

Logistic Regression Compared to Other Classifiers

Naive Bayes

Support Vector Machines

Decision Trees


Page 11: Intro to Logistic Regression

vs Decision Tree

Assumptions:

DT: decision boundaries parallel to the feature axes

LR: one smooth boundary

Decision trees are the better choice when the data call for multiple decision boundaries


Page 12: Intro to Logistic Regression

vs Naive Bayes

Feature Weights

NB: each weight is set independently, depending only on the class

LR: weights are set together, such that the decision function tends to be high for positive classes and low for negative classes

Correlated features are therefore not double-counted by logistic regression, unlike naive Bayes


Page 13: Intro to Logistic Regression

vs Support Vector Machine


Both attempt to find a hyperplane separating the training samples

SVM: finds the solution with the maximum margin

LR: finds any solution that separates the instances

SVM is a hard classifier while LR is probabilistic

Page 14: Intro to Logistic Regression

Advantages

Works well with diagonal (non-axis-parallel) decision boundaries

Does not give undue weight to correlated features

Probabilistic outcomes

Disadvantages

Requires a large sample size for stable results

Page 15: Intro to Logistic Regression

Use Cases

Categorical outcomes

Large sample data

Minimal preprocessing
