Intro to Logistic Regression


Slide 1: Logistic Regression
Jacquelyn Victoria & Tamer Wahba

Slide 2: Slide Ownership
Jacquelyn Victoria: slides 3 to 9
Tamer Wahba: slides 10 to 15

Slide 3: Regression Analysis + Classification

How can we predict a nominal class using regression analysis?

Consider a binary class:

Each instance x is a vector of feature values

Our output values or class labels are restricted to 0 or 1, i.e. f(x) ∈ {0, 1}

We need an h(x) where: 0 < h(x) < 1

We need a function which exhibits this behavior

Slide 4: Logistic Functions (the Sigmoid Function σ(x))

σ(x) = 1 / (1 + e^(−x))

Asymptotes at y = 1 and y = 0

Easy to specify threshold (σ(0) = .5)

The output is interpreted as P(y = 1)

As a result:

hθ(x) = σ(θᵀx) = 1 / (1 + e^(−θᵀx))

where θ is a vector of weights
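As a quick sketch (not from the slides; the function names are our own), the hypothesis in NumPy:

import numpy as np

def sigmoid(z):
    # logistic (sigmoid) function: maps any real z into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def h(theta, x):
    # hypothesis h_theta(x) = sigmoid(theta . x); returns P(y = 1 | x)
    return sigmoid(np.dot(theta, x))

print(sigmoid(0.0))  # 0.5, the threshold value noted above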

Slide 5: Cost Function

Need to find an hθ(x), a logistic function that represents our data
Need to find θ to fit our data

Per-example cost: −log(hθ(x)) if y = 1; −log(1 − hθ(x)) if y = 0

J(θ) = −(1/m) Σ_i [ y^(i) log hθ(x^(i)) + (1 − y^(i)) log(1 − hθ(x^(i))) ]

[Plot: the two cost branches, −log(x) and −log(1 − x)]
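A minimal NumPy sketch of this cost, vectorized over all m training examples (our own naming, assuming a design matrix X and a 0/1 label vector y):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    # J(theta) = -(1/m) * sum(y*log(h) + (1 - y)*log(1 - h))
    p = sigmoid(X @ theta)  # h_theta(x) for every instance at once
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))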

Slide 6: Gradient Descent

To find the minimum, we can follow the partial derivatives of J(θ):

do {
    θ_j := θ_j − α (1/m) Σ_i (hθ(x^(i)) − y^(i)) x_j^(i)   (simultaneously for all j)
} until θ converges

where α is the learning rate (almost always between 0 and 1; .1 to .3 is usually a good range)
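Written out as a NumPy sketch (treating "until θ converges" as a test on the change in θ, which is one common choice, not the only one):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, alpha=0.1, tol=1e-7):
    # X: (m, n) design matrix (include a column of 1s for the intercept)
    # y: length-m vector of 0/1 labels
    m, n = X.shape
    theta = np.zeros(n)
    while True:
        grad = X.T @ (sigmoid(X @ theta) - y) / m  # dJ/dtheta_j for all j
        theta_new = theta - alpha * grad           # simultaneous update
        if np.max(np.abs(theta_new - theta)) < tol:
            return theta_new
        theta = theta_new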

Slide 7: Maximum Likelihood Estimation

Alternatively, maximize the log-likelihood ℓ(θ) directly:

do {
    θ_j := θ_j + α Σ_i (y^(i) − hθ(x^(i))) x_j^(i)   (simultaneously for all j)
} until θ converges
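For reference, differentiating ℓ(θ) (a standard derivation, not shown on the slide) gives the gradient used in the update above:

\ell(\theta) = \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(x^{(i)})\right) \right]

\frac{\partial \ell(\theta)}{\partial \theta_j} = \sum_{i=1}^{m} \left( y^{(i)} - h_\theta(x^{(i)}) \right) x_j^{(i)}

Since J(θ) = −ℓ(θ)/m, gradient ascent on ℓ is the same computation as gradient descent on J, up to the 1/m scaling.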

θ can also be calculated using Iteratively Reweighted Least Squares (IRLS)
Multinomial data (more than two classes) uses Softmax Regression (sketches of both below)
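A sketch of IRLS as it is usually presented (each step is a Newton update; assumes XᵀWX is invertible; names are ours):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def irls(X, y, n_iter=20):
    # Newton step: theta += (X^T W X)^{-1} X^T (y - p), W = diag(p(1 - p))
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_iter):
        p = sigmoid(X @ theta)
        W = p * (1 - p)                 # diagonal of the weight matrix
        H = X.T @ (W[:, None] * X)      # X^T W X
        theta += np.linalg.solve(H, X.T @ (y - p))
    return theta

And a minimal sketch of the softmax hypothesis for K classes, where Theta becomes an (n, K) matrix with one weight column per class:

import numpy as np

def softmax(Z):
    # row-wise softmax: each row of Z becomes a probability distribution
    E = np.exp(Z - Z.max(axis=1, keepdims=True))  # shift for stability
    return E / E.sum(axis=1, keepdims=True)

def predict(Theta, X):
    # P(y = k | x) is column k of softmax(X @ Theta); take the argmax
    return np.argmax(softmax(X @ Theta), axis=1)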

Slide 8: Interpreting the Hypothesis

Recall that σ(0) = .5 and that hθ(x) = σ(θᵀx)
So θᵀx = 0 is the decision boundary: predict y = 1 when θᵀx > 0 and y = 0 when θᵀx < 0

[Plot: decision boundary separating the two classes in the (x1, x2) feature plane]

Slide 9: Interpreting hθ

I want to create a model to give me the probability that I will pass a test given how many hours I have studied

Hours  0.50  0.75  1.00  1.25  1.50  1.75  1.75  2.00  2.25  2.50  2.75  3.00  3.25  3.50  4.00  4.25  4.50  4.75  5.00  5.50
Pass   0     0     0     0     0     0     1     0     1     0     1     0     1     0     1     1     1     1     1     1

Using this generated model, calculate my probability of passing given that I have studied 3 hours:

P(pass | study time = 3) = .61

(source)
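The .61 can be reproduced end to end; here is a sketch that fits θ to the table above using the gradient-descent loop from slide 6 (the data is small enough that plain batch descent converges; the fitted weights come out near θ ≈ (−4.08, 1.50)):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

hours = np.array([0.50, 0.75, 1.00, 1.25, 1.50, 1.75, 1.75, 2.00, 2.25, 2.50,
                  2.75, 3.00, 3.25, 3.50, 4.00, 4.25, 4.50, 4.75, 5.00, 5.50])
passed = np.array([0, 0, 0, 0, 0, 0, 1, 0, 1, 0,
                   1, 0, 1, 0, 1, 1, 1, 1, 1, 1])

X = np.column_stack([np.ones_like(hours), hours])  # intercept + hours

theta = np.zeros(2)
for _ in range(200_000):  # batch gradient descent, alpha = 0.1
    theta -= 0.1 * X.T @ (sigmoid(X @ theta) - passed) / len(passed)

print(theta)                        # roughly [-4.08, 1.50]
print(sigmoid(theta @ [1.0, 3.0]))  # P(pass | 3 hours of study) = 0.61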

Slide 10: Logistic Regression Compared to Other Classifiers

Naive Bayes
Support Vector Machines
Decision Trees

Slide 11: vs Decision Trees

Assumptions:

DT: decision boundaries parallel to axes

LR: one smooth boundary

Decision trees can be used when there are multiple decision boundaries

Slide 12: vs Naive Bayes

Feature weights:

NB: each weight is set independently, based on the feature's distribution within each class
LR: weights are set together, so that the decision function tends to be high for positive examples and low for negative examples
As a result, correlated features do not get double-counted in logistic regression: LR can split the weight between them


Slide 13: vs Support Vector Machines

Both attempt to find a hyperplane separating the training samples

SVM: find the solution with maximum margin

LR: find any solution that separates the instances

SVM is a hard classifier, while LR is probabilistic

Slide 14: Advantages and Disadvantages

Advantages:
Works well with diagonal decision boundaries
Does not give undue weight to correlated features
Probabilistic outcomes

Disadvantages:
Requires a large sample size for stable results

Slide 15: Use Cases

Categorical outcomes
Large sample data
Minimal preprocessing
