FSAN815/ELEG815: Foundations of Statistical Learning
Gonzalo R. Arce
Chapter 14: Logistic Regression
Fall 2014
Course Objectives & Structure
The course provides an introduction to the mathematics of data analysis and a detailed overview of statistical models for inference and prediction.

Course Structure:
- Weekly lectures [notes: www.ece.udel.edu/~arce/Courses]
- Homework & computer assignments [30%]
- Midterm & Final examinations [70%]

Textbooks:
- Papoulis and Pillai, Probability, Random Variables, and Stochastic Processes.
- Hastie, Tibshirani and Friedman, The Elements of Statistical Learning.
- Haykin, Adaptive Filter Theory.
Gonzalo R. Arce (ECE, Univ. of Delaware) FSAN-815/ELEG-815: Statistical Learning Fall 2014 2 / 18
Logistic Regression
The logistic regression model arises from the desire to model the posterior probabilities of the K classes via linear functions in x, while at the same time ensuring that they sum to one and remain in [0,1].
\[
\begin{aligned}
\log\frac{\Pr(G=1\mid X=x)}{\Pr(G=K\mid X=x)} &= \beta_{10} + \beta_1^T x \\
\log\frac{\Pr(G=2\mid X=x)}{\Pr(G=K\mid X=x)} &= \beta_{20} + \beta_2^T x \\
&\;\;\vdots \\
\log\frac{\Pr(G=K-1\mid X=x)}{\Pr(G=K\mid X=x)} &= \beta_{(K-1)0} + \beta_{K-1}^T x
\end{aligned}
\tag{1}
\]
Logistic Regression
Here β_k = [β_k1, β_k2, ..., β_kp]^T and x = [x_1, x_2, ..., x_p]^T.

The model is specified in terms of K−1 log-odds or logit transformations (reflecting the constraint that the probabilities sum to one). A simple calculation shows that
\[
\begin{aligned}
\Pr(G=k\mid X=x) &= \frac{\exp(\beta_{k0} + \beta_k^T x)}{1 + \sum_{l=1}^{K-1}\exp(\beta_{l0} + \beta_l^T x)}, \quad k = 1, 2, \ldots, K-1, \\
\Pr(G=K\mid X=x) &= \frac{1}{1 + \sum_{l=1}^{K-1}\exp(\beta_{l0} + \beta_l^T x)}
\end{aligned}
\tag{2}
\]
To emphasize the dependence on the entire parameter set θ = {β_10, β_1^T, ..., β_(K−1)0, β_(K−1)^T}, we denote the probabilities Pr(G = k | X = x) = p_k(x; θ).
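As a concrete check of (2), here is a minimal Python sketch (the function name and the example coefficients are illustrative, not from the slides) that computes the K class probabilities from the K−1 coefficient vectors, with class K as the reference:

```python
import math

def multiclass_probs(betas, x):
    """Class probabilities of Eq. (2).

    betas: list of K-1 coefficient vectors, each [beta_k0, beta_k1, ..., beta_kp]
    x:     input vector [x1, ..., xp]
    Class K is the reference class.
    """
    # Linear scores beta_k0 + beta_k^T x for k = 1, ..., K-1
    scores = [b[0] + sum(bj * xj for bj, xj in zip(b[1:], x)) for b in betas]
    denom = 1.0 + sum(math.exp(s) for s in scores)
    probs = [math.exp(s) / denom for s in scores]
    probs.append(1.0 / denom)  # Pr(G = K | X = x)
    return probs
```

Because every term is divided by the same denominator, the K probabilities are positive and sum to one by construction.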
Fitting Logistic Regression Models
Logistic regression models are usually fit by maximum likelihood, using the conditional likelihood of G given X.
The log-likelihood for N observations is
\[
\ell(\theta) = \sum_{i=1}^{N} \log p_{g_i}(x_i; \theta) \tag{3}
\]
where p_k(x_i; θ) = Pr(G = k | X = x_i; θ).
Fitting Logistic Regression Models

We discuss the two-class case in detail.

- Code the two-class g_i via a 0/1 response y_i, where y_i = 1 when g_i = 1, and y_i = 0 when g_i = 2.
- Let p_1(x; θ) = p(x; θ) and p_2(x; θ) = 1 − p(x; θ).
The log-likelihood can be written as
\[
\begin{aligned}
\ell(\beta) &= \sum_{i=1}^{N} \left\{ y_i \log p(x_i;\beta) + (1-y_i)\log(1-p(x_i;\beta)) \right\} \\
&= \sum_{i=1}^{N} \left\{ y_i\left(\log p(x_i;\beta) - \log(1-p(x_i;\beta))\right) + \log(1-p(x_i;\beta)) \right\} \\
&= \sum_{i=1}^{N} \left\{ y_i \log\frac{\Pr(G=1\mid X=x_i)}{\Pr(G=2\mid X=x_i)} + \log \Pr(G=2\mid X=x_i) \right\} \\
&= \sum_{i=1}^{N} \left\{ y_i \beta^T x_i - \log\left(1 + e^{\beta^T x_i}\right) \right\}
\end{aligned}
\tag{4}
\]
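The final line of (4) is the form used in practice. A short Python sketch of this two-class log-likelihood (names are illustrative; the log(1+e^t) term is computed in a numerically stable form):

```python
import math

def log_likelihood(beta, X, y):
    """Two-class log-likelihood, Eq. (4): sum_i y_i*beta^T x_i - log(1 + exp(beta^T x_i)).

    beta: coefficient vector, intercept included as beta[0]
    X:    rows x_i, each with a leading 1 for the intercept
    y:    0/1 labels
    """
    total = 0.0
    for xi, yi in zip(X, y):
        t = sum(b * xj for b, xj in zip(beta, xi))  # beta^T x_i
        # log(1 + e^t), computed stably for large |t|
        log1pexp = t + math.log1p(math.exp(-t)) if t > 0 else math.log1p(math.exp(t))
        total += yi * t - log1pexp
    return total
```

With β = 0 every fitted probability is 1/2, so ℓ = −N log 2, which is a handy sanity check.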
Fitting Logistic Regression Models
Here β = {β_10, β_1}, and we assume that the vector of inputs x_i includes the constant term 1 to accommodate the intercept.

To maximize the log-likelihood, we set its derivatives to zero. These score equations are
\[
\frac{\partial \ell(\beta)}{\partial \beta} = \sum_{i=1}^{N} x_i \left( y_i - p(x_i;\beta) \right) = 0 \tag{5}
\]
which are p+1 equations nonlinear in β.

Since the first component of x_i is 1, the first score equation specifies that ∑_{i=1}^N y_i = ∑_{i=1}^N p(x_i; β): the expected number of class ones matches the observed number.
Newton-Raphson Algorithm
To solve the score equations (5), we use the Newton-Raphson algorithm, which requires the second-derivative or Hessian matrix
\[
\frac{\partial^2 \ell(\beta)}{\partial \beta\, \partial \beta^T} = -\sum_{i=1}^{N} x_i x_i^T\, p(x_i;\beta)\left(1-p(x_i;\beta)\right) \tag{6}
\]
Starting with β^old, a single Newton update is
\[
\beta^{new} = \beta^{old} - \left( \frac{\partial^2 \ell(\beta)}{\partial \beta\, \partial \beta^T} \right)^{-1} \frac{\partial \ell(\beta)}{\partial \beta} \tag{7}
\]
where the derivatives are evaluated at β^old.
Newton-Raphson Algorithm
We write the score and Hessian in matrix notation:
- y: N-vector of y_i values.
- x_i: (p+1)-vector of inputs.
- X: N × (p+1) matrix of x_i values.
- p: N-vector of fitted probabilities with ith element p(x_i; β^old).
- W: N × N diagonal matrix of weights with ith diagonal element p(x_i; β^old)(1 − p(x_i; β^old)).
Fitting Logistic Regression Models
Then we have
\[
\begin{aligned}
\frac{\partial \ell(\beta)}{\partial \beta} &= \mathbf{X}^T(\mathbf{y} - \mathbf{p}) \\
\frac{\partial^2 \ell(\beta)}{\partial \beta\, \partial \beta^T} &= -\mathbf{X}^T \mathbf{W} \mathbf{X}
\end{aligned}
\tag{8}
\]
The Newton step is thus
\[
\begin{aligned}
\beta^{new} &= \beta^{old} + (\mathbf{X}^T\mathbf{W}\mathbf{X})^{-1}\mathbf{X}^T(\mathbf{y}-\mathbf{p}) \\
&= (\mathbf{X}^T\mathbf{W}\mathbf{X})^{-1}\mathbf{X}^T\mathbf{W}\left( \mathbf{X}\beta^{old} + \mathbf{W}^{-1}(\mathbf{y}-\mathbf{p}) \right) \\
&= (\mathbf{X}^T\mathbf{W}\mathbf{X})^{-1}\mathbf{X}^T\mathbf{W}\mathbf{z}
\end{aligned}
\tag{9}
\]
Fitting Logistic Regression Models
In the second and third lines we have re-expressed the Newton step as a weighted least squares step, with the response
\[
\mathbf{z} = \mathbf{X}\beta^{old} + \mathbf{W}^{-1}(\mathbf{y}-\mathbf{p}) \tag{10}
\]
sometimes known as the adjusted response. These equations are solved repeatedly, since at each iteration p changes, and hence so do W and z.
This algorithm is referred to as iteratively reweighted least squares or IRLS,since each iteration solves the weighted least squares problem:
\[
\beta^{new} \leftarrow \arg\min_{\beta}\; (\mathbf{z} - \mathbf{X}\beta)^T \mathbf{W} (\mathbf{z} - \mathbf{X}\beta) \tag{11}
\]
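Putting (9)-(11) together, here is a compact NumPy sketch of IRLS (function and variable names are mine, not from the slides; no safeguards are included for separable data, where the iterates diverge):

```python
import numpy as np

def irls(X, y, n_iter=25, tol=1e-10):
    """Fit two-class logistic regression by IRLS, Eqs. (9)-(11).

    X: (N, p+1) design matrix whose first column is all ones
    y: (N,) 0/1 response
    Returns the fitted coefficient vector beta.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        p = 1.0 / (1.0 + np.exp(-eta))   # fitted probabilities
        w = p * (1.0 - p)                # diagonal of W
        z = eta + (y - p) / w            # adjusted response, Eq. (10)
        # Weighted least squares step, Eq. (11): solve (X^T W X) beta = X^T W z
        WX = X * w[:, None]
        beta_new = np.linalg.solve(X.T @ WX, X.T @ (w * z))
        if np.max(np.abs(beta_new - beta)) < tol:
            beta = beta_new
            break
        beta = beta_new
    return beta
```

At convergence the score equations (5) hold, so X^T(y − p) ≈ 0; in particular the fitted probabilities sum to the observed number of class ones.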
Exercise 4.10
The Weekly data set is part of the ISLR package. It contains 1089 weekly percentage returns for 21 years, from the beginning of 1990 to the end of 2010.
- Lag1 to Lag5: The percentage returns for each of the five previous trading weeks
- Volume: The number of shares traded on the previous week, in billions
- Today: The percentage return on the week in question
- Direction: Whether the market was Up or Down on this week
Exercise
(b) Use the full data set to perform a logistic regression.
library(ISLR)
attach(Weekly)
glm.fit = glm(Direction ~ Lag1 + Lag2 + Lag3 + Lag4 + Lag5 + Volume,
              data = Weekly, family = binomial)
summary(glm.fit)
According to the p-values, Lag1, Lag2 and Lag4 appear to be the most statistically significant predictors.
Exercise
(c) Compute the confusion matrix and overall fraction of correct predictions.
glm.probs = predict(glm.fit, type = "response")
glm.pred = rep("Down", 1089)
glm.pred[glm.probs > .5] = "Up"
table(glm.pred, Direction)
mean(glm.pred == Direction)
Exercise
(d) Fit the logistic regression model using a training set, then predict on the held-out years.
train = (Year < 2009)
A = (Year == 2009)
B = (Year == 2010)
Weekly.2009 = Weekly[A, ]
Weekly.2010 = Weekly[B, ]
Direction.2009 = Direction[A]
Direction.2010 = Direction[B]
glm.fit = glm(Direction ~ Lag2, data = Weekly, family = binomial, subset = train)
glm.probs = predict(glm.fit, Weekly.2009, type = "response")
glm.pred = rep("Down", 52)
glm.pred[glm.probs > .5] = "Up"
table(glm.pred, Direction.2009)
mean(glm.pred == Direction.2009)
Exercise
[Figure: prediction for 2009] [Figure: prediction for 2010]
Exercise

(e) Fit the LDA model using the training set, then predict.
library(MASS)
lda.fit = lda(Direction ~ Lag2, data = Weekly, subset = train)
lda.pred = predict(lda.fit, Weekly.2009)
lda.class = lda.pred$class
table(lda.class, Direction.2009)
mean(lda.class == Direction.2009)
[Figure: prediction for 2009] [Figure: prediction for 2010]
Exercise
(f) Fit the QDA model using the training set, then predict.
qda.fit = qda(Direction ~ Lag2, data = Weekly, subset = train)
qda.pred = predict(qda.fit, Weekly.2009)
qda.class = qda.pred$class
table(qda.class, Direction.2009)
mean(qda.class == Direction.2009)
[Figure: prediction for 2009] [Figure: prediction for 2010]