
EPSE 682: Multivariate Analysis

Ed Kroc

University of British Columbia

ed.kroc@ubc.ca

January 28, 2020


Towards the generalized linear model: logistic regression

If your response data is binary, then using ordinary linear regression is going to be a bad idea: you will get predicted/fitted values that make no sense!

Moreover, when Y is binary, it is really only Pr(Y = 1) that we care about (see next slides). But probabilities have to live on [0, 1].


If a random quantity Y is binary, then it is necessarily a Bernoulli random variable:

Y ∼ Bern(π), where π := Pr(Y = 1).

Notice that π ∈ [0, 1].

Notice that π completely characterizes the distribution of Y; e.g.

Pr(Y = 0) = 1 − Pr(Y = 1) = 1 − π.

E(Y) = π,  Var(Y) = π(1 − π).

This is in stark contrast to a normal random variable, where there are two free parameters: μ ∈ ℝ that determines the mean, and σ ∈ ℝ₊ that (independently) determines the variance.
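A minimal numerical sketch of these Bernoulli facts (assuming scipy is available; π = 0.3 is just an illustration):

```python
# Check the Bernoulli pmf, mean, and variance facts above.
from scipy.stats import bernoulli

pi = 0.3
print(bernoulli.pmf(0, pi))  # Pr(Y = 0) = 1 - pi = 0.7
print(bernoulli.mean(pi))    # E(Y) = pi = 0.3
print(bernoulli.var(pi))     # Var(Y) = pi * (1 - pi) = 0.21
```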


The sum of independent and identically distributed (i.i.d.) Bernoulli random variables is called a Binomial random variable:

Y = Σ_{i=1}^{m} Y_i ∼ Bin(m, π), where Y_i ∼ Bern(π).

Such a random variable naturally counts the number of “successful” Bernoulli trials.

Notice: m ∈ ℕ, π ∈ [0, 1].

Can easily show:

E(Y) = mπ,  Var(Y) = mπ(1 − π).

Notice: both parameters jointly determine the mean and variance; contrast with a normal random variable.
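A quick simulation sketch of these moment formulas (assuming numpy; m = 20 and π = 0.4 are arbitrary illustration values):

```python
# Sums of m i.i.d. Bernoulli(pi) draws should have mean ~ m*pi and
# variance ~ m*pi*(1 - pi).
import numpy as np

rng = np.random.default_rng(0)
m, pi = 20, 0.4
draws = rng.binomial(1, pi, size=(100_000, m)).sum(axis=1)  # 100k Binomial draws

print(draws.mean(), m * pi)            # ~8.0 vs 8.0
print(draws.var(), m * pi * (1 - pi))  # ~4.8 vs 4.8
```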


The associated probability mass function for Binomial Y is:

Pr(Y = y) = (m choose y) π^y (1 − π)^{m−y}

Suppose we observe sample data on binary Y and predictors X := {X_1, . . . , X_p}.

Given any particular values for the predictors, our sample data would yield a certain number of “successes” (Y = 1) and a certain number of “failures” (Y = 0).

Assuming a Binomial model for Y (given the values of any set of predictors X), we could then write

π := Pr(Y = 1 | X) = Σ_{j=0}^{p} β_j X_j


Therefore, the likelihood function associated to this set of sample data (size n) is:

ℓ(π | x_1, . . . , x_p, y) = Π_{i=1}^{n} (m_i choose y_i) π^{y_i} (1 − π)^{m_i − y_i}

There is a problem with this formulation though: recall that we set

π = Σ_{j=0}^{p} β_j X_j

Recall that probabilities always live on [0, 1]. But those predictors X_j can be anything at all!
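A small simulated sketch of the problem (variable names hypothetical): fitting a least-squares line to a binary response happily produces fitted “probabilities” outside [0, 1].

```python
# OLS on a binary response: the fitted values escape [0, 1].
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(0, 2, size=200)
y = (x + rng.normal(0, 1, size=200) > 0).astype(float)  # binary response

X = np.column_stack([np.ones_like(x), x])     # intercept + predictor
beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # ordinary least squares
fitted = X @ beta

print(f"min fitted: {fitted.min():.3f}, max fitted: {fitted.max():.3f}")
# Typically prints values below 0 and above 1: nonsensical as probabilities.
```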


There are many different functions that will transform probabilities (numbers between 0 and 1) into numbers that span all of ℝ. The most commonly used are:

Logit or inverse logistic function:

g_1(π) = log(π / (1 − π))

Probit or inverse Normal function:

g_2(π) = Φ^{−1}(π)   (inverse CDF of the standard Normal)

Complementary log-log function:

g_3(π) = log(−log(1 − π))

Log-log function:

g_4(π) = −log(−log(π))
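A sketch evaluating all four transformations on a grid of probabilities (assuming scipy; norm.ppf is the standard Normal inverse CDF):

```python
import numpy as np
from scipy.stats import norm

pi = np.array([0.05, 0.25, 0.50, 0.75, 0.95])

logit   = np.log(pi / (1 - pi))     # g1
probit  = norm.ppf(pi)              # g2
cloglog = np.log(-np.log(1 - pi))   # g3
loglog  = -np.log(-np.log(pi))      # g4

for row in zip(pi, logit, probit, cloglog, loglog):
    print("pi=%.2f  logit=%+.3f  probit=%+.3f  cloglog=%+.3f  loglog=%+.3f" % row)
```

Each column spans negative and positive values even though π stays inside (0, 1).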


Logistic regression

The standard logistic regression model is:

log(π / (1 − π)) = Σ_{j=0}^{p} β_j X_j,

where π := Pr(Y = 1 | X).

It’s often of interest to solve this explicitly for the conditional “success” probability π:

π = exp(Σ_{j=0}^{p} β_j X_j) / (1 + exp(Σ_{j=0}^{p} β_j X_j)) = 1 / (1 + exp(−Σ_{j=0}^{p} β_j X_j))

This gives us an explicit expression for Pr(Y = 1 | X).

Note: a function of the form exp(x) / (1 + exp(x)) is called a logistic function.
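A tiny sketch, assuming scipy: scipy.special.expit implements exactly this logistic function, and it numerically inverts the logit transform.

```python
import numpy as np
from scipy.special import expit, logit

x = np.linspace(-5, 5, 11)
pi = expit(x)                     # maps all of R into (0, 1)
assert np.allclose(logit(pi), x)  # logit undoes expit

print(pi.min(), pi.max())         # stays strictly inside (0, 1)
```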


Logistic regression: MLE

Recall that we already have a likelihood function for this model:

ℓ(π | x_1, . . . , x_p, y) = Π_{i=1}^{n} (m_i choose y_i) π^{y_i} (1 − π)^{m_i − y_i}

One can then use this function to derive the MLEs of the model coefficients, the β_j’s, just as in ordinary linear regression.

In general though, the MLE solutions won’t have nice closed-form expressions like we had in linear regression, but they can be very well approximated by a variety of numerical techniques (more on this later).
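A sketch of the numerical idea on simulated data (not the course's own code; the true coefficients are chosen for illustration): write down the negative Bernoulli log-likelihood (the m_i = 1 case) and hand it to a generic optimizer.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

rng = np.random.default_rng(1)
x = rng.normal(size=500)
y = rng.binomial(1, expit(-0.5 + 1.2 * x))  # true (beta0, beta1) = (-0.5, 1.2)

def neg_log_lik(beta):
    eta = beta[0] + beta[1] * x              # linear predictor
    # log-likelihood: sum_i [ y_i*eta_i - log(1 + exp(eta_i)) ], written stably
    return -np.sum(y * eta - np.logaddexp(0.0, eta))

fit = minimize(neg_log_lik, x0=np.zeros(2))  # quasi-Newton search from (0, 0)
print(fit.x)                                 # MLEs, close to (-0.5, 1.2)
```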


Logistic regression: interpretation of model coefficients

Recall: the model coefficients in ordinary linear regression were often easy to interpret. For example, the classic interpretation that β_1 tells you how much the average value of Y should change if X changes by 1 unit in the simple model (see HW1 also):

Y = β_0 + β_1 X + ε.

A particularly nice feature of logistic regression is that the model coefficients are also relatively easy to interpret; i.e. they have some inherent, stand-alone meaning.


Logistic regression: odds and odds ratios

Simply exponentiating the standard logistic regression model, we find:

π / (1 − π) = e^{β_0 + β_1 X_1 + · · · + β_p X_p}

The left-hand side of the equation is just the odds of observing Y = 1, conditional on X:

odds(Y = 1 | X) = Pr(Y = 1 | X) / Pr(Y = 0 | X)

Odds are not the same thing as probability (they can be any non-negative value), but encode similar information. For example:

The probability of flipping heads on a fair coin is 1/2.

The odds of flipping heads on a fair coin are (1/2)/(1/2) = 1, or we often say are “1 to 1.”


The probability of drawing an ace from a standard deck of 52 cards is 4/52 = 1/13.

The odds of drawing an ace from a standard deck of 52 cards are (4/52)/(48/52) = 1/12, or we often say are “1 to 12.”


Generally speaking, odds(Y = 1) ≈ Pr(Y = 1) only when Pr(Y = 1) is small. This fact sometimes has clinical importance, e.g. when modelling the incidence rate of a rare disease.
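A one-loop sketch of that approximation:

```python
# odds ~ probability only when the probability is small
for p in (0.01, 0.05, 0.20, 0.50, 0.80):
    odds = p / (1 - p)
    print(f"Pr = {p:.2f}  odds = {odds:.4f}  odds/Pr = {odds / p:.3f}")
```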


Let’s look at a simple logistic regression model on one predictor:

odds(Y = 1 | X = x) = e^{β_0 + β_1 x}

Now consider the odds ratio:

odds(Y = 1 | X = x) / odds(Y = 1 | X = x − 1) = e^{β_0 + β_1 x} / e^{β_0 + β_1 (x−1)} = e^{β_1}

Thus, the quantity e^{β_1} tells us how much we would expect the odds of conditional “success” to change multiplicatively when X changes by 1 unit.

For instance, if β_1 = 0 (corresponding to no estimated linear effect of X on the logit of Y), then the odds ratio equals 1; i.e. changing X doesn’t seem to change the expected odds of success in Y at all.


If β_1 is a large positive number, say β_1 = 10 (corresponding to a large, positive estimated linear effect of X on the logit of Y), then the odds ratio equals about 22,000; i.e. increasing X by a single unit greatly increases the expected odds of success in Y.

Similarly, if β_1 is a large negative number, say β_1 = −8 (corresponding to a large, negative estimated linear effect of X on the logit of Y), then the odds ratio equals about 0.00033; i.e. increasing X by a single unit greatly decreases the expected odds of success in Y.
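A sketch of this interpretation in practice (simulated data, assuming statsmodels is installed; the true β_1 = 0.7 is illustrative only): exponentiate the fitted slope to get the estimated odds ratio.

```python
import numpy as np
import statsmodels.api as sm
from scipy.special import expit

rng = np.random.default_rng(7)
x = rng.normal(size=1000)
y = rng.binomial(1, expit(0.2 + 0.7 * x))  # true beta1 = 0.7

X = sm.add_constant(x)                     # intercept + predictor
res = sm.Logit(y, X).fit(disp=0)

beta1 = res.params[1]
print(f"beta1 = {beta1:.3f}, odds ratio = {np.exp(beta1):.3f}")
# A 1-unit increase in x multiplies the odds of Y = 1 by about exp(0.7) ~ 2.
```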


Equivalently, one can talk about the log-odds ratio rather than the ordinary odds ratio:

β_1 = log[ odds(Y = 1 | X = x) / odds(Y = 1 | X = x − 1) ]
    = log[odds(Y = 1 | X = x)] − log[odds(Y = 1 | X = x − 1)]

The analogous interpretations hold (since the logarithm is a monotone increasing function), but now we speak about increases or decreases in log-odds rather than just odds.

You will see different people use either scale to interpret their model coefficients. Regardless, it is always about describing whether a change in a predictor X corresponds to an increase or decrease in the odds/probability of a success in Y. [Notice: the odds can increase if and only if the probability of success increases.]


Note: all this readily extends to multiple predictors.


Generalized linear models (GLMs)

Both ordinary linear regression and logistic regression are special cases of a much more general and flexible modelling framework: generalized linear models.

Note: the basic GLM framework always assumes that data points are independently sampled; e.g. random sampling.

GLMs are specified in two pieces:

(1) Specify a probability distribution for the data process; this will generate a likelihood function for any set of sample data.

(2) Specify a link function, g(E(Y | X)), that relates the mean of the response process to a (fixed) linear combination of the predictors.

We then use the likelihood function to estimate the conditional mean of the response process (conditional on the predictors).
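In software these two pieces are usually what you literally pass in. A minimal sketch, assuming statsmodels (simulated data; the family fixes the likelihood, and each family carries a default link):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = rng.normal(size=200)
y = rng.binomial(1, 1 / (1 + np.exp(-0.5 * x)))  # a binary response

X = sm.add_constant(x)  # linear predictor will be beta0 + beta1*x

# Piece (1): Binomial likelihood.  Piece (2): logit link (Binomial's default).
model = sm.GLM(y, X, family=sm.families.Binomial())
print(model.fit().params)
```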


Ordinary linear regression as GLM

For the ordinary linear regression model, we first wrote

Y = Σ_{j=0}^{p} β_j X_j + ε,

where, as before, we define X_0 = 1, and ε ∼ N(0, σ²).

Recall that an equivalent formulation of this model was to say that the probability distribution of the response Y (conditional on the predictors) is Normal with mean Σ_{j=0}^{p} β_j X_j and variance σ²:

Y ∼ N( Σ_{j=0}^{p} β_j X_j , σ² ).

This is equivalent to Step 1 of specifying a GLM; i.e. we have proposed a probability distribution for our data process.


Note:

Y ∼ N( Σ_{j=0}^{p} β_j X_j , σ² )

is sloppy (but fairly universal) notation. It would be far more accurate to write:

Y | X ∼ N( Σ_{j=0}^{p} β_j X_j , σ² ),

because we have defined a distribution for Y conditional on X. This is not the same thing as the unconditional distribution of Y (regardless of X).

What this means is that you do not need the response variable to be normally distributed in ordinary linear regression (we already knew this), but we do need the response variable to be normally distributed for every set of fixed values of the predictors X.


As we saw last time, the probability specification

Y | X ∼ N( Σ_{j=0}^{p} β_j X_j , σ² )

generates a corresponding likelihood function for any set of sample data:

ℓ(β_0, . . . , β_p, σ | x_1, . . . , x_p, y) = Π_{i=1}^{n} (1 / (√(2π) σ)) exp( −(y_i − Σ_{j=0}^{p} β_j x_{j,i})² / (2σ²) )

Notation: remember that bold-face denotes a vector. Since every one of our n sample units generates an observation for x_1, . . . , x_p and y, I have just written

x_1 := {x_{1,1}, x_{1,2}, . . . , x_{1,n}}
x_2 := {x_{2,1}, x_{2,2}, . . . , x_{2,n}}, etc.
y := {y_1, y_2, . . . , y_n}


Moreover, the above probability specification implicitly defines the link function relating the mean of the response process to the linear combination of predictors. For ordinary linear regression, the link function is just the identity function:

g(E(Y | X)) = E(Y | X)

Recall that we wrote this out explicitly in Lecture 2: if

Y | X ∼ N( Σ_{j=0}^{p} β_j X_j , σ² ),

then

E(Y | X) = Σ_{j=0}^{p} β_j X_j


Thus, ordinary linear regression corresponds to a GLM with a Gaussian (i.e. normal) likelihood and the identity link.
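A sketch of this equivalence, assuming statsmodels (simulated data): a Gaussian-family GLM with its default identity link reproduces the OLS coefficient estimates.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
x = rng.normal(size=100)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=100)
X = sm.add_constant(x)

ols = sm.OLS(y, X).fit()
glm = sm.GLM(y, X, family=sm.families.Gaussian()).fit()  # identity link default

print(np.allclose(ols.params, glm.params))  # True: identical estimates
```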

Note that a very special feature of Gaussian GLMs is that the variance of the responses Y for any fixed X is totally independent of the (conditional) mean of the responses. This is a very helpful feature that explains the popularity of Gaussian models.

As we will see though, such a modelling assumption is often unreasonable; i.e. we often expect the variance of responses about a mean to actually depend on the value of the mean in some way.

Moreover, some data structures require such dependency.


Logistic regression as GLM

Recall the standard logistic regression model is:

log(π / (1 − π)) = Σ_{j=0}^{p} β_j X_j,

where π := Pr(Y = 1 | X).

Here, the link function is

g(E(Y | X)) = logit(E(Y | X)).

Why? Recall that for Bernoulli (0 or 1) responses, Pr(Y = 1 | X) = E(Y | X).

Thus,

logit(E(Y | X)) = log( E(Y | X) / (1 − E(Y | X)) ) = log( Pr(Y = 1 | X) / (1 − Pr(Y = 1 | X)) )


Since each Y conditional on X is assumed Binomial(m, π), we have the likelihood function:

ℓ(π | x_1, . . . , x_p, y) = Π_{i=1}^{n} (m_i choose y_i) π^{y_i} (1 − π)^{m_i − y_i}

Recall that usually m_i = 1 always, but it is possible that you have multiple observed responses for the same values of your predictors (especially true when your predictors are categorical).

Thus, logistic regression corresponds to a GLM with a Binomial likelihood and the logit link.

Note also that for such a model, the mean and variance of the conditional response process Y are inherently dependent, since for Bernoulli Y we have

Var(Y | X) = π(1 − π) = E(Y | X) · (1 − E(Y | X)).


Residuals of a GLM

Now it should be clearer why we do not need to write down an explicit error term for our GLM: it is implicitly defined once we specify the link function.

For Gaussian regression with the identity link (ordinary linear regression), the model is totally defined via

Y | X ∼ N( Σ_{j=0}^{p} β_j X_j , σ² ).

This says that the response data are normally distributed about the conditional mean E(Y | X) = Σ_{j=0}^{p} β_j X_j with constant variance σ².

Therefore, the individual model errors (theoretical residuals) are:

ε = Y − E(Y | X) = Y − Σ_{j=0}^{p} β_j X_j = Y − Ŷ


This matches exactly with the traditional construal of ordinary linear regression:

Note: assuming Y | X ∼ N( Σ_{j=0}^{p} β_j X_j , σ² ) implies ε ∼ N(0, σ²), or more precisely, implies ε | X ∼ N(0, σ²).


Similarly, for Binomial/Bernoulli regression with the logit link (logistic regression), the individual model errors will simply be the deviations of the observed responses, Y, from the conditional mean of the response, E(Y | X) = Pr(Y = 1 | X):

ε = Y − E(Y | X) = Y − 1 / (1 + exp(−Σ_{j=0}^{p} β_j X_j))

Note: because we are modelling

g(E(Y | X)) = logit(E(Y | X)) = Σ_{j=0}^{p} β_j X_j,

when we solved this equation for E(Y | X) we derived the expit function:

E(Y | X) = 1 / (1 + exp(−Σ_{j=0}^{p} β_j X_j))


Because the response process is binary, these model errors are also binary for any fixed values of the predictors X. This looks very different than ordinary linear regression!

Fix a particular set of values for the predictors X. Then our model says that our best guess for the (conditional) mean of the response (i.e. the conditional probability that Y = 1) is:

E(Y | X) = 1 / (1 + exp(−Σ_{j=0}^{p} β_j X_j))

But since Y can only equal 0 or 1, the (conditional) model error can only assume one of two values:

0 − 1 / (1 + exp(−Σ_{j=0}^{p} β_j X_j))   or   1 − 1 / (1 + exp(−Σ_{j=0}^{p} β_j X_j))


Thus, the error term is a transformed Bernoulli random variable.

Note: this directly parallels ordinary linear regression (assumed Gaussian response), where the error term was a transformed Gaussian random variable.

We will return to diagnosing model fit by examining residuals soon.


Some important and common GLMs

Multinomial regression: categorical response with more than two categories (usually AVOID at all costs).

Poisson regression, ZIP regression, negative binomial regression: count response data.

Gamma regression, lognormal regression, inverse Gaussian regression: continuous, but skewed response data.

Beta regression, ZOIB regression: proportion/percentage (0–100%) response data.

...and many others.

We will talk about most of these at least a little.


Multinomial regression

As a general rule, multinomial regression should be avoided at all costs, unless you are working with at least 1000s of observations.

It often relies on suspicious assumptions, is notoriously difficult to perform diagnostics with, and suffers from extremely low power.

But you still occasionally see it used (poorly) in small data settings, so let’s just briefly introduce it.


The response data Y are now categorical on more than two categories, so take values in {0, 1, 2, . . . , k}.

Just as in regression with Binomial (0/1) data, if we can model the probabilities of Y falling into any of its k + 1 categories, then we know everything about the random phenomenon.

Now though, there are k probabilities that we need to model:

π_1 := Pr(Y = 1 | X)
π_2 := Pr(Y = 2 | X)
. . .
π_k := Pr(Y = k | X)

Note that if we knew these, then we would automatically know

π_0 := Pr(Y = 0 | X) = 1 − π_1 − π_2 − · · · − π_k


The most typical way you see this modelled is via multinomial logistic regression, another kind of GLM.

Remember: 2 pieces to define a GLM:
(1) a likelihood for the data, and
(2) a link function relating the mean of the response (conditional on the predictors) to a linear combination of the predictors.

Here, we assume a Multinomial model for the response data (sample size n).

This is the natural generalization of a Binomial phenomenon to more than two categories; in particular, it assumes that each sample observation of Y ∈ {0, 1, . . . , k} is i.i.d. (independent and identically distributed).

Thus, while somewhat flexible, this does not characterize all possible probability distributions on the theoretical sample space.


To make this easier to write down, suppose our n sample observations generate z_0 counts of Y = 0, z_1 counts of Y = 1, . . . , and z_k counts of Y = k. Then:

ℓ(π_0, π_1, . . . , π_k | x_1, . . . , x_p, y) = (n choose z_0, z_1, . . . , z_k) π_0^{z_0} π_1^{z_1} · · · π_k^{z_k},

where (n choose z_0, z_1, . . . , z_k) is a multinomial coefficient that simply counts the number of ways to simultaneously select z_0 objects of type 0, z_1 objects of type 1, . . . , and z_k objects of type k from n objects. [Think: how many ways could you arrange 2 green balls, 3 red balls, and 1 blue ball in a straight line?]

Notice: if you sub in k = 1, this model reduces back to the Binomial model.
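A sketch of both objects, assuming scipy (the probabilities (0.5, 0.3, 0.2) are arbitrary illustration values): the multinomial coefficient for the balls example above, and the multinomial pmf that appears in the likelihood.

```python
from math import factorial
from scipy.stats import multinomial

# 2 green, 3 red, 1 blue in a line: 6! / (2! 3! 1!) orderings
print(factorial(6) // (factorial(2) * factorial(3) * factorial(1)))  # 60

# Multinomial pmf: counts (z0, z1, z2) = (2, 3, 1) in n = 6 trials
print(multinomial.pmf([2, 3, 1], n=6, p=[0.5, 0.3, 0.2]))
```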


Multinomial logistic regression

Standard multinomial regression now requires you to declare a reference category; it then posits a link function for the mean number of outcomes in each of the k other categories relative to that reference.

In Binomial regression, the choice was arbitrary (i.e. either “0” or “1” could be declared a Bernoulli “success”), but now we have options; the decision is made according to the research problem.

We will declare Y = 0 as the reference category. Then multinomial logistic regression posits:

log( π_i / π_0 ) = Σ_{j=0}^{p} β_{i,j} X_j,   1 ≤ i ≤ k

Notice that we are now modelling k different logit responses, each with their own set of coefficients that capture the linear associations between the predictors and logit response.
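A sketch of fitting such a model on simulated data, assuming statsmodels (MNLogit takes the lowest category, here Y = 0, as the reference): the fit returns one coefficient column per non-reference category.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)
x = rng.normal(size=600)
# a 3-category response whose category probabilities depend on x (softmax)
eta = np.column_stack([np.zeros_like(x), 0.8 * x, -0.5 + 0.3 * x])
p = np.exp(eta) / np.exp(eta).sum(axis=1, keepdims=True)
y = np.array([rng.choice(3, p=pi) for pi in p])

X = sm.add_constant(x)
res = sm.MNLogit(y, X).fit(disp=0)
print(res.params)  # one (beta_{i,0}, beta_{i,1}) column per category i = 1, 2
```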


Everything now proceeds as with Binomial logistic regression, except we would no longer talk about the “odds of success” increasing or decreasing with values of X (estimated via the model coefficients).

Instead, now we would talk about the “odds of being in category i relative to the reference category 0”. E.g. odds of voting NDP vs. Conservative, Green vs. Conservative, and Liberal vs. Conservative.

One major problem with this kind of model is that it assumes independence of irrelevant alternatives (IIA). That is, removing one category from consideration should not change the odds of responding in another category.

This assumption is often unreasonable in practice. E.g. if there were no Green candidates, it is very likely that more people would vote NDP and/or Liberal vs. Conservative, thus changing these odds.

Some other multinomial regression models relax this assumption, butit is very difficult to do well.
