8/12/2019 SA3 Tutorial 2 http://slidepdf.com/reader/full/sa3-tutorial-2 1/32 Count Data Models

SA3 Tutorial 2


Count Data Models


Poisson distribution - Examples

• Insurance claims in a year

• Number of customers in a waiting line

• Number of defects in a given surface area

• Number of road construction projects in a city at a given time

• Number of road accidents that occur on a particular stretch of road in a week

Notice that all the above are counts.


Poisson Distribution - Properties

$\mathrm{Prob}(X = x) = e^{-\lambda} \lambda^{x} / x!, \quad x = 0, 1, 2, \ldots$

Mean = $\lambda$, Variance = $\lambda$
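The equality of mean and variance can be checked numerically. The sketch below is only an illustration: the value `lam = 2.5` and the truncation of the support at 100 are assumptions chosen for the example, not values from the slides.

```python
import math

def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) for a Poisson variable with parameter lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

lam = 2.5             # illustrative parameter (assumption)
support = range(100)  # truncated support; the tail beyond 100 is negligible here

# Mean and variance computed directly from the probability function.
mean = sum(x * poisson_pmf(x, lam) for x in support)
variance = sum((x - mean) ** 2 * poisson_pmf(x, lam) for x in support)

print(f"mean = {mean:.4f}, variance = {variance:.4f}")
```

Both quantities come out equal to the parameter, up to floating-point error.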


Regression model - I

• Linear regression

• Assumptions

• Linear

• Independent

• Normal

• Equal Variance

For each value of the regressor (X), the distribution of the response (Y) is the same except for a linear shift.


Poisson Regression - Assumptions

• Assumptions

1. The probability function of the response $Y$ (given $X$) is Poisson with parameter $\lambda$.

2. $\lambda = e^{\beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k}$

3. Observations are independently distributed.
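A minimal sketch of the second assumption: the Poisson parameter is the exponential of a linear combination of the regressors, so it stays positive whatever the regressor values are. The coefficients and regressor values below are hypothetical and the function name `poisson_mean` is introduced only for this illustration.

```python
import math

def poisson_mean(betas, xs):
    """Return exp(beta_0 + beta_1*x_1 + ... + beta_k*x_k)."""
    linear_predictor = betas[0] + sum(b * x for b, x in zip(betas[1:], xs))
    return math.exp(linear_predictor)

betas = [-1.0, 0.5, -2.0]  # hypothetical coefficients: beta_0, beta_1, beta_2

# The linear predictor can be negative, but the resulting mean never is.
for xs in [(0.0, 0.0), (1.0, 3.0), (-4.0, 10.0)]:
    print(xs, poisson_mean(betas, xs))
```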


Regression model - II: Poisson Regression

• The mean of the distribution is positive. In Poisson regression, the log of the mean of the response is modeled as a linear function of the regressors.


Interpretation of β 


Interpretation of β (contd)


Linear regression - Results


Poisson Regression - Results


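For the Poisson model,

Deviance, $D = 2 \sum_{i=1}^{n} y_i \log\left( y_i / \hat{\lambda}_i \right)$

where $\hat{\lambda}_i$ is the fitted mean for observation $i$.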
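The deviance can be computed directly from observed counts and fitted means. The sketch below uses the general Poisson deviance, which carries an extra term (y_i minus the fitted mean); that term sums to zero when the model includes an intercept, leaving only the sum of y_i log(y_i / fitted mean). The data are made up for illustration, and terms with y_i = 0 are taken to contribute zero to the log part.

```python
import math

def poisson_deviance(y, fitted):
    """General Poisson deviance: 2 * sum(y*log(y/fitted) - (y - fitted))."""
    total = 0.0
    for yi, mi in zip(y, fitted):
        log_term = yi * math.log(yi / mi) if yi > 0 else 0.0  # 0*log(0) taken as 0
        total += log_term - (yi - mi)
    return 2.0 * total

# Hypothetical observed counts and fitted means (their sums agree, as they
# would for a fitted model with an intercept).
y = [0, 1, 3, 2]
fitted = [0.5, 1.5, 2.5, 1.5]

print(poisson_deviance(y, fitted))
```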


Property of deviance

• Deviance has approximately a Chi-square distribution with n - p degrees of freedom, where n is the number of observations and p is the number of parameters.


Interpretation of the outputs

• Consider linear regression.

• The regression equation is:

• predicted number of claims = -1.12 + 0.51 numveh + 0.02 age.

• For numveh = 1 and age = 25, the expected number of claims is -0.22, but a count can never be negative!

• Consider the Poisson regression.

• The regression equation is:

• log of predicted number of claims = -3.20 + 0.74 numveh + 0.03 age

• For numveh = 1 and age = 25, the expected number of claims is $e^{-3.20 + 0.74 \times 1 + 0.03 \times 25}$, a positive number.
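The two predictions can be reproduced from the coefficients printed on the slide. This is a sketch using those rounded coefficients only; with them the linear prediction evaluates to about -0.11 rather than the -0.22 quoted, presumably because the slide's figure came from unrounded estimates. Either way the linear prediction is negative while the Poisson prediction is not.

```python
import math

numveh, age = 1, 25

# Linear regression: the prediction is the linear predictor itself.
linear_pred = -1.12 + 0.51 * numveh + 0.02 * age

# Poisson regression: the linear predictor is the log of the mean,
# so the prediction is its exponential.
log_mean = -3.20 + 0.74 * numveh + 0.03 * age
poisson_pred = math.exp(log_mean)

print(f"linear: {linear_pred:.2f}, Poisson: {poisson_pred:.2f}")
```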


Interpretation of the outputs

• Linear regression

• If the number of vehicles increases by 1 (whether it is from 1 to 2 or from any other value), while the age remains the same, the number of claims increases by about 0.51.

• Poisson Regression

• If the number of vehicles increases by 1, while the age remains the same, the number of claims gets multiplied by $e^{0.74}$ = 2.1.

• For age = 40 and numveh = 2, the predicted number of claims is $e^{-0.52}$ = 0.59.

• For age = 40 and numveh = 3, the predicted no. of claims is 0.59 × 2.1 = 1.24.

• For age = 40 and numveh = 4, the predicted no. of claims is 1.24 × 2.1 ≈ 2.6.
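The multiplicative interpretation can be verified by evaluating the fitted Poisson equation at age 40 for successive values of numveh. The sketch below uses the rounded coefficients from the slide; the helper name `predicted_claims` is introduced here for illustration. Exact evaluation differs slightly in the second decimal from the chained figures above, which multiply already-rounded numbers.

```python
import math

def predicted_claims(numveh, age):
    """Predicted count from the fitted Poisson equation (rounded coefficients)."""
    return math.exp(-3.20 + 0.74 * numveh + 0.03 * age)

factor = math.exp(0.74)  # multiplier per additional vehicle
print(f"factor = {factor:.2f}")

previous = None
for numveh in (2, 3, 4):
    current = predicted_claims(numveh, 40)
    ratio = "" if previous is None else f", ratio to previous = {current / previous:.2f}"
    print(f"numveh = {numveh}: predicted = {current:.2f}{ratio}")
    previous = current
```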


Which model to use?

• We already saw that linear regression can give negative values for the predicted count.

• Poisson regression always gives a positive value.

• We can look at the plot of Pearson residuals against the regressors and see which looks better.

• We can also look at the plots of predicted vs actual observed values in each case and see which fit is better.


When is Poisson regression appropriate?

• The response is count data.

• The conditional distribution of Y given X is Poisson.

• Mean of Y is less than 10, preferably between 1 and 5.

• If the mean is greater than 10, one can try the linear regression of log Y against the regressors.


Caveat

• Heterogeneity in data

• Data collected in clusters

• Missing regressors

• Consequence: overdispersion (variance > mean)


Overdispersion


Overdispersion

• Overall Poisson look

• Different clusters

• Overdispersion: variance > mean
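Why clustered data produce overdispersion can be seen with a small exact calculation. The sketch below assumes two equally likely clusters with Poisson means 1 and 5; these numbers are hypothetical. Each cluster on its own has variance equal to its mean, yet the pooled distribution does not.

```python
import math

def poisson_pmf(x, lam):
    return math.exp(-lam) * lam ** x / math.factorial(x)

cluster_means = [1.0, 5.0]  # hypothetical cluster-level Poisson means
weights = [0.5, 0.5]        # hypothetical cluster proportions

def mixture_pmf(x):
    """Probability function of a count drawn from a randomly chosen cluster."""
    return sum(w * poisson_pmf(x, m) for w, m in zip(weights, cluster_means))

support = range(150)  # truncated support; the remaining tail is negligible
mean = sum(x * mixture_pmf(x) for x in support)
variance = sum((x - mean) ** 2 * mixture_pmf(x) for x in support)

print(f"mean = {mean:.3f}, variance = {variance:.3f}")
```

The pooled variance is the average within-cluster variance plus the variance of the cluster means, so it exceeds the pooled mean whenever the cluster means differ.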


Negative binomial model

• Y follows a negative binomial distribution with mean µ and variance µ + kµ².

• Now use the Poisson type model.

• How to detect overdispersion?

• Fit a Poisson model and obtain plots of Pearson residuals vs regressors.

• If one or more of these show a funnel type shape, then there is overdispersion. If you find this, use negative binomial regression.
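Pearson residuals are simple to compute once fitted means are available. The sketch below also reports the Pearson dispersion statistic (sum of squared Pearson residuals divided by n - p), which is a common numerical companion to the residual plots and is not something the slides themselves describe. The counts, fitted means, and parameter count are hypothetical.

```python
import math

def pearson_residuals(y, fitted):
    """(y - mean) / sqrt(variance), with variance = mean under the Poisson model."""
    return [(yi - mi) / math.sqrt(mi) for yi, mi in zip(y, fitted)]

def pearson_dispersion(y, fitted, n_params):
    """Sum of squared Pearson residuals over residual degrees of freedom."""
    residuals = pearson_residuals(y, fitted)
    return sum(r ** 2 for r in residuals) / (len(y) - n_params)

def negative_binomial_variance(mu, k):
    """Variance function mu + k*mu^2 of the negative binomial model."""
    return mu + k * mu ** 2

# Hypothetical observed counts and fitted Poisson means.
y = [0, 0, 1, 7, 12, 2]
fitted = [2.0, 2.0, 3.0, 4.0, 6.0, 5.0]

# Values well above 1 suggest more spread than the Poisson model allows.
print(pearson_dispersion(y, fitted, n_params=2))
```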


Zero-inflated Poisson model

• With probability p, Y is always 0.

• With probability 1 - p, Y ~ Poisson(α).

• So Prob(Y=0) = p + (1-p)e^-α

• And Prob(Y=r) = (1-p)(e^-α)(α^r)/r!, r = 1, 2, ...
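The zero-inflated probabilities above translate directly into a function. In this sketch the values p = 0.3 and α = 2.0 are illustrative assumptions; the check at the end confirms that the probabilities add up to 1 and that zeros are more frequent than under a plain Poisson with the same α.

```python
import math

def zip_pmf(r, p, alpha):
    """Zero-inflated Poisson: extra zeros with probability p, else Poisson(alpha)."""
    poisson = math.exp(-alpha) * alpha ** r / math.factorial(r)
    if r == 0:
        return p + (1 - p) * poisson
    return (1 - p) * poisson

p, alpha = 0.3, 2.0  # illustrative values (assumption)

total = sum(zip_pmf(r, p, alpha) for r in range(100))
print(f"P(Y=0) = {zip_pmf(0, p, alpha):.4f}, plain Poisson P(0) = {math.exp(-alpha):.4f}")
print(f"sum of probabilities = {total:.6f}")
```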


Hurdle model

• Prob(Y=0)=p

• Prob(Y>0)=1-p

• Given Y>0, Y follows a Poisson distribution truncated at 0.

• So $\mathrm{Prob}(Y = r) = (1 - p)\, e^{-\alpha} \alpha^{r} / \left( (1 - e^{-\alpha})\, r! \right), \quad r = 1, 2, \ldots$
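A sketch of the hurdle probabilities, assuming the truncated Poisson has parameter α (the symbol and the values p = 0.4, α = 1.5 are assumptions made for this illustration). Unlike the zero-inflated model, every zero here comes from the hurdle part, so Prob(Y=0) is exactly p.

```python
import math

def hurdle_pmf(r, p, alpha):
    """Hurdle model: P(Y=0) = p; positive counts follow a zero-truncated Poisson."""
    if r == 0:
        return p
    truncated = math.exp(-alpha) * alpha ** r / ((1 - math.exp(-alpha)) * math.factorial(r))
    return (1 - p) * truncated

p, alpha = 0.4, 1.5  # illustrative values (assumption)

total = sum(hurdle_pmf(r, p, alpha) for r in range(100))
positive_part = sum(hurdle_pmf(r, p, alpha) for r in range(1, 100))
print(f"P(Y=0) = {hurdle_pmf(0, p, alpha)}, P(Y>0) = {positive_part:.6f}, total = {total:.6f}")
```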


Thank You!