

Faculty of Health Sciences

Models for counts

Analysis of repeated measurements, NFA 2016

Julie Lyng Forman & Lene Theil Skovgaard

Department of Biostatistics, University of Copenhagen

2016


Contents

Repetition of generalized linear models (GLMs).

◮ with emphasis on log-linear models for count data.

Extensions to repeated measurements.

◮ Marginal models and generalized estimating equations (GEE).

◮ Generalized linear mixed models.

◮ Comparison of the two types of models.

Case studies with data from randomized clinical trials:

◮ Effect of antibiotics on leprosy bacilli

◮ Epileptic seizures.

Suggested reading: Fitzmaurice et al. (2011), chapters 11–16.

Outline

Repetition of generalized linear models

Marginal models

Generalized linear mixed models

Comparison of Marginal models and GLMMs


Repetition: Generalized linear models

For outcomes that are binary, integer-valued, or always positive, a linear model may not be reasonable, since the mean is constrained to the interval [0, 1] or to the positive numbers.

A generalized linear model is specified by:

1. A distributional assumption: binomial (especially binary), Poisson, negative binomial, gamma, normal (!!), etc. (There is a large selection in PROC GENMOD.)

2. A link function g that links the expected value µi = E(Yi) to the covariates on an appropriate scale, i.e.

$$g(\mu_i) = \beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki} = X_i^T \beta$$

Note: So far we consider only independent observations; no repetitions yet!

What’s the link?

Generalized linear models are similar to multiple regression models, but on a scale that matches the data:

◮ Continuous data (dist=normal, link=identity)

Linear regression: E(Y ) = α + βX.

◮ Count data (dist=poisson, link=log)

Log-linear regression: log{E(Y )} = α + βX

◮ Binary data (dist=binomial, link=logit), (lecture 6)

Logistic regression: $\log\left\{\dfrac{E(Y)}{1-E(Y)}\right\} = \alpha + \beta X$

Case: Epileptic seizures

Randomized clinical trial:

◮ 31 treated with progabide.

◮ 28 treated with placebo.

Outcome: number of epileptic seizures during

◮ An 8-week interval before treatment

◮ Four consecutive 2-week intervals following treatment

◮ We consider rates of seizures per week

Reference: Thall, P.F. and Vail, S.C. (1990). Some covariance models for longitudinal count data with overdispersion. Biometrics.

Spaghettiplot - version 1

Number of seizures for each period.

The level is higher in the first period because it covers 8 weeks rather than 2 weeks.

Spaghettiplot - version 2

Seizures per week for each period.

The variability is lower in the first period because the rate is averaged over 8 weeks rather than 2 weeks.

Epilepsy: summary statistics

The MEANS Procedure

Analysis Variable : seizures

treatment  time  Obs   N         Mean     Variance
--------------------------------------------------
0          0      28  28   30.7857143  681.4338624
0          1      28  28    9.3571429  102.7566138
0          2      28  28    8.2857143   66.6560847
0          3      28  28    8.7857143  215.2857143
0          4      28  28    8.0000000   57.9259259
1          0      31  31   31.6451613  783.6365591
1          1      31  31    8.5806452  332.7182796
1          2      31  31    8.4193548  140.6516129
1          3      31  31    8.1290323  193.0494624
1          4      31  31    6.7419355  126.6645161
--------------------------------------------------

Note: Data is overdispersed (variance is > mean).
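As a rough check of the note above, the variance-to-mean ratio at the final period can be read off the table; under a Poisson model it would be close to 1. The ratios below use only the tabulated means and variances, rounded:

```latex
\frac{\widehat{\mathrm{Var}}(Y)}{\bar{Y}} \approx \frac{57.93}{8.00} \approx 7.2
\quad \text{(placebo, time 4)},
\qquad
\frac{\widehat{\mathrm{Var}}(Y)}{\bar{Y}} \approx \frac{126.66}{6.74} \approx 18.8
\quad \text{(progabide, time 4)}
```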


Distribution at final follow-up

Do we see a difference between the groups . . .

◮ Need a model for the expected rate of seizures.

◮ The distribution is obviously not normal; it is integer-valued and markedly skewed, with a major outlier.

◮ Maybe a Poisson model fits?

The Poisson distribution

Counts with no well-defined upper limit, e.g.:

◮ Number of cancer cases during a specific year.

◮ Number of positive swabs over a certain period of time.

Poisson distribution with mean value: µ =1,2,5 and 20

Note: The mean and variance are equal

◮ This fact is unfortunately often overlooked.

Variance in GLMs

Every choice of distribution implies a particular variance function:

Var(Y ) = φ · ν(µ)

where µ is the mean and φ is the so-called dispersion parameter.

Examples:

◮ Normal distribution: ν(µ) = 1 and φ = σ². The variance is not related to the mean.

◮ Poisson distribution: ν(µ) = µ and φ = 1. The variance is identical to the mean.

◮ Negative binomial distribution: Var(Y) = µ + φµ², with φ entering the variance function itself. The variance grows quadratically with the mean.

Methods for handling overdispersion

In the Poisson model the variance is identical to the mean.

◮ But what if data has overdispersion?

. . . caused by heterogeneity due to omitted covariates or unrecognized clustering (such as a group of non-susceptibles).

Alternatives to the Poisson model:

◮ A negative binomial model (next slide).

◮ A semi-parametric model with mean structure and variance function as in the Poisson model and with an additional dispersion parameter⋆.

⋆ This has consequences for statistical inference; parameters must be fitted using a quasi-likelihood method instead of conventional maximum likelihood (ML).

The negative binomial distribution

An alternative to the Poisson model:

◮ similar model for the mean (link function: log)

◮ with overdispersion: Var(Y) = µ + φµ²

Poisson distribution with mean 10, followed by 3 negative binomial distributions with the same mean and variances 30, 110, and 210:
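Assuming the negative binomial variance µ + φµ² stated above, the dispersion parameters behind the three negative binomial panels can be back-calculated from the mean and variances quoted in the caption. This is a worked check, not a set of values taken from the slides:

```latex
\phi = \frac{\mathrm{Var}(Y) - \mu}{\mu^2}, \qquad \mu = 10:\qquad
\frac{30 - 10}{10^2} = 0.2, \qquad
\frac{110 - 10}{10^2} = 1.0, \qquad
\frac{210 - 10}{10^2} = 2.0
```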


Analysis of epilepsy data: proc genmod

Outcome: Rate of seizures in last period (no repetitions yet!).

title1 'Poisson model: treatment effect in period 4';
PROC GENMOD DATA=epilepsy;
  WHERE time=4;                /* only period 4 */
  CLASS treatment (REF='0');   /* placebo is the reference group */
  MODEL seizures=treatment / DIST=poisson LINK=log
        OFFSET=logweeks TYPE3; /* offset: log of the period length */
RUN;

We want to compare with:

◮ Negative binomial model: change to DIST=negbin.

◮ Overdispersion correction: add SCALE=PEARSON to the options following the MODEL statement.
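The two alternatives can be sketched as below. The sketch assumes the same epilepsy dataset and variable names as the Poisson call above; only the DIST= and SCALE= options differ.

```sas
/* Negative binomial model: same log-linear mean, extra dispersion parameter */
PROC GENMOD DATA=epilepsy;
  WHERE time=4;
  CLASS treatment (REF='0');
  MODEL seizures=treatment / DIST=negbin LINK=log
        OFFSET=logweeks TYPE3;
RUN;

/* Overdispersion-corrected Poisson: the dispersion is estimated from the
   Pearson chi-square divided by its degrees of freedom */
PROC GENMOD DATA=epilepsy;
  WHERE time=4;
  CLASS treatment (REF='0');
  MODEL seizures=treatment / DIST=poisson LINK=log
        OFFSET=logweeks SCALE=PEARSON TYPE3;
RUN;
```

The point estimate of the treatment effect is unchanged by SCALE=PEARSON; only the standard errors and tests are adjusted.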


Programming notes

Similar to linear models for continuous/normal data:

◮ CLASS specifies the categorical variables and MODEL the relation between the outcome and the covariates.

Specific to generalized linear models:

◮ DIST specifies the distribution (e.g. poisson) and the related default link, unless this is overruled by LINK.

◮ OFFSET allows for comparison of rates (rather than counts) over periods of variable length. Note that the offset variable (logweeks) must be the log of the length of the period.

◮ Predicted values⋆ (only two distinct) can be saved with

OUTPUT OUT=glmfit PREDICTED=pred;

⋆ We don’t want to save the residuals since they are difficult to interpret when data is not normally distributed.
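The slides do not show how the offset variable was created. A minimal sketch, assuming the period lengths given in the case description (8 weeks at baseline, 2 weeks for each follow-up period) and a hypothetical helper variable named weeks:

```sas
DATA epilepsy;
  SET epilepsy;
  /* 'weeks' is a hypothetical helper variable: length of the period */
  IF time = 0 THEN weeks = 8;
  ELSE weeks = 2;
  /* the offset must be on the scale of the linear predictor, i.e. the log */
  logweeks = LOG(weeks);
RUN;
```

For the period-4 analysis every record has the same offset, log(2), so the offset only shifts the intercept from a log-count to a log-rate per week.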


Results from Poisson model

Analysis Of Maximum Likelihood Parameter Estimates

                             Standard   Wald 95% Confidence        Wald
Parameter     DF  Estimate      Error         Limits         Chi-Square  Pr > ChiSq
Intercept      1    1.3863     0.0668    1.2553    1.5172        430.49      <.0001
treatment 1    1   -0.1711     0.0962   -0.3596    0.0174          3.17      0.0752
treatment 0    0    0.0000     0.0000    0.0000    0.0000           .           .
Scale          0    1.0000     0.0000    1.0000    1.0000

NOTE: The scale parameter was held fixed.

LR Statistics For Type 3 Analysis

Source      DF  Chi-Square  Pr > ChiSq
treatment    1        3.17      0.0751

The estimated difference in seizure rate with progabide compared to placebo is -0.17 with 95% CI (-0.36; 0.02), on the log scale!

◮ Back-transformation: exp(−0.1711) = 0.8427.

◮ We see a 16% lower seizure rate in the progabide group (95% CI: between 30% lower and 2% higher).

◮ Actually we can make SAS do this for us . . .
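The back-transformed interval can also be reproduced by hand from the estimate and standard error in the output, assuming the usual Wald interval with the normal quantile 1.96:

```latex
\exp\left(-0.1711 \pm 1.96 \times 0.0962\right)
= \left(e^{-0.3597},\; e^{0.0175}\right)
\approx (0.698,\; 1.018)
```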


Estimate statements

PROC GENMOD DATA=epilepsy;
  WHERE time=4;
  CLASS treatment (REF='0');
  /* NOINT: each treatment parameter is then the log-rate of one group */
  MODEL seizures=treatment / NOINT DIST=poisson OFFSET=logweeks;
  ESTIMATE 'Rate Progabide' treatment 1 0;
  ESTIMATE 'Rate Placebo' treatment 0 1;
  ESTIMATE 'Rate Ratio' treatment 1 -1;
RUN;

This compares the parameters (log-means) of the two treatment groups:

◮ It is easiest to keep track of comparisons if we can identify the log-means in the output, so we leave out the intercept.

◮ The ordering matters: first progabide, last placebo (ref). We specify "1" for progabide and "-1" for placebo (if there were more groups, they would get a "0").

Output from estimate statements

◮ L’Beta: usual estimate on log-scale.

◮ Mean: has been back-transformed.

Contrast Estimate Results

                Mean      Mean Confidence    L'Beta    Standard         L'Beta Confidence
Label           Estimate  Limits             Estimate  Error     Alpha  Limits
Rate Progabide  3.3710    2.9436   3.8604     1.2152   0.0692    0.05    1.0796   1.3508
Rate Placebo    4.0000    3.5090   4.5597     1.3863   0.0668    0.05    1.2553   1.5172
Rate Ratio      0.8427    0.6980   1.0176    -0.1711   0.0962    0.05   -0.3596   0.0174

We also get p-values for the hypothesis H0: "parameter=0".

Label           Chi-Square  Pr > ChiSq
Rate Progabide      308.63      <.0001
Rate Placebo        430.49      <.0001
Rate Ratio            3.17      0.0752

Overdispersion

BUT: Is the Poisson model realistic?!?

Estimates of treatment effect from the three different models

Model                         Rate ratio (95% CI)       P-value
Poisson (no overdispersion)   0.8427 (0.6980; 1.0176)   0.0752
Overdispersion correction     0.8427 (0.4236; 1.6766)   0.6259
Negative binomial             0.8427 (0.4964; 1.4307)   0.5264

When overdispersion is disregarded:

◮ standard errors are downwards biased (CIs are too narrow).

◮ type I error rate is inflated (p-values are too small).
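To see how strongly the naive Poisson standard error is affected here, the corrected standard error can be back-calculated from the table, assuming that the overdispersion-corrected interval is a Wald interval on the log scale. The implied dispersion estimate is approximate and is not a figure reported in the slides:

```latex
\mathrm{SE}_{\text{corrected}} \approx \frac{\log(1.6766) - \log(0.4236)}{2 \times 1.96} \approx 0.351,
\qquad
\frac{\mathrm{SE}_{\text{corrected}}}{\mathrm{SE}_{\text{Poisson}}} \approx \frac{0.351}{0.0962} \approx 3.65,
\qquad
\hat{\phi} \approx 3.65^2 \approx 13
```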


What if there is an upper limit for the counts?

E.g. number of days away from work in a month.

◮ The distribution of the number out of n is binomial (more about binomial models in lecture 6).

◮ However, if n is large the binomial model becomes intractable and we might use the Poisson as an approximation.

The law of rare events: If the count parameter n in the binomial distribution is large and the probability parameter p is small (i.e. close to 0), the binomial distribution is approximately the same as the Poisson distribution:

$$P(Y = m) \simeq \frac{\mu^m}{m!}\exp(-\mu)$$

where µ = np is the mean value.
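A small numerical check of the approximation, using hypothetical values n = 100 and p = 0.02 chosen only for illustration:

```latex
n = 100,\; p = 0.02,\; \mu = np = 2:\qquad
P_{\text{binomial}}(Y = 0) = 0.98^{100} \approx 0.1326,
\qquad
P_{\text{Poisson}}(Y = 0) = e^{-2} \approx 0.1353
```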


Outline


Marginal models




What about repeated measurements?

In case of normally distributed data:

◮ Replace the univariate normal model by a multivariate.

But what if data is binary or consists of counts?

◮ No multivariate binomial or poisson distribution!

Alternative: specify a semi-parametric model by

1. Mean: $E(Y_{ij}) = \mu_{ij} = g^{-1}(X_{ij}^T\beta)$ with link function g.

2. Variance: $\mathrm{Var}(Y_{ij}) = \phi \cdot \nu(\mu_{ij})$ with variance function ν.

3. A correlation pattern matrix: $\mathrm{Corr}(Y_i) = \Lambda$.

The mean and variance are similar to a conventional GLM, but we don’t assume any particular distribution for the data. This partial model specification is in fact sufficient to make valid inference for β.

Generalized estimating equations (GEE)

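Find the generalized least squares estimator β̂ by minimizing

$$\sum_{i=1}^{N} \{y_i - \mu_i(\beta)\}^T V_i^{-1} \{y_i - \mu_i(\beta)\}$$

with Vi held fixed. Setting the derivative with respect to β to zero gives the estimating equations

$$\sum_{i=1}^{N} D_i^T V_i^{-1} \{y_i - \mu_i(\beta)\} = 0, \qquad D_i = \partial\mu_i / \partial\beta,$$

where Vi is the working covariance for subject i, specified by

◮ The variances φν(µi1), . . . , φν(µik).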

◮ The working correlation matrix Λ.

In practice, alternate between two steps until convergence:

1. Solve for β with the current estimate of Vi fixed.

2. Estimate φ and the correlation parameters in Λ from the scaled residuals $e_{ij} = \nu(\hat\mu_{ij})^{-1/2}(Y_{ij} - \hat\mu_{ij})$ and update Vi.

Properties of GEE

◮ The GEE estimator is robust, i.e. it is valid even when the working covariance is misspecified.

◮ But only if SEs are estimated with the sandwich covariance estimator (default in proc genmod), which also handles overdispersion and time-varying dispersion correctly.

◮ When data are missing at random but not missing completely at random, inverse probability weighting is needed to get unbiased estimates and SEs.

Caution:

◮ The sandwich covariance estimator is known to perform poorly in small datasets; it is anti-conservative!

◮ And it cannot be used at all if the design is unbalanced.

Choosing a working covariance

Possible choices in proc genmod, all giving valid inference:

◮ unstructured (type=un)

◮ compound symmetry (type=cs)

◮ autoregressive (type=ar)

◮ working independence (type=ind)

Why not avoid choosing the correlation altogether by always using working independence (which is easiest to compute)?

◮ Because we get more power when Vi is close to the truth.


Case study: Counts of leprosy bacilli

Randomized clinical trial:

◮ 10 patients treated with antibiotic drug=A

◮ 10 patients treated with antibiotic drug=B

◮ 10 patients treated with placebo drug=P

Outcome: total number of bacilli at six sites of the body,

◮ just before treatment (time=0)

◮ several months after treatment (time=1)

Reference: Snedecor, G.W. and Cochran, W.G. (1967). Statistical Methods (6th edn). Iowa State University Press.

Leprosy bacilli: spaghettiplots

Number of bacilli at baseline and follow-up.


Leprosy bacilli: summary statistics

Analysis Variable : bacilli

drug  time  Obs   N         Mean    Variance
--------------------------------------------
A     0      10  10    9.3000000  22.6777778
A     1      10  10    5.3000000  21.5666667
B     0      10  10   10.0000000  27.5555556
B     1      10  10    6.1000000  37.8777778
P     0      10  10   12.9000000  15.6555556
P     1      10  10   12.3000000  51.1222222
--------------------------------------------

Note: overdispersion, and higher variance at follow-up for drug B and placebo.

Model reflections

◮ We are dealing with counts, so it is natural to consider a Poisson-like distribution with log-link, even though data is most likely not exactly Poisson due to the overdispersion.

◮ The design is that of a classical baseline follow-up study. Covariates are drug and time, and the treatment effect is expected to show as a drug*time interaction.

◮ Because treatment was randomized, the mean values at baseline should be identical across the three groups.

◮ We have replicate measurements on each subject. Thus we need to account for correlation to get valid results.

◮ In fact, there is only one correlation, so type=un, type=cs, and type=ar all describe the same correlation pattern.

Model for the treatment effects

Mean values on log-scale:

Treatment  Period     Mean (on log scale)
P          Baseline   β1
P          Follow-up  β1 + β2
A          Baseline   β1
A          Follow-up  β1 + β2 + β3
B          Baseline   β1
B          Follow-up  β1 + β2 + β4

β2 denotes the time change from baseline to follow-up on placebo.

β3 and β4 denote the differences in time effect of drugs A and B compared to placebo: the treatment effects.

Leprosy bacilli: proc genmod

1. First add the variable drugadj (effective treatment) to the data.

DATA leprosy;
  SET leprosy;
  drugadj = drug;
  /* at baseline nobody has been treated yet, so all groups are coded as placebo */
  IF time = 0 THEN drugadj = 'P';
RUN;

2. Next do the baseline follow-up analysis in proc genmod:

PROC GENMOD DATA=leprosy;
  CLASS id time (REF='0') drugadj (REF='P');
  MODEL bacilli=time drugadj*time / DIST=poisson LINK=log TYPE3;
  /* GEE with unstructured working correlation; CORRW prints it */
  REPEATED SUBJECT=id / WITHINSUBJECT=time TYPE=un CORRW;
RUN;
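For comparison with the unstructured fit, a working-independence GEE only changes the TYPE= option in the REPEATED statement. A minimal sketch, assuming the same leprosy dataset and drugadj variable as above:

```sas
PROC GENMOD DATA=leprosy;
  CLASS id time (REF='0') drugadj (REF='P');
  MODEL bacilli=time drugadj*time / DIST=poisson LINK=log TYPE3;
  /* Working independence: point estimates equal those of an ordinary
     Poisson fit, but the standard errors are the empirical (sandwich) ones */
  REPEATED SUBJECT=id / WITHINSUBJECT=time TYPE=ind;
RUN;
```

Dropping the REPEATED statement altogether would instead give the model that assumes independence, with model-based standard errors.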


Programming notes

◮ The relation between outcome and covariates is specified similarly to continuous/normal data (i.e. like in proc mixed).

◮ DIST=poisson by default specifies the LINK function as log and the variance function ν(µ) = µ. Note that we don’t actually assume that data has a Poisson distribution, because the REPEATED statement fits a marginal model (GEE) by default.

◮ REPEATED identifies subjects (or clusters) and specifies the working correlation for the repeated measurements (looks a lot like proc mixed, but note the different ordering of options).

◮ WITHINSUBJECT=time is needed to get the correct temporal ordering of replicates in longitudinal data. In clustered data where individuals are exchangeable, this is omitted.

◮ CORRW prints the estimated working correlation.

Leprosy bacilli: output from proc genmod

Model Information

Data Set WORK.LEPROSY

Distribution Poisson

Link Function Log

Dependent Variable bacilli

Number of Observations Read 60

Number of Observations Used 60

Class Level Information

Class Levels Values

id 30 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

21 22 23 24 25 26 27 28 29 30

time 2 1 0

drugadj 3 A B P

Parameter Information

Parameter  Effect        time  drugadj
Prm1       Intercept
Prm2       time          1
Prm3       time          0
Prm4       time*drugadj  1     A
Prm5       time*drugadj  1     B
Prm6       time*drugadj  1     P
Prm7       time*drugadj  0     P


GEE Model Information

Correlation Structure Unstructured

Within-Subject Effect time (2 levels)

Subject Effect id (30 levels)

Number of Clusters 30

Correlation Matrix Dimension 2

Maximum Cluster Size 2

Minimum Cluster Size 2

Algorithm converged.

Working Correlation Matrix

Col1 Col2

Row1 1.0000 0.7966

Row2 0.7966 1.0000

GEE Fit Criteria

QIC -405.7657

QICu -403.6905

ATT: Always check that the numerical optimisation has converged.


Analysis Of GEE Parameter Estimates
Empirical Standard Error Estimates

                                 Standard   95% Confidence
Parameter             Estimate      Error       Limits             Z  Pr > |Z|
Intercept               2.3734     0.0801   2.2163   2.5304    29.62    <.0001
time          1        -0.0138     0.1573  -0.3222   0.2946    -0.09    0.9300
time          0         0.0000     0.0000   0.0000   0.0000      .       .
time*drugadj  1  A     -0.5406     0.2186  -0.9690  -0.1122    -2.47    0.0134
time*drugadj  1  B     -0.4791     0.2279  -0.9257  -0.0325    -2.10    0.0355
time*drugadj  1  P      0.0000     0.0000   0.0000   0.0000      .       .
time*drugadj  0  P      0.0000     0.0000   0.0000   0.0000      .       .

Score Statistics For Type 3 GEE Analysis

Source         DF  Chi-Square  Pr > ChiSq
time            1       13.90      0.0002
time*drugadj    2        4.56      0.1024

Do we see a significant p-value for the treatment effect?

◮ Main interest: compare each drug in turn with placebo.

◮ A and B have similar effects, so we don’t see the significance when comparing all three groups to each other (like in an ANOVA).

Back-transformed estimates

1. It is easier to identify means for treatments over time when the main effect of time and the intercept are left out of MODEL.

2. Order: "A time 1", "B time 1", "P time 1", "All time 0" (ref).

PROC GENMOD DATA=leprosy;
  CLASS id time (REF='0') drugadj (REF='P');
  MODEL bacilli=drugadj*time / NOINT DIST=poisson LINK=log;
  REPEATED SUBJECT=id / WITHINSUBJECT=time TYPE=un CORRW;
  /* coefficients follow the order: A time 1, B time 1, P time 1, all time 0 */
  ESTIMATE 'change with A' drugadj*time 1 0 0 -1;
  ESTIMATE 'change with B' drugadj*time 0 1 0 -1;
  ESTIMATE 'change with P' drugadj*time 0 0 1 -1;
  ESTIMATE 'A vs placebo ' drugadj*time 1 0 -1 0;
  ESTIMATE 'B vs placebo ' drugadj*time 0 1 -1 0;
  ESTIMATE 'drug A vs B ' drugadj*time 1 -1 0 0;
RUN;

Leprosy bacilli: Results

◮ L’Beta: usual estimate on log-scale.

◮ Mean: has been back-transformed.


               Mean      Mean Confidence    L'Beta    Standard         L'Beta Confidence
Label          Estimate  Limits             Estimate  Error     Alpha  Limits
change with A  0.5744    0.4281   0.7706    -0.5544   0.1499    0.05   -0.8483  -0.2605
change with B  0.6109    0.4478   0.8333    -0.4929   0.1585    0.05   -0.8035  -0.1823
change with P  0.9863    0.7245   1.3425    -0.0138   0.1573    0.05   -0.3222   0.2946
A vs placebo   0.5824    0.3795   0.8939    -0.5406   0.2186    0.05   -0.9690  -0.1122
B vs placebo   0.6194    0.3963   0.9681    -0.4791   0.2279    0.05   -0.9257  -0.0325
A vs B         0.9403    0.6148   1.4381    -0.0615   0.2168    0.05   -0.4864   0.3633

(P-values omitted due to lack of space)

Conclusion: The mean count of bacilli decreased by 43% with drug A (95% CI: 23% to 57%), by 39% with drug B (17% to 55%), and by 1% with placebo (from a 28% decrease to a 34% increase). Thus with drug A an additional reduction of 42% (11% to 62%, P=0.01) was seen compared with placebo. Similarly, with drug B the additional reduction was 38% (3% to 60%, P=0.04) compared with placebo.


Leprosy bacilli: Modeling choices

Note: Assumed independence gives similar estimates to working independence, but qualitatively different inference . . .

Estimates and SEs for "B" vs Placebo (log-scale)

Model for repeated        Poisson        Negative binomial
measurements              distribution   distribution

Semi-parametric:
GEE Unstructured          -0.48 (0.23)   -0.50 (0.24)
GEE Working indep.        -0.70 (0.35)   -0.70 (0.35)

Specific distribution:
Assumed independence      -0.70 (0.16)   -0.70 (0.29)

◮ GEE can account for correlation, overdispersion, and time- and treatment-dependent dispersion.

◮ Choice of distribution has little impact on GEE. But working independence is a less efficient choice than unstructured.

◮ Assumed independence is unrealistic; Poisson likewise.

Pros and cons of marginal models

Advantages

◮ Minimal model assumptions, hence fewer wrong assumptions.

◮ High flexibility; can model many different kinds of data.

◮ Computationally simple.

Drawbacks

◮ The sandwich covariance estimator is anti-conservative when samples are small, and model-based covariance estimates are only valid if the working covariance is correct. In particular, compound symmetry must be true if the design is unbalanced.

◮ Need a model for the missingness to handle missing data.

◮ Power calculations demand a fully specified model.

◮ Possible to specify mean-covariance combinations that do not match any multivariate distribution (binary data in particular).

Outline


Generalized linear mixed models

GLMMs are variance component models

In parallel to linear mixed models, we specify a model with subject-specific parameters bi (random effects):

$$g(\mu_{ij}) = X_{ij}^T \beta + Z_{ij}^T b_i$$

Model assumptions:

1. bi ∼ N(0, G) and independent of the covariates Xi.

2. For any subject, the repeated measurements are conditionally independent given the random effect(s) and distributed according to a GLM with prespecified distribution, link function g, and variance function Var(Yij | bi) = φ · ν(µij).

◮ This fully specifies a multivariate model.

◮ Correlation between replicates is induced by and solely due to the random effect(s).

Interpretation of regression coefficients in a GLMM

The effect of a covariate is interpreted for fixed values of all other factors, including the random effect.

◮ It makes sense to interpret the effect of time on the individual, as time is usually a within-subject covariate.

◮ It makes sense to compare treatments that have been randomized (the distribution of random effects is identical across treatment groups).

◮ But what about covariates such as gender? Is it reasonable to compare a man of average frailty to a woman of average frailty?

We say that GLMM regression parameters are subject specific.

Interpretation of variance components

Variability among subjects is modeled by the random effects bi, which are assumed to follow a normal distribution.

◮ It is difficult to check the validity of the normal assumption, but fortunately inference for the regression parameters is not very sensitive to misspecification of the distribution.

◮ We could use the normal range for the bi’s to visualize variation between subjects, but be aware that this is a model-based extrapolation strongly influenced by the normal assumption.

◮ The random effects can be estimated by BLUPs, but these estimates are also strongly influenced by the normal assumption and thus no good for checking it!

Technical note on inference for GLMMs⋆

Maximize the likelihood function:

$$\prod_{i=1}^{N} \int f(y_i \mid b_i, \beta, \phi)\, f(b_i \mid G)\, db_i$$

This is a computational challenge.

◮ Integrals must be evaluated by numerical integration. The preferable option is Gaussian quadrature with a suitably large number of quadrature points.

◮ Infeasible with many random effects or when the dataset is large.

◮ Beware of local maxima! Ideally an initial grid search should be performed over a range of covariance parameters.
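For a random-intercept Poisson model, the integral for one subject and its quadrature approximation can be written out as below. This is a sketch of plain (non-adaptive) Gauss-Hermite quadrature with Q nodes x_q and weights w_q; the symbols h_i, σ_b, x_q and w_q are introduced here for illustration only, and software may instead use an adaptive variant that recentres and rescales the nodes for each subject.

```latex
L_i(\beta, \sigma_b) = \int h_i(b)\, \frac{1}{\sqrt{2\pi}\,\sigma_b}\, e^{-b^2/(2\sigma_b^2)}\, db,
\qquad
h_i(b) = \prod_{j} \frac{\mu_{ij}(b)^{y_{ij}}\, e^{-\mu_{ij}(b)}}{y_{ij}!},
\qquad
\mu_{ij}(b) = \exp\left(X_{ij}^T\beta + b\right)

L_i(\beta, \sigma_b) \approx \frac{1}{\sqrt{\pi}} \sum_{q=1}^{Q} w_q\, h_i\left(\sqrt{2}\,\sigma_b\, x_q\right)
```

The second line follows from the substitution b = √2 σ_b x, which turns the normal density into the Gauss-Hermite weight function exp(−x²).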


Leprosy bacilli: proc glimmix

PROC GLIMMIX DATA=leprosy METHOD=QUAD(QPOINTS=50);
  CLASS id time drugadj;
  MODEL bacilli = time*drugadj / NOINT DIST=poisson LINK=log;
  RANDOM intercept / SUBJECT=id G;   /* random intercept for each patient */
  /* ipred: individual predictions; pred: predictions without the BLUPs */
  OUTPUT OUT=glmmfit PRED(ILINK)=ipred PRED(ILINK NOBLUP)=pred;
  /* the coefficient order must match the order of the parameters in the output */
  ESTIMATE 'change with A' drugadj*time 1 0 0 -1 / EXP CL;
  ESTIMATE 'change with B' drugadj*time 0 1 0 -1 / EXP CL;
  ESTIMATE 'change with P' drugadj*time 0 0 1 -1 / EXP CL;
  ESTIMATE 'A vs placebo ' drugadj*time 1 0 -1 0 / EXP CL;
  ESTIMATE 'B vs placebo ' drugadj*time 0 1 -1 0 / EXP CL;
  ESTIMATE 'drug A vs B ' drugadj*time 1 -1 0 0 / EXP CL;
RUN;


Programming notes

Overall similar to the marginal model, except for:

◮ METHOD=QUAD: approximates the likelihood function by Gaussian quadrature.

◮ QPOINTS=50: more quadrature points → better accuracy.

◮ RANDOM: here we only have a random intercept, so it is not necessary to specify a TYPE (otherwise choose un, or perhaps vc if it is reasonable to assume independence between the random effects).

◮ G: prints the estimated G-matrix; here $\sigma_b^2$.

◮ OUTPUT OUT= saves the predicted values in a dataset. Predictions are individual unless the NOBLUP option is used. The ILINK option back-transforms the predictions.


Leprosy: GLMM estimates

                      Exponentiated  Exponentiated  Exponentiated
Label                 Estimate       Lower          Upper
change with placebo   1.0031         0.7786         1.2924
change with A         0.5475         0.3897         0.7692
change with B         0.5947         0.4312         0.8202
A vs placebo          0.5458         0.3594         0.8288
B vs placebo          0.5929         0.3963         0.8869
A vs B                0.9206         0.5812         1.4583

(Estimates on log-scale and P-values omitted due to lack of space)

Conclusion: The expected change in individual bacilli count is estimated as a decrease of 45% with drug A (95% CI: 23% to 61%), a decrease of 41% with drug B (18% to 57%), and an increase of 0.3% with placebo (from a 22% decrease to a 29% increase). With drug A an additional reduction of 45% (17% to 64%) is expected compared with placebo . . .

Poisson or negative binomial distribution

Not that obvious a choice, since the random effect induces overdispersion . . .

                Poisson distribution    Negative binomial
Effect          Estimate (SE)   P       Estimate (SE)   P
change with P    0.01 (0.12)    0.98     0.14 (0.23)    0.55
change with A   -0.60 (0.17)    0.001   -0.71 (0.25)    0.007
change with B   -0.52 (0.16)    0.003   -0.57 (0.24)    0.025
A vs placebo    -0.61 (0.20)    0.006   -0.84 (0.30)    0.006
B vs placebo    -0.52 (0.20)    0.013   -0.70 (0.29)    0.02
A vs B          -0.08 (0.22)    0.72    -0.14 (0.31)    0.65

For comparison of non-nested models use Akaike’s criterion (AIC).

◮ Poisson fits better than negative binomial (AIC 362.25 vs 379.11; lower is better).

Leprosy: Individual predictions from GLMM


Pros and cons of GLMMs

Advantages

◮ Suited for making individual predictions.

◮ Fully specified model allows for exact likelihood inference, model comparisons, and simulation for e.g. power calculations.

◮ Likelihood inference automatically handles data that are missing at random optimally.

Drawbacks

◮ More model assumptions → higher risk of misspecification. It is difficult to check assumptions about the random effects and tempting to extrapolate from these very assumptions.

◮ Not optimal for making inference for population averages (due to having to average over the random effect).

◮ Computationally infeasible when the number of random effects or the overall size of the data becomes large.

Outline


Comparison of Marginal models and GLMMs

Marginal models vs GLMMs

Normal outcomes:

◮ A variance component model with a random intercept is the same as a repeated measurements model with compound symmetry.

Non-normal outcomes:

◮ Marginal models and GLMMs are inherently different models!

◮ They differ in interpretation.

◮ They differ in actual figures, i.e. estimates and SEs.


Leprosy bacilli: GLMM vs Marginal model

Regression coefficients are similar, but GLMM estimates a smaller intercept than GEE.

            GEE                     GLMM
Parameter   Estimate (SE)   Pval    Estimate (SE)   Pval
Intercept    2.37 (0.08)    -        2.24 (0.11)    -
time        -0.01 (0.16)    0.93     0.01 (0.12)    0.98
time*A      -0.54 (0.22)    0.013   -0.61 (0.20)    0.006
time*B      -0.48 (0.23)    0.036   -0.52 (0.20)    0.013

The dataset is small, so maybe GEE underestimates the SEs . . .

Should we prefer GLMM then?

◮ Maybe not; SEs from GLMM are even smaller (due to stronger model assumptions we cannot verify).

Predicted means GLMM vs Marginal model

Note the difference in interpretation: population average vs expected value for a person of average frailty.

Technical explanation: conditional vs marginal mean⋆

The GLMM with log-link implies that

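$$E(Y_{ij}) = E\{E(Y_{ij} \mid b_i)\} = E\left(e^{X_{ij}^T\beta + Z_{ij}^T b_i}\right) = e^{X_{ij}^T\beta} \cdot E\left(e^{Z_{ij}^T b_i}\right)$$

◮ With a random intercept only, the regression coefficients are the same as in the marginal model.

◮ Only the intercept differs (it is smaller in the GLMM).

But a conditional Poisson distribution does not imply a marginal Poisson distribution.

◮ E.g. the marginal variance is

$$\mathrm{Var}(Y_{ij}) = e^{X_{ij}^T\beta}\, E\left(e^{Z_{ij}^T b_i}\right) + \left(e^{X_{ij}^T\beta}\right)^2 \cdot \mathrm{Var}\left(e^{Z_{ij}^T b_i}\right)$$

with overdispersion similar to the negative binomial (which differs in other aspects, though).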
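For the random-intercept case used in the leprosy example, the mean and variance of the exponentiated random effect have closed forms. The sketch below assumes b_i ∼ N(0, σ_b²) and Z_ij = 1, and writes m_ij for the marginal mean; these symbols are introduced here for illustration only:

```latex
E\left(e^{b_i}\right) = e^{\sigma_b^2/2}
\;\Rightarrow\;
m_{ij} = E(Y_{ij}) = \exp\left(X_{ij}^T\beta + \tfrac{1}{2}\sigma_b^2\right),
\qquad
\mathrm{Var}\left(e^{b_i}\right) = e^{\sigma_b^2}\left(e^{\sigma_b^2} - 1\right)
\;\Rightarrow\;
\mathrm{Var}(Y_{ij}) = m_{ij} + \left(e^{\sigma_b^2} - 1\right) m_{ij}^2
```

So under these assumptions the marginal intercept exceeds the GLMM intercept by σ_b²/2, and the marginal variance has the negative binomial form µ + φµ² with φ = exp(σ_b²) − 1.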


Recap: The two model types

Marginal models aka population average models:

◮ Describe covariate effects on the population mean,

◮ I.e. the expected effect of treatment on the entire population.

◮ Analyzed via repeated-statement in proc genmod.

Generalized linear mixed models aka subject specific models:

◮ Describe covariate effects on the individual,

◮ I.e. the expected effect of treatment on the individual, conditioning on his or her general health (frailty).

◮ Analyzed via random-statement in proc glimmix.


Marginal model or GLMM – which should I prefer?

◮ Different specifications of the joint distribution of Yi lead to regression coefficients with quite distinct interpretations.

◮ Marginal models aim at inference for the population means. The models are semi-parametric. They merely acknowledge the correlation and do not seek to explain it.

◮ GLMMs assume that the correlation among repeated measurements arises from the sharing of random effects or subject-specific parameters. The parameters in GLMMs have subject-specific but not population-average interpretations.

Choice of model should be made on subject matter grounds . . . and there is no contradiction in reporting both subject-specific and population-averaged effects if both are of interest.
