BIO226 Lab Session 8: Generalized Linear Mixed Effects Models (GLMMs)
Professor Brent Coull
TA: Shira Mitchell
May 3, 2012
Key Points of GLMM
1. GLMMs extend the approach of linear mixed effects models to categorical data.
2. GLMMs assume heterogeneity across individuals in a subset of regression coefficients (e.g. intercepts and slopes).
3. While Marginal Models (GEEs) focus on inferences about populations, GLMMs focus on inferences about individuals.
4. Regression parameters from GLMMs have ‘subject-specific’ interpretations in terms of changes in the transformed mean response for a specific individual.
Specification of GLMM
The GLMM can be considered in 2 steps:
1. Assume the conditional distribution of each $Y_{ij}$, given the individual-specific effects $b_i$, belongs to the exponential family with conditional mean
$$g(E[Y_{ij} \mid b_i]) = X_{ij}'\beta + Z_{ij}'b_i,$$
where $g(\cdot)$ is a known link function and $Z_{ij}$ is a known design vector (a subset of $X_{ij}$) linking the random effects $b_i$ to $Y_{ij}$.
2. The $b_i$ are assumed to vary independently from one individual to another, with $b_i \sim N(0, G)$, where $G$ is the covariance matrix of the random effects.
Note: there is an additional assumption of “conditional independence”, i.e. given $b_i$, the responses $Y_{i1}, Y_{i2}, \ldots, Y_{in_i}$ are assumed to be mutually independent.
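The two-step specification can be read as a recipe for generating data. The following is a minimal sketch, assuming a logit link, a single random intercept (so $Z_{ij} = 1$ and $G = \sigma_b^2$), and a hypothetical design and coefficient values chosen only for illustration; none of the names below come from the lab.

```python
import math
import random


def expit(x):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate_random_intercept_logistic(n_subjects, x_rows, beta, sigma_b, seed=0):
    """Simulate binary responses from a random-intercept logistic GLMM.

    x_rows is a list of covariate vectors X_ij shared by every subject
    (a hypothetical design, used here only for illustration).
    """
    rng = random.Random(seed)
    data = []
    for i in range(n_subjects):
        # Step 2: one random intercept per subject, b_i ~ N(0, sigma_b^2)
        b_i = rng.gauss(0.0, sigma_b)
        for j, x_ij in enumerate(x_rows):
            # Step 1: conditional mean through the logit link, with Z_ij = 1
            eta = sum(coef * x for coef, x in zip(beta, x_ij)) + b_i
            p = expit(eta)
            # Conditional independence: given b_i, each Y_ij is drawn separately
            y = 1 if rng.random() < p else 0
            data.append((i, j, y))
    return data


# Hypothetical design: an intercept plus a time covariate at 3 occasions
design = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
sim = simulate_random_intercept_logistic(5, design, beta=[-0.5, 0.4], sigma_b=1.0, seed=1)
print(sim[:3])
```

Responses from the same subject share one draw of $b_i$, which is what induces the within-subject correlation even though the responses are independent given $b_i$.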
GLMM Example
Longitudinal Binary Response to Depression Medication
Cross-classification of responses on depression at 3 times (N=Normal, A=Abnormal)

DX      TRT       NNN  NNA  NAN  NAA  ANN  ANA  AAN  AAA
Mild    Standard   16   13    9    3   14    4   15    6
Mild    New        31    0    6    0   22    2    9    0
Severe  Standard    2    2    8    9    9   15   27   28
Severe  New         7    2    5    2   31    5   32    6

Reprinted by Agresti (2002) with permission from original source (Koch et al., 1977, Biometrics).
The data stored in 'depress.txt' have already been converted into long form and contain the following 6 variables: ID, Y (0=Abnormal, 1=Normal), Severe (0=mild, 1=severe), Drug (0=standard, 1=new), Time, and Drug*Time.
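To make the "long form" concrete, here is a minimal sketch that expands the cross-classification counts into one row per subject per time. The ID ordering (table rows top to bottom, then response patterns left to right) is an assumption, not something stated in the lab; it was chosen because it gives 340 subjects and 1020 rows. The variable names follow the INPUT statement of the SAS code.

```python
# Response patterns: one letter per time point (times 0, 1, 2)
PATTERNS = ["NNN", "NNA", "NAN", "NAA", "ANN", "ANA", "AAN", "AAA"]

# (severe, drug, counts per pattern), copied from the table
COUNTS = [
    (0, 0, [16, 13, 9, 3, 14, 4, 15, 6]),   # Mild, Standard
    (0, 1, [31, 0, 6, 0, 22, 2, 9, 0]),     # Mild, New
    (1, 0, [2, 2, 8, 9, 9, 15, 27, 28]),    # Severe, Standard
    (1, 1, [7, 2, 5, 2, 31, 5, 32, 6]),     # Severe, New
]


def wide_to_long(counts=COUNTS, patterns=PATTERNS):
    """Expand cell counts into long-form rows (one row per subject per time)."""
    rows = []
    subject_id = 0
    for severe, drug, cell_counts in counts:
        for pattern, n in zip(patterns, cell_counts):
            for _ in range(n):
                subject_id += 1  # assumed ID order: table rows, then columns
                for time, letter in enumerate(pattern):
                    rows.append({
                        "id": subject_id,
                        "y": 1 if letter == "N" else 0,  # 1=Normal, 0=Abnormal
                        "severe": severe,
                        "drug": drug,
                        "time": time,
                        "dt": drug * time,  # Drug*Time interaction
                    })
    return rows


long_rows = wide_to_long()
print(len(long_rows), long_rows[-1]["id"])
```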
SAS Code
DATA depress;
  INFILE 'depress.txt';
  INPUT id y severe drug time dt;
RUN;

DATA depress;
  SET depress;
  t=time; /* create categorical time variable */
RUN;

PROC PRINT DATA=depress(WHERE=(id=65 or id=101));
RUN;
SAS Output

Obs   id   y  severe  drug  time  dt1  t
193   65   0    0      0     0     0   0
194   65   0    0      0     1     0   1
195   65   1    0      0     2     0   2
301  101   1    0      1     0     0   0
302  101   1    0      1     1     1   1
303  101   1    0      1     2     2   2
Question of interest: Do patient-specific changes in the probability of normal differ between the two treatments?
This is addressed by the drug*time interaction.
Marginal Model for Depression Data
To obtain initial parameter values for the GLMM, we fit the following Marginal Model (GEE):
$$\text{logit}\{\Pr(Y_{ij} = 1)\} = \eta_{ij} = \beta_1 + \beta_2\,\text{severe}_i + \beta_3\,\text{drug}_i + \beta_4\,\text{time}_j + \beta_5\,\text{drug}_i \times \text{time}_j$$
where:
• $Y_{ij}$ = 0 if subject i is abnormal in period j; 1 if subject i is normal in period j
• $\text{severe}_i$ = 0 if mild depression at initial diagnosis; 1 if severe depression at initial diagnosis
• $\text{drug}_i$ = 0 if standard; 1 if new drug
• $\text{time}_j$ = 0 if baseline; 1 if time 1; 2 if time 2
and we assume:
• $Y_{ij} \sim \text{Bernoulli}\left(e^{\eta_{ij}}/(1+e^{\eta_{ij}})\right)$
• $\text{Var}(Y_{ij}) = E(Y_{ij})(1 - E(Y_{ij}))$; note that $\Pr(Y_{ij} = 1) = E(Y_{ij})$ because $Y_{ij}$ is binary.
• $\log \text{OR}(Y_{ij}, Y_{ik}) = \alpha_{jk}$
SAS Code
PROC GENMOD DESCENDING DATA=depress;
  CLASS id t;
  MODEL y=severe drug time dt / DIST=binomial LINK=logit;
  REPEATED SUBJECT=id / WITHINSUBJECT=t LOGOR=fullclust;
RUN;
SAS Output
Log Odds Ratio Parameter Information

Parameter  Group
Alpha1     (1, 2)
Alpha2     (1, 3)
Alpha3     (2, 3)
GLMM for Depression Data
• Consider the following GLMM:
$$\text{logit}\{\Pr(Y_{ij} = 1 \mid b_{i1})\} = \eta_{ij} = \beta_1 + \beta_2\,\text{severe}_i + \beta_3\,\text{drug}_i + \beta_4\,\text{time}_j + \beta_5\,\text{drug}_i \times \text{time}_j + b_{i1}$$
where $b_{i1}$ is a random intercept that allows a different baseline probability of normal (vs abnormal) for each subject.
and we assume:
• $Y_{ij} \mid b_{i1} \sim \text{Bernoulli}\left(e^{\eta_{ij}}/(1+e^{\eta_{ij}})\right)$, which implies that $\text{Var}(Y_{ij} \mid b_{i1}) = E(Y_{ij} \mid b_{i1})(1 - E(Y_{ij} \mid b_{i1}))$. Note: $E(Y_{ij} \mid b_{i1}) = \Pr(Y_{ij} = 1 \mid b_{i1})$ because $Y_{ij}$ is binary.
• Given $b_{i1}$, the responses $Y_{i0}, Y_{i1}, Y_{i2}$ are mutually independent.
• The $b_{i1}$ are assumed to vary independently from one individual to another, with $b_{i1} \sim N(0, \sigma_b^2)$.
NLMIXED in SAS
PROC NLMIXED DATA=depress QPOINTS=20;
  PARAMS beta1=-0.03 beta2=-1.3 beta3=-0.05 beta4=0.48 beta5=1.01 sigma=0.07;
  eta = beta1 + beta2*severe + beta3*drug + beta4*time + beta5*dt + b1;
  p = exp(eta)/(1 + exp(eta));
  MODEL y ~ BINARY(p);
  RANDOM b1 ~ NORMAL(0, sigma*sigma) SUBJECT = id;
  ESTIMATE 'treatment effect, time 1' beta3 + beta5;
  ESTIMATE 'treatment effect, time 2' beta3 + 2*beta5;
  ESTIMATE 'time trend standard treatment' beta4;
  ESTIMATE 'time trend new treatment' beta4 + beta5;
RUN;
NLMIXED in SAS
• PARAMS statement: lists all parameters (fixed effects and covariance parameters for random effects) and their initial values (default initial value is 1).
• Program statements: define the linear predictor eta (includes fixed and random effects) and relate the mean response (p) to the linear predictor (eta).
• MODEL statement: specifies the response variable and the conditional distribution of the response given the random effects (e.g. BINARY).
• RANDOM statement (RANDOM effects ~ distribution SUBJECT=variable): defines the random effects and their distribution (RANDOM) and the variable that determines clustering of observations within an individual (SUBJECT).
Note: PROC NLMIXED does not have a CLASS statement and treats each change in the value of the SUBJECT= variable as the start of a new subject. It is therefore critical that the dataset is sorted by ID prior to analysis.
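Because the random intercept is unobserved, each subject's contribution to the likelihood is the conditional (Bernoulli) likelihood integrated over the normal distribution of the random effect. The sketch below illustrates that integral for one subject. It uses a plain trapezoid rule as a stand-in, not the quadrature that NLMIXED performs with QPOINTS=20, and the function names, grid size, and integration width are all hypothetical choices.

```python
import math


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def conditional_likelihood(ys, etas_fixed, b):
    """Product of Bernoulli terms for one subject, given random intercept b."""
    lik = 1.0
    for y, eta in zip(ys, etas_fixed):
        p = expit(eta + b)
        lik *= p if y == 1 else (1.0 - p)
    return lik


def marginal_likelihood(ys, etas_fixed, sigma, n_grid=2001, width=8.0):
    """Integrate the conditional likelihood over b ~ N(0, sigma^2).

    Trapezoid rule on [-width*sigma, width*sigma]; an illustrative
    stand-in for Gaussian quadrature.
    """
    lo = -width * sigma
    h = 2.0 * width * sigma / (n_grid - 1)
    total = 0.0
    for k in range(n_grid):
        b = lo + k * h
        density = math.exp(-0.5 * (b / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
        weight = 0.5 if k in (0, n_grid - 1) else 1.0
        total += weight * conditional_likelihood(ys, etas_fixed, b) * density
    return total * h


# Fixed part of eta at the PARAMS starting values for a mild-depression
# subject on the standard drug (severe=0, drug=0) at times 0, 1, 2
beta1, beta4, sigma = -0.03, 0.48, 0.07
etas = [beta1 + beta4 * t for t in (0, 1, 2)]
print(marginal_likelihood((0, 0, 1), etas, sigma))
```

The negative log-likelihood that NLMIXED minimizes is minus the sum of the logs of these subject-level integrals.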
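Estimate Statements
$$\text{logit}\{\Pr(Y_{ij} = 1 \mid b_{i1})\} = \beta_1 + \beta_2\,\text{severe}_i + \beta_3\,\text{drug}_i + \beta_4\,\text{time}_j + \beta_5\,\text{drug}_i \times \text{time}_j + b_{i1}$$

Treatment effect, time 1
For $\text{drug}_i = 0$ and $\text{time}_j = 1$,
$$\text{logit}\{\Pr(Y_{ij} = 1 \mid b_{i1})\} = \beta_1 + \beta_2\,\text{severe}_i + \beta_4 + b_{i1}$$
For $\text{drug}_{i'} = 1$ and $\text{time}_j = 1$,
$$\text{logit}\{\Pr(Y_{i'j} = 1 \mid b_{i'1})\} = \beta_1 + \beta_2\,\text{severe}_{i'} + \beta_3 + \beta_4 + \beta_5 + b_{i'1}$$
Thus, the difference = $\beta_3 + \beta_5$, assuming $b_{i1} = b_{i'1}$ and $\text{severe}_i = \text{severe}_{i'}$.

Treatment effect, time 2
For $\text{drug}_i = 0$ and $\text{time}_j = 2$,
$$\text{logit}\{\Pr(Y_{ij} = 1 \mid b_{i1})\} = \beta_1 + \beta_2\,\text{severe}_i + 2\beta_4 + b_{i1}$$
For $\text{drug}_{i'} = 1$ and $\text{time}_j = 2$,
$$\text{logit}\{\Pr(Y_{i'j} = 1 \mid b_{i'1})\} = \beta_1 + \beta_2\,\text{severe}_{i'} + \beta_3 + 2\beta_4 + 2\beta_5 + b_{i'1}$$
Thus, the difference = $\beta_3 + 2\beta_5$, assuming $b_{i1} = b_{i'1}$ and $\text{severe}_i = \text{severe}_{i'}$.

Time trend, standard treatment
$$\text{logit}\{\Pr(Y_{ij} = 1 \mid b_{i1})\} = \beta_1 + \beta_2\,\text{severe}_i + \beta_4\,\text{time}_j + b_{i1}$$
Thus, for a given subject on the standard treatment, the log odds of normal change by $\beta_4$ per time period.

Time trend, new treatment
$$\text{logit}\{\Pr(Y_{ij} = 1 \mid b_{i1})\} = \beta_1 + \beta_2\,\text{severe}_i + \beta_3 + (\beta_4 + \beta_5)\,\text{time}_j + b_{i1}$$
Thus, for a given subject on the new treatment, the log odds of normal change by $\beta_4 + \beta_5$ per time period.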
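The four contrasts can be checked numerically by evaluating the linear predictor directly. This is a small sketch using the starting values from the PARAMS statement (not the fitted estimates); the function names are hypothetical. It shows that the contrasts do not depend on the shared random intercept or on severity, as long as both are held equal in the comparison.

```python
def linear_predictor(severe, drug, time, b, beta):
    """eta = beta1 + beta2*severe + beta3*drug + beta4*time + beta5*drug*time + b."""
    beta1, beta2, beta3, beta4, beta5 = beta
    return beta1 + beta2 * severe + beta3 * drug + beta4 * time + beta5 * drug * time + b


# Starting values from the PARAMS statement (beta1..beta5)
beta_start = (-0.03, -1.3, -0.05, 0.48, 1.01)


def treatment_effect(time, severe=0, b=0.0, beta=beta_start):
    """New minus standard drug at a given time, same severity and same b."""
    return (linear_predictor(severe, 1, time, b, beta)
            - linear_predictor(severe, 0, time, b, beta))


def time_trend(drug, severe=0, b=0.0, beta=beta_start):
    """Change in log odds per time period for a subject on a given drug."""
    return (linear_predictor(severe, drug, 1, b, beta)
            - linear_predictor(severe, drug, 0, b, beta))


for severe, b in [(0, 0.0), (1, 0.7)]:
    print(severe, b,
          round(treatment_effect(1, severe, b), 4),   # beta3 + beta5
          round(treatment_effect(2, severe, b), 4),   # beta3 + 2*beta5
          round(time_trend(0, severe, b), 4),         # beta4
          round(time_trend(1, severe, b), 4))         # beta4 + beta5
```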
NL Mixed Output

Specifications
Data Set                              WORK.DEPRESS
Dependent Variable                    y
Distribution for Dependent Variable   Binary
Random Effects                        b0
Distribution for Random Effects       Normal
Subject Variable                      id
Optimization Technique                Dual Quasi-Newton
Integration Method                    Adaptive Gaussian Quadrature

Dimensions
Observations Used        1020
Observations Not Used       0
Total Observations       1020
Subjects                  340
Max Obs Per Subject         3
Parameters                  6
Quadrature Points          20

Parameters
beta1   beta2   beta3   beta4   beta5   sigma   NegLogLike
-0.03   -1.3    -0.05   0.48    1.01    0.07    580.980217

Iteration History
Iter  Calls  NegLogLike  Diff      MaxGrad   Slope
1     4      580.977896  0.002321  1.640809  -12.2797
8     20     580.969876  1.541E-7  0.000097  -3.16E-7
NOTE: GCONV convergence criterion satisfied.

Fit Statistics
-2 Log Likelihood           1161.9
AIC (smaller is better)     1173.9
AICC (smaller is better)    1174.0
BIC (smaller is better)     1196.9
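The fit statistics follow from the final negative log-likelihood and the counts in the Dimensions table. The sketch below recomputes them with the usual textbook formulas. Which sample size enters each penalty is an inference from the printed numbers, not something taken from the SAS documentation: the printed BIC is matched with the number of subjects (340) and the printed AICC with the number of observations (1020).

```python
import math

neg_log_like = 580.969876  # final NegLogLike from the iteration history
n_params = 6               # beta1..beta5 and sigma
n_subjects = 340
n_obs = 1020

minus2ll = 2.0 * neg_log_like
aic = minus2ll + 2.0 * n_params
# Assumption inferred from the output: BIC penalty uses the number of subjects
bic = minus2ll + n_params * math.log(n_subjects)
# Assumption inferred from the output: AICC correction uses the number of observations
aicc = minus2ll + 2.0 * n_params * n_obs / (n_obs - n_params - 1)

print(round(minus2ll, 1), round(aic, 1), round(aicc, 1), round(bic, 1))
```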
NL Mixed Output
Parameter Estimates

Parameter  Estimate  Std Error  DF   t Value  Pr > |t|  Alpha  Lower    Upper    Gradient
beta1      -0.02795  0.1641     339  -0.17    0.8649    0.05   -0.3508  0.2949   -0.0001
beta2      -1.3152   0.1546     339  -8.50    <.0001    0.05   -1.6194  -1.0110  -0.00002
beta3      -0.05970  0.2225     339  -0.27    0.7886    0.05   -0.4973  0.3779   -1.1E-6
beta4      0.4828    0.1160     339  4.16     <.0001    0.05   0.2547   0.7109   -0.00008
beta5      1.0184    0.1924     339  5.29     <.0001    0.05   0.6400   1.3969   -7.03E-7
sigma      0.06583   1.2417     339  0.05     0.9578    0.05   -2.3766  2.5083   0.000012

$$\text{logit}\{\Pr(Y_{ij} = 1 \mid b_{i1})\} = \beta_1 + \beta_2\,\text{severe}_i + \beta_3\,\text{drug}_i + \beta_4\,\text{time}_j + \beta_5\,\text{drug}_i \times \text{time}_j + b_{i1}$$
Note: beta3, the treatment difference at baseline (time 0), is not significant, as expected in a randomized controlled trial (RCT).
NL Mixed Output
Additional Estimates

Label                          Estimate  Std Error  DF   t Value  Pr > |t|  Alpha  Lower   Upper
treatment effect, time 1       0.9587    0.1523     339  6.30     <.0001    0.05   0.6592  1.2582
treatment effect, time 2       1.9771    0.2663     339  7.42     <.0001    0.05   1.4533  2.5010
time trend standard treatment  0.4828    0.1160     339  4.16     <.0001    0.05   0.2547  0.7109
time trend new treatment       1.5013    0.1608     339  9.34     <.0001    0.05   1.1850  1.8175

$$\text{logit}\{\Pr(Y_{ij} = 1 \mid b_{i1})\} = \beta_1 + \beta_2\,\text{severe}_i + \beta_3\,\text{drug}_i + \beta_4\,\text{time}_j + \beta_5\,\text{drug}_i \times \text{time}_j + b_{i1}$$
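The additional estimates are the linear combinations named in the ESTIMATE statements, so they can be cross-checked against the parameter estimates. A short sketch follows, with all values copied from the printed output. Agreement is only up to rounding, since SAS computes from unrounded values, and the standard errors cannot be reproduced this way because they depend on covariances that the output does not show.

```python
# Parameter estimates copied from the Parameter Estimates table
beta3, beta4, beta5 = -0.05970, 0.4828, 1.0184

# Linear combinations requested by the ESTIMATE statements
combos = {
    "treatment effect, time 1": beta3 + beta5,
    "treatment effect, time 2": beta3 + 2 * beta5,
    "time trend standard treatment": beta4,
    "time trend new treatment": beta4 + beta5,
}

# (estimate, standard error) as printed in the Additional Estimates table
printed = {
    "treatment effect, time 1": (0.9587, 0.1523),
    "treatment effect, time 2": (1.9771, 0.2663),
    "time trend standard treatment": (0.4828, 0.1160),
    "time trend new treatment": (1.5013, 0.1608),
}

for label, (estimate, std_err) in printed.items():
    t_value = estimate / std_err  # t statistic = estimate / standard error
    print(f"{label}: combination={combos[label]:.4f}, "
          f"printed={estimate:.4f}, t={t_value:.2f}")
```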
Conclusions
• Research question: are patient-specific changes in the probability of normal different between the two treatments over time? This corresponds to testing $H_0: \beta_5 = 0$.
• $\hat\beta_5 = 1.0184$ (p-value < .0001). Thus, we reject $H_0$ (no difference between treatments in the change over time) and conclude that there are greater patient-specific changes in the probability of normal for the new treatment.
• The estimated odds ratio of normal, comparing a patient on the new treatment to a patient on the standard treatment with the same random intercept and severity of initial diagnosis, is 2.61 (1.93, 3.52) [$e^{0.9587}$ ($e^{0.659}$, $e^{1.258}$)] for time 1, and 7.22 (4.28, 12.19) [$e^{1.977}$ ($e^{1.453}$, $e^{2.501}$)] for time 2.

Conclusions, continued
• We estimate that the odds of normal for a subject on the standard treatment increase by a factor of 1.62 ($e^{0.483}$) for each time period. We estimate that the odds of normal for a subject on the new treatment increase by a factor of 4.49 ($e^{1.501}$) for each time period.
• The odds of normal for a subject with an initial diagnosis of severe depression are 0.27 ($e^{-1.315}$) times the odds of normal for a subject with mild depression and the same random intercept (i.e., lower odds of normal).
• There appears to be little heterogeneity among subjects ($\hat\sigma_b = 0.06583$). Approximately 95% of patients in the standard group with an initial diagnosis of mild depression are expected to have a baseline (time=0) log odds of normal between -0.1570 and 0.1011 ($-0.02795 \pm 1.96 \times 0.06583$), or a baseline probability of normal between 0.461 = $e^{-0.1570}/(1+e^{-0.1570})$ and 0.525 = $e^{0.1011}/(1+e^{0.1011})$. (Lecture 20 slide 23)

Conclusions, continued
Note that when we interpret the parameter estimates from the mixed model, we interpret them at the patient level. When we report odds ratios comparing two patients, we assume that they have the same random intercept (i.e. the same baseline propensity for normal).
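The numbers quoted in the conclusions are transformations of the printed estimates: exponentiating a log odds ratio and its confidence limits gives an odds ratio with its interval, and applying the inverse logit to $\hat\beta_1 \pm 1.96\,\hat\sigma_b$ gives the range of baseline probabilities. A minimal sketch of that arithmetic follows; the helper names are hypothetical.

```python
import math


def expit(x):
    """Inverse logit: log odds -> probability."""
    return 1.0 / (1.0 + math.exp(-x))


def odds_ratio_with_ci(estimate, lower, upper):
    """Exponentiate a log odds ratio and its confidence limits."""
    return tuple(round(math.exp(v), 2) for v in (estimate, lower, upper))


# Treatment effects (log odds ratios) with 95% limits from the Additional Estimates
or_time1 = odds_ratio_with_ci(0.9587, 0.6592, 1.2582)
or_time2 = odds_ratio_with_ci(1.9771, 1.4533, 2.5010)

# Per-period odds multipliers and the severity odds ratio
trend_standard = math.exp(0.4828)
trend_new = math.exp(1.5013)
or_severe = math.exp(-1.3152)

# Range of baseline log odds for mild-depression subjects on the standard drug:
# beta1_hat +/- 1.96 * sigma_hat, then converted to probabilities
beta1_hat, sigma_hat = -0.02795, 0.06583
logit_low = beta1_hat - 1.96 * sigma_hat
logit_high = beta1_hat + 1.96 * sigma_hat
prob_low, prob_high = expit(logit_low), expit(logit_high)

print(or_time1, or_time2)
print(round(trend_standard, 2), round(trend_new, 2), round(or_severe, 2))
print(round(logit_low, 4), round(logit_high, 4), round(prob_low, 3), round(prob_high, 3))
```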