Upload
others
View
0
Download
0
Embed Size (px)
Citation preview
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Faculty of Health Sciences
Correlated data
NFA, May 22, 2014Longitudinal measurements
Julie Lyng Forman & Lene Theil SkovgaardDepartment of BiostatisticsUniversity of Copenhagen
1 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Repeated measurements over time
I Presentation of data.I Simple approaches.I Correlation structures.I Random regression.I Baseline considerations.
Home page:http://staff.pubhealth.ku.dk/~jufo/RepeatedMeasuresNFA2014.htmlE-mail: [email protected], [email protected]
2 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Typical set-up for repeated measurements
I Two or more groups of subjects(typically receiving different treatments)
I Randomization at baselineI Longitudinal measurements of the same quantity over time for
each subject, typically as a function ofI time (duration of treatment)I ageI cumulative dose of some drug
Level 1: Single observationsLevel 2: Patients/Subjects
3 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Traditional presentation of data
Example: Aspirin absorption for healthy and ill subjects(Matthews et.al.,1990)
4 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Problems with the traditional presentation of data
I Comparison of groups for each time point separatelyI is inefficientI has a high risk of leading to chance significanceI Tests are not independent,
since they are carried out on the same subjectsI Interpretation may be difficult
I Changes over timeI cannot be evaluated ’by eye’
because we cannot see the pairing
5 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Take care with average curves
I They may hide important structuresI They give no indication of the variation in the time profilesI Always make a picture of individual time profilesI Do not average over individual profiles, unless these have
identical shapes, i.e. only shifts in level are seen betweenindividuals.
I Alternative:Calculate individual characteristics
6 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Individual time profiles
Spaghettiplot - divided into groups
Do we see time profilesof identical shape?
Are the averagesrepresentative?
7 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Commonly used characteristics
I The response for selected times, e.g. endpointI AverageI The slope, perhaps for a specific periodI Peak valueI Time to peakI The area under the curve (AUC).I A measure of cyclic behaviour.
These are analyzed as new observations.
8 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Example: Aspirin absorption
Characteristics:I time to peakI peak value
Conclusion:P=0.02 for identity ofpeak values.
Remember quantifications!
9 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Repeated measurement designs
MeritsI It is much more powerful in detecting time changes
(data are ’paired’ with the subject as its own control)I We may discover that subjects have different time courses
(In designs with only cross-sectional data, this may also be thecase, but we have no way of knowing!)
I We may identify important characteristics of the time courses,specific for each subject (trend, peak etc.)
10 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Repeated measurement designs
DrawbacksI Traditional independence assumption is violated since repeated
observations on the same individual are correlated (look alike)I Traditional anova-models become impossibleI Comparison of time averages (or other characteristics) cannot
incorporate time dependent covariates
11 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Example: Calcium supplements
A total of 112 11-year old girls were randomized to receive eithercalcium or placebo.
Outcome:BMD=bone mineral density, in g
cm2 ,measured every 6 months (5 visits)
Scientific question:Does calcium improve the rate of bone gain for adolescent women?
12 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Individual profiles
13 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Factor diagram
[I ] = [Girl ∗Visit]
Grp ∗Visit
[Girl]
Visit
Grp
HHHHj
����*
�����3
-
-
Two-level model with:I Observations Ygit (group=g, girl=individual=i, visit=time=t)I Systematic (fixed) effects of group and visit, with a possible
interactionI Random girl-level, Var(agi) = ω2
B, i.e., random interceptI Residual variation, within girls, Var(εgit) = σ2
W
14 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Multilevel model structure
Level 1 2Unit single measurements girlsVariation within girls between girls
σ2W ω2
B
Covariates x zvisit, grp*visit grp
15 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Random girl level in SAS
Ygit = µgt + Agi + εgit
proc mixed data=calcium;class grp girl visit;model bmd=grp visit grp*visit / ddfm=satterth s;random girl(grp);run;
I Girls are nested in groups, specified by the notationrandom girl(grp);
I satterth could be replaced by kenwardroger
16 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
The options
ddfm=satterth (- or kenwardrogers):
I When the distributions are exact, they have no effectI in balanced situations
I When approximations are necessary,these two are considered best
I in unbalanced situations,i.e for almost all observational designs
I in case of missing observationsI It may give rise to fractional degrees of freedomI The computations may require a little more time,
but in most cases this will not be noticableI When in doubt, use it!
17 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Random girl level, output
Covariance Parameter Estimates (REML)
Cov Parm Estimate
GIRL(GRP) 0.00443925Residual 0.00023471
Tests of Fixed Effects
Source NDF DDF Type III F Pr > F
GRP 1 110 2.63 0.1078VISIT 4 382 619.42 0.0001GRP*VISIT 4 382 5.30 0.0004
No doubt, we see an interaction GRP*VISIT
18 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Random girl level, output, continued
Solution for Fixed EffectsStandard
Effect grp visit Estimate Error DF t Value Pr > |t|Intercept 0.9576 0.009131 122 104.87 <.0001grp C 0.02951 0.01304 122 2.26 0.0254grp P 0 . . . .visit 1 -0.08750 0.003100 382 -28.22 <.0001visit 2 -0.06748 0.003103 381 -21.75 <.0001visit 3 -0.04342 0.003117 381 -13.93 <.0001visit 4 -0.01619 0.003148 381 -5.14 <.0001visit 5 0 . . . .grp*visit C 1 -0.01912 0.004445 382 -4.30 <.0001grp*visit C 2 -0.01255 0.004448 381 -2.82 0.0050grp*visit C 3 -0.00622 0.004480 381 -1.39 0.1661grp*visit C 4 -0.00679 0.004517 381 -1.50 0.1337grp*visit C 5 0 . . . .grp*visit P 1 0 . . . .grp*visit P 2 0 . . . .grp*visit P 3 0 . . . .grp*visit P 4 0 . . . .grp*visit P 5 0 . . . .
I Difference between groups at visit 5I Difference increases over time
19 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Model checks
Two types of residuals:
Ordinary Observed minus predicted group mean(only systematic effects)
Ygit − µgt
Conditional Observed minus predicted individual mean value(systematic and random effects)
εgit = Ygit − µgt − Agi
20 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Model check, ordinary residuals
21 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Model check, conditional residuals
22 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Correlated observations
If we fail to take the correlation into account,we will experience:
I possible bias in the mean value structurein unbalanced situations
I low efficiency (type 2 error) for estimationof level 1 covariates (time-related effects)
I too small standard errors (type 1 error) for estimatesof level 2 effects (treatments)
23 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Level 1 covariates (unit: single observations)
I Time itselfI Covariates varying with time:
blood pressure, heart rate, ageI Interaction between group and time
If correlation is not taken into account, we ignore the pairedsituation, leading to low efficiency, i.e. too large P-valuesType 2 error
Effects may go undetected!
24 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Incorrect analysis
Correlation ignored
proc glm data=calcium;class grp visit;model bmd=grp visit grp*visit / solution;
run;
Source DF Type III SS Mean Square F Value Pr > Fgrp 1 0.05063714 0.05063714 11.05 0.0010visit 4 0.64369784 0.16092446 35.10 <.0001grp*visit 4 0.00649557 0.00162389 0.35 0.8411
The interaction is not detectedsince we “forgot” the pairing
25 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Level 2 covariates (unit: individuals)
I TreatmentI Gender, age
If correlation is ignored, we act as if we have more informationthan we actually have, leading to too small P-valuesType 1 error
’Noise’ may be taken to be real effects!
26 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Model synonyms
I Two-level model
I Model with random subject levelsI Model with random intercepts
I Model with compound symmetry correlation structureI Model with exchangeability correlation structure
27 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Compound symmetry = Exchangeability
The two-level model assumes compound symmetry,i.e., that all measurements on the same individual areequally correlated:
Corr(Ygit1 ,Ygit2) = ρ = ω2B
ω2B + σ2
W
This means that the distance in time is not taken into account!!
Observations are exchangeable
28 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Compound symmetry structure
ω2B + σ2
W ω2B ω2
B ω2B ω2
Bω2
B ω2B + σ2
W ω2B ω2
B ω2B
ω2B ω2
B ω2B + σ2
W ω2B ω2
Bω2
B ω2B ω2
B ω2B + σ2
W ω2B
ω2B ω2
B ω2B ω2
B ω2B + σ2
W
= (ω2B + σ2
W )
1 ρ ρ ρ ρρ 1 ρ ρ ρρ ρ 1 ρ ρρ ρ ρ 1 ρρ ρ ρ ρ 1
But:Observations taken close to each other in time will often be moreclosely correlated than observations taken further apart!
29 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Estimation of correlation ρ
The correlation ρ is estimated by using output from p. 18
ρ = Corr(Ygdt1 ,Ygdt2) = ω2B
ω2B + σ2
W
≈ 0.004439250.00443925 + 0.00023471 = 0.9498
High “tracking”, meaning that each girl stays more or less atthe same rank in the group
30 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Alternative specifications
Note, that the specification ’random girl(grp);’can be written in two other ways:
random intercept / subject=girl(grp);
repeated visit / type=CS subject=girl(grp);
CS: Compound symmetryIn the following, we shall see generalizationsof the constructions above.
31 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Compound symmetry analysis (just checking...)
proc mixed data=calcium;class grp girl visit;model bmd=grp visit grp*visit / ddfm=satterth outpredm=fit_cs;
repeated visit / type=CS subject=girl(grp) rcorr;run;
Estimated R Correlation Matrix for girl(grp) 101 C
Row Col1 Col2 Col3 Col4 Col51 1.0000 0.9498 0.9498 0.9498 0.94982 0.9498 1.0000 0.9498 0.9498 0.94983 0.9498 0.9498 1.0000 0.9498 0.94984 0.9498 0.9498 0.9498 1.0000 0.94985 0.9498 0.9498 0.9498 0.9498 1.0000
32 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Compound symmetry analysis, continued
Covariance Parameter Estimates
Cov Parm Subject EstimateCS girl(grp) 0.004439Residual 0.000235
Fit Statistics-2 Res Log Likelihood -2188.8 <-----------used later
(p. 38+47)Type 3 Tests of Fixed Effects
Num DenEffect DF DF F Value Pr > Fgrp 1 110 2.63 0.1078visit 4 382 619.42 <.0001grp*visit 4 382 5.30 0.0004
33 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Empirical correlation structure
Row COL1 COL2 COL3 COL4 COL51 1.00000000 0.96987049 0.94138162 0.92499715 0.898654542 0.96987049 1.00000000 0.97270895 0.95852788 0.939871853 0.94138162 0.97270895 1.00000000 0.98090996 0.959193484 0.92499715 0.95852788 0.98090996 1.00000000 0.975538495 0.89865454 0.93987185 0.95919348 0.97553849 1.00000000
Is compound symmetry reasonable? Hardly....
Other possibilities:I Unstructured: T(T+1)
2 = 15 covariance parameters (T = 5)I ’Patterned’, e.g. an autoregressive structureI Random regression
34 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Unstructured covariance
If we do not assume any specific structure for the covariance, wemay let it be arbitrary = unstructured
This is done in MIXED by using ’type=UN’
proc mixed data=calcium;class grp girl visit;model bmd=grp visit grp*visit / ddfm=satterth outpredm=fit_un;
repeated visit / type=UN subject=girl(grp) rcorr;run;
35 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Output from TYPE=UN model
Estimated R Correlation Matrix for girl(grp) 101 C
Row Col1 Col2 Col3 Col4 Col51 1.0000 0.9699 0.9414 0.9250 0.89872 0.9699 1.0000 0.9727 0.9585 0.93993 0.9414 0.9727 1.0000 0.9809 0.95924 0.9250 0.9585 0.9809 1.0000 0.97555 0.8987 0.9399 0.9592 0.9755 1.0000
Fit Statistics-2 Res Log Likelihood -2346.3 <-----------used later
(p. 38+47)Type 3 Tests of Fixed Effects
Num DenEffect DF DF F Value Pr > Fgrp 1 109 2.55 0.1129visit 4 97.1 258.08 <.0001grp*visit 4 97.1 2.79 0.0303
36 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Unstructured covariance
Advantages with unstructured covarianceI We do not force a wrong covariance structure upon our
observations.I We gain some insight in the actual structure of the covariance.
Drawbacks of the unstructured covarianceI We use quite a lot of parameters to describe the covariance
structure. The result may therefore be unstable.I It cannot be used for small data setsI It can only be used in case of balanced data
(all subjects have to be measured at identical times)
Can we do something ’in between’?37 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Comparison of covariance structures
Use the likelihood:I Good models have large values of likelihood L and therefore
small values of deviance: −2 log LI Use differences in deviances (∆ = −2 log Q) and compare toχ2 with degrees of freedom equal to the difference inparameters
Comparison of CS and UN:
−2 log Q = 2346.3− 2188.8= 157.5 ∼ χ2(15− 2) = χ2(13)⇒ P < 0.0001
Compound symmetry is not suitable
38 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Different forms of the likelihood
I Default likelihood is the REML-likelihood, where the meanvalue structure has been ’eliminated’
I The traditional likelihood may be obtained using an extraoption:proc mixed method=ml;
I Comparison of covariance structures:Use either of the two likelihoods
I Comparison of mean value structures:Use only the traditional likelihood (ML) only!
I But mostly, just use the default F-tests
39 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Autoregressive structure - of first order
In case of equidistant times, this specifies the followingcovariance structure
σ2
1 ρ ρ2 ρ3 ρ4
ρ 1 ρ ρ2 ρ3
ρ2 ρ 1 ρ ρ2
ρ3 ρ2 ρ 1 ρρ4 ρ3 ρ2 ρ 1
i.e. the correlation decreases (in powers) with the distance betweenobservations.
The non-equidistant analogue is Corr(Ygit1 ,Ygit2) = ρ|t1−t2|
40 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Autoregressive correlation
– as a function of distance between the measurementsfor ρ = 0.1, . . . , 0.9
41 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Autoregressiove covariance structure in SAS
proc mixed data=calcium;class grp girl visit;model bmd=grp visit grp*visit /
ddfm=satterth outpredm=fit_ar1;repeated visit / type=AR(1)
subject=girl(grp) rcorr;run;
42 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Output from TYPE=AR(1) structure
Estimated R Correlation Matrix for girl(grp) 101 C
Row Col1 Col2 Col3 Col4 Col51 1.0000 0.9708 0.9425 0.9150 0.88832 0.9708 1.0000 0.9708 0.9425 0.91503 0.9425 0.9708 1.0000 0.9708 0.94254 0.9150 0.9425 0.9708 1.0000 0.97085 0.8883 0.9150 0.9425 0.9708 1.0000
Covariance Parameter Estimates
Cov Parm Subject EstimateAR(1) girl(grp) 0.9708Residual 0.004412
Fit Statistics-2 Res Log Likelihood -2318.6 <-----------used later
(p. 47)
Type 3 Tests of Fixed Effects
Num DenEffect DF DF F Value Pr > Fgrp 1 113 2.74 0.1005visit 4 382 233.91 <.0001grp*visit 4 382 2.86 0.0232
43 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Note:Comparison of models with different covariance structuresrequires, that the models are nested
This is not the case for CS and AR(1)!
Therefore, we have to compare both of them with the model whichcombines the two covariance structures:
proc mixed data=calcium;class grp girl visit;model bmd=grp visit grp*visit /
ddfm=satterth outpredm=fit_ar1;random intercept / subject=girl(grp) g;repeated visit / type=AR(1)
subject=girl(grp) rcorr;run;
44 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Combination of CS and AR(1)
In case of equidistant times, this combined model specifies thefollowing covariance structure
ω2 + σ2 ω2 + σ2ρ ω2 + σ2ρ2 ω2 + σ2ρ3 ω2 + σ2ρ4
ω2 + σ2ρ ω2 + σ2 ω2 + σ2ρ ω2 + σ2ρ2 ω2 + σ2ρ3
ω2 + σ2ρ2 ω2 + σ2ρ ω2 + σ2 ω2 + σ2ρ ω2 + σ2ρ2
ω2 + σ2ρ3 ω2 + σ2ρ2 ω2 + σ2ρ ω2 + σ2 ω2 + σ2ρω2 + σ2ρ4 ω2 + σ2ρ3 ω2 + σ2ρ2 ω2 + σ2ρ ω2 + σ2
Dividing all entries with ω2 + σ2 gives the correlation matrix,with correlations
ω2 + σ2ρd
ω2 + σ2
45 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Output from CS+AR(1)
Estimated R Correlation Matrix for girl(grp) 101 C
Row Col1 Col2 Col3 Col4 Col51 1.0000 0.9708 0.9425 0.9150 0.88832 0.9708 1.0000 0.9708 0.9425 0.91503 0.9425 0.9708 1.0000 0.9708 0.94254 0.9150 0.9425 0.9708 1.0000 0.97085 0.8883 0.9150 0.9425 0.9708 1.0000
Covariance Parameter EstimatesCov Parm Subject Estimate
Intercept girl(grp) 0AR(1) girl(grp) 0.9708Residual 0.004413
Fit Statistics-2 Res Log Likelihood -2318.6 <-----------used later
(p. 47)
Type 3 Tests of Fixed Effects
Num DenEffect DF DF F Value Pr > Fgrp 1 113 2.74 0.1005visit 4 382 233.91 <.0001grp*visit 4 382 2.86 0.0232
46 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Comparison of covariance structures
cov. ∆ =Model -2 log L par. −2 log Q df PUN 2346.3 15
AR(1) + CS 2318.6 3 27.7 12 0.006
AR(1) 2318.6 2 0 1 1
CS 2188.8 2 129.8 1 < 0.0001
Conclusions?
I The autoregressive structure is definitely better than CS,but not good quite enough...
47 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Test of no interaction GRP*VISIT
– for various choices of covariance structure
Covariance structure Test statistic ∼ distribution P valueIndependence 0.35 ∼ F(4,491) 0.84Compound symmetry 5.30 ∼ F(4,382) 0.0004Autoregressive 2.86 ∼ F(4,382) 0.023Unstructured 2.72 ∼ F(4,107) 0.034
Not quite identical conclusions!
48 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Adding an extra level
What, if we had had double or triple measurementsat each visit?
I If we always have the same number of repetitions,a correct and optimal approach is to analyze averages
I If the number of repetitions vary, analysis of averages maystill be valid (depends on the reason for the unbalance),although not optimal
I The easiest approach is to modify the random-statement to:
random girl girl*visit;
49 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Predicted mean time profiles
In SAS: Use the outpredm=newdata-optionand make pictures using this new data set
Profiles are almost identical for all choices of covariance structuresI For balanced designs, they agree completely
and equal simple averagesI They agree for time points with no missing values
(here the first visit)
50 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Predicted profiles for the unstructured covariance
I The evolution over time looks pretty linearI Include time=visit as a quantitative covariate?I What about the baseline difference?
later....
51 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Individual growth rates?
The time course is reasonably linear, but maybe the girls havedifferent growth rates (slopes)?
If we let Ygit denote BMD for the i’th girl (in the g’th group)at time t (t=1,· · · ,5), we could look at the model:
ygit = Agi + Bgit + εgit , εgit ∼ N (0, σ2)
i.e., with different intercepts and different slopes for each girl
52 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Fit a straight line for each girl
Scatterplot of slopes vs. levels at first visit:
Slopes in the Calcium-group (blue dots) seem to be bigger....
53 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Results from individual regression
Estimates with standard errors in brackets:
Group Level at age 11 SlopeP 0.8697 (0.0086) 0.0206 (0.0014)C 0.8815 (0.0088) 0.0244 (0.0014)
Difference 0.0118 (0.0123) 0.0039 (0.0019)P-value 0.34 0.050
54 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Random regression
– a generalization of the idea of a random level
We let each individual (girl)have
I her own level AgiI her own slope Bgi
but...
55 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
... we bind these individual ’parameters’ (Agi and Bgi) together bynormal distributions
(AgiBgi
)∼ N2
((αgβg
),G)
G =(τ2
a ωω τ2
b
)=(
τ2a ρτaτb
ρτaτb τ2b
)
G describes the population variation of the lines, i.e. theinter-individual variation (as seen on p. 53).
56 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Estimation in random regression
proc mixed covtest data=calcium;class grp girl;model bmd=grp time time*grp / ddfm=satterth s;random intercept time /
type=un subject=girl(grp) g vcorr;run;
type=un in the random-statement refers to the matrix Gon the previous slide
57 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Output from random regression
Covariance Parameter Estimates
Standard ZCov Parm Subject Estimate Error Value Pr ZUN(1,1) girl(grp) 0.004105 0.000575 7.13 <.0001UN(2,1) girl(grp) 3.733E-6 0.000053 0.07 0.9435UN(2,2) girl(grp) 0.000048 8.996E-6 5.30 <.0001Residual 0.000125 0.000010 11.99 <.0001
Fit Statistics-2 Res Log Likelihood -2341.6
Estimated G Matrix
Row Effect grp girl Col1 Col2
1 Intercept C 101 0.004105 3.733E-62 time C 101 3.733E-6 0.000048
58 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Output II: Estimated covariance and correlation
Estimated V Matrix for girl(grp) 101 C
Row Col1 Col2 Col3 Col4 Col5
1 0.004285 0.004211 0.004263 0.004314 0.0043662 0.004211 0.004435 0.004410 0.004509 0.0046083 0.004263 0.004410 0.004681 0.004703 0.0048504 0.004314 0.004509 0.004703 0.005022 0.0050925 0.004366 0.004608 0.004850 0.005092 0.005459
Estimated V Correlation Matrix for girl(grp) 101 C
Row Col1 Col2 Col3 Col4 Col51 1.0000 0.9660 0.9518 0.9300 0.90272 0.9660 1.0000 0.9677 0.9553 0.93643 0.9518 0.9677 1.0000 0.9700 0.95944 0.9300 0.9553 0.9700 1.0000 0.97255 0.9027 0.9364 0.9594 0.9725 1.0000
59 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Output III: Estimated mean value structure
Solution for Fixed Effects
StandardEffect grp Estimate Error DF t Value Pr > |t|Intercept 0.8471 0.008645 110 97.98 <.0001grp C 0.007058 0.01234 110 0.57 0.5685grp P 0 . . . .time 0.02242 0.001098 95.8 20.42 <.0001time*grp C 0.004494 0.001571 96.4 2.86 0.0052time*grp P 0 . . . .
Type 3 Tests of Fixed Effects
Num DenEffect DF DF F Value Pr > Fgrp 1 110 0.33 0.5685time 1 96.4 985.55 <.0001time*grp 1 96.4 8.18 0.0052
Thus, we find an extra increase in BMD of 0.0045(0.0016) g percm3 per half year, when giving calcium supplement.
60 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Note concerning MIXED-notation
I It is necessary to use TYPE=UN in the RANDOM-statementin order to allow intercept and slope to be arbitrarily correlated
I Default option in RANDOM is TYPE=VC, which only specifiesvariance components with different variances
I If TYPE=UN is omitted, we may experience convergenceproblems and sometimes totally incomprehensible results.
In this particular case, the correlation between intercept and slopeis not that impressive - actually only 0.0084 -(intercept is not completely out of range in this example,since it refers to visit=0).
61 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Individual regressions approach
Merits:I Easy to understand and interpret
Drawbacks:I Suboptimal in case of unequal sample sizesI Only simple models feasibleI Difficult/impossible to include covariates
62 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Random regression approach
Merits:I Uses all available informationI Optimal procedure if the model holdsI Easy to include covariates
Drawbacks:I Biased in case of informative missing values or informative
sample sizes
63 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
What to choose in this example?
I Random regression givesI steeper slopeI almost identical levels at age 11
I The girls with flat low profiles tend to be shorter:
Why??
64 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Actually – as it always happens –
I The girls are only seen approximately twice a yearI The actual dates are available and are translated into ctime,
the internal date representation in SAS, denoting days since....
I We can no longer use the construction type=UN,but still the random-statement and the CS in therepeated-statement.
I A lot of other covariance structures will still be possible, e.g.The non-equidistant analogue to the autoregressive structureis Corr(Ygit1 ,Ygit2) = ρ|t1−t2|
which is written as TYPE=SP(POW)(ctime)
65 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Furthermore,
I the girls were not precisely 11 years at the first visit
As a covariate, we ought to have the specific age of the girl,but unfortunately, these are not available.
Note, that this will mostly affect the intercept estimates!
66 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Using the newly constructed ctime as covariate
proc mixed covtest data=calcium;class grp girl;model bmd=grp ctime ctime*grp / ddfm=satterth s;random intercept ctime / type=un subject=girl(grp) g;run;
Iteration History
Iteration Evaluations -2 Res Log Like Criterion
0 1 -1221.358005311 2 -2316.64715219 0.020232292 1 -2316.64847895 0.020111173 1 -2316.64847962 0.02010938
48 1 -2317.30142030 0.0173756149 1 -2317.30142036 0.0173756150 1 -2317.30142043 0.01737561
WARNING: Did not converge.
67 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Numerical instabilityThe variable ctime has much too large values, with a very smallrange.
We normalize, to approximate age or age11:
age=(ctime-11475)/365.25+12;age11=age-11; /* intercept at age 11 */
Variable N Mean Minimum Maximum-------------------------------------------------------ctime 501 11475.08 11078.00 11931.00bmd 501 0.9219202 0.7460000 1.1260000visit 560 3.0000000 1.0000000 5.0000000age 501 12.0002186 10.9130732 13.2484600age11 501 1.0002186 -0.0869268 2.2484600-------------------------------------------------------
68 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Random regression, covariate age11:
ygit = Agi + Bgi(age-11) + εgit
69 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Random regression, using actual ageIntercept at age 11: age11=age-11:
proc mixed covtest data=calcium;class grp girl;model bmd=grp age11 age11*grp / ddfm=satterth s outpm=predicted_mean;random intercept age11 /
type=un subject=girl(grp) g vcorr;run;
Estimated G Matrix
Row Effect grp girl Col1 Col2
1 Intercept C 101 0.004215 0.0000952 age11 C 101 0.000095 0.000180
Estimated V Correlation Matrix for girl(grp) 101 C
Row Col1 Col2 Col3 Col4 Col5
1 1.0000 0.9664 0.9537 0.9321 0.90562 0.9664 1.0000 0.9687 0.9566 0.93853 0.9537 0.9687 1.0000 0.9697 0.95904 0.9321 0.9566 0.9697 1.0000 0.97235 0.9056 0.9385 0.9590 0.9723 1.0000
70 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Covariance Parameter Estimates
Standard ZCov Parm Subject Estimate Error Value Pr Z
UN(1,1) girl(grp) 0.004215 0.000580 7.26 <.0001UN(2,1) girl(grp) 0.000095 0.000104 0.91 0.3617UN(2,2) girl(grp) 0.000180 0.000034 5.21 <.0001Residual 0.000124 0.000010 12.01 <.0001
Solution for Fixed Effects
StandardEffect grp Estimate Error DF t Value Pr > |t|
Intercept 0.8667 0.008688 110 99.75 <.0001 /grp C 0.01113 0.01240 110 0.90 0.3715 ----grp P 0 . . . . \age11 0.04529 0.002152 96 21.05 <.0001 /age11*grp C 0.008891 0.003076 96.6 2.89 0.0048 ----age11*grp P 0 . . . . \
In this model, we quantify the effect of a calcium supplement to0.0089 (0.0031) g per cm3 per year.
71 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Results from random regression
Group Level at age 11 SlopeP 0.8667 (0.0087) 0.0453 (0.0022)C 0.8778 (0.0088) 0.0542 (0.0022)
Difference 0.0111 (0.0124) 0.0089 (0.0031)P 0.37 0.0048
Compare to results from individual regressions (p. 54):
72 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Comparison of slopes for different covariance structures
Covariance −2 log L Cov.par. Differencestructure in slopes P
Independence -1245.0 1 0.0094 (0.0086) 0.27
Compound -2251.7 2 0.0089 (0.0020) < 0.0001Symmetry
Power -2372.0 2 0.0094 (0.0032) 0.0038(Autoregressive)
Random -2350.1 4 0.0089 (0.0031) 0.0048Regression
73 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Reparametrization, intercept at age 13
Using age13=age-13 as covariate, and no baseline yet:
proc mixed covtest noclprint data=calcium;class grp girl;model bmd=grp age13 grp*age13 / ddfm=satterth s;random intercept age13 / type=un subject=girl(grp) g;run;
Solution for Fixed Effects
StandardEffect grp Estimate Error DF t Value Pr > |t|Intercept 0.9573 0.009819 108 97.49 <.0001grp C 0.02891 0.01402 108 2.06 0.0416grp P 0 . . . .age13 0.04529 0.002152 96 21.05 <.0001age13*grp C 0.008891 0.003076 96.6 2.89 0.0048age13*grp P 0 . . . .
Estimated gain at the age 13: 0.0289 (0.0140) g per cm3
74 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Predicted values from random regression
It looks as if there is a difference right from the start (although wehave previously seen this to be insignificant, P=0.37).
Baseline adjustment?
75 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
It the first visit is a baseline measurement (which it is),and randomization has been performed:
I The two groups are known to be equal at baselineI To include this measurement in the comparison between
groupsI may weaken a possible difference between these
(type 2 error)I may convert a treatment effect to an interaction
I Dissimilarities may be present in small studiesI For ’slowly varying’ outcomes, even a small difference may
produce non-treatment related differences, i.e. bias
76 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Hypothetical comparison of two treatment groups, A
I Truth: Constant difference between the treatmentsI Finding: Interaction between time and treatment
77 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Hypothetical comparison of two treatment groups, B
I Truth: No effect of treatmentI Finding: Constant difference between treatments
78 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Baseline difference I: Observational studies
Research question:Compare the outcomes for individuals from different groups(e.g. gender or illness groups):
I The groups are likely to differ in many respects, includingbaseline outcome value.
I Differences in the outcome may be due to any of thesecharacteristics, and the answer will depend on which of theseare included in the model.
I Adjust for the covariates that are sensible in the context.The scientific question answered depends upon the model
79 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Baseline difference II: Randomized studies
Research question:Compare the outcomes for individuals treated differently, butotherwise identical (with respect to all baseline characteristics,including baseline outcome value):
I There ought to be no difference in either covariates orbaseline outcome.
I Even so, small chance differences in baseline may createimportant outcome differences that may erroneously be takento be treatment effects, if the covariate (or baseline) is highlypredictive of the outcome.
I Use baseline measurement as a covariate (Ancova)to adjust for chance differences.
I Adjust also for other important covariates.The scientific question answered depends upon the model80 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Approaches for handling baseline
in randomized studiesI Use follow-up data only (exclude baseline from analysis)
- most reasonable if correlation between repeatedmeasurements is very low
I Subtract baseline from successive measurements- most reasonable if correlation between repeatedmeasurements is very high
I Use baseline measurement as a covariate (Ancova)- may be used for any degree of correlationand gives the most sensible interpretation
81 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Excluding baseline (4 follow-up visits only)
without baseline as covariate:
proc mixed covtest noclprint data=calcium; where visit>1;class grp girl;model bmd=grp age13 grp*age13 / ddfm=satterth s;random intercept age13 / type=un subject=girl(grp) g;run;
Solution for Fixed Effects
StandardEffect grp Estimate Error DF t Value Pr > |t|
Intercept 0.9574 0.009721 102 98.49 <.0001grp C 0.02474 0.01383 102 1.79 0.0765grp P 0 . . . .age13 0.04634 0.002288 92.3 20.25 <.0001age13*grp C 0.007456 0.003277 92.5 2.28 0.0252age13*grp P 0 . . . .
Estimated gain at the age 13: 0.0247 (0.0138) g per cm3
82 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Including baseline as covariate
proc mixed covtest noclprint data=calcium; where visit>1;class grp girl;model bmd=baseline grp age13 grp*age13 / ddfm=satterth s;random intercept age13 / type=un subject=girl(grp) g;run;
Solution for Fixed Effects
StandardEffect grp Estimate Error DF t Value Pr > |t|
Intercept 0.01825 0.02690 106 0.68 0.4989baseline 1.0797 0.03054 102 35.36 <.0001grp C 0.01728 0.006236 101 2.77 0.0067grp P 0 . . . .age13 0.04597 0.002287 93.1 20.11 <.0001age13*grp C 0.007419 0.003276 93.2 2.26 0.0258age13*grp P 0 . . . .
Estimated gain at the age 13: 0.0173 (0.0062) g per cm3
83 / 84
u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s
Effect of including baseline as covariate
I explains some (but not all) of the difference between groupsat age 13without baseline: 0.0247 (0.0138)baseline as covariate: 0.0173 (0.0062)
I increases the precision of the estimated difference(standard error becomes smaller)It even becomes significant!
84 / 84