21
university of copenhagen department of biostatistics Faculty of Health Sciences Correlated data NFA, May 22, 2014 Longitudinal measurements Julie Lyng Forman & Lene Theil Skovgaard Department of Biostatistics University of Copenhagen 1 / 84 university of copenhagen department of biostatistics Repeated measurements over time Presentation of data. Simple approaches. Correlation structures. Random regression. Baseline considerations. Home page: http://staff.pubhealth.ku.dk/~jufo/RepeatedMeasuresNFA2014. E-mail: [email protected], [email protected] 2 / 84 university of copenhagen department of biostatistics Typical set-up for repeated measurements Two or more groups of subjects (typically receiving different treatments) Randomization at baseline Longitudinal measurements of the same quantity over time for each subject, typically as a function of time (duration of treatment) age cumulative dose of some drug Level 1: Single observations Level 2: Patients/Subjects 3 / 84 university of copenhagen department of biostatistics Traditional presentation of data Example: Aspirin absorption for healthy and ill subjects (Matthews et.al.,1990) 4 / 84

Correlated data I NFA, May 22, 2014

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Correlated data I NFA, May 22, 2014

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Faculty of Health Sciences

Correlated data

NFA, May 22, 2014Longitudinal measurements

Julie Lyng Forman & Lene Theil SkovgaardDepartment of BiostatisticsUniversity of Copenhagen

1 / 84

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Repeated measurements over time

I Presentation of data.I Simple approaches.I Correlation structures.I Random regression.I Baseline considerations.

Home page:http://staff.pubhealth.ku.dk/~jufo/RepeatedMeasuresNFA2014.htmlE-mail: [email protected], [email protected]

2 / 84

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Typical set-up for repeated measurements

I Two or more groups of subjects(typically receiving different treatments)

I Randomization at baselineI Longitudinal measurements of the same quantity over time for

each subject, typically as a function ofI time (duration of treatment)I ageI cumulative dose of some drug

Level 1: Single observationsLevel 2: Patients/Subjects

3 / 84

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Traditional presentation of data

Example: Aspirin absorption for healthy and ill subjects(Matthews et.al.,1990)

4 / 84

Page 2: Correlated data I NFA, May 22, 2014

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Problems with the traditional presentation of data

I Comparison of groups for each time point separatelyI is inefficientI has a high risk of leading to chance significanceI Tests are not independent,

since they are carried out on the same subjectsI Interpretation may be difficult

I Changes over timeI cannot be evaluated ’by eye’

because we cannot see the pairing

5 / 84

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Take care with average curves

I They may hide important structuresI They give no indication of the variation in the time profilesI Always make a picture of individual time profilesI Do not average over individual profiles, unless these have

identical shapes, i.e. only shifts in level are seen betweenindividuals.

I Alternative:Calculate individual characteristics

6 / 84

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Individual time profiles

Spaghettiplot - divided into groups

Do we see time profilesof identical shape?

Are the averagesrepresentative?

7 / 84

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Commonly used characteristics

I The response for selected times, e.g. endpointI AverageI The slope, perhaps for a specific periodI Peak valueI Time to peakI The area under the curve (AUC).I A measure of cyclic behaviour.

These are analyzed as new observations.

8 / 84

Page 3: Correlated data I NFA, May 22, 2014

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Example: Aspirin absorption

Characteristics:I time to peakI peak value

Conclusion:P=0.02 for identity ofpeak values.

Remember quantifications!

9 / 84

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Repeated measurement designs

MeritsI It is much more powerful in detecting time changes

(data are ’paired’ with the subject as its own control)I We may discover that subjects have different time courses

(In designs with only cross-sectional data, this may also be thecase, but we have no way of knowing!)

I We may identify important characteristics of the time courses,specific for each subject (trend, peak etc.)

10 / 84

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Repeated measurement designs

DrawbacksI Traditional independence assumption is violated since repeated

observations on the same individual are correlated (look alike)I Traditional anova-models become impossibleI Comparison of time averages (or other characteristics) cannot

incorporate time dependent covariates

11 / 84

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Example: Calcium supplements

A total of 112 11-year old girls were randomized to receive eithercalcium or placebo.

Outcome:BMD=bone mineral density, in g

cm2 ,measured every 6 months (5 visits)

Scientific question:Does calcium improve the rate of bone gain for adolescent women?

12 / 84

Page 4: Correlated data I NFA, May 22, 2014

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Individual profiles

13 / 84

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Factor diagram

[I ] = [Girl ∗Visit]

Grp ∗Visit

[Girl]

Visit

Grp

HHHHj

����*

�����3

-

-

Two-level model with:I Observations Ygit (group=g, girl=individual=i, visit=time=t)I Systematic (fixed) effects of group and visit, with a possible

interactionI Random girl-level, Var(agi) = ω2

B, i.e., random interceptI Residual variation, within girls, Var(εgit) = σ2

W

14 / 84

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Multilevel model structure

Level 1 2Unit single measurements girlsVariation within girls between girls

σ2W ω2

B

Covariates x zvisit, grp*visit grp

15 / 84

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Random girl level in SAS

Ygit = µgt + Agi + εgit

proc mixed data=calcium;class grp girl visit;model bmd=grp visit grp*visit / ddfm=satterth s;random girl(grp);run;

I Girls are nested in groups, specified by the notationrandom girl(grp);

I satterth could be replaced by kenwardroger

16 / 84

Page 5: Correlated data I NFA, May 22, 2014

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

The options

ddfm=satterth (- or kenwardrogers):

I When the distributions are exact, they have no effectI in balanced situations

I When approximations are necessary,these two are considered best

I in unbalanced situations,i.e for almost all observational designs

I in case of missing observationsI It may give rise to fractional degrees of freedomI The computations may require a little more time,

but in most cases this will not be noticableI When in doubt, use it!

17 / 84

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Random girl level, output

Covariance Parameter Estimates (REML)

Cov Parm Estimate

GIRL(GRP) 0.00443925Residual 0.00023471

Tests of Fixed Effects

Source NDF DDF Type III F Pr > F

GRP 1 110 2.63 0.1078VISIT 4 382 619.42 0.0001GRP*VISIT 4 382 5.30 0.0004

No doubt, we see an interaction GRP*VISIT

18 / 84

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Random girl level, output, continued

Solution for Fixed EffectsStandard

Effect grp visit Estimate Error DF t Value Pr > |t|Intercept 0.9576 0.009131 122 104.87 <.0001grp C 0.02951 0.01304 122 2.26 0.0254grp P 0 . . . .visit 1 -0.08750 0.003100 382 -28.22 <.0001visit 2 -0.06748 0.003103 381 -21.75 <.0001visit 3 -0.04342 0.003117 381 -13.93 <.0001visit 4 -0.01619 0.003148 381 -5.14 <.0001visit 5 0 . . . .grp*visit C 1 -0.01912 0.004445 382 -4.30 <.0001grp*visit C 2 -0.01255 0.004448 381 -2.82 0.0050grp*visit C 3 -0.00622 0.004480 381 -1.39 0.1661grp*visit C 4 -0.00679 0.004517 381 -1.50 0.1337grp*visit C 5 0 . . . .grp*visit P 1 0 . . . .grp*visit P 2 0 . . . .grp*visit P 3 0 . . . .grp*visit P 4 0 . . . .grp*visit P 5 0 . . . .

I Difference between groups at visit 5I Difference increases over time

19 / 84

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Model checks

Two types of residuals:

Ordinary Observed minus predicted group mean(only systematic effects)

Ygit − µgt

Conditional Observed minus predicted individual mean value(systematic and random effects)

εgit = Ygit − µgt − Agi

20 / 84

Page 6: Correlated data I NFA, May 22, 2014

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Model check, ordinary residuals

21 / 84

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Model check, conditional residuals

22 / 84

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Correlated observations

If we fail to take the correlation into account,we will experience:

I possible bias in the mean value structurein unbalanced situations

I low efficiency (type 2 error) for estimationof level 1 covariates (time-related effects)

I too small standard errors (type 1 error) for estimatesof level 2 effects (treatments)

23 / 84

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Level 1 covariates (unit: single observations)

I Time itselfI Covariates varying with time:

blood pressure, heart rate, ageI Interaction between group and time

If correlation is not taken into account, we ignore the pairedsituation, leading to low efficiency, i.e. too large P-valuesType 2 error

Effects may go undetected!

24 / 84

Page 7: Correlated data I NFA, May 22, 2014

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Incorrect analysis

Correlation ignored

proc glm data=calcium;class grp visit;model bmd=grp visit grp*visit / solution;

run;

Source DF Type III SS Mean Square F Value Pr > Fgrp 1 0.05063714 0.05063714 11.05 0.0010visit 4 0.64369784 0.16092446 35.10 <.0001grp*visit 4 0.00649557 0.00162389 0.35 0.8411

The interaction is not detectedsince we “forgot” the pairing

25 / 84

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Level 2 covariates (unit: individuals)

I TreatmentI Gender, age

If correlation is ignored, we act as if we have more informationthan we actually have, leading to too small P-valuesType 1 error

’Noise’ may be taken to be real effects!

26 / 84

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Model synonyms

I Two-level model

I Model with random subject levelsI Model with random intercepts

I Model with compound symmetry correlation structureI Model with exchangeability correlation structure

27 / 84

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Compound symmetry = Exchangeability

The two-level model assumes compound symmetry,i.e., that all measurements on the same individual areequally correlated:

Corr(Ygit1 ,Ygit2) = ρ = ω2B

ω2B + σ2

W

This means that the distance in time is not taken into account!!

Observations are exchangeable

28 / 84

Page 8: Correlated data I NFA, May 22, 2014

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Compound symmetry structure

ω2B + σ2

W ω2B ω2

B ω2B ω2

Bω2

B ω2B + σ2

W ω2B ω2

B ω2B

ω2B ω2

B ω2B + σ2

W ω2B ω2

Bω2

B ω2B ω2

B ω2B + σ2

W ω2B

ω2B ω2

B ω2B ω2

B ω2B + σ2

W

= (ω2B + σ2

W )

1 ρ ρ ρ ρρ 1 ρ ρ ρρ ρ 1 ρ ρρ ρ ρ 1 ρρ ρ ρ ρ 1

But:Observations taken close to each other in time will often be moreclosely correlated than observations taken further apart!

29 / 84

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Estimation of correlation ρ

The correlation ρ is estimated by using output from p. 18

ρ = Corr(Ygdt1 ,Ygdt2) = ω2B

ω2B + σ2

W

≈ 0.004439250.00443925 + 0.00023471 = 0.9498

High “tracking”, meaning that each girl stays more or less atthe same rank in the group

30 / 84

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Alternative specifications

Note, that the specification ’random girl(grp);’can be written in two other ways:

random intercept / subject=girl(grp);

repeated visit / type=CS subject=girl(grp);

CS: Compound symmetryIn the following, we shall see generalizationsof the constructions above.

31 / 84

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Compound symmetry analysis (just checking...)

proc mixed data=calcium;class grp girl visit;model bmd=grp visit grp*visit / ddfm=satterth outpredm=fit_cs;

repeated visit / type=CS subject=girl(grp) rcorr;run;

Estimated R Correlation Matrix for girl(grp) 101 C

Row Col1 Col2 Col3 Col4 Col51 1.0000 0.9498 0.9498 0.9498 0.94982 0.9498 1.0000 0.9498 0.9498 0.94983 0.9498 0.9498 1.0000 0.9498 0.94984 0.9498 0.9498 0.9498 1.0000 0.94985 0.9498 0.9498 0.9498 0.9498 1.0000

32 / 84

Page 9: Correlated data I NFA, May 22, 2014

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Compound symmetry analysis, continued

Covariance Parameter Estimates

Cov Parm Subject EstimateCS girl(grp) 0.004439Residual 0.000235

Fit Statistics-2 Res Log Likelihood -2188.8 <-----------used later

(p. 38+47)Type 3 Tests of Fixed Effects

Num DenEffect DF DF F Value Pr > Fgrp 1 110 2.63 0.1078visit 4 382 619.42 <.0001grp*visit 4 382 5.30 0.0004

33 / 84

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Empirical correlation structure

Row COL1 COL2 COL3 COL4 COL51 1.00000000 0.96987049 0.94138162 0.92499715 0.898654542 0.96987049 1.00000000 0.97270895 0.95852788 0.939871853 0.94138162 0.97270895 1.00000000 0.98090996 0.959193484 0.92499715 0.95852788 0.98090996 1.00000000 0.975538495 0.89865454 0.93987185 0.95919348 0.97553849 1.00000000

Is compound symmetry reasonable? Hardly....

Other possibilities:I Unstructured: T(T+1)

2 = 15 covariance parameters (T = 5)I ’Patterned’, e.g. an autoregressive structureI Random regression

34 / 84

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Unstructured covariance

If we do not assume any specific structure for the covariance, wemay let it be arbitrary = unstructured

This is done in MIXED by using ’type=UN’

proc mixed data=calcium;class grp girl visit;model bmd=grp visit grp*visit / ddfm=satterth outpredm=fit_un;

repeated visit / type=UN subject=girl(grp) rcorr;run;

35 / 84

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Output from TYPE=UN model

Estimated R Correlation Matrix for girl(grp) 101 C

Row Col1 Col2 Col3 Col4 Col51 1.0000 0.9699 0.9414 0.9250 0.89872 0.9699 1.0000 0.9727 0.9585 0.93993 0.9414 0.9727 1.0000 0.9809 0.95924 0.9250 0.9585 0.9809 1.0000 0.97555 0.8987 0.9399 0.9592 0.9755 1.0000

Fit Statistics-2 Res Log Likelihood -2346.3 <-----------used later

(p. 38+47)Type 3 Tests of Fixed Effects

Num DenEffect DF DF F Value Pr > Fgrp 1 109 2.55 0.1129visit 4 97.1 258.08 <.0001grp*visit 4 97.1 2.79 0.0303

36 / 84

Page 10: Correlated data I NFA, May 22, 2014

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Unstructured covariance

Advantages with unstructured covarianceI We do not force a wrong covariance structure upon our

observations.I We gain some insight in the actual structure of the covariance.

Drawbacks of the unstructured covarianceI We use quite a lot of parameters to describe the covariance

structure. The result may therefore be unstable.I It cannot be used for small data setsI It can only be used in case of balanced data

(all subjects have to be measured at identical times)

Can we do something ’in between’?37 / 84

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Comparison of covariance structures

Use the likelihood:I Good models have large values of likelihood L and therefore

small values of deviance: −2 log LI Use differences in deviances (∆ = −2 log Q) and compare toχ2 with degrees of freedom equal to the difference inparameters

Comparison of CS and UN:

−2 log Q = 2346.3− 2188.8= 157.5 ∼ χ2(15− 2) = χ2(13)⇒ P < 0.0001

Compound symmetry is not suitable

38 / 84

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Different forms of the likelihood

I Default likelihood is the REML-likelihood, where the meanvalue structure has been ’eliminated’

I The traditional likelihood may be obtained using an extraoption:proc mixed method=ml;

I Comparison of covariance structures:Use either of the two likelihoods

I Comparison of mean value structures:Use only the traditional likelihood (ML) only!

I But mostly, just use the default F-tests

39 / 84

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Autoregressive structure - of first order

In case of equidistant times, this specifies the followingcovariance structure

σ2

1 ρ ρ2 ρ3 ρ4

ρ 1 ρ ρ2 ρ3

ρ2 ρ 1 ρ ρ2

ρ3 ρ2 ρ 1 ρρ4 ρ3 ρ2 ρ 1

i.e. the correlation decreases (in powers) with the distance betweenobservations.

The non-equidistant analogue is Corr(Ygit1 ,Ygit2) = ρ|t1−t2|

40 / 84

Page 11: Correlated data I NFA, May 22, 2014

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Autoregressive correlation

– as a function of distance between the measurementsfor ρ = 0.1, . . . , 0.9

41 / 84

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Autoregressiove covariance structure in SAS

proc mixed data=calcium;class grp girl visit;model bmd=grp visit grp*visit /

ddfm=satterth outpredm=fit_ar1;repeated visit / type=AR(1)

subject=girl(grp) rcorr;run;

42 / 84

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Output from TYPE=AR(1) structure

Estimated R Correlation Matrix for girl(grp) 101 C

Row Col1 Col2 Col3 Col4 Col51 1.0000 0.9708 0.9425 0.9150 0.88832 0.9708 1.0000 0.9708 0.9425 0.91503 0.9425 0.9708 1.0000 0.9708 0.94254 0.9150 0.9425 0.9708 1.0000 0.97085 0.8883 0.9150 0.9425 0.9708 1.0000

Covariance Parameter Estimates

Cov Parm Subject EstimateAR(1) girl(grp) 0.9708Residual 0.004412

Fit Statistics-2 Res Log Likelihood -2318.6 <-----------used later

(p. 47)

Type 3 Tests of Fixed Effects

Num DenEffect DF DF F Value Pr > Fgrp 1 113 2.74 0.1005visit 4 382 233.91 <.0001grp*visit 4 382 2.86 0.0232

43 / 84

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Note:Comparison of models with different covariance structuresrequires, that the models are nested

This is not the case for CS and AR(1)!

Therefore, we have to compare both of them with the model whichcombines the two covariance structures:

proc mixed data=calcium;class grp girl visit;model bmd=grp visit grp*visit /

ddfm=satterth outpredm=fit_ar1;random intercept / subject=girl(grp) g;repeated visit / type=AR(1)

subject=girl(grp) rcorr;run;

44 / 84

Page 12: Correlated data I NFA, May 22, 2014

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Combination of CS and AR(1)

In case of equidistant times, this combined model specifies thefollowing covariance structure

ω2 + σ2 ω2 + σ2ρ ω2 + σ2ρ2 ω2 + σ2ρ3 ω2 + σ2ρ4

ω2 + σ2ρ ω2 + σ2 ω2 + σ2ρ ω2 + σ2ρ2 ω2 + σ2ρ3

ω2 + σ2ρ2 ω2 + σ2ρ ω2 + σ2 ω2 + σ2ρ ω2 + σ2ρ2

ω2 + σ2ρ3 ω2 + σ2ρ2 ω2 + σ2ρ ω2 + σ2 ω2 + σ2ρω2 + σ2ρ4 ω2 + σ2ρ3 ω2 + σ2ρ2 ω2 + σ2ρ ω2 + σ2

Dividing all entries with ω2 + σ2 gives the correlation matrix,with correlations

ω2 + σ2ρd

ω2 + σ2

45 / 84

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Output from CS+AR(1)

Estimated R Correlation Matrix for girl(grp) 101 C

Row Col1 Col2 Col3 Col4 Col51 1.0000 0.9708 0.9425 0.9150 0.88832 0.9708 1.0000 0.9708 0.9425 0.91503 0.9425 0.9708 1.0000 0.9708 0.94254 0.9150 0.9425 0.9708 1.0000 0.97085 0.8883 0.9150 0.9425 0.9708 1.0000

Covariance Parameter EstimatesCov Parm Subject Estimate

Intercept girl(grp) 0AR(1) girl(grp) 0.9708Residual 0.004413

Fit Statistics-2 Res Log Likelihood -2318.6 <-----------used later

(p. 47)

Type 3 Tests of Fixed Effects

Num DenEffect DF DF F Value Pr > Fgrp 1 113 2.74 0.1005visit 4 382 233.91 <.0001grp*visit 4 382 2.86 0.0232

46 / 84

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Comparison of covariance structures

cov. ∆ =Model -2 log L par. −2 log Q df PUN 2346.3 15

AR(1) + CS 2318.6 3 27.7 12 0.006

AR(1) 2318.6 2 0 1 1

CS 2188.8 2 129.8 1 < 0.0001

Conclusions?

I The autoregressive structure is definitely better than CS,but not good quite enough...

47 / 84

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Test of no interaction GRP*VISIT

– for various choices of covariance structure

Covariance structure Test statistic ∼ distribution P valueIndependence 0.35 ∼ F(4,491) 0.84Compound symmetry 5.30 ∼ F(4,382) 0.0004Autoregressive 2.86 ∼ F(4,382) 0.023Unstructured 2.72 ∼ F(4,107) 0.034

Not quite identical conclusions!

48 / 84

Page 13: Correlated data I NFA, May 22, 2014

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Adding an extra level

What, if we had had double or triple measurementsat each visit?

I If we always have the same number of repetitions,a correct and optimal approach is to analyze averages

I If the number of repetitions vary, analysis of averages maystill be valid (depends on the reason for the unbalance),although not optimal

I The easiest approach is to modify the random-statement to:

random girl girl*visit;

49 / 84

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Predicted mean time profiles

In SAS: Use the outpredm=newdata-optionand make pictures using this new data set

Profiles are almost identical for all choices of covariance structuresI For balanced designs, they agree completely

and equal simple averagesI They agree for time points with no missing values

(here the first visit)

50 / 84

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Predicted profiles for the unstructured covariance

I The evolution over time looks pretty linearI Include time=visit as a quantitative covariate?I What about the baseline difference?

later....

51 / 84

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Individual growth rates?

The time course is reasonably linear, but maybe the girls havedifferent growth rates (slopes)?

If we let Ygit denote BMD for the i’th girl (in the g’th group)at time t (t=1,· · · ,5), we could look at the model:

ygit = Agi + Bgit + εgit , εgit ∼ N (0, σ2)

i.e., with different intercepts and different slopes for each girl

52 / 84

Page 14: Correlated data I NFA, May 22, 2014

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Fit a straight line for each girl

Scatterplot of slopes vs. levels at first visit:

Slopes in the Calcium-group (blue dots) seem to be bigger....

53 / 84

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Results from individual regression

Estimates with standard errors in brackets:

Group Level at age 11 SlopeP 0.8697 (0.0086) 0.0206 (0.0014)C 0.8815 (0.0088) 0.0244 (0.0014)

Difference 0.0118 (0.0123) 0.0039 (0.0019)P-value 0.34 0.050

54 / 84

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Random regression

– a generalization of the idea of a random level

We let each individual (girl)have

I her own level AgiI her own slope Bgi

but...

55 / 84

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

... we bind these individual ’parameters’ (Agi and Bgi) together bynormal distributions

(AgiBgi

)∼ N2

((αgβg

),G)

G =(τ2

a ωω τ2

b

)=(

τ2a ρτaτb

ρτaτb τ2b

)

G describes the population variation of the lines, i.e. theinter-individual variation (as seen on p. 53).

56 / 84

Page 15: Correlated data I NFA, May 22, 2014

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Estimation in random regression

proc mixed covtest data=calcium;class grp girl;model bmd=grp time time*grp / ddfm=satterth s;random intercept time /

type=un subject=girl(grp) g vcorr;run;

type=un in the random-statement refers to the matrix Gon the previous slide

57 / 84

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Output from random regression

Covariance Parameter Estimates

Standard ZCov Parm Subject Estimate Error Value Pr ZUN(1,1) girl(grp) 0.004105 0.000575 7.13 <.0001UN(2,1) girl(grp) 3.733E-6 0.000053 0.07 0.9435UN(2,2) girl(grp) 0.000048 8.996E-6 5.30 <.0001Residual 0.000125 0.000010 11.99 <.0001

Fit Statistics-2 Res Log Likelihood -2341.6

Estimated G Matrix

Row Effect grp girl Col1 Col2

1 Intercept C 101 0.004105 3.733E-62 time C 101 3.733E-6 0.000048

58 / 84

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Output II: Estimated covariance and correlation

Estimated V Matrix for girl(grp) 101 C

Row Col1 Col2 Col3 Col4 Col5

1 0.004285 0.004211 0.004263 0.004314 0.0043662 0.004211 0.004435 0.004410 0.004509 0.0046083 0.004263 0.004410 0.004681 0.004703 0.0048504 0.004314 0.004509 0.004703 0.005022 0.0050925 0.004366 0.004608 0.004850 0.005092 0.005459

Estimated V Correlation Matrix for girl(grp) 101 C

Row Col1 Col2 Col3 Col4 Col51 1.0000 0.9660 0.9518 0.9300 0.90272 0.9660 1.0000 0.9677 0.9553 0.93643 0.9518 0.9677 1.0000 0.9700 0.95944 0.9300 0.9553 0.9700 1.0000 0.97255 0.9027 0.9364 0.9594 0.9725 1.0000

59 / 84

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Output III: Estimated mean value structure

Solution for Fixed Effects

StandardEffect grp Estimate Error DF t Value Pr > |t|Intercept 0.8471 0.008645 110 97.98 <.0001grp C 0.007058 0.01234 110 0.57 0.5685grp P 0 . . . .time 0.02242 0.001098 95.8 20.42 <.0001time*grp C 0.004494 0.001571 96.4 2.86 0.0052time*grp P 0 . . . .

Type 3 Tests of Fixed Effects

Num DenEffect DF DF F Value Pr > Fgrp 1 110 0.33 0.5685time 1 96.4 985.55 <.0001time*grp 1 96.4 8.18 0.0052

Thus, we find an extra increase in BMD of 0.0045(0.0016) g percm3 per half year, when giving calcium supplement.

60 / 84

Page 16: Correlated data I NFA, May 22, 2014

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Note concerning MIXED-notation

I It is necessary to use TYPE=UN in the RANDOM-statementin order to allow intercept and slope to be arbitrarily correlated

I Default option in RANDOM is TYPE=VC, which only specifiesvariance components with different variances

I If TYPE=UN is omitted, we may experience convergenceproblems and sometimes totally incomprehensible results.

In this particular case, the correlation between intercept and slopeis not that impressive - actually only 0.0084 -(intercept is not completely out of range in this example,since it refers to visit=0).

61 / 84

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Individual regressions approach

Merits:I Easy to understand and interpret

Drawbacks:I Suboptimal in case of unequal sample sizesI Only simple models feasibleI Difficult/impossible to include covariates

62 / 84

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Random regression approach

Merits:I Uses all available informationI Optimal procedure if the model holdsI Easy to include covariates

Drawbacks:I Biased in case of informative missing values or informative

sample sizes

63 / 84

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

What to choose in this example?

I Random regression givesI steeper slopeI almost identical levels at age 11

I The girls with flat low profiles tend to be shorter:

Why??

64 / 84

Page 17: Correlated data I NFA, May 22, 2014

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Actually – as it always happens –

I The girls are only seen approximately twice a yearI The actual dates are available and are translated into ctime,

the internal date representation in SAS, denoting days since....

I We can no longer use the construction type=UN,but still the random-statement and the CS in therepeated-statement.

I A lot of other covariance structures will still be possible, e.g.The non-equidistant analogue to the autoregressive structureis Corr(Ygit1 ,Ygit2) = ρ|t1−t2|

which is written as TYPE=SP(POW)(ctime)

65 / 84

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Furthermore,

I the girls were not precisely 11 years at the first visit

As a covariate, we ought to have the specific age of the girl,but unfortunately, these are not available.

Note, that this will mostly affect the intercept estimates!

66 / 84

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Using the newly constructed ctime as covariate

proc mixed covtest data=calcium;class grp girl;model bmd=grp ctime ctime*grp / ddfm=satterth s;random intercept ctime / type=un subject=girl(grp) g;run;

Iteration History

Iteration Evaluations -2 Res Log Like Criterion

0 1 -1221.358005311 2 -2316.64715219 0.020232292 1 -2316.64847895 0.020111173 1 -2316.64847962 0.02010938

48 1 -2317.30142030 0.0173756149 1 -2317.30142036 0.0173756150 1 -2317.30142043 0.01737561

WARNING: Did not converge.

67 / 84

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Numerical instabilityThe variable ctime has much too large values, with a very smallrange.

We normalize, to approximate age or age11:

age=(ctime-11475)/365.25+12;age11=age-11; /* intercept at age 11 */

Variable N Mean Minimum Maximum-------------------------------------------------------ctime 501 11475.08 11078.00 11931.00bmd 501 0.9219202 0.7460000 1.1260000visit 560 3.0000000 1.0000000 5.0000000age 501 12.0002186 10.9130732 13.2484600age11 501 1.0002186 -0.0869268 2.2484600-------------------------------------------------------

68 / 84

Page 18: Correlated data I NFA, May 22, 2014

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Random regression, covariate age11:

ygit = Agi + Bgi(age-11) + εgit

69 / 84

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Random regression, using actual ageIntercept at age 11: age11=age-11:

proc mixed covtest data=calcium;class grp girl;model bmd=grp age11 age11*grp / ddfm=satterth s outpm=predicted_mean;random intercept age11 /

type=un subject=girl(grp) g vcorr;run;

Estimated G Matrix

Row Effect grp girl Col1 Col2

1 Intercept C 101 0.004215 0.0000952 age11 C 101 0.000095 0.000180

Estimated V Correlation Matrix for girl(grp) 101 C

Row Col1 Col2 Col3 Col4 Col5

1 1.0000 0.9664 0.9537 0.9321 0.90562 0.9664 1.0000 0.9687 0.9566 0.93853 0.9537 0.9687 1.0000 0.9697 0.95904 0.9321 0.9566 0.9697 1.0000 0.97235 0.9056 0.9385 0.9590 0.9723 1.0000

70 / 84

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Covariance Parameter Estimates

Standard ZCov Parm Subject Estimate Error Value Pr Z

UN(1,1) girl(grp) 0.004215 0.000580 7.26 <.0001UN(2,1) girl(grp) 0.000095 0.000104 0.91 0.3617UN(2,2) girl(grp) 0.000180 0.000034 5.21 <.0001Residual 0.000124 0.000010 12.01 <.0001

Solution for Fixed Effects

StandardEffect grp Estimate Error DF t Value Pr > |t|

Intercept 0.8667 0.008688 110 99.75 <.0001 /grp C 0.01113 0.01240 110 0.90 0.3715 ----grp P 0 . . . . \age11 0.04529 0.002152 96 21.05 <.0001 /age11*grp C 0.008891 0.003076 96.6 2.89 0.0048 ----age11*grp P 0 . . . . \

In this model, we quantify the effect of a calcium supplement to0.0089 (0.0031) g per cm3 per year.

71 / 84

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Results from random regression

Group Level at age 11 SlopeP 0.8667 (0.0087) 0.0453 (0.0022)C 0.8778 (0.0088) 0.0542 (0.0022)

Difference 0.0111 (0.0124) 0.0089 (0.0031)P 0.37 0.0048

Compare to results from individual regressions (p. 54):

72 / 84

Page 19: Correlated data I NFA, May 22, 2014

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Comparison of slopes for different covariance structures

Covariance −2 log L Cov.par. Differencestructure in slopes P

Independence -1245.0 1 0.0094 (0.0086) 0.27

Compound -2251.7 2 0.0089 (0.0020) < 0.0001Symmetry

Power -2372.0 2 0.0094 (0.0032) 0.0038(Autoregressive)

Random -2350.1 4 0.0089 (0.0031) 0.0048Regression

73 / 84

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Reparametrization, intercept at age 13

Using age13=age-13 as covariate, and no baseline yet:

proc mixed covtest noclprint data=calcium;class grp girl;model bmd=grp age13 grp*age13 / ddfm=satterth s;random intercept age13 / type=un subject=girl(grp) g;run;

Solution for Fixed Effects

StandardEffect grp Estimate Error DF t Value Pr > |t|Intercept 0.9573 0.009819 108 97.49 <.0001grp C 0.02891 0.01402 108 2.06 0.0416grp P 0 . . . .age13 0.04529 0.002152 96 21.05 <.0001age13*grp C 0.008891 0.003076 96.6 2.89 0.0048age13*grp P 0 . . . .

Estimated gain at the age 13: 0.0289 (0.0140) g per cm3

74 / 84

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Predicted values from random regression

It looks as if there is a difference right from the start (although wehave previously seen this to be insignificant, P=0.37).

Baseline adjustment?

75 / 84

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

It the first visit is a baseline measurement (which it is),and randomization has been performed:

I The two groups are known to be equal at baselineI To include this measurement in the comparison between

groupsI may weaken a possible difference between these

(type 2 error)I may convert a treatment effect to an interaction

I Dissimilarities may be present in small studiesI For ’slowly varying’ outcomes, even a small difference may

produce non-treatment related differences, i.e. bias

76 / 84

Page 20: Correlated data I NFA, May 22, 2014

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Hypothetical comparison of two treatment groups, A

I Truth: Constant difference between the treatmentsI Finding: Interaction between time and treatment

77 / 84

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Hypothetical comparison of two treatment groups, B

I Truth: No effect of treatmentI Finding: Constant difference between treatments

78 / 84

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Baseline difference I: Observational studies

Research question:Compare the outcomes for individuals from different groups(e.g. gender or illness groups):

I The groups are likely to differ in many respects, includingbaseline outcome value.

I Differences in the outcome may be due to any of thesecharacteristics, and the answer will depend on which of theseare included in the model.

I Adjust for the covariates that are sensible in the context.The scientific question answered depends upon the model

79 / 84

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Baseline difference II: Randomized studies

Research question:Compare the outcomes for individuals treated differently, butotherwise identical (with respect to all baseline characteristics,including baseline outcome value):

I There ought to be no difference in either covariates orbaseline outcome.

I Even so, small chance differences in baseline may createimportant outcome differences that may erroneously be takento be treatment effects, if the covariate (or baseline) is highlypredictive of the outcome.

I Use baseline measurement as a covariate (Ancova)to adjust for chance differences.

I Adjust also for other important covariates.The scientific question answered depends upon the model80 / 84

Page 21: Correlated data I NFA, May 22, 2014

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Approaches for handling baseline

in randomized studiesI Use follow-up data only (exclude baseline from analysis)

- most reasonable if correlation between repeatedmeasurements is very low

I Subtract baseline from successive measurements- most reasonable if correlation between repeatedmeasurements is very high

I Use baseline measurement as a covariate (Ancova)- may be used for any degree of correlationand gives the most sensible interpretation

81 / 84

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Excluding baseline (4 follow-up visits only)

without baseline as covariate:

proc mixed covtest noclprint data=calcium; where visit>1;class grp girl;model bmd=grp age13 grp*age13 / ddfm=satterth s;random intercept age13 / type=un subject=girl(grp) g;run;

Solution for Fixed Effects

StandardEffect grp Estimate Error DF t Value Pr > |t|

Intercept 0.9574 0.009721 102 98.49 <.0001grp C 0.02474 0.01383 102 1.79 0.0765grp P 0 . . . .age13 0.04634 0.002288 92.3 20.25 <.0001age13*grp C 0.007456 0.003277 92.5 2.28 0.0252age13*grp P 0 . . . .

Estimated gain at the age 13: 0.0247 (0.0138) g per cm3

82 / 84

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Including baseline as covariate

proc mixed covtest noclprint data=calcium; where visit>1;class grp girl;model bmd=baseline grp age13 grp*age13 / ddfm=satterth s;random intercept age13 / type=un subject=girl(grp) g;run;

Solution for Fixed Effects

StandardEffect grp Estimate Error DF t Value Pr > |t|

Intercept 0.01825 0.02690 106 0.68 0.4989baseline 1.0797 0.03054 102 35.36 <.0001grp C 0.01728 0.006236 101 2.77 0.0067grp P 0 . . . .age13 0.04597 0.002287 93.1 20.11 <.0001age13*grp C 0.007419 0.003276 93.2 2.26 0.0258age13*grp P 0 . . . .

Estimated gain at the age 13: 0.0173 (0.0062) g per cm3

83 / 84

u n i v e r s i t y o f c o p e n h a g e n d e p a r t m e n t o f b i o s t a t i s t i c s

Effect of including baseline as covariate

I explains some (but not all) of the difference between groupsat age 13without baseline: 0.0247 (0.0138)baseline as covariate: 0.0173 (0.0062)

I increases the precision of the estimated difference(standard error becomes smaller)It even becomes significant!

84 / 84