37
Session 9: Introduction to Sieve Analysis of Pathogen Sequences, for Assessing How VE Depends on Pathogen Genomics– Part I Peter B Gilbert Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center and Department of Biostatistics, University of Washington June 22, 2016 PBG (VIDD FHCRC) Sieve Analysis Methods June 22, 2016 1 / 37

Session 9: Introduction to Sieve Analysis of Pathogen Sequences, … · 2020-01-03 · Session 9: Introduction to Sieve Analysis of Pathogen Sequences, for Assessing How VE Depends

  • Upload
    others

  • View
    6

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Session 9: Introduction to Sieve Analysis of Pathogen Sequences, … · 2020-01-03 · Session 9: Introduction to Sieve Analysis of Pathogen Sequences, for Assessing How VE Depends

Session 9: Introduction to Sieve Analysis of PathogenSequences, for Assessing How VE Depends on Pathogen

Genomics– Part I

Peter B Gilbert

Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center and Department ofBiostatistics, University of Washington

June 22, 2016

PBG (VIDD FHCRC) Sieve Analysis Methods June 22, 2016 1 / 37

Page 2: Session 9: Introduction to Sieve Analysis of Pathogen Sequences, … · 2020-01-03 · Session 9: Introduction to Sieve Analysis of Pathogen Sequences, for Assessing How VE Depends

Outline of Module 8: Evaluating Vaccine Efficacy

07/14‐16/2014  • 1

Outline of Module 8: Evaluating Vaccine Efficacy

Session 1 (Halloran) Introduction to Study Designs for Evaluating VE

Session 2 (Follmann) Introduction to Vaccinology Assays and Immune Response

Session 3 (Gilbert) Introduction to Frameworks for Assessing Surrogate Endpoints/Immunological Correlates of VE

Session 4 (Follmann) Additional Study Designs for Evaluating VE

Session 5 (Gilbert) Methods for Assessing Immunological Correlates of Risk and Optimal Surrogate Endpoints

Session 6 (Gilbert) Effect Modifier Methods for Assessing Immunological Correlates of VE (Part I)

Session 7 (Gabriel) Effect Modifier Methods for Assessing Immunological Correlates of VE (Part II)

Session 8 (Sachs) Tutorial for the R Package pseval for Effect Modifier Methods for Assessing Immunological Correlates of VE

Session 9 (Gilbert) Introduction to Sieve Analysis of Pathogen Sequences, for Assessing How VE Depends on Pathogen Genomics

Session 10 (Follmann) Methods for VE and Sieve Analysis Accounting for Multiple Founders

PBG (VIDD FHCRC) Sieve Analysis Methods June 22, 2016 2 / 37

Page 3: Session 9: Introduction to Sieve Analysis of Pathogen Sequences, … · 2020-01-03 · Session 9: Introduction to Sieve Analysis of Pathogen Sequences, for Assessing How VE Depends

1

5.2.2016 • 0

Sieve Analysis

Figure 1 from Gilbert, Self, Ashby (1998, Biometrics)

Natural Barrier toHIV Infection

Placebo Group Vaccine Group

Vaccine BarrierTo HIV Infection

0 1 2 3 …0 1 2 3 …

5

4

3

2

1

5

4

3

2

1

# Is

ola

tes

# Is

ola

tes

Distribution ofInfecting Strain Distribution of

Infecting Strain

Circulating HIV StrainsIn the setting of the vaccine trial

0, 1, 2, 3, 4 …

PBG (VIDD FHCRC) Sieve Analysis Methods June 22, 2016 3 / 37

Page 4: Session 9: Introduction to Sieve Analysis of Pathogen Sequences, … · 2020-01-03 · Session 9: Introduction to Sieve Analysis of Pathogen Sequences, for Assessing How VE Depends

Outline of Session 9

1 Sieve Analysis Via Cumulative and Instantaneous VE Parameters

2 Cumulative VE Approach: NPMLE and TMLE

3 Mark-Specific Proportional Hazards Model

4 Example 1: RV144 HIV-1 Vaccine Efficacy Trial

5 Example 2: RTS,S Malaria Vaccine Efficacy Trial

PBG (VIDD FHCRC) Sieve Analysis Methods June 22, 2016 4 / 37

Page 5: Session 9: Introduction to Sieve Analysis of Pathogen Sequences, … · 2020-01-03 · Session 9: Introduction to Sieve Analysis of Pathogen Sequences, for Assessing How VE Depends

Cumulative Genotype-Specific VE

• T = time from study entry (or post immunization series) until studyendpoint through to time τ1 (e.g., HIV-1 infection)

• t = fixed time point of interest t < τ1

• Discrete genotype-specific cumulative VE

VE cml/disc(t, j) =

[1− P(T ≤ t, J = j |Vaccine)

P(T ≤ t, J = j |Placebo)

]× 100%, t ∈ [0, τ1]

• Continuous genetic distance-specific cumulative VE

VE cml/cont(t, v) =

[1− P(T ≤ t,V = v |Vaccine)

P(T ≤ t,V = v |Placebo)

]× 100%, t ∈ [0, τ1]

• J = discrete genotype subgroup such as binary, unordered categorical,ordered categorical

• V = (approximately) continuous genetic distance to a vaccine sequence

PBG (VIDD FHCRC) Sieve Analysis Methods June 22, 2016 5 / 37

Page 6: Session 9: Introduction to Sieve Analysis of Pathogen Sequences, … · 2020-01-03 · Session 9: Introduction to Sieve Analysis of Pathogen Sequences, for Assessing How VE Depends

Cumulative VE Sieve Effect Tests

Fix t at the primary time point of interest

• VE cml/disc(t, j):

H0 : VE cml/disc(t, j) constant in j

Hmon1 : VE cml/disc(t, j) decreases in j

Hany1 : VE cml/disc(t, j) has some differences in j

• VE cml/cont(t, v):

H0 : VE cml/cont(t, v) constant in v

Hmon1 : VE cml/cont(t, v) decreases in v

Hany1 : VE cml/cont(t, v) has some differences in v

A “sieve effect” is defined by Hmon1 or Hany

1 being true (i.e., differential VE bypathogen genotype)

PBG (VIDD FHCRC) Sieve Analysis Methods June 22, 2016 6 / 37

Page 7: Session 9: Introduction to Sieve Analysis of Pathogen Sequences, … · 2020-01-03 · Session 9: Introduction to Sieve Analysis of Pathogen Sequences, for Assessing How VE Depends

Illustration: Cumulative VE cml/disc(t = 14, j) for 3-Level J∗

Unadjusted Unadjusted UnadjustedAdjusted Adjusted Adjusted

Full Match Near Distant

−100%

−75%

−50%

−25%

0%

25%

50%

75%

100%

Gen

otyp

e−S

peci

fic C

umul

ativ

e V

EDiscrete Genotype−Specific Cumulative VE at t = 14 Months

No. Cases (V:P): 11:25 No. Cases (V:P): 13:23 No. Cases (V:P): 19:18

0.78

0.56

0.10p=0.033

0.76

0.58

0.14p=0.029

0.71

0.43

−0.13p=0.10

0.68

0.41

−0.12p=0.10

0.44

−0.06

−1.01p=0.87

0.42

−0.04

−0.89p=0.75

p=0.027

p=0.021

∗Aalen-Johansen (1978, Scand J Stat) nonparametric MLE (Aalen, 1978, Ann Stat;

Johansen, 1978, SJS); test for differential VE by Neafsey, Juraska et al. (2015, NEJM)PBG (VIDD FHCRC) Sieve Analysis Methods June 22, 2016 7 / 37

Page 8: Session 9: Introduction to Sieve Analysis of Pathogen Sequences, … · 2020-01-03 · Session 9: Introduction to Sieve Analysis of Pathogen Sequences, for Assessing How VE Depends

Illustration: Cumulative VE cml/cont(t = 14, v) forContinuous Distance V ∗

0.1 0.2 0.3 0.4 0.5

Genetic Distance to Vaccine Insert Sequence

−100%

−75%

−50%

−25%

0%

25%

50%

75%

100%

Gen

etic

Dis

tanc

e−S

peci

fic C

umul

ativ

e V

EContinuous Genetic Distance−Specific Cumulative VE at t = 14 Months

●●

●●●

●● ●●

●●● ●●

●● ●●

●●

●●● ● ●●

● ● ●●

●●

● ●●●

●●

●●Vaccine

● ●● ●

● ●●●

●● ●

●●●

●●

●●

●●● ●

●●●

● ●●●

● ●●

● ●●●

●● ● ●

● ●● ●

●●● ●●

●● ●● ●●

● ●●● ● ●

●Placebo

H00: p = 0.015H0: p = 0.10

No. Cases (V:P): 44:66

95% pointwise CI

∗Aalen-Johansen (1978, Scand J Stat) nonparametric MLE (Aalen, 1978, Ann Stat;

Johansen, 1978, SJS); test for differential VE by Neafsey, Juraska et al. (2015, NEJM)

PBG (VIDD FHCRC) Sieve Analysis Methods June 22, 2016 8 / 37

Page 9: Session 9: Introduction to Sieve Analysis of Pathogen Sequences, … · 2020-01-03 · Session 9: Introduction to Sieve Analysis of Pathogen Sequences, for Assessing How VE Depends

Estimation of Cumulative VE Parameters: ApproachWithout Covariates

• Nonparametric maximum likelihood estimation and testing

Assumptions Required for Consistent Inference• No interference: Whether a subject experiences the malaria endpoint does

not depend on the treatment assignments of other subjects

• A randomized trial

• Random dropout: Whether a subject drops out by time t does not dependon observed or unobserved subject characteristics

• MCAR genotypes: Endpoint cases with missing pathogen genomes havemissingness mechanism Missing Completely at Random (MCAR)

PBG (VIDD FHCRC) Sieve Analysis Methods June 22, 2016 9 / 37

Page 10: Session 9: Introduction to Sieve Analysis of Pathogen Sequences, … · 2020-01-03 · Session 9: Introduction to Sieve Analysis of Pathogen Sequences, for Assessing How VE Depends

Estimation of Cumulative VE Parameters: With Covariates

• Targeted minimum loss-based estimation (tMLE) and testing

Assumptions Required for Consistent Inference

• No interference

• A randomized trial

• Correct modeling of dropout

• Missing at Random genotypes

Advantages of approach with covariates

• Correct for bias due to covariate-dependent dropout

• Increase precision via covariates predicting the endpoint and/or dropout

• Correct for bias from covariate-dependent missing genotypes (e.g., pathogenload-dependent)

• Increase precision by predicting missing genotypes (the best predictors would bebased on pathogen sequences of later-sampled pathogens)

PBG (VIDD FHCRC) Sieve Analysis Methods June 22, 2016 10 / 37

Page 11: Session 9: Introduction to Sieve Analysis of Pathogen Sequences, … · 2020-01-03 · Session 9: Introduction to Sieve Analysis of Pathogen Sequences, for Assessing How VE Depends

Instantaneous Genotype-Specific VE Parameters

• h(t, j) = Hazard of the malaria endpoint with discrete genotype j

• λ(t, v) = Hazard of the malaria endpoint with continuous genetic distance v

• Discrete genotype-specific instantaneous vaccine efficacy

VEhaz/disc(t, j) =

[1− h(t, j |Vaccine)

h(t, j |Placebo)

]× 100%

• Continuous genetic distance-specific instantaneous vaccine efficacy

VEhaz/cont(t, v) =

[1− λ(t, v |Vaccine)

λ(t, v |Placebo)

]× 100%

• Proportional hazards assumption: VE haz/disc(t, j) = VE haz/disc(j) andVE haz/cont(t, v) = VE haz/cont(v) for all t ∈ [0, τ1]

PBG (VIDD FHCRC) Sieve Analysis Methods June 22, 2016 11 / 37

Page 12: Session 9: Introduction to Sieve Analysis of Pathogen Sequences, … · 2020-01-03 · Session 9: Introduction to Sieve Analysis of Pathogen Sequences, for Assessing How VE Depends

Illustration: Instantaneous VE haz/disc(j) for 3-Level J∗

Unadjusted Unadjusted UnadjustedAdjusted Adjusted Adjusted

Full Match Near Distant

−100%

−75%

−50%

−25%

0%

25%

50%

75%

100%

Gen

otyp

e−S

peci

fic In

stan

tane

ous

VE

Discrete Genotype−Specific Instantaneous VE to 14 Months

No. Cases (V:P): 12:25 No. Cases (V:P): 13:23 No. Cases (V:P): 19:18

0.76

0.52

0.05p=0.036

0.73

0.54

0.12p=0.031

0.71

0.44

−0.11p=0.10

0.69

0.42

−0.10p=0.11

0.45

−0.05

−1.01p=0.87

0.41

0.04

−0.95p=0.79

p=0.03

p=0.023

∗Gilbert (2000, Stat Med): genotype-specific Cox model

PBG (VIDD FHCRC) Sieve Analysis Methods June 22, 2016 12 / 37

Page 13: Session 9: Introduction to Sieve Analysis of Pathogen Sequences, … · 2020-01-03 · Session 9: Introduction to Sieve Analysis of Pathogen Sequences, for Assessing How VE Depends

Illustration: Instantaneous VE haz/cont(v) for ContinuousDistance V ∗

0.1 0.2 0.3 0.4 0.5

Genetic Distance to Vaccine Insert Sequence

−100%

−75%

−50%

−25%

0%

25%

50%

75%

100%

Gen

etic

Dis

tanc

e−S

peci

fic In

stan

tane

ous

VE

Continuous Genetic Distance−Specific Instantaneous VE to 14 Months

●● ● ●●

●●

●● ●● ● ●●

●●

●●

● ● ●●

● ●●● ●●●

●●

●●●

●● ● ●●

●●

● ●●Vaccine

●● ●● ●● ●●●●

● ●● ●●●

●●

●● ●● ● ●

●●

●●● ●

●● ●

●● ● ●

●● ●

●●● ●● ●● ●● ●

● ●●

●● ●● ● ●●●

●Placebo

H00: p = 0.015H0: p = 0.10

No. Cases (V:P): 44:66

95% pointwise CI

∗Juraska and Gilbert (2013, Biometrics): overall endpoint Cox model + semiparametric

biased sampling model

PBG (VIDD FHCRC) Sieve Analysis Methods June 22, 2016 13 / 37

Page 14: Session 9: Introduction to Sieve Analysis of Pathogen Sequences, … · 2020-01-03 · Session 9: Introduction to Sieve Analysis of Pathogen Sequences, for Assessing How VE Depends

Discussion of Instantaneous vs. Cumulative VE Approaches

• Disadvantages:• The instantaneous approach requires the extra assumption of proportional

hazards (typically fails because of waning VE)• The VE parameters are hard to interpret under violation of proportional

hazards• With currently available methods, cannot adjust for covariates without

changing the target parameter to one that is not of main interest• Must rely on a random dropout assumption (cannot allow dropout to depend

on covariates)• Cannot increase statistical power and precision by leveraging covariates, nor

flexibly correct for accidental confounding

• Advantages:• If proportional hazards holds, the VE parameter is interpretable in terms of

leaky genotype-specific vaccine efficacy• If proportional hazards approximately holds, may be reasonably interpretable

and have increased efficiency by aggregating the vaccine efficacy over all timepoints

PBG (VIDD FHCRC) Sieve Analysis Methods June 22, 2016 14 / 37

Page 15: Session 9: Introduction to Sieve Analysis of Pathogen Sequences, … · 2020-01-03 · Session 9: Introduction to Sieve Analysis of Pathogen Sequences, for Assessing How VE Depends

Outline of Session 9

1 Sieve Analysis Via Cumulative and Instantaneous VE Parameters

2 Cumulative VE Approach: NPMLE and TMLE

3 Mark-Specific Proportional Hazards Model

4 Example 1: RV144 HIV-1 Vaccine Efficacy Trial

5 Example 2: RTS,S Malaria Vaccine Efficacy Trial

PBG (VIDD FHCRC) Sieve Analysis Methods June 22, 2016 15 / 37

Page 16: Session 9: Introduction to Sieve Analysis of Pathogen Sequences, … · 2020-01-03 · Session 9: Introduction to Sieve Analysis of Pathogen Sequences, for Assessing How VE Depends

Ongoing Sieve Analysis Statistical Methods Research

• Replace augmented IPW with TMLE (Benkeser, Carone, and Gilbert, 2016)• Unbiased under weaker assumptions; more efficient

• The missing data methods assume a validation set– a subgroup of caseswhere the founding pathogen genotype(s) is known with certainty

• For pathogens that evolve very quickly post-infection (e.g., HIV-1), there maybe no validation set!

• Replace with measurement error methods, incorporating models predicting(imperfectly) founder HIV genotypes

• Targeted learning approaches with data adaptive genotype-specific VEtarget parameters that combine inference with model selection on themarks/genotypes

PBG (VIDD FHCRC) Sieve Analysis Methods June 22, 2016 16 / 37

Page 17: Session 9: Introduction to Sieve Analysis of Pathogen Sequences, … · 2020-01-03 · Session 9: Introduction to Sieve Analysis of Pathogen Sequences, for Assessing How VE Depends

Cumulative Genotype-Specific VE : Aalen-JohansenNPMLE

Discrete genotype-specific cumulative VE

VE cml/disc(t, j) =

[1− P(T ≤ t, J = j |Vaccine)

P(T ≤ t, J = j |Placebo)

]× 100%, t ∈ [0, τ1]

• Observe T ≡ min(T ,C ) and ∆J ≡ I (T = T )J

• With independent censoring, identify P(T ≤ t, J = j |Z = z) via hazards:

Qzj (t) ≡ P(T = t,∆J = j |Z = z , T > t − 1)

Qz· (t) ≡

K∑

i=1

Qzi (t)

P(T ≤ t, J = j |Z = z) =t∑

t′=1

Qz

j (t ′)t′−1∏

s=1

{1− Qz· (s)}

PBG (VIDD FHCRC) Sieve Analysis Methods June 22, 2016 17 / 37

Page 18: Session 9: Introduction to Sieve Analysis of Pathogen Sequences, … · 2020-01-03 · Session 9: Introduction to Sieve Analysis of Pathogen Sequences, for Assessing How VE Depends

Cumulative Genotype-Specific VE : Aalen-JohansenNPMLE

• Aalen-Johansen estimator plugs in empirical estimates

Qzj,n(t) =

No. type j events at t in group z

No. at risk at t-1 in group z

P(T ≤ t, J = j |Z = z) =t∑

t′=1

Qz

j,n(t ′)t′−1∏

s=1

{1− Qz·,n(s)}

Limitations• For consistency need random censoring (cannot depend on covariates)

• Efficient if no prognostic factors

PBG (VIDD FHCRC) Sieve Analysis Methods June 22, 2016 18 / 37

Page 19: Session 9: Introduction to Sieve Analysis of Pathogen Sequences, … · 2020-01-03 · Session 9: Introduction to Sieve Analysis of Pathogen Sequences, for Assessing How VE Depends

Incorporating Covariates: TMLE

P(T ≤ t, J = j |Z = z) = EW [P(T ≤ t, J = j |Z = z ,W )]

=∑

w

P(T ≤ t, J = j |Z = z ,W = w)P(W = w |Z = z)

• TMLE optimizes bias-variance trade-off for estimating P(T ≤ t, J = j |Z = z)

• Incorporates flexible models of P(T ≤ t, J = j |Z = z ,W ) and ofP(C ≤ t|Z = z ,W )

• TMLEs are doubly robust and asymptotically normal• Also asymptotically efficient if both P(T ≤ t, J = j |Z = z ,W ) and

P(C ≤ t|Z = z ,W ) are estimated consistently

• Benkeser, Carone and Gilbert (2016) developed this TMLE, with R code

PBG (VIDD FHCRC) Sieve Analysis Methods June 22, 2016 19 / 37

Page 20: Session 9: Introduction to Sieve Analysis of Pathogen Sequences, … · 2020-01-03 · Session 9: Introduction to Sieve Analysis of Pathogen Sequences, for Assessing How VE Depends

Mean Squared Error TMLE vs. Aalen-JohansenMean Squared ErrorTMLE vs. Aalen-Johansen

0.7

0.9

1.1

0

Low

0

High

Hig

h

0.7

0.9

1.1

None Med. High None Med High

Non

e

Covariate predictiveness of events

Level of censoring

Cov

aria

te p

redi

ctiv

enes

s of

cen

sorin

g

Rel

ativ

e m

ean

squa

red

erro

r

19 / 28

PBG (VIDD FHCRC) Sieve Analysis Methods June 22, 2016 20 / 37

Page 21: Session 9: Introduction to Sieve Analysis of Pathogen Sequences, … · 2020-01-03 · Session 9: Introduction to Sieve Analysis of Pathogen Sequences, for Assessing How VE Depends

Power of Wald Tests TMLE vs. Aalen-JohansenPower of Wald TestsTMLE vs. Aalen-Johansen

Moderately prognostic covariates

VE(t, j)

Pow

er, %

0 0.25 0.5 0.75

2.5

2550

7510

0TMLEAalen−Johansen

Power relative to Aalen−Johansen

TMLE 1.00 1.02 1.03 1.01 1.00 1.00

20 / 28

PBG (VIDD FHCRC) Sieve Analysis Methods June 22, 2016 21 / 37

Page 22: Session 9: Introduction to Sieve Analysis of Pathogen Sequences, … · 2020-01-03 · Session 9: Introduction to Sieve Analysis of Pathogen Sequences, for Assessing How VE Depends

Power of Wald Tests TMLE vs. Aalen-JohansenPower of Wald TestsTMLE vs. Aalen-Johansen

Strongly prognostic covariates

VE(t, j)

Pow

er, %

0 0.25 0.5 0.75

2.5

2550

7510

0TMLEAalen−Johansen

Power relative to Aalen−Johansen

TMLE 1.06 1.07 1.08 1.04 1.01 1.00

21 / 28PBG (VIDD FHCRC) Sieve Analysis Methods June 22, 2016 22 / 37

Page 23: Session 9: Introduction to Sieve Analysis of Pathogen Sequences, … · 2020-01-03 · Session 9: Introduction to Sieve Analysis of Pathogen Sequences, for Assessing How VE Depends

Sieve Analysis of RV144 Thai Trial

Background on Thai Trial

• Conducted 2004–2009 in the general population of Thailand

• 16,403 randomized 1:1 vaccine:placebo, primary endpoint HIV-1 infection by3.5 years

• VE = 31%, 95% CI 1% to 51%, p = 0.04 (Rerks-Ngarm et al., 2009, NEJM)

Thai Trial RV144

• 16,402 participants enrolled, results reported in 20091

• VE = 31% (1%− 52%), insufficient for licensure

1 Rerks-Ngarm et al (2009). NEJM 361(23)3 / 28

PBG (VIDD FHCRC) Sieve Analysis Methods June 22, 2016 23 / 37

Page 24: Session 9: Introduction to Sieve Analysis of Pathogen Sequences, … · 2020-01-03 · Session 9: Introduction to Sieve Analysis of Pathogen Sequences, for Assessing How VE Depends

Sieve Analysis of RV144 Thai Trial

• Cox model (Lunn and McNeil, 1995, Biometrics) and Aalen-Johansen (1978)sieve analysis yielded the inference

VE cml/disc(3.5, v = 0) > VE cml/disc(3.5, v = 1)

with V defined by match (v = 0) vs. mismatch (v = 1) of the infectingHIV-1 with the vaccine sequences at position 169 of HIV-1 Env V2

• TMLE adjusting for rish behaviors, gender, age, gave a similar result withincreased precision (Benkeser, Carone, Gilbert, 2016); next slide

PBG (VIDD FHCRC) Sieve Analysis Methods June 22, 2016 24 / 37

Page 25: Session 9: Introduction to Sieve Analysis of Pathogen Sequences, … · 2020-01-03 · Session 9: Introduction to Sieve Analysis of Pathogen Sequences, for Assessing How VE Depends

TMLE Cumulative VE Sieve Results: RV144 Thai TrialThai Trial RV144 – TMLE Results

0.00

00.

004

0.00

8AA position 169 matched

Years since entry

Cum

ulat

ive

inci

denc

e

0 1 2 3

VaccinePlacebo

0.00

00.

004

0.00

8

AA position 169 mismatched

Years since entry

0 1 2 3

−0.

50.

00.

51.

0

Years since entry

Vac

cine

Effi

cacy

0 1 2 3

VEmatch(3.5) = 46% (14%,66%), p=0.01

−0.

50.

00.

51.

0

Years since entry

0 1 2 3

VEmismatch(3.5)= −39% (−229%,42%), p=0.46

23 / 28

PBG (VIDD FHCRC) Sieve Analysis Methods June 22, 2016 25 / 37

Page 26: Session 9: Introduction to Sieve Analysis of Pathogen Sequences, … · 2020-01-03 · Session 9: Introduction to Sieve Analysis of Pathogen Sequences, for Assessing How VE Depends

Outline of Session 9

1 Sieve Analysis Via Cumulative and Instantaneous VE Parameters

2 Cumulative VE Approach: NPMLE and TMLE

3 Mark-Specific Proportional Hazards Model

4 Example 1: RV144 HIV-1 Vaccine Efficacy Trial

5 Example 2: RTS,S Malaria Vaccine Efficacy Trial

PBG (VIDD FHCRC) Sieve Analysis Methods June 22, 2016 26 / 37

Page 27: Session 9: Introduction to Sieve Analysis of Pathogen Sequences, … · 2020-01-03 · Session 9: Introduction to Sieve Analysis of Pathogen Sequences, for Assessing How VE Depends

Mark-Specific Proportional Hazards Approach with MissingPathogen Sequences

• Sun and Gilbert (2012, Scand J Stat)

• Gilbert and Sun (2015, JRSS-B)

• These methods pose a continuous mark-specific proportional hazards modeland use inverse probability weighting (IPW) or augmented IPW

PBG (VIDD FHCRC) Sieve Analysis Methods June 22, 2016 27 / 37

Page 28: Session 9: Introduction to Sieve Analysis of Pathogen Sequences, … · 2020-01-03 · Session 9: Introduction to Sieve Analysis of Pathogen Sequences, for Assessing How VE Depends

Competing Risks Model in Vaccine Efficacy Trials

• Conditional mark-specific hazard rate function:

λ(t, v |z)= limh1,h2→0

P{T ∈ [t, t + h1),V ∈ [v , v + h2)|T ≥ t,Z = z}h1h2

• Covariate-adjusted mark-specific vaccine VE:

VE(t, v |z) = 1− λv (t, v |z)

λp(t, v |z),

where λv (t, v |z) and λp(t, v |z) are the conditional mark-specific hazard functionsfor the vaccine and placebo groups, respectively

PBG (VIDD FHCRC) Sieve Analysis Methods June 22, 2016 28 / 37

Page 29: Session 9: Introduction to Sieve Analysis of Pathogen Sequences, … · 2020-01-03 · Session 9: Introduction to Sieve Analysis of Pathogen Sequences, for Assessing How VE Depends

Mark-Specific Proportional Hazards Models

• Stratified mark-specific proportional hazards model:

λk(t, v |zki (t)) = λ0k(t, v)exp{β(v)T zki (t)

}, k = 1, . . . ,K

where λ0k(t, v) is an unspecified baseline function and β(v) is p-dimensionalregression coefficient functions

• z = (z1, z2); z1 = vaccine group indicator; z2 other covariates; β1(v) =coefficient corresponding to z1

Mark-specific vaccine efficacy:

VE (v) = 1− exp(β1(v))

PBG (VIDD FHCRC) Sieve Analysis Methods June 22, 2016 29 / 37

Page 30: Session 9: Introduction to Sieve Analysis of Pathogen Sequences, … · 2020-01-03 · Session 9: Introduction to Sieve Analysis of Pathogen Sequences, for Assessing How VE Depends

Completely Observed Competing Risks Data

Completely observed competing risks data:

(Zki ,Xki , δki , δkiVki ), i = 1, · · · , nk , k = 1, . . . ,K ,

where Xki = min{Tki ,Cki}, δki = I (Tki ≤ Cki )

When the failure time Tki is observed, δki = 1 and the mark Vki is also observed,whereas if Tki is censored, the mark Vki is unknown

Assume Cki is independent of Tki and Vki conditional on Zki

PBG (VIDD FHCRC) Sieve Analysis Methods June 22, 2016 30 / 37

Page 31: Session 9: Introduction to Sieve Analysis of Pathogen Sequences, … · 2020-01-03 · Session 9: Introduction to Sieve Analysis of Pathogen Sequences, for Assessing How VE Depends

Missing Marks in HIV Vaccine Efficacy Trials

Observed data

Oki = {Xki ,Zki , δki ,Rki ,RkiδkiVki , δkiAki}, i = 1 . . . , nk , k = 1, . . . ,K ,

Rki = complete-case indicator; Rki = 1 if Vki is known or if Tki is censored andRki = 0 otherwise

• Auxiliary variables Aki can be used to predict whether the mark is missingand to predict the missing marks

• E.g., Aki = sequence information from a later sampled virus

• Model the relationship between Aki and Vki to predict Vki

PBG (VIDD FHCRC) Sieve Analysis Methods June 22, 2016 31 / 37

Page 32: Session 9: Introduction to Sieve Analysis of Pathogen Sequences, … · 2020-01-03 · Session 9: Introduction to Sieve Analysis of Pathogen Sequences, for Assessing How VE Depends

Inverse Probability Weighted Complete-Case Estimator

• rk(Wki , ψk) = parametric model for the probability of complete-case, whereψk is a q-dimensional parameter

• The IPW estimator βipw (v) solves the estimating equation for β:

Uipw (v , β, ψ) =K∑

k=1

nk∑

i=1

∫ 1

0

∫ τ

0

Kh(u − v)(Zki (t)− Zk(t, β, ψk)

)

Rki

πk(Qki , ψk)Nki (dt, du),

where

Zk(t, β, ψk) = S(1)k (t, β, ψk)/S

(0)k (t, β, ψk),

S(j)k (t, β, ψk) = n−1

k

nk∑

i=1

Rki (πk(Qki , ψk))−1Yki (t) exp{βTZki (t)}Zki (t)⊗j

PBG (VIDD FHCRC) Sieve Analysis Methods June 22, 2016 32 / 37

Page 33: Session 9: Introduction to Sieve Analysis of Pathogen Sequences, … · 2020-01-03 · Session 9: Introduction to Sieve Analysis of Pathogen Sequences, for Assessing How VE Depends

Augmented IPW Complete-Case Estimator

• Wki = (Tki ,Zki ,Aki ) and w = (t, z , a)More efficient estimation can be achieved by incorporating the knowledge ofthe conditional mark distribution:

ρk(w , v) = P(Vki ≤ v |δki = 1,Wki = w)

=

∫ v

0λk(t, u|z)gk(a|t, u, z) du

∫ 1

0λk(t, u|z)gk(a|t, u, z) du

,

where gk(a|t, v , z) = P(Aki = a|Tki = t,Vki = v ,Zki = z , δki = 1)

• Let gk(a|t, u, z) be a parametric / semiparametric estimator of gk(a|t, u, z);then ρk(w , v) can be estimated by

ρipwk (w , v) =

∫ v

0λipwk (t, u|z)gk(a|t, u, z) du

∫ 1

0λipwk (t, u|z)gk(a|t, u, z) du

PBG (VIDD FHCRC) Sieve Analysis Methods June 22, 2016 33 / 37

Page 34: Session 9: Introduction to Sieve Analysis of Pathogen Sequences, … · 2020-01-03 · Session 9: Introduction to Sieve Analysis of Pathogen Sequences, for Assessing How VE Depends

Analysis of the RV144 Thai Trial

• Assessed how VE against subtype CRF01 AE HIV-1 infection depends on aweighted Hamming distance (Nickle et al., 2007, PLoS One) of breakthroughHIV-1 sequences to the A244 reference sequence contained in the vaccine

• Include published gp120 AA sites in contact with broadly neutralizingmonoclonal antibodies

• T = time to HIV-1 infection diagnosis with subtype CRF01 HIV-1• Infection with subtype B or unknown subtype treated as right-censoring

• 106 HIV-1 subtype CRF01 AE infected participants (42 vaccine, 64 placebo);94 (37 vaccine, 57 placebo) with an observed mark

• Between 2 and 13 HIV-1 sequences (total 1030 sequences) per infectedparticipant

• V = participant-specific median distance

PBG (VIDD FHCRC) Sieve Analysis Methods June 22, 2016 34 / 37

Page 35: Session 9: Introduction to Sieve Analysis of Pathogen Sequences, … · 2020-01-03 · Session 9: Introduction to Sieve Analysis of Pathogen Sequences, for Assessing How VE Depends

HIV-1 Sequence Distances to the Vaccine Sequence A244

Placebo Recipients Vaccine Recipients

0.0

0.2

0.4

0.6

0.8

1.0

Distances of HIV Envelope gp120 sequences to the A244 reference sequence (V)

HIV

seq

uenc

e di

stan

ces

Figure: Boxplots of the marks for the 94 HIV infected subjects in the Thai trial with an observed mark. Themark V is the subject-specific median of weighted Hamming distances between each of the subject’s HIV Envelopegp120 amino acid sequences and the CM244 reference sequence contained in the HIV vaccine regimen.

Y. Sun and P. Gilbert Testing Mark-Specific Vaccine Efficacy with Missing Marks

Figure: Boxplots of the marks/distances V for the 94 HIV-1 CRF01 AE infected subjectsin the Thai trial with an observed mark

PBG (VIDD FHCRC) Sieve Analysis Methods June 22, 2016 35 / 37

Page 36: Session 9: Introduction to Sieve Analysis of Pathogen Sequences, … · 2020-01-03 · Session 9: Introduction to Sieve Analysis of Pathogen Sequences, for Assessing How VE Depends

Vaccine Efficacy by gp120 HIV-1 Sequence Distance

Estimation without using auxiliary variable

Estima

te of VE

(v)

−0.75

−0.5

−0.25

0

0.25

0.5

0.75

1

0 0.2 0.4 0.6 0.8 1v

Figure: AIPW estimation of VE(v) and 95% pointwise confidence bands without using auxiliary variables forthe Thai trial with bandwidths h1 = 0.5, h2 = h = 0.3.

Y. Sun and P. Gilbert Testing Mark-Specific Vaccine Efficacy with Missing Marks

Figure: IPW point and 95% interval estimates of VE(v) for the Thai trial withbandwidths h1 = 0.5, h2 = h = 0.3

PBG (VIDD FHCRC) Sieve Analysis Methods June 22, 2016 36 / 37

Page 37: Session 9: Introduction to Sieve Analysis of Pathogen Sequences, … · 2020-01-03 · Session 9: Introduction to Sieve Analysis of Pathogen Sequences, for Assessing How VE Depends

Selected Literature on Sieve Analysis Methods

1 Proportional hazards VE for a discrete genotype (Gilbert, 2000, 2001, Stat Med,Cox model)

2 Extension of 1. accounting for missing data on genotypes (Hyun, Lee, and Sun,2012, J Stat Plan Inference, AIPW)

3 Cumulative incidence VE for a discrete genotype (Gilbert, 2000, 2001, Stat Med,Aalen-Johansen NPMLE)

4 Extension of 3. for covariate-adjustment and modeling dropout (Benkeser, Carone,Gilbert, 2016, submitted, tMLE)

5 Cumulative incidence VE for a continuous mark genotype (Gilbert, Sun, andMcKeague, 2008, Biostatistics)

6 Proportional hazards VE for a continuous mark genotype (Sun, Gilbert, andMcKeague, 2009, Ann Stat; local partial likelihood and kernel smoothing)

7 Extension of 6. for multivariate continuous mark genotypes (Sun and Gilbert, 2013,Biostatistics, local partial likelihood and kernel smoothing; Juraska and Gilbert,2013, Biometrics, Cox model + semiparametric biased sampling model)

8 Extension of 6. allowing missing data on genotypes (Sun and Gilbert, 2012, ScandJ Stat, Gilbert and Sun, 2012, JRSS-B, add AIPW; Juraska and Gilbert, 2015,LIDA, add IPW)

PBG (VIDD FHCRC) Sieve Analysis Methods June 22, 2016 37 / 37