Searching for Efficacy Biomarker in Early Clinical Development – An Example of Using NGS in Early Oncology Development Feng Gao, Jacob Zhang, Godwin Yung


Feng Gao, Jacob Zhang, Godwin Yung and Ray Liu

19th May 2015


Introduction

• Advances in next-generation sequencing (NGS) technologies provide powerful tools for gaining a deeper understanding of the biological processes underlying disease, but they also pose new challenges for statistical analysis.

• In this study, we aimed to identify tumor somatic mutation signatures that are associated with clinical efficacy.

• In the study:
  – Sample: archived tumor samples covering 5 different solid tumor types. All patients received the same treatment (single arm).
  – NGS whole-exome sequencing (WES): variant calling from tumor-germline pairs on coding regions (coding exons) to identify cancer somatic mutations.
  – Clinical endpoints: tumor size change (%) and PFS.
  – Sample size and power: started with ~120 samples, with adequate power for testing 20 candidate genes.
  – Several tiers of NGS data: from WES, from a cancer-related gene list, and from a further-reduced list of top candidate cancer genes.
  – Types of statistical analyses: univariate analysis, multivariate analysis and pathway analysis.



Challenges

• Sample quality: archived, FFPE-treated samples had a high attrition rate in the NGS process. Samples from clinically non-evaluable subjects were also removed. As a result, the sample size with both NGS data and clinical data available was reduced to n=47.

• Challenges with small sample size:
  – NGS data cannot be analyzed at the variant level, since almost all variants are singletons. We therefore had to aggregate the data to the gene level, which requires assuming that all variants within a gene have effects in the same direction.
  – Many genes were non-variant or singletons and had to be removed, reducing the number of genes in the univariate analysis.
  – With a small sample size, the mutation patterns of several different genes can appear identical in the data, making the results difficult to interpret.
  – Data from different tumor types had to be pooled together, introducing heterogeneity.
  – Univariate analysis did not produce statistically significant findings from the top candidate cancer gene list after multiplicity adjustment. Top genes from the univariate analysis on the WES list were heavily influenced by a single patient. The multivariate regression approach generated gene pairs that lack biological interpretation. The recursive partitioning approach did not work well with such a small sample size.
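The gene-level aggregation described above can be sketched as follows. This is a minimal illustration, not the study's pipeline: it assumes a hypothetical input of (patient, gene, variant) tuples, and the function name `aggregate_to_gene_level` and the `min_mutated=2` filter (which drops non-variant and singleton genes) are illustrative choices.

```python
from collections import defaultdict


def aggregate_to_gene_level(variant_calls, patients, min_mutated=2):
    """Collapse variant-level calls into a binary patient x gene matrix.

    variant_calls: iterable of (patient_id, gene, variant_id) tuples
    (hypothetical input format). A gene is coded 1 for a patient who carries
    at least one somatic variant in it, which implicitly assumes that all
    variants within a gene act in the same direction.
    """
    mutated = defaultdict(set)  # gene -> patients with >= 1 variant in it
    for patient, gene, _variant in variant_calls:
        mutated[gene].add(patient)
    # drop singleton genes; non-variant genes never appear in the calls
    kept = sorted(g for g, pts in mutated.items() if len(pts) >= min_mutated)
    matrix = {p: {g: int(p in mutated[g]) for g in kept} for p in patients}
    return kept, matrix
```

Two variants of the same gene in one patient still give a single 1, so the matrix carries the binary (0/1) information content discussed later in the deck.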

• Preliminary analysis did show that the data contain biological information. The challenge now is how to identify these useful features.


Overview of data


How we dealt with the challenges

• Using a multivariate approach starting from biologically driven gene sets/pathways, we created 2434 gene-sets by grouping 6410 genes together if they belong to the same pathway.

• A panel of markers may be more powerful than a single marker:
  – Sparse distribution of somatic mutations
  – Low information content of a single mutation (binary, 0/1)
  – Larger coverage of patients

• A panel of markers needs to be supported by biology:
  – Greater confidence
  – Better interpretability

• Known pathways or protein-protein interaction (PPI) networks have been tested as predictive marker panels:
  – A pathway = a bag of genes grouped by biology → a better chance of finding biologically meaningful markers


Pathway DB | # of pathways
MetaCore™ | 912
BIOCYC | 33
KEGG | 794
REACTOME | 1358
Wiki Pathways | 225
Pathway Interaction DB | 183


Construction of gene-sets

[Figure: a pathway database (ex: KEGG) maps each of 6410 genes (g1, …, g6410) to the pathways it belongs to (Genes → Pathways); regrouping by pathway (Pathways → Genes) gives 2434 gene-sets.]
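A minimal sketch of this construction, assuming the pathway database has already been flattened into a hypothetical gene → pathways mapping. The names `build_gene_sets` and `patient_coverage` are illustrative, as are the gene and pathway identifiers in the check. The coverage helper shows why a panel reaches more patients than any of its single genes.

```python
def build_gene_sets(gene_to_pathways):
    """Invert a gene -> pathways mapping into pathway -> set of genes."""
    gene_sets = {}
    for gene, pathways in gene_to_pathways.items():
        for pathway in pathways:
            gene_sets.setdefault(pathway, set()).add(gene)
    return gene_sets


def patient_coverage(gene_set, mutation_matrix):
    """Fraction of patients with at least one mutated gene in gene_set.

    mutation_matrix: patient -> {gene: 0/1}; missing genes count as 0.
    """
    n_hit = sum(
        1 for row in mutation_matrix.values()
        if any(row.get(g, 0) for g in gene_set)
    )
    return n_hit / len(mutation_matrix)
```

A gene can belong to several pathways, so the resulting gene-sets overlap; that overlap is the source of the correlation between gene-set tests discussed later.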


Statistical models used in the analysis

• Tumor size change (%) is used as the efficacy variable.

• To increase statistical power, and to accommodate different modes of gene effect within a pathway/gene-set, we assume the following 3 models:

Model | Assumption for the effect of genes within a pathway
Sequence kernel association test (SKAT) | Effect sizes are normally distributed with mean 0
Counting-based burden test (cBT) | Effect sizes are the same
Threshold-based burden test (tBT) | Single effect when there are more than T mutations


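More details on the statistical models

Model | Assumption | Test
SKAT | $y_i = \mathbf{x}_i'\boldsymbol{\alpha} + \sum_j g_{ij}\beta_j + \varepsilon_i,\quad \beta_j \sim N(0, \tau)$ | Score
cBT | $y_i = \mathbf{x}_i'\boldsymbol{\alpha} + \beta \sum_j g_{ij} + \varepsilon_i$ | Wald
tBT | $y_i = \mathbf{x}_i'\boldsymbol{\alpha} + \beta\, \mathbb{1}\{\sum_j g_{ij} > T\} + \varepsilon_i$ | Wald

where $y_i$ is the tumor size change (%) of patient $i$, $\mathbf{x}_i$ are the null-model covariates, $g_{ij}$ is the mutation status (0/1) of gene $j$ in the gene-set, and $\varepsilon_i \sim N(0, \sigma^2)$.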

• For all three models, we can test the null hypothesis of no pathway association by testing a single parameter: $H_0: \tau = 0$ (SKAT) or $H_0: \beta = 0$ (cBT, tBT).

• In practice, we may wish to reject H0 if at least one of the three tests is significant. Therefore, let us also consider the omnibus test (OT), whose p-value is the minimum of the p-values from SKAT, cBT and tBT:

  $p_{\mathrm{OT}} = \min(p_{\mathrm{SKAT}}, p_{\mathrm{cBT}}, p_{\mathrm{tBT}})$
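The three models and the OT can be sketched for a single gene-set as below. This is an illustrative sketch under stated assumptions, not the authors' code: SKAT is unweighted (Wu et al. weight variants by default) and its p-value uses a Satterthwaite scaled chi-square approximation in place of the Davies method, σ² is plugged in from the null fit, and the default threshold `T=0` ("any mutation") is a hypothetical choice. All function names are illustrative.

```python
import numpy as np
from scipy import stats


def _wald_pvalue(y, X, z):
    """Two-sided OLS Wald (t) test for the coefficient of z in y ~ X + z."""
    D = np.column_stack([X, z])
    coef, *_ = np.linalg.lstsq(D, y, rcond=None)
    resid = y - D @ coef
    n, p = D.shape
    sigma2 = resid @ resid / (n - p)
    se = np.sqrt(sigma2 * np.linalg.inv(D.T @ D)[-1, -1])
    return 2 * stats.t.sf(abs(coef[-1] / se), df=n - p)


def cbt_pvalue(y, X, G):
    """Counting-based burden test: one common effect per mutation in the set."""
    return _wald_pvalue(y, X, G.sum(axis=1))


def tbt_pvalue(y, X, G, T=0):
    """Threshold-based burden test: one effect if a patient has > T mutations."""
    return _wald_pvalue(y, X, (G.sum(axis=1) > T).astype(float))


def skat_pvalue(y, X, G):
    """Unweighted SKAT score test with a Satterthwaite p-value approximation."""
    n = len(y)
    P0 = np.eye(n) - X @ np.linalg.pinv(X)      # residual-maker of the null model
    r = P0 @ y                                   # null-model residuals
    sigma2 = r @ r / (n - np.linalg.matrix_rank(X))
    u = G.T @ r
    Q = u @ u / sigma2                           # score statistic r'GG'r / sigma^2
    lam = np.linalg.eigvalsh(G.T @ P0 @ G)       # H0: Q ~ sum_j lam_j * chi2_1
    lam = lam[lam > 1e-10]
    scale = (lam ** 2).sum() / lam.sum()         # match mean and variance
    dof = lam.sum() ** 2 / (lam ** 2).sum()      # of a scaled chi-square
    return stats.chi2.sf(Q / scale, dof)


def omnibus_pvalue(y, X, G, T=0):
    """OT: minimum of the three p-values (calibrated later by resampling)."""
    return min(skat_pvalue(y, X, G), cbt_pvalue(y, X, G), tbt_pvalue(y, X, G, T))
```

Note that the raw minimum is not itself uniform under H0; the resampling step described in the following slides is what turns it into a usable, multiplicity-adjusted p-value.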

More statistical challenges

• Multiplicity correction, the issues:
  – Multiple models: several models are used to test pathway association.
  – Correlation: the 2434 pathways are not independent; many share the same genes (more detail on the next slide).

• The solution for multiplicity correction in the presence of feature correlation:
  – Resampling-based multiple testing can adjust p-values for multiple testing by incorporating the correlation and other distributional characteristics of the tests.


Many gene-sets are correlated

• Many of the K=2434 gene-sets have overlapping genes. Some are even identical.

• For example, among 10 of the gene-sets, the following relation holds:

  [Figure: relation among gene-sets G1, …, G10]

• It is of interest to test multiple hypotheses efficiently. Two options:

1. Reduce the number of gene-sets for association testing

2. Test all K gene-sets and then efficiently correct for multiple testing
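Option 1 can start by collapsing gene-sets whose gene content is identical. The sketch below is one possible way to do this, with hypothetical helper names; it keeps the alphabetically first name as the representative and uses the Jaccard index as one illustrative measure of overlap between non-identical sets.

```python
def collapse_identical_sets(gene_sets):
    """Group gene-sets with identical genes; keep one representative each.

    Returns representative name -> (genes, [all names with these genes]).
    """
    groups = {}
    for name in sorted(gene_sets):
        key = frozenset(gene_sets[name])
        groups.setdefault(key, []).append(name)
    return {names[0]: (set(key), names) for key, names in groups.items()}


def jaccard(a, b):
    """Overlap |a & b| / |a | b| between two gene-sets (1.0 if both empty)."""
    return len(a & b) / len(a | b) if (a or b) else 1.0
```

Collapsing only removes exact duplicates; nested or partially overlapping sets remain correlated, which is why option 2 (efficient multiplicity correction) is still needed.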


Review: Single-step methods for multiplicity adjustment

• Let us distinguish the random p-values from the experimentally observed p-values using capital and lowercase letters, respectively:

  $\mathbf{P} = (P_1, \ldots, P_K), \quad \mathbf{p} = (p_1, \ldots, p_K)$

  Also, denote the complete null hypothesis by $H_0^C = \bigcap_{i=1}^{K} H_{0i}$.

• Single-step methods are simultaneous test procedures that perform equivalent multiplicity adjustments for all tests. E.g.,
  – Bonferroni: $\tilde{p}_i = \min(K p_i, 1)$
  – Sidak: $\tilde{p}_i = 1 - (1 - p_i)^K$
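The two single-step adjustments translate directly into code. A minimal sketch; the optional `K` argument (our addition) lets the same functions be reused with an effective number of tests instead of the raw count.

```python
def bonferroni(pvals, K=None):
    """Bonferroni single-step adjustment: min(K * p_i, 1)."""
    K = len(pvals) if K is None else K
    return [min(K * p, 1.0) for p in pvals]


def sidak(pvals, K=None):
    """Sidak single-step adjustment: 1 - (1 - p_i)^K."""
    K = len(pvals) if K is None else K
    return [1.0 - (1.0 - p) ** K for p in pvals]
```

For every p_i, the Sidak value is never larger than the Bonferroni value, since 1 - (1 - p)^K ≤ Kp.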


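Bonferroni and Sidak methods require certain assumptions to control the FWER.

• $\mathrm{FWER} = \Pr(\min_{1 \le i \le K} \tilde{P}_i \le \alpha \mid H_0^C)$

• Bonferroni:

  $\mathrm{FWER} = \Pr(\min_{1 \le i \le K} K P_i \le \alpha \mid H_0^C) \le \sum_{i=1}^{K} \Pr(P_i \le \alpha/K \mid H_{0i}) = K \cdot \frac{\alpha}{K} = \alpha$

  (Bonferroni inequality), for $P_i \mid H_{0i} \sim U(0, 1)$.

• Sidak:

  $\mathrm{FWER} = \Pr(\min_{1 \le i \le K} [1 - (1 - P_i)^K] \le \alpha \mid H_0^C) = 1 - \prod_{i=1}^{K} \Pr(1 - (1 - P_i)^K > \alpha \mid H_{0i}) = 1 - \left[(1 - \alpha)^{1/K}\right]^K = \alpha$

  (independence), for $P_i \mid H_{0i} \sim U(0, 1)$.

  Without independence, the equality becomes ≤ under very general conditions.

• Bonferroni and Sidak methods fail to incorporate the dependence and distributional characteristics of the observed p-values. Both limitations are concerns here.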

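Resampling-based multiple testing (RBMT)

• If we knew the joint distribution of P under $H_0^C$, then we could compute

  $\tilde{p}_i = \Pr(\min_{1 \le j \le K} P_j \le p_i \mid H_0^C)$

  so that

  $\mathrm{FWER} = \Pr(\min_{1 \le i \le K} \tilde{P}_i \le \alpha \mid H_0^C) \le \alpha$.

• In practice, we may not know this distribution. However, in many cases vectors P*, having the same distribution as $\mathbf{P} \mid H_0^C$, may be simulated via resampling, so that

  $\min_{1 \le j \le K} P^*_j \sim \min_{1 \le j \le K} P_j \mid H_0^C$.

• We can therefore compute

  $\tilde{p}_i = \Pr(\min_{1 \le j \le K} P^*_j \le p_i)$.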


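Illustration

Gene-set (i) | Observed p-value ($p_i$) | R resampled p-values ($P_i^*$)
1 | $p_1$ | $p^*_{1,1}, p^*_{1,2}, \ldots, p^*_{1,R}$
2 | $p_2$ | $p^*_{2,1}, p^*_{2,2}, \ldots, p^*_{2,R}$
⁞ | ⁞ | ⁞
K | $p_K$ | $p^*_{K,1}, p^*_{K,2}, \ldots, p^*_{K,R}$
Column minimum | | $\min_i p^*_{i,1}, \min_i p^*_{i,2}, \ldots, \min_i p^*_{i,R}$

• Each row of resampled p-values reflects the distribution of the p-value under H0; the columns capture the dependence between tests; comparing each $p_i$ with the column minima gives its (multiple testing) adjusted p-value.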
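The comparison of each observed p-value with the column minima is the single-step minP adjustment of Westfall and Young. A minimal sketch, assuming the K x R matrix of resampled p-values is already available; `minp_adjust` is an illustrative name.

```python
import numpy as np


def minp_adjust(p_obs, p_star):
    """Single-step minP adjusted p-values estimated from resamples.

    p_obs : shape (K,), observed p-values p_i
    p_star: shape (K, R), resampled p-values p*_{i,r} under the complete null
    """
    p_obs = np.asarray(p_obs, dtype=float)
    min_star = np.asarray(p_star, dtype=float).min(axis=0)  # min_i p*_{i,r}
    # adjusted p_i = fraction of resamples whose minimum is <= p_i
    return (min_star[None, :] <= p_obs[:, None]).mean(axis=1)
```

Because all K rows of one column come from the same resample, the column minimum automatically reflects the dependence between tests, with no need to estimate a correlation structure explicitly.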


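Our approach to simulating P*: a "simultaneous" bootstrap test

1. Estimate residuals under the null model: fit $\mathbf{y} = X\boldsymbol{\alpha} + \boldsymbol{\varepsilon}$, $\boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma^2 I)$, giving $\hat{\boldsymbol{\alpha}}$, $\hat{\sigma}^2$ and $\hat{\boldsymbol{\varepsilon}}$.

2. Generate pseudo-data via parametric bootstrap: $\mathbf{y}^* = X\hat{\boldsymbol{\alpha}} + \boldsymbol{\varepsilon}^*$, $\boldsymbol{\varepsilon}^* \sim N(\mathbf{0}, \hat{\sigma}^2 I)$.

3. Perform the gene-set association tests (GSAT) on the pseudo-data, giving $p^*_{k,r,\mathrm{SKAT}}$, $p^*_{k,r,\mathrm{cBT}}$, $p^*_{k,r,\mathrm{tBT}}$ and $p^*_{k,r,\mathrm{OT}}$ for every gene-set $k = 1, \ldots, K$.

Repeat steps 2 and 3 R times ($r = 1, \ldots, R$).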
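The three steps can be sketched as follows. For brevity this illustration runs only the cBT (an OLS Wald t-test on each gene-set's mutation count); in the study all four tests are computed on each pseudo-data set. Function names and the `burdens` input layout (one column of mutation counts per gene-set) are hypothetical.

```python
import numpy as np
from scipy import stats


def cbt_pvalues_all_sets(y, X, burdens):
    """cBT Wald p-value for each gene-set; burdens[:, k] = mutation count in set k."""
    n, K = burdens.shape
    pvals = np.empty(K)
    for k in range(K):
        D = np.column_stack([X, burdens[:, k]])
        coef, *_ = np.linalg.lstsq(D, y, rcond=None)
        resid = y - D @ coef
        dof = n - D.shape[1]
        se = np.sqrt(resid @ resid / dof * np.linalg.inv(D.T @ D)[-1, -1])
        pvals[k] = 2 * stats.t.sf(abs(coef[-1] / se), dof)
    return pvals


def simultaneous_bootstrap(y, X, burdens, R=1000, seed=0):
    """Return a (K, R) matrix of cBT p-values computed on null pseudo-data."""
    rng = np.random.default_rng(seed)
    n = len(y)
    # Step 1: fit the null model y = X alpha + eps and estimate sigma
    alpha_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ alpha_hat
    sigma_hat = np.sqrt(resid @ resid / (n - X.shape[1]))
    p_star = np.empty((burdens.shape[1], R))
    for r in range(R):
        # Step 2: parametric bootstrap, y* = X alpha_hat + eps*
        y_star = X @ alpha_hat + rng.normal(0.0, sigma_hat, size=n)
        # Step 3: test every gene-set on the SAME y*, which preserves the
        # dependence between gene-sets that share genes
        p_star[:, r] = cbt_pvalues_all_sets(y_star, X, burdens)
    return p_star
```

Using one pseudo-outcome for all K gene-sets in each replicate is what makes the bootstrap "simultaneous": two identical gene-sets always receive identical resampled p-values.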


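Intuition: How does P* estimate $\mathbf{P} \mid H_0^C$?

[Figure: the bootstrap pipeline ($\mathbf{y}^* = X\hat{\boldsymbol{\alpha}} + \boldsymbol{\varepsilon}^*$, $\boldsymbol{\varepsilon}^* \sim N(\mathbf{0}, \hat{\sigma}^2 I)$ → $p^*_{k,r,\mathrm{SKAT}}, p^*_{k,r,\mathrm{cBT}}, p^*_{k,r,\mathrm{tBT}}, p^*_{k,r,\mathrm{OT}}$) shown next to data generated under the complete null ($\mathbf{y} = X\boldsymbol{\alpha} + \boldsymbol{\varepsilon}$, $\boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma^2 I)$ → $p_{k,\mathrm{SKAT}}, p_{k,\mathrm{cBT}}, p_{k,\mathrm{tBT}}, p_{k,\mathrm{OT}}$, $k = 1, \ldots, K$), illustrating that $\mathbf{P}^* \approx \mathbf{P} \mid H_0^C$.]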


Gene-set (GS) adjusted p-values

GS | SKAT | cBT | tBT | OT
1 | 0.931 | 0.872 | 0.015 | 0.018
2 | 0.094 | 0.010 | 0.465 | 0.022
3 | 0.798 | 0.017 | 0.784 | 0.037
4 | 0.803 | 0.580 | 0.037 | 0.044
5 | 0.875 | 0.482 | 0.044 | 0.052
6 | 0.038 | 0.223 | 0.057 | 0.067
7 | 0.022 | 0.063 | 0.116 | 0.134
8 | 0.007 | 0.823 | 0.967 | 0.178
9 | 0.044 | 0.974 | 0.999 | 0.432

Adjusted p-values ≤ 0.05 are significant at α = 0.05.

Efficiently correcting for multiple testing increases the number of significant gene-sets from 0 to 4.

NGS study results: significant gene-sets

Top pathway from analysis (resampling-based multiplicity-adjusted p-value = 0.0184)


How does the resampling-based approach compare with other single-step methods for p-value adjustment?

• Our approach: a resampling-based approach for multiplicity adjustment.

• Other single-step methods for multiplicity adjustment:
  – If we knew the effective number of independent tests $K_e$, then we could apply the Bonferroni or Sidak adjustments:

    $\tilde{p}_i = \min(K_e p_i, 1) \quad \text{or} \quad \tilde{p}_i = 1 - (1 - p_i)^{K_e}$

  – One can estimate $K_e$ from biology or from a purely statistical standpoint:
    • Linkage disequilibrium (LD) structure: $K_e$ = number of major LD blocks
    • Principal components analysis (PCA) of P*: $K_e$ = number of principal components that explain x% of the variation in the $P_i^*$'s

• Like PCA, resampling-based adjustments that use the minimum statistic rely on P*.
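The PCA route to $K_e$ can be sketched as follows. This is one possible implementation under our own assumptions: it runs PCA on the K x K correlation matrix of the resampled p-values (the deck does not specify correlation versus covariance), takes x as a fraction rather than a percentage, and uses illustrative function names.

```python
import numpy as np


def effective_number_of_tests(p_star, x=0.8):
    """K_e = number of leading PCs of corr(P*) explaining >= fraction x of variance.

    p_star: shape (K, R); each row holds one test's resampled p-values.
    """
    corr = np.corrcoef(p_star)                          # K x K correlation matrix
    eig = np.sort(np.linalg.eigvalsh(corr))[::-1]       # eigenvalues, largest first
    eig = np.clip(eig, 0.0, None)                       # drop tiny negative round-off
    frac = np.cumsum(eig) / eig.sum()                   # cumulative variance explained
    return int(np.searchsorted(frac, x) + 1)


def sidak_cutoff(alpha, K_e):
    """Per-test p-value cutoff giving FWER = alpha for K_e independent tests."""
    return 1.0 - (1.0 - alpha) ** (1.0 / K_e)
```

With α = 0.05, `sidak_cutoff` gives about 0.0057 for $K_e$ = 9 and 0.0017 for $K_e$ = 30, matching the Sidak cutoffs reported in the simulation table. The weak point is the choice of x: $K_e$, and therefore the cutoff, can change substantially with it.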


Simulation to compare the resampling-based approach with other single-step methods for p-value adjustment

• Rather than consider all K=2434 gene-sets, we considered only 10 gene-sets. These gene-sets are highly related:

  [Figure: relation among gene-sets G1, …, G10]

• Using parameter estimates from the fitted null model of the original data, we randomly simulated outcomes under a null model or a desired alternative model.

• For the first of the 10,000 simulated studies, we performed RBMT by generating R=10,000 bootstrap resamples.


Simulation results: effective number of independent tests ($K_e$) for OT

Method | Cutoff for p | Est. FWER | x%
Sidak, Ke=1 | 0.0500 | 0.363 | 29.8
Sidak, Ke=9 | 0.0057 | 0.051 | 83.4
Sidak, Ke=30 | 0.0017 | 0.016 | 100.0
RBMT | 0.0056 | 0.050 | −

Table. To control FWER ≤ 0.05, different methods propose different cutoffs for the p-values. We provide the estimated FWER for each proposed cutoff, as well as the x% of variation in $P_i^*$ explained by the first $K_e$ PCs (slide 24). Notice how similar RBMT is to the Sidak correction with $K_e$=9.

RBMT adjusted p-value: $\tilde{p} = \Pr(\min_{1 \le j \le 10} P^*_j \le p \mid H_0^C)$
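The RBMT cutoff in the table is the p-value at which $\tilde{p}$ reaches α, i.e. the α-quantile of the resampled minimum p-value; the estimated FWER of any cutoff is the fraction of simulated null studies whose smallest p-value falls below it. The sketch below reproduces that logic on synthetic data only: equicorrelated normal test statistics (a hypothetical stand-in for the bootstrap P* and for the simulated studies), with the cutoff and the FWER estimated from independent draws, loosely mirroring the study design.

```python
import numpy as np
from scipy import stats


def simulate_equicorrelated_null_pvalues(K, R, rho, seed=0):
    """Two-sided p-values for K equicorrelated z-tests in R null replicates."""
    rng = np.random.default_rng(seed)
    common = rng.standard_normal(R)                      # shared factor per replicate
    z = np.sqrt(rho) * common + np.sqrt(1.0 - rho) * rng.standard_normal((K, R))
    return 2.0 * stats.norm.sf(np.abs(z))                # shape (K, R)


def rbmt_cutoff(p_star, alpha=0.05):
    """Per-test cutoff: the alpha-quantile of the resampled minimum p-value."""
    return float(np.quantile(p_star.min(axis=0), alpha))


def estimated_fwer(p_null, cutoff):
    """Fraction of null data sets whose smallest p-value is <= cutoff."""
    return float((p_null.min(axis=0) <= cutoff).mean())


# cutoff from one set of "resamples", FWER from independent null "studies"
p_star = simulate_equicorrelated_null_pvalues(K=10, R=20000, rho=0.8, seed=4)
p_null = simulate_equicorrelated_null_pvalues(K=10, R=20000, rho=0.8, seed=5)
cutoff = rbmt_cutoff(p_star)
```

With strongly correlated tests, `cutoff` lands well above the Bonferroni value 0.05/10 while `estimated_fwer(p_null, cutoff)` stays close to 0.05, the same pattern as the RBMT row of the table.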


Summary

• The NGS attrition rate was high due to DNA quality and other issues.

• Univariate analysis did not have adequate power due to the reduced sample size.

• A multivariate approach based on pathways was considered. We used 4 types of tests (SKAT, cBT, tBT and OT) to assess the association between 2434 gene-sets and %TSCB (percent tumor size change from baseline).

• We used a resampling-based approach to correct for multiple testing.

• Our efforts led to the identification of a gene-set that is both statistically significant and a biologically interesting pathway.


References

• Elbers C.C., et al. (2009) Using genome-wide pathway analysis to unravel the etiology of complex diseases. Genet Epidemiol 33, 419-431.

• Wang K., Li M., and Bucan M. (2007) Pathway-based approaches for analysis of genomewide association studies. Am J Hum Genet 81, 1278-1283.

• Yu K., et al. (2009) Pathway analysis by adaptive combination of p-values. Genet Epidemiol 33, 700-709.

• Westfall P.H. and Young S.S. (1993) Resampling-based multiple testing: Examples and methods for p-value adjustment. John Wiley & Sons Inc, New York.

• Wu M.C., et al. (2011) Rare variant association testing for sequencing data with the sequence kernel association test. Am J Hum Genet 89, 82-93.

