

Precision Medicine and Imaging

Consensus on Molecular Subtypes of High-Grade Serous Ovarian Carcinoma

Gregory M. Chen1, Lavanya Kannan2,3, Ludwig Geistlinger2,3, Victor Kofia1,4,5, Zhaleh Safikhani1,4,5, Deena M.A. Gendoo1,4, Giovanni Parmigiani6, Michael Birrer7, Benjamin Haibe-Kains1,4,5,8, and Levi Waldron2,3

Abstract

Purpose: The majority of ovarian carcinomas are of high-grade serous histology, which is associated with poor prognosis. Surgery and chemotherapy are the mainstay of treatment, and molecular characterization is necessary to lead the way to targeted therapeutic options. To this end, various computational methods for gene expression–based subtyping of high-grade serous ovarian carcinoma (HGSOC) have been proposed, but their overlap and robustness remain unknown.

Experimental Design: We assess three major subtype classifiers by meta-analysis of publicly available expression data, and assess statistical criteria of subtype robustness and classifier concordance. We develop a consensus classifier that represents the subtype classifications of tumors based on the consensus of multiple methods, and outputs a confidence score. Using our compendium of expression data, we examine the possibility that a subset of tumors is unclassifiable based on currently proposed subtypes.

Results: HGSOC subtyping classifiers exhibit moderate pairwise concordance across our data compendium (58.9%–70.9%; P < 10⁻⁵) and are associated with overall survival in a meta-analysis across datasets (P < 10⁻⁵). Current subtypes do not meet statistical criteria for robustness to reclustering across multiple datasets (prediction strength < 0.6). A new subtype classifier is trained on concordantly classified samples to yield a consensus classification of patient tumors that correlates with patient age, survival, tumor purity, and lymphocyte infiltration.

Conclusions: A new consensus ovarian subtype classifier represents the consensus of methods and demonstrates the importance of classification approaches for cancer that do not require all tumors to be assigned to a distinct subtype. Clin Cancer Res; 24(20); 5037–47. ©2018 AACR.

Introduction

Ovarian carcinoma is a genomically complex disease, for which the accurate characterization of molecular subtypes is difficult but is anticipated to improve treatment and clinical outcome (1). Substantial effort has been devoted to characterize molecularly distinct subtypes of high-grade serous ovarian carcinoma (HGSOC; Table 1). Initial large-scale efforts to classify HGSOC of the ovary did not reveal any reproducible subtypes (2). Tothill and colleagues (3) reported four distinct HGSOC subtypes: (i) an immunoreactive expression subtype associated with infiltration of immune cells; (ii) a low stromal expression subtype with high levels of circulating CA125; (iii) a poor prognosis subtype displaying strong stromal response, correlating with extensive desmoplasia; and (iv) a mesenchymal subtype with high expression of N/P-cadherins. The Cancer Genome Atlas (TCGA) project also identified four subtypes characterized by (i) chemokine expression in the immunoreactive subtype; (ii) proliferation marker expression in the proliferative subtype; (iii) ovarian tumor marker expression in the differentiated subtype; and (iv) expression of markers suggestive of increased stromal components in the mesenchymal subtype, but did not report differences in patient survival (4). Further experimental characterization revealed an increased number of samples with infiltrating T lymphocytes for the immunoreactive subtype, whereas desmoplasia, associated with infiltrating stromal cells, was found more often for the mesenchymal subtype (5). Konecny and colleagues (6) independently evaluated the TCGA subtypes and also reported the presence of the four transcriptional subtypes using a de novo clustering and classification method.

However, robustness and clinical relevance of these subtypes remain controversial (7). The previous subtyping efforts have assessed prognostic significance in different patient cohorts and have taken different approaches to validate these subtypes in independent datasets. A recent review of HGSOC subtyping schemes highlighted the difficulty of comparing results of studies that used different subtyping algorithms, and that better general agreement on how molecular subtypes are defined would allow more widespread use of expression data in clinical trial design (1).

1 Princess Margaret Cancer Centre, Toronto, Ontario, Canada. 2 City University of New York School of Public Health, New York, New York. 3 Institute for Implementation Science in Population Health, City University of New York, New York, New York. 4 Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada. 5 Department of Computer Science, University of Toronto, Toronto, Ontario, Canada. 6 Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, and Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts. 7 University of Alabama Comprehensive Cancer Center, Birmingham, Alabama. 8 Ontario Institute of Cancer Research, Toronto, Ontario, Canada.

Note: Supplementary data for this article are available at Clinical Cancer Research Online (http://clincancerres.aacrjournals.org/).

G.M. Chen and L. Kannan contributed equally to this article.

Corresponding Authors: Levi Waldron, CUNY Graduate School of Public Health and Health Policy, 55 W 125th St 6th floor, New York, NY 10027. Phone: 646-364-9616; Fax: 212-396-7639; E-mail: [email protected]; and Benjamin Haibe-Kains, Princess Margaret Cancer Centre, University Health Network, 101 College Street, Toronto, Ontario, M4C2A4. Phone: 416 581-8626; E-mail: [email protected]

doi: 10.1158/1078-0432.CCR-18-0784

©2018 American Association for Cancer Research.

Published OnlineFirst July 3, 2018; DOI: 10.1158/1078-0432.CCR-18-0784

Assessing the generalizability of subtyping algorithms is challenging as true subtype classifications remain unknown. This challenge is evident in the lack of published validation of the proposed HGSOC subtypes. Subsequent efforts have performed de novo clustering of new datasets and noted similarity in the clusters identified, but they have not reported quantitative measures such as classification accuracy or rate of concordance with previously published algorithms (8). In this article, we address these limitations by reimplementing three major subtyping methods (3, 5, 9) and assess between-classifier concordance and across-dataset robustness in a widely used database containing 1,770 HGSOC tumors (10), whose curation and data consistency has been independently validated (11). We show that each pair of subtype classifiers is significantly concordant and is virtually identical for tumors classified with high certainty. However, the subtypes do not meet established standards of robustness to reclustering (12), and only approximately one third of tumors are classified concordantly by all three subtype classifiers. Using this core set of tumors concordantly classified by each method, we develop consensusOV, a consensus classifier that has high concordance with the three classifiers, therefore providing a standardized classification scheme for clinical applications.

Materials and Methods

Datasets

Analysis was carried out on datasets from the curatedOvarianData compendium; details of curation and of grading systems used by individual studies are described elsewhere (10). Datasets were additionally processed using the MetaGxOvarian package (Supplementary Information; ref. 13). Analysis was restricted to datasets featuring microarray-based whole-transcriptome studies of at least 40 patients with late-stage, high-grade, primary tumors of serous histology. This resulted in 15 microarray studies, providing data for 1,774 patients (Table 2). Duplicated samples identified by the doppelgangR package were removed (14). Survival analysis was performed for 13 of these datasets, which included 1,581 patients with annotated time to death or last time of follow-up.

Implementation of subtype classifiers

Subtype classifiers were reimplemented in R (15) using original data as described by Konecny and colleagues (6), Verhaak and colleagues (5), and Helland and colleagues (9). These classifiers are based on nearest-centroids (6), subtype-specific single-sample GSEA (5), and subtype-specific linear coefficients (9), respectively. Implementations were validated by reproducing a result from each of the original publications (Supplemental File, Section "Reproduction of Published HGSOC Subtype Classifiers").

Survival analysis

Subtype calls from all included datasets were combined to generate a single Kaplan–Meier plot for each subtyping algorithm (stratified by subtype). HRs for overall survival between

Translational Relevance

High-grade serous ovarian carcinoma (HGSOC) is the fifth leading cause of cancer-related death in the United States and Canada. The majority of HGSOCs are diagnosed as late-stage, high-grade serous ovarian carcinomas, for which prognosis is generally poor and few targeted therapies exist. Significant research effort has suggested several molecularly distinct subtypes of HGSOC, yet no consensus in the field exists and computational methods to analyze high-dimensional gene expression datasets differ across studies. Although subtypes have been shown to differ in overall survival, the lack of agreement on molecular subtype definition has been cited as a barrier to their investigation through clinical trial. In the current study, we perform an analysis of a large compendium of HGSOC transcriptomes in order to evaluate the concordance of computational methods and address the emerging consensus in the field. We develop a subtype classifier that represents the consensus of HGSOC subtypes and show that many tumors are of intermediate or mixed subtype based on currently defined subtypes. These findings improve our understanding of the molecular basis of high-grade serous carcinoma, an important step in defining the underlying biology and identifying therapeutic targets of HGSOC.

Table 1. Subtyping methodology of the algorithms reviewed

| Citation | Probe/gene filtering for clustering | Clustering algorithm | Probe/gene filtering for classification | Subtype classifier |
|---|---|---|---|---|
| Tothill/Helland (3, 9) | Probes with at least one sample expressed above 7.0, and global variance above 0.5 | Consensus k-means; diagonal LDA and kNN | Gene ranking by differentially expressed genes between groups | Linear subtype scores |
| TCGA/Verhaak (4, 5) | Filter to genes that correlate above 0.7 between three platforms to unified estimate; then, take top 1,500 genes by median absolute deviation (MAD) | Nonnegative matrix factorization | Filter patients by silhouette width; correlation-based feature subset selection | Single-sample gene set enrichment analysis |
| Konecny (6) | Top 2,500 probes by MAD, then keep 1,850 unique gene symbols | Nonnegative matrix factorization | Prediction Analysis of Microarrays with thresholds determined by tenfold cross-validation | Nearest centroid with Spearman r |
| consensusOV | 100 genes provided by Verhaak (5); convert the features space into binary matrix of gene-pair associations | Random Forest using unanimously classified tumors across the methods | 100 gene symbols given in Verhaak (5) | Random Forest classifier |


subtypes were estimated by Cox proportional hazards, and statistical significance was assessed by log-rank test using the survcomp R package (16). HRs were calculated using the lowest-risk subtype as the baseline group, and stratification by dataset was performed for HRs and significance testing.
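The article's survival analysis was run in R with survcomp. As a rough illustration of the Kaplan–Meier estimate that underlies each per-subtype curve, the following is a minimal Python sketch; the function name `kaplan_meier` and the toy follow-up data are assumptions made for illustration, and Cox regression and the log-rank test are not covered here.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct death time.

    times  -- follow-up time per patient (e.g. months)
    events -- 1 if the patient died at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)  # patients whose follow-up ends at each time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d > 0:
            # Multiply by the conditional probability of surviving past t
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after t
        at_risk -= leaving[t]
    return curve


# Hypothetical toy cohort: follow-up in months, 1 = deceased, 0 = censored
times = [5, 8, 8, 12, 20, 24]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
```

To mimic the stratified plots described above, the same function would be called once per subtype on that subtype's patients.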

Prediction strength

Prediction strength (PS; ref. 12) is defined as a measure of the similarity between pairwise comemberships of a validation dataset from class labels assigned by (i) a clustering algorithm and (ii) a classification algorithm trained on a training dataset (Supplementary Fig. S1). The quantity is an established measure of cluster robustness with the following interpretation: a value of 0 or below indicates poor concordance, and a value of 1 indicates perfect concordance between models specified from training and validation data. Tibshirani and Walther (12), and subsequent applications of PS (17), have considered a value of at least 0.8 to be evidence of robust clusters. PS was computed as implemented in the genefu Bioconductor package (18).

The tumors in each dataset were clustered de novo using our reproduced implementations of the algorithms of Konecny, TCGA/Verhaak, and Tothill (Supplemental File, Section "Reproduction of Subtype Clustering Methods"). Each dataset was also classified using implementation of the originally published subtype classifiers. This produced two sets of subtype labels for each sample in each validation dataset; these labels were used to compute PS.
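Given the two sets of labels described above, prediction strength can be sketched in a few lines. The Python below follows the pairwise-comembership definition of Tibshirani and Walther (minimum, over validation clusters, of the fraction of sample pairs that the classifier also puts together); it is an illustrative sketch, not the genefu implementation, whose details may differ. The function name and toy labels are assumptions.

```python
from itertools import combinations


def prediction_strength(cluster_labels, classifier_labels):
    """Minimum over clusters of the share of within-cluster pairs that
    also receive the same label from the classifier."""
    members_by_cluster = {}
    for idx, label in enumerate(cluster_labels):
        members_by_cluster.setdefault(label, []).append(idx)

    per_cluster = []
    for members in members_by_cluster.values():
        pairs = list(combinations(members, 2))
        if not pairs:
            continue  # a singleton cluster has no pairs to compare
        agree = sum(
            1 for i, j in pairs if classifier_labels[i] == classifier_labels[j]
        )
        per_cluster.append(agree / len(pairs))
    return min(per_cluster) if per_cluster else float("nan")


# Hypothetical labels for five validation samples
de_novo = ["a", "a", "a", "b", "b"]      # from clustering the validation set
predicted = ["x", "x", "y", "z", "z"]    # from the trained classifier
ps = prediction_strength(de_novo, predicted)
```

Only comembership matters, so the two label sets do not need to share names; a relabeling of identical partitions yields a value of 1.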

Concordance analysis

For each pair of classifiers, subtypes were mapped on the basis of the observed concordance suggested in the original studies: Subtype C2 from Tothill corresponding to Immunoreactive in TCGA/Verhaak and C1_Immunoreactive-like in Konecny; C4 corresponding to Differentiated and C2_Differentiated-like; C5 corresponding to Proliferative and C3_Proliferative-like; and C1 corresponding to Mesenchymal and C4_Mesenchymal-like. Statistical significance of pairwise concordance was assessed by Pearson χ² test, and Cramer V was assessed to evaluate the strength of concordance. Two-way concordance was defined as the proportion of patients that were classified as the same mapped subtype across methods. Similarly, overall three-way concordance was defined as the proportion of tumors sharing the same mapped subtype across all three classifiers. Subtype-specific three-way concordance was defined as the number of tumors concordantly classified as that subtype by all three classifiers, divided by the number of tumors classified to that subtype by at least one method.
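The definitions above translate directly into code. The following Python sketch computes two-way concordance and Cramer V from a contingency table whose rows and columns are already in mapped-subtype order, plus the subtype-specific three-way concordance. The counts are transcribed from the Verhaak-versus-Helland panel of Figure 2A; the function names and the short toy call lists are assumptions for illustration.

```python
from math import sqrt

# Verhaak (IMR, DIF, PRO, MES) vs. Helland (C2, C4, C5, C1), from Figure 2A
verhaak_vs_helland = [
    [301, 78, 10, 58],
    [70, 324, 97, 114],
    [14, 48, 263, 32],
    [17, 3, 38, 307],
]


def two_way_concordance(table):
    """Proportion of tumors on the diagonal (same mapped subtype)."""
    total = sum(sum(row) for row in table)
    return sum(table[i][i] for i in range(len(table))) / total


def cramers_v(table):
    """Cramer V from the Pearson chi-square statistic of the table."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals))
    return sqrt(chi2 / (n * (k - 1)))


def subtype_three_way(calls_a, calls_b, calls_c, subtype):
    """Tumors called `subtype` by all three methods, divided by tumors
    called `subtype` by at least one method."""
    triples = list(zip(calls_a, calls_b, calls_c))
    unanimous = sum(1 for t in triples if t == (subtype, subtype, subtype))
    any_call = sum(1 for t in triples if subtype in t)
    return unanimous / any_call if any_call else float("nan")


concordance = two_way_concordance(verhaak_vs_helland)
strength = cramers_v(verhaak_vs_helland)

# Hypothetical mapped calls for four tumors from three classifiers
a = ["IMR", "IMR", "DIF", "MES"]
b = ["IMR", "DIF", "DIF", "MES"]
c = ["IMR", "IMR", "PRO", "MES"]
imr_three_way = subtype_three_way(a, b, c, "IMR")
```

Run on the Figure 2A counts, the two functions should give values close to the 67.36% concordance and Cramer V of 0.59 printed in that panel.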

Filtering tumors by classification margin

Each subtype classifier outputs for each patient a real-valued score for each subtype. Marginally classifiable tumors were identified on the basis of the difference between the top two subtype scores, denoted as the "margin" value. Thus, a higher margin indicates a more confident classification. For each pair of subtype classifiers, classification concordance was assessed on both the full dataset and considering only patients classified with margins above a user-defined cutoff.
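The margin computation is simple enough to show directly. This Python sketch assumes each patient's scores are held in a dictionary keyed by subtype; the function names, patient identifiers, scores, and cutoff are hypothetical.

```python
def classification_margin(scores):
    """Difference between the highest and second-highest subtype score."""
    top, second = sorted(scores.values(), reverse=True)[:2]
    return top - second


def filter_by_margin(patients, cutoff):
    """Return {patient_id: top_subtype} for patients whose margin exceeds
    the cutoff; the remaining patients are left unclassified."""
    kept = {}
    for patient_id, scores in patients.items():
        if classification_margin(scores) > cutoff:
            kept[patient_id] = max(scores, key=scores.get)
    return kept


# Hypothetical subtype scores for two patients
patients = {
    "p1": {"IMR": 0.75, "DIF": 0.125, "PRO": 0.0625, "MES": 0.0625},
    "p2": {"IMR": 0.5, "DIF": 0.4375, "PRO": 0.0625, "MES": 0.0},
}
confident = filter_by_margin(patients, cutoff=0.25)
```

Here `p1` has a wide margin and keeps its call, while `p2` has two nearly equal scores and is dropped at this cutoff.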

Building a consensus classifier

The consensusOV classifier was implemented using a random Forest classifier trained on concordantly subtyped tumors across multiple datasets. The random Forest method has previously been used for building a multi-class consensus classifier to resolve inconsistencies among published colorectal cancer subtyping schemes (19). In order to avoid normalizing expression values across datasets, binary gene pair vectors were used as feature space, as recently applied for breast cancer subtyping (20, 21). To address differences in gene expression scales due to different experimental protocols, consensusOV first standardizes genes in each dataset to the same mean and variance, and computes binary gene pairs from standardized expression values. Because the feature size of this classifier increases quadratically with respect to the size of the original gene set, we used the smallest gene set of the original subtype classifiers (the gene set of Verhaak and colleagues; ref. 5), which contains 100 gene symbols. The consensusOV classifier outputs the subtype classification and a real-valued margin score to discriminate between patients that are of well-defined or indeterminate subtype. Similarly to previously published subtype classifiers, a higher margin score indicates higher confidence of classification.
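The binary gene-pair feature space can be illustrated with a short Python sketch. It assumes one sample's expression values in a dictionary keyed by gene symbol and encodes each unordered gene pair as 1 when the first gene is more highly expressed than the second. The function name and the placeholder gene symbols are hypothetical, and the per-dataset standardization step described above is omitted.

```python
from itertools import combinations


def gene_pair_features(expression, genes):
    """Return {(gene_a, gene_b): 0 or 1} for every unordered pair of genes;
    1 means gene_a is expressed above gene_b in this sample."""
    return {
        (a, b): int(expression[a] > expression[b])
        for a, b in combinations(genes, 2)
    }


# Hypothetical three-gene sample
sample = {"G1": 5.0, "G2": 3.0, "G3": 4.0}
features = gene_pair_features(sample, ["G1", "G2", "G3"])

# With 100 genes the feature vector has 100 * 99 / 2 = 4,950 entries,
# which is the quadratic growth mentioned in the text.
many_genes = [f"gene{i}" for i in range(100)]
many_sample = {g: float(i) for i, g in enumerate(many_genes)}
n_features = len(gene_pair_features(many_sample, many_genes))
```

Because each feature only records which of two genes is higher within a sample, the encoding is unchanged by any monotone rescaling applied to the whole sample.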

Leave-one-dataset-out cross-validation

Performance of the consensus classifier for identifying concordantly classified subtypes was assessed using leave-one-dataset-out

Table 2. Compendium of gene expression datasets.

| GEO (34) Accession | Sample size | Microarray platform | No. Features |
|---|---|---|---|
| TCGA (4) | 464:452:239 [43] | Affymetrix HT HG-U133A | 12,833 |
| GSE17260 (35) | 43:43:22 [29] | Agilent-012391 Whole HG Oligo | 19,596 |
| GSE14764 (36) | 41:41:13 [30] | Affymetrix HG-U133A | 12,752 |
| GSE18520 (37) | 53:53:41 [21] | Affymetrix HG-U133 Plus 2.0 | 20,282 |
| GSE26193 (38) | 47:47:39 [34] | Affymetrix HG-U133 Plus 2.0 | 20,282 |
| PMID17290060 (39) | 59:59:36 [34] | Affymetrix HG-U133A | 12,752 |
| GSE51088 (40) | 85:84:69 [44] | Agilent-012097 Human 1A Microarray (V2) G4110B | 15,299 |
| GSE13876 (41) | 98:98:72 [22] | Operon human v3 ~35K 70-mer two-color oligonucleotide microarrays | 13,846 |
| GSE49997 (42) | 132:122:40 [23] | ABI HG Survey Microarray Version 2 | 16,760 |
| E.MTAB.386 (43) | 128:128:73 [30] | Illumina humanRef-8 v2.0 beadchip | 10,572 |
| GSE32062 (44) | 129:129:60 [40] | Agilent-014850 Whole HG 4 × 44K G4112F | 19,596 |
| GSE9891 (3) | 142:140:72 [29] | Affymetrix HG-U133 Plus 2.0 | 20,282 |
| GSE26712 | 185:185:129 [39] | Affymetrix HG-U133A Array | 12,752 |
| GSE20565 (45) | 89 [0] | Affymetrix HG-U133 Plus 2.0 | 20,282 |
| GSE2109 | 79 [0] | Affymetrix HG-U133Plus2 | 20,282 |

NOTE: Fifteen whole-transcriptome studies with at least 40 patients with late-stage, high-grade serous histology from the curatedOvarianData compendium consisting of 1,770 patients. Thirteen of these datasets provided 1,581 patients with survival data. Sample size column provides the number of samples: number with survival data: number deceased (median survival in months, shown in brackets).


cross-validation (22). Concordant subtypes were identified to train the random Forest classifier using 14 of the 15 datasets, and subtype predictions were tested in the remaining left-out dataset. This process was repeated for all 15 datasets. While predicting the samples in any given dataset, the training set was subsetted to contain only the concordant subtypes in other datasets.
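The splitting scheme can be sketched as a small Python generator. It assumes each sample is a dictionary with a `dataset` field and a boolean `concordant` flag marking tumors on which the three published classifiers agree; both field names, the function name, and the choice to return every held-out sample as the test set are assumptions for illustration.

```python
def leave_one_dataset_out(samples):
    """Yield (held_out_dataset, train, test) once per dataset.

    Training uses only concordantly classified samples from the other
    datasets; the test set is every sample of the held-out dataset.
    """
    datasets = sorted({s["dataset"] for s in samples})
    for held_out in datasets:
        train = [
            s for s in samples
            if s["dataset"] != held_out and s["concordant"]
        ]
        test = [s for s in samples if s["dataset"] == held_out]
        yield held_out, train, test


# Hypothetical samples from three datasets
samples = [
    {"id": "a1", "dataset": "A", "concordant": True},
    {"id": "a2", "dataset": "A", "concordant": False},
    {"id": "b1", "dataset": "B", "concordant": True},
    {"id": "c1", "dataset": "C", "concordant": False},
    {"id": "c2", "dataset": "C", "concordant": True},
]
folds = list(leave_one_dataset_out(samples))
```

Splitting by dataset rather than by random sample keeps batch and platform effects of the held-out study out of the training data.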

Correlation analysis with histopathology and tumor purity

Subtype calls from the consensus classifier were analyzed for correlation with histopathology and tumor purity in the TCGA dataset. In order to best represent the most confident subtype calls, a default cutoff was used to include only the 25% of patients with the largest classification margins. Available histopathology variables included lymphocyte, monocyte, and neutrophil infiltration. Tumor purity was assessed using the ABSOLUTE algorithm (23), which estimates purity and ploidy from copy number and SNP allele frequency from SNP genotyping arrays (Synapse dataset syn3242754). Significance of associations was tested by one-way ANOVA for patient age, purity, and immune infiltration.
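For readers unfamiliar with the test, the F statistic of a one-way ANOVA compares between-group and within-group variability. The Python sketch below computes only the F statistic for values grouped by subtype; converting it to a P value requires the F distribution, which is not in the standard library and is left out. The function name and the toy numbers are hypothetical and are not data from the study.

```python
def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA over a list of numeric groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n

    # Variation of group means around the grand mean
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means)
    )
    # Variation of observations around their own group mean
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, means) for x in g
    )
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical measurements for three groups
f_stat = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```

A larger F statistic indicates that group means differ more than would be expected from the spread within groups.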

Research reproducibility

All results are reproducible using R/Bioconductor (24) and knitr (25) with LaTeX output at overleaf.com/read/srvqbpxpqbyz. Output of this code is provided as Supplemental File S1. Subtyping algorithms are provided by the open source consensusOV R package available from Bioconductor (http://bioconductor.org/packages/consensusOV).

Results

We performed a meta-analysis of three published subtyping algorithms for HGSOC (5, 6, 9) and developed a new consensus classifier to identify unambiguously classifiable tumors (Table 1). Each of these algorithms identified four distinct HGSOC subtypes with specific clinical and tumor pathology characteristics (Fig. 1). We assessed the algorithms on a compendium of 15 datasets including over 1,700 patients with HGSOC (Table 2) with respect to concordance, robustness, and association to patient outcome. By modifying individual algorithms to discard tumors of intermediate subtype, we found that concordance between algorithms is greatly improved.

Concordance of published classifiers

We reimplemented three published HGSOC subtype classifiers (5, 6, 9) (Table 1) and applied these methods to new datasets. We ensured correct implementation of classifiers by reproducing results from the original articles (Supplementary Information). When applied to independent datasets, pairwise concordance of the three methods was statistically significant (P < 10⁻⁵, χ² test; Fig. 2A) with the highest agreement observed for Helland and Konecny subtyping schemes (70.9%), followed by Verhaak and Helland (67.4%) and Verhaak and Konecny (58.9%). Cramer V coefficients (26) indicated a strong association between subtypes as identified by the different algorithms (>0.5).

Tumors of intermediate subtype

The individual subtyping algorithms calculate numeric scores for each subtype and assign each tumor to the subtype with the highest score. A tumor with a large difference or "margin" between the highest and second highest scores can be considered distinctly classifiable, whereas a tumor with two nearly equal scores could be considered of intermediate subtype. We examined the effect of modifying the individual

| | Immunoreactive | Differentiated | Proliferative | Mesenchymal |
|---|---|---|---|---|
| Patient ages | ~61 years | ~55 years | ~64 years | ~59 years |
| Risk (5-year survival %) | Low (50%) | High (34%) | High (34%) | Very High (20%) |
| Purity of TCGA samples (ABSOLUTE) | 71% | 87% | 91% | 62% |
| Lymphocyte infiltration (TCGA samples) | ~24% | ~41% | <5% | <5% |
| Neutrophil infiltration (TCGA samples) | ~10% | ~8% | <5% | <5% |

Figure 1. Properties of subtypes identified by consensus classifier. Subtype associations with patient age and overall survival were assessed across our compendium of microarray datasets; association with tumor purity and immune cell infiltration was assessed using the TCGA dataset. Tumor purity was estimated from genotyping data in TCGA; lymphocyte infiltration was based on pathology estimates from TCGA. Patient age (P < 0.001), overall survival (P < 0.005), and ABSOLUTE purity (P < 0.001) were statistically significant across subtypes. When compared with all other groups, the immunoreactive subtype had elevated infiltration of lymphocytes (P < 0.05) and neutrophils (P < 0.10). Mean monocyte infiltration was less than 5% across all subtypes and was excluded from this analysis. Classification was performed using default parameters, and mean values of each variable are shown.


algorithms to prevent assignment of indeterminate cases at various thresholds. For each pair of subtype classifiers, we examined the classification concordance with increasing thresholds on the margins.

For all pairs of subtype classifier, classification concordance increased as additional marginal cases are removed, approaching over 90% concordance once the majority of tumors are left unclassified (Fig. 2B). Three-way concordance followed the same trend with lower overall concordance: a minimum of 23% for the proliferative subtype and maximum of 45% for the immunoreactive subtype when all tumors are classified. Restricting the concordance analysis to the top 50% of tumors by margin value resulted in an increased overlap between 35% (proliferative) and 65% (immunoreactive). At a strict

Figure 2A, Verhaak (rows) vs. Helland (columns). Percent concordant = 67.36%; P < 0.00001; Cramer V = 0.59

| Verhaak | C2 | C4 | C5 | C1 |
|---|---|---|---|---|
| IMR | 301 | 78 | 10 | 58 |
| DIF | 70 | 324 | 97 | 114 |
| PRO | 14 | 48 | 263 | 32 |
| MES | 17 | 3 | 38 | 307 |

Figure 2A, Konecny (rows) vs. Verhaak (columns). Percent concordant = 58.85%; P < 0.00001; Cramer V = 0.50

| Konecny | IMR | DIF | PRO | MES |
|---|---|---|---|---|
| C1_immL | 318 | 97 | 29 | 19 |
| C2_diffL | 58 | 318 | 139 | 7 |
| C3_profL | 7 | 114 | 163 | 94 |
| C4_mescL | 64 | 76 | 26 | 245 |

Figure 2A, Helland (rows) vs. Konecny (columns). Percent concordant = 70.86%; P < 0.00001; Cramer V = 0.65

| Helland | C1_immL | C2_diffL | C3_profL | C4_mescL |
|---|---|---|---|---|
| C2 | 349 | 23 | 4 | 26 |
| C4 | 84 | 325 | 33 | 11 |
| C5 | 6 | 159 | 226 | 17 |
| C1 | 24 | 15 | 115 | 357 |

[Figure 2B: Two-way concordances. x-axis: % of dataset removed (0 to 100); y-axis: concordance (0.00 to 1.00); series: Helland_vs_Verhaak, Konecny_vs_Helland, Verhaak_vs_Konecny.]

[Figure 2C: Three-way concordance. x-axis: % of dataset removed (0 to 100); y-axis: concordance (0.00 to 1.00); series: IMR, DIF, PRO, MES, overall.]

[Figure 2D: Overlap in nonmarginal cases (all cases). Venn diagrams of Konecny, Helland, and Verhaak calls for each of IMR, DIF, PRO, and MES.]

Figure 2. Concordance analysis. A, Contingency table showing concordance of subtypes while comparing the methods pairwise. B, Pairwise concordance between the methods versus percentage of the dataset with samples of lower subtype margins removed. C, Three-way overall concordance between the methods and that of the individual subtypes versus percentage removed. D, The classification of patients by three published algorithms as a Venn diagram for each of the four subtypes. Each area shows percentages of patients when all patients are classified (below, in parentheses) and after refusing to classify 75% of the most marginally classified tumors by any of the three methods (above). Thus, the numbers on the top of the three-way intersection are the concordant tumors according to the three original algorithms. Bottom numbers indicate relatively unambiguous subtype predictions by all three algorithms and which are also concordant with the others.

Consensus on Molecular Subtypes of HGSOC

www.aacrjournals.org Clin Cancer Res; 24(20) October 15, 2018 5041

on March 28, 2021. © 2018 American Association for Cancer Research. clincancerres.aacrjournals.org Downloaded from

Published OnlineFirst July 3, 2018; DOI: 10.1158/1078-0432.CCR-18-0784

Page 6: Consensus on Molecular Subtypes of High-Grade Serous ... · statistical significance was assessed by log-rank test using the survcomp R package (16). HRs were calculated using the

threshold of where only 10% of tumors are classified, 88% oftumors overall are concordantly classified by all three pub-lished subtyping algorithms (Fig. 2C). This large gain inconcordance results from large reductions in both singletoncalls, tumors assigned to one subtype by one algorithm, butnot by the other two algorithms, and in 2-to-1 calls, tumorsassigned to one subtype by two algorithms, but not by the third(Fig. 2D). This indicates that tumors distinctly classifiable by asingle algorithm are more likely to be concordantly classifiedby the other algorithms, and conversely, tumors that appear

ambiguous to one algorithm are less likely to be classified inthe same way by the other algorithms.
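The singleton, 2-to-1, and three-way categories discussed above can be tallied per subtype as sketched below. This is an illustration only: the function name and the toy calls are hypothetical, and it assumes that percentages for a subtype are taken over the tumors assigned to that subtype by at least one of the three algorithms, with all labels already mapped to a common vocabulary.

```python
from collections import Counter


def subtype_call_breakdown(calls_per_tumor, subtype):
    """Share of singleton, 2-to-1, and three-way calls among tumors ever assigned `subtype`.

    `calls_per_tumor` holds one tuple of three labels per tumor, one label per algorithm.
    """
    counts = Counter()
    for calls in calls_per_tumor:
        votes = sum(1 for call in calls if call == subtype)
        if votes:  # ignore tumors that no algorithm assigned to this subtype
            counts[votes] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError(f"no tumor was assigned to subtype {subtype!r}")
    names = {1: "singleton", 2: "2-to-1", 3: "three-way"}
    return {name: counts[votes] / total for votes, name in names.items()}


# Hypothetical toy calls from three algorithms for six tumors.
calls = [
    ("IMR", "IMR", "IMR"),
    ("IMR", "IMR", "DIF"),
    ("IMR", "DIF", "DIF"),
    ("PRO", "PRO", "PRO"),
    ("IMR", "IMR", "IMR"),
    ("MES", "DIF", "PRO"),
]
print(subtype_call_breakdown(calls, "IMR"))
```

Applying the same tally after removing low-margin tumors would show whether the singleton and 2-to-1 shares shrink, which is the pattern reported in Fig. 2D.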

Survival analysis
All proposed subtyping algorithms classified patients into groups that significantly differed in overall survival (Fig. 3A; P < 10⁻⁵ for each subtyping algorithm, log-rank test). Comparing low-risk with high-risk subtypes for each algorithm, the HRs increase from approximately 1.5 as marginal cases are removed (Fig. 3B), suggesting that marginal cases may contribute to the intermediate survival profiles between subtypes.

Figure 3.
Survival analysis. A, Kaplan–Meier curves of subtypes of the 1,581 patients with survival data under different methods. B, HRs and 95% CIs of the lowest risk subtype (Konecny and Verhaak) or two subtypes (Helland) compared with the remaining subtypes.
Panel A annotations (n = 1,581 for each classifier; HR with 95% CI):
Konecny: C2_diffL, 1.40 (1.10–1.70); C3_profL, 1.70 (1.50–2.00); C4_mescL, 1.80 (1.50–2.00); log-rank P = 4.2E−09.
Verhaak: DIF, 1.50 (1.20–1.70); PRO, 1.50 (1.30–1.80); MES, 2.00 (1.70–2.30); log-rank P = 8.3E−10.
Helland: C4, 1.10 (0.94–1.40); C5, 1.60 (1.40–1.90); C1, 1.80 (1.50–2.10); log-rank P = 1.6E−10.
Panel A axes: survival against time (years). Panel B axes: HR with respect to the less risky subtypes (C1_immL for Konecny, IMR for Verhaak, C2 and C4 for Helland) against percentage of dataset removed.
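For readers unfamiliar with the log-rank test used above, the following is a minimal two-group sketch in plain Python. It is not the survcomp R code used in the analysis; the function name and the toy follow-up times are hypothetical, and the P value relies on the chi-square distribution with one degree of freedom, whose upper tail equals erfc(sqrt(x / 2)).

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, P value).

    `events_*` hold 1 for an observed death and 0 for a censored observation.
    """
    samples = [(t, e, 0) for t, e in zip(times_a, events_a)]
    samples += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in samples if e})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        at_risk_a = sum(1 for s, _, g in samples if s >= t and g == 0)
        at_risk_b = sum(1 for s, _, g in samples if s >= t and g == 1)
        deaths_a = sum(1 for s, e, g in samples if s == t and e and g == 0)
        deaths_b = sum(1 for s, e, g in samples if s == t and e and g == 1)
        n, d = at_risk_a + at_risk_b, deaths_a + deaths_b
        observed_a += deaths_a
        expected_a += d * at_risk_a / n  # expected deaths in group A under the null
        if n > 1:  # hypergeometric variance of the death count in group A
            variance += d * (at_risk_a / n) * (at_risk_b / n) * (n - d) / (n - 1)

    if variance == 0:
        raise ValueError("log-rank statistic is undefined for these data")
    chi2 = (observed_a - expected_a) ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square upper tail, 1 df
    return chi2, p_value


# Hypothetical toy data: group A dies early, group B dies late, no censoring.
chi2, p_value = logrank_test([1, 2, 3, 4, 5], [1] * 5, [6, 7, 8, 9, 10], [1] * 5)
print(f"chi-square = {chi2:.2f}, P = {p_value:.4f}")
```

Comparing four subtypes at once, as in Fig. 3A, requires the multi-group generalization of this statistic, which is not shown here.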

Robustness of the classifiers
Robust molecular subtyping should be replicable in multiple datasets. We performed de novo clustering in 15 independent ovarian datasets using the authors' original gene lists and clustering methods. We compared these de novo clusters with the labels from our implementation of the published classifiers to assess robustness using the PS statistic (12). For PS estimation, we included validation datasets with at least 100 HGSOC tumors. Overall, we observed low robustness for all classifiers, with PS values under 0.6 for the three algorithms across datasets (Supplementary Fig. S2), none meeting the 0.8 threshold typically indicating robust classes (12, 17).

To assess whether low-confidence predictions are driving the PS estimation, we recomputed the robustness of each algorithm when set to classify varying fractions of the tumors with the highest margins. We used the largest dataset available, the TCGA dataset, as the validation set, and varied margin cutoffs of the Tothill and Konecny classifiers to require them to classify between 25% and 100% of the cases. From 10 random clustering runs, we report the median PS for the dataset. Clustering was performed on the full TCGA dataset, and tumors of low margin values were removed subsequent to clustering and after the classifier was fully defined, in order to avoid optimistically biasing the apparent strength of clusters. We observed that the robustness of each algorithm is substantially improved by preventing it from classifying ambiguous cases. The Tothill algorithm achieved almost perfect robustness (PS = 0.96) when allowed to leave 75% of cases unclassified (Fig. 4).
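The prediction strength comparison described above can be sketched as follows, following the definition of Tibshirani and Walther (12): for each de novo cluster, take the proportion of sample pairs in that cluster that the other classifier also places together, then report the minimum over clusters. The function name and toy labels are hypothetical, and this sketch omits the repeated clustering runs and margin filtering used in the analysis.

```python
from itertools import combinations


def prediction_strength(cluster_labels, classifier_labels):
    """Minimum, over de novo clusters, of the share of within-cluster sample pairs
    that the classifier also assigns to a common class."""
    per_cluster = []
    for cluster in set(cluster_labels):
        members = [i for i, label in enumerate(cluster_labels) if label == cluster]
        pairs = list(combinations(members, 2))
        if not pairs:
            continue  # a single-sample cluster contributes no pairs
        together = sum(1 for i, j in pairs if classifier_labels[i] == classifier_labels[j])
        per_cluster.append(together / len(pairs))
    if not per_cluster:
        raise ValueError("no cluster contains at least two samples")
    return min(per_cluster)


# Hypothetical toy data: de novo clusters versus labels from a published classifier.
de_novo = [0, 0, 0, 1, 1, 1]
published = ["IMR", "IMR", "DIF", "MES", "MES", "MES"]
print(prediction_strength(de_novo, published))
```

Cluster 1 is reproduced perfectly, but only one of the three pairs in cluster 0 stays together, so the minimum is one third, well below the 0.8 threshold cited above.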

Consensus classifier
To maximize concordance across classifiers, we developed consensusOV, a consensus subtyping scheme facilitating classification of tumors of well-defined subtypes (Fig. 5). This classifier uses binary gene pairs (20, 21) to support application across gene expression platforms. The consensusOV classifier exhibits overall pairwise concordance of 67% to 78% with each of the other three algorithms, when classifying all tumors, and 94% concordance with tumors that are concordantly classified by the other three algorithms (Fig. 5A). The margins of consensusOV are higher for concordantly classified cases than for nonconcordantly classified cases, and this difference in margins is greater than for any of the other three classifiers (Fig. 6A). Accordingly, consensusOV was also most effective in identifying concordantly classified cases, although it was similar to the Konecny classifier in this respect (AUC = 0.76; Fig. 6B). As expected, differences in survival of subsets identified by consensusOV are similar to those identified by previous classifiers. The highest risk subtypes are proliferative [HR, 1.44; 95% confidence interval (CI), 1.07–1.94] and mesenchymal (HR, 1.97; 95% CI, 1.46–2.67) when removing 75% of indeterminate low-margin tumors, with similar HRs for the concordant cases (Fig. 5B).
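The concordance figures quoted above are summary statistics of a contingency table. As a hedged illustration, the sketch below recomputes percent concordance and Cramér's V from the counts shown in the fourth panel of Fig. 5A (consensusOV against the tumors concordantly classified by the three published algorithms). The function names are hypothetical, the assignment of classifiers to rows and columns is an assumption (neither statistic changes under transposition), and the uncorrected form of Cramér's V is assumed.

```python
import math


def percent_concordant(table):
    """Percentage of samples on the diagonal of a square contingency table."""
    total = sum(sum(row) for row in table)
    diagonal = sum(table[i][i] for i in range(len(table)))
    return 100.0 * diagonal / total


def cramers_v(table):
    """Uncorrected Cramér's V: sqrt(chi-square / (N * (min(rows, cols) - 1)))."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (observed - expected) ** 2 / expected
    smaller_dim = min(len(row_sums), len(col_sums))
    return math.sqrt(chi2 / (total * (smaller_dim - 1)))


# Counts transcribed from Fig. 5A, fourth panel (n = 798); subtype order IMR, DIF, PRO, MES.
fig5a_concordant = [
    [224, 9, 1, 9],
    [9, 201, 4, 0],
    [0, 3, 131, 1],
    [7, 0, 3, 196],
]
print(f"percent concordant = {percent_concordant(fig5a_concordant):.2f}")
print(f"Cramér's V = {cramers_v(fig5a_concordant):.2f}")
```

The figure annotates this panel with 94.24% concordance and a Cramér's V of 0.93, which the sketch should reproduce up to rounding.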

Figure 4.
Robustness analysis of published classifiers, by prediction strength. In each dataset, concordance was calculated between the published classifier and a classifier retrained on the validation dataset. The TCGA dataset was also classified using the published classifiers of Helland and Konecny (no retraining was done for the classifiers). The TCGA dataset was also clustered using the methods of Tothill and Konecny (in solid and dashed lines, respectively). Samples were removed from prediction strength calculations starting with the most ambiguous samples (with the smallest difference between the top subtype prediction and runner-up subtype prediction); the x-axis shows the percent removed before computing prediction strength. Each algorithm improves in robustness when allowed to leave unclassified the ambiguous samples about whose classification it is less certain.
Axes: PS (0.0 to 1.0) against percent removed; lines: Tothill, Konecny.

Figure 5.
Concordance and survival stratification of consensusOV. A, Contingency plots showing concordance of subtype classification between consensusOV and the classifiers of Helland, Verhaak, and Konecny. The fourth (bottom right) plot shows the concordance between the consensus classifier and the patients concordantly classified between the three classifiers. B, Survival curves for the pooled dataset provided by consensusOV. Classification was performed using leave-one-dataset-out validation. For the bottom two figures, classification with consensusOV was performed with the default cutoff, in which 75% of patients with the lowest margin are not classified.
Panel A annotations (consensus gene-pairs RF): vs. Helland, n = 1,545, 77.80% concordant, Cramér's V = 0.71; vs. Verhaak, n = 1,545, 76.57% concordant, Cramér's V = 0.70; vs. Konecny, n = 1,545, 67.38% concordant, Cramér's V = 0.59; vs. concordant classification, n = 798, 94.24% concordant, Cramér's V = 0.93.
Panel B annotations (combined classifier, gene pairs RF; HR with 95% CI):
Patients with higher margins (n = 396): DIF, 1.131 (0.727–1.759); PRO, 1.444 (1.073–1.943); MES, 1.972 (1.456–2.670); log-rank P = 2.5E−03.
Patients with higher margins, concordant cases (n = 328): DIF, 1.083 (0.678–1.731); PRO, 1.313 (0.969–1.779); MES, 1.730 (1.262–2.371); log-rank P = 4.7E−02.

Figure 6.
Margin analysis. A, Boxplots indicating the margin values assigned by each classifier to concordant and discordant cases. All statistical tests were performed using the Wilcoxon rank-sum test. B, ROC curve for assessing the ability of margin values to discriminate between concordant and discordant cases.
Panel A annotations (P, nonconcordant vs. concordant margin values): Helland, 2.26E−58; Verhaak, 2.96E−17; Konecny, 2.76E−45; consensusOV, 4.29E−132.
Panel B annotations (ROC curves for predicting concordance; true positive rate against false positive rate): consensusOV, AUC 0.763; Helland, AUC 0.748; Konecny, AUC 0.708; Verhaak, AUC 0.647.

Discussion
The existence of four distinct and concordant molecular subtypes of HGSOC has been reported in several studies of large patient cohorts (4–6, 9), but has also been called into question by another effort (2) that could not identify subtypes, and by an independent validation effort that reported only two or three reproducible subtypes (27). Meanwhile, significant effort is being expended to translate these subtypes to clinical practice, for example, to predict response to the angiogenesis inhibitor bevacizumab in the ICON7 trial (28, 29). Our study pursues three major objectives: (i) reproduction of published subtype classification algorithms as an open-source resource; (ii) evaluation of the robustness and prognostic value of each proposed subtyping scheme in independent data; and (iii) consolidation of proposed subtyping schemes into a consensus algorithm.

We find that although the proposed four-subtype classifications demonstrate significant concordance and association with patient survival, none is robust to retraining in new datasets. By modifying any of these algorithms to prevent classification of tumors of ambiguous subtype, robustness and concordance of subtyping algorithms improve dramatically. We propose a "consensus" classifier that can identify the most unambiguously classifiable tumors, although a continuous trade-off exists between classifying more tumors versus having greater confidence in those classified.

Ambiguity in tumor classification might arise from a heterogeneous admixture of different subtypes, or from a more homogeneous composition of indeterminate subtype. This distinction has implications for the therapeutic value of the proposed subtypes. Lohr and colleagues estimated that 90% of tumors in the TCGA HGSOC dataset are polyclonal (30), and clonal spread of HGSOC has been directly inferred from single-nucleus sequencing (31). However, it remains unclear whether multiple clones in a tumor are consistently classifiable to the same subtype. If a tumor consists of multiple clones of different subtypes, then a subtype-specific therapy will likely lead to relapse as other clones survive and continue to grow. If this situation is common, even unambiguously classifiable tumors might be contaminated by small amounts of another subtype that could lead to relapse after subtype-specific therapy. This question could not be resolved by the current datasets, but may eventually be addressed by single-cell RNA sequencing (32), which is expected to further improve precision HGSOC molecular subtyping.

Several findings stand out in the validation of published subtyping algorithms. First, although previous studies reported inconsistent findings on whether subtypes differ by patient survival, our analysis in independent data showed clear survival differences. The 5-year survival rate for patients with different subtypes ranged from as low as 20% to as high as 50%. Second, published algorithms do not meet previously defined standards of robustness in terms of PS, a measure of consistency between subtype classifiers trained in independent datasets. Finally, the concordance of three algorithms, established independently by different research groups from different patient cohorts, is only moderate but can be greatly improved by modifying the original algorithms to allow them to leave ambiguous tumors unclassified. In their original forms, three-way concordance for the four defined classes occurs in 23% to 45% of tumors. When the individual algorithms are modified so that they are allowed to leave ambiguous cases unclassified, the minority of remaining tumors can be classified with over 90% concordance between the three algorithms. This is a novel finding of interest, because an alternate possibility was that classifiers trained on different datasets would suffer low concordance no matter how they treated uncertain tumors. This finding suggests a subset of tumors of "pure" subtype; unfortunately, such unambiguous cases account for as few as 25% of HGSOC tumors. This places important limitations on the potential for clinical application of HGSOC subtypes. The proposed alternative, consensusOV, identifies the consensus of published HGSOC subtype classifiers. By training on multiple datasets, using binary (pairwise greater-than or less-than) relationships between pairs of genes, and using a relatively small gene set, it is designed to identify robustly classifiable HGSOC tumors across gene expression platforms and datasets.
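The binary gene-pair representation mentioned above can be illustrated with a short sketch. The gene names, expression values, and function name are hypothetical, and the monotone transform 2**x merely stands in for a difference between expression platforms. The figure labels indicate that consensusOV feeds such features to a random forest, which is not reproduced here.

```python
from itertools import combinations


def gene_pair_features(expression, genes):
    """One binary feature per gene pair: 1 if the first gene is expressed
    above the second in this sample, otherwise 0."""
    return {(a, b): int(expression[a] > expression[b]) for a, b in combinations(genes, 2)}


# Hypothetical gene set and one tumor measured on two scales.
genes = ["GENE_A", "GENE_B", "GENE_C"]
log_scale = {"GENE_A": 7.2, "GENE_B": 5.1, "GENE_C": 9.4}
linear_scale = {gene: 2 ** value for gene, value in log_scale.items()}

features_log = gene_pair_features(log_scale, genes)
features_linear = gene_pair_features(linear_scale, genes)
print(features_log)
print(features_log == features_linear)
```

Because each feature depends only on the within-sample ordering of two genes, any strictly increasing rescaling of a sample's measurements leaves the features unchanged, which is the property that supports use across platforms.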

Moving forward, general agreement on how molecular subgroups of ovarian cancer are defined would facilitate the use of expression data in clinical management (33). The current subgroups, although prognostically important, are not yet clinically meaningful. Much like other prognostic factors such as age, ascites, and histology, they do not alter clinical management. However, a better understanding of the biology underlying the subgroups should enable more rational targeted treatment of those patients (perhaps first in trials), as seen in HRD tumors with PARP inhibitors. The use of algorithms that can classify the tumor of an individual patient, while allowing some tumors to remain unclassified, along with assessment of subtype robustness in independent datasets by PS, would move the field closer to this goal.

Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.

Authors' Contributions
Conception and design: G.M. Chen, L. Kannan, M. Birrer, B. Haibe-Kains, L. Waldron
Development of methodology: G.M. Chen, L. Kannan, G. Parmigiani, M. Birrer, B. Haibe-Kains, L. Waldron
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): D.M.A. Gendoo, M. Birrer, B. Haibe-Kains
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): G.M. Chen, L. Kannan, Z. Safikhani, G. Parmigiani, M. Birrer, B. Haibe-Kains, L. Waldron
Writing, review, and/or revision of the manuscript: G.M. Chen, L. Kannan, L. Geistlinger, M. Birrer, B. Haibe-Kains, L. Waldron
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): L. Kannan, L. Geistlinger, V. Kofia, Z. Safikhani, B. Haibe-Kains, L. Waldron
Study supervision: B. Haibe-Kains, L. Waldron
Other (developed the MetaGxOvarian package): D.M.A. Gendoo

Acknowledgments
The authors thank Brad Nelson for his feedback regarding the prognostic value of molecular subtypes in HGSOC and Andrew Cherniak for providing ABSOLUTE purity and ploidy estimates for tumors from The Cancer Genome Atlas. G.M. Chen was supported by the Canadian Institutes of Health Research and the Terry Fox Research Institute. D.M.A. Gendoo was supported by the Ontario Institute for Cancer Research through funding provided by the Government of Ontario. Z. Safikhani was supported by The Cancer Research Society (Canada). G. Parmigiani was supported by The National Cancer Institute 5P30CA006516-53. B. Haibe-Kains was supported by the Gattuso-Slaight Personalized Cancer Medicine Fund at Princess Margaret Cancer Centre, the Canadian Institutes of Health Research, and the Terry Fox Research Institute. L. Waldron was supported by grants from the NCI at the NIH (1R03CA191447-01A1 and U24CA180996). This work was part of the immunoTherapy Network supported by the Terry Fox Research Institute (Translational Research Program Grant #1060).

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

Received March 9, 2018; revised May 1, 2018; accepted June 26, 2018; published first July 3, 2018.


References
1. Liu J, Matulonis UA. New strategies in ovarian cancer: translating the molecular complexity of ovarian cancer into treatment advances. Clin Cancer Res 2014;20:5150–6.
2. Bonome T, Levine DA, Shih J, Randonovich M, Pise-Masison CA, Bogomolniy F, et al. A gene signature predicting for survival in suboptimally debulked patients with ovarian cancer. Cancer Res 2008;68:5478–86.
3. Tothill RW, Tinker AV, George J, Brown R, Fox SB, Lade S, et al. Novel molecular subtypes of serous and endometrioid ovarian cancer linked to clinical outcome. Clin Cancer Res 2008;14:5198–208.
4. The Cancer Genome Atlas Research Network. Integrated genomic analyses of ovarian carcinoma. Nature 2011;474:609–15.
5. Verhaak RGW, Tamayo P, Yang J-Y, Hubbard D, Zhang H, Creighton CJ, et al. Prognostically relevant gene signatures of high-grade serous ovarian carcinoma. J Clin Invest 2013;123:517–25.
6. Konecny GE, Wang C, Hamidi H, Winterhoff B, Kalli KR, Dering J, et al. Prognostic and therapeutic relevance of molecular subtypes in high-grade serous ovarian cancer. J Natl Cancer Inst 2014;106:pii:dju249.
7. Waldron L, Haibe-Kains B, Culhane AC, Riester M, Ding J, Wang XV, et al. Comparative meta-analysis of prognostic gene signatures for late-stage ovarian cancer. J Natl Cancer Inst 2014;106:pii:dju049.
8. Planey CR, Gevaert O. CoINcIDE: A framework for discovery of patient subtypes across multiple datasets. Genome Med 2016;8:27.
9. Helland Å, Anglesio MS, George J, Cowin PA, Johnstone CN, House CM, et al. Deregulation of MYCN, LIN28B and LET7 in a molecular subtype of aggressive high-grade serous ovarian cancers. PLoS One 2011;6:e18064.
10. Ganzfried BF, Riester M, Haibe-Kains B, Risch T, Tyekucheva S, Jazic I, et al. curatedOvarianData: clinically annotated data for the ovarian cancer transcriptome. Database 2013;2013:bat013.
11. Cheng X, Lu W, Liu M. Identification of homogeneous and heterogeneous variables in pooled cohort studies. Biometrics 2015;71:397–403.
12. Tibshirani R, Walther G. Cluster validation by prediction strength. J Comput Graph Stat 2005;14:511–28.
13. Gendoo DMA, Ratanasirigulchai N, Chen GM, Waldron L, Haibe-Kains B. MetaGxData: breast and ovarian clinically annotated transcriptomics datasets. bioRxiv 2016 [cited 2017 May 18]. Available from: http://biorxiv.org/content/early/2016/05/12/052910.abstract.
14. Waldron L, Riester M, Ramos M, Parmigiani G, Birrer M. The doppelgänger effect: hidden duplicates in databases of transcriptome profiles. J Natl Cancer Inst 2016;108:djw146.
15. R Core Team. R: a language and environment for statistical computing [Internet]. Vienna, Austria: R Foundation for Statistical Computing; 2014. Available from: http://www.R-project.org.
16. Schröder MS, Culhane AC, Quackenbush J, Haibe-Kains B. survcomp: an R/Bioconductor package for performance assessment and comparison of survival models. Bioinformatics 2011;27:3206–8.
17. Haibe-Kains B, Desmedt C, Loi S, Culhane AC, Bontempi G, Quackenbush J, et al. A three-gene model to robustly identify breast cancer molecular subtypes. J Natl Cancer Inst 2012;104:311–25.
18. Gendoo DMA, Ratanasirigulchai N, Schröder MS, Paré L, Parker JS, Prat A, et al. Genefu: an R/Bioconductor package for computation of gene expression-based signatures in breast cancer. Bioinformatics 2016;32:1097–9.
19. Guinney J, Dienstmann R, Wang X, de Reyniès A, Schlicker A, Soneson C, et al. The consensus molecular subtypes of colorectal cancer. Nat Med 2015;21:1350–6.
20. Paquet ER, Hallett MT. Absolute assignment of breast cancer intrinsic molecular subtype. J Natl Cancer Inst 2015;107:357.
21. Patil P, Bachant-Winner P-O, Haibe-Kains B, Leek JT. Test set bias affects reproducibility of gene signatures. Bioinformatics 2015;31:2318–23.
22. Riester M, Wei W, Waldron L, Culhane AC, Trippa L, Oliva E, et al. Risk prediction for late-stage ovarian cancer by meta-analysis of 1525 patient samples. J Natl Cancer Inst 2014;106:pii:dju048.
23. Carter SL, Cibulskis K, Helman E, McKenna A, Shen H, Zack T, et al. Absolute quantification of somatic DNA alterations in human cancer. Nat Biotechnol 2012;30:413–21.
24. Huber W, Carey VJ, Gentleman R, Anders S, Carlson M, Carvalho BS, et al. Orchestrating high-throughput genomic analysis with Bioconductor. Nat Methods 2015;12:115–21.
25. Xie Y. Dynamic Documents with R and knitr, Second Edition. Boca Raton, FL: CRC Press; 2015.
26. Cramér H. Mathematical Methods of Statistics (PMS-9). Princeton, NJ: Princeton University Press; 2016.
27. Way GP, Rudd J, Wang C, Hamidi H, Fridley BL, Konecny GE, et al. Comprehensive cross-population analysis of high-grade serous ovarian cancer supports no more than three subtypes. G3 2016;6:4097–103.
28. Gourley C, McCavigan A, Perren T, Paul J, Michie CO, Churchman M, et al. Molecular subgroup of high-grade serous ovarian cancer (HGSOC) as a predictor of outcome following bevacizumab. J Clin Oncol 2014;32:5502.
29. Winterhoff B, Kommoss S, Oberg AL, Wang C, Riska SM, Konecny GE, et al. Bevacizumab may differentially improve survival for patients with the proliferative and mesenchymal molecular subtype of ovarian cancer. J Clin Oncol 2014;32:32.
30. Lohr JG, Stojanov P, Carter SL, Cruz-Gordillo P, Lawrence MS, Auclair D, et al. Widespread genetic heterogeneity in multiple myeloma: implications for targeted therapy. Cancer Cell 2014;25:91–101.
31. McPherson A, Roth A, Laks E, Masud T, Bashashati A, Zhang AW, et al. Divergent modes of clonal spread and intraperitoneal mixing in high-grade serous ovarian cancer. Nat Genet 2016;48:758–67.
32. Wu AR, Neff NF, Kalisky T, Dalerba P, Treutlein B, Rothenberg ME, et al. Quantitative assessment of single-cell RNA-sequencing methods. Nat Methods 2014;11:41–6.
33. Waldron L, Riester M, Birrer M. Molecular subtypes of high-grade serous ovarian cancer: the holy grail? J Natl Cancer Inst 2014;106:pii:dju297.
34. Edgar R, Domrachev M, Lash AE. Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res 2002;30:207–10.
35. Yoshihara K, Tajima A, Yahata T, Kodama S, Fujiwara H, Suzuki M, et al. Gene expression profile for predicting survival in advanced-stage serous ovarian cancer across two independent datasets. PLoS One 2010;5:e9615.
36. Denkert C, Budczies J, Darb-Esfahani S, Györffy B, Sehouli J, Könsgen D, et al. A prognostic gene expression index in ovarian cancer - validation across different independent data sets. J Pathol 2009;218:273–80.
37. Mok SC, Bonome T, Vathipadiekal V, Bell A, Johnson ME, Wong K-K, et al. A gene signature predictive for outcome in advanced ovarian cancer identifies a survival factor: microfibril-associated glycoprotein 2. Cancer Cell 2009;16:521–32.
38. Mateescu B, Batista L, Cardon M, Gruosso T, de Feraudy Y, Mariani O, et al. miR-141 and miR-200a act on ovarian tumorigenesis by controlling oxidative stress response. Nat Med 2011;17:1627–35.
39. Dressman HK, Berchuck A, Chan G, Zhai J, Bild A, Sayer R, et al. An integrated genomic-based approach to individualized treatment of patients with advanced-stage ovarian cancer. J Clin Oncol 2007;25:517–25.
40. Karlan BY, Dering J, Walsh C, Orsulic S, Lester J, Anderson LA, et al. POSTN/TGFBI-associated stromal signature predicts poor prognosis in serous epithelial ovarian cancer. Gynecol Oncol 2014;132:334–42.
41. Crijns APG, Fehrmann RSN, de Jong S, Gerbens F, Meersma GJ, Klip HG, et al. Survival-related profile, pathways, and transcription factors in ovarian cancer. PLoS Med 2009;6:e24.
42. Pils D, Hager G, Tong D, Aust S, Heinze G, Kohl M, et al. Validating the impact of a molecular subtype in ovarian cancer on outcomes: a study of the OVCAD Consortium. Cancer Sci 2012;103:1334–41.
43. Bentink S, Haibe-Kains B, Risch T, Fan J-B, Hirsch MS, Holton K, et al. Angiogenic mRNA and microRNA gene expression signature predicts a novel subtype of serous ovarian cancer. PLoS One 2012;7:e30269.
44. Yoshihara K, Tsunoda T, Shigemizu D, Fujiwara H, Hatae M, Fujiwara H, et al. High-risk ovarian cancer based on 126-gene expression signature is uniquely characterized by downregulation of antigen presentation pathway. Clin Cancer Res 2012;18:1374–85.
45. Meyniel J-P, Cottu PH, Decraene C, Stern M-H, Couturier J, Lebigot I, et al. A genomic and transcriptomic approach for a differential diagnosis between primary and secondary ovarian carcinomas in patients with a previous history of breast cancer. BMC Cancer 2010;10:222.


Gregory M. Chen, Lavanya Kannan, Ludwig Geistlinger, et al. Consensus on Molecular Subtypes of High-Grade Serous Ovarian Carcinoma. Clin Cancer Res 2018;24:5037–5047. Published OnlineFirst July 3, 2018. doi: 10.1158/1078-0432.CCR-18-0784.

Supplementary material: http://clincancerres.aacrjournals.org/content/suppl/2018/07/03/1078-0432.CCR-18-0784.DC1
