6
Analysis of gene expression profiles in normal and neoplastic ovarian tissue samples identifies candidate molecular markers of epithelial ovarian cancer John B. Welsh*, Patrick P. Zarrinkar* , Lisa M. Sapinoso*, Suzanne G. Kern*, Cynthia A. Behling , Bradley J. Monk § , David J. Lockhart* , Robert A. Burger § , and Garret M. Hampton* i *Genomics Institute of the Novartis Research Foundation, 3115 Merryfield Row, San Diego, CA 92121; Department of Pathology, University of California, San Diego, CA 92103; Affymetrix, Inc., 3380 Central Expressway, Santa Clara, CA 95051; and § Division of Gynecologic Oncology, Department of Obstetrics and Gynecology, University of California, Irvine–Medical Center, Orange, CA 92868 Communicated by Peter G. Schultz, Genomics Institute of the Novartis Research Foundation, La Jolla, CA, December 1, 2000 (received for review November 2, 2000) Epithelial ovarian cancer is the leading cause of death from gyne- cologic cancer, in part because of the lack of effective early detection methods. Although alterations of several genes, such as c-erb-B2, c-myc, and p53, have been identified in a significant fraction of ovarian cancers, none of these mutations are diagnostic of malignancy or predictive of tumor behavior over time. Here, we used oligonucleotide microarrays with probe sets complementary to >6,000 human genes to identify genes whose expression correlated with epithelial ovarian cancer. We extended current microarray technology by simultaneously hybridizing ovarian RNA samples in a highly parallel manner to a single glass wafer con- taining 49 individual oligonucleotide arrays separated by gaskets within a custom-built chamber (termed ‘‘array-of-arrays’’). Hierar- chical clustering of the expression data revealed distinct groups of samples. Normal tissues were readily distinguished from tumor tissues, and tumors could be further subdivided into major group- ings that correlated both to histological and clinical observations, as well as cell type-specific gene expression. A metric was devised to identify genes whose expression could be considered ideal for molecular determination of epithelial ovarian malignancies. The list of genes generated by this method was highly enriched for known markers of several epithelial malignancies, including ovar- ian cancer. This study demonstrates the rapidity with which large amounts of expression data can be generated. The results highlight important molecular features of human ovarian cancer and identify new genes as candidate molecular markers. E pithelial ovarian cancer has one of the worst prognoses among gynecologic malignancies. The majority of early-stage cancers are asymptomatic, and over three-quarters of the diag- noses are made at a time when the disease has already estab- lished regional or distant metastases. Whereas the 5-yr survival is favorable for women with an early diagnosis (90%), survival with distant metastases over the same time period is less than 20% (1). Most ovarian cancers arise from the surface (coelomic) epithelium that covers the ovary and are thought to progress through premalignant phases before becoming frankly invasive (2). The only validated marker in current use for the manage- ment of ovarian cancer is CA125, which can be detected in the serum of more than 80% of women with ovarian carcinomas (3). However, CA125 is thought to be robust only in following the response or progression of the disease, but not as a diagnostic or prognostic marker (4). Thus, there is an important need for additional diagnostic and prognostic markers for this disease. Molecular genetic analyses of ovarian cancers have uncovered genetic alterations of several genes, such as c-erb-B2, c-myc, and p53, in a significant fraction of tumors (5). Global studies of genomic rearrangements suggest that additional genes involved in ovarian tumor progression correlate to a variable extent with clinical parameters (6). Previously, spotted cDNA filters and microarrays have been used to profile limited numbers of epithelial ovarian adenocarcinomas (7, 8). These studies iden- tified several genes, such as HE4, that may be diagnostic of ovarian cancer and other genes that correlate with serous or mucinous histology. Despite this progress, however, relatively little is known about the cellular changes that correlate with, or determine the different biological properties and diverse behav- ior of, individual ovarian epithelial tumors. For example, it is not known what underlying molecular properties distinguish border- line from invasive tumors. We used oligonucleotide microarrays with probe-sets comple- mentary to more than 6,000 human genes to monitor the levels of expression within aggregate normal and malignant ovarian tissues. To help interpret the observed patterns of gene expres- sion, we selectively macrodissected normal samples into epithe- lial and stromal fractions. We also hybridized RNAs from endothelial and activated B cells to help discern patterns of gene expression consistent with the presence of blood vessels andyor infiltrating immune cells. A new, high-throughput protocol allowed simultaneous hybridization of 49 samples to individual microarrays on a single glass wafer (P.P.Z., J. K. Mainquist, M. Zamora, D. Stern, J.B.W., L.M.S., G.M.H., and D.J.L., unpub- lished data). Data obtained from normal and malignant ovarian tissues allowed us to distinguish these two classes of tissue based on quantitative expression levels, and to propose several genes as candidate diagnostic markers. Expression profiles of different tumors demonstrate significant heterogeneity, but it is nonethe- less possible to identify groups of genes whose altered expression may influence their clinical behavior. Materials and Methods Tissues and Cell Lines. Twenty-seven flash-frozen serous papillary adenocarcinomas of the ovary and three normal samples of whole ovarian tissue were made available through the Cooper- ative Human Tissue Network (CHTN Midwestern Division, Columbus, OH). Institutional Review Board approvals were granted before accession of the tissues. Frozen sections were taken from each tissue block before removing tissue for expres- Abbreviation: RT, reverse transcription. Present address: Aventa Biosciences Corporation, 4757 Nexus Centre Drive, Suite 200, San Diego, CA 92121. i To whom reprint requests should be addressed. E-mail: [email protected]. The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. §1734 solely to indicate this fact. 1176 –1181 u PNAS u January 30, 2001 u vol. 98 u no. 3 Downloaded by guest on July 28, 2021

Analysis of gene expression profiles in normal and neoplastic ...lished data). Three samples (tumor 1, MDAH 2774, and CA-OV-3) were hybridized in duplicate, and the data from the two

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Analysis of gene expression profiles in normal and neoplastic ...lished data). Three samples (tumor 1, MDAH 2774, and CA-OV-3) were hybridized in duplicate, and the data from the two

Analysis of gene expression profiles in normal andneoplastic ovarian tissue samples identifiescandidate molecular markers ofepithelial ovarian cancerJohn B. Welsh*, Patrick P. Zarrinkar*†, Lisa M. Sapinoso*, Suzanne G. Kern*, Cynthia A. Behling‡, Bradley J. Monk§,David J. Lockhart*¶, Robert A. Burger§, and Garret M. Hampton*i

*Genomics Institute of the Novartis Research Foundation, 3115 Merryfield Row, San Diego, CA 92121; ‡Department of Pathology, University of California,San Diego, CA 92103; ¶Affymetrix, Inc., 3380 Central Expressway, Santa Clara, CA 95051; and §Division of Gynecologic Oncology, Department ofObstetrics and Gynecology, University of California, Irvine–Medical Center, Orange, CA 92868

Communicated by Peter G. Schultz, Genomics Institute of the Novartis Research Foundation, La Jolla, CA, December 1, 2000 (received for reviewNovember 2, 2000)

Epithelial ovarian cancer is the leading cause of death from gyne-cologic cancer, in part because of the lack of effective earlydetection methods. Although alterations of several genes, such asc-erb-B2, c-myc, and p53, have been identified in a significantfraction of ovarian cancers, none of these mutations are diagnosticof malignancy or predictive of tumor behavior over time. Here, weused oligonucleotide microarrays with probe sets complementaryto >6,000 human genes to identify genes whose expressioncorrelated with epithelial ovarian cancer. We extended currentmicroarray technology by simultaneously hybridizing ovarian RNAsamples in a highly parallel manner to a single glass wafer con-taining 49 individual oligonucleotide arrays separated by gasketswithin a custom-built chamber (termed ‘‘array-of-arrays’’). Hierar-chical clustering of the expression data revealed distinct groups ofsamples. Normal tissues were readily distinguished from tumortissues, and tumors could be further subdivided into major group-ings that correlated both to histological and clinical observations,as well as cell type-specific gene expression. A metric was devisedto identify genes whose expression could be considered ideal formolecular determination of epithelial ovarian malignancies. Thelist of genes generated by this method was highly enriched forknown markers of several epithelial malignancies, including ovar-ian cancer. This study demonstrates the rapidity with which largeamounts of expression data can be generated. The results highlightimportant molecular features of human ovarian cancer and identifynew genes as candidate molecular markers.

Epithelial ovarian cancer has one of the worst prognosesamong gynecologic malignancies. The majority of early-stage

cancers are asymptomatic, and over three-quarters of the diag-noses are made at a time when the disease has already estab-lished regional or distant metastases. Whereas the 5-yr survivalis favorable for women with an early diagnosis ('90%), survivalwith distant metastases over the same time period is less than20% (1). Most ovarian cancers arise from the surface (coelomic)epithelium that covers the ovary and are thought to progressthrough premalignant phases before becoming frankly invasive(2). The only validated marker in current use for the manage-ment of ovarian cancer is CA125, which can be detected in theserum of more than 80% of women with ovarian carcinomas (3).However, CA125 is thought to be robust only in following theresponse or progression of the disease, but not as a diagnostic orprognostic marker (4). Thus, there is an important need foradditional diagnostic and prognostic markers for this disease.

Molecular genetic analyses of ovarian cancers have uncoveredgenetic alterations of several genes, such as c-erb-B2, c-myc, andp53, in a significant fraction of tumors (5). Global studies ofgenomic rearrangements suggest that additional genes involvedin ovarian tumor progression correlate to a variable extent with

clinical parameters (6). Previously, spotted cDNA filters andmicroarrays have been used to profile limited numbers ofepithelial ovarian adenocarcinomas (7, 8). These studies iden-tified several genes, such as HE4, that may be diagnostic ofovarian cancer and other genes that correlate with serous ormucinous histology. Despite this progress, however, relativelylittle is known about the cellular changes that correlate with, ordetermine the different biological properties and diverse behav-ior of, individual ovarian epithelial tumors. For example, it is notknown what underlying molecular properties distinguish border-line from invasive tumors.

We used oligonucleotide microarrays with probe-sets comple-mentary to more than 6,000 human genes to monitor the levelsof expression within aggregate normal and malignant ovariantissues. To help interpret the observed patterns of gene expres-sion, we selectively macrodissected normal samples into epithe-lial and stromal fractions. We also hybridized RNAs fromendothelial and activated B cells to help discern patterns of geneexpression consistent with the presence of blood vessels andyorinfiltrating immune cells. A new, high-throughput protocolallowed simultaneous hybridization of 49 samples to individualmicroarrays on a single glass wafer (P.P.Z., J. K. Mainquist, M.Zamora, D. Stern, J.B.W., L.M.S., G.M.H., and D.J.L., unpub-lished data). Data obtained from normal and malignant ovariantissues allowed us to distinguish these two classes of tissue basedon quantitative expression levels, and to propose several genesas candidate diagnostic markers. Expression profiles of differenttumors demonstrate significant heterogeneity, but it is nonethe-less possible to identify groups of genes whose altered expressionmay influence their clinical behavior.

Materials and MethodsTissues and Cell Lines. Twenty-seven flash-frozen serous papillaryadenocarcinomas of the ovary and three normal samples ofwhole ovarian tissue were made available through the Cooper-ative Human Tissue Network (CHTN Midwestern Division,Columbus, OH). Institutional Review Board approvals weregranted before accession of the tissues. Frozen sections weretaken from each tissue block before removing tissue for expres-

Abbreviation: RT, reverse transcription.

†Present address: Aventa Biosciences Corporation, 4757 Nexus Centre Drive, Suite 200, SanDiego, CA 92121.

iTo whom reprint requests should be addressed. E-mail: [email protected].

The publication costs of this article were defrayed in part by page charge payment. Thisarticle must therefore be hereby marked “advertisement” in accordance with 18 U.S.C.§1734 solely to indicate this fact.

1176–1181 u PNAS u January 30, 2001 u vol. 98 u no. 3

Dow

nloa

ded

by g

uest

on

July

28,

202

1

Page 2: Analysis of gene expression profiles in normal and neoplastic ...lished data). Three samples (tumor 1, MDAH 2774, and CA-OV-3) were hybridized in duplicate, and the data from the two

sion profiling. Each tumor was then histologically analyzed byone of us (C.A.B.). Clinical and pathologic details of the samplesare tabulated in supplementary Table 1, which is published assupplemental data on the PNAS web site, www.pnas.org. Ovar-ian cancer-derived cell lines, SK-OV-3 [American Type CultureCollection (ATCC) HTB-77], MDAH-2774 (ATCC CRL-10303), and CAOV-3 (ATCC HTB-75), were obtained from theATCC and grown in DMEM (Life Technologies, Rockville,MD), supplemented with 10% (volyvol) FCS and penicillinystreptomycin. RNAs from human umbilical vein endothelial cells(HUVECs) and human activated B cells were gifts from Drs.Akira Kawamura (The Scripps Research Institute, San Diego,CA) and Michael Cooke (Genomics Institute of the NovartisResearch Foundation), respectively. RNA from normal humanovarian tissue was purchased from BioChain Institute (Hayward,CA). RNAs from heart, lung, liver, thymus, testis, and spleenwere purchased from CLONTECH.

Tissue Processing and Preparation of RNA. Fragments of normal andmalignant ovarian tissue were sharp dissected and homogenizedwith a rotary homogenizer (Omni International, Warrenton,VA) in RNeasy lysis buffer (Qiagen, Chatsworth, CA). For eachof the normal tissues (case nos. 102, 233, and 278), surfaceepithelium was selectively procured by macrodissection. In onecase (no. 278), two samples were prepared, one of which wasenriched for epithelium and one for stroma. RNA from each ofthe tissue samples was prepared by using the RNeasy Mini Kit(Qiagen) and used at a final concentration of 2 mgyml.

Hybridization of RNAs to Oligonucleotide ‘‘Array of Arrays.’’ Labelingof samples, hybridization to oligonucleotide arrays, and scanningwere performed essentially as described (9), with modificationsto allow simultaneous processing of samples in 96-well plates.Hybridizations were performed simultaneously in a custom-builtchamber. Briefly, we obtained a glass wafer bearing 49 identicalHuGeneFL arrays before division and packaging as individualchips (Affymetrix, Santa Clara, CA). Gaskets allowed sampleseparation during hybridization on the glass wafer. The separa-tion between arrays was removed for staining, washing, andscanning steps. Arrays on the wafer were scanned in series byusing a custom-made confocal laser scanner with a speciallyconstructed translation stage. The quality and reproducibility ofthe data derived from the glass wafer array-of-arrays are highand will be described elsewhere (P.P.Z., J. K. Mainquist, M.Zamora, D. Stern, J.B.W., L.M.S., G.M.H., and D.J.L., unpub-lished data). Three samples (tumor 1, MDAH 2774, and CA-OV-3) were hybridized in duplicate, and the data from the twocell lines were averaged. The data from duplicate hybridizationsof tumor 1 were kept separate to assess the reproducibility of thedata through cluster analysis (see below). RNA from six normaltissues (lung, liver, heart, spleen, testis, and thymus) was pooledbefore labeling and hybridization. RNA from activated B cellsand HUVECs was hybridized separately to microarrays contain-ing '12,000 human genes (Affymetrix; U95A). To include datafrom B cells and HUVECs in the analysis, we excluded most ofthe information from the U95A arrays and used only thehybridization intensities of genes that were also represented onthe HuGeneFL arrays (see supplementary data).

Data Analysis. Scanned image files were visually inspected forartifacts and analyzed with GENECHIP 3.1 (Affymetrix). Eachimage was then scaled to an average hybridization intensity of200, which corresponds to '3–5 transcripts per cell (9). Theexpression level (average difference) for each gene was deter-mined by calculating the average of differences in intensity(perfect match-mismatch) between its probe pairs (9–11). Geneswith average hybridization intensities ,0 across all samples wereexcluded from further analysis. CLUSTER and TREEVIEW (12)

were used to select, group, and visualize genes whose expressionvaried across the samples with SD $ 250.

To identify potential tumor markers, the hybridization inten-sity of each gene in normal and malignant samples was com-pared, and three different estimates for population differences(difference of means, fold change, and unpaired t test) wereapplied in parallel. The genes were ranked according to eachmetric, and the sum of the metrics was used to derive asemiquantitative estimate of the differential abundance of eachtranscript.

Reverse Transcription (RT)-PCR. One microgram of total RNA fromselected normal and tumor tissue samples (see legend to Fig. 3)was reverse transcribed with random hexamers and thermostableTaqGold polymerase by using the Thermoscript RT-PCR Sys-tem (Life Technologies). One microliter of RT product wasamplified by primer pairs specific for HE4, CD24, and LU(BCAM). Primers specific for 18s ribosomal RNA and empiri-cally determined ratios of 18s competimers (Ambion, Austin,TX) were used to control for the amounts of cDNA generatedfrom each sample. Amplification was carried out by using 1 unitof polymerase in a final volume of 20 ml containing 2.5 mMMgCl2. TaqGold was activated by incubation at 96°C for 12 min,and the reactions were cycled 26–30 times at 95°C for 1 min, 55°Cfor 1 min, and 72°C for 1 min, followed by a final extension at72°C for 10 min. PCR products were visualized on 2% agarosegels stained with ethidium bromide, and images were capturedby an Ultraviolet Products Image Analysis System.

ResultsTo identify patterns of gene expression in normal and malignantovarian tissue, we performed cluster analysis (12) on hybridiza-tion intensity values for each gene. We chose 1,243 genes withaverage hybridization intensities .0 that varied most across thesamples (SD $ 250), to select for genes with strong differentialexpression. Cluster analysis for these genes is shown in Fig. 1 andsupplementary Fig. 4. The complete dataset is available at ourweb site (http:yywww.gnf.orgycanceryovary).

Identification of Sample Groups by Hierarchical Clustering. Thedendrogram illustrated in Fig. 1a showed a major division in thedistribution of samples. In the leftmost columns (colored blue),the 5 normal ovarian tissue samples and a subset of 7 ovariantumors were grouped together. Within this branch, the normaltissues were found to cluster tightly together on a subbranch, andthe independent hybridizations of tumor 1 correlated well witheach other, as expected. The remaining samples formed 2 groups(black and red branches of the dendrogram). One group (redbranches) contained all of the ovarian cancer-derived cell lines,activated B cells, and endothelial cells, as well as a subset of 8 ofthe ovarian tumors. Samples in the middle columns (black)included 12 ovarian tissue samples.

Clusters of Genes Expressed in Normal and Malignant Tissues. Thetight clustering of the normal tissues was largely ‘‘driven’’ by twodistinct profiles of gene expression. The first was represented bya group of about 100 genes that were highly expressed in normaltissue and underexpressed in the tumors and cell lines [some ofwhich are highlighted in Fig. 1c (Normal)]. Many of these areimmediate early genes (e.g., c-fos, jun-B, and EGR-1) and geneswhose expression is stimulated by estrogen in breast cancer celllines (e.g., the family 4 nuclear hormone receptors 1, 2, and 3;unpublished data). The second profile was represented byseveral smaller clusters of genes that were underexpressed innormal tissue as compared with ovarian tumors and cell lines(see below). One of these clusters is expanded in Fig. 1c (Tumor)and includes genes highly expressed in a large fraction of thetumors, such as HE4 and the preferentially expressed antigen of

Welsh et al. PNAS u January 30, 2001 u vol. 98 u no. 3 u 1177

MED

ICA

LSC

IEN

CES

Dow

nloa

ded

by g

uest

on

July

28,

202

1

Page 3: Analysis of gene expression profiles in normal and neoplastic ...lished data). Three samples (tumor 1, MDAH 2774, and CA-OV-3) were hybridized in duplicate, and the data from the two

Fig. 1. Legend on next page.

1178 u www.pnas.org Welsh et al.

Dow

nloa

ded

by g

uest

on

July

28,

202

1

Page 4: Analysis of gene expression profiles in normal and neoplastic ...lished data). Three samples (tumor 1, MDAH 2774, and CA-OV-3) were hybridized in duplicate, and the data from the two

melanoma gene (PRAME). These two genes were highly ex-pressed in 14y27 and 15y27 of the tumors, respectively. Othersimilar clusters contained genes known to be amplified inovarian cancer, such as c-erb-B2, which was elevated in about30% of the tumors (see supplementary data), consistent withprevious reports (13).

Coclustering of normal and tumor tissues within the leftmostbranch of the dendrogram (colored blue) was due, in part, tohigh expression of a very large group of ribosomal genes, 17 ofwhich are shown in detail in Fig. 1c (Ribosomal cluster).Expression of these genes was not entirely confined to this subsetof tumors, however, because other tumors within the middlebranch (black) of the dendrogram (e.g., tumors 15, 24, and 31)also expressed some of these genes at high levels. Eight of theovarian tumor samples were found to cluster close to the groupof cell lines grown in vitro. This group, which is illustrated by theProliferative cluster (Fig. 1c), showed coordinate expression ofmultiple genes associated with cell cycle, such as cdc28 proteinkinases 1 and 2, cdc25B, and cdc20. These results suggestedthat this subset of tumors contained a high S-phase fraction(SPF), which is thought to be predictive of aggressive tumorbehavior (14).

Identification of Cell Type-Specific Gene Expression. Selective en-richment of the normal ovarian tissues into epithelial (102, 233,and 278-E) and stromal (278-S) fractions facilitated identifica-tion of genes expressed within one or both types of cells. Forexample, genes encoding members 1 and 2 of the family 4 nuclearhormone receptors, the zinc-finger transcription factor AREB6,and the early growth response protein hEGR3 (PILOT) werehighly expressed in the three epithelial-enriched samples but notin 278-S [Fig. 1c (Normal)]. The same enrichment also allowedus to identify groups of genes that encode products characteristicof fibroblasts and myofibroblasts [e.g., tropomyosin, entericsmooth muscle g-actin, and skeletal muscle LIM protein,SLIM1; Fig. 1c (Stromal)]. Based on this profile, several of thetumors (e.g., tumors 14, 15, 17, and 21; Fig. 1) were found tocontain a significant amount of stroma, which was consistentwith histological analysis. Normal human ovary purchased from

BioChain was not enriched for any specific cell type before RNApreparation, and, accordingly, we observed strong expressionsignals from genes that we determined to be specific to theepithelium and stroma.

A cluster of genes whose annotations were characteristic ofexpression within activated B cells (e.g., Ig genes) was alsoidentified. Although activated B cells were hybridized on thearrays to discern B cell expression patterns within the tumors,little correlation was found using this approach (not shown; seeclusters in supplementary data). Based on gene annotationsalone, several tumors (e.g., tumors 24, 25, and 31) displayed highexpression of B cell genes, consistent with infiltrating lympho-cytes seen on histology.

Molecular Distinctions Between Low and High Malignant PotentialOvarian Cancers. Five of seven of the tumors that clustered withnormal tissues (leftmost columns of Fig. 1a) were characterizedby histology as well-differentiated lesions, with no infiltratinglymphocytes. In contrast, tumor samples clustering with the celllines (rightmost group in Fig. 1a) were mostly (4y7) poorlydifferentiated. Samples in the middle columns showed varyingproportions of lymphocytes and necrosis, with generally lowerproportions of tumor cells and higher amounts of stroma (seesupplementary data).

Candidate Molecular Markers of Epithelial Ovarian Cancer. We nextsought genes whose expression pattern could be considereddiagnostic of epithelial ovarian cancer. Good tumor markersfulfill several criteria, including low expression in normal tissueand high expression in neoplastic tissue. Markers should alsoexhibit a clear cutoff in expression levels between normal andneoplastic tissues to unambiguously resolve the two diagnosticconditions. As illustrated above, some of the tumor samplesexhibited strong signals from genes characteristic of stromaandyor infiltrating immune cells. This observation suggested thatthey contained a low fraction of epithelial cells, which wasconfirmed by histology. Because we sought genes whose expres-sion was diagnostic of malignant epithelium, 14 of the 27 tumorswere excluded from this analysis based on their ‘‘non-epithelial’’

Fig. 2. Expression levels of selected genes in normal and malignant ovarian tissues. Shown are the hybridization signals of 30 transcripts, which were rankedhighest by the “diagnostic” metric (see Meterials and Methods). Each gene is represented by two mean values derived from the expression level in 24 malignant(red) and four normal (blue) samples. Error bars 5 99% confidence intervals. Expression in a pool of six normal tissues is shown for comparison as horizontal greenbars. Gene names correspond to Human Genome Organization (HUGO) Gene Nomenclature Committee recommendations.

Fig. 1. Expression patterns of 1,243 genes in 38 experimental samples. Rows represent individual genes; columns represent individual samples. Each cell in thematrix represents the expression level of a single transcript in a single sample, with red and green indicating transcript levels above and below the median forthat gene across all samples, respectively. Color saturation is proportional to magnitude of the difference from the mean. Gray squares, data not available. (a)Dendrogram of samples showing overall similarity in gene expression profiles across the samples. Three main subdivisions, colored blue, black, and red, arediscussed in the text. (b) Demonstration of overall groupings of genes and samples. Colored bars to the right indicate gene clusters of special interest. (c) Enlargedview of selected clusters of genes with common tissue distribution (Tumor, Stromal, Normal) or biochemical role (Proliferative, Ribosomal).

Welsh et al. PNAS u January 30, 2001 u vol. 98 u no. 3 u 1179

MED

ICA

LSC

IEN

CES

Dow

nloa

ded

by g

uest

on

July

28,

202

1

Page 5: Analysis of gene expression profiles in normal and neoplastic ...lished data). Three samples (tumor 1, MDAH 2774, and CA-OV-3) were hybridized in duplicate, and the data from the two

expression patterns (see Discussion). To estimate how well atranscript level might correlate with malignancy, we comparedthe groups of normal (n 5 4) and tumor samples (n 5 13) withrespect to the differences and fold changes in hybridizationintensities. The significance of the differential expression wasmeasured by using the Student unpaired t test. These parameterswere ranked, and each gene was then given a score based on thesum of the ranks. The full list of genes is provided on our website, and the expression levels of the thirty most highly scoringgenes are shown in Fig. 2. Each gene’s expression level in a poolof six normal tissues is also shown for comparison. The genesshown in Fig. 2 were all strongly differentially expressed, with Pscores ranging from 2.5 3 1024 (MUC-1) to 6.5 3 1029 (TAC-STD1yGA733-2).

Validation of Tumor-Specific Gene Expression. We validated differ-ential expression of genes discovered by the ranking method intwo distinct ways. First, fragments of CD24, HE4, and LU wereamplified by RT-PCR from the RNAs of three epithelial-enriched normal tissues, nine randomly selected ovarian tumortissues, and the ovarian cancer cell line CAOV-3. The results ofthese experiments demonstrated overexpression of three genesin tumor tissues relative to normal ovarian epithelium (Fig. 3).Second, we queried the National Center for BiotechnologyInformation (NCBI) ‘‘Gene-to-tag’’ databases, available throughUniGene (http:yywww.ncbi.nlm.nih.govyUniGeney), for geneexpression patterns of these same three genes in tumor cells andtissues. LU and HE4 were highly expressed in 4y4 and 2y3primary ovarian tumors, respectively, as well as in the ovariancancer cell line OVCA432-2. CD24 was highly up-regulated inonly one of three ovarian tumors, but also highly expressed in twobreast lesions and in a primary medulloblastoma.

DiscussionTo identify candidate molecular markers of epithelial ovariancancers, we used oligonucleotide microarrays to monitor theexpression levels of .6,000 human genes in 27 tumor and fournormal tissue samples. Whereas microarrays have now becomean established tool with which to study gene expression patternsin human cancer (15), the process of carrying out array exper-iments one sample at a time is limiting. Here, we addressed thisissue by performing all of the sample hybridizations simulta-neously on a single glass wafer containing 49 individual humanHuGeneFL oligonucleotide arrays in a custom-built chamber.The physical process of tissue preparation, isolation of RNAs,

labeling, hybridization, washing, and reading the arrays wascompleted in under 1 wk. Correlation coefficients betweensamples hybridized in duplicate on the array-of-arrays and thosehybridized in duplicate on single arrays were better in manyinstances, most likely because all of the chips on the glass waferwere handled identically (1). This method minimized some of thevariation inherent in microarray experiments.

Hierarchical clustering of the samples and gene expressionlevels within the samples led to the unambiguous separation ofnormal and malignant tissues, as well as the identification ofthree subsets of ovarian tumor tissue samples. One group oftumors segregated with normal tissues and coordinately ex-pressed high levels of a large number of ribosomal genes.Overexpression of such genes in tumors has been previouslynoted (16) and may reflect a higher metabolic rate withinmalignant cells. However, rapidly growing tumor cell lines andthe most poorly differentiated tumors, which were groupedtogether, showed relative underexpression of these genes, de-spite their presumed high metabolic rates. It is of note that highexpression of ribosomal genes was correlated with tumors thatwere histologically well differentiated (i.e., tumors within theleftmost branch (blue) and some of those within the mid-branchof the dendrogram, e.g., samples 15 and 31 (black branch); Fig.1c). A larger study is warranted to examine whether highexpression of ribosomal proteins is correlated with less aggres-sive ovarian tumors.

Another group of 12 tumors (center columns in Fig. 1)exhibited expression patterns that are most likely explained bythe presence of non-epithelial cell types, particularly stroma andactivated B cells in the tissue examined, rather than to featuresof the malignant cells. This conclusion was based on the iden-tification of overlapping patterns of gene expression in several ofthese tumors, with gene expression observed in selectivelyenriched normal tissue. Selective tissue enrichment was moreeffective in yielding cell-specific gene profiles than activated Bcells or endothelial cells grown in culture. In part, this differenceis probably due to the fact that cells in culture are rapidlygrowing, and a large fraction of the expression data reflects thatprocess. Cell growth within normal tissues is considerably slower,and expression patterns are probably more indicative of the celltypes under study. It is also important to note that we havediscussed only a small fraction of the genes here that vary withinand among samples and that there are many other genes whoseexpression reflects the true complexity of these aggregate cancertissues. The use of single cell lines or strains (e.g., B cells) ascontrols to discern cell type-specific gene expression is, ofcourse, a gross simplification, but nonetheless provides a firstapproximation of that complexity and, by extension, allowed usto discern expression within malignant epithelial cells. Also,profiling of the cell lines was informative, because it resulted ina third partition of the tumor samples. The malignancies withinthis group coexpressed high levels of genes associated with cellcycle, and it is noteworthy that four of the seven tumors in thisgroup were poorly differentiated on histology. This coincidentexpression may reflect a higher S-phase fraction within themalignant cells, which has been correlated with generally moreaggressive tumor behavior (14).

The principal goal of this study was to exploit the potentialclinical usefulness of these data by identifying genes whose highexpression correlated with malignancy and which could beconsidered candidate molecular markers of ovarian cancer. Weused a combination of several parameters (see Materials andMethods and Results) to rank genes that were lowly expressed innormal tissues and highly expressed in cancer tissues, with arange of expression that was limited and nonoverlapping. Whenthis procedure was initially carried out with all of the 27 tumorsthat were profiled, the list of genes contained many that arecharacteristic of stromal tissue andyor infiltrating B cells. After

Fig. 3. Validation of ovarian tumor-specific gene expression. RT-PCR wasused to amplify fragments of CD24 (a), HE4 (b), and LU (c) from RNAs of thenormal, stromal-enriched fraction of sample 278 (lane 1), the three normalepithelial-enriched ovarian samples (lanes 2–4), nine randomly selected tu-mor tissues (lanes 5–13), the ovarian cancer cell line CAOV-3 (lane 14), andwater as a control (lane 15). 18s rRNA amplified from the same samples showssimilar loading in each lane (d).

1180 u www.pnas.org Welsh et al.

Dow

nloa

ded

by g

uest

on

July

28,

202

1

Page 6: Analysis of gene expression profiles in normal and neoplastic ...lished data). Three samples (tumor 1, MDAH 2774, and CA-OV-3) were hybridized in duplicate, and the data from the two

excluding 14 tumors based on high expression of genes withinnon-epithelial cells, the list of genes was highly ‘‘enriched’’ forknown or suspected makers of epithelial malignancy (Fig. 2).The most highly ranked gene, CD24, was first proposed as atherapeutic target for lung cancer (17, 18), and more recently asa diagnostic marker for breast cancer (19). This gene has beenreported to have a role in increasing the motility of breast cancercells (20). Another highly ranked gene, HE4, a secreted extra-cellular protease inhibitor, has been shown to be overexpressedin a significant fraction of ovarian cancers and lowly expressedin other tissues (7, 8). The product of this gene has beenproposed as a diagnostic marker of ovarian cancers (8). CD9,which was also found to be overexpressed in ovarian tumors, issimilarly overexpressed in carcinomas of the colon (21) and lung(22), and may participate in tumor invasion (23). Other highlyranked genes included the tumor-associated antigen GA733-2(TACSTD1), cytokeratins 7, 8, 18, and 19, and MUC-1 (Fig. 2).Several of these have been reported as highly expressed in avariety of epithelial cancers, including ovarian carcinoma. Theexpression of the highest-ranked genes was also assessed in apool of six normal human tissues to determine levels of expres-sion in tissues other than ovary (depicted by the green lines inFig. 2). The results suggest that the expression of many of thesegenes is low in normal tissue and confirms the restrictedexpression of genes such as HE4 (8). Exceptions included theubiquitin carrier protein 2 (UCP2), Pax8, and the epidermalgrowth factor receptor (ERBB3), which were expressed in one ormore of the pooled tissues at higher levels than in ovariantumors.

We found a tentative molecular distinction between lessaggressive, borderline tumors and rapidly dividing malignanciesidentified by differential expression of ribosomal and cell-cyclegenes. However, profiling a larger cohort of tumors with definedoutcomes is required to identify whether any of these genes orgene clusters can be used to determine tumor behavior overtime. We are also aware that changes in mRNA abundance maynot always correlate with changes in protein levels or activity,and we have not investigated whether the protein levels of CD24or HE4, for example, are high in ovarian tumors. However, manyof the genes identified as overexpressed in this study have beenused as immunohistochemical markers in other tumor types, andthus correlations between changes in transcript and proteinlevels may exist for other genes that we identified.

An important aspect of the work presented here is theidentification of multiple genes of interest, which was affordedby the highly parallel monitoring of gene expression levels in asufficiently large number of cancers. This enabled stringentcriteria to be imposed to identify ‘‘idealized’’ candidates. Theidentification of many candidates by this approach increases theprobability that we can reduce to practice the clinical utility ofone or more gene products in ovarian cancers. In general, webelieve that this rapid and systematic approach toward markeridentification will be useful in finding genes of interest frommany cancer types and that knowledge of the function of thegene products is likely to provide significant insight intoneoplasia.

We thank the Cooperative Human Tissue Network for tissue samples,Drs. Michael Cooke and Akira Kawamura for data and valuablediscussions, and Don Bambico for excellent technical assistance.

1. Gatta, G., Lasota, M. B. & Verdecchia, A. (1998) Eur. J. Cancer 34, 2218–2225.2. Chuaqui, R. F., Cole, K. A., Emmert-Buck, M. R. & Merino, M. J. (1998) Ann.

Diagn. Pathol. 2, 195–207.3. Niloff, J. M., Klug, T. L., Schaetzl, E., Zurawski, V. R., Jr., Knapp, R. C. & Bast,

R. C., Jr. (1984) Am. J. Obstet. Gynecol. 148, 1057–1058.4. Meyer, T. & Rustin, G. J. (2000) Br. J. Cancer 82, 1535–1538.5. Aunoble, B., Sanches, R., Didier, E. & Bignon, Y. J. (2000) Int. J. Oncol. 16,

567–576.6. Suzuki, S., Moore II, D. H., Ginzinger, D. G., Godfrey, T. E., Barclay, J.,

Powell, B., Pinkel, D., Zaloudek, C., Lu, K., Mills, G., et al. (2000) Cancer Res.60, 5382–5385.

7. Ono, K., Tanaka, T., Tsunoda, T., Kitahara, O., Kihara, C., Okamoto, A.,Ochiai, K., Takagi, T. & Nakamura, Y. (2000) Cancer Res. 60, 5007–5011.

8. Schummer, M., Ng, W. V., Bumgarner, R. E., Nelson, P. S., Schummer, B.,Bednarski, D. W., Hassell, L., Baldwin, R. L., Karlan, B. Y. & Hood, L. (1999)Gene 238, 375–385.

9. Wodicka, L., Dong, H., Mittmann, M., Ho, M. H. & Lockhart, D. J. (1997) Nat.Biotechnol. 15, 1359–1367.

10. Lipshutz, R. J., Fodor, S. P., Gingeras, T. R. & Lockhart, D. J. (1999) Nat.Genet. 21, 20–24.

11. Lockhart, D. J., Dong, H., Byrne, M. C., Follettie, M. T., Gallo, M. V., Chee,M. S., Mittmann, M., Wang, C., Kobayashi, M., Horton, H. & Brown, E. L.(1996) Nat. Biotechnol. 14, 1675–1680.

12. Eisen, M. B., Spellman, P. T., Brown, P. O. & Botstein, D. (1998) Proc. Natl.Acad. Sci. USA 95, 14863–14868.

13. Schmitt, J. F., Susil, B. J. & Hearn, M. T. (1996) Growth Factors 13,19–35.

14. Bakshi, N., Rajwanshi, A., Patel, F. & Ganguly, N. K. (1998) Anal. Quant. Cytol.Histol. 20, 215–220.

15. Liotta, L. & Petricoin, E. (2000) Nat. Rev. Genet. 1, 48–56.16. Alon, U., Barkai, N., Notterman, D. A., Gish, K., Ybarra, S., Mack, D. &

Levine, A. J. (1999) Proc. Natl. Acad. Sci. USA 96, 6745–6750.17. Jackson, D., Waibel, R., Weber, E., Bell, J. & Stahel, R. A. (1992) Cancer Res.

52, 5264–5270.18. Olabiran, Y., Ledermann, J. A., Marston, N. J., Boxer, G. M., Hicks,

R., Souhami, R. L., Spiro, S. G. & Stahel, R. A. (1994) Br. J. Cancer 69,247–252.

19. Fogel, M., Friederichs, J., Zeller, Y., Husar, M., Smirnov, A., Roitman, L.,Altevogt, P. & Sthoeger, Z. M. (1999) Cancer Lett. 143, 87–94.

20. Aigner, S., Ramos, C. L., Hafezi-Moghadam, A., Lawrence, M. B., Friederichs,J., Altevogt, P. & Ley, K. (1998) FASEB J. 12, 1241–1251.

21. Cajot, J. F., Sordat, I., Silvestre, T. & Sordat, B. (1997) Cancer Res. 57,2593–2597.

22. Higashiyama, M., Doi, O., Kodama, K., Yokouchi, H., Adachi, M., Huang,C. L., Taki, T., Kasugai, T., Ishiguro, S., Nakamori, S., et al. (1997) Int. J. Cancer74, 205–211.

23. Hirano, T., Higuchi, T., Katsuragawa, H., Inoue, T., Kataoka, N., Park, K. R.,Ueda, M., Maeda, M., Fujiwara, H. & Fujii, S. (1999) Mol. Hum. Reprod. 5,168–174.

Welsh et al. PNAS u January 30, 2001 u vol. 98 u no. 3 u 1181

MED

ICA

LSC

IEN

CES

Dow

nloa

ded

by g

uest

on

July

28,

202

1