PRECLINICAL STUDY
Evidence for a transcriptional signature of breast cancer
Yumei Feng · Xiaoqing Li · Baocun Sun · Yuli Wang · Lina Zhang · Xiuhua Pan · Xiaohui Chen · Xiaoyan Wang · Jinfeng Wang · Xishan Hao
Received: 5 April 2009 / Accepted: 6 August 2009 / Published online: 2 September 2009
© Springer Science+Business Media, LLC. 2009
Abstract Cancer arises from a step-wise accumulation of
genetic and epigenetic changes in oncogenes and tumor
suppressor genes, followed by changes in transcription and
protein profiles. To identify the intrinsic transcriptional
features of breast cancer and to explore in more detail the
molecular basis of breast carcinogenesis, genes differen-
tially expressed between cancers and their paired normal
breast samples in nine breast cancer patients were screened
using microarray. Nine normal breast tissues and 49 breast
cancer tissue samples were then clustered based on the set
of differentially expressed genes. A transcriptional signa-
ture of breast cancer consisting of 188 differentially
expressed genes was identified. This signature allowed the
normal breast tissues to be distinguished from all of the
breast cancer samples, and primary breast cancers could be
classified into two phenotype-associated subgroups with
different ER status and clinical outcome. Furthermore, the
classification accuracy of the set of differentially expressed
genes was validated in publicly available breast microarray
data. Moreover, the differentially expressed genes
could be grouped into five subclusters involved in different
biological processes of carcinogenesis. Most genes in a
given subcluster interacted within an independent subnet-
work, and subnetworks could cross-talk through a set of
signal molecules. Thus, the transcriptional signature iden-
tified here may be an intrinsic feature of breast cancer, and
it may contribute to the molecular basis of breast carcinogenesis
and the different phenotypes of breast cancer.
Keywords Breast cancer · Gene expression profiling · Diagnosis
Introduction
Cancer arises from a step-wise accumulation of genetic and
epigenetic changes in oncogenes and tumor suppressor
genes, followed by changes in transcription and protein
profiles. Sporadic breast cancers show a wide variety of
genomic alterations, and no specific genetic mutation
profile has been reported in sporadic breast cancer. How-
ever, cancer cells seem to share a common set of molecular
pathways that may govern the genesis of most types of
human cancers [1]. Profiles of transcription and translation
have been shown to undergo specific changes among different kinds of
cancer as a result of sequential mutation and signal
amplification, and these differences can distinguish cancers
from normal tissues [2–4]. Moreover, the different gene
expression profiles are likely to reflect distinct tumor subtypes
involving different phenotypes and clinical features
[5]. Changes in the expression level of cancer-related genes
occur much earlier than morphological changes, and the
expression changes lead to different degrees of differentiation
of tissue cells. A characteristic expression profile may
help pathologists and oncologists diagnose cancer before
morphologic changes can be observed.
Electronic supplementary material The online version of this article (doi:10.1007/s10549-009-0505-z) contains supplementary material, which is available to authorized users.
Y. Feng · J. Wang (✉)
School of Pharmaceutical Science and Technology,
Tianjin University, 300072 Tianjin, China
e-mail: [email protected]
Y. Feng · X. Li · Y. Wang · L. Zhang · X. Pan · X. Chen · X. Wang
Department of Biochemistry and Molecular Biology, Tianjin
Medical University Cancer Institute and Hospital, Tianjin, China
Y. Feng · X. Li · B. Sun · X. Hao (✉)
Breast Cancer Prevention and Treatment Key Laboratory
of the Ministry of Education, Tianjin Medical University Cancer
Institute and Hospital, 300060 Tianjin, China
e-mail: [email protected]
B. Sun
Department of Pathology, Tianjin Medical University Cancer
Institute and Hospital, Tianjin, China
Breast Cancer Res Treat (2010) 122:65–75
DOI 10.1007/s10549-009-0505-z
Transcriptional profiling involves differential mRNA
expression of genes related to different aspects of carcino-
genesis, such as cell growth and metabolism, cell differ-
entiation, signal transduction and transcription regulation,
cell adhesion and migration, and immune surveillance. The
high-throughput microarray provides a powerful tool for
detecting the expression of thousands of genes simulta-
neously and gaining insights into the underlying molecular
mechanism of carcinogenesis. New molecular tumor
markers can potentially be used for more accurate diagnosis
and drug targets for effective individualized therapy.
To explore the molecular basis of breast carcinogenesis,
we selected nine breast cancer patients, each with a tumor
with a diameter smaller than 5 cm and without lymph node
metastasis. Differences in expression between the cancers and their
paired normal breast samples were analyzed using a high-density
microarray, and the nine normal breast samples
and 49 breast cancer samples were clustered based on the
set of differentially expressed genes. Public microarray data
were used to validate the classification accuracy of the set
of differentially expressed genes. To confirm the genes
obtained by microarray, kallikrein 5 (KLK5), kallikrein 7
(KLK7), small inducible cytokine subfamily B (Cys-X-Cys)
member 10 (CXCL10), collagen type XI alpha 1
(COL11A1), matrix metalloproteinase 3 (MMP3), and os-
teoglycin (OGN) were detected by real-time RT-PCR
analysis to estimate expression differences among 30 nor-
mal breast samples, 30 benign tumor samples, and 30
invasive ductal carcinoma samples.
Materials and methods
Clinical samples
All samples of normal breast tissues, benign breast
tumors, and invasive breast cancers were collected from
patients who underwent complete dissection of the breast
and axillary lymph nodes (breast cancer patients) or local
tumorectomy (patients with benign breast disease) at
Tianjin Cancer Hospital, China, between January 2002
and June 2003. Forty-nine cases with a tumor with
diameter below 5 cm were selected for microarray anal-
ysis; this group consisted of 29 node-negative cases and
20 node-positive cases. In these 49 cases, primary cancers
and paired normal breast tissues were collected in nine
lymph node-negative cases; only primary cancer samples
were obtained from the other 40 cases. In addition, RNA
was extracted and pooled equally from normal breast
tissue taken from 32 patients with benign or malignant
breast disease as control RNA. Thirty breast cancer
patients (15 node-negative and 15 node-positive cases)
and 30 patients with a benign adenoma were selected for
real-time RT-PCR. Twelve cases were used in both
microarray and real-time RT-PCR analyses.
All of the breast cancers were confirmed as invasive
ductal carcinoma by hematoxylin–eosin (H&E) staining.
Normal breast tissues were defined as breast tissues lying
more than 5 cm away from the edge of tumors and were
confirmed normal by pathologic analysis. Tissue samples
were snap-frozen in liquid nitrogen and stored at -80°C.
ER and PR expression were determined as positive when
more than 15% of the nuclei showed staining by immu-
nohistochemical staining. Her2 was defined as positive
when more than 10% of the membrane showed staining by
immunohistochemical assay. The use of these tissues was
approved by the Institutional Review Board and the
Research Ethics Committee of Tianjin Medical University.
RNA extraction
RNA was extracted with TRIZOL reagent (Invitrogen,
Gaithersburg, MD, USA) and purified with the RNeasy
mini kit (Qiagen, Valencia, CA, USA) according to the
manufacturer’s instructions. RNA quality was assessed by
formaldehyde agarose gel electrophoresis and quantified
spectrophotometrically.
Preparation of fluorescent dye-labeled DNA
and hybridization
DNA labeled with fluorescent dye (cy5-dCTP and cy3-dCTP)
was produced by Eberwine’s linear RNA amplification
method [6] and subsequent enzymatic reaction as previously
described [7]. In detail, double-stranded (ds) cDNA containing
the T7 RNA polymerase promoter sequence
(5′-AAACGACGGCCAGTGAATTGTAATACGACTCACTATAGGCGC-3′)
was synthesized with 10 μg of total RNA using the
cDNA Synthesis System Kit according to the manufacturer’s
protocol (TaKaRa, Dalian, China). A T7-OligodT
primer
(5′-AAACGACGGCCAGTGAATTGTAATACGACTCACTATAGGCGCTTTTTTTTTTTTTTTTTV-3′)
was used. Half of the eluted dscDNA product was subjected to an
in vitro transcription reaction using T7 RiboMAX Express
Large Scale RNA Production System (Promega, Madison,
WI, USA) [8]. The amplified RNA (aRNA) was purified
with the RNeasy Mini kit (Qiagen) and labeled with a ran-
dom primer labeling kit (TaKaRa) [9]. All samples were
labeled with Cy5, and the control, pooled from 32 normal
breast tissues, was labeled with Cy3. The labeled DNA was
purified, resuspended in Elution Buffer, and quantified.
Labeled control and test samples were quantitatively
adjusted based on the efficiency of Cy-dye incorporation and
mixed with 30 μl of hybridization solution (3× SSC, 0.2%
SDS, 25% formamide, and 5× Denhardt’s). DNA in the
hybridization solution was denatured at 95°C for 3 min prior
to loading on a microarray. The human long oligonucleotide
microarray was constructed by CapitalBio Corporation
(Beijing, China). The microarray consists of 5′-amino-modified
70-mer probes representing 21,329 well-characterized
human genes purchased from the Operon Company
(www.operon.com). The array was hybridized at 42°C
overnight and washed consecutively with two
washing solutions: 0.2% SDS, 2× SSC at 42°C for 5 min,
followed by 0.2% SSC for 5 min at room temperature.
Imaging and data analysis
Arrays were scanned with a ScanArray Express Scanner
(Packard Bioscience, Kanata, OT, USA), and the images
obtained were analyzed with GenePix Pro 4.0 (Axon
Instruments, Foster City, CA, USA). Normalization was
performed using the LOWESS program [10]. Genes with
Cy3 and/or Cy5 intensity values higher than 100 were
considered to be expressed genes. The Cy5 intensity value
of each gene was divided by the Cy3 intensity value (the
control), in order to determine a relative expression level
for the gene in the sample tissues relative to the control.
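The expression call and the ratio calculation described above can be sketched in a few lines of Python. This is a minimal illustration only: the gene names and intensity values are hypothetical, the function names are not taken from the original analysis software, and the intensities are assumed to be already normalized.

```python
from typing import Optional

INTENSITY_CUTOFF = 100.0  # cutoff stated in the text for calling a gene expressed


def is_expressed(cy5: float, cy3: float, cutoff: float = INTENSITY_CUTOFF) -> bool:
    """A gene counts as expressed if the Cy3 and/or Cy5 intensity exceeds the cutoff."""
    return cy5 > cutoff or cy3 > cutoff


def relative_expression(cy5: float, cy3: float) -> Optional[float]:
    """Return the Cy5/Cy3 ratio (sample over pooled control).

    Returns None when the gene is not expressed or the control intensity
    is not positive, so that no ratio is reported for unreliable spots.
    """
    if not is_expressed(cy5, cy3) or cy3 <= 0:
        return None
    return cy5 / cy3


# Hypothetical normalized intensities: gene -> (Cy5 sample, Cy3 control)
spots = {
    "GENE_A": (800.0, 200.0),
    "GENE_B": (60.0, 40.0),
    "GENE_C": (50.0, 400.0),
}
ratios = {gene: relative_expression(cy5, cy3) for gene, (cy5, cy3) in spots.items()}
```

Returning None for unexpressed spots, rather than a ratio of two near-background values, is a design choice of this sketch and not something the text specifies.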
Definition of differentially expressed genes
and classification of samples
A gene was considered to be significantly differentially
expressed (over-expressed or under-expressed), if the ratio
of the expression level in the cancer sample to the expression
level in normal breast tissues was higher than 4.0-fold (or lower
than 0.25-fold) in more than six of nine cases showing the same
trend in expression (up-regulation or down-regulation). In
addition, a paired-samples two-sided t-test was performed
for each of these genes. Genes with no significant P value
(P > 0.05) were excluded. All of the differentially expressed
genes were rank-ordered on the basis of the P value of the
paired t-test between breast cancers and their paired normal
breast samples. Then, the set of differentially expressed genes was
optimized by sequential backward selection from the bottom
of this rank-ordered list, evaluating its power for correct
classification at each step using leave-one-out cross validation. The
optimal differentially expressed gene set was reached when
the error rate was minimal. The leave-one-out cross validation
procedure was carried out using GeneCluster 2.0 software
(http://www.broad.mit.edu/cancer/software/genecluster2/gc2.html).
The functions of these differentially expressed
genes were retrieved using GoMiner software
(http://discover.nci.nih.gov/gominer/) [11]. The interactions between
molecules were retrieved using String 8.0
(http://string.embl.de/). Cluster 3.0 and Treeview software (Stanford
University) were used to carry out average linkage cluster-
ing. Samples were clustered based on the genes differen-
tially expressed between breast cancers and normal breast
tissues.
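The two-step gene filter in this section (a consistent fold change in most of the nine pairs, followed by a paired t-test) can be illustrated with a short Python sketch. Several points are assumptions made for illustration: the t-test is applied to log2 ratios, significance is judged against the tabulated two-sided 5% critical value for 8 degrees of freedom (2.306) instead of an exact P value, and the expression values are invented. The backward selection and leave-one-out steps are not shown.

```python
import math
from statistics import mean, stdev

# Two-sided 5% critical value of Student's t with 8 degrees of freedom,
# valid only for nine paired samples.
T_CRIT_DF8 = 2.306


def fold_change_direction(cancer, normal, fold=4.0, min_cases=7):
    """Return "up", "down", or None from per-patient cancer/normal ratios.

    min_cases=7 encodes "more than six of nine cases".
    """
    ratios = [c / n for c, n in zip(cancer, normal)]
    n_up = sum(r > fold for r in ratios)
    n_down = sum(r < 1.0 / fold for r in ratios)
    if n_up >= min_cases:
        return "up"
    if n_down >= min_cases:
        return "down"
    return None


def paired_t_statistic(cancer, normal):
    """Paired t statistic on log2 ratios (the log scale is an assumption)."""
    diffs = [math.log2(c / n) for c, n in zip(cancer, normal)]
    sd = stdev(diffs)
    if sd == 0:
        # All differences identical: the statistic is unbounded unless the mean is zero.
        return 0.0 if mean(diffs) == 0 else math.copysign(math.inf, mean(diffs))
    return mean(diffs) / (sd / math.sqrt(len(diffs)))


def is_differentially_expressed(cancer, normal):
    """Apply the fold-change filter, then the paired t-test at the 5% level."""
    if fold_change_direction(cancer, normal) is None:
        return False
    return abs(paired_t_statistic(cancer, normal)) > T_CRIT_DF8


# Hypothetical expression levels for one gene in nine tumor/normal pairs
normal_levels = [1.0] * 9
up_gene = [8.0, 9.0, 6.0, 10.0, 7.5, 12.0, 5.0, 2.0, 9.5]
flat_gene = [1.1, 0.9, 1.0, 1.2, 0.8, 1.0, 1.1, 0.95, 1.05]
```

Here `up_gene` exceeds the 4-fold threshold in eight of nine pairs and passes both steps, whereas `flat_gene` is rejected at the fold-change step.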
Validation of classification potential of differentially
expressed genes in public microarray data sets
Turashvili’s data [12], consisting of expression data from
laser capture microdissected samples (10 normal breast ductal samples,
10 normal breast lobular samples, 5 breast ductal tumor samples, and
5 breast lobular tumor samples) hybridized onto Affymetrix Human
Genome U133 Plus 2.0 Arrays with more than 50,000
genes, were retrieved from the NCBI Gene Expression Omnibus
(GEO, http://www.ncbi.nlm.nih.gov/geo/) under GEO
Series accession number GSE5764. Sørlie’s primary data
[13] included 4 normal breast samples, 3 fibroadenoma
samples, and 78 breast carcinoma samples on six different
batches of microarrays with 8,000–23,000 genes, and were
accessible in the Stanford Microarray Database
(http://genome-www.stanford.edu/breast_cancer/mopo_clinical/).
To avoid classification errors resulting from different
microarray batches, only the 44 samples that used the svcc batch
microarray and CRA as the common reference sample,
including 3 normal breast samples, 1 fibroadenoma sample,
and 40 breast cancer samples, were chosen to validate
the classification accuracy of the differentially expressed
gene set. van de Vijver’s data [14] on the Hu25K microarray,
consisting of 25,000 genes and clinical factors for 295
breast cancer samples, were obtained from the website of
Rosetta Inpharmatics LLC
(http://www.rii.com/publications/2002/nejm.html). Samples in these three different
data sets were clustered separately, based on the differentially
expressed genes and on the genes related to different
biological processes identified by our experiments.
Confirmation of microarray data using
real-time RT-PCR
mRNA expression levels of KLK5, KLK7, CXCL10,
COL11A1, MMP3, OGN in 30 normal breast tissue samples,
30 benign tumor samples, and 30 invasive ductal carcinoma
samples were detected by real-time RT-PCR method.
Reverse transcription was carried out after denaturation at
65°C for 5 min, followed by incubation on ice for 5 min and
at 42°C for 50 min in order to synthesize cDNA. Real-time
RT-PCR analysis was performed using the Platinum®
Quantitative PCR System (Invitrogen) according to the
manufacturer’s instructions. Assays were performed with
the ABI 7500 TaqMan system (Applied Biosystems, Foster
City, CA, USA). We quantified the transcripts of the
housekeeping gene glyceraldehyde 3-phosphate dehydro-
genase (GAPDH) as control, as previously described [7].
Primers and Taqman probes shown in Table 1 for KLK5,
KLK7, CXCL10, COL11A1, MMP3, and OGN were
designed using Oligo 6.0 software (Molecular Biology
Insights, West Cascade, USA). PCR was carried out after
incubation at 50°C for 2 min and pre-denaturation at 95°C
for 3 min, followed by 45 cycles at 95°C for 30 s and 62°C
for 1 min. Quantification of target gene expression in
samples was accomplished by measuring the fractional
cycle number at which the amount of expression reached a
fixed threshold (CT). The relative quantification was given
by the CT values, which were determined in triplicate
reactions for both the target gene and the GAPDH reference
in the test and reference samples. Triplicate CT values were
averaged, and the GAPDH CT was subtracted from the target gene CT of
the sample to obtain ΔCT. The relative expression level of the
target gene was determined as 2^(-ΔCT).
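As a worked illustration of this calculation, the Python sketch below averages triplicate CT values, subtracts the GAPDH CT from the target gene CT, and converts the difference to a relative expression level. The CT values and the function name are hypothetical.

```python
from statistics import mean


def relative_expression_from_ct(target_ct, gapdh_ct):
    """Relative expression of a target gene normalized to GAPDH.

    target_ct, gapdh_ct: CT values from triplicate reactions of one sample.
    Returns 2 ** (-delta_ct), where delta_ct = mean target CT - mean GAPDH CT.
    """
    delta_ct = mean(target_ct) - mean(gapdh_ct)
    return 2.0 ** (-delta_ct)


# Hypothetical triplicate CT values for one sample
klk5_ct = [25.0, 25.2, 24.8]   # target gene, mean 25.0
gapdh_ct = [20.0, 20.1, 19.9]  # reference gene, mean 20.0
level = relative_expression_from_ct(klk5_ct, gapdh_ct)  # delta CT of about 5
```

With a ΔCT of about 5, the target transcript is estimated at roughly 1/32 of the GAPDH level. A higher CT therefore means lower expression.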
Statistical analysis
The chi-square test or Fisher’s exact test was used to analyze
the relationship between molecular classification of
breast tissues based on the set of differentially expressed
genes and ER status or relapse/metastasis status. The paired
t test was used to analyze differences in mRNA expression
between primary breast cancers and paired normal breast
tissue. One-way analysis of variance (ANOVA) was used
to compare mRNA expression among normal breast tissue
samples, benign tumor samples, and invasive ductal car-
cinoma samples. The correlation coefficients between
interacting genes were calculated with Pearson correlation
analysis. Survival analysis was carried out according to the
methods of Kaplan and Meier. All calculations were
performed using the SPSS for Windows statistical software
package (SPSS Inc, Chicago, IL, USA).
Table 1 Primer and TaqMan probe sequences used in real-time RT-PCR
Gene      Oligo   Sequence                                        Product size (bp)
KLK5      Upper   5′-GCAAGACCCCCCTGGATGTG-3′                      127
          Lower   5′-TCCCAGAGGGCACGGTGTTA-3′
          Probe   5′(FAM)-GTTGGCGAGAACATGCTCTGTGACCC-(TAMRA)3′
KLK7      Upper   5′-AGGCGTCCTGGTCAATGAG-3′                       138
          Lower   5′-GGGTGGCGGAATGACTT-3′
          Probe   5′(FAM)-CCACTGCAAGATGAATGAGTACACCG-(TAMRA)3′
CXCL10    Upper   5′-CTTTCTGACTCTAAGTGGCATTC-3′                   176
          Lower   5′-CACCCTTCTTTTTCATTGTAGCAA-3′
          Probe   5′(FAM)-ACAGCGTACGGTTCTAGAGAGAGGT-(TAMRA)3′
COL11A1   Upper   5′-TCGCATTGACCTTCCTCTTC-3′                      113
          Lower   5′-TCCCGTTGTTTTTGATATTC-3′
          Probe   5′(FAM)-CAGAGGAGCTGCTCCAGTTGATGT-(TAMRA)3′
MMP3      Upper   5′-TGCCCACTTTGATGATGATG-3′                      122
          Lower   5′-GTTGGCTGAGTGAAAGAGACC-3′
          Probe   5′(FAM)-GACAAAGGATACAACAGGGACCAAT-(TAMRA)3′
OGN       Upper   5′-ACACCATTACCTCCCAAGAAAG-3′                    111
          Lower   5′-GGGTGGTACAGCATCAATGTCAA-3′
          Probe   5′(FAM)-AGCAGACACGTGGGCATTTCATCAT-(TAMRA)3′
GAPDH     Upper   5′-GAAGGTGAAGGTCGGAGTC-3′                       226
          Lower   5′-GAAGATGGTGATGGGATTTC-3′
          Probe   5′(FAM)-CAAGCTTCCCGTTCTCAGCC-(TAMRA)3′
Results
Differentially expressed genes and molecular
classification of breast cancer
A total of 188 differentially expressed genes (Supplementary
Table 1) between breast cancers and paired normal breast
tissues were identified, consisting of 128 genes down-reg-
ulated and 60 up-regulated. These differentially expressed
genes were enriched in chromosomal regions 1p21-36 (nine
genes, 4.76%), 1q21-32 (11 genes, 5.82%), 2p12-25 (seven
genes, 3.70%), 2q12-36 (10 genes, 5.29%), 3q12-28 (eight
genes, 4.23%), 4q11-26 (six genes, 3.17%), 5q22-35 (seven
genes, 3.70%), 7q21-31 (five genes, 2.65%), 8q21-24
(seven genes, 3.70%), 9q21-22 (five genes, 2.65%), 11p11-
15 (seven genes, 3.70%), 11q11-23 (14 genes, 7.41%),
12q12-24 (six genes, 3.17%), 13q11-34 (five genes, 2.65%),
17q11-23 (five genes, 2.65%), 19q12-13 (six genes, 3.17%),
and X (10 genes, 5.29%). The functions of these genes are
related to the cell cycle, apoptosis, signal transduction and
transcriptional regulation, cell adhesion, the cytoskeleton,
and the extracellular matrix, all of which are involved in the
biological process of breast carcinogenesis.
Based on this transcriptional signature of the 188 differ-
entially expressed genes, eight of nine normal breast tissues
were distinguishable from all of the breast cancers, and most
of the 49 primary breast cancers could be classified into two
different biological groups by average clustering: ‘ER-neg-
ative group’ (‘Basal-like’ group) and ‘ER-positive group’
(‘Luminal-like’ group). There were more ER-negative cases
in ‘ER-negative group’ (9/13) than in ‘ER-positive group’
(6/34), and this difference was statistically significant
(P = 0.002) (Fig. 1). Furthermore, the ‘ER-positive group’
was clustered into two subgroups. Patients in ‘Subgroup II’
had better clinical outcome than patients in ‘Subgroup I’ and the
‘ER-negative group’, though the difference did not reach
statistical significance (P = 0.127; Fig. 5a).
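For readers who want to see how an association of this kind is tested, the sketch below implements a two-sided Fisher's exact test for a 2×2 table using only the Python standard library. The table is built from the counts quoted above (9 of 13 versus 6 of 34 ER-negative cases), under the assumption that all remaining cases in each group count as not ER-negative. It illustrates the method only and is not claimed to reproduce the reported P value, since the authors may have used the chi-square test or handled missing ER data differently.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    The P value is the total probability of all tables with the same margins
    that are no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell with fixed margins
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Small relative tolerance guards against floating-point ties
    return sum(p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


# Rows: 'ER-negative group', 'ER-positive group'
# Columns: ER-negative cases, other cases (assumed split, see lead-in)
p_value = fisher_exact_two_sided(9, 4, 6, 28)
```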
Besides the classification of breast cancer samples, the
188 differentially expressed genes could be clustered
(Fig. 1). Most of the genes in ‘Cluster A’ are involved in
signal transduction and transcription regulation; genes
in ‘Cluster B’ are components of the cytoskeleton; genes in
‘Cluster C’ are related to cell adhesion and migration;
genes in ‘Cluster D’ are concerned with the cell cycle; and
most of the genes in ‘Cluster E’ are involved in signal
transduction and immune response. However, when these
gene clusters related to different biological processes were
used to classify breast samples, none of them could group
the samples successfully.
Interactions among the differentially expressed genes
All of the 188 differentially expressed genes were input
into a protein interaction database to find any interactions
among the proteins encoded by these genes. Forty-four
genes were shown to interact with at least one other gene
based on the results of active prediction methods, including
neighborhood, gene fusion, co-occurrence, databases,
homology, text mining, and experiments (Fig. 2). Most of
these genes were grouped in four subnetworks functioning
in different biological processes (Fig. 1): ‘Subnetwork A’
consisted of genes in ‘Cluster A’, with functions related to
signal transduction and transcriptional regulation; ‘Sub-
network C’ was enriched with adhesion molecules in
‘Cluster C’; ‘Subnetwork D’ consisted of genes involved in
cell cycle and proliferation in ‘Cluster D’; and ‘Subnet-
work E’ consisted of genes concerned with signal trans-
duction and immune response in ‘Cluster E’. More than
80% of the interacting gene pairs showed correlated expression,
with correlation coefficients higher than 0.3 (Fig. 2), and
most of these correlations were also observed in
van de Vijver’s data (data not shown).
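The correlation coefficient attached to each pair of interacting genes is a Pearson coefficient computed across samples. A minimal Python sketch follows, with hypothetical expression vectors and no dependency beyond the standard library.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical log-ratio expression of two interacting genes across five samples
gene_1 = [0.2, 1.1, 1.9, 3.2, 3.9]
gene_2 = [0.1, 0.9, 2.2, 2.8, 4.1]
r = pearson_r(gene_1, gene_2)
```

In practice each vector would hold one gene's expression ratios across all clustered samples, and the coefficient would be computed once per interacting pair.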
Validation of classification potential of differentially
expressed genes in public microarray data sets
The mRNA expression levels of 183 of the 188 differentially
expressed genes were detected in Turashvili’s
microarray experiments. Median-centered average linkage
clustering on the basis of the expression profiles of the 183
genes classified most of the tumors (9/10) and normal
breast samples (14/20) correctly, regardless of ductal or
lobular origin (Fig. 3a and Supplementary Fig. 1), and this
difference was statistically significant (P = 0.005).
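Median centering is a simple preprocessing step applied before clustering, and a sketch may help readers unfamiliar with it. The Python example below subtracts each gene's median across samples from that gene's values. Centering per gene (and not per array) is an assumption here, as the text does not say which was used, and the data are hypothetical.

```python
from statistics import median


def median_center(expression):
    """Subtract each gene's median across samples from its values.

    expression: dict mapping gene name -> list of (log) expression values,
    one per sample, in a fixed sample order.
    """
    centered = {}
    for gene, values in expression.items():
        m = median(values)
        centered[gene] = [v - m for v in values]
    return centered


# Hypothetical log-scale expression values for two genes in three samples
raw = {"GENE_A": [1.0, 2.0, 6.0], "GENE_B": [-3.0, 0.5, 1.0]}
centered = median_center(raw)
```

After centering, each gene has a median of zero, so the clustering reflects relative differences between samples and not differences in absolute signal between genes.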
Only 79 of the 188 differentially expressed genes were
included in the data of Sørlie et al. Of these 79 genes, 19
overlapped with the 456 intrinsic genes identified
by Sørlie et al., including myxovirus resistance 1 (MX1),
keratin 5 (KRT5), keratin 17 (KRT17), fibromodulin
(FMOD), signal transducer and activator of transcription 1
(STAT1), estrogen receptor 1 (ESR1), dual specificity
phosphatase 6 (DUSP6), bullous pemphigoid antigen 1
(BPAG1/DST), fatty acid binding protein 4 (FABP4), lipo-
protein lipase (LPL), small inducible cytokine subfamily D
(Cys-X3-Cys), member 1 (SCYD1/CX3CL1), immuno-
globulin J polypeptide (IGJ), cysteine dioxygenase, type I
(CDO1), collagen type XI alpha 1(COL11A1), small
inducible cytokine subfamily A (Cys-Cys), member 18
(SCYA18/CCL18), apolipoprotein D (APOD), fibronectin 1
(FN1), interferon-stimulated protein (ISG15/G1P2), sema
domain, immunoglobulin domain (Ig), short basic domain,
secreted, (semaphorin) 3C (SEMA3C), and endothelin
receptor type B (EDNRB). When 44 breast samples using
svcc batch microarray and CRA as common reference were
clustered based on these 79 genes, all three normal
breast tissues and the one fibroadenoma sample could be
distinguished from the 40 breast cancers (Fig. 3b and Supplementary
Fig. 2). However, breast cancers could not be
classified into subgroups related to different ER status and
outcome, because too few genes (only 42%) were available to cluster
the samples and there were fewer ER-negative cases.
Furthermore, 142 genes in the set of differentially
expressed genes were available in van de Vijver’s data.
[Figure 1: heat map with sample dendrogram. Annotation bars: Met, ER. Sample groups: Normal, ER-Neg, ER-Pos subgroup I, ER-Pos subgroup II. Gene annotation columns: GB.accession, NAME, Chr, Net. Gene clusters: A Signal Transduction; B Cytoskeleton; C Cell Adhesion; D Cell Cycle; E Signal Transduction and Immune Response]
Fig. 1 Hierarchical clustering of 49 breast cancer samples and
nine normal breast tissue samples based on a set of 188
differentially expressed genes. Each column represents an
experimental sample, and each row a single gene. The detailed
ER status and relapse/metastasis status are shown on the
dendrogram. Black bars represent negative ER and
positive relapse/metastasis. White bars represent positive
ER and disease-free survival. Gray bars represent missing
data. Cluster A, highlighted in red, contains genes involved in
signal transduction and transcriptional regulation.
Genes in Cluster B, highlighted in purple, are components of the
cytoskeleton. Cluster C in blue includes genes related to cell
adhesion and migration. Genes in Cluster D, highlighted in
pink, are involved in cell cycle and proliferation. Cluster E in
green contains genes related to signal transduction and immune
response. Colored bars in the ‘Net’ column signify genes
involved in the homonymic subnetworks in Fig. 2
Based on the transcriptional signature of the 142 genes, all
of the 295 cases were clustered into two groups (Fig. 4): an
ESR1-negative group and an ESR1-positive group. ESR1
mRNA expression was low in 100% (43/43) of the samples in the
ESR1-negative group, but high in 89.68% (226/252) of the samples in
the ESR1-positive group (P < 0.001). Moreover, cases in
one subgroup of the ESR1-positive group (subgroup II)
had better clinical outcome than those in the other subgroup of the
ESR1-positive group (subgroup I) and in the ESR1-negative
group (Figs. 4, 5b).
Validation of differentially expressed genes
using real-time RT-PCR
To confirm the results obtained by microarray, kallikrein 5
(KLK5), kallikrein 7 (KLK7), small inducible cytokine sub-
family B (Cys-X-Cys) member 10 (CXCL10), collagen XI,
alpha 1 (COL11A1), matrix metalloproteinase 3 (MMP3),
and osteoglycin (OGN) were detected by real-time RT-PCR
analysis to estimate expression differences among 30 benign
breast disease samples, 30 breast cancer samples, and paired
normal breast samples. GAPDH mRNA, the control gene,
showed stable expression across all breast samples, with
CT values from 19 to 21. Consistent with the microarray data,
all six of these genes were differentially expressed, with
changes higher than twofold in at least half of the 30 breast
cancers when compared with their paired normal breast tissues,
and the paired t-test showed a statistically significant difference (P < 0.05).
Figure 6 shows the mRNA expression levels of KLK5,
KLK7, CXCL10, COL11A1, MMP3, and OGN in all of the
samples. The mRNA levels of KLK5, KLK7, and CXCL10 in
benign breast disease tissues did not differ from their levels
in normal breast tissues, but they were present at higher
(KLK5 and KLK7) or lower (CXCL10) levels in breast can-
cers. Levels of COL11A1 mRNA were higher in breast
cancers than in benign breast disease tissues, and they were
higher in benign breast disease tissues than in normal breast
tissues. However, the mRNA levels of MMP3 and OGN in
benign breast disease tissues were higher than in both normal
breast tissues and breast cancers.
[Figure 2: network diagram of the interacting genes, grouped into subnetworks A, C, D, and E, with a correlation coefficient on each link]
Fig. 2 The interaction network of genes differentially expressed between breast cancer tissue and normal tissue. Correlation coefficients
between mRNA expression levels of two interacting genes are shown on their link lines
Discussion
In this study, the transcriptional profile of breast cancer
tissue samples was found to differ from normal breast
samples based on the set of 188 differentially expressed
genes; moreover, the profile always differed from the
control in the same way, regardless of the characteristics of
the breast cancer sample. This suggests that this transcriptional
profile is a transcriptional signature specific to breast can-
cer. Only one pathologically ‘‘normal’’ sample had a
transcriptional signature that was similar to that of breast
cancer, leading our classification algorithm to group it
erroneously with the cancer samples. It may be that this
‘‘normal’’ sample is in fact cancerous tissue that has
undergone transcriptional changes before the breast tissue
showed any morphological changes. Thus, this transcriptional
signature could be used for the molecular diagnosis of
surgical margins: it could alert clinical oncologists before
any morphological changes appear and help them avoid
applying breast-conserving surgery to such patients, who
would have a high risk of recurrence following such a
procedure.
Microarray analysis has been used to screen for phe-
notype-associated gene signatures. Several research groups
have explored the relationship between gene expression
profiles and the phenotype of breast cancer. The research
groups of Perou [2] and Sørlie [13] have classified
breast cancers into four groups: ‘normal-like subgroup’,
‘basal-like subgroup’, ‘luminal-like subgroup’, and
‘Her2+ subgroup’. ‘Luminal-like’ tumors are ER-positive
and have a similar keratin (KRT) expression profile to that
of epithelial cells lining the lumen of the breast ducts [5,
13, 15]. ER-negative tumors can be divided into two main
subtypes: the ‘Her2+ subgroup’, with amplified Her2
DNA and ‘basal-like’ tumors which have an expression
profile similar to that of basal epithelium and which
express KRT5, KRT6B, KRT14, and KRT17. Several other
studies have identified characteristic signatures in breast
cancers with different lymph node status [16], nuclear
grade [17], and clinical outcome [18, 19]. In this study, the
set of 188 differentially expressed genes could classify
most of the breast cancers into two different biological
groups by average clustering: an ‘ER-negative group’ and
an ‘ER-positive group’. The ‘ER-positive group’ was further
clustered into two subgroups. Patients in ‘Subgroup II’ had better
clinical outcome than patients in ‘Subgroup I’ and the
‘ER-negative group’. Both estrogen receptor 1 (ESR1) and
progesterone receptor (PGR) were included in these 188
differentially expressed genes. Their mRNA expression
levels, in accordance with the ER and PR status detected by
immunohistochemical staining, differed between subgroups.
Thus, breast cancer samples shared a common transcriptional
signature relative to normal breast tissues;
however, this transcriptional signature varied among
breast tumors, which determined the different phenotypes
of breast cancer.
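The grouping step described above can be illustrated with a minimal sketch. It assumes that ‘average clustering’ refers to average-linkage hierarchical clustering and that the distance between two samples is one minus the Pearson correlation of their expression profiles; neither choice is confirmed by this section. The sample names and expression values are invented for illustration and are not data from this study.

```python
from itertools import combinations
from math import sqrt


def pearson_distance(x, y):
    """Return 1 - Pearson correlation between two expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)


def average_linkage(profiles, n_clusters):
    """Agglomerative clustering with average linkage.

    profiles: dict mapping sample name -> list of expression values.
    Returns a sorted list of clusters, each a sorted list of sample names.
    """
    names = list(profiles)
    # Precompute all pairwise sample distances once.
    dist = {frozenset(pair): pearson_distance(profiles[pair[0]], profiles[pair[1]])
            for pair in combinations(names, 2)}
    clusters = [[name] for name in names]

    def linkage(c1, c2):
        # Mean of all pairwise distances between members of c1 and c2.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Merge the two clusters with the smallest average distance.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Invented log-ratio profiles over four hypothetical signature genes.
toy = {
    "tumor_1": [2.1, 1.8, -1.5, -2.0],
    "tumor_2": [1.9, 2.2, -1.7, -1.6],
    "tumor_3": [-1.8, -2.1, 1.6, 2.2],
    "tumor_4": [-2.0, -1.7, 1.9, 1.8],
}
groups = average_linkage(toy, n_clusters=2)
print(groups)
```

In this toy input, the two tumors with high expression of the first two genes fall into one group and the remaining two into the other, which mirrors how samples with similar ESR1-related expression would be expected to co-cluster.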
Changes in the expression profile of cancerous tissue
relative to normal tissue may be the result of genetic and
epigenetic alterations, followed by changes in signal
transduction. Nearly 70% of the differentially expressed genes
identified in this study mapped to chromosomal
regions 1p21-36, 1q21-32, 2p12-25, 2q12-36, 3q12-28,
4q11-26, 5q22-35, 7q21-31, 8q21-24, 9q21-22, 11p11-15,
11q11-23, 12q12-24, 13q11-34, 17q11-23, 19q12-13, and
X. Therefore, the genetic changes in these chromosomal
regions, including deletion and amplification, may play
important roles in the process of carcinogenesis. Although
genetic changes may be the initiating event in carcinogenesis,
a genetic mutation profile can uncover
alterations in only some genes, and such an approach
cannot identify the downstream genes that are regulated by
these mutated genes. Expression profiling, on the other
hand, can reflect changes in the expression of both the
Fig. 3 a Classification of Turashvili’s 30 microarray data based on
183 differentially expressed genes. b Classification of Sørlie’s 44 svcc
batch microarray data on the basis of the only 79 differentially expressed genes
detected in their microarray experiments. Black bars represent
negative ER and positive relapse/metastasis. White bars represent
positive ER and disease-free survival
initially mutated genes and genes they subsequently regu-
late during breast carcinogenesis.
The changes in the transcriptional profile identified here
involve many of the genes related to different biological
processes in carcinogenesis, such as cell growth and
metabolism, cell differentiation, signal transduction and
transcriptional regulation, cell adhesion and migration, and
immune surveillance. A biological process occurs through
the coordinated regulation of the expression of a set of
function-related genes, and a process interacts with other
processes through signal crosstalk. In this study, most of
the differentially expressed genes were grouped into four
subnetworks. Genes in ‘Subnetwork A’, whose functions
are related to signal transduction, play a central role in the
network by connecting the other three subnetworks; the
genes in these subnetworks are involved in cell adhesion
and migration (‘Subnetwork C’), cell cycle and prolifera-
tion (‘Subnetwork D’), and immune response (‘Subnetwork
Fig. 4 Classification of van de Vijver’s 295 microarray data based on
142 differentially expressed genes which were detected in their
microarrays. Black bars represent negative ESR1 and positive relapse/
metastasis (Met). White bars represent positive ESR1 and disease-free
survival. Row labels: ESR1, Met. Cluster labels: ESR1-Negative,
ESR1-Positive Subgroup I, ESR1-Positive Subgroup II
E’). Furthermore, hierarchical clustering tends to group
genes in one subnetwork into their own cluster, indicating
that their expression is coordinately regulated and their
functions in carcinogenesis are linked.
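The idea that one subnetwork connects the others can be made concrete with a small sketch. Given a subnetwork label for each gene and a list of interaction edges, it reports which subnetworks are linked to which and picks the one with the most cross-subnetwork links. The gene identifiers (`a1`, `c1`, and so on) and the edges are placeholders invented for illustration; they are not the interactions shown in Fig. 2.

```python
from collections import defaultdict

# Placeholder membership and interaction edges (hypothetical, not study data).
membership = {"a1": "A", "a2": "A", "c1": "C", "c2": "C", "d1": "D", "e1": "E"}
edges = [("a1", "a2"), ("a1", "c1"), ("a1", "d1"), ("a2", "e1"), ("c1", "c2")]


def cross_talk(membership, edges):
    """Map each subnetwork to the set of other subnetworks it links to."""
    links = defaultdict(set)
    for u, v in edges:
        su, sv = membership[u], membership[v]
        if su != sv:  # ignore edges that stay inside one subnetwork
            links[su].add(sv)
            links[sv].add(su)
    return dict(links)


links = cross_talk(membership, edges)
# The subnetwork linked to the largest number of other subnetworks.
hub = max(links, key=lambda s: len(links[s]))
print(hub, sorted(links[hub]))
```

With these placeholder edges, subnetwork A is linked to C, D, and E, while each of the others is linked only to A, which is the pattern the paragraph above attributes to ‘Subnetwork A’.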
The six genes KLK5, KLK7, CXCL10, COL11A1,
MMP3, and OGN were selected for validation by real-time RT-PCR;
the results supported microarray analysis as a reliable
method for identifying the characteristic gene expression
signature of breast cancer. However, these differentially
expressed genes must have changed in different ways
during the step-wise process of carcinogenesis. Some
genes, such as MMP3 and OGN, which this study found to
be expressed at higher levels in benign breast tissues than
in normal or cancerous tissues, may change during an early
stage of carcinogenesis and serve as trigger events. Genes
such as KLK5, KLK7, and CXCL10 were expressed to
similar extents in benign and normal tissues, but their
expression level changed in cancer; thus, these genes may
play a role during a late stage of carcinogenesis. The
expression level of still other genes, such as COL11A1, was
found to increase in the order normal < benign < cancer
tissue, and such genes may play a role throughout the entire
process of breast carcinogenesis.
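The three patterns described above (early, late, and throughout) can be expressed as a simple rule on the median expression level in each tissue group. The sketch below is only an illustration: the two-fold cutoff `fold`, the gene labels, and all expression values are assumptions introduced here, not thresholds or measurements from this study.

```python
from statistics import median


def stage_pattern(normal, benign, cancer, fold=2.0):
    """Label a gene by how its median expression changes across N, B, C.

    `fold` is an assumed cutoff for calling two medians different.
    All expression values are assumed to be positive.
    """
    n, b, c = median(normal), median(benign), median(cancer)

    def differs(x, y):
        return max(x, y) / min(x, y) >= fold

    if not differs(n, b) and differs(b, c):
        return "late"          # similar in normal and benign, changed in cancer
    if differs(n, b) and differs(b, c):
        if (b - n) * (c - b) > 0:
            return "throughout"  # monotonic change from normal to benign to cancer
        return "early"           # benign departs from both normal and cancer
    return "unclassified"


# Invented relative expression levels for three hypothetical genes.
toy = {
    "gene_x": ([1e-4, 2e-4, 1.5e-4], [1e-2, 2e-2, 1.5e-2], [1e-4, 3e-4, 2e-4]),
    "gene_y": ([1e-3, 2e-3, 1.5e-3], [1.2e-3, 1.8e-3, 1.6e-3], [1e-1, 2e-1, 1.5e-1]),
    "gene_z": ([1e-5, 2e-5, 1.5e-5], [1e-3, 2e-3, 1.5e-3], [1e-1, 2e-1, 1.5e-1]),
}
patterns = {gene: stage_pattern(*groups) for gene, groups in toy.items()}
print(patterns)
```

A rule of this kind only sorts genes by the shape of their median profile; whether a gene acts as a trigger event or plays a late role would still need functional evidence.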
In conclusion, this study identified a transcriptional
signature consisting of 188 genes differentially expressed
between breast cancers and their paired normal breast tis-
sues. Based on this set of differentially expressed genes,
normal breast tissues could be distinguished from all of the
breast cancers, and most of the primary breast cancers
could be classified into different phenotype-associated
subgroups. This signature may be a specific feature of
breast cancer, and it may represent the molecular basis of
breast carcinogenesis and its distinct phenotype.
Acknowledgments This research was supported by the Tianjin
Major Program of Science and Technology (013182311), the National
High-Tech Research Development Plan of China (2002AA2Z2011),
the Program for Changjiang Scholars and Innovative Research Team
in University (URT0743), the Applied Basic Research Programs of
Science and Technology Commission Foundation of Tianjin
(06YFJMJC1290) and a donation from TaiJi Co., China.
References
1. Hahn WC, Weinberg RA (2002) Rules for making human tumor
cells. N Engl J Med 347:1593–1603
Fig. 5 Result of Kaplan and Meier survival analysis. Patients in ER-positive
subgroup I have better outcome than ER-positive subgroup II and ER-negative
group. a Survival analysis result of our 49 breast cancer samples classified
by the set of 188 differentially expressed genes. b Survival analysis result
of van de Vijver’s 295 breast cancers classified by 142 differentially
expressed genes. Axes: cumulative survival against time (months, 0 to 70;
years, 0 to 20). Legend entries: ER-negative (n=14), ER-positive-I (n=7),
ER-positive-II (n=28); ER-positive-I (n=28), ER-positive-II (n=206),
ER-negative (n=43). P values shown: P=0.003, P=0.127
Fig. 6 The mRNA expression levels of KLK5, KLK7, CXCL10,
COL11A1, MMP3, and OGN in normal breast tissue samples (N),
benign breast samples (B), and breast cancer samples (C) using real-
time RT-PCR method. The black bars represent the median
expression levels of these six genes in each sample group. Y-axis: mRNA
expression levels (logarithmic scale, 1.00E-07 to 1.00E+00). X-axis: N, B,
and C for each of MMP3, OGN, KLK5, KLK7, CXCL10, and COL11A1
2. Perou CM, Sorlie T, Eisen MB, van de Rijn M, Jeffrey SS, Rees
CA, Pollack JR, Ross DT, Johnsen H, Akslen LA, Fluge O,
Pergamenschikov A, Williams C, Zhu SX, Lonning PE, Borre-
sen-Dale AL, Brown PO, Botstein D (2000) Molecular portraits
of human breast tumours. Nature 406:747–752
3. Nindl I, Dang C, Forschner T, Kuban RJ, Meyer T, Sterry W,
Stockfleth E (2006) Identification of differentially expressed
genes in cutaneous squamous cell carcinoma by microarray
expression profiling. Mol Cancer 5:30
4. Lee EJ, Kong G, Lee SH, Rho SB, Park CS, Kim BG, Bae DS,
Kavanagh JJ, Lee JH (2005) Profiling of differentially expressed
genes in human uterine leiomyomas. Int J Gynecol Cancer
15:146–154
5. Sorlie T, Tibshirani R, Parker J, Hastie T, Marron JS, Nobel A,
Deng S, Johnsen H, Pesich R, Geisler S, Demeter J, Perou CM,
Lonning PE, Brown PO, Borresen-Dale AL, Botstein D (2003)
Repeated observation of breast tumor subtypes in independent
gene expression data sets. Proc Natl Acad Sci USA 100:8418–
8423
6. Guo Y, Guo H, Zhang L, Xie H, Zhao X, Wang F, Li Z, Wang Y,
Ma S, Tao J, Wang W, Zhou Y, Yang W, Cheng J (2005)
Genomic analysis of anti-hepatitis B virus (HBV) activity by
small interfering RNA and lamivudine in stable HBV-producing
cells. J Virol 79:14392–14403
7. Feng Y, Sun B, Li X, Zhang L, Niu Y, Xiao C, Ning L, Fang Z,
Wang Y, Zhang L, Cheng J, Zhang W, Hao X (2007) Differen-
tially expressed genes between primary cancer and paired lymph
node metastases predict clinical outcome of node-positive breast
cancer patients. Breast Cancer Res Treat 103:319–329
8. Gomes LI, Silva RL, Stolf BS, Cristo EB, Hirata R, Soares FA,
Reis LF, Neves EJ, Carvalho AF (2003) Comparative analysis of
amplified and nonamplified RNA for hybridization in cDNA
microarray. Anal Biochem 321:244–251
9. Smith L, Underhill P, Pritchard C, Tymowska-Lalanne Z, Abdul-
Hussein S, Hilton H, Winchester L, Williams D, Freeman T,
Webb S, Greenfield A (2003) Single primer amplification (SPA)
of cDNA for microarray expression analysis. Nucleic Acids Res
31:e9
10. Yang YH, Dudoit S, Luu P, Lin DM, Peng V, Ngai J, Speed TP
(2002) Normalization for cDNA microarray data: a robust com-
posite method addressing single and multiple slide systematic
variation. Nucleic Acids Res 30:e15
11. Zeeberg BR, Feng W, Wang G, Wang MD, Fojo AT, Sunshine
M, Narasimhan S, Kane DW, Reinhold WC, Lababidi S, Bussey
KJ, Riss J, Barrett JC, Weinstein JN (2003) GoMiner: a resource
for biological interpretation of genomic and proteomic data.
Genome Biol 4:R28
12. Turashvili G, Bouchal J, Baumforth K, Wei W, Dziechciarkova
M, Ehrmann J, Klein J, Fridman E, Skarda J, Srovnal J, Hajduch
M, Murray P, Kolar Z (2007) Novel markers for differentiation of
lobular and ductal invasive breast carcinomas by laser micro-
dissection and microarray analysis. BMC Cancer 7:55
13. Sorlie T, Perou CM, Tibshirani R, Aas T, Geisler S, Johnsen H,
Hastie T, Eisen MB, van de Rijn M, Jeffrey SS, Thorsen T, Quist
H, Matese JC, Brown PO, Botstein D, Eystein Lonning P, Bor-
resen-Dale AL (2001) Gene expression patterns of breast carci-
nomas distinguish tumor subclasses with clinical implications.
Proc Natl Acad Sci USA 98:10869–10874
14. van de Vijver MJ, He YD, van’t Veer LJ, Dai H, Hart AA,
Voskuil DW, Schreiber GJ, Peterse JL, Roberts C, Marton MJ,
Parrish M, Atsma D, Witteveen A, Glas A, Delahaye L, van der
Velde T, Bartelink H, Rodenhuis S, Rutgers ET, Friend SH,
Bernards R (2002) A gene-expression signature as a predictor of
survival in breast cancer. N Engl J Med 347:1999–2009
15. Taylor-Papadimitriou J, Stampfer M, Bartek J, Lewis A, Boshell
M, Lane EB, Leigh IM (1989) Keratin expression in human
mammary epithelial cells cultured from normal and malignant
tissue: relation to in vivo phenotypes and influence of medium. J
Cell Sci 94(Pt 3):403–413
16. Nishidate T, Katagiri T, Lin ML, Mano Y, Miki Y, Kasumi F,
Yoshimoto M, Tsunoda T, Hirata K, Nakamura Y (2004) Gen-
ome-wide gene-expression profiles of breast-cancer cells purified
with laser microbeam microdissection: identification of genes
associated with progression and metastasis. Int J Oncol 25:797–
819
17. Sotiriou C, Wirapati P, Loi S, Harris A, Fox S, Smeds J, Nord-
gren H, Farmer P, Praz V, Haibe-Kains B, Desmedt C, Larsimont
D, Cardoso F, Peterse H, Nuyten D, Buyse M, Van de Vijver MJ,
Bergh J, Piccart M, Delorenzi M (2006) Gene expression pro-
filing in breast cancer: understanding the molecular basis of
histologic grade to improve prognosis. J Natl Cancer Inst 98:262–
272
18. van’t Veer LJ, Dai H, van de Vijver MJ, He YD, Hart AA, Mao
M, Peterse HL, van der Kooy K, Marton MJ, Witteveen AT,
Schreiber GJ, Kerkhoven RM, Roberts C, Linsley PS, Bernards
R, Friend SH (2002) Gene expression profiling predicts clinical
outcome of breast cancer. Nature 415:530–536
19. Wang Y, Klijn JG, Zhang Y, Sieuwerts AM, Look MP, Yang F,
Talantov D, Timmermans M, Meijer-van Gelder ME, Yu J, Jat-
koe T, Berns EM, Atkins D, Foekens JA (2005) Gene-expression
profiles to predict distant metastasis of lymph-node-negative
primary breast cancer. Lancet 365:671–679