IDENTIFICATION OF ABERRANT EPIGENETIC
EVENTS IN MSS/CIMP-NEGATIVE COLON CANCER
A DISSERTATION SUBMITTED TO THE GRADUATE DIVISION OF THE
UNIVERSITY OF HAWAI‘I AT MĀNOA IN PARTIAL FULFILLMENT OF
THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
IN
MOLECULAR BIOSCIENCES AND BIOENGINEERING
May 2014
By
Min-Ae Song
Dissertation Committee
Maarit Tiirikainen, Chairperson
Jason Barbour
Dulal Borthakur
Lana Garmire
Alika Maunakea
We certify that we have read this dissertation and that, in our opinion, it is satisfactory in
scope and quality as a dissertation for the degree of Doctor of Philosophy in Molecular
Biosciences and Bioengineering.
DISSERTATION COMMITTEE
_______________________________
Maarit Tiirikainen, Chairperson
_______________________________
Jason Barbour
________________________________
Dulal Borthakur
_______________________________
Lana Garmire
_______________________________
Alika Maunakea
© Copyright by Min-Ae Song 2014
All Rights Reserved
ACKNOWLEDGEMENTS
Completing my PhD degree has been the most challenging activity of my life. I
would never have been able to finish my dissertation without the guidance of my
committee members, help from friends, and support from my family. I would like to send
a warm and well-deserved thanks to the following organizations for the research materials
and to the following people for intellectual and moral support.
I would like to thank all colon tissue donors, their families, and the Colon Cancer
Family Registry for contributing to the advancement of science. I would like to express my special
appreciation and thanks to my academic advisor Dr. Maarit Tiirikainen for outstanding
guidance, mentorship and caring throughout my graduate career. I would also like to
thank my thesis committee members Drs. Jason Barbour, Dulal Borthakur, Lana Garmire,
and Alika Maunakea for serving on my committee and for their helpful comments and
discussions. I would especially like to thank Dr. Loïc Le Marchand for being a great
mentor and for being so generous in financially supporting my research. I would like to
thank my master’s degree advisor, Dr. Suman Lee, for always being in
my corner and for being a true inspiration for me. I would like to give special thanks to
Drs. Song-yi Park and Unhee Lim for caring, encouraging me, being there for me, and
making sure I made the right decision to attend graduate school. I would like to
express sincere appreciation to all members of the liver cancer team, especially Drs.
Linda Wong, Herbert Yu, and Sandi Kwee, for their generous and continuous support and for
helping me to shape my interests and ideas in liver cancer studies, which I was
conducting in parallel to my thesis work on colon cancer. I would also especially like to
thank my lab-mates, Annette Jones, Ann Seifried and Matt Hiramoto, for their friendship,
support, and insight, day in and day out. I would like to offer special thanks to my
family. Words cannot express how grateful I am to my parents, my two elder sisters, Min-Suk
Song and Yun-sook Song, and my elder brother, Yui-Sung Song, in Korea for all of their
sacrifices and for always supporting and encouraging me with their best wishes. This
dissertation would never even have been started if it were not for my wonderful husband, Dong
Hyun Kim. I would like to thank him for always cheering me up and standing by me
through the good and bad times. I would like to thank my two beloved children, Claire
Haeun Kim and Aiden Seojin Kim, for being such a great daughter and son.
ABSTRACT
My research project focused on the identification of aberrant epigenetic changes involving
non-coding RNAs and DNA methylation in microsatellite-stable, CpG island methylator
phenotype-negative (MSS/CIMP-negative) colon cancer, the major subtype. DNA methylation is
the best-studied epigenetic event, while non-coding RNA-mediated transcriptional silencing
also plays an important role in cancer. My first aim
was to profile microRNAs and their potential target genes in colon cancer by a study of
10 paired normal and tumor colon tissues using data from Next Generation Sequencing
and Exon arrays. Nineteen miRNAs, including 6 previously associated with colon cancer and
13 not previously implicated, were found to be aberrantly expressed in the tumor tissues.
Thirty-six colon cancer related genes were significantly correlated with the expression
levels of the identified miRNAs, and ‘Wnt/beta-catenin Signaling’ was identified as the
top canonical pathway for these target genes. My second aim was to identify small and
long novel non-coding RNAs in the 8q24 region, which contains one of the most relevant
colon cancer risk variants, SNP rs6983267. Thirty-two pre-miRNAs were identified in
silico by two algorithms, but none of them was verified in further studies. However, a
novel long non-coding RNA spanning rs6983267 was recently identified, and
significantly elevated expression levels were observed in 23 colon tumor tissues in our
sample set. Also, one known miRNA in a cluster 400 kb away from the risk SNP showed
genotype-dependent expression patterns. My last aim was to elucidate the landscape of
genome-wide DNA methylation in colon cancer. Global hypomethylation was observed,
concentrated in the intergenic regions and gene bodies, while hypermethylation was
observed in promoter regions, N_Shores and CpG islands. Differentially methylated
CpGs were enriched in genes with roles in cancer and gastrointestinal disease.
Observations in imprinted genes suggest a more widespread dysregulation of imprinting
in colon cancer than previously reported. These findings of epigenetic alterations in colon
cancer will hopefully contribute to a better understanding of how these aberrant events are
related to colon cancer development and progression, and they may also lead to the discovery
of new biomarkers for patient diagnosis, stratification and treatment follow-up.
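The methylation results summarized above are based on beta-values from the HumanMethylation450 BeadChip, which the thesis later converts to M-values to address heteroscedasticity (Section 4.3.2, Figure 29) and filters by |delta-beta| ≥ 0.2 (Figure 37). As an illustration only, the sketch below uses the commonly cited definitions beta = M / (M + U + 100) and M-value = log2(beta / (1 - beta)). The offset of 100, the function names and the probe intensities are assumptions for this example and are not taken from the thesis.

```python
import math

# Offset added to the beta-value denominator to stabilize low-intensity probes.
# 100 is the commonly used Illumina default (an assumption in this sketch).
OFFSET = 100.0


def beta_value(meth: float, unmeth: float, offset: float = OFFSET) -> float:
    """Fraction of methylated signal for one CpG probe, in [0, 1)."""
    return meth / (meth + unmeth + offset)


def m_value(beta: float) -> float:
    """Base-2 logit of a beta-value; defined only for 0 < beta < 1."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    return math.log2(beta / (1.0 - beta))


def delta_beta(tumor: float, normal: float) -> float:
    """Tumor minus normal beta; negative values indicate tumor hypomethylation."""
    return tumor - normal


# Hypothetical intensities for one CpG probe (illustrative, not study data).
normal_beta = beta_value(meth=4_900, unmeth=5_000)   # 0.49
tumor_beta = beta_value(meth=1_900, unmeth=8_000)    # 0.19

d = delta_beta(tumor_beta, normal_beta)
print(f"delta-beta = {d:.2f}")                       # delta-beta = -0.30
print("passes |delta-beta| >= 0.2:", abs(d) >= 0.2)  # True
print(f"M-values: normal {m_value(normal_beta):.2f}, tumor {m_value(tumor_beta):.2f}")
```

Beta-values read directly as a methylation fraction, which makes them convenient for thresholds such as |delta-beta| ≥ 0.2, while M-values have a variance that depends less on the mean, which is the usual reason for using them in statistical testing.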
TABLE OF CONTENTS
ACKNOWLEDGEMENTS
ABSTRACT
LIST OF FIGURES
LIST OF ABBREVIATIONS
LIST OF PUBLICATIONS BY MIN-AE SONG RELATED TO THIS THESIS WORK
CHAPTER 1
INTRODUCTION
1.1. Epigenetic mechanisms
1.1.1. Histone modification
1.1.3. Non-coding RNAs
1.1.2. DNA methylation
1.1.3.1. Long non-coding RNAs
1.1.3.2. MicroRNAs
1.2. Epigenetic alterations in cancer
1.2.1. Histone modification in cancer
1.2.2. DNA methylation in cancer
1.2.3. LncRNAs in cancer
1.2.4. MiRNAs in cancer
1.3. Characterization of colorectal cancer
1.3.1. Molecular classification of colorectal cancer
1.3.2. Genetic and epigenetic alterations of CRC
1.4. Genome-wide State-of-the-art methods used for epigenetics
1.4.1. SOLiD Next Generation Sequencing (NGS) for miRNAs
1.4.2. Illumina Infinium HumanMeth450 BeadChip for Methylation
1.5. Research aims
1.6. Significance
CHAPTER 2
COMPREHENSIVE PROFILING OF EXPRESSION ALTERATIONS IN KNOWN AND NOVEL MIRNAS IN COLON CANCER
2.1. Introduction
2.2. Materials and method
2.2.1. Small RNA extraction and quality checks
2.2.2. SOLiD sequencing
2.2.3. Statistical data analysis of SOLiD sequencing
2.2.4. Technical Validation and Replication using realtime RT-qPCR
2.2.5. Sample processing for Affymetrix Exon Arrays
2.2.6. Integrated analysis and Ingenuity pathway analysis
2.3. Results and discussion
2.3.1. Quality checks of small RNAs
2.3.2. Small RNA library preparation
2.3.3. Deep sequencing of small RNAs
2.3.4. Evaluation and preliminary analyses of the NGS data
2.3.5. MiRNA expression profiles in colon normal and tumor tissues
2.3.6. Correlation of expression levels between 19 differentially expressed tumor miRNAs and their predicted target genes
2.3.7. Previous findings of known colon cancer miRNAs in cancers
2.3.8. Previous findings of colon cancer new-miRNAs in cancers
2.4. Conclusion
CHAPTER 3
IDENTIFICATION OF NONCODING RNAS IN THE 8q24 REGION SPANNING THE MULTIPLE CANCER RISK LOCUS SNP rs6983267
3.1. Introduction
3.2. Materials and methods
3.2.1. In Silico prediction of potential miRNAs in the 8q24 region
3.2.2. Total RNA extraction
3.2.3. Reverse Transcriptase Quantitative PCR
3.2.4. Affymetrix Genome-Wide Human SNP 6.0 Array
3.3. Results and discussion
3.3.1. Identification of novel miRNAs in the 8q24 region using computational algorithms
3.3.2. Altered expression of five known miRNAs located in the 8q24 region
3.3.3. Altered expression of novel lncRNAs in the 8q24 region
3.4. Conclusion
CHAPTER 4
LANDSCAPE OF ALTERED METHYLATION IN COLON CANCER
4.1. Introduction
4.2. Materials and methods
4.2.1. Information on Patient Specimen
3.2.2. Total RNA extraction
4.2.3. Affymetrix Exon Arrays
4.2.2. DNA Extraction and Bisulfite Conversion
4.2.3. HumanMethylation450 BeadChips
4.2.4. Raw data normalization
4.2.5. Initial filtering of beta-values
4.2.6. Statistical analysis of differential methylation
4.2.7. Ingenuity pathway analysis (IPA)
4.3. Results and discussion
4.3.1. Aims of data analysis
4.3.2. Conversion of beta-values to M-values
4.3.3. Quality checks based on the distribution of the beta-values
4.3.4. Distribution and classification of CpGs
4.3.5. Identification of the genome-wide methylation profiles in colon cancer
4.3.6. Genome-wide methylation patterns of significant DM CpGs in colon cancer
4.3.7. MSS/CIMP-neg colon cancer DM CpGs compared to another cancer type
4.3.8. Deregulated methylation at imprinted genes in MSS/CIMP-neg colon cancer
4.3.9. Correlation between DNA methylation and miRNA expression
4.4. Conclusion
GENERAL DISCUSSION
REFERENCES
LIST OF TABLES
Table 1. Genomic loci associated with CRC risk.
Table 2. HumanMethylation450 BeadChip coverage through gene regions.
Table 3. Sample information for NGS miRNA study.
Table 4. NGS Mapping Results.
Table 5. Numbers of significant mRNAs in 21 comparisons.
Table 6. Expression levels and putative targets of 6 differentially expressed known colon
tumor miRNAs.
Table 7. Significantly differentially expressed target genes (FDR q<0.05) among
correlated targets (FDR q<0.05).
Table 8. Top IPA network of the CRC miRNA target genes.
Table 9. Significantly correlated target mRNAs that have been reported in other studies
and their differential expression in colon tumor compared to normal tissues in the current
study.
Table 10. Sample information for the ncRNA study.
Table 11. Lists of predicted pre-miRNAs by ProMiRII and miR-abela.
Table 12. Sample information for methylation analysis.
Table 13. IPA top networks for genes with DM CpGs.
Table 14. List of genes that have previously been reported to be mutated in colon cancer.
Table 15. Significant differential methylation in imprinted gene loci between colon tumor
and normal tissues at Bonferroni corrected p<0.05.
Table 16. The number of analyzed CpGs for differentially expressed miRNAs.
LIST OF FIGURES
Figure 1. Dynamic regulation of transcription by histone modifications.
Figure 2. Conversion of the cytosine to 5-methylcytosine by DNA methyltransferase
(DNMT).
Figure 3. Distribution of CpG dinucleotides throughout the genome.
Figure 4. The decision tree to select appropriate DNA methylation analysis methods.
Figure 5. Paradigms for cellular functions of lncRNAs (red).
Figure 6. LncRNAs play roles in the chromatin remodeling, transcriptional control, post-
transcriptional processing.
Figure 7. MicroRNA biogenesis.
Figure 8. MicroRNAs' involvement in colorectal cancer pathogenesis.
Figure 9. Progressive genetic and epigenetic alterations in the development of CRCs.
Figure 10. Derivation of molecular CRC groups 1-5 based on CIMP status and MSI status.
Figure 11. Classification of 125 CRCs and heatmap representation of Illumina
HumanMethylation27 BeadChip analysis.
Figure 12. Epigenetic alterations in colon cancer.
Figure 13. Overview of SOLiD sequencing chemistry.
Figure 14. Schematic of Infinium I (A) and II (B) technology.
Figure 15. Outline of the three Thesis Aims.
Figure 16. The quality of the total RNAs and the small RNA preparations.
Figure 17. The quality of the size selected cDNA libraries was checked on the
Bioanalyzer using the DNA 1000 chip.
Figure 18. Read length distribution (nt, number of nucleotides) of sequences mapped to
miRBase.
Figure 19. The strongest oncogenic and tumor suppressor miRNA candidates in colon
tumor tissues.
Figure 20. Expression values from SOLiD sequencing (X-axis) plotted against qPCR
delta Ct values (Y-axis).
Figure 21. Source of Variation and Principal Component Analysis (PCA). A. Source of
variation.
Figure 22. Tumor vs. normal miRNA expression profiles of 13 newly identified miRNAs.
Figure 23. Genes significantly correlated with top miRNAs in the Wnt/beta-catenin
signaling pathway in colon tumor tissue.
Figure 24. In Silico-predicted pre-miRNAs in the 30kb region flanking the 8q24 SNP
rs6983267.
Figure 25. Examples of secondary stem-loop structures of predicted miRNA precursors
from 3’region of rs6983267 by ProMiRII.
Figure 26. Expression levels of five known miRNAs located in the 8q24 region by
different tissue types (10 tumors and 10 normals) and the genotype of the rs6983267 SNP
(5 GG versus 5 TT for each group).
Figure 27. Elevated expression of a novel lncRNA, CCAT2, in tumors (A) and the
differential expression between GG, GT and TT samples.
Figure 28. The workflow for HumanMethylation450 BeadChips.
Figure 29. M-value transformation to address the issue of heteroscedasticity.
Figure 30. Histograms of beta-values (A) and M-values (B) interrogating CpGs in the
total of 485,577 CpGs.
Figure 31. Histogram of average beta-values for 485,577 CpGs in 40 tumor samples (A)
and 36 adjacent normal samples (B).
Figure 32. Distribution (A) and median of average beta-values (B) for 113 CpGs reported by
Peter Laird’s group to be consistently methylated in normal but not in tumor tissues.
Figure 33. Unsupervised hierarchical clustering of beta-values for 8 CpGs (rows) in
pooled samples (A), and only paired samples (B) (columns).
Figure 34. Dot plots of beta-values in 26 paired colon tissues for 8 previously identified
hypermethylated CpGs by Karpinski et al.
Figure 35. Distribution of CpGs across functional genomic locations (A) and CGIs (B).
Figure 36. Volcano plots showing the magnitude of differential methylation levels (delta-beta)
across the entire CpG set: (A) various functional regions; (B) CpG islands and the
surrounding regions.
Figure 37. Methylation profiles of (A) 304 DM CpGs with Bonferroni corrected p<0.05
and (B) 152 DM CpGs with |delta-beta| ≥ 0.2 by PCA (left), and unsupervised
hierarchical clustering (right).
Figure 38. Functional location of the 152 DM CpGs. (A) Distribution of 152 DM CpGs
including 18 hypermethylated CpGs (Left) and 134 hypomethylated CpGs (Right).
Figure 39. Distribution of CGIs and surrounding regions of 152 DM CpGs. (A)
Distribution of 152 DM CpGs including 18 hypermethylated CpGs (Left) and 134
hypomethylated CpGs (Right).
Figure 40. Clustering of normal tissues (colon and liver) and tumor tissues (colon cancer
and HCC) using 152 colon DM CpGs, resulting in near-perfect discrimination of tissues.
Figure 41. Dot plots of beta-values for 8 differentially methylated CpGs in 5 imprinted
genes in MSS/CIMP-negative colon cancer compared to adjacent normal tissues.
Figure 42. Inverse correlation between DNA methylation and gene expression level of
MEST.
Figure 43. Correlation between miRNA expression and their DNA methylation.
LIST OF ABBREVIATIONS
AGCC Affymetrix GeneChip Command Console
ANCOVA Analysis of Covariance
APC adenomatous polyposis coli
BH Benjamini-Hochberg
BH-FDR Benjamini-Hochberg’s false discovery rate
BMP bone morphogenetic protein
C-DM cancer specific differentially methylated
CASP3 caspase 3
CCAT1 colon cancer associated transcript 1
CCAT2 colon cancer associated transcript 2
CDK4,6 cyclin dependent kinase 4,6
CIMP CpG island methylator phenotype
CLL chronic lymphocytic leukemia
CML chronic myelogenous leukemia
CRC Colorectal cancer
CTGF connective tissue growth factor
DCC deleted in colorectal carcinoma
DM differentially methylated
ECM extracellular matrix
EGFR epidermal growth factor receptor
EMT epithelial-mesenchymal transition
EXPO5 exportin 5
GWAS genome-wide association studies
hESC human embryonic stem cells
HOTAIR HOX antisense intergenic lncRNA
ICAMs intercellular adhesion molecules
IPA Ingenuity Pathway Analysis
KLF4 Krüppel-like factor 4
known-miRNAs previously reported miRNAs
KRAS Kirsten rat sarcoma viral oncogene homolog
LncRNAs Long non-coding RNAs
LOI loss of imprinting
MC microCosm
miRAGE miRNA serial analysis of gene expression
MMPs matrix metallopeptidases
MMR DNA mismatch repair gene
mRNA messenger RNA
miRNAs microRNAs
MSCs mesenchymal stem cells
MSI microsatellite instability
mTOR mechanistic target of rapamycin
new-miRNAs newly identified miRNAs
NGS Next Generation Sequencing
PCA Principal Component Analysis
PDCD4 programmed cell death 4
PI3K phosphatidylinositol-3-kinase
POU5F1B POU class 5 homeobox 1B
POU5FP1 POU class 5 homeobox 1B pseudoprotein 1
pre-miRNAs precursor miRNAs
PRNCR1 Prostate cancer non-coding RNA 1
PTEN phosphatase and tensin homolog
PVT1 plasmacytoma variant translocation 1
QC quality control
R-SBE repressive SBE sequence
RASSF1A Ras association domain family 1 isoform A
RECK reversion-inducing cysteine-rich protein with Kazal motifs
RISC RNA-induced silencing complex
rRNA ribosomal RNA
SBE Smad binding element
SIRT1 sirtuin 1
SNPs single nucleotide polymorphisms
T-DM tissue specific differentially methylated
TGFb transforming growth factor b
TGFR1/2 transforming growth factor, beta receptor 1/2
TIMP3 tissue inhibitor of metalloproteinase 3
tRNAs transfer RNA
TS TargetScan 5.1
TSP1 thrombospondin 1
TSS transcription start site
UCRs ultra conserved regions
uPAR urokinase plasminogen activator surface receptor
USP33 ubiquitin specific peptidase 33
UTR untranslated region
XIST X-inactive specific transcript
ZEB1/2 zinc finger E-box binding homeobox 1/2
5-FU 5-fluorouracil
LIST OF PUBLICATIONS BY MIN-AE SONG RELATED TO THIS
THESIS WORK
PUBLICATIONS
1. Unhee Lim and Min-Ae Song. Dietary and Lifestyle Correlates of DNA Methylation. In:
Cancer Epigenetics, Methods in Molecular Biology, Springer Science (Humana Press),
2011. (Book Chapter)
Abstract
Lifestyle factors, such as diet, smoking, physical activity and body weight management, are
known to constitute the majority of cancer causes. Epigenetics has been widely proposed as a
main mechanism that mediates the reversible effects of dietary and lifestyle factors on
carcinogenesis. This chapter reviews human studies on potential dietary and lifestyle
determinants of DNA methylation. Apart from a few prospective investigations and
interventions of limited size and duration, evidence mostly comes from cross-sectional
observational studies and supports some associations. Considering the plasticity of epigenetic
marks and correlated nature of lifestyle factors, more longitudinal studies of healthy individuals
of varying age, sex, and ethnic groups are warranted, ideally with simultaneous and
comprehensive data collection on various lifestyle factors. Studies to date suggest that certain
dietary components may alter genomic and gene-specific DNA methylation levels in systemic
and target tissues, affecting genomic stability and transcription of tumor suppressors and
oncogenes. Most data and supportive evidence exist for folate, a key nutritional factor in one-
carbon metabolism that supplies the methyl units for DNA methylation. Other candidate
bioactive food components include alcohol and other key nutritional factors of one-carbon
metabolism, polyphenols and flavonoids in green tea, phytoestrogen and lycopene. Some data
also support a link of DNA methylation with physical activity and energy balance. Effects of
dietary and lifestyle exposures on DNA methylation may be additionally modified by common
genetic variants, environmental carcinogens, and infectious agents, an aspect that remains largely
unexplored. In addition, growing literature supports that the environmental conditions during
critical developmental stages may influence later risk of metabolic disorders in part through
persistent programming of DNA methylation. Further research of these modifiable
determinants of DNA methylation will improve our understanding of cancer etiology and may
present certain DNA methylation markers as attractive surrogate endpoints for prevention
research.
2. Min-Ae Song, Maarit Tiirikainen, Sandi Kwee, Gordon Okimoto, Herbert Yu, Linda L.
Wong. Elucidating the Landscape of Aberrant DNA Methylation in Hepatocellular
Carcinoma. PLOS ONE, 8(2): e55761, 2013
Abstract
Background: Hepatocellular carcinoma (HCC) is one of the most common cancers and
frequently presents with an advanced disease at diagnosis. There is only limited knowledge of
genome-scale methylation changes in HCC.
Methods and Findings: We performed genome-wide methylation profiling in a total of 47
samples including 27 HCC and 20 adjacent normal liver tissues using the Illumina
HumanMethylation450 BeadChip. We focused on differential methylation patterns in the
promoter CpG islands as well as in various less studied genomic regions such as those
surrounding the CpG islands, i.e. shores and shelves. Of the 485,577 loci studied, significant
differential methylation (DM) was observed between HCC and adjacent normal tissues at 62,692
loci or 13% (p<1.03e-07). Of them, 61,058 loci (97%) were hypomethylated and most of these
loci were located in the intergenic regions (43%) or gene bodies (33%). Our analysis also
identified 10,775 differentially methylated (DM) loci (17% out of 62,692 loci) located in or
surrounding the gene promoters, 4% of which reside in known Differentially Methylated
Regions (DMRs) including reprogramming specific DMRs and cancer specific DMRs, while the
rest (10,315) involving 4,106 genes could be potential new HCC DMR loci. Interestingly, the
promoter-related DM loci occurred twice as frequently in the shores as in the actual CpG
islands. We further characterized 982 DM loci in the promoter CpG islands to evaluate their
potential biological function and found that the methylation changes could have an effect on the
signaling networks of Cellular development, Gene expression and Cell death (p = 1.0e-38), with
BMP4, CDKN2A, GSTP1, and NFATC1 on the top of the gene list.
Conclusion: Substantial changes of DNA methylation at a genome-wide level were observed in
HCC. Understanding epigenetic changes in HCC will help to elucidate the pathogenesis and may
eventually lead to identification of molecular markers for liver cancer diagnosis, treatment and
prognosis.
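The figures in the HCC abstract above can be cross-checked with simple arithmetic. Assuming that the cutoff p<1.03e-07 is a Bonferroni correction of a 0.05 significance level over all 485,577 loci (the abstract does not name the correction), the per-locus threshold and the reported percentages can be reproduced as below. The helper names are illustrative and not part of any published pipeline.

```python
# Cross-check of the threshold and proportions quoted in the HCC abstract.
# Assumption: the cutoff is Bonferroni, i.e. alpha divided by the number of loci.

N_LOCI = 485_577   # CpG loci on the HumanMethylation450 BeadChip
ALPHA = 0.05       # assumed family-wise significance level


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value cutoff that controls the family-wise error rate at alpha."""
    return alpha / n_tests


def pct(part: int, whole: int) -> float:
    """Percentage of part in whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


threshold = bonferroni_threshold(ALPHA, N_LOCI)
print(f"per-locus cutoff: {threshold:.3g}")   # per-locus cutoff: 1.03e-07

dm_loci = 62_692          # significant DM loci
hypo_loci = 61_058        # hypomethylated DM loci
promoter_dm = 10_775      # DM loci in or around promoters
new_promoter_dm = 10_315  # promoter DM loci outside known DMRs

print(pct(dm_loci, N_LOCI))                              # 12.9 (reported as 13%)
print(pct(hypo_loci, dm_loci))                           # 97.4 (reported as 97%)
print(pct(promoter_dm, dm_loci))                         # 17.2 (reported as 17%)
print(pct(promoter_dm - new_promoter_dm, promoter_dm))   # 4.3 (reported as 4%)
```

The rounded values agree with the percentages in the abstract, and 0.05 / 485,577 matches the stated cutoff, which is consistent with (but does not prove) a Bonferroni correction.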
3. Hui Ling, Riccardo Spizzo, Yaser Atlasi, Milena Nicoloso, Masayoshi Shimizu,
Roxana S. Redis, Naohiro Nishida, Roberta Gafà, Jian Song, Zhiyi Guo, Cristina Ivan,
Elisa Barbarotto, Ingrid De Vries, Xinna Zhang, Manuela Ferracin, Mike Churchman,
Janneke F. van Galen, Berna H. Beverloo, Maryam Shariati, Franziska Haderk,
Marcos R Estecio, Guillermo Garcia-Manero, Gijs A. Patijn, David C. Gotley, Vikas
Bhardwaj, Shureiqi Imad, Subrata Sen, Asha S. Multani, James Welsh, Ken Yamamoto,
Itsuki Taniguchi, Min-Ae Song, Steven Gallinger, Graham Casey, Stephen N Thibodeau,
Loïc Le Marchand, Maarit Tiirikainen, Sendurai A. Mani, Wei Zhang, Ramana V.
Davuluri, Koshi Mimori, Masaki Mori, Anieta M. Sieuwerts, John W.M. Martens, Ian
Tomlinson, Massimo Negrini, Ioana Berindan Neagoe, John A. Foekens, Stanley R.
Hamilton, Giovanni Lanza, Scott Kopetz, Riccardo Fodde, George A. Calin. CCAT2, a
novel non-coding RNA mapping to 8q24, underlies metastatic progression and chromosomal
instability in colon cancer. Genome Research, 23(9):1446-61, 2013
Abstract
The functional roles of SNPs within the 8q24 gene desert in the cancer phenotype are not yet
well understood. Here, we report that CCAT2, a novel long noncoding RNA transcript (lncRNA)
encompassing the rs6983267 SNP, is highly over-expressed in microsatellite-stable colorectal
cancer and promotes tumor growth, metastasis, and chromosomal instability. We demonstrate
that MYC, miR–17–5p, and miR–20a are up-regulated by CCAT2 through TCF7L2-mediated
transcriptional regulation. We further identify the physical interaction between CCAT2 and
TCF7L2 resulting in an enhancement of WNT signaling activity. We show that CCAT2 is itself a
WNT downstream target, which suggests the existence of a feedback loop. Finally, we
demonstrate that the SNP status affects CCAT2 expression and the risk allele G produces more
CCAT2 transcript. Our results support a new mechanism of MYC and WNT regulation by the
novel lncRNA CCAT2 in colorectal cancer pathogenesis, and provide an alternative explanation
of the SNP-conferred cancer risk.
POSTER ABSTRACTS
1. Min-Ae Song, Lenora WM Loo, Iona Cheng, Graham Casey, Steven
Gallinger, Stephen N Thibodeau, Loïc Le Marchand, Maarit Tiirikainen. Integrated
analysis of microRNA and mRNA expression in microsatellite-stable colon cancer using next-
generation sequencing and cDNA microarrays. American Association for Cancer Research
Annual Meeting, April, 2011.
MANUSCRIPTS IN PREPARATION
1. Min-Ae Song, Lenora WM Loo, Iona Cheng, Graham Casey, Steven
Gallinger, Stephen N Thibodeau, Loïc Le Marchand, Maarit Tiirikainen. Integration
analysis of microRNA and mRNA expression in MSS/CIMP-neg colon cancer using Next
Generation Sequencing. Manuscript in preparation.
2. Min-Ae Song, Lenora WM Loo, Iona Cheng, Graham Casey, Steven
Gallinger, Stephen N Thibodeau, Loïc Le Marchand, Maarit Tiirikainen. The Landscape
of Aberrant DNA Methylation in MSS/CIMP-negative Colon Cancer. Manuscript in
preparation.
CHAPTER 1
INTRODUCTION
1.1. Epigenetic mechanisms
Classical genetics alone cannot explain how monozygotic twins or cloned animals, despite their
identical DNA sequences, can have different phenotypes and different susceptibilities to diseases.
Epigenetic mechanisms may explain these phenomena (Esteller 2008). Epigenetics is defined as
heritable changes in gene function that occur without a change in the DNA sequence (Goldberg,
Allis et al. 2007). Epigenetics is also a gateway to gene-environment interactions (Song 2011).
Two major non-genetic modifications, DNA methylation and histone modification, are tightly
correlated with gene expression and activity (Goldberg, Allis et al. 2007; Mikkelsen, Ku et al.
2007). Moreover, although they are not currently known to be heritable, non-coding RNAs
(ncRNAs) such as microRNAs (miRNAs) and long ncRNAs (lncRNAs) have recently been studied
extensively as regulators of gene expression (Cannell, Kong et al. 2008; Lee 2012), and they are
considered to provide an additional layer of epigenetic regulation.
1.1.1. Histone modification
Within the chromosome, DNA is packed into chromatin, which consists of DNA and
structural histone proteins. The repeating unit of chromatin is the nucleosome, which consists of
about 146 base pairs (bp) of double-stranded DNA wrapped around a histone octamer containing
two copies each of histones H2A, H2B, H3, and H4 (Fischle, Wang et al. 2003). Epigenetic
modifications of histones occur mainly on their amino-terminal tails (Struhl 1998).
Histones and their modifications have an essential role in the formation of heterochromatin.
Heterochromatin (condensed or silent chromatin) is distinguished by hypoacetylation and H3K9
methylation; euchromatin (open or active chromatin) is characterized by histone H4 acetylation
and histone H3K4 methylation (Grewal and Jia 2007).
Histone acetylation, which mainly targets the amino-terminal tails of histones H3 and
H4, plays a key role in the regulation of gene expression. The level of histone acetylation is
controlled by the balance between two families of enzymes, histone acetyltransferases (HATs)
and histone deacetylases (HDACs) (Trievel 2004). For a gene to be transcribed, it must become
physically accessible to the transcriptional machinery. HATs promote the uncoiling of DNA and an
open chromatin structure; conversely, HDACs promote tight coiling of DNA and a closed
chromatin structure. Many transcriptional coactivators such as CBP, p300 and MOF have
been reported to possess intrinsic HAT activity, whereas many transcriptional corepressor
complexes such as mSin3a, NCoR/SMRT and Mi-2/NuRD contain subunits with HDAC
activity (Wang, Zang et al. 2008). Figure 1 shows how histone modifications and chromatin
remodeling complexes act together in the dynamic regulation of transcription (Davis and
Brackmann 2003).
In contrast to the dynamic ‘on-off’ nature of histone acetylation, early studies found that
histones H3 and H4 were highly methylated with little turnover of the methyl groups (Borun,
Pearson et al. 1972; Rice and Allis 2001). Histone methylation can occur on arginine or lysine
residues and is catalyzed by histone methyltransferases (HMTs) (Trievel 2004). Arginine
residues can be mono- or dimethylated, whereas lysine residues can be mono-, di-, or
trimethylated (Cohen, Poreba et al. 2011).
Histone modification patterns are closely associated with gene expression states. “Active”
histone marks include H3K4me3, which is highly enriched at gene promoters and may be involved
in transcription initiation, and H3K36me3, which is enriched across the bodies of actively
transcribed genes. “Silent” histone marks such as H3K27me3 and H3K9me3 are correlated with
transcriptional repression; in particular, H3K9me3 is highly correlated with constitutive
heterochromatin, as found at centromeres and telomeres (Maunakea, Chepelev et al. 2010).
Figure 1. Dynamic regulation of transcription by histone modifications. When histones are
acetylated by HATs and HMT activity is absent, chromatin is loosely packed. The chromatin
remodeling complex SWI/SNF opens up DNA regions where transcription machinery proteins
such as RNA Polymerase II (RNA Pol II), transcription factors and co-activators bind to turn on
gene transcription. In the absence of SWI/SNF, nucleosomes remain tightly packed against one
another. Additional methylation by HMTs and deacetylation by HDACs condense DNA around
the histones. As a result, RNA Pol II and other activators cannot bind to DNA, leading to gene
silencing (Davis and Brackmann 2003).
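The histone-mark shorthand used above encodes four pieces of information: H3K27me3, for example, reads as histone H3, lysine (K) 27, trimethylated. The Python sketch below makes that notation explicit. It is an illustration only, not software used in this work; the names `parse_mark` and `mark_state` are hypothetical, the active/silent lookup covers only the four marks named in the text, and the arginine check encodes the statement above that arginines are at most dimethylated.

```python
import re
from dataclasses import dataclass

# Histone, modified residue (K = lysine, R = arginine), residue position, modification.
MARK_PATTERN = re.compile(r"^(H2A|H2B|H3|H4)([KR])(\d+)(ac|me[1-3])$")

# Only the marks named in the text are classified; all others stay unclassified.
ACTIVE_MARKS = {"H3K4me3", "H3K36me3"}
SILENT_MARKS = {"H3K27me3", "H3K9me3"}


@dataclass(frozen=True)
class HistoneMark:
    histone: str
    residue: str
    position: int
    modification: str


def parse_mark(name: str) -> HistoneMark:
    """Split a mark such as 'H3K27me3' into its components, or raise ValueError."""
    match = MARK_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized histone mark: {name!r}")
    histone, residue, position, modification = match.groups()
    if residue == "R" and modification == "me3":
        # Arginines are only mono- or dimethylated; lysines can also be trimethylated.
        raise ValueError(f"arginine cannot be trimethylated: {name!r}")
    return HistoneMark(histone, residue, int(position), modification)


def mark_state(name: str) -> str:
    """Return 'active', 'silent' or 'unclassified' for a validated mark name."""
    parse_mark(name)  # raises ValueError on malformed names
    if name in ACTIVE_MARKS:
        return "active"
    if name in SILENT_MARKS:
        return "silent"
    return "unclassified"
```

For example, `mark_state("H3K9me3")` returns `"silent"`, while a well-formed mark not discussed here, such as `"H4K16ac"`, is reported as `"unclassified"` rather than guessed.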
1.1.3. Non-coding RNAs
The human genome sequencing project revealed a surprise: the human genome encodes just
20,000-25,000 protein-coding genes, representing less than 2% of the total genome sequence
(2004), although around 90% of the human genome is actively transcribed (Birney,
Stamatoyannopoulos et al. 2007). It was discovered that the human transcriptome consists of a
complex network of transcripts, including extensive antisense transcription, transcripts that
overlap multiple exons, and non-coding RNA (ncRNA) transcription (Kapranov, Cheng et al.
2007). An ncRNA is a functional RNA molecule that is not translated into a protein (because it
lacks a significant open reading frame). ncRNAs are classified into two major groups based on
size: small ncRNAs (<200 nt), such as miRNAs and small interfering RNAs (siRNAs), and long
ncRNAs (lncRNAs) (>200 nt). This size boundary is arbitrary; it reflects a convenient practical
cut-off in typical RNA purification protocols that exclude small RNAs (Esteller 2011). In the cell,
most ncRNAs are located in the cytoplasm, although some are found in both the cytoplasm and
the nucleus (Banfai, Jia et al. 2012).
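Because the small/long distinction is based on length alone, it can be written as a single threshold. The Python sketch below is illustrative only; the function name `ncrna_size_class` is hypothetical, and assigning a transcript of exactly 200 nt to the small class is an arbitrary choice, since the text defines the two groups only as <200 nt and >200 nt.

```python
# Practical size cut-off between small ncRNAs and lncRNAs, in nucleotides.
SIZE_CUTOFF_NT = 200


def ncrna_size_class(length_nt: int) -> str:
    """Assign an ncRNA to a size class from its length alone.

    Exactly 200 nt is placed in the small class here; this boundary
    handling is an assumption of the sketch, not part of the definition.
    """
    if length_nt <= 0:
        raise ValueError("length must be a positive number of nucleotides")
    return "lncRNA" if length_nt > SIZE_CUTOFF_NT else "small ncRNA"
```

A mature miRNA of about 22 nt falls in the small class, whereas a 500-3,000 nt pri-miRNA transcript would be labeled "lncRNA" by length alone, which illustrates that the cut-off is a practical convention rather than a functional definition.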
1.1.2. DNA methylation
DNA methylation is the best-known epigenetic mark (Esteller 2008). DNA methylation
is a covalent modification of post-replicative DNA by DNA methyltransferases (DNMTs)
(Herman and Baylin 2003), which transfer a methyl group from S-adenosylmethionine (SAM)
to the carbon-5 position of a cytosine residue to form 5-methylcytosine (5mC) (Figure 2). These
methyl groups project into the major groove of the DNA double helix and can effectively block
transcription. Although a small amount of methylation also occurs at CpNpG sequences, where
N can be A or T, DNA methylation in the human genome occurs mostly at CpG dinucleotides
(Lee, Jang et al. 2010).
A further possible modification of 5mC is the addition of a hydroxyl group, producing
5-hydroxymethylcytosine (5hmC). 5hmC was initially discovered in the DNA of certain
bacteriophages (Hershey, Dixon et al. 1953) and was reported in mammalian brain and liver DNA
in 1972 (Penn, Suwalski et al. 1972). In 2009, Tahiliani et al. showed that ten-eleven
translocation 1 (TET1), a member of a protein family that also includes TET2 and TET3,
catalyzes the conversion of 5mC to 5hmC (Tahiliani, Koh et al. 2009). Although it has been
suggested that 5hmC may be produced as an intermediate during the demethylation of 5mC, the
functional role and abundance of 5hmC in the human genome remain to be determined.
Figure 2. Conversion of the cytosine to 5-methylcytosine by DNA methyltransferase
(DNMT). DNMT catalyzes the transfer of a methyl group (CH3) from S-
adenosylmethionine (SAM) to the 5-carbon position of cytosine (Singal and Ginder 1999).
In most cases, DNA methylation is stable over the long term, but in some situations, such as
in germ cells, where the silencing of imprinted genes must be reset, epigenetic reprogramming
takes place. Although the mechanism of DNA demethylation is not fully understood,
deamination of 5mC (removal of its amino group) may be involved in this process (Morgan,
Dean et al. 2004). Cytosine and especially 5mC are chemically less stable than the other
nucleobases: cytosine deaminates into uracil, and 5mC deaminates into thymine. Because uracil
is foreign to DNA and is efficiently removed by repair enzymes, whereas thymine is a normal
DNA base, deamination of 5mC more often results in a C-to-T transition. Over evolutionary time,
this has depleted CpGs, which occur at only about one-fourth of their expected frequency in
mammalian DNA (Simmen 2008). Although the overall level of CpG dinucleotides in the human
genome is low, high levels are observed at long repetitive sequences and CpG islands (CGIs).
Mammals have three active DNA methyltransferases: DNMT1, DNMT3A and 3B.
DNMT1 maintains DNA methylation at hemi-methylated DNA following DNA replication
during cell division (Bestor 1992), whereas DNMT3A and 3B are both considered de novo
methyltransferases, recruited to establish new DNA methylation patterns (Okano, Bell et al.
1999). Although DNMT2 has been identified as a DNA methyltransferase homolog, it does not
methylate DNA but methylates aspartic acid transfer RNA (Goll, Kirpekar et al. 2006).
In mammalian DNA, 5mC is found in approximately 4% of genomic DNA, primarily at
CpGs. CpGs are not uniformly distributed throughout the human genome but are found more
frequently in small regions of DNA called CpG islands (CGIs) (Herman and Baylin 2003). The
most widely accepted definition of a CGI is a region of at least 200 bp with a GC content greater
than 50% and an observed-to-expected CpG ratio greater than 60% (0.6) (Gardiner-Garden and
Frommer 1987). About 70% of annotated human genes are associated with CGIs (Saxonov,
Berg et al. 2006).
Figure 3. Distribution of CpG dinucleotides throughout the genome. N and S indicate the
regions upstream and downstream of CGIs, respectively.
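The Gardiner-Garden and Frommer criteria can be written out directly. In the Python sketch below (function names are hypothetical), the observed-to-expected ratio is the number of CpG dinucleotides divided by the number expected if C and G occurred independently, that is (C count × G count) / sequence length. A ratio near 0.25 corresponds to the roughly four-fold CpG depletion of bulk mammalian DNA described above, whereas a CGI must exceed 0.6. The sketch tests a single window only; genome-wide CGI finders slide such a window along the sequence and merge qualifying stretches, which is not shown.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """Observed CpG count divided by the count expected from C and G frequencies."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    observed = seq.count("CG")  # "CG" cannot overlap itself, so str.count is exact
    expected = c * g / len(seq)
    return observed / expected


def is_cpg_island(seq: str, min_length: int = 200,
                  min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """Apply the length, GC-content and CpG o/e thresholds to one window."""
    return (len(seq) >= min_length
            and gc_content(seq) > min_gc
            and cpg_obs_exp(seq) > min_obs_exp)
```

A 300-bp repeat of `GCA` is GC-rich (about 67% GC) yet contains no CpG, so `is_cpg_island("GCA" * 100)` is `False`; GC content alone does not make a CGI.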
Recently, the regions surrounding CGIs have been further classified into CGI shores (up to 2 kb
from a CGI) and CGI shelves (2 kb to 4 kb from a CGI). Interestingly, most of the cancer-specific
methylation alterations in colon cancer were found to occur at CGI shores rather than in the
CGIs themselves (Irizarry, Ladd-Acosta et al. 2009). Figure 3 shows the classification of CGIs
and their surrounding regions.
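The distance-based zones above reduce to a simple lookup. This Python sketch assumes a single CGI given by inclusive start and end coordinates on one chromosome, and treats N as the lower-coordinate (upstream) side and S as the higher-coordinate (downstream) side, following Figure 3. The function name, the region labels (modeled loosely on those used in common methylation-array annotations) and the "OpenSea" label for CpGs more than 4 kb from the CGI are assumptions, not terms defined in this text.

```python
def classify_cpg(pos: int, island_start: int, island_end: int,
                 shore_bp: int = 2000, shelf_bp: int = 4000) -> str:
    """Label a CpG position relative to one CGI spanning [island_start, island_end]."""
    if island_start > island_end:
        raise ValueError("island_start must not exceed island_end")
    if island_start <= pos <= island_end:
        return "Island"
    if pos < island_start:
        side, distance = "N", island_start - pos  # upstream (lower coordinate)
    else:
        side, distance = "S", pos - island_end    # downstream (higher coordinate)
    if distance <= shore_bp:
        return f"{side}_Shore"                    # up to 2 kb from the CGI
    if distance <= shelf_bp:
        return f"{side}_Shelf"                    # 2 kb to 4 kb from the CGI
    return "OpenSea"                              # more than 4 kb away
```

With a CGI at positions 10,000-12,000, a CpG at 8,000 is 2,000 bp upstream and is labeled `N_Shore`, while one at 7,999 falls just past the shore boundary and is labeled `N_Shelf`.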
No single method for detecting DNA methylation is appropriate for every study. DNA
methylation can be analyzed by many different assays depending on the purpose of the study, as
described in Figure 4 (Shen and Waterland 2007). Most methods for analyzing DNA methylation
at specific CpGs rely on bisulfite conversion of DNA, which converts unmethylated cytosines to
uracil while leaving methylated cytosines unchanged.
Figure 4. The decision tree to select appropriate DNA methylation analysis methods. (Shen
and Waterland 2007).
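The logic of bisulfite-based methylation calling can be sketched in a few lines of Python. This is an idealized simulation with hypothetical function names: it assumes complete conversion, no sequencing errors, and a read already aligned base-for-base to the reference, none of which hold exactly in practice. Unmethylated C becomes U, which PCR copies as T, whereas 5mC stays C, so the base found at each reference CpG reports that CpG's methylation state.

```python
from typing import Dict, Optional, Set


def bisulfite_convert(ref: str, methylated: Set[int]) -> str:
    """Convert every unmethylated C to U; C at positions in `methylated` stays C."""
    return "".join(
        "U" if base == "C" and i not in methylated else base
        for i, base in enumerate(ref.upper())
    )


def pcr_amplify(converted: str) -> str:
    """PCR copies uracil as thymine."""
    return converted.replace("U", "T")


def call_cpg_methylation(ref: str, read: str) -> Dict[int, Optional[bool]]:
    """For each reference CpG, report True (C: methylated), False (T: unmethylated)
    or None (any other base) at the cytosine position in the aligned read."""
    ref = ref.upper()
    calls: Dict[int, Optional[bool]] = {}
    for i in range(len(ref) - 1):
        if ref[i:i + 2] == "CG":
            calls[i] = {"C": True, "T": False}.get(read[i])
    return calls
```

For the reference `ACGTCGAC` with only the first CpG (position 1) methylated, conversion gives `ACGTUGAU`, the amplified read is `ACGTTGAT`, and the calls are `{1: True, 4: False}`. The non-CpG cytosine at position 7 is also converted, which is what allows conversion efficiency to be checked in real data.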
1.1.3.1. Long non-coding RNAs
LncRNAs can be produced intragenically from exons and introns, for example via alternative
splicing during mRNA transcription (Shi, Sun et al. 2013), and they play regulatory roles at
almost every stage of gene expression, from targeting epigenetic modifications in the nucleus to
modulating mRNA stability and translation in the cytoplasm (Mercer and Mattick 2013)
(Figure 5).
Figure 5. Paradigms for cellular functions of lncRNAs (red). Transcription from an
upstream lncRNA promoter can negatively (1) or positively (2) affect expression of the
coding gene (purple) by inhibiting RNA Pol II recruitment or inducing chromatin
remodeling (e.g., the HOTAIR lncRNA recruits the Polycomb complex to induce
heterochromatin formation through H3K27 methylation). In addition, antisense transcripts can
pair with their specific sense RNAs, modulating alternative splicing (3) or generating
endo-siRNAs (4). When lncRNAs interact with proteins, they may influence protein activity (5)
or localization (6) or even form cellular substructures or protein complexes (7). LncRNAs can be
processed to yield small, single- or double-stranded RNAs that may act as endo-siRNAs or
miRNAs (8). Moreover, they can also act as “miRNA sponges” that affect the ceRNA network.
LncRNAs: long noncoding RNAs; miRNA: microRNA; ceRNA: competing endogenous RNAs
(Shi, Sun et al. 2013).
LncRNAs are known for their important roles in epigenetic regulation via chromatin
modification, transcription, and post-transcriptional processing (Figure 6) (Mercer, Dinger et al.
2009). Interestingly, at least 38% of lncRNAs bind to the histone methyltransferase complex
‘polycomb repressive complex 2’ (PRC2) or other chromatin-modifying proteins (Cheetham,
Gruhl et al. 2013). One of the first lncRNAs to be discovered (in 1991) and one of the best
characterized is X-inactive specific transcript (XIST) (Brown, Ballabio et al. 1991). XIST
contains conserved repeats within the transcript and is largely localized in the nucleus (Brown,
Hendrich et al. 1992). Repeat region A (RepA) is required for XIST-mediated silencing in cis
during X inactivation: RepA recruits PRC2, which deposits H3K27 methylation, to silence one of
the X chromosomes (Zhao, Sun et al. 2008). Another well-studied lncRNA is HOX transcript
antisense RNA (HOTAIR), which is transcribed from the HOXC locus on chromosome 12 and
silences the HOXD locus on chromosome 2 in trans by recruiting PRC2 (Rinn, Kertesz et al.
2007).
Figure 6. LncRNAs play roles in chromatin remodeling, transcriptional control, and
post-transcriptional processing. (a) LncRNAs can recruit chromatin-modifying complexes to
specific genomic loci. HOTAIR, XIST and Kcnq1ot1 recruit the chromatin-modifying Polycomb
complex to the HoxD locus, the X chromosome and the Kcnq1 domain, respectively, where it
methylates H3K27 to induce heterochromatin formation and repress gene expression. Therefore,
a lncRNA can regulate the transcriptional process. (b) A lncRNA transcribed from the cyclin D1
gene recruits an RNA-binding protein that inhibits p300 activity, repressing gene transcription.
(c) A lncRNA acts as a co-activator of a transcription factor and regulates gene expression. (d) A
lncRNA transcribed from the DHFR minor promoter in humans can form a triplex at the major
promoter that prevents binding of the general transcription factor TFIID, silencing DHFR gene
expression. (e) An antisense ncRNA binds to mRNA and alters splicing by blocking the
spliceosome.
1.1.3.2. MicroRNAs
MiRNAs were first discovered in 1993 during a study of the gene lin-14 in Caenorhabditis
elegans (C. elegans) development by Lee et al. (Lee, Feinbaum et al. 1993) and have been
proven as an essential component of the epigenetic regulation. In 2000, a second important
miRNA, let-7 was discovered also in C. elegans (Reinhart, Slack et al. 2000). Let-7 miRNAs
have now been predicted or experimentally identified in a wide range of species.(MIPF000002).
MiRNAs play an important role in gene regulation in many species, including vertebrates
(Lagos-Quintana, Rauhut et al. 2001). As of June 2013, 30,424 mature miRNAs from 206
species, including 2,555 mature human miRNAs, had been registered in the miRBase database
(http://microrna.sanger.ac.uk). MiRNAs play important roles in basic biological functions,
including cell growth, proliferation, differentiation, invasion, and angiogenesis, through the
downregulation of their target mRNAs.
The biogenesis of a miRNA begins with the transcription of a primary transcript (pri-miRNA).
This hairpin-containing transcript, 500-3,000 nucleotides long, is transcribed from the miRNA
gene by RNA Pol II and then cleaved by a protein complex involving Drosha/DGCR8. This
produces the precursor miRNA (pre-miRNA, ~60-100 nucleotides), a double-stranded hairpin
structure that is exported from the nucleus to the cytoplasm by exportin 5 (XPO5)
(Lagos-Quintana, Rauhut et al. 2001). Next, the pre-miRNA is further cleaved by Dicer1 (an
RNase III-containing enzyme) to produce a double-stranded miRNA duplex consisting of the
mature miRNA sequence (~22 nucleotides, the guide strand) and its complementary sequence,
known as miR* (star) or the passenger strand (Denli, Tops et al. 2004). The guide strand is
incorporated into the RNA-induced silencing complex (RISC) and represses its target mRNAs
by binding, through the “seed” region at its 5’ end, to their 3’ untranslated regions (UTRs)
(O'Toole, Miller et al. 2006), whereas the passenger strand is usually degraded (Khvorova,
Reynolds et al. 2003). Binding of a guide miRNA to an mRNA triggers either mRNA cleavage or
translational inhibition, depending on the degree of complementarity between the miRNA and
the target sequence (Figure 7). Interestingly, each miRNA has the potential to target a large
number of genes, and bioinformatic analysis predicts that the 3’UTRs of single genes are often
targeted by several different miRNAs (Lewis, Burge et al. 2005). Many algorithms have been
developed for predicting miRNA-mRNA interactions. Widely used algorithms such as miRanda
(John, Enright et al. 2004), PicTar (Krek, Grun et al. 2005) and TargetScan (Grimson, Farh et al.
2007) are based on evolutionary conservation criteria. Other parameters have also been used,
such as the free energy of binding or secondary structures of the 3’UTR that can promote or
prevent miRNA binding (Witkos, Koscianska et al. 2011). However, the rules for predicting these
interactions have not been fully established, and current knowledge of miRNAs and their targets
is based mainly on experimentally validated miRNA-mRNA interactions (Witkos, Koscianska et
al. 2011).
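The seed-pairing rule described above is the core of several target prediction methods. The Python sketch below scans a 3’UTR for sites complementary to miRNA nucleotides 2-8, using the 7mer-m8 and 8mer site definitions from the TargetScan literature (Grimson, Farh et al. 2007). It deliberately ignores conservation, site context, binding free energy and weaker site types, so it is an illustration rather than a predictor, and its function names are hypothetical. Because a miRNA pairs antiparallel with its target, the site in the mRNA is the reverse complement of the seed, and the mRNA base facing miRNA nucleotide 1 lies immediately 3’ of that site.

```python
from typing import List, Tuple

# Watson-Crick partners in RNA.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement_rna(seq: str) -> str:
    """Reverse complement of an RNA sequence (A/C/G/U only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def find_seed_sites(mirna: str, utr: str) -> List[Tuple[int, str]]:
    """Return (0-based start, site type) for each seed match in a 3'UTR.

    A '7mer-m8' site pairs with miRNA nucleotides 2-8; it is called an '8mer'
    when the UTR base facing miRNA nucleotide 1 (just 3' of the site) is an A.
    """
    mirna = mirna.upper().replace("T", "U")
    utr = utr.upper().replace("T", "U")
    site = reverse_complement_rna(mirna[1:8])  # miRNA positions 2-8
    hits = []
    start = utr.find(site)
    while start != -1:
        end = start + len(site)
        kind = "8mer" if end < len(utr) and utr[end] == "A" else "7mer-m8"
        hits.append((start, kind))
        start = utr.find(site, start + 1)
    return hits
```

For let-7a (`UGAGGUAGUAGGUUGUAUAGUU`), nucleotides 2-8 are `GAGGUAG`, so its sites read `CUACCUC` in the mRNA. In the toy UTR `AAACUACCUCAGGGCUACCUCGG`, the function reports an 8mer at position 3 (followed by A) and a 7mer-m8 site at position 14.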
Figure 7. MicroRNA biogenesis. (a) MiRNAs are transcribed by RNA pol II into pri-
miRNAs which are recognized and cleaved in the nucleus by Drosha, resulting in hairpin
pre-miRNAs. (b) Pre-miRNAs are exported by Exportin 5 from the nucleus to the
cytoplasm and further cleaved by Dicer, (c) resulting in a miRNA duplex. One strand of
miRNA duplex (mature miRNA) is incorporated into the RISC (d). The mature miRNA
leads RISC to cleave the mRNA or induce translational repression depending on the degree
of complementarity between the miRNA and its target (Garzon, Calin et al. 2009).
1.2. Epigenetic alterations in cancer
1.2.1. Histone modification in cancer
Given the fundamental role of histone modification in the regulation of gene expression, as
explained in Section 1.1.1, it is not surprising that aberrant histone modification is found in
cancer. Histone modification by HATs, HMTs, and HDACs has been found to be involved in
tumorigenesis (Fullgrabe, Kavanagh et al. 2011). Two HATs, p300 and CBP, are considered
tumor suppressors (Chan and La Thangue 2001), and loss of heterozygosity (LOH) at the p300
locus is associated with hyperacetylation in many cancers (Tillinghast, Partee et al. 2003;
Koshiishi, Chong et al. 2004). Aberrant expression of HDACs has also been found in multiple
cancers (Chervona and Costa 2012). Furthermore, HDACs have been shown to associate with the
tumor suppressor retinoblastoma protein (RB) and to contribute to RB-dependent repression of
the cell cycle (Siddiqui, Solomon et al. 2003).
1.2.2. DNA methylation in cancer
Global hypomethylation in tumors compared with normal tissue was one of the first
epigenetic alterations to be found (Feinberg and Vogelstein 1983). It mainly reflects
hypomethylation of repetitive DNA sequences such as LINE-1, and it also involves
demethylation of coding regions and introns, which can result in altered transcripts (Feinberg
and Tycko 2004). A recent study found that hypomethylation of LINE-1 leads to activation of
proto-oncogenes such as MET, RAB3IP, and CHRM3 in colorectal liver metastasis tissues
compared with primary colorectal cancer tissues (Hur, Cejas et al. 2013). This study also
indicates that increased 5hmC content is associated with LINE-1 hypomethylation in colorectal
cancer, providing important mechanistic insights into the processes underlying global DNA
hypomethylation. Hypomethylation has also recently been found at many CGIs in cancer, in
contrast to their normal methylation pattern in somatic tissues. This can lead to gene activation in
tumors, including oncogenes such as HRAS (Feinberg and Vogelstein 1983), cyclin D2, HPV16,
WNT5A and S100P (Feinberg and Tycko 2004; Wang, Williamson et al. 2007). However, the
mechanistic consequences of DNA hypomethylation are diverse and not yet fully understood
(Feinberg and Tycko 2004).
Hypermethylation of DNA in the promoter regions of tumor suppressor genes has been
well studied as a major event in tumorigenesis. The first finding of a hypermethylated tumor
suppressor gene, the retinoblastoma gene RB (Greger, Passarge et al. 1989), was soon followed
by the identification of many other hypermethylated tumor suppressor genes, including VHL,
p16, hMLH1, MGMT, WRN and BRCA1 (Herman, Latif et al. 1994; Esteller 2008; Kawasaki,
Ohnishi et al. 2008). Moreover, in 1999, a subtype of colorectal cancer (CRC) with
hypermethylation at a specific set of CGIs (the “CIMP markers”) was recognized as a distinct
subgroup of CRC and termed the CpG island methylator phenotype (CIMP) (Toyota, Ahuja et al.
1999). This classification has since been applied to other cancers, including gastric cancer
(Toyota, Ahuja et al. 1999), breast cancer (B-CIMP) (Fang, Turcan et al. 2011) and glioblastoma
multiforme (G-CIMP) (Noushmehr, Weisenberger et al. 2010). The role of CIMP has also been
investigated in ovarian tumors at seven loci: BRCA1, HIC1, MINT25, MINT31, MLH1, p73, and
hTR. Hypermethylation of these loci was found in a significant proportion of the ovarian tumors,
and methylation of at least one of them was found in the majority (71%, 63/93) of samples
(Strathdee, Appleton et al. 2001).
Studies using recently devised epigenomic techniques suggest that 100 to 400 promoter CGIs are
hypermethylated in a given tumor (Esteller 2007). Despite extensive studies of altered
methylation in CGIs, it is still not clearly understood how CGIs become hypermethylated in
some types of cancer but not in others. Moreover, the potential involvement of methylation
beyond CGI promoters in human disease has been largely overlooked, even in genome-wide
studies, and the regions neighboring CGIs require further study to improve our understanding of
cancer (Jones 2012).
1.2.3. LncRNAs in cancer
Since a number of ncRNAs, such as miRNAs, tRNAs, rRNAs, and spliceosomal RNAs,
are important to the functioning of cells, it has been suggested that additional ncRNAs may also
play a role in regulating cellular machinery (Wilusz, Sunwoo et al. 2009). Indeed, a group of
lncRNAs has been shown to be associated with developmental processes (Rinn, Kertesz et al.
2007) and human diseases including cancer (Costa 2005), suggesting that lncRNAs are a new
class of functional transcripts. Although the biological significance of this group of RNAs is still
unclear, a variety of lncRNA functions have been found in normal cells, including
X-chromosome inactivation by XIST (Wilusz, Sunwoo et al. 2009), genomic imprinting by H19
(Brannan, Dees et al. 1990) and DNA demethylation by KHPS1a (Imamura, Yamamoto et al.
2004). Furthermore, recent studies have revealed functional roles for several lncRNAs in cancer.
For instance, human cancers have been reported to show aberrant overexpression of non-coding
satellite repeats (Ting, Lipson et al. 2011). Transcripts from highly conserved genomic regions,
called ultraconserved regions (UCRs), are also frequently aberrantly expressed in human
leukemia (Calin, Liu et al. 2007) and colon cancer (Wojcik, Rossi et al. 2010). Similarly,
HOTAIR is highly expressed in breast cancers and plays a role in retargeting chromatin-
remodeling complexes (Gupta, Shah et al. 2010).
Other lncRNAs have been found to be key regulators of signaling pathways in
carcinogenesis. The lncRNA lincRNA-p21 contains binding sites for the tumor suppressor p53 in
its promoter and is directly activated by p53 upon DNA damage. Like p53, lincRNA-p21 has been
suggested to act as a tumor suppressor (Huarte, Guttman et al. 2010).
To achieve replicative immortality, cancer cells need to bypass the cellular mechanisms
that inhibit proliferation. In humans, telomeres consist of many kilobases of short repeats that
protect the ends of chromosomes, and they are extended by telomerase, a ribonucleoprotein
enzyme whose catalytic subunit is a specialized reverse transcriptase called telomerase reverse
transcriptase (TERT). Because TERT is expressed at very low levels in many types of normal
human cells, telomeres shorten slightly every time a cell divides; most cancer cells escape this
limit by maintaining their telomeres, usually through telomerase reactivation. Recent studies
have discovered that telomeric ends are transcribed into a lncRNA called TERRA, which acts as
an inhibitor of telomerase (Redon, Reichenbach et al. 2010). Alterations of TERRA expression
have been observed in many cancer cells (Arora, Brun et al. 2012).
Recent studies indicate that several cancer risk-associated loci are transcribed into lncRNAs
and that these transcripts play important roles in tumorigenesis (Cheetham, Gruhl et al. 2013).
LncRNAs including POU5F1B (Takeda, Seino et al. 1992), PVT1 (Shtivelman, Henglein et al.
1989), PRNCR1 (Chung, Nakagawa et al. 2011), POU5FP1 (Wright, Brown et al. 2010), CCAT1
(Nissan, Stojadinovic et al. 2012) and CCAT2 (Ling, Spizzo et al. 2013; Redis, Sieuwerts et al.
2013) have been identified in the 8q24 gene desert region, which harbors multiple risk loci for
prostate, breast, ovarian and colon cancer susceptibility (Pomerantz et al. 2009). ANRIL, a large
lncRNA gene spanning 126 kb adjacent to p14/ARF, is located in a genome-wide association
study (GWAS) “hot spot” linked to many complex diseases, including type 2 diabetes and
cancers. Recent studies have shown that multiple disease-associated SNPs mapped to the ANRIL
locus may affect ANRIL function differently, resulting in diverse diseases (Cheetham, Gruhl et al.
2013). However, the functional roles of lncRNAs in cancer development and progression remain
incompletely understood, and further investigation is needed to comprehensively understand
them.
1.2.4. MiRNAs in cancer
MiRNAs are directly involved in gene regulation by binding to the 3’UTRs of their target
mRNAs, and many of them have been implicated in cancer. Based on bioinformatic analysis,
miRNAs are thought to regulate ~30% of all genes (Lewis, Burge et al. 2005). In 2001, Bullrich
et al. found chronic lymphocytic leukaemia (CLL) cases with a deletion of about 30 kb at 13q14,
at a chromosomal breakpoint (Bullrich, Fujii et al. 2001). Interestingly, two miRNA genes,
miR-15a and miR-16-1, were found in this region, and loss of these miRNAs was observed in
70% of CLLs. Following these initial observations, many other miRNA genes have been
identified in cancer-associated genomic regions, including regions of LOH, amplification, fragile
sites, and viral integration sites (Calin, Sevignani et al. 2004; Iorio and Croce 2012). In 2005, Lu
et al. presented systematic miRNA profiling of multiple human cancer samples, showing that
miRNA expression profiles are highly correlated with the developmental lineages and
differentiation states of the cancers, whereas classification based on mRNA profiles was highly
inaccurate (Lu, Getz et al. 2005).
Recently, many approaches have been applied to investigate the connection between miRNAs
and cancer (Witkos, Koscianska et al. 2011). MiRNAs have been shown to play roles in many
known oncogenic and tumor suppressor pathways involved in the pathogenesis of many cancers,
such as the regulation of the KRAS pathway by miR-143 (Johnson, Grosshans et al. 2005), the
phosphatidylinositol-3-kinase (PI3K) pathway by miR-126 and miR-21 (Guo, Sah et al. 2008),
p53 as a transactivator of miR-34a (Chang, Wentzel et al. 2007), the regulation of
epithelial-mesenchymal transition (EMT) transcription factors by the miR-200 family (Burk,
Schubert et al. 2008), and the regulation of the Wnt/beta-catenin pathway by miR-135 (Nagel, le
Sage et al. 2008). Furthermore, the miR-17-92 cluster (miR-17, miR-18a, miR-19a, miR-20a,
miR-19b-1 and miR-92-1) on chromosome 13 mediates Myc-dependent tumor-promoting effects
(Venturini, Battmer et al. 2007). Figure 8 shows examples of aberrantly expressed miRNAs in
colorectal cancer pathogenesis (Slaby, Svoboda et al. 2009).
Figure 8. MicroRNAs' involvement in colorectal cancer pathogenesis. Deregulation of
miRNAs can influence colon carcinogenesis if their mRNA targets are tumor suppressor genes or
oncogenes. Numerous studies have identified target mRNAs in tumor suppressor and oncogenic
pathways involved in the pathogenesis of CRC. Many of the target proteins belong to key
signaling pathways in CRC, such as the Wnt/beta-catenin, PI3K, KRAS, and p53 pathways
(Slaby, Svoboda et al. 2009).
1.3. Characterization of colorectal cancer
Colorectal cancer (CRC) is a disease of the gastrointestinal tract arising in the epithelial
cells lining the colon (ascending, transverse, descending, and sigmoid colon) or the rectum
(2012). The development of colon cancer involves a heterogeneous complex of etiological
factors and pathogenic mechanisms (Fearon 2011). It is the third most common cancer in both
men and women in the United States, and worldwide it is the third most common cancer and the
fourth most common cause of cancer death (Wiseman 2008). The American Cancer Society
estimated 102,480 new cases of colon cancer, 40,340 new cases of rectal cancer, and about
50,830 deaths from colorectal cancer in the United States in 2013 (http://www.cancer.org).
Most colon cancers develop slowly over several years, beginning as non-cancerous polyps on
the inner lining of the colon or rectum. The vast majority of colon cancers (about 80%) arise from
adenomatous polyps (Cooper, Squires et al. 2010), which start in the cells that form glands.
The risk of developing colon cancer is influenced by several risk factors, including
modifiable risk factors such as environmental exposures, dietary factors, and lifestyle factors
(physical inactivity, obesity, high consumption of red meat, smoking, heavy alcohol use), and
non-modifiable risk factors such as a personal or family history of colon cancer or adenomatous
polyps and chronic inflammatory bowel disease (Wei, Giovannucci et al. 2004; Lin 2009).
About 25% of colon cancers occur in people with a family history of the disease (Cooper,
Squires et al. 2010), whereas only about 5% to 10% of people who develop colon cancer carry
inherited gene mutations that cause the disease.
1.3.1. Molecular classification of colorectal cancer
CRC results from a relatively uniform and linear sequence of steps caused by both
genetic and epigenetic alterations (Figure 9).
Figure 9. Progressive genetic and epigenetic alterations in the development of CRC.
Inactivation of APC, which encodes a protein involved in cell adhesion and transcription, is
found in up to 85% of all colon cancers. KRAS is mutated in 50-60% of colon cancers.
SMAD4 is involved in the transforming growth factor β (TGF-β) signaling pathway. TP53
mutation tends to be a late event and increases the resistance of cancer cells to apoptosis.
Source: Longo DL, Fauci AS, Kasper DL, Hauser SL, Jameson JL, Loscalzo J: Harrison’s
Principles of Internal Medicine, 18th Edition: www.accessmedicine.com
Most CRC occurs sporadically; only about 20-25% of colon cancer patients have a family
history, suggesting an interaction between genes and environmental factors. Indeed, an
accumulation of multiple genetic (Fearon and Vogelstein 1990) and epigenetic alterations (Wong,
Hawkins et al. 2007) has been found in colon epithelial cells that have transformed into
adenocarcinomas. These alterations can be characterized by two molecular features:
microsatellite instability (MSI) status, classified as MSI-high (MSI-H), MSI-low (MSI-L) or
microsatellite stable (MSS), and CIMP status, classified as CIMP-high, CIMP-low or
CIMP-negative (CIMP-neg). The most widely used comprehensive molecular classification of
colon cancer was proposed by Jass and is defined by MSI and CIMP status in conjunction with
clinical and pathological features (Jass 2007): Type 1 (CIMP-high / MSI-H / BRAF mutation),
Type 2 (CIMP-high / MSI-L or MSS / BRAF mutation), Type 3 (CIMP-low / MSS or MSI-L /
KRAS mutation), Type 4 (CIMP-neg / MSS) and Type 5, or Lynch syndrome (CIMP-neg /
MSI-H). Type 4 is the most common subtype of CRC (Figure 10) (Jass 2007).
Figure 10. Derivation of molecular CRC groups 1-5
based on CIMP status and MSI status (Jass 2007).
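Because the five Jass groups are essentially a lookup on two molecular features, the rule can be written down compactly. The sketch below is a hypothetical helper (the function name and the string labels are illustrative assumptions, not part of Jass's work or of this study) that returns the group for a CIMP/MSI combination as listed above. BRAF/KRAS status and the clinicopathological features that Jass also considered are deliberately ignored, so this is an illustration of the grouping, not a classifier.

```python
from typing import Optional


def jass_type(cimp: str, msi: str) -> Optional[int]:
    """Return the Jass (2007) molecular type (1-5), or None if not covered.

    cimp: "high", "low" or "neg"; msi: "MSI-H", "MSI-L" or "MSS".
    Mutation status (BRAF/KRAS) is not used in this sketch.
    """
    if cimp == "high":
        if msi == "MSI-H":
            return 1  # typically BRAF-mutant
        if msi in ("MSI-L", "MSS"):
            return 2  # typically BRAF-mutant
    elif cimp == "low":
        if msi in ("MSS", "MSI-L"):
            return 3  # typically KRAS-mutant
    elif cimp == "neg":
        if msi == "MSS":
            return 4  # the major CRC subtype
        if msi == "MSI-H":
            return 5  # Lynch syndrome
    return None  # e.g. CIMP-low with MSI-H is not one of the five listed types


print(jass_type("neg", "MSS"))  # → 4
```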
1.3.2. Genetic and epigenetic alterations of CRC
The cellular transformation process includes molecular alterations of oncogenes and
tumor suppressor genes via mechanisms such as point mutations, rearrangements and
amplifications that can disrupt regulated gene expression (Wong, Hawkins et al. 2007). The
earliest genetic change in colon cancer is often the inactivation of the APC (adenomatous
polyposis coli) gene which is a negative regulator of the Wnt signaling pathway (Gregorieff and
Clevers 2005). In addition, genetic variations and altered expression of other tumor
suppressor genes (SMAD2 and TP53), oncogenes (KRAS) and multiple pathways (Wnt/β-catenin,
TGF-β and/or base excision repair (BER) pathways) accompany the transition from normal cells
to highly malignant tumor cells (Frosina, Fortini et al. 1996; Bellacosa 2003; Gregorieff and
Clevers 2005; Slattery, Herrick et al. 2011).
About 65-70% of sporadic colon cancers exhibit chromosomal instability (CIN), which
leads to an increased rate of loss or gain of whole chromosomes or parts of chromosomes
(Lengauer, Kinzler et al. 1998). Loeb et al. proposed that cancer cells must acquire intrinsic
genomic instability to increase the rate of new mutations (Loeb, Loeb et al. 2003). The CIN
phenotype is caused by alterations of the chromosome segregation pathway (Pino and Chung 2010).
CIN in colon cancer has been shown to be a marker of poor prognosis (Pritchard and Grady 2011).
A defect in the DNA mismatch repair (MMR) genes leads to instability in DNA
microsatellites (MSI). MSI is a condition of genetic hypermutability that results from loss of
function of an MMR gene (Boland and Goel 2010). Cells with a defective MMR system tend to
accumulate replication errors, creating novel microsatellite fragments. Microsatellites are
repeated sequences of DNA with 1-6 bp repeat units (Queller, Strassmann et al. 1993). Although
the length of these microsatellites is highly variable from person to person (which makes them
part of the DNA fingerprint), each individual has microsatellites of a set length. Five markers
have been recommended by the National Cancer Institute to screen for MSI (Umar, Boland et al.
2004). Generally, instability in two or more of the five markers is considered a positive result,
i.e., MSI-H. About 15% of colon cancers display MSI because of either epigenetic silencing of
the mismatch repair gene MLH1 by methylation, or a germline mutation in MLH1, MSH2, MSH6
or PMS2. The remaining 85% of colon cancers are characterized as MSS (Wong, Hawkins et al.
2007), but the clinicopathologic features of this group remain to be investigated.
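The panel-based MSI call described above reduces to counting unstable markers. The following is a minimal sketch, assuming the five mononucleotide/dinucleotide markers commonly cited for the NCI (Bethesda) reference panel (BAT25, BAT26, D2S123, D5S346, D17S250; the marker names are not given in the text above) and the usual convention that exactly one unstable marker is called MSI-L. The function name is hypothetical; real MSI testing also involves comparison with matched normal DNA, which is not modeled here.

```python
from typing import Iterable

# Markers commonly cited for the NCI (Bethesda) 5-marker reference panel.
NCI_PANEL = frozenset({"BAT25", "BAT26", "D2S123", "D5S346", "D17S250"})


def classify_msi(unstable_markers: Iterable[str]) -> str:
    """Classify MSI status from the unstable markers of the 5-marker panel.

    >= 2 unstable markers -> MSI-H, exactly 1 -> MSI-L, none -> MSS.
    """
    unstable = set(unstable_markers)
    unknown = unstable - NCI_PANEL
    if unknown:
        raise ValueError(f"not in the 5-marker panel: {sorted(unknown)}")
    if len(unstable) >= 2:
        return "MSI-H"
    if len(unstable) == 1:
        return "MSI-L"
    return "MSS"


print(classify_msi(["BAT25", "BAT26"]))  # → MSI-H
print(classify_msi([]))                  # → MSS
```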
Recently, millions of single nucleotide polymorphisms (SNPs) have been studied by
means of genome-wide association studies (GWAS) and meta-analyses of GWAS. Polymorphisms
underlying genetic susceptibility to colon cancer have been intensively investigated (Table 1)
(Migliore, Migheli et al. 2011).
Table 1. Genomic loci associated with CRC risk.
Several independent GWAS have implicated the 8q24 region (128.0-130 Mb) as harboring some of
the most promising cancer risk loci in multiple epithelial cancers, including colon cancer (Easton,
Pooley et al. 2007; Zanke, Greenwood et al. 2007; Ghoussaini, Song et al. 2008). The 800 kb
region of 8q24 contains multiple cancer risk loci and the MYC proto-oncogene. This region
includes at least three subregions that independently influence the risk of prostate cancer (region 2:
128.14–128.28 Mb, region 3: 128.47–128.54 Mb, and region 1: 128.54–128.62 Mb), colon cancer
(128.47–128.54 Mb) and breast cancer (128.35–128.51 Mb). Interestingly, this region contains no
known protein-coding genes, but is bounded at its centromeric (proximal) end by FAM84B and at
its telomeric (distal) end by c-MYC, two candidate cancer susceptibility genes. In addition to
c-MYC and FAM84B, the pseudogene POU5F1P1 and the non-coding RNA gene PVT1 within the
128.0- to 130-Mb region of 8q24 have been shown to be associated with cancer risk. The
overexpression of POU5F1P1 in prostate cancer, together with its location in a genomic region
that harbors genetic variation, suggested that functional variants may modulate prostate cancer
susceptibility (Kastler, Honold et al. 2010). Previous studies have revealed various genetic
alterations at the PVT1 locus in human disease, including chromosomal translocation,
amplification and SNPs (Huppi, Pitt et al. 2012). The rs6983267 SNP at 8q24.21 has been
consistently associated with an increased risk of colon cancer, with G as the risk allele
(Pomerantz, Ahmadiyeh et al. 2009). Interestingly, signatures of functional elements such as
enhancers have been found in the genomic region spanning rs6983267 (Tuupanen, Turunen et al.
2009).
Based on the extent of CpG island methylation (epigenetic instability), colon cancer can be
classified into three subtypes: CIMP-high, CIMP-low and CIMP-neg. On the genetic level,
CIMP-high tumors are characterized by MSI and BRAF mutations and relatively rare KRAS and p53
mutations; CIMP-low tumors are associated with KRAS mutations and rare MSI, BRAF or p53
mutations; and CIMP-neg tumors have a high rate of p53 mutations but lower rates of MSI and of
KRAS or BRAF mutations (Shen, Toyota et al. 2007; Ogino and Goel 2008; Hinoue, Weisenberger
et al. 2012) (Figure 11).
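Marker-panel studies usually assign CIMP status by counting how many promoter markers in a fixed panel are methylated. The sketch below illustrates that idea only; the defaults roughly follow an 8-marker scheme reported by Ogino and colleagues (CIMP-high at 6 or more of 8 methylated markers, CIMP-low at 1-5, CIMP-neg at 0), and should be treated as assumptions, since panels and cut-offs differ between studies and Hinoue et al. derived their subgroups by unsupervised clustering of array data instead. The function name is hypothetical.

```python
def classify_cimp(n_methylated: int, panel_size: int = 8, high_min: int = 6) -> str:
    """Assign CIMP status from the number of methylated panel markers.

    Defaults (assumption): CIMP-high >= 6/8, CIMP-low 1-5/8, CIMP-neg 0/8.
    """
    if not 0 <= n_methylated <= panel_size:
        raise ValueError("n_methylated must lie between 0 and panel_size")
    if n_methylated >= high_min:
        return "CIMP-high"
    if n_methylated >= 1:
        return "CIMP-low"
    return "CIMP-neg"


for n in (0, 3, 7):
    print(n, classify_cimp(n))
```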
Figure 11. Classification of 125 CRCs and heatmap representation of Illumina
HumanMethylation27 BeadChip analysis. DNA methylation profiles of 1,401 probes with
most variable DNA methylation values (Standard deviation >0.2). A color scale from dark
blue as low DNA methylation to yellow as high DNA methylation is represented (Hinoue,
Weisenberger et al. 2012).
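The probe selection described in the Figure 11 caption (probes whose DNA methylation values vary with a standard deviation above 0.2) can be sketched as follows. This is not the code used by Hinoue et al.; the dictionary layout, the function name, the toy probe identifiers, and the use of the sample (n-1) standard deviation are assumptions for illustration.

```python
from statistics import stdev
from typing import Dict, List, Sequence


def most_variable_probes(beta: Dict[str, Sequence[float]],
                         sd_cutoff: float = 0.2) -> List[str]:
    """Return probe IDs whose beta values vary more than sd_cutoff across samples.

    beta maps a probe ID to its beta values (0-1), one per sample. At least two
    samples are needed because the sample (n-1) standard deviation is used.
    """
    return [probe for probe, values in beta.items() if stdev(values) > sd_cutoff]


# Toy data with hypothetical probe IDs: one variable probe, one stable probe.
toy = {
    "probe_A": [0.1, 0.9, 0.1, 0.9],      # SD ~ 0.46, kept
    "probe_B": [0.50, 0.52, 0.49, 0.51],  # SD ~ 0.01, dropped
}
print(most_variable_probes(toy))  # → ['probe_A']
```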
The understanding of epigenetic changes in colon cancer has advanced recently; examples
of altered epigenetic events are shown in Figure 12. Aberrant methylation of tumor suppressor
genes, oncogenes and repetitive elements such as LINE1 (Goto, Mizukami et al. 2009; Kim, Lee
et al. 2010; Migliore, Migheli et al. 2011), as well as changes in epigenetic regulation by miRNAs
(Yamakuchi, Ferlito et al. 2008; Liu and Chen 2010; Melo and Esteller 2011), have been identified
in colon cancer.
Figure 12. Epigenetic alterations in colon cancer. Many signaling pathways are affected by
altered epigenetic events, which include DNA methylation and aberrant expression of miRNAs.
1.4. Genome-wide State-of-the-art methods used for epigenetics
1.4.1. SOLiD Next Generation Sequencing (NGS) for miRNAs
Profiling of mature miRNAs in specific tissue types is one of the key approaches to
investigating the biological roles of miRNAs. Considerable effort has been devoted to
developing methods for high throughput detection of miRNAs. Because of the short length of the
mature miRNAs, very little sequence is available to design assays for quantitative PCR or
microarrays for analyzing miRNAs without bias. Moreover, because miRNAs within a family have
similar sequences, often differing by only one nucleotide, it is also difficult to detect individual
miRNAs specifically (Wark, Lee et al. 2008). Northern blotting is one of the earliest and simplest
methods to detect a single miRNA without chemical or enzymatic modification of the target
miRNA before analysis (Wark, Lee et al. 2008). However, this method has relatively low
sensitivity, is time-consuming, and requires a large amount of starting RNA (Varallyay, Burgyan
et al. 2008).
Recently, next generation sequencing (NGS) of miRNAs, i.e., massively parallel
high-throughput sequencing, has overcome the limitations of quantitative PCR, microarrays, and
northern blotting. NGS offers many advantages for miRNA expression profiling, such as sample
throughput and the capability to discover novel miRNAs (Metzker 2010; Vigneault,
Ter-Ovanesyan et al. 2012).
A variety of NGS platforms have enhanced our understanding of how miRNAs affect
diseases, including cancer. Common commercially available technologies include 454
pyrosequencing (Roche), MiSeq/HiSeq (Illumina), PGM (Ion Torrent), and the SOLiD system
(Life Technologies).
For my thesis work, I used the SOLiD system to profile miRNA expression. The SOLiD
sequencer uses a sequencing-by-ligation approach: after library preparation, the fragments are
amplified clonally by emulsion PCR on small magnetic beads before sequencing
(http://www.lifetechnologies.com). The method uses two-base encoded probes, whose primary
advantage is improved accuracy in color calling. A universal primer complementary to the
adaptor sequence is hybridized to the templates, which are then amplified to cDNA and size
selected. Next, the size-selected cDNA libraries are amplified by emulsion PCR for clonal
amplification. Each cycle of two-base probe hybridization and ligation, imaging, and probe
cleavage is repeated. The SOLiD NGS chemistry is illustrated in Figure 13. In this method,
fluorescently labeled oligonucleotide probes are ligated to the primer only if they are perfectly
matched to the upstream sequence. The ligated DNA then serves as a primer, and the next
labeled probe is ligated to it if it matches the upstream sequence. The extended product is then
removed, and the template is reset with a primer complementary to the n-1 position for a second
round of ligation cycles. Five rounds of primer reset are completed for each sequence tag, so that
each base is interrogated twice. Thus, this method has higher specificity and accuracy than the
sequencing-by-synthesis approach.
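Two-base encoding means that each color call describes a pair of adjacent bases rather than a single base. With the bases coded as A=0, C=1, G=2, T=3, the color of a dinucleotide equals the bitwise XOR of the two codes, so a read can be decoded only when the first base (the last adaptor base) is known. The sketch below illustrates this; the function names are hypothetical, and the default leading base "T" reflects the csfasta reads commonly seen from fragment libraries but is an assumption here.

```python
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"


def encode_colorspace(seq: str, primer_base: str = "T") -> str:
    """Encode bases as a csfasta-style color read (leading primer base + colors)."""
    prev = BASE_TO_BITS[primer_base]
    colors = []
    for base in seq:
        cur = BASE_TO_BITS[base]
        colors.append(str(prev ^ cur))  # color of the dinucleotide (prev, cur)
        prev = cur
    return primer_base + "".join(colors)


def decode_colorspace(read: str) -> str:
    """Decode a color read such as 'T0123' into bases (primer base excluded)."""
    prev = BASE_TO_BITS[read[0]]  # known last base of the adaptor/primer
    bases = []
    for color in read[1:]:
        if color not in "0123":
            raise ValueError(f"cannot decode color call {color!r}")  # e.g. '.' no-call
        prev ^= int(color)  # next base = previous base XOR color
        bases.append(BITS_TO_BASE[prev])
    return "".join(bases)


print(decode_colorspace("T0123"))  # → TGAT
```

Because each decoded base depends on the previous one, a single color-calling error corrupts every downstream base when a read is decoded naively, whereas a true single-base difference changes two adjacent colors; this is why SOLiD reads are commonly aligned in color space rather than decoded first.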
Figure 13. Overview of SOLiD sequencing chemistry. (http://www.lifetechnologies.com)
1.4.2. Illumina Infinium HumanMeth450 BeadChip for Methylation
DNA methylation microarrays allow researchers to study methylation on a genome-wide scale.
Illumina first offered the GoldenGate DNA Methylation BeadArrays for 1,505 CpGs (Byun,
Siegmund et al. 2009) and the Infinium HumanMethylation27 BeadChips for 27,000 CpGs
(Kanduri, Cahill et al. 2010). More recently, the Infinium HumanMeth450 BeadChip was
developed to allow researchers to study genome-scale methylation comprehensively, with
expert-selected coverage and high sample throughput (Sandoval, Heyn et al. 2011). These
features make it well suited for epigenome-wide association studies. The chip covers over
450,000 CpGs across a large number of genes as well as non-coding regions at single-nucleotide
resolution. Coverage is targeted across 99% of RefSeq genes, with sites in the TSS1500, TSS200,
5’UTR, 1stExon, gene body and 3’UTR regions (Table 2). Furthermore, it covers 96% of CpG
islands (CGIs), with additional coverage in CGI shores and shelves. It also covers non-CpG
methylated sites identified in human stem cells, sites differentially methylated in tumor compared
to normal tissues, and miRNA promoter regions.
Table 2. HumanMethylation450 BeadChip coverage through gene regions.
Gene location  Description                                  Genes mapped from UCSC database (% genes covered)
TSS1500        Region 200-1,500 bp upstream of the TSS      NM: 17,820 (94%); NR: 2,672 (88%)
TSS200         Region from the TSS to 200 bp upstream       NM: 14,895 (79%); NR: 1,967 (65%)
5’UTR          Untranslated region at the 5’ end            NM: 13,865 (78%)
1stExon        First exon                                   NM: 15,127 (80%)
Body           Region between the first exon and the 3’UTR  NM: 17,071 (97%); NR: 2,345 (77%)
3’UTR          Untranslated region at the 3’ end            13,042 (72%)
Intergenic     Sites not in any of the above categories
TSS, transcription start site; NM, mRNA supported by experimental evidence; NR, non-coding RNA.
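The two promoter categories in Table 2 are defined purely by distance upstream of a transcript's TSS, so they depend on the gene's strand. The sketch below shows how a single CpG could be assigned to TSS200 or TSS1500 for one transcript. The function name and the inclusive boundaries at 200 and 1,500 bp are assumptions; the array's own annotation is precomputed by Illumina and may treat boundaries and overlapping transcripts differently.

```python
def promoter_category(cpg_pos: int, tss_pos: int, strand: str) -> str:
    """Classify a CpG as TSS200, TSS1500 or 'other' relative to one transcript.

    Distances are in bp upstream of the TSS, taking gene orientation into account.
    """
    if strand == "+":
        upstream = tss_pos - cpg_pos
    elif strand == "-":
        upstream = cpg_pos - tss_pos
    else:
        raise ValueError("strand must be '+' or '-'")
    if 0 <= upstream <= 200:
        return "TSS200"
    if 200 < upstream <= 1500:
        return "TSS1500"
    return "other"  # downstream of the TSS (UTR, exon, body, ...) or > 1,500 bp upstream


print(promoter_category(1000, 1100, "+"))  # → TSS200
print(promoter_category(1300, 1000, "-"))  # → TSS1500
```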
This BeadChip method is based on a combination of the Infinium I and Infinium II assays,
both of which are performed on bisulfite-converted DNA (Figure 14) (Bibikova, Barnes et al.
2011). Infinium I uses two site-specific probes for each targeted CpG, one designed for the
methylated locus and the other for the unmethylated locus. Infinium II uses a single probe per
CpG, with single-base extension incorporating a labeled ddNTP. The level of DNA methylation is
determined as the ratio of the methylated probe intensity to the overall intensity (the sum of the
methylated and unmethylated probe intensities) and is called the beta-value (Bibikova, Lin et al.
2006).
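In practice, the beta-value ratio is usually computed with a small constant added to the denominator so that probes with very low total signal do not produce unstable values. The sketch below assumes the commonly used offset of 100 and flooring of negative background-corrected intensities at zero; the function name and the example intensities are illustrative.

```python
def beta_value(meth: float, unmeth: float, offset: float = 100.0) -> float:
    """Methylation beta-value: M / (M + U + offset).

    Negative background-corrected intensities are floored at 0, and the
    offset (assumed to be 100 here) stabilizes the ratio at low intensities.
    """
    m = max(meth, 0.0)
    u = max(unmeth, 0.0)
    return m / (m + u + offset)


print(beta_value(900, 0))      # → 0.9
print(beta_value(4950, 4950))  # → 0.495
```

A beta-value close to 0 indicates an unmethylated CpG and one close to 1 a fully methylated CpG; the offset pulls values slightly toward 0 when both intensities are small.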
Figure 14. Schematic of Infinium I (A) and II (B) technology.
1.5. Research aims
The goal of this introduction is to highlight why identification of epigenetic events in
colon cancer is important scientifically and clinically, and how the identification of genes
affected by epigenetic regulation can be achieved. My thesis project focuses on identification of
aberrant epigenetic events in the MSS/CIMP-neg colon cancer, and includes the identification of
these changes via non-coding RNAs and DNA methylation, using a number of different standard
genetic and epigenetic methods, as well as state-of-the-art techniques such as NGS and Illumina
HumanMethylation450 BeadChip. Figure 15 illustrates the outline of the three aims of this thesis.
The three aims focus on the following epigenetic changes: (1) profiling of miRNAs and their
potential target genes in colon cancer using data from NGS and Affymetrix Exon arrays,
respectively; (2) identification of small and long novel ncRNAs at the 8q24 region, which contains
one of the most relevant colon cancer risk variants, SNP rs6983267, including an in-depth look at
the possible role of these ncRNAs in colon cancer; and (3) elucidation of the landscape of
genome-scale DNA methylation in colon cancer using the Illumina HumanMethylation450
BeadChip.
Through these projects, I have conducted studies to understand the aberrant epigenetic
events in colon cancer and how they are related to colon cancer development.
Figure 15. Outline of the three Thesis Aims.
1.6. Significance
Thanks to the technology revolution, state-of-the-art technologies such as microarrays
and NGS provide an excellent way to discover new epigenetic alterations. Moreover, they allow
us to comprehensively understand the landscape of epigenetic changes. My dissertation project on
genome-wide epigenetic changes, including non-coding RNAs and DNA methylation, may further
our understanding of the genes and pathways involved in MSS/CIMP-neg colon cancer
development and can provide biological insights into possible new genes to be used as
biomarkers.
CHAPTER 2
COMPREHENSIVE PROFILING OF EXPRESSION ALTERATIONS IN
KNOWN AND NOVEL MIRNAS IN COLON CANCER
2.1. Introduction
It is now known that miRNA alterations are involved in the initiation and progression of
human cancer. Next Generation Sequencing (NGS) offers an opportunity to identify these
alterations genome-wide, comprehensively and accurately. An additional advantage of NGS is
the ability to detect expression differences for even the low-abundance miRNAs which may be
functionally significant but cannot be detected by hybridization-based methods (such as
microarrays). The goal of this aim was to comprehensively profile alterations in the expression of
miRNAs that may contribute to the development and progression of the MSS/CIMP-negative
colon cancer.
2.2. Materials and methods
2.2.1. Small RNA extraction and quality checks
Colon tissue samples (tumor and normal) were collected from patients with biopsy-
confirmed adenocarcinoma of the colon. These samples are a subgroup of samples described in
Table 3. The tissues, sectioned by a pathologist, were fresh frozen and stored in liquid nitrogen.
The tissues were immersed in RLT Plus Lysis buffer (Qiagen Inc, Valencia, CA), thawed and
homogenized. This was followed by simultaneous DNA/RNA extraction using the AllPrep
DNA/RNA kit (Qiagen). Isolated total RNA was stored at -80 °C and DNA at -20 °C.
For small RNA, the flow-through from the RNA column was collected, and miRNA-enriched
fractions were extracted with the RNeasy MinElute Cleanup Kit according to the manufacturer’s
supplementary protocol (Qiagen). The small RNA fractions contained 5 to 42% miRNA (mean
22%), as estimated from the 10-40 nt fraction on the Agilent Small RNA chip.
Sample information is shown in Table 3. Ten fresh frozen colon adenocarcinomas and 10
adjacent normal tissues were collected from colon cancer patients at three participating centers of
the Colorectal Cancer Family Registry (Mayo Clinic, Mount Sinai Hospital, and Cleveland
Clinic).
Table 3. Sample information for NGS miRNA study.
Tumor sample  Tissue type  Gender  Site   KRAS       TNM stage
T1*           Tumor        Female  Right  Mutation   IV
T2            Tumor        Female  Right  Mutation   III
T3            Tumor        Male    Right  Wild type  II
T4            Tumor        Male    Right  Mutation   III
T5*           Tumor        Male    Right  Mutation   N/A
T6*           Tumor        Female  Right  Mutation   II
T7*           Tumor        Male    Right  Wild type  II
T8*           Tumor        Male    Left   Mutation   III
T9*           Tumor        Male    Right  Wild type  II
T10           Tumor        Male    Left   Mutation   N/A
* Paired normal was also analyzed.
N/A, not available.
2.2.2. SOLiD sequencing
Small RNA fractions (<200 nt) were processed into sequencing libraries using the Small
RNA Expression Kit (SREK, Applied Biosystems, Foster City, CA). Briefly, RNA was ligated
overnight with the “A” adaptors from the kit, reverse transcribed, treated with RNase H, and PCR
amplified before size selection on polyacrylamide gels to isolate amplicons with 18-30 nucleotide
inserts. In addition to the expected 100-120 bp amplicons, there was a distinct band at or just
above 120 bp containing a 30-nucleotide transcript, so two libraries were made from each of the
20 samples to capture a larger range of small RNAs: one library from the 100-120 bp region of
ligated amplicons for the classic miRNAs (18-24 nt long) and the other for the 30 nt small RNAs.
After the size of the two sets of libraries was checked on the DNA 1000 chip (Agilent, Santa
Clara, CA), the libraries were amplified onto beads using emulsion PCR, deposited onto slides,
and sequenced using the SOLiD sequencing system at the Applied Biosystems (ABI) facility.
Results were obtained in csfasta format. The ten samples on a slide were distinguished by the
unique barcodes of the labeled amplification primers in the SREK kit, and all ten libraries of a
given size range were pooled and sequenced on a single slide. However, because the initial data
analysis indicated that a significant proportion of classic miRNAs were included in the library of
the larger small RNA species (likely due to the inaccuracy of gel-based size selection), the reads
from the two separate libraries of each sample were merged before analysis.
2.2.3. Statistical data analysis of SOLiD sequencing
GeneSifter (Geospiza, Inc., Seattle, WA) was used to align the sequences to miRBase
version 14, and Partek Genomics Suite (Partek Inc., St. Louis, MO) was used to carry out the
statistical analyses. Up to two mismatches per read were allowed in the alignment. To quantify
and compare miRNA expression across datasets, corrected read counts were scaled by GeneSifter
into reads per million (RPM), the most common way to normalize read counts across NGS
samples. Using the Partek software, fold change filters were applied to select miRNAs with a
greater than 2-fold change based on log2-transformed RPM values. P-values were calculated
using one-way ANOVA and adjusted by Bonferroni correction for multiple testing.
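The normalization and filtering steps above can be sketched in a few lines. This is not GeneSifter's or Partek's implementation: the RPM denominator (total mapped reads per sample), the pseudocount of 1 added before the log2 transform (the text does not say how zero counts were handled), the function names and the toy miRNA names and counts are all assumptions. ANOVA p-values themselves are not computed here; only their Bonferroni adjustment is shown.

```python
import math
from typing import Dict, List, Sequence


def rpm(counts: Dict[str, int]) -> Dict[str, float]:
    """Scale one sample's miRNA read counts to reads per million mapped reads."""
    total = sum(counts.values())
    return {mirna: n / total * 1e6 for mirna, n in counts.items()}


def log2_fold_change(tumor: Sequence[float], normal: Sequence[float],
                     pseudo: float = 1.0) -> float:
    """Difference of group means of log2(RPM + pseudo); |value| > 1 means > 2-fold."""
    mean_t = sum(math.log2(x + pseudo) for x in tumor) / len(tumor)
    mean_n = sum(math.log2(x + pseudo) for x in normal) / len(normal)
    return mean_t - mean_n


def bonferroni(pvalues: Sequence[float]) -> List[float]:
    """Bonferroni-adjusted p-values: p * m, capped at 1."""
    m = len(pvalues)
    return [min(p * m, 1.0) for p in pvalues]


sample = rpm({"mirA": 250, "mirB": 750})         # hypothetical toy counts
print(sample)                                    # → {'mirA': 250000.0, 'mirB': 750000.0}
print(log2_fold_change([7.0, 7.0], [1.0, 1.0]))  # → 2.0 (a 4-fold increase)
print(bonferroni([0.01, 0.5]))                   # → [0.02, 1.0]
```

Equivalently, a Bonferroni-adjusted cut-off α corresponds to a raw p-value threshold of α divided by the number of tests.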
2.2.4. Technical Validation and Replication using Real-Time RT-qPCR
Technical validation and replication analyses were performed using real-time RT-qPCR
and TaqMan assays (Applied Biosystems, Foster City, CA) for mature miRNAs. The cDNA was
synthesized from small RNA (<200 nt) using gene-specific primers according to the Multiplex
RT-TaqMan Assay protocol, with preamplification of the RT product, for 6 newly identified
miRNAs (miR-549, miR-602, miR-638, miR-935, miR-1180, and miR-1268) and 5 cancer
marker miRNAs (miR-18a, miR-20a, miR-21, miR-31 and miR-143). Reverse transcription was
performed with 0.05x RT primer pools using the following program: 30 min at 16°C, 30 min at
42°C, 5 min at 85°C, and a hold at 4°C. After reverse transcription, preamplification was done
with a 0.2x TaqMan miRNA assay pool according to the PreAmplification Protocol provided by
Applied Biosystems’ technical support team. Briefly, 500 pg of small RNA (quantified on the
Bioanalyzer Small RNA chip) was used for multiplex reverse transcription, and 2.25 µl of the RT
product was used for the preamplification step. The RT-qPCR was performed using 4.5 µl of a
1:8 dilution of the preamplified sample, each specific miRNA assay, and 2x Universal Master Mix
(Applied Biosystems). All reactions were done in a total volume of 10 µl using relative
quantification by real-time PCR on an Applied Biosystems 7900HT system. The thermal cycling
program used for quantification was as follows: 95°C for 10 min, followed by 45 cycles of 95°C
for 15 sec and 60°C for 1 min. Small RNA input was normalized to the average of two
endogenous controls, RNU48 and miR-16, and fold changes were calculated with the ΔΔCt
method as 2^(−ΔΔCt) (Livak and Schmittgen 2001). Each measurement was performed in
duplicate, and no-template (water) controls were included for each assay. Data analysis was
performed with RQ Manager 1.2.1 (Applied Biosystems).
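The ΔΔCt calculation described above, with normalization to the mean of the two endogenous controls, can be sketched as follows. The function names and the toy Ct values are hypothetical; technical duplicates are assumed to have been averaged beforehand, and, as in the Livak and Schmittgen method, near-100% amplification efficiency is assumed for both target and control assays.

```python
from statistics import mean
from typing import Sequence


def delta_ct(ct_target: float, ct_references: Sequence[float]) -> float:
    """Delta Ct: target Ct minus the mean Ct of the endogenous controls."""
    return ct_target - mean(ct_references)


def ddct_fold_change(target_tumor: float, refs_tumor: Sequence[float],
                     target_normal: float, refs_normal: Sequence[float]) -> float:
    """Fold change (tumor vs normal) = 2 ** -(dCt_tumor - dCt_normal)."""
    ddct = delta_ct(target_tumor, refs_tumor) - delta_ct(target_normal, refs_normal)
    return 2 ** (-ddct)


# Toy Ct values (duplicates already averaged); refs are [RNU48, miR-16].
print(ddct_fold_change(25.0, [20.0, 22.0], 28.0, [20.0, 22.0]))  # → 8.0
```

Here the target reaches threshold 3 cycles earlier in the tumor after normalization, which corresponds to an 8-fold higher level.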
2.2.5. Sample processing for Affymetrix Exon Arrays
Nine normal and 10 tumor colon samples, including 6 pairs analyzed by SOLiD
sequencing, were also analyzed on Affymetrix Exon arrays (Affymetrix Inc, Santa Clara, CA). For
each sample, 1 µg of total RNA was first processed using a ribosomal RNA (rRNA) reduction
procedure as suggested by Affymetrix. The rRNA reduction was verified by running the reduced
RNA samples on the Bioanalyzer (Agilent Technologies, Santa Clara, CA). After rRNA
reduction, the Affymetrix GeneChip® Whole Transcript (WT) Sense Target Labeling Assay
(Affymetrix) was used to generate amplified and biotinylated sense-strand DNA targets for
hybridization on GeneChip® Exon 1.0 ST Arrays following the Affymetrix protocol. Briefly,
double-stranded cDNA was synthesized from 1 µg of concentrated rRNA-reduced RNA using
T7-(N)6 random hexamers. This was followed by in vitro transcription to produce amplified
antisense cRNA, which was converted back to single-stranded sense DNA. Then 5.5 µg of sense
DNA was enzymatically fragmented, checked on the Bioanalyzer for appropriate size, terminally
labeled with biotin and hybridized onto Exon Arrays. After an 18-hour hybridization, the arrays
were washed and stained using the GeneChip® Hybridization, Wash and Stain Kit and the
suggested protocol. The arrays were scanned on the GeneChip® Scanner 3000 7G using the
AGCC (Affymetrix GeneChip® Command Console®) Software to measure the fluorescent
signal intensities at each probe location.
2.2.6. Integrated analysis and Ingenuity pathway analysis
The data used for the integrated analysis consisted of 715 miRNAs and 18,415 target
mRNAs. The predicted targets of 19 miRNAs were extracted from microCosm and TargetScan 5.1
with the Partek Genomics Suite software and analyzed for correlation with differential miRNA
expression. Correlation between the top 19 miRNAs and their differentially expressed predicted
target mRNAs was assessed using Pearson’s correlation with Benjamini-Hochberg false discovery
rate (BH-FDR) control (q-value cut-off <0.05). Ingenuity Pathways Analysis (IPA, Ingenuity
Systems, Inc., Redwood City, CA) was used to identify the biological functions of the target
genes and the pathways involved.
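The correlation step above combines two standard calculations: a Pearson coefficient between a miRNA and each predicted target across samples, and Benjamini-Hochberg adjustment of the resulting p-values. The sketch below shows both in plain Python; it is not Partek's implementation, the function names and toy values are hypothetical, and the p-values for each correlation are assumed to have been computed elsewhere.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bh_qvalues(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


# Toy example: miRNA expression vs one predicted target across four samples.
print(pearson([1, 2, 3, 4], [8, 6, 4, 2]))  # → -1.0
print(bh_qvalues([0.01, 0.04, 0.03, 0.20]))
```

A miRNA that represses its targets would be expected to show a negative correlation with their mRNA levels, so negatively correlated pairs passing the q-value cut-off are the most plausible candidate interactions.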
2.3. Results and discussion
2.3.1. Quality checks of small RNAs
To verify the quantity and quality of the extracted RNA, 1 µl of each sample was
analyzed on the Agilent Bioanalyzer, using RNA 6000 Nano chips for total RNA and Small RNA
chips for small RNA (Figure 16).
Figure 16. The quality of the total RNAs and the small RNA preparations. Total RNAs and
the small RNAs were checked on the Bioanalyzer using the RNA 6000 Nano chip (A) and the
Small RNA chip (B), respectively.
2.3.2. Small RNA library preparation
The amplified cDNA libraries were prepared in multiple steps, including ligation of the small
RNAs to adaptors containing a defined sequence required for SOLiD NGS, reverse transcription of
the ligated small RNAs, and PCR amplification with “barcode” sequences. To concentrate the
amplified cDNA library and remove PCR by-products, size selection was done by
polyacrylamide gel electrophoresis, and the purified cDNA library was analyzed on the DNA
1000 chip (Bioanalyzer) (Figure 17).
Figure 17. The quality of the size selected cDNA libraries was checked on the Bioanalyzer
using the DNA 1000 chip.
Interestingly, there was a distinct band at 120 bp containing 30-nucleotide inserts (Figure 17,
bottom panel), so, as recommended by ABI, I made two libraries from each of the 20 samples to
capture all of the small RNAs: one from the region between 100 bp and 120 bp for classic
miRNAs (18-24 nt long) and the other from the 120 bp fragments for the longer small RNAs. The
prepared libraries were amplified on beads using emulsion PCR and sequenced by ABI.
2.3.3. Deep sequencing of small RNAs
The sequencing process yielded on average 35.6 million and 31.9 million sequences from
normal (n=10) and tumor tissues (n=10), respectively. Of the total sequencing reads, 11.9% of the
reads in normals and 5.1% in tumors mapped to miRBase version 14 (Table 4). These miRNAs
were further analyzed for differential expression in colon cancer. The lengths of the
miRBase-mapped sequences ranged from 18 nt to 33 nt in all samples (Figure 18). The most
common sizes were 21 nt (13%), 22 nt (30.4%) and 23 nt (33.6%), as expected.
Table 4. NGS Mapping Results.
Figure 18. Read length distribution (nt, number of nucleotides) of sequences mapped to
miRBase. The pie chart depicts the percentage of each read length relative to the total number of
reads, averaged over all 20 samples.
2.3.4. Evaluation and preliminary analyses of the NGS data
For a preliminary feasibility study and to further evaluate the quality of the small RNA
preparations, I first examined the expression patterns of well-studied oncogenic (n=13) and tumor
suppressor (n=8) miRNAs previously reported in colon adenomas and carcinomas (Figure 19).
Except for three miRNAs (let-7g, miR-200c, miR-320), these known miRNAs were significantly
differentially expressed in our NGS data at BH-FDR p<0.05, with 100% and 63% (5/8)
concordance in the direction of change for the oncogenic and tumor suppressor miRNAs,
respectively. Among these, five selected miRNAs were confirmed by quantitative PCR in 5
normal and 6 tumor tissues, including 4 oncogenic miRNAs (miR-18a, miR-20a, miR-21 and
miR-31) and 1 tumor suppressor miRNA (miR-143) (Figure 20). A strong negative correlation (on
average, r=-0.91) was found between SOLiD read counts and qPCR delta Ct values; the
correlation is negative because a lower delta Ct indicates higher expression. These preliminary
data showed that the NGS performed well and that the results were reliable for further analysis.
Figure 19. The strongest oncogenic and tumor suppressor miRNA candidates in colon
tumor tissues. The SOLiD data were compared to published data on 21 well-studied
miRNAs (Faber, Kirchner et al. 2009). Large green and red arrows indicate previously
reported changes, and small arrows indicate the direction of expression change in our SOLiD data.
Figure 20. Expression values from SOLiD sequencing (X-axis) plotted against qPCR delta
Ct values (Y-axis). qPCR deltaCt = (Ct miRNA - Ct average of RNU48 and miR-16).
A second preliminary analysis was performed to examine the effect of clinical phenotype
factors and the risk genotype on miRNA expression. This analysis used a t-test, and p-values were
adjusted with the Benjamini-Hochberg (BH) or Bonferroni correction. A p-value less than 0.05
was considered to indicate a significant difference between groups. Table 5 shows the numbers of
significant miRNAs from 21 comparisons involving phenotype factors such as tissue type (tumor
and normal), tissue origin (left and right), gender (male and female), and the genotype of
rs6983267 (GG or TT). Surprisingly, about 35% of the analyzed miRNAs (252/715) were
significantly differentially expressed at BH-FDR <0.05, and differential expression of the
majority of these miRNAs was observed between tumors and normals in both the pooled and
paired analyses. Although not all of the 21 comparisons yielded significant findings, there were
also significant differences between the pooled GG tumors and GG normals, between the
right-side tumors and normals, and between the male tumors and male normals. However, these
findings should be investigated carefully because of the limited sample sizes in the reference
groups (patients with the TT genotype, left-side tumors, and female patients).
Table 5. Numbers of significant miRNAs in 21 comparisons.
                                                           Raw p-value   BH-FDR       Bonferroni
No.  Group 1 (control)          Group 2                    Up    Down    Up    Down   Up    Down
1    All normal (n=10)          All tumor (n=10)           261   25      231   21     11    1
2    TT tumor (n=10)            GG tumor (n=10)            28    0       0     0      0     0
3    TT normal (n=5)            GG normal (n=5)            3     20      0     0      0     0
4    TT normal (n=5)            TT tumor (n=5)             109   5       0     0      0     0
5    GG normal (n=5)            GG tumor (n=5)             179   19      52    8      1     0
6    All normal, paired (n=6)   All tumor, paired (n=6)    403   15      235   4      1     0
7    TT normal, paired (n=4)    TT tumor, paired (n=4)     130   4       0     0      0     0
8    GG normal, paired (n=2)    GG tumor, paired (n=2)     144   8       0     0      0     0
9    All right (n=16)           All left (n=4)             0     2       0     0      0     0
10   Right tumor (n=8)          Left tumor (n=2)           0     11      0     0      0     0
11   Right normal (n=8)         Left normal (n=2)          9     22      0     0      0     0
12   Right normal (n=8)         Right tumor (n=8)          250   19      204   15     10    1
13   Left normal (n=2)          Left tumor (n=2)           18    3       0     0      0     0
14   All female (n=5)           All male (n=15)            33    2       0     0      0     0
15   Female normal (n=2)        Female tumor (n=3)         21    2       0     0      0     0
16   Male normal (n=8)          Male tumor (n=7)           241   23      187   17     3     1
17   Female normal (n=2)        Male normal (n=8)          8     0       0     0      0     0
18   Female tumor (n=3)         Male tumor (n=7)           24    6       0     0      0     0
19   KRAS wt tumor (n=3)        KRAS mt tumor (n=7)        1     8       0     0      0     0
20   Stage II tumor (n=4)       Stage III tumor (n=3)      4     18      0     0      0     0
21   Localized tumor (n=5)      Advanced tumor (n=3)       7     37      0     0      0     0
2.3.5. MiRNA expression profiles in colon normal and tumor tissues
To determine expression alterations among the 715 known miRNAs in colon tumors, I
analyzed the SOLiD NGS data using analysis of covariance (ANCOVA). As observed in the
preliminary analysis (Table 5), miRNA profiles showed substantial differential expression
between tumors and normals (F-ratio=6.71), while other factors such as gender (male and female,
F-ratio=1.46), tumor tissue origin (right and left, F-ratio=1.15), KRAS mutation status
(F-ratio=1.08) and TNM stage (II, III and IV, F-ratio=1.8) had smaller effects on miRNA
expression (Figure 21A) and were not confounding factors in the tumor versus normal
comparison. Interestingly, principal component analysis (PCA) showed clear separation of
tumors (red) from normals (blue) based on the entire set of 715 miRNAs analyzed (Figure 21B).
Figure 21. Source of Variation and Principal Component Analysis (PCA). A. Sources of
variation. B. Principal component analysis (PCA) scatter plot of all normalized NGS data
for tumors versus normals. The points are colored and connected to the centroid of each
tissue group, and ellipsoids are drawn for each group as well. X-axis, first principal component
(PC1); y-axis, second principal component (PC2).
Apparent deregulation of 392 (375 up and 17 down) of the 715 mapped miRNAs was observed
at an absolute fold change ≥2 with a raw p-value <0.05. Nineteen miRNAs (18 up and 1 down)
were significantly differentially expressed after Bonferroni correction at p<0.01 (corresponding to
a raw p-value <0.000007). Six of the 19 miRNAs (miR-30a, miR-31, miR-135b, miR-182,
miR-183 and miR-202) have previously been reported in colon cancer, and their effects on
putative target gene expression have been studied in colon and/or other cancers (Table 6). Among
them, miR-31 showed the largest fold change in tumors compared to normals in our study (fold
change of 41.88). Importantly, the remaining 13 miRNAs (miR-220b, miR-365-1, miR-549,
miR-588, miR-602, miR-638, miR-935, miR-937, miR-1180, miR-1268, miR-1292, miR-1909 and
miR-1914) were identified for the first time in our study, suggesting a novel association of these
miRNAs with colon cancer (Figure 22A).
Expression levels, based on average log2-transformed RPM (reads per million) values, were substantially lower for the 13 newly identified miRNAs (new-miRNAs) (log2 RPM: 2.66 in 10 normals, 4.44 in 10 tumors) than for the 6 known-miRNAs (log2 RPM: 6.66 in 10 normals, 9.14 in 10 tumors) (data not shown). Likely for this reason, I was able to amplify only six (miR-549, miR-602, miR-638, miR-935, miR-1180, and miR-1268) of the 13 new-miRNAs using real-time RT-qPCR assays for the mature miRNAs. These 6 miRNAs were then successfully technically validated in the samples used for the SOLiD analysis (5 normals and 6 tumors, including 4 paired samples). The magnitude of differential expression between tumor and normal samples differed between NGS and RT-qPCR, but all 6 miRNAs were confirmed as upregulated in tumor versus normal tissue. To confirm the relevance of these new-miRNAs in CRC, differential expression of five of them (miR-549, miR-602, miR-638, miR-935 and miR-1268) was further replicated in another set of 8 MSS/CIMP-negative paired tumor and normal samples (Figure 22B).
Interestingly, miR-365-1 is actually a pre-miRNA form of miR-365, but I was only able to examine the expression level of the mature miR-365 because there were no pre-designed TaqMan assays for pre-miR-365-1. In contrast to the overexpression of pre-miR-365-1 in tumors detected by NGS, RT-qPCR showed downregulation of the mature miRNA in the tumors of the validation sample set.
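A minimal sketch of the screening logic described above (log2 RPM summaries, an absolute fold change of at least 2 with raw p < 0.05, and a stricter Bonferroni-adjusted cutoff) is shown below. It illustrates the idea under stated assumptions rather than reproducing the Partek workflow used in this study: the pseudocount added before the log2 transform, the signed fold-change convention (negative values for downregulation), the function names, and the example miRNAs and values are all hypothetical, and raw p-values are taken as given inputs.

```python
import math


def rpm(counts, total_mapped_reads):
    """Reads per million: scale raw read counts by library size."""
    return [c / total_mapped_reads * 1e6 for c in counts]


def log2_rpm(counts, total_mapped_reads, pseudocount=1.0):
    """log2-transformed RPM; the pseudocount (an assumption) avoids log2(0)."""
    return [math.log2(x + pseudocount) for x in rpm(counts, total_mapped_reads)]


def signed_fold_change(mean_log2_group2, mean_log2_group1):
    """Fold change of group 2 (e.g. tumor) over group 1 (e.g. normal).

    Ratios below 1 are reported as negative reciprocals, so -4 means 4-fold down.
    """
    ratio = 2 ** (mean_log2_group2 - mean_log2_group1)
    return ratio if ratio >= 1 else -1 / ratio


def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def screen(results, fc_cutoff=2.0, raw_alpha=0.05, adj_alpha=0.01):
    """results maps miRNA name -> (signed fold change, raw p-value).

    Returns (nominal hits, Bonferroni-significant hits). The correction
    counts every tested miRNA, not only those passing the fold-change filter.
    """
    names = list(results)
    adjusted = bonferroni([results[n][1] for n in names])
    nominal, strict = [], []
    for name, p_adj in zip(names, adjusted):
        fc, p_raw = results[name]
        if abs(fc) >= fc_cutoff and p_raw < raw_alpha:
            nominal.append(name)
            if p_adj < adj_alpha:
                strict.append(name)
    return nominal, strict


# Hypothetical inputs for illustration only.
example = {
    "miR-a": (4.0, 1e-6),
    "miR-b": (2.5, 0.01),
    "miR-c": (1.5, 1e-4),
}
print(screen(example))  # (['miR-a', 'miR-b'], ['miR-a'])
```

Because Bonferroni multiplies each raw p-value by the number of miRNAs tested, it is much stricter than Benjamini-Hochberg FDR control.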
Table 6. Expression levels and putative targets of 6 differentially expressed known colon tumor miRNAs.
Figure 22A. Tumor vs. normal miRNA expression profiles of the 13 newly identified miRNAs. A. Box plots of expression levels for the 13 newly identified miRNAs in tumors (n=10, blue dots) compared to normals (n=10, red dots). Each point represents the normalized miRNA expression level for an individual. The median expression level for each group is indicated by a line inside each box. Paired samples are connected by a line. The p-value indicates the significance of the difference in miRNA expression between tumors and normals; the corrected p-value indicates the significance after Bonferroni correction for multiple comparisons.
Figure 22B. Quantitative real-time RT-PCR levels of the 6 newly identified miRNAs (new-miRNAs) in the validation (n=11) and replication (n=16) samples. Expression values were calculated relative to the average of the RNU48 and miR-16 levels, and assays were performed in duplicate. The data are shown as mean ± standard error of the mean.
2.3.6. Correlation of expression levels between 19 differentially expressed tumor miRNAs
and their predicted target genes
Expression levels of possible targets of the 19 altered miRNAs were examined using summarized gene expression levels from Affymetrix Exon arrays. Because one of the normal samples was not available for the Affymetrix Exon arrays, the integrative analysis was done using the 9 normals and 10 tumors that were profiled for miRNAs by NGS. In general, the miRNAs analyzed showed both positive and negative correlations with their predicted targets (Pearson correlation FDR q-value <0.05). Negatively correlated miRNA::mRNA pairs suggest potential direct interactions (i.e., upregulated miRNA expression correlates with target gene downregulation), whereas the positively correlated pairs that were also detected suggest an indirect mechanism of gene regulation (not mediated via the miRNA seed region). The predicted target genes for the 6 known and the 13 new colon cancer miRNAs were extracted from TargetScan 5.1 (TS) and microCosm (MC) using Partek Genomics Suite, and the mRNA expression levels were integrated with the miRNA expression levels by Pearson correlation analysis.
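The sign convention above can be illustrated with a minimal sketch that computes a Pearson correlation between one miRNA and one predicted target across matched samples and labels the pair by the sign of r. It is not the Partek Genomics Suite implementation; the function names and expression values are hypothetical. The t statistic is the standard test of zero correlation with n - 2 degrees of freedom (17 if all 9 normals and 10 tumors are pooled, which is an assumption here); its p-value would come from a Student t distribution, for example via scipy.stats.pearsonr.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two paired vectors with at least 3 samples")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic (n - 2 degrees of freedom) for H0: no correlation."""
    return r * math.sqrt((n - 2) / (1 - r * r))


def interpret(r):
    """Label a miRNA::mRNA pair by the sign of its correlation."""
    if r < 0:
        return "negative: consistent with direct seed-mediated repression"
    return "positive: suggests indirect regulation"


# Hypothetical log2 expression values measured in the same six samples.
mirna = [2.1, 2.4, 3.0, 5.2, 5.9, 6.3]
target = [9.0, 8.7, 8.8, 7.1, 6.5, 6.6]
r = pearson_r(mirna, target)
print(round(r, 2), interpret(r))
```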
Using targets from both sources, 166 and 41 genes were significantly correlated with the 6 known miRNAs by TS and MC, respectively (data not shown). Among them, 10 genes were predicted by both databases (|r| = 0.62 to 0.83, FDR q<0.05). Because neither TS nor MC provided predicted targets for some of the 13 new-miRNAs, only 7 and 8 new-miRNAs were analyzed for correlation with their potential targets by TS and MC, respectively. Ninety-two miRNA::mRNA pairs involving 88 genes were significantly correlated at FDR q<0.05 using the combined TS and MC databases (data not shown). Table 7 lists the significantly differentially expressed target genes (FDR q<0.05) among the correlated targets (Pearson correlation FDR q<0.05) for the known and the newly identified colon cancer miRNAs.
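The FDR q-values used throughout this section can be illustrated with the Benjamini-Hochberg step-up adjustment, shown below together with the union and intersection of two target lists (as when TargetScan and microCosm predictions are combined). This is a hedged sketch: it assumes the q-values are Benjamini-Hochberg adjusted p-values, which the text does not state explicitly, and the gene names and p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Step-up Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


def combine_targets(targetscan, microcosm):
    """Union (all candidate targets) and intersection (predicted by both)."""
    ts, mc = set(targetscan), set(microcosm)
    return ts | mc, ts & mc


# Hypothetical predictions and correlation p-values for one miRNA.
ts_targets = ["GENE1", "GENE2", "GENE3"]
mc_targets = ["GENE3", "GENE4"]
union, both = combine_targets(ts_targets, mc_targets)
pvals = {"GENE1": 0.01, "GENE2": 0.2, "GENE3": 0.03, "GENE4": 0.005}
genes = sorted(union)
qvals = benjamini_hochberg([pvals[g] for g in genes])
significant = [g for g, q in zip(genes, qvals) if q < 0.05]
print(sorted(both), significant)  # ['GENE3'] ['GENE1', 'GENE3', 'GENE4']
```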
Table 7. Significantly differentially expressed target genes (FDR q<0.05) among correlated
targets (FDR q<0.05). Negative correlations between miRNAs and target genes are indicated by
bolded gene names.
Known miRNAs   TargetScan  microCosm   Newly identified miRNAs  TargetScan  microCosm
hsa-mir-135b   RNF43       TCFL5       hsa-mir-220b             RICH2       -
               MMP11       -           hsa-mir-365-1            ANK3        CLMN
               PCMTD2      -           hsa-mir-365-1            CPT1A       GTF2F2
hsa-mir-182    EFNA5       RPUSD4      hsa-mir-365-1            CNTN4       RPS6
               ZZEF1       TCFL5       hsa-mir-365-1            NR3C2       TNFSF15
               KIAA0513    -           hsa-mir-365-1            PDE4D       INHBA
               NUMB        -           hsa-mir-365-1            MYLK        RPUSD4
               RELL1       -           hsa-mir-365-1            XPO4        TFAP4
               ME2         -           hsa-mir-365-1            CCND1       C13orf18
               TTYH3       -           hsa-mir-365-1            CDC25A      LSM8
               TRIB3       -           hsa-mir-365-1            SET         -
hsa-mir-183    -           PIAS2       hsa-mir-549              NEGR1       TMIGD1
               -           DDX31       hsa-mir-549              KIAA0430    C6orf105
               -           TCFL5       hsa-mir-549              RICH2       NKD1
hsa-mir-202    AKAP13      ABR         hsa-mir-549              MFAP5       -
               SYT7        RABEPK      hsa-mir-549              LPIN2       -
               ACSL6       C13orf18    hsa-mir-549              MYO5A       -
               CCND1       CDH3        hsa-mir-549              RAG1        -
               -           IFRD2       hsa-mir-549              C1orf135    -
hsa-mir-30a    C13orf18    C13orf18    hsa-mir-549              MET         -
               C19orf50    C19orf50    hsa-mir-549              GALNT2      -
               PLXNA1      CSNK1E      hsa-mir-588              KLF12       -
               CDCA7       -           hsa-mir-588              FOXN3       -
               EYA2        -           hsa-mir-588              TNS4        -
               NEDD4L      -           hsa-mir-588              RHBDF1      -
               WDR7        -           hsa-mir-602              NKX2-3      -
               SCARA5      -           hsa-mir-638              EPHA7       TMEM161A
               RELL1       -           hsa-mir-935              APC         TACSTD2
               NR3C2       -           hsa-mir-935              SH3TC2      C13orf18
               LIFR        -           hsa-mir-935              PLXNA1      -
               NDEL1       -           hsa-mir-935              C13orf18    -
               MARCH8      -           hsa-mir-937              -           LSM7
               C14orf43    -
hsa-mir-31     NUMB        ITPA
To explore the biological implications of the significantly correlated target genes (from both TS and MC) of the known and newly identified colon cancer miRNAs, 269 unique genes were analyzed to identify functional networks possibly impacted by the miRNAs. IPA analysis indicated that the target gene list is enriched for genes with roles in the cell cycle, cancer, and gastrointestinal disease (Table 8). About 13% (36/269) of these genes have previously been reported to be deregulated in colon cancer through gene expression alterations, mutation or methylation (Table 9). The direction of expression change in our data agreed with previous studies for 89% of the downregulated genes (tumor suppressors) and 100% of the upregulated genes (oncogenes) in colon tumor tissues (Table 9). Interestingly, Wnt/beta-catenin signaling was the top canonical pathway in the IPA analysis, and ten potential targets are involved in this pathway (Figure 23). Wnt/beta-catenin signaling is one of the best-known pathways activated in colon cancer development (Kinzler and Vogelstein 1996). Activation of this pathway is associated with downregulation or mutation of the APC gene, whose expression was negatively correlated with miR-935 expression in this study.
In conclusion, the 19 altered miRNAs identified here may play an important role in colon cancer development by regulating oncogenic or tumor suppressor target mRNAs involved in the cell cycle and the Wnt/beta-catenin pathway.
Table 8. Top IPA network of the CRC miRNA target genes.
a score = -log10(p-value)
b Number (percentage) of the genes in the given input gene list which are involved in a
network
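Table 8 reports network scores as -log10(p-value). IPA's internal scoring is proprietary; assuming the p-value comes from a right-tailed Fisher's exact (hypergeometric) test for over-representation of input genes in a network, the calculation can be sketched as below. Apart from the 269-gene input list described in the text, the population size, network size and overlap in the example are hypothetical, and the function names are illustrative.

```python
import math


def hypergeom_right_tail(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric.

    N: genes in the reference population
    K: genes in the network (or pathway)
    n: genes in the input list
    k: input genes observed in the network
    """
    denominator = math.comb(N, n)
    numerator = sum(
        math.comb(K, i) * math.comb(N - K, n - i)
        for i in range(k, min(n, K) + 1)
    )
    return numerator / denominator


def score(p_value):
    """Score as defined in the Table 8 footnote: -log10(p-value)."""
    return -math.log10(p_value)


# Hypothetical: 269 input genes, a 35-gene network containing 12 of them,
# against a reference population of 20,000 genes.
p = hypergeom_right_tail(12, 269, 35, 20_000)
print(f"p = {p:.2e}, score = {score(p):.1f}")
```

Exact integer arithmetic via math.comb avoids the underflow that multiplying many small probabilities could cause.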
Table 9. Significantly correlated target mRNAs that have been reported in other studies and their differential expression in
colon tumor compared to normal tissues in the current study.
Figure 23. Genes significantly correlated with top miRNAs in the Wnt/beta-catenin signaling pathway in colon tumor tissue. Wnt/beta-catenin pathway genes whose expression was significantly correlated with miRNAs showing significantly altered expression in colon tumors compared to adjacent normal tissues are indicated in red (positive correlation) or green (negative correlation). The miRNAs correlated with each gene's expression are shown in blue boxes.
2.3.7. Previous findings of known colon cancer miRNAs in cancers
Although bioinformatics tools provide means to predict potential target genes for
miRNAs via computational algorithms, the predicted target genes should be further validated
experimentally. It is important to elucidate the roles of the miRNAs via the target mRNAs
because one miRNA could downregulate several target genes. Thus, I searched the literature to
support and validate my findings, in order to understand the potential roles of the known and
the newly identified miRNAs in this study.
Six of the 19 top miRNAs (miR-30a, miR-31, miR-135b, miR-182, miR-183 and miR-202) have previously been described as among the most consistently deregulated miRNAs in colon cancer in general (Bandres, Cubedo et al. 2006; Nagel, le Sage et al. 2008; Kim, Choi et al. 2009; Ng, Chong et al. 2009; Sarver, Li et al. 2010). All of them except miR-30a are upregulated in colon cancer (indicating oncogenic miRNAs). MiR-30a, which is suggested to have a tumor suppressor-like function, was downregulated in our data, supporting the findings of others (Schetter, Leung et al. 2008; Wang, Zhou et al. 2009; Zhong, Bian et al. 2013).
MiR-30a can inhibit mitochondrial fission by suppressing the expression of p53 (Li, Donath et al. 2010). In our study, upregulation of C13orf18, C19orf50, PLXNA1, CSNK1E and CDCA7 and downregulation of EYA2, NEDD4L, WDR7, SCARA5, RELL1, NR3C2, LIFR, NDEL1, MARCH8 and C14orf43 were significantly correlated with downregulated miR-30a (|r|>0.62, FDR corrected q<0.05). Notably, C13orf18 and C19orf50 were identified as miR-30a targets in both the microCosm and TargetScan 5.1 databases. EYA2, NR3C2 and CSNK1E function as transcription factors or activators. The transcription factors E2F3 and TFDP1 have been found to be overexpressed in various cancers and are thought to have a key role in cell cycle regulation (Yasui, Arii et al. 2002; Foster, Falconer et al. 2004). In our study, E2F3 was significantly negatively correlated with miR-30a (r=-0.64, q=0.46) but was not differentially expressed in tumors compared to normals.
In the study of Wang et al., miR-31 expression was positively associated with advanced TNM stage, suggesting that overexpression of miR-31 may be involved in the development and progression of CRC (Wang, Zhou et al. 2009). In our analysis of a limited number of samples, miR-31 expression did not differ between stage III/IV and stage II tumors; it differed only between MSS tumors and normals. Among the putative targets of miR-31, the most notable cancer-related targets are AXIN1, a member of the Wnt signaling pathway, and the forkhead family transcription factor FOXP3 (as indicated by microCosm). However, I did not observe significant correlations between the expression of these two genes and miR-31 (AXIN1: r=0.46, raw p=0.45; FOXP3: r=0.39, raw p=0.1). The most significantly correlated genes were NUMB (r=-0.75, q=0.027) and ITPA (r=0.79, q=0.03), and both genes were also significantly differentially expressed in tumors compared to normals (q<0.05). Interestingly, NUMB, a negative regulator of Notch-1, has been reported to regulate p53 by preventing its degradation (Colaluca, Tosoni et al. 2008), and downregulation of NUMB has been studied in advanced colon cancers (Meng, Shelton et al. 2009).
More than 60% of all colorectal adenomas and carcinomas carry a mutation in the APC gene. APC acts as a tumor suppressor through its capacity to regulate intracellular β-catenin levels (Powell, Zilz et al. 1992), and it encodes a multifunctional protein that may participate in several cellular processes such as cell adhesion and migration, signal transduction, microtubule assembly and chromosome segregation. Functional studies of individual miRNAs have demonstrated that miR-135a and miR-135b directly target the 3'-UTR of APC, suppress its expression and activate Wnt/beta-catenin signaling, and overexpression of miR-135 has been observed in colorectal adenomas and carcinomas (Nagel, le Sage et al. 2008). However, I did not see any significant correlation between APC and miR-135b expression in our tumor set (r=-0.21, raw p=0.38). I did observe that downregulation of SMAD4, a key mediator of the TGF-β pathway that is mutated and/or deleted in many cancers including colon cancer (Ali, McKay et al. 2010), correlated with miR-135b upregulation (r=-0.66, raw p=0.002), although this gene was not significantly differentially expressed in tumors compared to normals. I also observed significant positive correlations between miR-135b and RNF43, TCFL5, MMP11 and PCMTD2 (|r|>0.7, FDR corrected q<0.05).
Recently, highly significant differential expression of miR-182 and miR-183 was reported in both MSI-high and MSS colon tumors (Sarver, French et al. 2009), and miR-183 was also studied as an oncogene with the transcription factor EGR1 as a putative target (Sarver, Li et al. 2010). In this study, I did not observe a correlation between miR-183 and EGR1 expression (r=0.25, raw p=0.29); instead, among the predicted target genes, downregulation of PIAS2 and upregulation of DDX31 and TCFL5 were associated with miR-183 overexpression. Downregulation of EFNA5, ZZEF1, KIAA0513, NUMB, RELL1, and ME2 and upregulation of TTYH3, TRIB3, RPUSD4, and TCFL5 were correlated with miR-182 overexpression. All of the above genes (except EGR1) were significantly correlated with miR-182 or miR-183 (|r|>0.62, FDR corrected q<0.05) as well as significantly differentially expressed in tumors relative to normals (FDR corrected q<0.05).
2.3.8. Previous findings of colon cancer new-miRNAs in cancers
Upregulation of miR-202 (9.15-fold compared to healthy controls) has previously been shown in plasma and biopsy samples of CRC using RT-qPCR-based miRNA profiling arrays (Ng, Chong et al. 2009). A putative target gene of miR-202, CCND1, encodes a key regulatory protein of the cell cycle, and overexpression of this gene has been associated with increased cell proliferation and poor prognosis in CRC (Le Marchand, Seifried et al. 2003). In this study, I observed significant positive correlations between miR-202 and both CCND1 and CDH3, a gene involved in Wnt/beta-catenin signaling, as well as a negative correlation between miR-202 and AKAP13, a gene related to apoptosis (|r|>0.65, q<0.05). These genes were also significantly differentially expressed in the MSS tumors compared to normals (q<0.05).
Although one might expect some markers to be expressed specifically in a single tumor type, it is conceivable that others would be expressed in a range of tumors. Here, I have found 13 miRNAs that are potential new colon tumor markers: miR-220b, miR-365-1, miR-549, miR-588, miR-602, miR-638, miR-935, miR-937, miR-1180, miR-1268, miR-1292, miR-1909 and miR-1914. To our knowledge, this is the first report of these miRNAs in resected colon cancer tissues (except for the finding of upregulated miR-549 in a mixed set of tumors from colon and rectum, described below). Cummins et al. previously reported colorectal "microRNAome" results for 4 colorectal cancer cell lines, 2 colon normals and 2 tumor tissues using a newly developed approach called miRNA serial analysis of gene expression (miRAGE) (Cummins, He et al. 2006). In that study, four of our newly identified miRNAs (miR-549, miR-588, miR-602 and miR-638) were detected in one of the colorectal cancer cell lines (HCT-116). However, none of them was differentially expressed in colon tumors compared to normal tissue. A few previous studies have examined expression levels of these 13 miRNAs in other cancers. Here, I explore the roles of these newly identified miRNAs in colon cancer based on the observed miRNA::mRNA interactions.
Interestingly, miR-220b, one of our newly identified colon tumor miRNAs, was withdrawn from miRBase based on a study by Chiang's group (Chiang, Schoenfeld et al. 2010), and it has been suggested that the absence of miR-220b from data sets might reflect either inaccurate annotation or very low expression. Here, I identified this low-abundance miRNA as consistently upregulated in colon tumors, which suggests that miR-220b may play an important role as an oncogenic miRNA despite its low abundance (average log2-transformed RPM: 3.75 in normals, 6.37 in tumors). Downregulation of RICH2 mRNA was significantly correlated with upregulation of miR-220b in tumors (r=-0.66, FDR corrected q=0.04), and RICH2 was also differentially expressed in tumors compared to normals (FDR corrected q=0.04) in our data set.
I identified upregulation of miR-365-1, a pre-miRNA of miR-365, in colon cancer by NGS. Despite the consistent and significant upregulation of this small pre-miRNA in our study, the mature miR-365 was not significantly deregulated in our NGS data, and in fact I observed downregulation of the mature form by RT-qPCR (Figure 20). Downregulation of mature miR-365 was also observed in a recent colon cancer study (Nie, Liu et al. 2012) as well as in a small study using 2 paired normal and tumor samples and 4 colorectal cancer cell lines (Cummins, He et al. 2006). The potential target genes of the mature miRNA include members of the RAS oncogene family, such as RAB1B and RAB22A, and ubiquitin specific peptidase 33 (USP33) (Yan, Huang et al. 2008), but I did not find significant correlations between these three genes and miR-365-1. Instead, the most notable targets of miR-365-1 were the upregulated cell cycle-related genes CCND1, INHBA, and CDC25A and the cell death-related genes PDE4D, GTF2F2, RPS6, TFAP4, TNFSF15 and MYLK (|r|>0.66, FDR corrected q<0.05). These genes were also significantly differentially expressed in tumors compared to normals (q<0.05).
MiR-549 was very recently found to be upregulated in a study that utilized high-throughput sequencing of a mixed set of colon and rectal adenocarcinomas (Hamfjord, Stangeland et al. 2012). No further molecular classification was provided for the tumors, but the upregulation of miR-549 matches our finding in MSS/CIMP-negative colon tumors. Surprisingly, only one other miRNA (miR-135b) was significantly deregulated in both studies, which may be due to the greater heterogeneity among the tumors in the Hamfjord study (which reported a total of 37 significantly deregulated miRNAs). MiR-549 is thought to be co-transcribed with the co-located KIAA1199, a gene previously found to be upregulated in colon cancer (Sabates-Bellver, Van der Flier et al. 2007). However, this gene was not on the predicted target gene list, so I did not study its expression correlation at this time. MiR-549 has also previously been reported as an oncogenic miRNA targeting the leucine zipper putative tumor suppressor gene LZTS1 in uveal metastatic melanoma (Radhakrishnan, Badhrinarayanan et al. 2009), but this gene was not significantly correlated with miR-549 in our colon cancers. Instead, I found downregulation of the dihydropyrimidine dehydrogenase gene (DPYD) to be associated with overexpression of miR-549 in our study (r=-0.74, FDR corrected q=0.04). DPYD is the initial, rate-limiting enzyme in the degradation of 5-fluorouracil (5-FU) and is known to be a principal factor in clinical response to the anticancer agent 5-FU. Low expression of DPYD has previously been observed in colon cancer, and one study showed that aberrant methylation of the DPYD promoter region acted as one of the mechanisms repressing DPYD expression (Noguchi, Tanimoto et al. 2004). I also observed DPYD to be downregulated in tumor versus normal tissue. Our findings suggest that miR-549 may regulate DPYD.
Yang et al. found that miR-602 expression increased with progression of HBV-related
hepatitis to cirrhosis and hepatocellular carcinoma, and noted that the tumor suppressor RAS
association family 1 gene (RASSF1A) was inhibited in cell lines that highly expressed miR-602
(Yang, Ma et al. 2010). However, I did not observe a correlation between miR-602 and RASSF1 (r=0.15, raw p=0.55) in colon tissues. I did observe that downregulation of a predicted target gene, NKX2-3, was significantly correlated with overexpression of miR-602 (r=-0.64, FDR corrected q=0.04), although its expression was not significantly different in tumors compared to normals. NKX2-3 belongs to a large family of related genes that encode homeodomain-containing transcription factors and is involved in gut and lymphoid organ formation (Pabst, Forster et al. 2000).
Overexpression of miR-638 (19p13.2) has been found to be associated with
mesenchymal stem cells (MSCs) (Liu, Fu et al. 2009) and it was found to be upregulated in the
plasma of patients with pancreatic cancer (Ali, Almhanna et al. 2010), while downregulation of
this miRNA was observed in gastric cancer tissue compared to normal gastric tissue (Katada,
Ishiguro et al. 2009; Yao, Suo et al. 2009). MiR-638 is significantly upregulated by 5-FU in
MCF-7 breast cancer cells (Shah, Pan et al. 2010), and it has also been identified as a regulator of the breast cancer 1, early onset (BRCA1) gene via binding to its target site inside the coding sequence (Nicoloso, Sun et al. 2010). In our study, BRCA1 was not significantly correlated with miR-638 (r=0.47, FDR corrected q=0.13), but overexpression of TMEM161A and downregulation of EPHA7 were significantly correlated with upregulation of miR-638 (|r|>0.67, FDR corrected q<0.05). These genes were also significantly differentially expressed in tumors compared to normals (FDR corrected q<0.05).
Overexpression of miR-935 and miR-937 has previously been identified in cervical cancer (Lui, Pourmand et al. 2007), although, in contrast, miR-935 was found to be downregulated in stage III/IV ovarian carcinomas by deep sequencing (Wyman, Parkin et al. 2009). Most interestingly, APC is one of the potential targets of miR-935 (based on the TargetScan database), and the expression levels of APC and miR-935 were significantly inversely correlated (r=-0.64, p=0.0028) in our colon cancer data. I also observed that downregulation of Krüppel-like factor 4 (KLF4) was significantly associated with miR-935 expression (r=-0.72, FDR corrected q=0.045) in colon tumor tissues, although this gene was not significantly differentially expressed in tumors compared to normals (tumors vs. normals: p=0.0047, FDR corrected q=0.08). KLF4 is a zinc finger-containing transcription factor that inhibits cell proliferation, and its downregulation promotes proliferation and differentiation in epithelial cells, both during development and in tumorigenesis. Furthermore, KLF4 is suggested to act as a tumor suppressor in colon tissue (Yori, Johnson et al. 2010). A microRNA mimic of miR-935 has been found to sensitize the HCT-116 CRC cell line to a BCL-2 family inhibitor, thereby promoting apoptosis (Lam, Lu et al. 2010). Sensitizing miRNAs would be expected to be expressed at low levels in tumors, but surprisingly, I found upregulation of miR-935 in the MSS colon tumors. Interestingly, Shah et al. also reported upregulation of miR-935, miR-1180, miR-1268 and miR-1292 in response to the chemotherapeutic drug 5-FU in breast cancer cells (Shah, Pan et al. 2010). MiR-937 has been found to be significantly upregulated in inflammatory versus non-inflammatory breast cancer (Lerebours, Cizeron-Clairac et al. 2013). The expression levels of two potential target genes, ABHD11 (r=0.75, FDR corrected q=0.042) and LSM7 (r=0.77, FDR corrected q=0.034) (microCosm), were significantly correlated with expression of miR-937 in our study. Moreover, LSM7 was also differentially expressed in tumors compared to normals (p=0.0003, FDR corrected q=0.042).
MiR-1180 and miR-1268 were found to be upregulated in the plasma of patients with pancreatic and prostate cancers (Ali, Almhanna et al. 2010). MiR-1180 has been found to be significantly deregulated by tamoxifen in MCF-7 cells bearing an oncogenic isoform of HER2 (Cittelly, Das et al. 2010). However, none of the predicted targets of miR-1180 was significantly correlated with it in our data, and there are no predicted targets for miR-1268 at all.
Recently, a set of miRNAs called T/B-miRs that are post-transcriptionally regulated by TGFβ/BMP (transforming growth factor β/bone morphogenetic protein) signaling was identified (Davis, Hilyard et al. 2010). The stem region of the primary transcripts of T/B-miRs contains a conserved sequence similar to the Smad binding element (SBE) found in the promoters of TGFβ/BMP-regulated genes. MiR-1292 contains the repressive SBE sequence (R-SBE; 5'-CAGAC-3'), and it has been suggested that the biosynthesis of miR-1292 is controlled by the TGFβ-Smad signaling pathway. MiR-1909 and miR-1292 were first identified in human embryonic stem cells (hESC) by deep sequencing of small RNA libraries, and their predicted targets are related to chromatin remodeling (TargetScan) (Bar, Wyman et al. 2008). However, there has been no study of this miRNA in association with any disease so far.
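Locating the R-SBE sequence (5'-CAGAC-3') in a primary transcript is a simple exact-match search on both strands. The sketch below is a hypothetical illustration, not the procedure used by Davis et al.; the function names and the example sequence are invented, and RNA input (U) is converted to DNA letters (T) before searching.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_motif(seq, motif="CAGAC"):
    """0-based start positions of the motif on the given strand and of its
    reverse complement (i.e. the motif on the opposite strand)."""
    seq = seq.upper().replace("U", "T")
    motif = motif.upper()
    rc = reverse_complement(motif)
    w = len(motif)
    forward = [i for i in range(len(seq) - w + 1) if seq[i:i + w] == motif]
    reverse = [i for i in range(len(seq) - w + 1) if seq[i:i + w] == rc]
    return forward, reverse


# Invented sequence with one motif copy on each strand.
print(find_motif("AACAGACUUGUCUGAA"))  # ([2], [9])
```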
MiR-1914, the last of the new-miRNAs identified in this study, is located at a breakpoint region (chr20:62,043,262-62,043,341) and has been studied in chronic myelogenous leukemia (CML) (Albano, Anelli et al. 2010). However, it has not been implicated in any solid cancer, except for its star form, which was found to be associated with differentiation status in liver cancer (Murakami, Tamori et al. 2013) and has also been studied in plasma samples of patients with gastric cancer (Konishi, Ichikawa et al. 2012). Currently, neither TargetScan 5.1 nor microCosm lists any predicted targets for miR-1914.
2.4. Conclusion
Deep small RNA sequencing in MSS/CIMP-negative colon cancer confirmed differential expression of 6 known-miRNAs and 13 new-miRNAs, i.e., miRNAs that had not been associated with colon cancer before our study. Most of the 13 are relatively late additions to miRBase and may not have been included in many pre-designed miRNA assay panels. This, combined with our observation that all 13 are low-abundance RNAs, may have prevented them from emerging in other colon cancer studies. Our findings underline the importance of an unbiased and sensitive analysis platform, such as that provided by next generation sequencing, for genome-wide transcriptome analyses. The present study points to exciting directions for future functional research: the target genes of the newly identified miRNAs play roles in the cell cycle, cell death, cell proliferation and tumorigenesis. It will be important to examine the functional significance of the identified miRNA::mRNA interactions. If their role in colon cancer is confirmed in our follow-up in vitro studies, these new-miRNAs may be candidate biomarkers for colon cancer.
CHAPTER 3
IDENTIFICATION OF NONCODING RNAS IN THE 8q24 REGION
SPANNING THE MULTIPLE CANCER RISK LOCUS SNP rs6983267
3.1. Introduction
Large genome-wide association studies (GWAS) have recently identified CRC-associated loci on 8q23.3, 10p14, 11q23, 15q13, and 18q21 (Tenesa, Farrington et al. 2008; Tomlinson, Webb et al. 2008). Most interestingly, germline genetic variants in a very gene-poor 8q24 region (128.1-128.7 Mb) were identified by GWAS in patients who developed prostate, colon and ovarian cancers. Among them, the SNP rs6983267 has been considered the most promising variant for functional assessment (Poynter, Figueiredo et al. 2007; Huppi, Pitt et al. 2012). The G allele of rs6983267, located in 8q24.21, has been associated with an increased risk of prostate, ovarian, breast, and colon cancer (Esteller 2008; Tenesa, Farrington et al. 2008). Despite the consistent association between rs6983267 and cancer risk, the molecular mechanism mediating the risk is still largely unknown. The genomic region spanning rs6983267 in this 8q24 gene desert was found to contain DNA enhancer elements (Jia, Landan et al. 2009), and the risk allele G has been shown to create a stronger binding site for the Wnt-regulated transcription factor TCF4 (Tuupanen, Turunen et al. 2009). The SNP rs6983267 lies in an intergenic region 335 kb upstream of the MYC proto-oncogene, which is the most frequently amplified protein-coding gene in cancers, including colon cancer (Beroukhim, Mermel et al. 2010). I hypothesized that the "gene desert" locus upstream of the c-MYC oncogene that contains rs6983267 may harbor non-coding RNAs that play a role in colon cancer development.
Aim 1: Identification of small non-coding RNAs (miRNAs) in 8q24 using computational algorithms.
The aim was to identify novel miRNAs around the rs6983267 SNP on 8q24 using computational algorithms in silico.
Aim 2: Genotype-associated expression of known miRNAs in the 8q24 region.
The aim was to study expression levels of known 8q24 region miRNAs depending on the rs6983267 SNP genotype.
Aim 3: Identification of lncRNAs.
A highly conserved lncRNA in the 8q24.21 genomic region encompassing the rs6983267 SNP was recently found. My aim was to study the expression of this lncRNA in our colon cancer samples and to elucidate its possible role in colon carcinogenesis.
3.2. Materials and methods
3.2.1. In Silico prediction of potential miRNAs in the 8q24 region
I utilized several publicly available algorithms to predict novel miRNAs in a 30 kb region that harbors the 8q24 SNP rs6983267 and corresponds to the strongest LD block in the CEU population (representing Caucasians). The algorithms "ProMiR II" and "miR-abela" were used for novel pre-miRNA prediction from the genomic sequence. To further confirm that the candidate sequences were likely miRNAs, "MiPred" and "CID-miRNA" were used. Furthermore, the candidates were aligned to known precursor and/or mature miRNAs in the miRBase database.
3.2.2. Total RNA extraction
All tumor samples were sectioned and stained with hematoxylin and eosin, then
reviewed by a pathologist to determine tumor cell content. Tumor tissue samples with >70%
tumor cell content were used for this study. Total RNA was extracted from the tissue samples
using the AllPrep DNA/RNA Mini kit (QIAGEN, Valencia, CA) following manufacturer’s
recommendations. Isolated RNA quality was checked by Bioanalyzer analysis (Agilent; Foster
City, CA). Sample information is shown in Table 10. Samples were collected from colon
cancer patients at three participating centers of Colorectal Cancer Family Registry (Mayo
Clinic, Mount Sinai Hospital, and Cleveland Clinic).
Table 10. Sample information for the ncRNA study.
Characteristic    Category          Number    %
8q24 genotype     GG                8         35%
                  TT                3         13%
                  GT                12        52%
Sex               Female            6         26%
                  Male              17        74%
Smoking status    Ex-smoker         10        43%
                  Current smoker    3         13%
                  Never smoker      4         17%
                  N/A               6         26%
TNM stage         II                1         4%
                  III               1         4%
                  N/A               21        91%
Location          Left              18        78%
                  Right             5         22%
3.2.3. Reverse Transcriptase Quantitative PCR
cDNA was synthesized using the High Capacity cDNA Reverse Transcription kit (Life
Technologies, Foster City, CA), and 25 ng of cDNA (based on the RNA input) was used for
quantitative PCR analysis in duplicate on the ABI 7900HT real-time PCR system (Life
Technologies) with the appropriate primers. The 2^(−ΔCt) method was used to calculate the
abundance of the lncRNA relative to GAPDH.
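To illustrate the 2^(−ΔCt) calculation named above, the sketch below computes relative expression from Ct values. It is a minimal sketch, assuming each Ct is the mean of the duplicate wells and that ΔCt = Ct(lncRNA) − Ct(GAPDH); the function names and Ct values are hypothetical and are not data from this study.

```python
from statistics import fmean


def mean_ct(replicate_cts):
    """Average the Ct values of technical replicates (duplicates here)."""
    return fmean(replicate_cts)


def relative_expression(ct_target, ct_reference):
    """Relative abundance of a target transcript by the 2^(-dCt) method.

    dCt = Ct(target) - Ct(reference). A higher Ct means less template,
    so expression relative to the reference gene is 2 ** (-dCt).
    """
    delta_ct = ct_target - ct_reference
    return 2.0 ** (-delta_ct)


if __name__ == "__main__":
    # Hypothetical duplicate Ct values for the lncRNA and GAPDH in one sample.
    ct_lnc = mean_ct([30.0, 30.0])
    ct_gapdh = mean_ct([20.0, 20.0])
    print(relative_expression(ct_lnc, ct_gapdh))  # 0.0009765625 (= 2 ** -10)
```

A tumor-to-normal fold change for one patient can then be taken as the ratio of the two relative-expression values.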
3.2.4. Affymetrix Genome-Wide Human SNP 6.0 Array
Samples were genotyped using the Affymetrix Genome-Wide Human SNP 6.0 Array.
DNA samples were processed, labeled and hybridized according to the manufacturer's
recommendations. All arrays were scanned on the GeneChip® Scanner 3000 7G using the
Affymetrix GeneChip Command Console (AGCC) Software to measure the fluorescent signal
intensities at each probe location.
3.3. Results and discussion
3.3.1. Identification of novel miRNAs in the 8q24 region using computational algorithms
Computational algorithms predict new miRNAs by sequence homology and by examining
candidate sequences for a stem-loop structure. One advantage of this approach is that a
large number of sequences can be scanned for the characteristic hairpin structure of
pri-miRNA and pre-miRNA precursors to predict the existence of miRNAs. To
predict novel miRNAs in the 30 kb region corresponding to the LD block harboring the 8q24
SNP rs6983267 (hg18, chr8:128,472,000-128,501,999), I first used ProMiRII (Nam, Kim et al.
2006) and miR-abela (Sewer, Paul et al. 2005) (Figure 24), which predict unknown
pre-miRNAs from genomic sequences. Eleven and 21 potential pre-miRNAs were identified by
ProMiRII (red arrows) and miR-abela (blue arrows), respectively. The genomic locations of four
candidates partially overlapped between the two algorithms (green arrows); these are shown with
matching colored numbers in Table 11. Seven candidates, marked with red stars, contain SNPs.
Examples of secondary structures of pre-miRNAs predicted by ProMiRII are shown in Figure 25.
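The hairpin criterion that underlies these tools can be illustrated with a deliberately simplified scan. The sketch below, a hypothetical helper not used in this study, reports segments whose two arms are perfect inverted repeats (reverse complements) separated by a loop of bounded length. Real predictors such as ProMiRII and miR-abela also tolerate mismatches and bulges and score folding free energy and other features, so this is only a toy version of the stem-loop search; the stem and loop lengths are assumed values.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def find_hairpins(seq, stem_len=8, min_loop=4, max_loop=12):
    """Naive stem-loop scan.

    Returns half-open (start, end) intervals in which the first stem_len
    bases and the last stem_len bases are exact reverse complements,
    separated by a loop of min_loop..max_loop bases. Mismatches, bulges
    and folding free energy are ignored.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    for i in range(n - 2 * stem_len - min_loop + 1):
        left_arm = seq[i:i + stem_len]
        if any(base not in COMPLEMENT for base in left_arm):
            continue  # skip windows containing N or other ambiguity codes
        right_arm = reverse_complement(left_arm)
        for loop in range(min_loop, max_loop + 1):
            j = i + stem_len + loop
            if j + stem_len > n:
                break
            if seq[j:j + stem_len] == right_arm:
                hits.append((i, j + stem_len))
    return hits


if __name__ == "__main__":
    # Hypothetical toy sequence: 8-nt arm, 4-nt loop, complementary 8-nt arm.
    toy = "GATCCGTA" + "TTTT" + "TACGGATC"
    print(find_hairpins(toy))  # [(0, 20)]
```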
Figure 24. In silico-predicted pre-miRNAs in the 30 kb region flanking the 8q24 SNP
rs6983267.
Figure 25. Examples of secondary stem-loop structures of miRNA precursors predicted by
ProMiRII in the 3′ region of rs6983267.
Next, the MiPred (Jiang, Wu et al. 2007) and CID-miRNA (Tyagi, Vaz et al. 2008) algorithms were
used to distinguish real pre-miRNAs from other hairpin sequences, such as pseudo pre-miRNAs,
among the 32 pre-miRNA candidates (Table 11). In addition, the candidates were
aligned with known precursor and mature miRNAs from all available species (mmu, mouse;
hsa, human; mo, morpholino; bta, Bos taurus; ppt, Physcomitrella patens) in
miRBase (http://www.mirbase.org). Three of the eleven candidates predicted by ProMiRII
and five of the twenty-one predicted by miR-abela were classified as potential real miRNAs by
MiPred (Table 11). Table 11 also shows the genomic location (hg18), length and free energy of
each candidate. Seven of the 32 pre-miRNAs had significantly high absolute scores,
above the optimal CID-miRNA cut-off of 0.6. However, none of the 32 candidates
matched any of our reads from the colon cancer NGS data (Chapter 2). Furthermore, while I was
conducting this project, another study (in prostate cancer) also failed to find any evidence for
significant novel miRNAs in this genomic region using NGS (Pomerantz, Beckwith et al.
2009). This suggests either that miRNAs in this region are difficult to identify computationally
or that there are indeed no novel miRNAs in this region.
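The two filters applied to the 32 candidates, a classifier cut-off and a check for supporting sequencing reads, can be sketched as below. This is a minimal illustration, assuming each candidate carries a CID-miRNA-style score compared by absolute value against the 0.6 cut-off, and that reads and candidates lie on the same chromosome with 1-based inclusive coordinates; the `Candidate` class, coordinates and scores are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    start: int    # 1-based, inclusive (hypothetical coordinates)
    end: int      # 1-based, inclusive
    score: float  # classifier score, e.g. as reported by a tool like CID-miRNA


def overlaps(a_start, a_end, b_start, b_end):
    """True if two closed intervals share at least one base."""
    return a_start <= b_end and b_start <= a_end


def summarize(candidates, reads, cutoff=0.6):
    """Map each candidate to (passes score cut-off, number of overlapping reads)."""
    summary = {}
    for cand in candidates:
        passes = abs(cand.score) > cutoff
        n_reads = sum(
            1 for r_start, r_end in reads
            if overlaps(cand.start, cand.end, r_start, r_end)
        )
        summary[cand.name] = (passes, n_reads)
    return summary


if __name__ == "__main__":
    cands = [Candidate("pre-1", 100, 180, 0.75), Candidate("pre-2", 500, 570, 0.20)]
    reads = [(150, 170), (900, 920)]
    print(summarize(cands, reads))  # {'pre-1': (True, 1), 'pre-2': (False, 0)}
```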
Table 11. List of pre-miRNAs predicted by ProMiRII and miR-abela.
The colored numbers indicate partially overlapping pre-miRNAs predicted by the two
algorithms, ProMiRII and miR-abela.
3.3.2. Altered expression of five known miRNAs located in the 8q24 region
While no known miRNAs reside in the approximately 800 kb genomic region at 8q24.1
encompassing the SNP rs6983267, five miRNAs (miR-1204, miR-1205, miR-1206, miR-1207,
miR-1208) were recently found in the non-coding PVT1 locus on 8q24, 400 kb
downstream of rs6983267 (Huppi, Pitt et al. 2012). To study whether the expression levels of
these miRNAs were associated with colon cancer, as well as with the risk allele (G) of the
rs6983267 SNP in cis, their expression levels were examined in the SOLiD NGS
data (Figure 26) (detailed materials and methods are given in Chapter 2).
Figure 26. Expression levels of five known miRNAs located in the 8q24 region by
different tissue types (10 tumors and 10 normals) and the genotype of the rs6983267 SNP
(5 GG versus 5 TT for each group). Green and purple dots represent normal and tumor samples,
respectively. Blue dots represent the TT genotype and red dots the GG genotype.
Four of the five miRNAs (miR-1204, miR-1205, miR-1207, and miR-1208) showed
differential expression between colon tissue types (tumor vs. normal, Figure 26), indicating an
association of these miRNAs with colon cancer. This finding is supported by the observation of
enhanced expression of the miR-1204 precursor (hsa-mir-1204) in two colon cancer cell lines
(HCT-116 and COLO-320) and in breast cancer cell lines (Huppi, Volfovsky et al. 2008). The
rs6983267 genotype was confirmed with the Affymetrix Genome-Wide Human SNP 6.0 Array on
germline DNA, and the samples were separated into GG and TT genotypes. Interestingly, the
expression level of miR-1204 was significantly associated with the GG genotype in both colon
tumor (p=0.011) and adjacent normal (p=0.012) tissues, suggesting a potential association of this
miRNA with the rs6983267 SNP. Because the genomic region spanning rs6983267 was found to
contain enhancer elements (Jia, Landan et al. 2009), this miRNA may be deregulated by the SNP.
3.3.3. Altered expression of novel lncRNAs in the 8q24 region
Motivated by the location of rs6983267 (the most promising colon cancer risk variant)
within a highly conserved region of the genome (Sotelo, Esposito et al. 2010) and the recent
discovery that ultraconserved regions are transcribed (Calin, Liu et al. 2007), our collaborator
identified novel lncRNAs encompassing the rs6983267 SNP by RACE (rapid amplification of cDNA
ends) in bone marrow cDNA (Ling, Spizzo et al. 2013). I found significantly upregulated
expression of this lncRNA, CCAT2, in 23 colon tumors compared with the 23 paired adjacent
normal tissues (p<0.002, nonparametric Mann-Whitney-Wilcoxon test) (Figure 27A). To study the
role of this lncRNA in colon cancer, which may be mediated via the rs6983267 SNP, I
examined whether the rs6983267 genotype affects the differential expression of CCAT2 between
colon tumor and adjacent normal tissues. Interestingly, higher fold changes of CCAT2
expression between normal and tumor tissues were found in GG (N=8) and GT (N=12) samples
than in TT (N=3) samples (Figure 27B). Although our collaborator observed a significant
difference (p=0.009) between GG (N=19) and TT (N=24) CRC cases in a somewhat larger Italian
cohort (data not shown), no statistically significant difference between the two genotypes
(GG vs. TT) was observed in our study in either tumor or normal samples, probably because of
the limited sample size. A significant difference was observed only between GT (N=12)
and TT (N=3) tumor samples (p<0.048). These data need to be validated in a larger number of
samples to establish whether this lncRNA is indeed associated with the rs6983267 cancer risk SNP.
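For readers unfamiliar with the nonparametric test named above, the sketch below computes the Mann-Whitney U statistic from pooled ranks and a two-sided p-value from the normal approximation. It is illustrative only and is not the software used in this study: it applies no tie or continuity correction, and with groups as small as TT (N=3) an exact test (for example `scipy.stats.mannwhitneyu` with `method="exact"`) would be more appropriate; for paired tumor-normal values, the Wilcoxon signed-rank test (`scipy.stats.wilcoxon`) is the paired counterpart. Function names and data are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U for sample x, two-sided p from the normal approximation).

    No tie or continuity correction is applied.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:n1])
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u_x - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return u_x, p_two_sided


if __name__ == "__main__":
    # Hypothetical relative-expression values for two groups.
    normal = [0.8, 1.1, 0.9, 1.0, 1.2]
    tumor = [2.5, 3.1, 1.9, 4.0, 2.2]
    u, p = mann_whitney_u(normal, tumor)
    print(u, round(p, 4))
```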
Figure 27. Elevated expression of a novel lncRNA, CCAT2, in tumors (A) and the
differential expression between GG, GT and TT samples. A. Tumor and Normal: relative
expression levels of CCAT2 to GAPDH in 23 normal and the paired tumor tissues. B & C.
Different genotypes in Tumor (B) and in Normal (C): expression levels of CCAT2 by rs6983267
genotype. The data are presented as box-and-whisker plots showing the 25th percentile (lower
edge of the box), the median (line), and the 75th percentile (upper edge of the box).
3.4. Conclusion
Although no evidence was found for significant novel miRNA transcription within the 8q24
colon cancer risk locus using computational algorithms and NGS, the expression level of one
of the five known miRNAs in 8q24 (miR-1204) was significantly associated with the GG genotype
in both colon tumor (p=0.011) and adjacent normal (p=0.012) tissues, suggesting a potential
association of this miRNA with the rs6983267 SNP. In addition, a lncRNA encompassing
rs6983267 (CCAT2) was also confirmed in our sample set. CCAT2 expression was higher in
colon tumors than in matched normal tissues, suggesting an oncogenic role in colon cancer
development. CCAT2 may represent a novel mechanism of ncRNA involvement in colon
cancer pathogenesis and, if confirmed in a large number of independent samples, a potential
new diagnostic and/or prognostic marker.
CHAPTER 4
LANDSCAPE OF ALTERED METHYLATION IN COLON CANCER
4.1. Introduction
Aberrant methylation of tumor suppressors, oncogenes and repetitive elements such as
LINE-1 has been identified in colon cancer, but studies so far have focused on small
numbers of specific genes. Consequently, knowledge of genome-wide methylation changes
in the colon, and of how they may contribute to colon carcinogenesis, is limited. I
hypothesize that altered methylation at various functional genomic locations (promoter, gene
body, UTR), at CGIs and their surrounding regions, and at imprinted genes may
contribute to the development and progression of colon cancer via disturbed gene regulation.
4.2. Materials and methods
4.2.1. Information on Patient Specimen
A total of 40 MSS/CIMP-neg colon tumors and 30 adjacent normal colon tissues, including 26
tumor-normal pairs, were collected from colon cancer patients at three participating centers of
the Colorectal Cancer Family Registry (Mayo Clinic, Mount Sinai Hospital, and Cleveland
Clinic). Sample information is shown in Table 12.
Table 12. Sample information for methylation analysis.
4.2.2. Total RNA extraction
All tumor samples were sectioned and stained with hematoxylin and eosin, then
reviewed by a pathologist to determine tumor cell content. Tumor samples with >70% tumor
cell content were used for this study. Total RNA was extracted from the tissue samples using
the AllPrep DNA/RNA Mini kit (QIAGEN, Valencia, CA) following manufacturer’s
recommendations. Isolated RNA quality was checked by Bioanalyzer analysis (Agilent; Foster
City, CA).
4.2.3. Affymetrix Exon Arrays
1 μg of total RNA for each sample was first processed using a ribosomal RNA (rRNA)
reduction procedure as suggested by Affymetrix (Affymetrix, Santa Clara, CA). The rRNA
reduction was verified by running the reduced RNA samples on the Bioanalyzer (Agilent
Technologies, Santa Clara, CA). After rRNA reduction, the Affymetrix GeneChip® Whole
Transcript (WT) Sense Target Labeling Assay (Affymetrix) was used to generate amplified and
biotinylated sense-strand DNA targets for hybridization on GeneChip® Exon 1.0 ST Arrays
following the Affymetrix protocol. Briefly, double-stranded cDNA was synthesized from 1 μg of
concentrated rRNA-reduced RNA using T7-(N)6 random hexamers. This was followed by in
vitro transcription to produce amplified antisense cRNA, which was converted back to
single-stranded sense DNA. The sense DNA (5.5 μg) was enzymatically fragmented, checked on
the Bioanalyzer for the appropriate size, terminally labeled with biotin and hybridized onto Exon
Arrays. After an 18-hour hybridization, the arrays were washed and stained using the
GeneChip® Hybridization, Wash and Stain Kit according to the suggested protocol. The arrays
were scanned on the GeneChip® Scanner 3000 7G using the AGCC Software to measure the
fluorescent signal intensities at each probe location.
4.2.4. DNA Extraction and Bisulfite Conversion
DNA was extracted from the tissue specimens using the AllPrep DNA/RNA Mini kit
(Qiagen Inc, Valencia, CA). The purity and quantity of the extracted DNA were examined with the
NanoDrop-2000 (Thermo Scientific, Wilmington, DE), and DNA integrity was checked by
agarose gel electrophoresis. Bisulfite conversion of 500 ng of DNA was performed for each
sample with the EZ DNA Methylation kit (Zymo Research, Irvine, CA), following the
manufacturer’s recommendations for the HumanMethylation450 BeadChip. The conversion
protocol consisted of 16 cycles of denaturation at 95°C for 30 sec and incubation at 50°C for
60 min, followed by a final hold at 4°C.
4.2.5. HumanMethylation450 BeadChips
HumanMethylation450 BeadChips (Illumina, San Diego, CA) were used to analyze genome-wide
DNA methylation profiles across 485,577 CpGs. These CpGs cover 96% of the known CpG
islands and 99% of the NCBI Reference Sequence (RefSeq) genes (Illumina), with an average
of 17 CpGs per gene distributed across the TSS1500, TSS200, 5’UTR, 1st exon, gene body,
and 3’UTR regions (Table 2 in Chapter 1). The 485,577 cytosine positions in the genome
include 482,421 (99.35%) CpG dinucleotides, 3,091 (0.64%) CNG targets, and 65 (0.01%) SNP
sites. The workflow is shown in Figure 28. Four µl of bisulfite-converted DNA was used for
hybridization onto the HumanMethylation450 BeadChips, following the Illumina Infinium HD
Methylation protocol. This consisted of a whole-genome amplification step followed by
enzymatic end-point fragmentation, precipitation and resuspension. The resuspended samples
were hybridized onto the BeadChips for 16 hours at 48°C. After hybridization, unhybridized
and non-specifically bound DNA was washed away, followed by single-nucleotide extension
using the hybridized bisulfite-treated DNA as a template. The Illumina iScan SQ scanner was
used to image the arrays, and the image intensities were extracted using the GenomeStudio
(v.2011.1) Methylation module (v.1.9.0) software.
Figure 28. The workflow for HumanMethylation450 BeadChips.
4.2.6. Raw data normalization
The data were normalized using the ‘Background Subtraction’ and ‘Normalization to
Internal Controls’ methods offered by the GenomeStudio software. First, the background
value was derived from the signals of the built-in negative control bead types for each
channel, setting the background level at the 5th percentile of the negative controls in the given
channel. The background was then subtracted from probe intensities in the same channel; any
intensity that became negative was set to 0. Second, the internal control probe pairs on the
HumanMethylation450 BeadChips were used for normalization. The normalization control
probe pairs (over 90 of them) target the same region within housekeeping genes
and have no underlying CpGs in the probe. For normalization, each probe intensity in a given
sample was multiplied by a constant normalization factor (the same for all samples) and divided
by the average of the normalization controls in the probe’s channel in that sample.
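The two GenomeStudio steps described above can be sketched for one color channel of one sample as follows. This is a minimal illustration under stated assumptions, not GenomeStudio's implementation: the percentile uses linear interpolation between order statistics (GenomeStudio's exact percentile definition is not given here), and `factor` is a hypothetical stand-in for the constant normalization factor shared by all samples. All intensity values are made up.

```python
from statistics import fmean


def percentile(values, q):
    """Percentile with linear interpolation between order statistics (q in 0..100)."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile of empty data")
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def subtract_background(intensities, negative_controls, q=5.0):
    """Subtract the q-th percentile of the negative controls; clip negatives to 0."""
    background = percentile(negative_controls, q)
    return [max(x - background, 0.0) for x in intensities]


def normalize_to_controls(intensities, normalization_controls, factor):
    """Multiply by a constant factor and divide by the mean control intensity."""
    control_mean = fmean(normalization_controls)
    return [x * factor / control_mean for x in intensities]


if __name__ == "__main__":
    # Hypothetical intensities for one channel of one sample.
    raw = [1500.0, 420.0, 80.0]
    neg_ctrl = [100.0, 100.0, 100.0, 100.0]
    norm_ctrl = [2000.0, 3000.0]
    bg_sub = subtract_background(raw, neg_ctrl)
    print(bg_sub)                                            # [1400.0, 320.0, 0.0]
    print(normalize_to_controls(bg_sub, norm_ctrl, 5000.0))  # [2800.0, 640.0, 0.0]
```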
4.2.7. Initial filtering of beta-values
The methylation score for each CpG site is represented by a beta-value, calculated from the
normalized probe fluorescence intensities as the methylated signal divided by the total
(methylated plus unmethylated) signal. Beta-values range from 0 (fully unmethylated) to 1 (fully
methylated). Every beta-value on the HumanMethylation450 BeadChip is accompanied by a
detection p-value indicating whether the signal is significantly greater than background. Sites
with detection p-values greater than 0.05 were filtered out before further analysis. Finally, we
excluded probes designed for sequences on the X or Y chromosome. Average delta-beta-values
indicating the differential methylation between colon cancer and the adjacent normal tissue
were calculated by subtracting the average beta-value of pooled adjacent normal tissues from
that of pooled colon cancer samples.
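The beta-value, filtering and delta-beta steps can be sketched for a single sample as below. This is a minimal sketch under stated assumptions: the beta-value uses the commonly cited Illumina convention β = max(M,0) / (max(M,0) + max(U,0) + 100), which is assumed rather than taken from this thesis; delta-beta is taken as tumor minus normal, the sign convention implied by the later statement that negative delta-beta values reflect hypomethylation in tumors; and how failed detection p-values are aggregated across samples is not specified here. CpG IDs and values are hypothetical.

```python
from statistics import fmean


def beta_value(meth, unmeth, offset=100.0):
    """Beta = M / (M + U + offset), with negative intensities floored at 0."""
    m = max(meth, 0.0)
    u = max(unmeth, 0.0)
    return m / (m + u + offset)


def filter_probes(betas, detection_p, chromosome, p_max=0.05):
    """Keep CpGs detected above background (p <= p_max) that are not on X or Y."""
    return {
        cpg: beta
        for cpg, beta in betas.items()
        if detection_p[cpg] <= p_max and chromosome[cpg] not in ("X", "Y")
    }


def delta_beta(tumor_betas, normal_betas):
    """Mean tumor beta minus mean normal beta (negative = hypomethylated in tumor)."""
    return fmean(tumor_betas) - fmean(normal_betas)


if __name__ == "__main__":
    betas = {"cg01": 0.82, "cg02": 0.40, "cg03": 0.55}   # hypothetical CpGs
    det_p = {"cg01": 0.001, "cg02": 0.20, "cg03": 0.01}
    chrom = {"cg01": "8", "cg02": "1", "cg03": "X"}
    print(filter_probes(betas, det_p, chrom))  # {'cg01': 0.82}
```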
4.2.8. Statistical analysis of differential methylation
To address the heteroscedasticity of beta-values, they were converted to M-values for
statistical analysis, and the Partek Genomics Suite (Partek Inc., St. Louis, MO) was used for the
analysis of differential methylation. First, data quality control (QC) analyses were performed on
the normalized average beta-values generated by GenomeStudio; these included plotting sample
histograms of the signal distributions (Figure 29). For the differential methylation analysis itself,
an analysis of covariance (ANCOVA) including factors such as tissue (tumor vs. normal), stage
and scan date was performed to evaluate the contribution of these factors to differential
methylation. Heat maps were created using the Partek Genomics Suite; hierarchical clustering
of the samples (tumors and normals) used Euclidean distance with average linkage.
4.2.9. Ingenuity pathway analysis (IPA)
The Ingenuity Pathway Analysis (IPA) program (http://www.ingenuity.com/index.html)
was used to identify possibly affected gene networks, functional categories and
canonical pathways related to colon cancer. IPA ranks gene networks by a score (−log(p-value))
that takes into account the number of focus genes and the size of the network.
4.3. Results and discussion
4.3.1. Aims of data analysis
There were five major aims for the data analysis: 1) to identify differentially methylated
CpGs genome-wide in 26 paired samples; 2) to study the biological functions of genes
covered by differentially methylated (DM) CpGs; 3) to identify colon cancer-specific DM
CpGs by comparison with hepatocellular carcinoma; 4) to identify DM CpGs in imprinted
genes; and 5) to identify correlations between methylation and miRNA expression. The results
of these analyses are described in the following sections.
4.3.2. Conversion of beta-values to M-values
In high-throughput statistical analysis using ANOVA, which assumes homoscedasticity, the
variances should be approximately constant across the range of values. The heteroscedasticity
of beta-values therefore argues for transforming them to M-values, which are more appropriate
for ANOVA (Du, Zhang et al. 2010). M-values were computed in the Partek Genomics Suite by
applying the logit transformation (Figure 29). In addition, the logit transformation reduces
Infinium type I and type II probe bias. Figure 30 shows histograms of beta-values and M-values
for all samples analyzed on the HumanMethylation450 BeadChip across all 485,577 CpGs. The
histogram of M-values shows a clear bimodal distribution with one positive (methylated) and one
negative (unmethylated) peak. In contrast, beta-values are heavily concentrated in the low
(0 to 0.1) and high (0.7 to 1.0) ranges. Beta-values lie between 0 and 1 and can be interpreted
as the proportion of DNA methylation at a given CpG site across the cell population in a sample,
which gives them a more intuitive biological interpretation. The M-value, by contrast, is more
statistically appropriate for the differential analysis of methylation levels, but it is difficult to
interpret directly in terms of DNA methylation. Therefore, Du et al. recommend using M-values
for differential methylation analysis and beta-values for reporting the results (Du, Zhang et al.
2010).
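The beta-to-M conversion described above is a base-2 logit, M = log2(β / (1 − β)), the relationship between the two measures given by Du et al. (2010). The sketch below shows the transform and its inverse; the clipping constant `eps` is an assumption used to keep β values of exactly 0 or 1 finite, and Partek's internal handling of such values is not described here.

```python
import math


def beta_to_m(beta, eps=1e-6):
    """M = log2(beta / (1 - beta)); beta is clipped to [eps, 1 - eps]
    so that fully (un)methylated values do not give an infinite M."""
    b = min(max(beta, eps), 1.0 - eps)
    return math.log2(b / (1.0 - b))


def m_to_beta(m):
    """Inverse transform: beta = 2^M / (2^M + 1)."""
    return 2.0 ** m / (2.0 ** m + 1.0)


if __name__ == "__main__":
    for beta in (0.05, 0.5, 0.8):
        m = beta_to_m(beta)
        print(beta, round(m, 3), round(m_to_beta(m), 3))
```

Because the logit stretches the compressed ends of the 0 to 1 scale, a change from 0.05 to 0.10 in β corresponds to a much larger change in M than a change from 0.50 to 0.55, which is what stabilizes the variance for ANOVA.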
Figure 29. M-value transformation to address the issue of heteroscedasticity. (Source:
Institute of Genetic Medicine)
Figure 30. Histograms of beta-values (A) and M-values (B) for all 485,577 CpGs.
4.3.3. Quality checks based on the distribution of the beta-values
A total of 40 MSS/CIMP-neg colon tumor and 36 adjacent normal tissues were analyzed
on the HumanMethylation450 BeadChips. While all tumor samples yielded good-quality DNA,
ten normal samples showed decreased DNA quality, with some smearing on gel analysis (data
not shown). To increase the number of adjacent normal samples, and thus the statistical power,
these lower-quality DNA samples were also initially analyzed on the BeadChips. First, data
quality control analyses for the 40 tumor (Figure 31A) and 36 normal colon samples
(Figure 31B) were performed based on the normalized average beta-values across all 485,577
CpGs. Five normal samples showed relatively low average beta-values (red arrows in
Figure 31B).
Figure 31. Histogram of average beta-values for 485,577 CpGs in 40 tumor samples (A)
and 36 adjacent normal samples (B). The quality of DNA analyzed on the gel is indicated in the
top row (B, bad; G, good; F, fair). The x-axis and y-axis represent the individual samples and
average beta-values, respectively. The arrows indicate the samples with relatively low average
beta-values.
Next, to check whether the low average beta-values of these five normal samples were
expected, I examined in our normal samples 113 CpGs of interest identified in a
HumanMethylation27 BeadChip analysis by Peter Laird’s group. These 113 CpGs were
constitutively methylated in normal samples but showed variable levels of DNA
hypomethylation in colon tumor tissues in that study (Hinoue, Weisenberger et al.
2012). Six samples, including the five low-quality normal samples noted above, showed
relatively low average beta-values for the 113 CpGs (Figure 32). These six samples, with
average beta-values between 0.35 and 0.50, were therefore excluded from the statistical
analysis aimed at identifying differentially methylated CpGs in MSS/CIMP-neg colon cancer
compared with normal tissues.
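The sample-level check described above, averaging each sample's beta-values over a fixed reference set of CpGs and flagging unusually low means, can be sketched as follows. This is a minimal sketch: the thesis does not state an explicit cut-off, so the `threshold` used in the example (0.6) is a hypothetical value, as are the sample and CpG IDs.

```python
from statistics import fmean


def flag_low_methylation(betas_by_sample, reference_cpgs, threshold):
    """Return {sample: mean beta} for samples whose mean beta over the
    reference CpGs falls below `threshold`."""
    flagged = {}
    for sample, betas in betas_by_sample.items():
        values = [betas[cpg] for cpg in reference_cpgs if cpg in betas]
        if not values:
            continue  # no reference CpGs measured for this sample
        mean_beta = fmean(values)
        if mean_beta < threshold:
            flagged[sample] = mean_beta
    return flagged


if __name__ == "__main__":
    # Hypothetical normal samples and reference CpGs expected to be methylated.
    samples = {
        "N01": {"cgA": 0.90, "cgB": 0.80},
        "N02": {"cgA": 0.40, "cgB": 0.40},
    }
    print(flag_low_methylation(samples, ["cgA", "cgB"], threshold=0.6))
```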
Figure 32. Distribution (A) and median (B) of average beta-values for the 113 CpGs reported by
Peter Laird’s group as consistently methylated in normal but not in tumor tissues. The six
low-quality samples are indicated.
Next, I tested whether our data reproduced DM CpGs previously identified in colon cancer by
other studies. Using data generated with the Illumina HumanMethylation27 BeadChip, Hinoue
and colleagues proposed a new “two-panel method” to differentiate CIMP-high and CIMP-low
subtypes (Hinoue, Weisenberger et al. 2012). Using this approach, Karpinski and colleagues
recently defined new subgroups, HME, IME, and LME, corresponding to CIMP-high, CIMP-low,
and CIMP-neg, respectively (Karpinski, Walter et al. 2013). From their study, I selected eight
CpGs that were all relatively hypermethylated in LME (comparable to CIMP-neg) cancer
compared with adjacent normal tissues, although they are less methylated in CIMP-neg cancer
than in the other cancer groups (HME and IME). The eight loci are located in the genes THBD,
FBN2, RAB31 (two CpGs), IGF2 (or INS-IGF2 or IGF2AS), ELMO1, FAM78A, and SLIT1. These
eight CpGs were examined in 70 samples comprising 40 MSS/CIMP-neg colon tumors and 30
adjacent normals, and then in only the 26 paired samples, as shown in Figures 33A and 33B,
respectively. In unsupervised clustering on the eight CpGs, tumor and normal tissues separated
more clearly in the paired samples than in all pooled samples. This result suggests that the
DNA methylation patterns of some MSS/CIMP-neg colon tumors resemble those of normal
tissue. Because these tumors had no matched normal samples, the heterogeneous methylation
patterns in the pooled samples could not be fully explained. For this reason, I decided to first
analyze the paired samples and to examine the signatures in the pooled samples later.
Figure 33. Unsupervised hierarchical clustering of beta-values for 8 CpGs (rows) in
pooled samples (A), and only paired samples (B) (columns). Green and red blocks on the
maps represent 30 normals and 40 MSS/CIMP-neg tumor tissues (A) or 26 normals and
matched 26 tumors (B), respectively. The red and blue for the CpGs represent
hypermethylation and hypomethylation, respectively.
Next, I checked how many paired samples showed the expected direction of change
(hypermethylation in tumor compared with normal tissue) at these eight previously identified
DM CpGs. Hypermethylation at these loci was frequent in MSS/CIMP-neg colon tumor tissues,
observed in up to 84.6% of pairs, with 100% consistency in the direction of methylation
changes (Figure 34). Moreover, all eight CpGs differed significantly (p<0.05; ANCOVA adjusted
for pairs and batch effect).
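The per-CpG count of pairs changing in the expected direction is a simple paired comparison; the sketch below shows it for one CpG. The function name and beta-values are hypothetical, chosen so that 22 of 26 pairs are hypermethylated, which is how a proportion such as 84.6% arises.

```python
def count_hypermethylated_pairs(normal_betas, tumor_betas):
    """Count tumor-normal pairs in which the tumor beta exceeds the normal beta.

    Returns (count, proportion of all pairs)."""
    if len(normal_betas) != len(tumor_betas) or not normal_betas:
        raise ValueError("need equal-length, non-empty paired lists")
    n_hyper = sum(1 for n, t in zip(normal_betas, tumor_betas) if t > n)
    return n_hyper, n_hyper / len(normal_betas)


if __name__ == "__main__":
    # Hypothetical betas for one CpG: 22 of 26 pairs higher in tumor.
    normal = [0.10] * 26
    tumor = [0.35] * 22 + [0.05] * 4
    count, fraction = count_hypermethylated_pairs(normal, tumor)
    print(count, f"{fraction:.1%}")  # 22 84.6%
```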
Figure 34. Dot plots of beta-values in 26 paired colon tissues for the eight hypermethylated
CpGs previously identified by Karpinski et al. (Karpinski, Walter et al. 2013). Each point
represents the beta-value of one sample, and paired samples are connected by a line.
The number (proportion) of hypermethylated pairs among the 26 pairs is shown at the
bottom of each plot. (N: Normal, T: Tumor).
4.3.4. Distribution and classification of CpGs
Before the statistical analysis to identify differentially methylated CpGs, I examined the
distribution of CpGs on the HumanMethylation450 BeadChip. After exclusion of unreliable
CpGs (detection p-value >0.05) and CpGs on the X or Y chromosome, the proportions of the
remaining 472,826 CpGs across functional genomic locations (A) and across CGIs and their
surrounding regions (B) are illustrated in Figure 35. By functional genomic location, 44% of
CpGs are located in proximal promoters, including CpGs in TSS1500 (15%), TSS200 (11%),
5’UTR (11%), and 1stExon (7%) (Sandoval, Heyn et al. 2011), while 4%, 31%, and 21% of CpGs
correspond to the 3’UTR, gene body and intergenic sequences, respectively (Figure 35A). Of
the 472,826 CpGs, 31% are in CGIs, 23% in CGI shores, 10% in CGI shelves, and 36% in
other regions of the genome (Open Sea) (Figure 35B).
Figure 35. Distribution of CpGs across functional genomic locations (A) and CGIs (B).
4.3.5. Identification of the genome-wide methylation profiles in colon cancer
After background subtraction and normalization of the data to internal controls,
Principal Component Analysis (PCA) was performed on the beta-values of 472,826 CpGs to
examine clustering of the samples by tissue type. The results showed distinctly different
overall methylation patterns between all colon tumors and matched normal samples (data not
shown). To explore the landscape of consistently aberrant DNA methylation in MSS/CIMP-neg
colon cancer, 26 MSS/CIMP-neg colon tumors and 26 matched normal tissues were analyzed.
A total of six chips were used to profile all samples in two batches, and each tumor-normal
pair was analyzed on the same chip to minimize experimental variation. Volcano plots of
differential methylation levels (delta-beta values) against raw p-values are shown in Figure 36.
I observed an enrichment of CpGs with negative delta-beta values among the 472,826 CpGs
at a Bonferroni-corrected p-value <0.05 (horizontal red line), reflecting general
hypomethylation in MSS/CIMP-neg colon cancer. By functional location, hypomethylation
predominated in the 3’UTR and intergenic regions, whereas hypermethylation predominated in
TSS1500, TSS200, 5’UTR, and the 1stExon, which are considered the functional promoter
regions (Figure 36A).
Figure 36A. Volcano plots showing the magnitude of differential methylation (delta-beta)
across all CpGs at various functional regions. The x-axis shows delta-beta values and the
y-axis shows the negative log10 of p-values (a higher value indicates greater significance). The
horizontal red and blue lines mark the thresholds at a Bonferroni-corrected p-value of 0.05 for
defining differentially methylated CpGs in colon tumors compared with normal tissues. The
vertical lines represent delta-beta values of -0.2 and 0.2.
For the CGIs and their surrounding regions, hypermethylation was enriched in N_Shores and
CGIs, whereas less hypermethylation was observed in the N_Shelf, S_Shore, S_Shelf, and
Open Sea loci (Figure 36B).
Figure 36B. Volcano plots showing the magnitude of differential methylation (delta-beta)
across all CpGs at CpG islands and their surrounding regions. The x-axis shows delta-beta
values and the y-axis shows the negative log10 of p-values (a higher value indicates greater
significance). The horizontal red and blue lines mark the thresholds at a Bonferroni-corrected
p-value of 0.05 for defining differentially methylated CpGs in colon tumors compared with
normal tissues. The vertical lines represent delta-beta values of -0.2 and 0.2.
4.3.6. Genome-wide methylation patterns of significant DM CpGs in colon cancer
Significant differential methylation between the MSS/CIMP-neg colon tumors and matched
normal tissues was observed at approximately 0.06% of the CpGs analyzed (304/472,826) at a
Bonferroni-corrected p-value <0.05 (corresponding to a raw p-value <1.04997e-007), and these
304 CpGs clearly separated the colon tumors from the adjacent normal tissues by PCA and by
unsupervised hierarchical clustering of beta-values (Figure 37A). Of the 304 CpGs, 50%
(152/304) had |delta-beta| ≥ 0.2, i.e., a methylation difference of at least 20 percentage points
between tumor and normal tissues (Figure 37B). Among these 152 CpGs, 88% (134/152) were
hypomethylated and 12% (18/152) were hypermethylated. This general hypomethylation is
supported by previous findings in colon cancer development using other methods, such as an
antibody-based method combined with restriction enzymes (Hernandez-Blazquez, Habib et al.
2000) and MethyLight for the LINE-1 repetitive element (Sunami, de Maat et al. 2011).
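The two-step selection described above (a Bonferroni-corrected p-value, then an absolute delta-beta of at least 0.2, split by sign into hyper- and hypomethylated CpGs) can be sketched as below. This assumes raw per-CpG p-values from the ANCOVA and delta-beta values (tumor minus normal) are already available; the CpG IDs and values are hypothetical, and `n_tests` is simply the number of CpGs tested.

```python
def classify_dm_cpgs(results, n_tests, alpha=0.05, min_abs_delta=0.2):
    """Split CpGs into hyper- and hypomethylated lists.

    results: {cpg_id: (delta_beta, raw_p)} with delta_beta = tumor - normal.
    A CpG is called DM when raw_p < alpha / n_tests (Bonferroni) and
    |delta_beta| >= min_abs_delta.
    """
    cutoff = alpha / n_tests
    hyper, hypo = [], []
    for cpg, (delta, p) in results.items():
        if p < cutoff and abs(delta) >= min_abs_delta:
            (hyper if delta > 0 else hypo).append(cpg)
    return sorted(hyper), sorted(hypo)


if __name__ == "__main__":
    example = {
        "cgA": (0.25, 1e-9),    # significant, large gain -> hyper
        "cgB": (-0.30, 1e-8),   # significant, large loss -> hypo
        "cgC": (-0.10, 1e-12),  # significant but |delta| < 0.2
        "cgD": (0.40, 1e-3),    # large delta but not significant
    }
    print(classify_dm_cpgs(example, n_tests=472_826))  # (['cgA'], ['cgB'])
```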
Figure 37. Methylation profiles of (A) 304 DM CpGs with Bonferroni-corrected p<0.05
and (B) 152 DM CpGs with |delta-beta| ≥ 0.2, shown by PCA (left) and unsupervised
hierarchical clustering (right). The heat maps show beta-values, with red indicating more
methylation and blue less. Columns represent individual samples, and rows represent
DM CpGs.
To explore the distribution of the observed DM CpGs, I further analyzed the functional
locations of the 152 significant hypermethylated and hypomethylated DM CpGs separately
(Figure 38A). A CpG site can be assigned to more than one functional location, since a locus can
reside in several transcript variants of the same gene or in different genes. Among the 18
hypermethylated CpGs, most were located in the 5’UTR (38%) and TSS200 (23%), followed by
the TSS1500 (12%) and gene body (10%) (Figure 38A). The highest frequency of
hypermethylation was observed in the 5’UTR (6/15, 40%) (Figure 38B). In contrast, all three
DM CpGs in the 3’UTR were hypomethylated in colon cancer compared with matched normal
tissues. The hypomethylated CpGs were spread similarly across seven different functional
locations (Figure 38A).
Next, the location of the 152 DM CpGs relative to CGIs and their surrounding regions was
studied (Figure 39). Most DM CpGs found in this study resided in Open Seas (103/152, 68%),
including most of the hypomethylated CpGs (100/134, 75%), whereas most hypermethylated
loci were in CGIs (12/18, 67%).
Figure 38A. Functional location of the 152 DM CpGs. Distribution of 152 DM CpGs
including 18 hypermethylated CpGs (Left) and 134 hypomethylated CpGs (Right).
Figure 38B. Functional location of the 152 DM CpGs. DNA methylation patterns of the 152 DM
CpGs by functional location.
Figure 39A. Distribution of CGIs and surrounding regions of 152 DM CpGs. Distribution of 152 DM CpGs including
18 hypermethylated CpGs (Left) and 134 hypomethylated CpGs (Right).
Figure 39B. Distribution of CGIs and surrounding regions of 152 DM CpGs. DNA methylation patterns of 152 DM
CpGs by CGIs and surrounding regions.
4.3.7. MSS/CIMP-neg colon cancer DM CpGs compared to another cancer type
Genome-wide studies have shown that DNA methylation profiles in mammals are tissue specific (Kitamura, Igarashi et al. 2007; Rakyan, Down et al. 2008) as well as tumor specific (Rakyan, Hildmann et al. 2004; Eckhardt, Lewin et al. 2006). I therefore asked whether the colon DM CpGs found in this study are colon tissue specific (T-DM) or colon cancer specific (C-DM). To answer this question, I first compared the methylation of the 152 colon DM CpGs identified in this thesis between colon cancer (N=26 paired samples) and hepatocellular carcinoma (HCC, N=19 paired samples) (Figure 40). The HCC samples were profiled with the same comprehensive genome-wide approach, and the landscape of aberrant methylation in HCC was recently published by our group (Song, Tiirikainen et al. 2013). Interestingly, PCA using the 152 colon DM CpGs showed clear separation by cancer type (colon cancer and HCC, y-axis; PC1, 66%) as well as by tissue type (colon and liver, x-axis; PC2, 11.8%) (Figure 40A). As expected, since these CpGs were selected as differentially methylated between colon tumors and matched normal tissues, the colon cancers and matched normal tissues were segregated in unsupervised clustering (Figure 40B). Moreover, the two major branches of the dendrogram corresponded to tissue type, with only a few exceptions.
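The two unsupervised analyses can be sketched in Python, assuming a samples × CpGs matrix of beta-values with no missing values. PCA is computed here by singular value decomposition of the column-centered matrix, and clustering uses SciPy's agglomerative `linkage` with average linkage on Euclidean distances; the linkage method and distance are assumptions, since they are not specified here, and the simulated matrix only stands in for real data.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def pca(beta, n_components=2):
    """Sample scores and explained-variance ratios for the first components.

    beta: (n_samples, n_cpgs) array of beta-values without missing values.
    """
    centered = beta - beta.mean(axis=0)              # center each CpG
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)                  # fraction of variance per PC
    scores = u[:, :n_components] * s[:n_components]  # sample coordinates
    return scores, explained[:n_components]


def two_cluster_split(beta, method="average"):
    """Agglomerative clustering of samples, cut into two groups."""
    tree = linkage(beta, method=method, metric="euclidean")
    return fcluster(tree, t=2, criterion="maxclust")


# Simulated stand-in for real data: 10 samples near beta = 0.3, 10 near 0.7
rng = np.random.default_rng(0)
beta = np.vstack([rng.normal(0.3, 0.05, (10, 152)),
                  rng.normal(0.7, 0.05, (10, 152))]).clip(0, 1)
scores, explained = pca(beta)
groups = two_cluster_split(beta)
print(f"PC1 {100 * explained[0]:.1f}%, PC2 {100 * explained[1]:.1f}%")
print(groups)
```

The percentages reported for PC1 and PC2 correspond to `explained`; the cluster labels from `fcluster` are arbitrary integers, so only the grouping, not the label values, is meaningful.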
Next, the 18 hypermethylated and 134 hypomethylated colon DM CpGs were analyzed separately by PCA to determine whether these differential methylation patterns represent colon T-DM or colon C-DM CpGs (Figure 40C and 40D). Interestingly, using the 18 hypermethylated colon DM CpGs, the colon tumor samples (red) lie to the right on PC1 (65.5%), while the colon normal (blue), liver normal (green) and HCC (purple) samples lie to the left, suggesting that these could be C-DM, or colon cancer specific, CpGs (Figure 40C). In contrast, using the 134 hypomethylated colon DM CpGs, the tumor and normal samples lie to the left and right on PC1 (72.3%) regardless of cancer type, with the colon tissues in the upper and the liver tissues in the lower part of the plot, indicating that these could be T-DM, or tissue specific, CpGs (Figure 40D).
Figure 40. Clustering of normal tissues (colon and liver) and tumor tissues (colon cancer and HCC) using 152 colon DM CpGs, resulting in near perfect discrimination of tissues. The beta-values of all tissues for the 152 colon DM CpGs were used for the PCA (A) and unsupervised clustering (B). The heat map shows beta-values, with red being more methylated and blue less. Columns represent individual samples, and rows represent 152 DM CpGs. PCA was performed with 18 hypermethylated (C) and 134 hypomethylated (D) colon DM CpGs.
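The direction-specific analysis can be sketched by splitting the DM CpGs on the sign of the mean tumor-minus-normal difference and repeating the PCA on each subset. The sketch assumes paired colon beta matrices with rows aligned by patient; the toy data are simulated, and this illustrates the idea rather than reproducing the exact pipeline used here.

```python
import numpy as np


def split_by_direction(tumor, normal):
    """Column indices of CpGs with positive and negative mean paired delta-beta.

    tumor, normal: (n_pairs, n_cpgs) beta-values, rows aligned by patient.
    """
    delta = (tumor - normal).mean(axis=0)
    return np.flatnonzero(delta > 0), np.flatnonzero(delta < 0)


def pc1_variance(beta):
    """Fraction of total variance captured by the first principal component."""
    centered = beta - beta.mean(axis=0)
    s = np.linalg.svd(centered, compute_uv=False)
    return s[0] ** 2 / np.sum(s ** 2)


# Toy data: 5 pairs, 6 CpGs; the first two are shifted up, the rest down
rng = np.random.default_rng(1)
normal = rng.uniform(0.2, 0.8, (5, 6))
shift = np.array([0.3, 0.3, -0.2, -0.2, -0.2, -0.2])
tumor = (normal + shift).clip(0, 1)
hyper_idx, hypo_idx = split_by_direction(tumor, normal)

all_samples = np.vstack([normal, tumor])   # other tissues would be stacked here too
for name, idx in (("hyper", hyper_idx), ("hypo", hypo_idx)):
    print(name, idx, f"PC1 {100 * pc1_variance(all_samples[:, idx]):.1f}%")
```

The direction is defined from the colon pairs only; liver and HCC samples would simply be added as extra rows of `all_samples` before projecting.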
To explore the biological implications of the 152 significant DM loci, I characterized the associated genes using IPA (Table 13). A list of 83 unique genes harboring the 152 DM CpGs was created, of which 73 genes were analyzed by IPA to identify their biological networks and possible functional roles. Fifteen genes were involved in the top IPA network, related to “Cell Death and Survival, Nervous System Development and Function, Cellular Assembly and Organization”. Moreover, 14 genes were involved in “Dermatological Diseases and Conditions, Organismal Injury and Abnormalities, Cell Death and Survival”, and another 11 genes in “Embryonic Development, Organismal Development, Tissue Development” (Table 13). IPA analysis indicated that this gene list is enriched for genes having roles in cancer (47 genes) and gastrointestinal disease (31 genes) (data not shown). About 33% (24/73) of these genes have previously been reported to be mutated in colon cancer (Table 14).
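IPA is documented to score the enrichment of functions and diseases with a right-tailed Fisher's exact test. The calculation can be sketched with SciPy's `fisher_exact`. Only the 73 analyzed genes and the 47 cancer-related genes come from the text; the reference-set sizes (20,000 genes, 5,000 annotated to cancer) are hypothetical placeholders, so the resulting p-value is illustrative only.

```python
from scipy.stats import fisher_exact


def right_tailed_enrichment(hits, list_size, category_size, reference_size):
    """Right-tailed Fisher's exact test for over-representation of a category.

    hits:           genes in the analyzed list annotated to the category
    list_size:      genes in the analyzed list
    category_size:  genes in the reference set annotated to the category
    reference_size: genes in the reference set (the list is assumed a subset)
    """
    table = [
        [hits, list_size - hits],
        [category_size - hits,
         reference_size - list_size - (category_size - hits)],
    ]
    _, p_value = fisher_exact(table, alternative="greater")
    return p_value


# 73 and 47 are from the text; 5,000 and 20,000 are hypothetical
p_cancer = right_tailed_enrichment(47, 73, 5000, 20000)
print(f"p = {p_cancer:.2e}")
```

The right tail (`alternative="greater"`) asks only whether the category is over-represented in the list, which is the question behind an "enriched for" statement.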
Looking at their cellular functions in more detail, the DM genes identified in this study have a variety of functional roles, acting as enzymes, transcription regulators and ion channels.
- Enzyme: DPYS is involved in the catabolism of pyrimidine bases (Hamajima, Kouwaki et al. 1998) and has been suggested as a novel tumor marker in cancers including colon cancer (Chung, Kwabi-Addo et al. 2008). MACF1, microtubule-actin crosslinking factor 1, has ATPase activity and is involved in the Wnt signaling pathway, where it functions as a positive regulator of the translocation of Axin and its associated complex from the cytoplasm to the cell membrane (Chen, Lin et al. 2006). NADPH oxidase 4 (NOX4), which was hypomethylated in the tumors in this study, has recently been shown to be overexpressed in colon cancer and associated with carcinogenesis (Wang, Dashwood et al. 2011).
- Transcription regulator: AFF1 is a sequence-specific DNA binding transcription factor associated with leukemia. The translocation t(4;11)(q21;q23) involves the genes MLL and AFF1, and leukemia with this fusion is associated with poor prognosis (Tamai, Miyake et al. 2011). An important paralog of AFF1 is AFF3, a putative transcription activator that may function in lymphoid development and as an oncogene (Luo, Lin et al. 2012). However, these genes have not been studied in colon cancer development. MYT1L, myelin transcription factor 1-like, activates A2BP1, resulting in suppression of glioblastoma (Hu, Ho et al. 2013). Two other transcription regulators, OSR2 and SIM1, were found to be aberrantly methylated in our study; however, the roles of these genes in cancer are poorly understood.
- Ion channel: Membrane ion channels are essential for cell proliferation and have an important role in the development of cancers (Kunzelmann 2005). Voltage-gated potassium channels have been reported to have oncogenic functions. KCNJ1 is a potassium channel whose important role in colon cancer was recently presented by Zhu and colleagues at the AACR Annual Meeting 2013. NOX5 is a calcium-dependent NADPH oxidase and has been suggested to be involved in cell growth and apoptosis in a prostate cancer cell line (Brar, Corbin et al. 2003).
Table 13. IPA top networks for genes with DM CpGs.
Table 14. List of genes that have previously been reported to be mutated in colon cancer.
4.3.8. Deregulated methylation at imprinted genes in MSS/CIMP-neg colon cancer
Imprinting is an important epigenetic mechanism by which certain genes are expressed in a parent-of-origin-specific manner (Robertson 2005). Although loss of imprinting (LOI) of IGF2 has been found in the normal colonic mucosa of 30% of CRC patients, compared with 10% of individuals without CRC (Cui, Cruz-Correa et al. 2003), colon cancer related changes in methylation across all imprinted regions have not been systematically investigated.
Thanks to the genome-scale coverage of the Illumina HumanMethylation450 BeadChip, I was able to analyze 1,257 CpGs co-localizing to 41 imprinted genes in 26 MSS/CIMP-neg colon cancer pairs. Methylation increased significantly at 28 CpGs and decreased significantly at 27 CpGs, across 18 unique genes, at Bonferroni corrected p<0.05 (Table 15). The strongest changes (|delta-beta| ≥ 0.2) were observed at 3 hypomethylated CpGs in DLX5 and at 5 hypermethylated CpGs in four genes: LRTM1, GNASAS, MEST and KCNK9 (Table 15). Furthermore, these hypo- and hypermethylation events were frequent among the 26 paired samples, with up to 92% consistency (24/26 pairs) in the direction of the methylation change (Figure 41). Interestingly, two of these five DM imprinted genes (DLX5 and MEST) were recently shown to be differentially methylated in prostate tumor tissues by the HumanMethylation27 BeadChip (Jacobs, Mao et al. 2013).
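The filtering described above can be sketched per CpG in Python. The sketch assumes a paired t-test (the test is not named in this section), Bonferroni correction over all CpGs tested (1,257 here), a |delta-beta| ≥ 0.2 cut-off for the strongest events, and a count of pairs whose change agrees with the direction of the mean difference. The simulated matrices stand in for real data.

```python
import numpy as np
from scipy.stats import ttest_rel


def paired_dm(tumor, normal, alpha=0.05, min_delta=0.2):
    """Flag differentially methylated CpGs in paired samples.

    tumor, normal: (n_pairs, n_cpgs) beta-values, rows aligned by patient.
    Returns a dict of per-CpG arrays.
    """
    n_tests = tumor.shape[1]
    _, p = ttest_rel(tumor, normal, axis=0)        # paired test per CpG
    p_bonf = np.minimum(p * n_tests, 1.0)          # Bonferroni correction
    delta = (tumor - normal).mean(axis=0)          # mean delta-beta
    # pairs whose tumor-minus-normal sign matches the mean direction
    consistent = (np.sign(tumor - normal) == np.sign(delta)).sum(axis=0)
    return {
        "p_bonf": p_bonf,
        "delta": delta,
        "significant": p_bonf < alpha,
        "strong": (p_bonf < alpha) & (np.abs(delta) >= min_delta),
        "consistent_pairs": consistent,
    }


# Simulated data: 26 pairs, 100 CpGs, mostly unchanged
rng = np.random.default_rng(2)
normal = rng.uniform(0.3, 0.6, (26, 100))
tumor = normal + rng.normal(0.0, 0.02, (26, 100))
tumor[:, 0] += 0.25     # strongly hypermethylated CpG
tumor[:, 1] -= 0.05     # modestly hypomethylated CpG
res = paired_dm(tumor.clip(0, 1), normal)
print(np.flatnonzero(res["strong"]), res["consistent_pairs"][:2])
```

In this toy example the second CpG is significant but does not pass the |delta-beta| cut-off, mirroring the distinction between all significant CpGs and the strongest events.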
Next, I asked whether these methylation differences were correlated with the expression levels of their host genes. Affymetrix Exon Arrays were used for the gene expression study on 23 pairs that were also analyzed on the Illumina HumanMethylation450 BeadChips. Four of the five DM imprinted genes could be analyzed on the GeneChip® Human Exon 1.0 ST Array. Methylation levels of two MEST promoter (TSS1500) CpGs were significantly inversely correlated with MEST gene expression values (Figure 42). This result is supported by other findings that LOI of MEST is linked to colon cancer (Nishihara, Hayashida et al. 2000), breast cancer (Pedersen, Dervan et al. 1999) and lung cancer (Nakanishi, Suda et al. 2004). Aberrant methylation of MEST is also associated with male infertility (Marques, Costa et al. 2008).
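The methylation-expression correlation can be sketched with SciPy's `pearsonr`. The sketch assumes Pearson's r (the figure reports r without naming the method) and pools tumor and normal samples, as the figure plots both. The arrays are simulated.

```python
import numpy as np
from scipy.stats import pearsonr


def methylation_expression_correlation(beta, expression):
    """Pearson correlation between each CpG and the host gene's expression.

    beta:       (n_samples, n_cpgs) beta-values for CpGs in one gene
    expression: (n_samples,) expression values for the same samples
    Returns a list of (r, p) tuples, one per CpG.
    """
    results = []
    for j in range(beta.shape[1]):
        r, p = pearsonr(beta[:, j], expression)
        results.append((r, p))
    return results


# Simulated values for 46 samples (e.g. 23 pairs pooled)
rng = np.random.default_rng(3)
expression = rng.normal(8.0, 1.0, 46)
promoter_cpg = 0.9 - 0.05 * expression + rng.normal(0, 0.02, 46)
unrelated_cpg = rng.uniform(0.2, 0.8, 46)
beta = np.column_stack([promoter_cpg, unrelated_cpg])
results = methylation_expression_correlation(beta, expression)
for r, p in results:
    print(f"r = {r:.2f}, p = {p:.2g}")
```

A negative r, as for the simulated promoter CpG, is the pattern expected when promoter methylation represses transcription.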
Table 15. Significant differential methylation in imprinted gene loci between colon
tumor and normal tissues at Bonferroni corrected p<0.05.
Figure 41. Dot plots of beta-values for 8 differentially methylated CpGs in 5 imprinted genes in MSS/CIMP-negative colon cancer compared to adjacent normal tissues. Each point represents the beta-value for an individual. The paired samples are connected by a line. The number and proportion of the 26 paired samples showing the change are presented at the bottom of the plots. (N: Normal, T: Tumor)
Figure 42. Inverse correlation between DNA methylation and gene expression level of MEST. Gene name, TargetID (locus) by Illumina, and the correlation coefficients (r) are presented. Each dot represents an individual colon tumor (red) or normal (blue) sample. The x-axis and y-axis indicate the levels of gene expression from Affymetrix Exon arrays and the DNA methylation by Illumina HumanMethylation450 BeadChips, respectively.
4.3.9. Correlation between DNA methylation and miRNA expression
Chapter 2 describes the discovery of nineteen miRNAs, 6 previously associated with colon cancer (known-miRNAs) and 13 not previously implicated in colon cancer (new-miRNAs), that were differentially expressed between MSS/CIMP-neg colon cancer and normal tissues. Although methylation patterns do not always correlate with gene expression because of the diversity of epigenetic changes (Eckhardt, Lewin et al. 2006), I asked whether the altered expression of these miRNAs could potentially be regulated by DNA methylation. First, the CpGs that cover the miRNAs were selected. The Illumina HumanMethylation450 BeadChip includes 117 CpGs that cover four of the known-miRNAs and 11 of the new-miRNAs. The number of analyzed CpGs for each miRNA is shown in Table 16. All of these miRNAs except miR-30a were more highly expressed in MSS/CIMP-neg colon cancer than in adjacent normal tissues.
Table 16. The number of analyzed CpGs for differentially expressed miRNAs.
miRNA          # of CpG sites on HumanMethylation450
Known-miRNAs
miR-30a        3
miR-135b       4
miR-182        8
miR-202        10
New-miRNAs
miR-183        9
miR-365-1      6
miR-549        5
miR-602        3
miR-638        18
miR-935        7
miR-937        6
miR-1180       11
miR-1292       9
miR-1909       11
miR-1914       7
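Selecting the CpGs that cover the miRNAs is essentially an interval-overlap query between probe positions and miRNA coordinates. This can be sketched as follows, assuming each miRNA locus is given as (chromosome, start, end) with 1-based inclusive coordinates and that an optional flanking window is added on both sides. The flank, probe names and coordinates are toy values, not those used in this study.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def cpgs_in_loci(probes, loci, flank=0):
    """Map each locus to the probe IDs whose position falls inside it.

    probes: iterable of (probe_id, chromosome, position)
    loci:   dict of name -> (chromosome, start, end), 1-based inclusive
    flank:  hypothetical window (bp) added to both ends of every locus
    """
    by_chrom = defaultdict(list)
    for probe_id, chrom, pos in probes:
        by_chrom[chrom].append((pos, probe_id))
    for entries in by_chrom.values():
        entries.sort()                      # sort probes by position

    hits = {}
    for name, (chrom, start, end) in loci.items():
        entries = by_chrom.get(chrom, [])
        positions = [pos for pos, _ in entries]
        lo = bisect_left(positions, start - flank)
        hi = bisect_right(positions, end + flank)
        hits[name] = [probe_id for _, probe_id in entries[lo:hi]]
    return hits


# Toy coordinates for illustration only
probes = [("p1", "chr1", 100), ("p2", "chr1", 180), ("p3", "chr1", 400),
          ("p4", "chr2", 150)]
loci = {"miR-toy": ("chr1", 150, 200)}
print(cpgs_in_loci(probes, loci), cpgs_in_loci(probes, loci, flank=60))
```

Binary search over sorted positions keeps each lookup logarithmic, which matters when querying roughly 485,000 array probes.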
Because not all samples were analyzed by both NGS (for miRNA expression) and the HumanMethylation450 BeadChip, only a subset of samples (7 normal and 6 tumor samples, including 3 pairs) was used to study the correlation between miRNA expression and methylation levels, both in pooled samples (normal plus tumor) and in colon tumor samples separately (Figure 43). Spearman’s rank correlation test showed that four CpGs were significantly correlated, either negatively (miR-1292) or positively (miR-135b, miR-182 and miR-602), with the four corresponding miRNAs in colon tumor samples at p<0.05 (Figure 43, right). However, this finding should be confirmed in a larger number of samples, and further experiments are needed to determine whether the correlations reflect a direct interaction.
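The two correlation analyses can be sketched with SciPy's `spearmanr`, once on the pooled normal and tumor samples and once on tumors alone. The arrays are simulated with the sample sizes used here (7 normal, 6 tumor). With so few samples, only strong monotonic relationships reach p < 0.05, which is one reason replication in more samples is needed.

```python
import numpy as np
from scipy.stats import spearmanr


def spearman_pooled_and_tumor(methylation, expression, is_tumor):
    """Spearman's rho and p-value for all samples and for tumors only.

    methylation, expression: 1-D arrays over the same samples
    is_tumor: boolean array marking the tumor samples
    """
    rho_all, p_all = spearmanr(methylation, expression)
    rho_t, p_t = spearmanr(methylation[is_tumor], expression[is_tumor])
    return {"pooled": (rho_all, p_all), "tumor": (rho_t, p_t)}


# Simulated data with the sample sizes used here: 7 normal, 6 tumor
rng = np.random.default_rng(4)
is_tumor = np.array([False] * 7 + [True] * 6)
methylation = rng.uniform(0.2, 0.8, 13)
expression = 5 + 10 * methylation + rng.normal(0, 0.05, 13)  # monotonic trend
result = spearman_pooled_and_tumor(methylation, expression, is_tumor)
print(result)
```

Spearman's rank correlation only assumes a monotonic relationship, which suits small samples where normality of expression or beta-values cannot be checked.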
Figure 43. Correlation between miRNA expression and their DNA methylation. miRNA names, TargetID (locus) by Illumina, and the correlation coefficients (R) are presented. Each dot represents an individual colon cancer (red) or normal (green) sample. The x-axis and y-axis indicate the miRNA expression from NGS and the DNA methylation from the HumanMethylation450 BeadChip, respectively.
4.4. Conclusion
The main observations of this methylation study are: (1) general hypomethylation was observed in MSS/CIMP-neg colon cancer, concentrated in intergenic regions and gene bodies; (2) hypermethylation was observed in promoter regions; (3) hypermethylation was enriched in N_Shores and CGIs; (4) colon cancer specific altered methylation was identified at hypermethylated CpGs, and tissue specific (normal or tumor, colon or liver) altered methylation at hypomethylated CpGs (in colon cancer compared to matched normal tissues); (5) significant DM CpGs were enriched for genes having a role in cancer and gastrointestinal disease; (6) the genome-wide observations in imprinted genes suggest more widespread dysregulation of imprinting in colon cancer than previously reported, while being consistent with previous reports for some genes; (7) a subset of differentially expressed miRNAs was significantly correlated with the methylation of their genomic loci. These findings from the genome-wide profiling of MSS/CIMP-neg colon cancer may help to define the landscape of aberrant DNA methylation in this subtype and add depth to the observation of altered methylation in imprinted genes. Understanding these epigenetic changes may lead to the identification of molecular markers for MSS/CIMP-neg colon cancer to be used in diagnosis, treatment and prognosis.
GENERAL DISCUSSION
This dissertation comprises three main studies of epigenetic events in microsatellite stable (MSS) and CpG island methylator phenotype-negative (CIMP-negative) colon cancer. These studies have (i) identified differential expression of known and novel microRNAs (miRNAs), (ii) characterized non-coding RNAs (ncRNAs) in the 8q24 gene desert region containing the cancer risk variant rs6983267, and (iii) elucidated the landscape of altered genome-wide DNA methylation in colon tumors compared to adjacent normal tissues.
Technological advances, including microarrays and Next Generation Sequencing (NGS), now provide excellent ways to study epigenetic alterations comprehensively. Using these technologies, my dissertation projects uncovered widespread epigenetic changes in colon cancer involving both miRNAs and DNA methylation.
First, I utilized SOLiD NGS for comprehensive profiling of novel and known miRNAs in colon cancer. To enable this cutting-edge project, Dr. Maarit Tiirikainen, my long-time supervisor and academic advisor, received an award from Applied Biosystems based on the preliminary results from my work. I hypothesized that MSS/CIMP-negative colon cancer is likely to have altered expression of miRNAs, which may contribute to the development and progression of the disease. For this study, I used 10 colon tumor tissues and 10 adjacent normal tissues. I identified statistically significant differential expression of 19 miRNAs, including 13 new colon cancer miRNAs, compared to the adjacent normal tissues. Because these newly identified miRNAs were of low abundance, only 6 of them could be measured by quantitative PCR to validate the findings. Nevertheless, the findings for these six new colon cancer miRNAs were confirmed, and for five of them tumor-specific expression was replicated in another set of samples. Furthermore, I identified close to 100 significantly correlated potential target mRNAs for a subset of the newly identified miRNAs, and pathway analysis revealed plausible roles of these target genes in colon cancer development. Early results of this project were presented at the AACR Annual Meeting in 2011, and the manuscript is in preparation.
It is now known that miRNAs are found throughout the genome and, interestingly, many miRNAs are located near cancer susceptibility loci or are associated with regions of genomic instability. Several studies have also revealed that polymorphisms in miRNAs and in their target sites can lead to aberrant gene regulation. Genome-wide association studies (GWAS) have found many cancer susceptibility loci in non-coding genomic areas, such as the multiple cancer susceptibility locus at 8q24. This region is very interesting because it contains no protein-coding genes across about 600 kb. The SNP rs6983267 is located upstream of the well-known proto-oncogene MYC. However, despite the consistent association between this SNP and colon cancer risk, its molecular mechanism(s) of action remain unclear.
I hypothesized that the “gene desert” locus harboring rs6983267 may contain non-coding RNAs that could play a role in colon cancer development. First, to predict novel miRNAs in that genomic region, I used publicly available algorithms, and over thirty potential candidates were identified within close proximity (30 kb) of the SNP. Second, I tried to identify novel miRNAs in the same genomic region using the SOLiD NGS data. However, I realized that in silico methods are very limited for finding miRNAs: none of the in silico candidates matched the SOLiD NGS reads. Moreover, no other novel miRNA reads were found anywhere near the SNP in our NGS data. It is therefore perhaps not surprising that another study, on prostate cancer, also failed to find novel miRNAs in this region using NGS (Pomerantz, Beckwith et al. 2009). This suggests either that novel miRNAs in this region are difficult to identify computationally or even empirically, or that there are indeed no miRNAs in this region. However, five miRNAs (miR-1204, miR-1205, miR-1206, miR-1207 and miR-1208) were recently found in a more distal 8q24 region, in the non-coding PVT1 locus, 400 kb downstream of rs6983267 (Huppi, Volfovsky et al. 2008). Interestingly, four of these miRNAs (miR-1204, miR-1205, miR-1207 and miR-1208) showed differential expression between tumor and normal tissues, suggesting an association of these miRNAs with colon cancer.
Several groups have studied whether the cancer risk SNP rs6983267 is associated with altered gene expression that could explain the cancer risk conferred by the SNP. These studies have shown that a ~335 kb DNA loop brings the genomic region containing the SNP close to the MYC locus, and this physical association may enable enhancer function of the SNP-containing region, thus affecting MYC transcription. I therefore asked whether the risk allele (G) of rs6983267 could, in cis, affect the expression of the PVT1 locus miRNAs, and studied the expression levels of these miRNAs in the SOLiD NGS data. Interestingly, the expression level of miR-1204 was significantly related to the GG genotype in both colon tumor and adjacent normal tissues, suggesting a potential association of this miRNA with the risk allele of rs6983267. Furthermore, although I could not find any miRNAs in the 8q24 region proximal to the SNP, our collaborator (Dr. George A. Calin, MD Anderson, Texas) recently discovered a long ncRNA called CCAT2 in this region, harboring the cancer risk SNP rs6983267 (Ling, Spizzo et al. 2013). Interestingly, this lncRNA is significantly overexpressed in colon tumors compared to paired adjacent normal tissues. To study how lncRNA expression correlates with the genotypes of the SNP, the expression level of CCAT2 was analyzed separately in tumors and normal tissues by genotype (GG, GT, TT). There was a trend toward higher expression in the presence of the G risk allele compared to the T allele, especially in the GT heterozygotes among our tumors. The multicenter CCAT2 study examined the genotype-CCAT2 expression correlation in several colon cancer cohorts and, in one of the cohorts, found the highest expression in GG tumors compared with tumors of the other genotypes. These findings should be further confirmed in a larger number of samples and in various colon cancer subtypes.
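One simple way to examine a trend of expression with the number of risk alleles is to code rs6983267 genotypes additively (TT = 0, GT = 1, GG = 2) and compute a rank correlation with expression. The sketch below uses simulated values; it illustrates an additive trend test and is not necessarily the test used in this dissertation or in the multicenter CCAT2 study.

```python
import numpy as np
from scipy.stats import spearmanr

# Additive coding: number of G (risk) alleles carried
RISK_ALLELE_DOSAGE = {"TT": 0, "GT": 1, "GG": 2}


def allele_dosage_trend(genotypes, expression):
    """Spearman correlation between rs6983267 G-allele dosage and expression."""
    dosage = np.array([RISK_ALLELE_DOSAGE[g] for g in genotypes])
    rho, p = spearmanr(dosage, expression)
    return rho, p


# Simulated genotypes and expression values, for illustration only
rng = np.random.default_rng(5)
genotypes = ["TT"] * 8 + ["GT"] * 10 + ["GG"] * 8
dosage = np.array([RISK_ALLELE_DOSAGE[g] for g in genotypes])
expression = 1.0 + 0.8 * dosage + rng.normal(0, 0.5, len(genotypes))
rho, p = allele_dosage_trend(genotypes, expression)
print(f"rho = {rho:.2f}, p = {p:.2g}")
```

An additive coding tests for a dose-response with each G allele; because the observed effect was most visible in GT heterozygotes, a dominant coding (GG or GT vs TT) would be a reasonable alternative to compare.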
The first two parts focused on RNA-based epigenetics. Since DNA methylation is the best-known epigenetic mechanism, I also studied altered DNA methylation in a larger sample set from the same MSS/CIMP-negative colon cancer cohort (26 colon tumor tissues and 26 paired adjacent normal tissues). Altered methylation events in tumor suppressors, oncogenes, imprinted genes and repetitive elements such as LINE1 have been identified in colon cancer, but these studies have mostly focused on promoter CpG islands in a small number of specific genes or on certain repetitive elements. I therefore hypothesized that methylation changes in various functional genomic locations (promoter, gene body, UTR), enhancer elements, CpG islands and their surrounding regions are to be expected even in colon tumors of the methylator-negative type, and that these alterations may contribute to the development and progression of this type of colon cancer. Twenty-six colon tumors and matched adjacent normal tissues were analyzed using the genome-scale HumanMethylation450 BeadChip. This platform, although not providing fully unbiased genome-wide coverage, covers 99% of RefSeq genes, with an average of 17 CpG sites per gene region, distributed across the promoter, 5'UTR, first exon, gene body and 3'UTR. It also covers 96% of CpG islands, with additional coverage of island shores and island shelves. This genome-scale coverage allowed me to test my hypothesis. My first finding was general hypomethylation in intergenic regions and gene bodies in tumors compared to normal tissues. This was an expected result, since global hypomethylation has been found in many cancers, including colon cancer, using other techniques such as HPLC, and MethyLight for the LINE1 repetitive element. The best-known epigenetic alteration, hypermethylation in promoters and CpG islands, was also observed.
Although overall methylation levels are similar across individuals, significant differences in overall and locus-specific methylation levels are known to exist between tissue types and between normal and cancer cells from the same tissue. Interestingly, I observed colon cancer specific altered methylation at hypermethylated loci (compared with liver cancer) and tissue specific (tumor vs normal, colon vs liver) altered methylation at hypomethylated loci. These significant differentially methylated loci were enriched in genes that play roles in cancer and gastrointestinal disease. Another interesting finding was widespread altered methylation at imprinted genes in MSS/CIMP-neg colon cancer, especially at two promoter-related loci in MEST, whose methylation was inversely correlated with the gene’s expression.
Although methylation patterns do not always correlate with gene expression due to the variety of epigenetic changes, I asked whether the altered expression of the miRNAs found by NGS could potentially be regulated by DNA methylation. Interestingly, Spearman’s rank correlation test showed that four CpGs were significantly correlated, either negatively (miR-1292) or positively (miR-135b, miR-182 and miR-602), with the four corresponding miRNAs in the colon tumor samples at p<0.05. However, this finding should be confirmed in a larger number of samples, and further experiments are needed to determine whether the correlation reflects a direct interaction.
Although I observed the interesting altered methylation events described above in tumors compared to normal tissues, there are also limitations that should be considered when interpreting the results. One limitation is that many of the differentially methylated loci were represented by a single statistically significant CpG site, and it may be difficult to explain how a single CpG site could contribute to complex biological processes such as gene expression. This limitation can be overcome by further analysis of the surrounding CpGs, which is best accomplished by other approaches, such as NGS of bisulfite-converted DNA, that give more comprehensive coverage at single-nucleotide resolution.
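The regional analysis suggested above can be sketched by merging neighboring CpGs on the same chromosome into one region whenever consecutive sites lie within a maximum gap, and then averaging delta-beta over each region. The 500 bp gap, the data layout and the toy positions are hypothetical choices for illustration; a real analysis would also need a regional test statistic, which is not shown.

```python
def group_into_regions(cpgs, max_gap=500):
    """Group CpGs into regions of nearby sites.

    cpgs: list of (chromosome, position, delta_beta) tuples
    max_gap: hypothetical maximum distance (bp) between neighboring CpGs
    Returns a list of dicts with region bounds, size and mean delta-beta.
    """
    regions, current = [], []
    for chrom, pos, delta in sorted(cpgs):
        # start a new region on a chromosome change or a gap larger than max_gap
        if current and (chrom != current[-1][0] or pos - current[-1][1] > max_gap):
            regions.append(current)
            current = []
        current.append((chrom, pos, delta))
    if current:
        regions.append(current)
    return [
        {
            "chrom": r[0][0],
            "start": r[0][1],
            "end": r[-1][1],
            "n_cpgs": len(r),
            "mean_delta": sum(d for _, _, d in r) / len(r),
        }
        for r in regions
    ]


toy = [("chr7", 1000, 0.25), ("chr7", 1300, 0.22), ("chr7", 1600, 0.18),
       ("chr7", 5000, -0.05), ("chr11", 200, 0.30)]
for region in group_into_regions(toy):
    print(region)
```

A region supported by several concordant CpGs is easier to interpret biologically than a single significant site, which is the concern raised in the limitation above.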
Significance and Future Perspectives
The above findings may give biological insight into the genes and pathways involved in MSS/CIMP-neg colon cancer development, and furthermore, the specific molecular changes identified may lead to the discovery of new colon cancer biomarkers. My dissertation projects focused only on the identification of aberrant epigenetic events, but there are several interesting areas for future work.
1) Functional research to understand the biological significance of the epigenetic alterations found:
miRNAs
- Are the correlated potential target genes direct targets? Specific miRNA mimics or inhibitors could be designed for each identified miRNA for artificial upregulation and downregulation of target mRNA translation.
- Are the identified miRNAs regulated by other epigenetic mechanisms such as DNA methylation? The DNA methylation inhibitors 5-azacytidine and 5-aza-2'-deoxycytidine could be used to treat colon cancer cell lines to study whether hypermethylation or hypomethylation is directly associated with decreased or increased miRNA expression.
- Is the expression of the identified miRNAs associated with copy number variation? The relationship between DNA copy number variation and expression of the miRNAs in matched colon cancer tissues and adjacent normal tissues could be studied by quantitative PCR.
lncRNAs
- Does the CCAT2 lncRNA regulate genes? Gain- and loss-of-function experiments could be done.
- Does the CCAT2 lncRNA regulate proteins? RNA immunoprecipitation could be used to isolate proteins bound to CCAT2.
DNA methylation
- Do differentially methylated loci regulate host gene expression directly? The DNA methylation inhibitors 5-azacytidine and 5-aza-2'-deoxycytidine could be used to treat colon cancer cell lines to study whether hypermethylation or hypomethylation is directly associated with decreased or increased host gene expression.
2) Translational research using the potential new epigenetic biomarkers in tissues or body fluids such as urine, blood, plasma, serum and stool samples, for diagnosis and prognosis:
miRNAs
- Can the identified miRNAs be used as markers of colon cancer development? The identified miRNAs could be analyzed in different colon cancer stages, such as adenoma, adenocarcinoma, carcinoma and advanced carcinoma, to see whether they are involved in early changes in tumorigenesis or are rather markers of advanced carcinomas.
- Can the identified miRNAs be used as biomarkers for diagnosis and prognosis? The identified miRNAs could be analyzed in peripheral fluids such as urine, blood, plasma, serum and stool samples.
- Are SNPs in the identified miRNAs associated with colon cancer risk? miRNA-related SNPs (MirSNPs) might promote carcinogenesis by affecting miRNA function and/or maturation. Among the 19 identified miRNAs, four (miR-182 (rs76481776), miR-202 (rs12355840), miR-602 (rs201175632) and miR-1268 (rs28599926)) include SNPs in their pre-miRNA sequences. Therefore, an association study between these MirSNPs and colon cancer risk could be done in a larger number of samples.
lncRNAs
- Can the CCAT2 lncRNA be used as a colon cancer development marker? CCAT2 could be analyzed in different colon cancer stages, such as adenoma, adenocarcinoma, carcinoma and advanced carcinoma, to see whether its expression is related to early changes in tumorigenesis or whether it is a marker of advanced carcinomas.
- Is the CCAT2 lncRNA associated with patients’ survival or other risk factors of colon cancer? Since very limited clinical information was available for the samples in this cohort, I could not study this. However, an association study between CCAT2 expression and clinical parameters could give insight into the role of this potential biomarker.
DNA methylation
- Can the altered DNA methylation events be used as biomarkers? The altered DNA methylation events could be studied in peripheral fluids such as urine, blood, plasma, serum and stool samples for diagnosis and prognosis.
3) Epigenetic epidemiology studies to understand the epigenetic markers of response to environmental exposures and lifestyle that contribute to the development of colon cancer:
miRNAs, ncRNAs and DNA methylation
- Increasing evidence shows that aging, environmental and lifestyle factors, and dietary factors may influence epigenetic mechanisms such as miRNAs and DNA methylation. Epigenetic changes could therefore represent an important pathway linking environmental and lifestyle factors to colon cancer. If information on these factors were available for the individuals, a correlation study could be done between the factors and epigenetic mechanisms. However, it should be kept in mind that these epigenetic changes are cumulative and manifest over time.
REFERENCES
(2004). "Finishing the euchromatic sequence of the human genome." Nature 431(7011):
931-945.
(2012). "Comprehensive molecular characterization of human colon and rectal cancer."
Nature 487(7407): 330-337.
Albano, F., L. Anelli, et al. (2010). "Non random distribution of genomic features in
breakpoint regions involved in chronic myeloid leukemia cases with variant t(9;22)
or additional chromosomal rearrangements." Mol Cancer 9: 120.
Ali, N. A., M. J. McKay, et al. (2010). "Proteomics of Smad4 regulated transforming
growth factor-beta signalling in colon cancer cells." Mol Biosyst 6(11): 2332-
2338.
Ali, S., K. Almhanna, et al. (2010). "Differentially expressed miRNAs in the plasma may
provide a molecular signature for aggressive pancreatic cancer." Am J Transl Res
3(1): 28-47.
Arora, R., C. M. Brun, et al. (2012). "Transcription regulates telomere dynamics in human
cancer cells." RNA 18(4): 684-693.
Bandres, E., E. Cubedo, et al. (2006). "Identification by Real-time PCR of 13 mature
microRNAs differentially expressed in colorectal cancer and non-tumoral
tissues." Mol Cancer 5: 29.
Banfai, B., H. Jia, et al. (2012). "Long noncoding RNAs are rarely translated in two human
cell lines." Genome Res 22(9): 1646-1657.
Bar, M., S. K. Wyman, et al. (2008). "MicroRNA discovery and profiling in human
embryonic stem cells by deep sequencing of small RNA libraries." Stem Cells
26(10): 2496-2505.
Bellacosa, A. (2003). "Genetic hits and mutation rate in colorectal tumorigenesis:
versatility of Knudson's theory and implications for cancer prevention." Genes
Chromosomes Cancer 38(4): 382-388.
Beroukhim, R., C. H. Mermel, et al. (2010). "The landscape of somatic copy-number
alteration across human cancers." Nature 463(7283): 899-905.
Bestor, T. H. (1992). "Activation of mammalian DNA methyltransferase by cleavage of a
Zn binding regulatory domain." EMBO J 11(7): 2611-2617.
Bibikova, M., B. Barnes, et al. (2011). "High density DNA methylation array with single
CpG site resolution." Genomics 98(4): 288-295.
Bibikova, M., Z. Lin, et al. (2006). "High-throughput DNA methylation profiling using
universal bead arrays." Genome Res 16(3): 383-393.
Birney, E., J. A. Stamatoyannopoulos, et al. (2007). "Identification and analysis of
functional elements in 1% of the human genome by the ENCODE pilot project."
Nature 447(7146): 799-816.
Boland, C. R. and A. Goel (2010). "Microsatellite instability in colorectal cancer."
Gastroenterology 138(6): 2073-2087 e2073.
Borun, T. W., D. Pearson, et al. (1972). "Studies of histone methylation during the HeLa
S-3 cell cycle." J Biol Chem 247(13): 4288-4298.
Brannan, C. I., E. C. Dees, et al. (1990). "The product of the H19 gene may function as an
RNA." Mol Cell Biol 10(1): 28-36.
Brar, S. S., Z. Corbin, et al. (2003). "NOX5 NAD(P)H oxidase regulates growth and
apoptosis in DU 145 prostate cancer cells." Am J Physiol Cell Physiol 285(2):
C353-369.
Brown, C. J., A. Ballabio, et al. (1991). "A gene from the region of the human X
inactivation centre is expressed exclusively from the inactive X chromosome."
Nature 349(6304): 38-44.
Brown, C. J., B. D. Hendrich, et al. (1992). "The human XIST gene: analysis of a 17 kb
inactive X-specific RNA that contains conserved repeats and is highly localized
within the nucleus." Cell 71(3): 527-542.
Bullrich, F., H. Fujii, et al. (2001). "Characterization of the 13q14 tumor suppressor locus
in CLL: identification of ALT1, an alternative splice variant of the LEU2 gene."
Cancer Res 61(18): 6640-6648.
Burk, U., J. Schubert, et al. (2008). "A reciprocal repression between ZEB1 and members
of the miR-200 family promotes EMT and invasion in cancer cells." EMBO Rep
9(6): 582-589.
Byun, H. M., K. D. Siegmund, et al. (2009). "Epigenetic profiling of somatic tissues from
human autopsy specimens identifies tissue- and individual-specific DNA
methylation patterns." Hum Mol Genet 18(24): 4808-4817.
Calin, G. A., C. G. Liu, et al. (2007). "Ultraconserved regions encoding ncRNAs are
altered in human leukemias and carcinomas." Cancer Cell 12(3): 215-229.
Calin, G. A., C. Sevignani, et al. (2004). "Human microRNA genes are frequently located
at fragile sites and genomic regions involved in cancers." Proc Natl Acad Sci U S
A 101(9): 2999-3004.
Cannell, I. G., Y. W. Kong, et al. (2008). "How do microRNAs regulate gene expression?"
Biochem Soc Trans 36(Pt 6): 1224-1231.
Chan, H. M. and N. B. La Thangue (2001). "p300/CBP proteins: HATs for transcriptional
bridges and scaffolds." J Cell Sci 114(Pt 13): 2363-2373.
Chang, T. C., E. A. Wentzel, et al. (2007). "Transactivation of miR-34a by p53 broadly
influences gene expression and promotes apoptosis." Mol Cell 26(5): 745-752.
Cheetham, S. W., F. Gruhl, et al. (2013). "Long noncoding RNAs and the genetics of
cancer." Br J Cancer 108(12): 2419-2425.
Chen, H. J., C. M. Lin, et al. (2006). "The role of microtubule actin cross-linking factor 1
(MACF1) in the Wnt signaling pathway." Genes Dev 20(14): 1933-1945.
Chervona, Y. and M. Costa (2012). "Histone modifications and cancer: biomarkers of
prognosis?" Am J Cancer Res 2(5): 589-597.
Chiang, H. R., L. W. Schoenfeld, et al. (2010). "Mammalian microRNAs: experimental
evaluation of novel and previously annotated genes." Genes Dev 24(10): 992-
1009.
Chung, S., H. Nakagawa, et al. (2011). "Association of a novel long non-coding RNA in
8q24 with prostate cancer susceptibility." Cancer Sci 102(1): 245-252.
Chung, W., B. Kwabi-Addo, et al. (2008). "Identification of novel tumor markers in
prostate, colon and breast cancer by unbiased methylation profiling." PLoS One
3(4): e2079.
Cittelly, D. M., P. M. Das, et al. (2010). "Downregulation of miR-342 is associated with
tamoxifen resistant breast tumors." Mol Cancer 9: 317.
Cohen, I., E. Poreba, et al. (2011). "Histone modifiers in cancer: friends or foes?" Genes
Cancer 2(6): 631-647.
Colaluca, I. N., D. Tosoni, et al. (2008). "NUMB controls p53 tumour suppressor activity."
Nature 451(7174): 76-80.
Cooper, K., H. Squires, et al. (2010). "Chemoprevention of colorectal cancer: systematic
review and economic evaluation." Health Technol Assess 14(32): 1-206.
Costa, F. F. (2005). "Non-coding RNAs: new players in eukaryotic biology." Gene 357(2):
83-94.
Cui, H., M. Cruz-Correa, et al. (2003). "Loss of IGF2 imprinting: a potential marker of
colorectal cancer risk." Science 299(5613): 1753-1755.
Cummins, J. M., Y. He, et al. (2006). "The colorectal microRNAome." Proc Natl Acad Sci
U S A 103(10): 3687-3692.
Davis, B. N., A. C. Hilyard, et al. (2010). "Smad proteins bind a conserved RNA sequence
to promote microRNA maturation by Drosha." Mol Cell 39(3): 373-384.
Davis, P. K. and R. K. Brackmann (2003). "Chromatin remodeling and cancer." Cancer
Biol Ther 2(1): 22-29.
Denli, A. M., B. B. Tops, et al. (2004). "Processing of primary microRNAs by the
Microprocessor complex." Nature 432(7014): 231-235.
Du, P., X. Zhang, et al. (2010). "Comparison of Beta-value and M-value methods for
quantifying methylation levels by microarray analysis." BMC Bioinformatics 11:
587.
Easton, D. F., K. A. Pooley, et al. (2007). "Genome-wide association study identifies
novel breast cancer susceptibility loci." Nature 447(7148): 1087-1093.
Eckhardt, F., J. Lewin, et al. (2006). "DNA methylation profiling of human chromosomes 6,
20 and 22." Nat Genet 38(12): 1378-1385.
Esteller, M. (2007). "Cancer epigenomics: DNA methylomes and histone-modification
maps." Nat Rev Genet 8(4): 286-298.
Esteller, M. (2008). "Epigenetics in cancer." N Engl J Med 358(11): 1148-1159.
Esteller, M. (2011). "Non-coding RNAs in human disease." Nat Rev Genet 12(12): 861-
874.
Faber, C., T. Kirchner, et al. (2009). "The impact of microRNAs on colorectal cancer."
Virchows Arch 454(4): 359-367.
Fang, F., S. Turcan, et al. (2011). "Breast cancer methylomes establish an epigenomic
foundation for metastasis." Sci Transl Med 3(75): 75ra25.
Fearon, E. R. (2011). "Molecular genetics of colorectal cancer." Annu Rev Pathol 6: 479-
507.
Fearon, E. R. and B. Vogelstein (1990). "A genetic model for colorectal tumorigenesis."
Cell 61(5): 759-767.
Feinberg, A. P. and B. Tycko (2004). "The history of cancer epigenetics." Nat Rev
Cancer 4(2): 143-153.
Feinberg, A. P. and B. Vogelstein (1983). "Hypomethylation distinguishes genes of some
human cancers from their normal counterparts." Nature 301(5895): 89-92.
Feinberg, A. P. and B. Vogelstein (1983). "Hypomethylation of ras oncogenes in primary
human cancers." Biochem Biophys Res Commun 111(1): 47-54.
Fischle, W., Y. Wang, et al. (2003). "Histone and chromatin cross-talk." Curr Opin Cell
Biol 15(2): 172-183.
Foster, C. S., A. Falconer, et al. (2004). "Transcription factor E2F3 overexpressed in
prostate cancer independently predicts clinical outcome." Oncogene 23(35):
5871-5879.
Frosina, G., P. Fortini, et al. (1996). "Two pathways for base excision repair in
mammalian cells." J Biol Chem 271(16): 9573-9578.
Fullgrabe, J., E. Kavanagh, et al. (2011). "Histone onco-modifications." Oncogene 30(31):
3391-3403.
Gardiner-Garden, M. and M. Frommer (1987). "CpG islands in vertebrate genomes." J Mol
Biol 196(2): 261-282.
Garzon, R., G. A. Calin, et al. (2009). "MicroRNAs in Cancer." Annu Rev Med 60: 167-179.
Ghoussaini, M., H. Song, et al. (2008). "Multiple loci with different cancer specificities
within the 8q24 gene desert." J Natl Cancer Inst 100(13): 962-966.
Goldberg, A. D., C. D. Allis, et al. (2007). "Epigenetics: a landscape takes shape." Cell
128(4): 635-638.
Goll, M. G., F. Kirpekar, et al. (2006). "Methylation of tRNAAsp by the DNA
methyltransferase homolog Dnmt2." Science 311(5759): 395-398.
Goto, T., H. Mizukami, et al. (2009). "Aberrant methylation of the p16 gene is frequently
detected in advanced colorectal cancer." Anticancer Res 29(1): 275-277.
Greger, V., E. Passarge, et al. (1989). "Epigenetic changes may contribute to the
formation and spontaneous regression of retinoblastoma." Hum Genet 83(2): 155-
158.
Gregorieff, A. and H. Clevers (2005). "Wnt signaling in the intestinal epithelium: from
endoderm to cancer." Genes Dev 19(8): 877-890.
Grewal, S. I. and S. Jia (2007). "Heterochromatin revisited." Nat Rev Genet 8(1): 35-46.
Grimson, A., K. K. Farh, et al. (2007). "MicroRNA targeting specificity in mammals:
determinants beyond seed pairing." Mol Cell 27(1): 91-105.
Guo, C., J. F. Sah, et al. (2008). "The noncoding RNA, miR-126, suppresses the growth of
neoplastic cells by targeting phosphatidylinositol 3-kinase signaling and is
frequently lost in colon cancers." Genes Chromosomes Cancer 47(11): 939-946.
Gupta, R. A., N. Shah, et al. (2010). "Long non-coding RNA HOTAIR reprograms
chromatin state to promote cancer metastasis." Nature 464(7291): 1071-1076.
Hamajima, N., M. Kouwaki, et al. (1998). "Dihydropyrimidinase deficiency: structural
organization, chromosomal localization, and mutation analysis of the human
dihydropyrimidinase gene." Am J Hum Genet 63(3): 717-726.
Hamfjord, J., A. M. Stangeland, et al. (2012). "Differential expression of miRNAs in
colorectal cancer: comparison of paired tumor tissue and adjacent normal mucosa
using high-throughput sequencing." PLoS One 7(4): e34150.
Herman, J. G. and S. B. Baylin (2003). "Gene silencing in cancer in association with
promoter hypermethylation." N Engl J Med 349(21): 2042-2054.
Herman, J. G., F. Latif, et al. (1994). "Silencing of the VHL tumor-suppressor gene by
DNA methylation in renal carcinoma." Proc Natl Acad Sci U S A 91(21): 9700-
9704.
Hernandez-Blazquez, F. J., M. Habib, et al. (2000). "Evaluation of global DNA
hypomethylation in human colon cancer tissues by immunohistochemistry and
image analysis." Gut 47(5): 689-693.
Hershey, A. D., J. Dixon, et al. (1953). "Nucleic acid economy in bacteria infected with
bacteriophage T2. I. Purine and pyrimidine composition." J Gen Physiol 36(6):
777-789.
Hinoue, T., D. J. Weisenberger, et al. (2012). "Genome-scale analysis of aberrant DNA
methylation in colorectal cancer." Genome Res 22(2): 271-282.
Hu, J., A. L. Ho, et al. (2013). "From the Cover: Neutralization of terminal differentiation
in gliomagenesis." Proc Natl Acad Sci U S A 110(36): 14520-14527.
Huarte, M., M. Guttman, et al. (2010). "A large intergenic noncoding RNA induced by p53
mediates global gene repression in the p53 response." Cell 142(3): 409-419.
Huppi, K., J. J. Pitt, et al. (2012). "The 8q24 gene desert: an oasis of non-coding
transcriptional activity." Front Genet 3: 69.
Huppi, K., N. Volfovsky, et al. (2008). "The identification of microRNAs in a genomically
unstable region of human chromosome 8q24." Mol Cancer Res 6(2): 212-221.
Hur, K., P. Cejas, et al. (2013). "Hypomethylation of long interspersed nuclear element-1
(LINE-1) leads to activation of proto-oncogenes in human colorectal cancer
metastasis." Gut.
Imamura, T., S. Yamamoto, et al. (2004). "Non-coding RNA directed DNA demethylation
of Sphk1 CpG island." Biochem Biophys Res Commun 322(2): 593-600.
Iorio, M. V. and C. M. Croce (2012). "MicroRNA dysregulation in cancer: diagnostics,
monitoring and therapeutics. A comprehensive review." EMBO Mol Med 4(3):
143-159.
Irizarry, R. A., C. Ladd-Acosta, et al. (2009). "The human colon cancer methylome shows
similar hypo- and hypermethylation at conserved tissue-specific CpG island
shores." Nat Genet 41(2): 178-186.
Jacobs, D. I., Y. Mao, et al. (2013). "Dysregulated methylation at imprinted genes in
prostate tumor tissue detected by methylation microarray." BMC Urol 13(1): 37.
Jass, J. R. (2007). "Classification of colorectal cancer based on correlation of clinical,
morphological and molecular features." Histopathology 50(1): 113-130.
Jia, L., G. Landan, et al. (2009). "Functional enhancers at the gene-poor 8q24 cancer-
linked locus." PLoS Genet 5(8): e1000597.
Jiang, P., H. Wu, et al. (2007). "MiPred: classification of real and pseudo microRNA
precursors using random forest prediction model with combined features."
Nucleic Acids Res 35(Web Server issue): W339-344.
John, B., A. J. Enright, et al. (2004). "Human MicroRNA targets." PLoS Biol 2(11): e363.
Johnson, S. M., H. Grosshans, et al. (2005). "RAS is regulated by the let-7 microRNA
family." Cell 120(5): 635-647.
Jones, P. A. (2012). "Functions of DNA methylation: islands, start sites, gene bodies and
beyond." Nat Rev Genet 13(7): 484-492.
Kanduri, M., N. Cahill, et al. (2010). "Differential genome-wide array-based methylation
profiles in prognostic subsets of chronic lymphocytic leukemia." Blood 115(2):
296-305.
Kapranov, P., J. Cheng, et al. (2007). "RNA maps reveal new RNA classes and a possible
function for pervasive transcription." Science 316(5830): 1484-1488.
Karpinski, P., M. Walter, et al. (2013). "Intermediate- and low-methylation epigenotypes
do not correspond to CpG island methylator phenotype (low and -zero) in
colorectal cancer." Cancer Epidemiol Biomarkers Prev 22(2): 201-208.
Kastler, S., L. Honold, et al. (2010). "POU5F1P1, a putative cancer susceptibility gene, is
overexpressed in prostatic carcinoma." Prostate 70(6): 666-674.
Katada, T., H. Ishiguro, et al. (2009). "microRNA expression profile in undifferentiated
gastric cancer." Int J Oncol 34(2): 537-542.
Kawasaki, T., M. Ohnishi, et al. (2008). "WRN promoter methylation possibly connects
mucinous differentiation, microsatellite instability and CpG island methylator
phenotype in colorectal cancer." Mod Pathol 21(2): 150-158.
Khvorova, A., A. Reynolds, et al. (2003). "Functional siRNAs and miRNAs exhibit strand
bias." Cell 115(2): 209-216.
Kim, M. S., J. Lee, et al. (2010). "DNA methylation markers in colorectal cancer." Cancer
Metastasis Rev 29(1): 181-206.
Kim, S., M. Choi, et al. (2009). "Identifying the target mRNAs of microRNAs in colorectal
cancer." Comput Biol Chem 33(1): 94-99.
Kinzler, K. W. and B. Vogelstein (1996). "Lessons from hereditary colorectal cancer."
Cell 87(2): 159-170.
Kitamura, E., J. Igarashi, et al. (2007). "Analysis of tissue-specific differentially
methylated regions (TDMs) in humans." Genomics 89(3): 326-337.
Konishi, H., D. Ichikawa, et al. (2012). "Detection of gastric cancer-associated
microRNAs on microRNA microarray comparing pre- and post-operative plasma."
Br J Cancer 106(4): 740-747.
Koshiishi, N., J. M. Chong, et al. (2004). "p300 gene alterations in intestinal and diffuse
types of gastric carcinoma." Gastric Cancer 7(2): 85-90.
Krek, A., D. Grun, et al. (2005). "Combinatorial microRNA target predictions." Nat Genet
37(5): 495-500.
Kunzelmann, K. (2005). "Ion channels and cancer." J Membr Biol 205(3): 159-173.
Lagos-Quintana, M., R. Rauhut, et al. (2001). "Identification of novel genes coding for
small expressed RNAs." Science 294(5543): 853-858.
Lam, L. T., X. Lu, et al. (2010). "A microRNA screen to identify modulators of sensitivity
to BCL2 inhibitor ABT-263 (navitoclax)." Mol Cancer Ther 9(11): 2943-2950.
Le Marchand, L., A. Seifried, et al. (2003). "Association of the cyclin D1 A870G
polymorphism with advanced colorectal cancer." JAMA 290(21): 2843-2848.
Lee, J., S. J. Jang, et al. (2010). "Presence of 5-methylcytosine in CpNpG trinucleotides
in the human genome." Genomics 96(2): 67-72.
Lee, J. T. (2012). "Epigenetic regulation by long noncoding RNAs." Science 338(6113):
1435-1439.
Lee, R. C., R. L. Feinbaum, et al. (1993). "The C. elegans heterochronic gene lin-4
encodes small RNAs with antisense complementarity to lin-14." Cell 75(5): 843-
854.
Lengauer, C., K. W. Kinzler, et al. (1998). "Genetic instabilities in human cancers." Nature
396(6712): 643-649.
Lerebours, F., G. Cizeron-Clairac, et al. (2013). "miRNA expression profiling of
inflammatory breast cancer identifies a 5-miRNA signature predictive of breast
tumor aggressiveness." Int J Cancer 133(7): 1614-1623.
Lewis, B. P., C. B. Burge, et al. (2005). "Conserved seed pairing, often flanked by
adenosines, indicates that thousands of human genes are microRNA targets." Cell
120(1): 15-20.
Li, J., S. Donath, et al. (2010). "miR-30 regulates mitochondrial fission through targeting
p53 and the dynamin-related protein-1 pathway." PLoS Genet 6(1): e1000795.
Lin, O. S. (2009). "Acquired risk factors for colorectal cancer." Methods Mol Biol 472:
361-372.
Ling, H., R. Spizzo, et al. (2013). "CCAT2, a novel noncoding RNA mapping to 8q24,
underlies metastatic progression and chromosomal instability in colon cancer."
Genome Res 23(9): 1446-1461.
Liu, M. and H. Chen (2010). "The role of microRNAs in colorectal cancer." J Genet
Genomics 37(6): 347-358.
Liu, S. P., R. H. Fu, et al. (2009). "MicroRNAs regulation modulated self-renewal and
lineage differentiation of stem cells." Cell Transplant 18(9): 1039-1045.
Loeb, L. A., K. R. Loeb, et al. (2003). "Multiple mutations and cancer." Proc Natl Acad Sci
U S A 100(3): 776-781.
Lu, J., G. Getz, et al. (2005). "MicroRNA expression profiles classify human cancers."
Nature 435(7043): 834-838.
Lui, W. O., N. Pourmand, et al. (2007). "Patterns of known and novel small RNAs in
human cervical cancer." Cancer Res 67(13): 6031-6043.
Luo, Z., C. Lin, et al. (2012). "The super elongation complex (SEC) family in
transcriptional control." Nat Rev Mol Cell Biol 13(9): 543-547.
Marques, C. J., P. Costa, et al. (2008). "Abnormal methylation of imprinted genes in
human sperm is associated with oligozoospermia." Mol Hum Reprod 14(2): 67-74.
Maunakea, A. K., I. Chepelev, et al. (2010). "Epigenome mapping in normal and disease
States." Circ Res 107(3): 327-339.
Melo, S. A. and M. Esteller (2011). "Dysregulation of microRNAs in cancer: playing with
fire." FEBS Lett 585(13): 2087-2099.
Meng, R. D., C. C. Shelton, et al. (2009). "gamma-Secretase inhibitors abrogate
oxaliplatin-induced activation of the Notch-1 signaling pathway in colon cancer
cells resulting in enhanced chemosensitivity." Cancer Res 69(2): 573-582.
Mercer, T. R., M. E. Dinger, et al. (2009). "Long non-coding RNAs: insights into
functions." Nat Rev Genet 10(3): 155-159.
Mercer, T. R. and J. S. Mattick (2013). "Structure and function of long noncoding RNAs in
epigenetic regulation." Nat Struct Mol Biol 20(3): 300-307.
Metzker, M. L. (2010). "Sequencing technologies - the next generation." Nat Rev Genet
11(1): 31-46.
Migliore, L., F. Migheli, et al. (2011). "Genetics, cytogenetics, and epigenetics of
colorectal cancer." J Biomed Biotechnol 2011: 792362.
Mikkelsen, T. S., M. Ku, et al. (2007). "Genome-wide maps of chromatin state in
pluripotent and lineage-committed cells." Nature 448(7153): 553-560.
Morgan, H. D., W. Dean, et al. (2004). "Activation-induced cytidine deaminase deaminates
5-methylcytosine in DNA and is expressed in pluripotent tissues: implications for
epigenetic reprogramming." J Biol Chem 279(50): 52353-52360.
Murakami, Y., A. Tamori, et al. (2013). "The expression level of miR-18b in
hepatocellular carcinoma is associated with the grade of malignancy and
prognosis." BMC Cancer 13: 99.
Nagel, R., C. le Sage, et al. (2008). "Regulation of the adenomatous polyposis coli gene by
the miR-135 family in colorectal cancer." Cancer Res 68(14): 5795-5802.
Nakanishi, H., T. Suda, et al. (2004). "Loss of imprinting of PEG1/MEST in lung cancer
cell lines." Oncol Rep 12(6): 1273-1278.
Nam, J. W., J. Kim, et al. (2006). "ProMiR II: a web server for the probabilistic prediction
of clustered, nonclustered, conserved and nonconserved microRNAs." Nucleic
Acids Res 34(Web Server issue): W455-458.
Ng, E. K., W. W. Chong, et al. (2009). "Differential expression of microRNAs in plasma of
patients with colorectal cancer: a potential marker for colorectal cancer
screening." Gut 58(10): 1375-1381.
Nicoloso, M. S., H. Sun, et al. (2010). "Single-nucleotide polymorphisms inside microRNA
target sites influence tumor susceptibility." Cancer Res 70(7): 2789-2798.
Nie, J., L. Liu, et al. (2012). "microRNA-365, down-regulated in colon cancer, inhibits
cell cycle progression and promotes apoptosis of colon cancer cells by probably
targeting Cyclin D1 and Bcl-2." Carcinogenesis 33(1): 220-225.
Nishihara, S., T. Hayashida, et al. (2000). "Multipoint imprinting analysis in sporadic
colorectal cancers with and without microsatellite instability." Int J Oncol 17(2):
317-322.
Nissan, A., A. Stojadinovic, et al. (2012). "Colon cancer associated transcript-1: a novel
RNA expressed in malignant and pre-malignant human tissues." Int J Cancer
130(7): 1598-1606.
Noguchi, T., K. Tanimoto, et al. (2004). "Aberrant methylation of DPYD promoter, DPYD
expression, and cellular sensitivity to 5-fluorouracil in cancer cells." Clin Cancer
Res 10(20): 7100-7107.
Noushmehr, H., D. J. Weisenberger, et al. (2010). "Identification of a CpG island
methylator phenotype that defines a distinct subgroup of glioma." Cancer Cell
17(5): 510-522.
O'Toole, A. S., S. Miller, et al. (2006). "Comprehensive thermodynamic analysis of 3'
double-nucleotide overhangs neighboring Watson-Crick terminal base pairs."
Nucleic Acids Res 34(11): 3338-3344.
Ogino, S. and A. Goel (2008). "Molecular classification and correlates in colorectal
cancer." J Mol Diagn 10(1): 13-27.
Okano, M., D. W. Bell, et al. (1999). "DNA methyltransferases Dnmt3a and Dnmt3b are
essential for de novo methylation and mammalian development." Cell 99(3): 247-
257.
Pabst, O., R. Forster, et al. (2000). "NKX2.3 is required for MAdCAM-1 expression and
homing of lymphocytes in spleen and mucosa-associated lymphoid tissue." EMBO
J 19(9): 2015-2023.
Pedersen, I. S., P. A. Dervan, et al. (1999). "Frequent loss of imprinting of PEG1/MEST in
invasive breast cancer." Cancer Res 59(21): 5449-5451.
Penn, N. W., R. Suwalski, et al. (1972). "The presence of 5-hydroxymethylcytosine in
animal deoxyribonucleic acid." Biochem J 126(4): 781-790.
Pino, M. S. and D. C. Chung (2010). "The chromosomal instability pathway in colon
cancer." Gastroenterology 138(6): 2059-2072.
Pomerantz, M. M., N. Ahmadiyeh, et al. (2009). "The 8q24 cancer risk variant rs6983267
shows long-range interaction with MYC in colorectal cancer." Nat Genet 41(8):
882-884.
Pomerantz, M. M., C. A. Beckwith, et al. (2009). "Evaluation of the 8q24 prostate cancer
risk locus and MYC expression." Cancer Res 69(13): 5568-5574.
Powell, S. M., N. Zilz, et al. (1992). "APC mutations occur early during colorectal
tumorigenesis." Nature 359(6392): 235-237.
Poynter, J. N., J. C. Figueiredo, et al. (2007). "Variants on 9p24 and 8q24 are associated
with risk of colorectal cancer: results from the Colon Cancer Family Registry."
Cancer Res 67(23): 11128-11132.
Pritchard, C. C. and W. M. Grady (2011). "Colorectal cancer molecular biology moves into
clinical practice." Gut 60(1): 116-129.
Queller, D. C., J. E. Strassmann, et al. (1993). "Microsatellites and kinship." Trends Ecol
Evol 8(8): 285-288.
Radhakrishnan, A., N. Badhrinarayanan, et al. (2009). "Analysis of chromosomal
aberration (1, 3, and 8) and association of microRNAs in uveal melanoma." Mol
Vis 15: 2146-2154.
Rakyan, V. K., T. A. Down, et al. (2008). "An integrated resource for genome-wide
identification and analysis of human tissue-specific differentially methylated
regions (tDMRs)." Genome Res 18(9): 1518-1529.
Rakyan, V. K., T. Hildmann, et al. (2004). "DNA methylation profiling of the human major
histocompatibility complex: a pilot study for the human epigenome project." PLoS
Biol 2(12): e405.
Redis, R. S., A. M. Sieuwerts, et al. (2013). "CCAT2, a novel long non-coding RNA in
breast cancer: expression study and clinical correlations." Oncotarget.
Redon, S., P. Reichenbach, et al. (2010). "The non-coding RNA TERRA is a natural ligand
and direct inhibitor of human telomerase." Nucleic Acids Res 38(17): 5797-5806.
Reinhart, B. J., F. J. Slack, et al. (2000). "The 21-nucleotide let-7 RNA regulates
developmental timing in Caenorhabditis elegans." Nature 403(6772): 901-906.
Rice, J. C. and C. D. Allis (2001). "Histone methylation versus histone acetylation: new
insights into epigenetic regulation." Curr Opin Cell Biol 13(3): 263-273.
Rinn, J. L., M. Kertesz, et al. (2007). "Functional demarcation of active and silent
chromatin domains in human HOX loci by noncoding RNAs." Cell 129(7): 1311-
1323.
Robertson, K. D. (2005). "DNA methylation and human disease." Nat Rev Genet 6(8):
597-610.
Sabates-Bellver, J., L. G. Van der Flier, et al. (2007). "Transcriptome profile of human
colorectal adenomas." Mol Cancer Res 5(12): 1263-1275.
Sandoval, J., H. Heyn, et al. (2011). "Validation of a DNA methylation microarray for
450,000 CpG sites in the human genome." Epigenetics 6(6): 692-702.
Sarver, A. L., A. J. French, et al. (2009). "Human colon cancer profiles show differential
microRNA expression depending on mismatch repair status and are characteristic
of undifferentiated proliferative states." BMC Cancer 9: 401.
Sarver, A. L., L. Li, et al. (2010). "MicroRNA miR-183 functions as an oncogene by
targeting the transcription factor EGR1 and promoting tumor cell migration."
Cancer Res 70(23): 9570-9580.
Saxonov, S., P. Berg, et al. (2006). "A genome-wide analysis of CpG dinucleotides in the
human genome distinguishes two distinct classes of promoters." Proc Natl Acad
Sci U S A 103(5): 1412-1417.
Schetter, A. J., S. Y. Leung, et al. (2008). "MicroRNA expression profiles associated with
prognosis and therapeutic outcome in colon adenocarcinoma." JAMA 299(4): 425-
436.
Sewer, A., N. Paul, et al. (2005). "Identification of clustered microRNAs using an ab initio
prediction method." BMC Bioinformatics 6: 267.
Shah, M. Y., X. Pan, et al. (2010). "5-Fluorouracil drug alters the microRNA expression
profiles in MCF-7 breast cancer cells." J Cell Physiol.
Shen, L., M. Toyota, et al. (2007). "Integrated genetic and epigenetic analysis identifies
three different subclasses of colon cancer." Proc Natl Acad Sci U S A 104(47):
18654-18659.
Shen, L. and R. A. Waterland (2007). "Methods of DNA methylation analysis." Curr Opin
Clin Nutr Metab Care 10(5): 576-581.
Shi, X., M. Sun, et al. (2013). "Long non-coding RNAs: a new frontier in the study of
human diseases." Cancer Lett 339(2): 159-166.
Shtivelman, E., B. Henglein, et al. (1989). "Identification of a human transcription unit
affected by the variant chromosomal translocations 2;8 and 8;22 of Burkitt
lymphoma." Proc Natl Acad Sci U S A 86(9): 3257-3260.
Siddiqui, H., D. A. Solomon, et al. (2003). "Histone deacetylation of RB-responsive
promoters: requisite for specific gene repression but dispensable for cell cycle
inhibition." Mol Cell Biol 23(21): 7719-7731.
Simmen, M. W. (2008). "Genome-scale relationships between cytosine methylation and
dinucleotide abundances in animals." Genomics 92(1): 33-40.
Singal, R. and G. D. Ginder (1999). "DNA methylation." Blood 93(12): 4059-4070.
Slaby, O., M. Svoboda, et al. (2009). "MicroRNAs in colorectal cancer: translation of
molecular biology into clinical application." Mol Cancer 8: 102.
Slattery, M. L., J. S. Herrick, et al. (2011). "Genetic variation in the TGF-beta signaling
pathway and colon and rectal cancer risk." Cancer Epidemiol Biomarkers Prev
20(1): 57-69.
Song, M. A., M. Tiirikainen, et al. (2013). "Elucidating the landscape of aberrant DNA
methylation in hepatocellular carcinoma." PLoS One 8(2): e55761.
Song, U. L. a. M.-A. (2011). "Dietary and Lifestyle Correlates of DNA Methylation."
Springer Science (Humana Press).
Sotelo, J., D. Esposito, et al. (2010). "Long-range enhancers on 8q24 regulate c-Myc."
Proc Natl Acad Sci U S A 107(7): 3001-3005.
Strathdee, G., K. Appleton, et al. (2001). "Primary ovarian carcinomas display multiple
methylator phenotypes involving known tumor suppressor genes." Am J Pathol
158(3): 1121-1127.
Struhl, K. (1998). "Histone acetylation and transcriptional regulatory mechanisms." Genes
Dev 12(5): 599-606.
Sunami, E., M. de Maat, et al. (2011). "LINE-1 hypomethylation during primary colon
cancer progression." PLoS One 6(4): e18884.
Tahiliani, M., K. P. Koh, et al. (2009). "Conversion of 5-methylcytosine to 5-
hydroxymethylcytosine in mammalian DNA by MLL partner TET1." Science
324(5929): 930-935.
Takeda, J., S. Seino, et al. (1992). "Human Oct3 gene family: cDNA sequences,
alternative splicing, gene organization, chromosomal location, and expression at
low levels in adult tissues." Nucleic Acids Res 20(17): 4613-4620.
Tamai, H., K. Miyake, et al. (2011). "Resistance of MLL-AFF1-positive acute
lymphoblastic leukemia to tumor necrosis factor-alpha is mediated by S100A6
upregulation." Blood Cancer J 1(11): e38.
Tenesa, A., S. M. Farrington, et al. (2008). "Genome-wide association scan identifies a
colorectal cancer susceptibility locus on 11q23 and replicates risk loci at 8q24
and 18q21." Nat Genet 40(5): 631-637.
Tillinghast, G. W., J. Partee, et al. (2003). "Analysis of genetic stability at the EP300 and
CREBBP loci in a panel of cancer cell lines." Genes Chromosomes Cancer 37(2):
121-131.
Ting, D. T., D. Lipson, et al. (2011). "Aberrant overexpression of satellite repeats in
pancreatic and other epithelial cancers." Science 331(6017): 593-596.
Tomlinson, I. P., E. Webb, et al. (2008). "A genome-wide association study identifies
colorectal cancer susceptibility loci on chromosomes 10p14 and 8q23.3." Nat
Genet 40(5): 623-630.
Toyota, M., N. Ahuja, et al. (1999). "CpG island methylator phenotype in colorectal
cancer." Proc Natl Acad Sci U S A 96(15): 8681-8686.
Toyota, M., N. Ahuja, et al. (1999). "Aberrant methylation in gastric cancer associated
with the CpG island methylator phenotype." Cancer Res 59(21): 5438-5442.
Trievel, R. C. (2004). "Structure and function of histone methyltransferases." Crit Rev
Eukaryot Gene Expr 14(3): 147-169.
Tuupanen, S., M. Turunen, et al. (2009). "The common colorectal cancer predisposition
SNP rs6983267 at chromosome 8q24 confers potential to enhanced Wnt
signaling." Nat Genet 41(8): 885-890.
Tyagi, S., C. Vaz, et al. (2008). "CID-miRNA: a web server for prediction of novel miRNA
precursors in human genome." Biochem Biophys Res Commun 372(4): 831-834.
Umar, A., C. R. Boland, et al. (2004). "Revised Bethesda Guidelines for hereditary
nonpolyposis colorectal cancer (Lynch syndrome) and microsatellite instability." J
Natl Cancer Inst 96(4): 261-268.
Varallyay, E., J. Burgyan, et al. (2008). "MicroRNA detection by northern blotting using
locked nucleic acid probes." Nat Protoc 3(2): 190-196.
Venturini, L., K. Battmer, et al. (2007). "Expression of the miR-17-92 polycistron in
chronic myeloid leukemia (CML) CD34+ cells." Blood 109(10): 4399-4405.
Vigneault, F., D. Ter-Ovanesyan, et al. (2012). "High-throughput multiplex sequencing of
miRNA." Curr Protoc Hum Genet Chapter 11: Unit 11.12.1-10.
Wang, C. J., Z. G. Zhou, et al. (2009). "Clinicopathological significance of microRNA-31, -
143 and -145 expression in colorectal cancer." Dis Markers 26(1): 27-34.
Wang, Q., M. Williamson, et al. (2007). "Hypomethylation of WNT5A, CRIP1 and S100P in
prostate cancer." Oncogene 26(45): 6560-6565.
Wang, R., W. M. Dashwood, et al. (2011). "NADPH oxidase overexpression in human
colon cancers and rat colon tumors induced by 2-amino-1-methyl-6-
phenylimidazo[4,5-b]pyridine (PhIP)." Int J Cancer 128(11): 2581-2590.
Wang, Z., C. Zang, et al. (2008). "Combinatorial patterns of histone acetylations and
methylations in the human genome." Nat Genet 40(7): 897-903.
Wark, A. W., H. J. Lee, et al. (2008). "Multiplexed detection methods for profiling
microRNA expression in biological samples." Angew Chem Int Ed Engl 47(4):
644-652.
Wei, E. K., E. Giovannucci, et al. (2004). "Comparison of risk factors for colon and rectal
cancer." Int J Cancer 108(3): 433-442.
Wilusz, J. E., H. Sunwoo, et al. (2009). "Long noncoding RNAs: functional surprises from
the RNA world." Genes Dev 23(13): 1494-1504.
Wiseman, M. (2008). "The second World Cancer Research Fund/American Institute for
Cancer Research expert report. Food, nutrition, physical activity, and the
prevention of cancer: a global perspective." Proc Nutr Soc 67(3): 253-256.
Witkos, T. M., E. Koscianska, et al. (2011). "Practical Aspects of microRNA Target
Prediction." Curr Mol Med 11(2): 93-109.
Wojcik, S. E., S. Rossi, et al. (2010). "Non-codingRNA sequence variations in human
chronic lymphocytic leukemia and colorectal cancer." Carcinogenesis 31(2): 208-
215.
Wong, J. J., N. J. Hawkins, et al. (2007). "Colorectal cancer: a model for epigenetic
tumorigenesis." Gut 56(1): 140-148.
Wright, J. B., S. J. Brown, et al. (2010). "Upregulation of c-MYC in cis through a large
chromatin loop linked to a cancer risk-associated single-nucleotide
polymorphism in colorectal cancer cells." Mol Cell Biol 30(6): 1411-1420.
Wyman, S. K., R. K. Parkin, et al. (2009). "Repertoire of microRNAs in epithelial ovarian
cancer as determined by next generation sequencing of small RNA cDNA
libraries." PLoS One 4(4): e5311.
Yamakuchi, M., M. Ferlito, et al. (2008). "miR-34a repression of SIRT1 regulates
apoptosis." Proc Natl Acad Sci U S A 105(36): 13421-13426.
Yan, L. X., X. F. Huang, et al. (2008). "MicroRNA miR-21 overexpression in human breast
cancer is associated with advanced clinical stage, lymph node metastasis and
patient poor prognosis." RNA 14(11): 2348-2360.
Yang, L., Z. Ma, et al. (2010). "MicroRNA-602 regulating tumor suppressive gene
RASSF1A is overexpressed in hepatitis B virus-infected liver and hepatocellular
carcinoma." Cancer Biol Ther 9(10): 803-808.
Yao, Y., A. L. Suo, et al. (2009). "MicroRNA profiling of human gastric cancer." Mol Med
Rep 2(6): 963-970.
Yasui, K., S. Arii, et al. (2002). "TFDP1, CUL4A, and CDC16 identified as targets for
amplification at 13q34 in hepatocellular carcinomas." Hepatology 35(6): 1476-
1484.
Yori, J. L., E. Johnson, et al. (2010). "Kruppel-like factor 4 inhibits epithelial-to-
mesenchymal transition through regulation of E-cadherin gene expression." J Biol
Chem 285(22): 16854-16863.
Zanke, B. W., C. M. Greenwood, et al. (2007). "Genome-wide association scan identifies a
colorectal cancer susceptibility locus on chromosome 8q24." Nat Genet 39(8):
989-994.
Zhao, J., B. K. Sun, et al. (2008). "Polycomb proteins targeted by a short repeat RNA to
the mouse X chromosome." Science 322(5902): 750-756.
Zhong, M., Z. Bian, et al. (2013). "miR-30a Suppresses Cell Migration and Invasion
Through Downregulation of PIK3CD in Colorectal Carcinoma." Cell Physiol
Biochem 31(2-3): 209-218.