Genetic and molecular characterization of paediatric endemic and sporadic Burkitt lymphoma
by
Bruno Grande
B.Sc., McGill University, 2013
Thesis Submitted in Partial Fulfillment of the
Requirements for the Degree of
Doctor of Philosophy
in the
Department of Molecular Biology and Biochemistry
Faculty of Science
© Bruno Grande 2019
SIMON FRASER UNIVERSITY
Fall 2019
Copyright in this work rests with the author. Please ensure that any reproduction or reuse is done in accordance with the relevant national copyright legislation.
Approval
Name: Bruno Grande
Degree: Doctor of Philosophy (Molecular Biology and Biochemistry)
Title: Genetic and molecular characterization of paediatric endemic and sporadic Burkitt lymphoma
Examining Committee:
Chair: Christopher Beh, Professor
Ryan D. Morin, Senior Supervisor, Associate Professor
Jack N. Chen, Supervisor, Professor
Sohrab P. Shah, Supervisor, Associate Professor, Departments of Pathology and Computer Science, University of British Columbia
Sharon M. Gorski, Internal Examiner, Professor
Sandeep S. Davé, External Examiner, Professor, Department of Medicine, Duke University
Date Defended: December 3rd, 2019
Ethics Statement
Abstract
Though generally curable with intensive chemotherapy in resource-rich settings, Burkitt
lymphoma (BL) remains a deadly disease in older patients and in sub-Saharan Africa.
Epstein–Barr virus (EBV) positivity is a feature in over 90% of cases in malaria-endemic
regions and up to 30% elsewhere. However, the molecular features of BL have not been
comprehensively evaluated when taking into account tumour EBV status or geographic
origin. In this thesis, I describe an integrative analysis of whole genome and transcriptome
data generated from a large cohort of endemic and sporadic paediatric BL patients. This
approach revealed that the mutational landscape of BL genomes is primarily shaped by
four different processes, and that at least two of them—aberrant somatic hypermutation
and defects in DNA mismatch repair—appear associated with the presence of EBV. After
identifying novel candidate BL genes such as SIN3A, USP7, and CHD8, I explored the
incidence of mutations affecting genes and pathways involved with BL pathogenesis and
found that EBV-positive tumours had significantly fewer driver mutations, especially
among genes with roles in apoptosis, and that this difference did not exist when
comparing geographic subtypes of BL. I also identified a subset of immunoglobulin
variable region genes encoding clonal B-cell receptors (BCRs) that were disproportionally
used in the tumours, including IGHV4-34, known to produce autoreactive antibodies, and
IGKV3-20, a feature described in other B-cell malignancies but not yet in BL. Many of
these results suggest that tumour EBV status defines a specific BL entity irrespective of
geographic origin with particular molecular properties and distinct pathogenic
mechanisms. The novel mutation patterns identified here imply potential improvements
that could be brought to BL therapy. This includes the rational use of DNA-damaging
chemotherapy in some BL patients and targeted agents such as the CDK4/6 inhibitor
palbociclib in others. The importance of BCR signaling in BL strengthens the potential
benefit of inhibitors for PI3K, Syk and Src family kinases among these patients. Lastly, the
identification of USP7 as a tumour-suppressor gene in BL highlights the potential clinical
utility of MDM2 inhibitors in treating patients with otherwise wild-type TP53.
Keywords: Burkitt lymphoma; cancer genomics; whole genome and transcriptome
sequencing; pathogenesis; Epstein–Barr virus
Dedication
To my dad,
whose fateful battle with brain cancer
inspired me to pursue cancer research,
and my mom,
who did everything in her power to
ensure I could pursue cancer research.
Acknowledgements
In the final year of my undergraduate degree in biochemistry, I realized that I wanted to
pursue graduate studies in bioinformatics. If I had been aware of how grossly
underqualified I was at the time, I might have given up on the ambition altogether.
However, naïve as I was, I submitted applications to join various research groups focused
on cancer genomics. My supervisor, Ryan Morin, was the only professor willing to take a
chance on me, someone with virtually no knowledge of bioinformatics. I will be forever
grateful for the risk you took back then, and I hope this dissertation means the gamble
paid off. Over the past six years, you have been instrumental in my growth as a scientist,
a writer, a teacher, a collaborator, and most importantly, an independent and critical
thinker. The level of support you provided, especially during those pivotal first few years,
was above and beyond what I have come to expect from busy professors. I have never
felt like you were out of reach if I had a question to ask or was seeking feedback. Thank
you for believing in me and providing me with career opportunities.
My PhD journey included many productive collaborations and rewarding interactions with
other researchers and administrative staff. First, I would like to thank Jack Chen and
Sohrab Shah for sitting on my supervisory committee and providing guidance throughout
my degree. I enjoyed picking your brains during committee meetings and having
thought-provoking discussions about my research. Similarly, I wish to extend my
appreciation to Sharon Gorski and Sandeep Davé for agreeing to act as my internal and
external examiners, respectively. Second, I want to acknowledge the many collaborators
on the Burkitt Lymphoma Genome Sequencing Project, especially Daniela Gerhard and
Louis Staudt. I have learned much from your scientific rigour, lessons that I shall carry
with me for the rest of my career. Third, I must thank the graduate program assistant for
my department, Mimi Fourie. I am truly grateful for the continual assistance you provided
me throughout my PhD degree. Finally, I would like to recognize the monumental effort
required to manage a project of this scale, particularly the role played by Karen Novik.
You have the patience of a saint, and despite how complicated the project was at times,
everything about it felt organized thanks to you.
I had the pleasure of working with some amazing labmates, many of whom I consider
friends. Together, we achieved something that we should be proud of: building a
supportive and enriching research environment that fosters collaboration and skill sharing.
I enjoyed participating in those spontaneous conversations around the lab on topics
ranging from science to board games, and everything in between. The environment you
helped create made it easier for me to weather the challenges and frustrations of graduate
school. Specifically, I had the privilege of working with these outstanding colleagues:
Marco Albuquerque, Miguel Alcaide, Sarah Arthur, Kevin Bushell, Lauren Chong, Krysta
Coyle, Daniel Fornika, Laura Hilton, Aixiang Jiang, Rebecca Johnston, Marija Jovanovic,
Nicole Knoetze, Prasath Pararajalingam, Christopher Rushton, Selin Jessa, Jeffrey Tang,
and Nicole Thomas. Thank you for being such an amazing team!
This project was made possible with the generous financial support from various funding
agencies. I want to thank the Foundation for Burkitt Lymphoma Research, including its
Scientific Advisory Board, and the National Cancer Institute for their role in initiating,
funding, managing, and advising for this project. I also wish to acknowledge Simon Fraser
University and its private donors for endowing the following awards: Graduate Fellowship,
Dr. Bruce Brandhorst Graduate Prize in MBB, Travel and Minor Research Award,
Weyerhaeuser Molecular Biology Graduate Scholarship, President’s PhD Scholarship,
and Dean’s Graduate Fellowship. My stipend was funded in part by Genome Canada,
Genome British Columbia, the Canadian Institutes of Health Research, Mitacs, and the
Team Finn Foundation. Travel funds were provided by the Canadian Institutes of Health Research,
the Canadian Cancer Society, the John Bosdet Memorial Fund with BC Cancer, and the
Foundation for Burkitt Lymphoma Research.
These acknowledgements would not be complete if I did not thank my partner, Santina
Lin, for all of the moral support she has given me over the years. As a fellow
bioinformatician, you could actually empathize when I complained about software
installation issues or cryptic error messages in R. I have always felt like I had a shoulder
to lean on when the science proved difficult. You have this amazing knack for inspiring me
with your achievements, which encourages me to push myself harder and aim higher.
Through thick and thin, you stood by me and I will never forget that. I could not ask for a
better best friend.
For my final acknowledgements, I need to provide some context. On Christmas Eve 1997,
my family found out that my dad had a brain tumour. We were told that it was inoperable
and prognosis was bleak. The doctors estimated that my dad had six months to live at
best. That would be the end of it if my parents had accepted their fate. I was 6 years old at
the time, and my brother and sister were even younger. We simply would not have known
our dad. That would indeed be the case if it was not for my parents’ determination. Within
a few weeks, we found a neurosurgeon willing to operate on my dad. The surgery was
successful and had no neurological complications. My dad was back at work a mere two
months later and resumed his life as if the whole thing had just been a nightmare.
Alas, I am afraid this story does not have a happy ending. Five years later, owing to a limp
my dad developed, we became aware that the tumour had started growing again. A
second surgery was performed, but my dad was not so lucky this time. Brain swelling
prevented the neurosurgeon from replacing the part of his skull that had been removed for
the procedure. The operation resulted in a severe loss of motor skills on his left side. The
built-up intracranial pressure led to a steady deterioration of his vision until he became
completely blind. After years of being under control, his epilepsy started acting up. I will
never forget the moment when I was 14 years old and had to call the ambulance because
my dad was uttering things as if his mind had travelled more than a decade back in time.
Little did I know that was the last time he would ever be home. Three months later, my
dad fell into a coma and drew his final breath on September 14th, 2005.
I share this story because it helps the readers fully appreciate why I am so grateful for my
parents. I remember my dad persevering, not losing his sense of humour, his loving
nature, his soul. I got to witness his courage firsthand in the face of grim adversity, and I
am a better person for it. However, the true hero of this story is one who worked tirelessly
in the background: my mom. You were the one who accompanied dad to every
appointment; who stayed long hours at the hospital; who helped him get around when he
lost his vision; who took on the burden of providing for a family of four as a widow; who
sacrificed much so that I had the opportunity to achieve my dreams. Simply put, I would
not be where I am today if it were not for your incredible determination and strength. From
the bottom of my heart, thank you, mom, for everything you have done for me.
Table of Contents
Approval
Ethics Statement
Abstract
Dedication
Acknowledgements
Table of Contents
List of Tables
List of Figures
Glossary
Preface
1 Introduction to Burkitt Lymphoma
1.1 Clinical and epidemiological features
1.2 Pathogenesis of Burkitt lymphoma
1.2.1 Cell-of-origin
1.2.2 Role of MYC
1.2.3 Known genetic and molecular aberrations
1.2.4 Epstein–Barr virus
1.2.5 Malaria
1.3 Problem statement and thesis overview
2 Discovery of genetic and molecular aberrations in BL
2.1 Introduction
2.2 Results
2.2.1 Clinical and molecular characteristics of BL cases
2.2.2 Data-driven inference of tumour EBV status and genome type
2.2.3 Structural and copy number variations affecting MYC
2.2.4 Refining list of genes with potential roles in BL pathogenesis
2.2.5 Challenges with genetic comparison between BL and DLBCL
2.2.6 Novel mutation patterns in BL-associated genes
2.2.7 Landscape of noncoding mutations shaped by somatic hypermutation
2.2.8 Robust identification of mutational signatures in BL genomes
2.2.9 Non-uniform V gene segment usage in immunoglobulin repertoire
2.3 Materials and methods
2.3.1 Case accrual
2.3.2 Sample processing and nucleic acid extraction
2.3.3 Library construction and sequencing
2.3.4 Data analysis
3 EBV defines a BL entity with distinct molecular and pathogenic features
3.1 Introduction
3.2 Results
3.2.1 Fewer driver mutations in EBV-positive BL despite mutation burden
3.2.2 Variation in mutation burden explained by mutational signatures
3.2.3 Protein-altering mutations associated with tumour EBV status
3.2.4 Deregulated AICDA activity in EBV-positive BL
3.2.5 EBV genome copy number uncorrelated with EBV-associated effects
3.2.6 Genetic comparison of intra-abdominal and head-only tumours
3.2.7 Variable distribution of MYC breakpoints in BL subtypes
3.2.8 V gene usage not determined by tumour EBV status
3.3 Materials and methods
3.3.1 Data analysis
4 Discussion and future directions
4.1 De novo mutational signatures
4.2 Noncoding mutation peaks
4.3 Nonsynonymous mutations
4.4 B-cell receptor repertoire
4.5 Epstein–Barr virus
4.6 Hit-and-run hypothesis
Bibliography
Appendix A Supplemental Data File
Appendix B Mutation (Lollipop) Plots
List of Tables
Table 1.1 Overview of clinical variants
Table 2.1 Clinical and molecular summary of discovery cohort
Table 2.2 Clinical and molecular summary of validation cohort
Table 3.1 Linear regression of mutational signatures
Table 3.2 McNemar’s test results
Table 3.3 Linear regression of AICDA expression
Table 3.4 Linear regression of breakpoint distance from MYC
List of Figures
Figure 1.1 Endemic BL patient
Figure 1.2 BL distribution in Africa
Figure 1.3 Interplay between EBV and malaria
Figure 1.4 Diagnostic methodology for high-grade B-cell lymphomas
Figure 1.5 B-cell development and germinal centre B-cell lymphomas
Figure 1.6 Molecular pathways contributing to BL pathogenesis
Figure 2.1 Molecular differences between EBV-positive and EBV-negative BL
Figure 2.2 Translocations between MYC and immunoglobulin loci
Figure 2.3 Landscape of copy number variations
Figure 2.4 Nonsynonymous mutations in BL-associated genes
Figure 2.5 Structural variations in DDX3X
Figure 2.6 Splicing branch point mutations in DDX3X
Figure 2.7 AICDA mutations in BL-associated genes
Figure 2.8 Mutually exclusive mutations in BL-associated pathways
Figure 2.9 Features of noncoding mutation peaks
Figure 2.10 AICDA mutations in noncoding mutation peaks
Figure 2.11 Peak gene expression as a function of peak mutation status
Figure 2.12 Correlation between AICDA and mutations within peaks
Figure 2.13 Known and novel targets of aberrant somatic hypermutation
Figure 2.14 Characteristics of de novo mutational signatures
Figure 2.15 Prevalence of de novo mutational signatures
Figure 2.16 Correlation with de novo mutational signatures
Figure 2.17 Dominant immunoglobulin rearrangements
Figure 2.18 Immunoglobulin V gene usage in BL and DLBCL
Figure 3.1 Genome-wide mutation burden per BL subtype
Figure 3.2 Mutation burden in BL-associated genes per BL subtype
Figure 3.3 Mutational signatures per BL subtype
Figure 3.4 Differential incidence of nonsynonymous mutations in BL subtypes
Figure 3.5 AICDA expression per BL subtype
Figure 3.6 Correlation with EBV genome copy number
Figure 3.7 Genetic comparison of anatomic BL subtypes
Figure 3.8 Immunoglobulin V gene usage per BL subtypes
Figure 4.1 PVT1 promoter mutations and MYC activation
Figure 4.2 PVT1 promoter mutations and BL pathogenesis
Figure 4.3 USP7 mutations and/or EBV-encoded EBNA1 and TP53 degradation
Figure 4.4 SIN3A and repression of MYC target genes
Figure 4.5 CHD8 and repression of gene expression via chromatin remodelling
Figure 4.6 Spontaneous loss of EBV during cell division
Figure 4.7 Putative model for BL pathogenesis
Glossary
AICDA: Activation-induced cytidine deaminase. Mutagenic enzyme with a role in generating IG diversity during B-cell development, also known as AID.
aSHM: Aberrant SHM. Mutagenesis associated with AICDA activity that targets genomic regions outside of those normally affected by physiologic SHM.
BCR: B-cell receptor. Surface-bound IG.
BL: Burkitt lymphoma. An aggressive B-cell non-Hodgkin lymphoma defined by MYC translocations and associated with EBV and malaria.
BLG: BL-associated gene. Gene identified as being potentially relevant to BL pathogenesis by virtue of being a recurrently mutated gene previously associated with BL or an SMG supported by at least two different methods.
BLGSP: Burkitt Lymphoma Genome Sequencing Project. International collaboration that is funding, managing, and sequencing BL tumour genomes and transcriptomes.
CDR3: Complementarity-determining region 3. Most variable region of an IG chain, spanning the V-D, D-J, and/or V-J recombination junctions.
CNV: Copy number variation. Mutation type involving the copy number gain or loss of genomic segments of any size.
COSMIC: Catalogue Of Somatic Mutations In Cancer. Database containing various features of tumour genomes, including reference mutational signatures.
DLBCL: Diffuse large B-cell lymphoma. The most common form of NHL, featuring aggressive growth and molecular heterogeneity.
EBV: Epstein–Barr virus. A ubiquitous γ-herpesvirus initially discovered in BL tumour cells but later found in most adults and known to cause infectious mononucleosis.
FF: Fresh frozen. Method for preserving tumour tissue that is considered the gold standard to ensure the quality of nucleic acids for sequencing.
FFPE: Formalin-fixed paraffin-embedded. Method for preserving tumour tissue that is associated with lower quality of nucleic acid for sequencing.
FISH: Fluorescence in situ hybridization. Method for locating DNA/RNA sequences in cells using fluorescence, often for determining the presence or absence of SVs.
HIV: Human immunodeficiency virus. Viral cause of AIDS.
ICGC: International Cancer Genome Consortium. Global collaboration of researchers performing genomic, transcriptomic, and epigenomic analyses of tumour samples for various cancer types.
IG: Immunoglobulin. Term referring to the immunoglobulin protein(s), component(s) of the BCR or antibodies, or the associated gene(s).
IGH: Immunoglobulin heavy chain. IG heavy chain gene locus on chromosome 14.
IGK: Immunoglobulin κ light chain. IG light chain gene locus on chromosome 2.
IGL: Immunoglobulin λ light chain. IG light chain gene locus on chromosome 22.
Indel: Small insertion or deletion. Mutations consisting of inserted or deleted DNA sequence, generally less than 100 bp.
ISH: In situ hybridization. Method for locating DNA/RNA sequences in cells using detectable probes, often for determining the presence or absence of foreign nucleic acids (e.g. EBV EBER RNAs).
LCL: Lymphoblastoid cell line. Immortalized cell line derived from B cells.
MMR: Mismatch repair. Pathway for repairing small DNA errors.
NHL: Non-Hodgkin lymphoma. Class of lymphomas that includes BL and DLBCL.
PCR: Polymerase chain reaction. Method for amplifying nucleic acids.
PI3K: Phosphoinositide 3-kinase. Class of enzymes involved in cell growth.
R: R programming language. Statistical programming language.
RNA-seq: RNA sequencing.
SHM: Somatic hypermutation. Mutagenesis associated with AICDA activity that can either be physiologic or on-target, giving rise to IG diversity, or aberrant or off-target, potentially introducing driver mutations.
SNV: Single nucleotide variant. Single-base substitution.
SOP: Standard operating procedure.
SSM: Simple somatic mutation. Somatic SNV or indel.
SV: Structural variation. Mostly translocations and inversions.
SWI/SNF: Switch/sucrose non-fermentable.
TSS: Transcription start site. First base of the first exon of a gene transcript.
V(D)J: Variable, diversity, and joining gene segments. Gene segments that are recombined to form the IG CDR3 region.
VAF: Variant allele fraction. Fraction of reads supporting an alternate allele.
VCF: Variant call format. File format for storing mutations.
WGS: Whole genome sequencing.
WHO: World Health Organization.
Preface
This thesis is an expanded version of the material originally published in Grande et al.,
“Genome-wide discovery of somatic coding and noncoding mutations in pediatric endemic
and sporadic Burkitt lymphoma”, Blood, 2019;133:1313-1324.1 Under the supervision of
Ryan Morin, I led the computational component of this project, including the analysis,
interpretation, and presentation of the sequencing data and clinical metadata. More
specifically, I designed and performed data analyses, implemented software tools,
maintained quality control, benchmarked computational methodologies, produced figures
and tables, and wrote the text. Furthermore, I was the first bioinformatics graduate
student in my research group, entailing work that is not captured in this thesis. Notably, I
set up the computational infrastructure for the laboratory virtually from scratch and
established standard analytical pipelines. I also played a central role in training incoming
undergraduate and graduate students as well as postdoctoral fellows in bioinformatics.
These responsibilities were central to my training as a PhD student.
Chapter 2 includes key contributions from coauthors of the above paper. Aixiang Jiang
and Ryan Morin designed and ran the Rainstorm and Doppler methodology for identifying
noncoding mutation peaks. Luka Culibrk and Eric Zhao ran the pipeline for determining
de novo mutational signatures. Nicole Knoetze designed the methodology for identifying
immunoglobulin clonotypes. Christopher Rushton authored a software tool for detecting
mutations that overlap the AICDA recognition motif and quantifying any enrichment or
depletion of such mutations. George Wright designed the McNemar’s test analysis. Corey
Casper, Thomas Gross, Elaine Jaffe, and Sam Mbulaiteye reviewed and advised on
consensus anatomic site classification. Daniela Gerhard, John Irvin, Jean Paul Martin,
Marie-Reine Martin, Marco Marra, Ryan Morin, and Louis Staudt designed and/or directed
the study. All other coauthors contributed to sample accrual, quality control and
processing, data generation and management, and logistics.
This thesis follows the convention of italicizing gene names, whereas non-italicized gene
names refer to any encoded protein.
Chapter 1
Introduction to Burkitt Lymphoma
Burkitt lymphoma (BL) is a highly aggressive B-cell non-Hodgkin lymphoma. It is
considered by some to be the Rosetta Stone of cancer research for its pivotal role in
historical discoveries in the field.2,3 It was the first human malignancy shown to have a viral
aetiology. It was the first tumour in which the activation of an oncogene via chromosomal
rearrangement was demonstrated. These rearrangements ultimately led to the discovery
of their target, MYC, now recognized as a quintessential proto-oncogene in many
cancers. It was also one of the first tumours to achieve high cure rates with chemotherapy
alone. To this day though, despite these important discoveries, researchers and clinicians
still face several questions and challenges related to prevention, diagnosis, pathogenesis,
and treatment of BL.
1.1 Clinical and epidemiological features
BL was first described in Uganda as a sarcoma by Denis Burkitt in 1958 but was later
recognized as a lymphoma.4,5 BL is most common in African children aged 2 to 8,
accounting for roughly half of paediatric cancer cases in some areas.4–6 BL predominantly
affects male patients, with male-to-female ratios ranging between 1.6:1 and 4:1.6–9 The
most striking feature of these tumours, other than their rapid growth, is their clinical
presentation. In the regions where this cancer is most common, the majority of BL
tumours affect the upper and/or lower jaw, often resulting in loss of teeth and abnormal
protrusion of the eyes (Figure 1.1).6 The abdomen is the second most frequently involved
anatomic site, presenting as abdominal swelling.6 Due to the rapid tumour growth, most
children die from BL within six months if untreated.6,10
Figure 1.1: Endemic BL patient. “Large facial Burkitt’s Lymphoma” from Mike Blyth, licensed under CC BY-SA 2.5.
The geographical distribution of BL incidence in Africa was determined through surveys
performed by mail or in person.6,11,12 Most cases were diagnosed in tropical equatorial
Africa, including a “tail” running down the African East Coast, forming the so-called
“lymphoma belt” (Figure 1.2). The map of BL incidence was found to closely correspond
to areas that (1) are below 1,500 m in altitude where average temperatures are above
15°C and (2) receive over 50 cm of rainfall per year.13 Distant regions with similar
geographical features, namely Papua–New Guinea, were later found to share the
elevated BL incidence first noted in equatorial Africa.10 Notably, the lymphoma belt
overlapped the geographical distribution of certain groups of mosquitos, which led to the
hypothesis that a mosquito-borne pathogen may be playing a role in BL tumour
formation.13 While a virus was initially suspected, other aetiological factors were also
proposed, such as malaria.14–16
Figure 1.2: BL distribution in Africa. Areas indicated in black, roughly corresponding to equatorial Africa, have the highest BL incidence. This is Figure 1 reprinted with permission from Burkitt, 1983.17
In 1964, Epstein and colleagues discovered a γ-herpesvirus infecting tumour cells in
African BL, and the same virus was also found in BL tumours from Papua–New
Guinea.18,19 This virus later became known as the Epstein–Barr virus (EBV) and was
identified as the causative agent of infectious mononucleosis.20 Over time, it was established
that the virus was not restricted to Africa, nor was the infection unique to BL patients within
Africa.21,22 EBV was nonetheless significantly more common in BL cases compared to healthy
control cases.22 The paediatric nature of BL in equatorial Africa is consistent with early EBV
infection seen in these populations, which typically occurs during the first 16 months of
infancy.23 A later study also found that high serum antibody titres to EBV proteins were a risk
factor for developing BL.24 Therefore, these epidemiological findings suggested that EBV alone
could not trigger lymphomagenesis, but an aetiological link between EBV and BL could
not be excluded.
The ubiquity of EBV stimulated an increased focus on malaria as the primary
environmental factor responsible for the unique geographical distribution of BL in
equatorial Africa and Papua–New Guinea. Evidence for this hypothesis steadily
accumulated during the 1960s.17 First, local malarial intensity correlated with BL
incidence.25 The malignancy was rarely diagnosed in areas with little to no malaria,
including certain African islands (e.g. Zanzibar, Pemba, and Seychelles); urban
environments with limited mosquito breeding grounds; and areas with malarial control or
complete eradication (e.g. Kinshasa, Sri Lanka).25 For example, a decrease in severe
malaria infection in the Mengo Districts of Uganda coincided with a substantial decline in
BL incidence.26 In addition, preliminary studies showed an interesting relationship
between BL and the sickle cell trait, which protects against malarial infection. Despite
sharing a similar geographical distribution as malaria—and by extension, BL—the sickle
cell trait is less prevalent among BL patients, consistent with a shared susceptibility to
malaria and BL.27,28
The relationship between malaria and the age of BL incidence provides additional
evidence for an aetiological link.29 One report demonstrated a correlation between BL
incidence and the multiplicity of malaria infection in Ghana and Tanzania.30 More
specifically, both measures peak between 5 and 9 years of age. Notably, immigrants from
low-intensity malaria areas (e.g. high-altitude Rwanda and Burundi) have a distinct age
distribution of BL incidence.26,31 In one Ugandan study, roughly 50% of such immigrants
who were diagnosed with BL were over the age of 15 years.31 These results suggest that
intense malarial infection serves as a triggering event for BL formation, possibly in
conjunction with EBV (Figure 1.3).
Figure 1.3: Interplay between EBV and malaria. This is Text-Figure 1 reprinted with permission from Burkitt, 1969.25
Shortly after the initial description of BL, a number of reports from regions outside those
described above detailed cases of B-cell lymphoma that were indistinguishable at the
histological level from those in Africa.5,10,32,33 However, the incidence of these tumours
was much lower than that of their African counterparts. This discovery ultimately resulted
in the definition of epidemiological variants of BL, known as clinical variants. Patients
diagnosed in malaria-endemic areas are considered to have endemic BL (eBL), whereas those
diagnosed elsewhere represent the sporadic BL (sBL) variant. A third epidemiological subgroup was
defined after the observation that BL can arise as a complication in immunocompromised
patients. This disease, referred to as immunodeficiency-related BL, was first recognized
during the human immunodeficiency virus (HIV) epidemic, but was also linked to
prolonged immunosuppression following organ transplantation.34–37 The three subtypes
differ in terms of epidemiological and clinical features such as incidence, association with
malaria and EBV, age of diagnosis, and anatomic sites affected by tumour growth (Table
1.1). Genetic and molecular differences were subsequently found, especially following the
emergence of high-throughput sequencing.
Table 1.1: Characteristics of the clinical variants of BL. This is Table 5.1 adapted from Robertson, 2013.38
Variable | Endemic BL | Sporadic BL | Immunodeficiency-related BL
Geography | Equatorial Africa | Worldwide | Worldwide
Age incidence | Children | Children and adults | Adults
Anatomic sites | Jaws, facial bones, kidneys, liver, gonads, breast | Ileocecal region, Waldeyer’s ring, gonads, breast | Nodal, central nervous system (CNS)
EBV infection | 100% | 5–30% | 25–40%
Environmental factor | Malaria, arbovirus, euphorbia | NA | NA
MYC breakpoints | Far 5’ | Exon, intron 1, and 5’ | Exon and intron 1
IGH breakpoints | VDJ region | Switch region | Switch region
Somatic IGH mutation | Yes | Yes | Yes
The criteria for BL diagnosis are summarized in the World Health Organization (WHO)
Classification of Tumours of Haematopoietic and Lymphoid Tissues (Figure 1.4).39 They
are primarily based on cell morphology, immunophenotype, and fluorescence in situ
hybridization (FISH). Briefly, BL morphology usually adopts a “starry-sky” appearance
consisting of uniform medium-sized basophilic lymphoid cells with interspersed
macrophages forming the “stars” where BL cells underwent apoptosis. At the
immunohistochemical level, the tumour cells should be positive for surface
immunoglobulin, B-cell markers (i.e. CD19, CD20, CD22, CD79A, and PAX5), and
germinal-centre markers (i.e. CD10 and BCL6) while having little to no BCL2 staining. The
proliferation fraction marked by MKI67 is expected to be close to 100%. BL tumours
should also have strong MYC protein staining and are often positive for the MYC FISH
break-apart assay, which detects translocations affecting MYC, a genetic hallmark of BL.
These criteria apply equally to both eBL and sBL, which remain indistinguishable using
modern techniques. In practice, the distinction between BL and other high-grade B-cell
lymphomas such as diffuse large B-cell lymphoma (DLBCL) is not always well defined
and can result in misdiagnosis. This problem is exacerbated in resource-poor settings,
including equatorial Africa, which often lack facilities for performing more expensive
diagnostic tests such as immunohistochemical staining. Misdiagnosis is often fatal for BL
patients because they are treated with inappropriate regimens.40
Figure 1.4: Diagnostic methodology for high-grade B-cell lymphomas. This is Figure 4 reprinted with permission from Swerdlow et al., 2016.41
In general, BL tumours tend to respond dramatically to intensive chemotherapy and are
considered curable for children in countries where proper supportive care is readily
available to manage treatment-related toxicity.42–45 Chemotherapeutic regimens typically
include a combination of cyclophosphamide, vincristine, prednisolone, doxorubicin,
cytarabine, and/or high-dose methotrexate.46,47 However, BL often remains fatal for children
in sub-Saharan Africa for several reasons, including diagnosis typically occurring at an
advanced stage, the limited capacity to support intensive chemotherapeutic regimens,
and the confounding effects of poverty.48–51 Overall survival for eBL varies between 40%
and 70%.48,52,53 In the sporadic setting, treating adult and elderly patients has also been a
challenge and is associated with high mortality.45 However, current clinical trials are
showing promise in overcoming the limitations of existing treatment regimens.54 BL
relapse is rare, but if it does occur, it is seen within the first year after diagnosis and is
usually fatal.39,55,56 Prognostic indicators for BL include disease stage, bone marrow or
central nervous system involvement, unresected tumour size, serum lactate
dehydrogenase levels, and age.39
1.2 Pathogenesis of Burkitt lymphoma
1.2.1 Cell-of-origin
B-cell development is a highly regulated process whereby B cells progressively
differentiate by rearranging their genome in order to produce antibodies, also known as
immunoglobulins (IGs).57 An IG is composed of a heavy chain and a light chain. The
heavy chain is encoded by the IG heavy (IGH) locus, whereas the light chain is encoded
by either the IG κ (kappa; IGK) or λ (lambda; IGL) locus. Initially, B cells start off with a
germline configuration for all IG loci. The transition from a haematopoietic stem cell to an
immature B cell occurs in the bone marrow. First, the IGH locus undergoes VDJ
rearrangement, which results in the selection and juxtaposition of a variable (V) gene
segment, a diversity (D) gene segment, and a joining (J) gene segment. Second, the IGK
and/or IGL loci, which lack diversity segments, undergo VJ rearrangement. The purpose
of V(D)J rearrangement is to produce a diverse repertoire of IGs—and thus
antibodies—capable of detecting and responding to virtually any pathogen.
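As a rough illustration of how segment choice alone generates diversity, the sketch below counts the heavy- and light-chain combinations available from a given number of gene segments. The segment counts are hypothetical placeholders chosen to keep the arithmetic simple, not values taken from this thesis, and the calculation deliberately ignores junctional diversity and SHM.

```python
from math import prod


def combinatorial_diversity(heavy_segments, light_loci):
    """Count IG combinations that arise from segment choice alone."""
    # The heavy chain picks one V, one D, and one J segment.
    heavy = prod(heavy_segments.values())
    # A B cell uses either the kappa or the lambda light chain, so the
    # options from the two loci are summed rather than multiplied.
    light = sum(prod(locus.values()) for locus in light_loci.values())
    return heavy * light


# Hypothetical segment counts (assumptions for illustration only).
HEAVY = {"V": 40, "D": 20, "J": 5}
LIGHT = {"IGK": {"V": 30, "J": 5}, "IGL": {"V": 30, "J": 4}}

print(combinatorial_diversity(HEAVY, LIGHT))  # 4000 * (150 + 120) = 1080000
```

Even with these modest placeholder counts, segment choice yields over a million heavy and light chain pairings before any junctional variation or SHM is considered.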
Following V(D)J rearrangement, the immature B cell exits the bone marrow and enters the
peripheral circulation, where it expresses the IG on the cell surface in the form of a B-cell
receptor (BCR).57 Upon antigenic stimulation of the BCR, B cells enter germinal
centres, which are transient structures in secondary lymphoid organs wherein they
complete affinity maturation (Figure 1.5). These cells become centroblasts, which
are rapidly dividing B cells in the germinal centre dark zone. Here, centroblasts
undergo somatic hypermutation (SHM) of the IG loci. This process involves the
introduction of mutations within the variable regions of the IG loci in an effort to produce
antibodies with higher affinity for the initiating antigen. SHM is catalysed
by activation-induced cytidine deaminase (AICDA), also known as AID. Centroblasts that
have undergone some degree of SHM transit to the germinal centre light zone where they
become centrocytes and cease to proliferate. Based on the antigen affinity of their BCR,
centrocytes are either selected to differentiate into plasma cells or memory B cells or are
eliminated via apoptosis in the event of disadvantageous mutations. Alternatively,
centrocytes may re-enter the dark zone for additional cycles of proliferation and SHM in a
process called “cyclic re-entry”.
At every step of B-cell development, the tight regulation that is in place can fail and result
in malignant transformation (Figure 1.5). The type of B cell that gives rise to a particular
lymphoma is termed the “cell-of-origin”. The postulated cell-of-origin for BL is one that has
undergone the germinal centre reaction given that the IG loci have been mutated by
AICDA.58–60 More precisely, BL cells most closely resemble centroblasts from the
germinal centre dark zone in terms of gene expression.61 The cell-of-origin framework
also accounts for the histological similarity between BL and DLBCL tumours considering
that the latter can arise from the germinal centre as well. Consistent with their germinal
centre origin, BL and DLBCL often acquire mutations in non-IG regions due to the
off-target enzymatic activity of AICDA in a process called aberrant SHM (aSHM). Due to
aSHM, several genes are “hypermutated” in lymphomas including MYC.62 Because
AICDA primarily targets single-stranded DNA, aSHM mostly affects the first kilobase (kbp)
downstream of transcription start sites (TSS) for actively transcribed genes.63–65
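The positional signature of aSHM described above can be expressed as a simple strand-aware test. The following is a minimal sketch, not a method used in this thesis: the function name, the 1,000-bp default window, and the example coordinates are all assumptions made for illustration.

```python
def in_ashm_window(position, tss, strand, window=1000):
    """Return True if `position` lies within `window` bp downstream of the TSS.

    "Downstream" follows the direction of transcription, so coordinates
    increase on the "+" strand and decrease on the "-" strand.
    """
    if strand == "+":
        offset = position - tss
    elif strand == "-":
        offset = tss - position
    else:
        raise ValueError("strand must be '+' or '-'")
    return 0 <= offset < window


# Hypothetical gene on the "-" strand with its TSS at position 50,000.
mutations = [49_900, 49_001, 48_500, 50_200]
flags = [in_ashm_window(pos, tss=50_000, strand="-") for pos in mutations]
print(flags)  # [True, True, False, False]
```

A real analysis would also need to confirm that the gene is actively transcribed, since the window is only expected to be targeted in expressed genes.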
Figure 1.5: B-cell development and germinal centre B-cell lymphomas. This is Figure 2 adapted with permission from Basso et al., 2015.66
1.2.2 Role of MYC
The MYC gene encodes the transcription factor MYC, which is estimated to regulate
up to 20% of all human genes.67 These target genes have roles in several important
biological processes (many of which are relevant to cancer), including cell cycle control,
cell growth and metabolism, and angiogenesis.68 On the other hand, MYC also sensitizes
cells to apoptosis, presumably to keep cells in check by tempering uncontrolled
proliferation with cell death.68,69 In B-cell development, MYC serves as an inducer of cell
division under specific circumstances. MYC is mostly absent in B cells, in large part owing
to its transcriptional repression by BCL6.70,71 However, MYC is briefly expressed when B
cells enter the dark zone, either upon initial entry into the germinal centre or during cyclic
re-entry.71
In BL, MYC plays a central role in initiating and maintaining tumour growth. Originally
described in 1972, cytogenetic aberrations affecting chromosome 8 were considered a
genetic hallmark of BL.72–74 A decade after their discovery, the target of these genomic
rearrangements was identified as MYC, a human homolog of the viral transforming
v-myc gene.75,76 More specifically, these translocations put MYC in proximity to one of the
three IG loci and thus under the control of strong IG enhancers. They also tend to
uncouple MYC expression from BCL6 repression by removing BCL6 binding sites in the
MYC promoter.71 The role of these translocations in lymphomagenesis was confirmed
when transgenic (Eμ-Myc) mice, in which MYC expression is coupled with an IG
enhancer, developed aggressive lymphomas.77 In human and murine tumours, these translocations
cause constitutive expression of MYC, thereby promoting cell growth and proliferation.
Deregulated MYC activity also promotes the apoptosis pathway which, if not disrupted,
should lead to cell death.68 This safeguard may explain the latent period of up to five
months before tumour formation seen in the Eμ-Myc mouse model. The requirement for
abrogating apoptosis a priori is also consistent with the lack of IG-MYC translocations
found in circulating B cells in healthy individuals.78 On the other hand, IG-BCL2
translocations are found in circulating B cells, suggesting this is a MYC-specific effect.
Hence, additional genetic or molecular events are required to cooperate with MYC to give
rise to BL tumours.
The distribution of chromosomal breakpoints in the MYC and IG loci provides clues to the
origin of these oncogenic translocations. Notably, the MYC breakpoints exhibit a different
pattern in sporadic and endemic cases.79,80 In sBL, the breakpoints are in close proximity
to the MYC TSS, with many overlapping the first exon or the first intron. In contrast, eBL
exhibits a more diffuse distribution of breakpoints, which span a 1-Mbp region centred on
MYC, with a minority of translocations occurring near the TSS. The large distances
between the breakpoint and the target oncogene seen in lymphomas seem compatible
with the capability of IG enhancers to induce long-range epigenetic
reprogramming.81
MYC translocations in BL mostly involve one of the three IG loci.38 Each locus is partnered
with MYC in roughly the same proportions in endemic and sporadic BL. The IGH locus on
chromosome 14 is the most commonly involved, translocated with MYC in roughly 80% of
BL cases. The IG loci encoding the light chains IGK and IGL on chromosomes 2 and 22,
respectively, account for the remaining 20% of translocations. The IGH breakpoints were
initially thought to also segregate differently among the clinical variants.82 However,
several studies later demonstrated that the association between breakpoint location in
IGH and geographic origin was much weaker than initially estimated.80,83–86
More precisely, the breakpoints in IGH mostly affect the switch regions, which are
involved in class switch recombination.39 The purpose of class switch recombination is to
swap the constant (C) portion of the IG while maintaining the same variable VDJ
sequence, which is responsible for binding the antigen. This is accomplished by
introducing double-strand DNA breaks in the switch regions, removing the intervening
DNA, and repairing the break via non-homologous end joining. These double-strand DNA
breaks are mediated by AICDA, the same enzyme responsible for SHM. During aSHM,
AICDA can cause the formation of oncogenic MYC translocations, implicating the enzyme
in BL pathogenesis.87
1.2.3 Known genetic and molecular aberrations
Whereas MYC is a potent proto-oncogene, animal models demonstrated that MYC
deregulation is insufficient for triggering lymphomagenesis, indicating the existence of
additional aetiological factors.88 The involvement of EBV and malaria in BL pathogenesis
is strongly suspected and is discussed below, but these environmental factors cannot
account for all BL cases given the existence of EBV-negative cases outside of
malaria-endemic regions. Over the past three decades, significant progress has been
made in our understanding of the genetic and molecular underpinnings of BL (Figure
1.6).
Figure 1.6: Molecular pathways contributing to BL pathogenesis. The encoded proteins of recurrently mutated genes are highlighted in colour (red, oncogenes; blue, tumour suppressors). The percentages indicate the fraction of BL cases with mutations affecting the associated genes. This is Figure 4 adapted with permission from Pasqualucci, 2019.89
Soon after TP53 was identified as a tumour-suppressor gene in 1989, it was found
recurrently mutated in BL.90 This observation is consistent with the critical role the gene
plays in apoptosis given how MYC deregulation predisposes cells to programmed cell
death. Considering the aforementioned latency observed in Eμ-Myc mice, the involvement
of other genes that regulate apoptosis was investigated. Notably, the homologs for TP53
(Tp53) and CDKN2A (Cdkn2a) were often mutated in the murine tumours in addition to
having increased expression of the MDM2 homolog (Mdm2).91 Cdkn2a encodes a tumour
suppressor capable of inducing G1/S cell-cycle arrest and apoptosis, while Mdm2 is an
oncogene whose product is capable of promoting the degradation of Tp53 protein. A
concurrent study demonstrated an accelerated disease progression in Eμ-Myc mice when
they were crossed with mice bearing Tp53 or Cdkn2a mutations.92 Mutations in CDKN2A
and overexpression of MDM2 were later confirmed in human BL cell lines.93
In 2012, several high-throughput sequencing studies provided a comprehensive
description of the landscape of somatic mutations in BL.94–97 A number of additional
genes were implicated in BL pathogenesis, some having established roles in other
malignancies and others remaining uncharacterized. For instance, CCND3, which
encodes a D-type cyclin, was found to be commonly mutated in BL, especially among
sporadic cases.94,96,97 CCND3 functions by regulating the G1/S transition and promoting
cell-cycle progression. Variants in CCND3 exclusively affect the carboxyl terminus of the
encoded protein and many of these mutations cause premature truncation of the protein.
Mutation clusters are a hallmark feature of oncogenes but truncating mutations are more
commonly a feature of tumour suppressor genes. In this case, functional work
demonstrated that the missense mutations and truncating mutations in this region
promote the stability of CCND3 protein.94
In these large sequencing studies, TCF3 and its negative regulator, ID3, were also
identified as recurrently mutated in BL.94,96,97 TCF3 encodes a transcription factor with
a central role in B-cell development, most notably by modulating IG gene expression.
Mutations in TCF3 are strictly missense and target the basic helix-loop-helix domain of
the E47 transcript isoform while the corresponding domain of the E12 isoform remains
unaffected. These alterations were shown to result in higher E47 transcript levels, thereby
promoting activity.94 On the other hand, mutations in ID3 are not only more frequent but
include several that are predicted to truncate and deactivate the protein, consistent with
its role as a tumour suppressor. Mutations in ID3 or TCF3 increase BCR signalling by
inducing IG expression and repressing PTPN6, which encodes a phosphatase (SHP-1)
that dampens BCR signalling.98 In turn, increased BCR activity promotes
phosphoinositide 3-kinase (PI3K) signalling in a growth-promoting pathway termed “tonic”
BCR signalling that is largely antigen-independent. Moreover, TCF3 also induces CCND3
expression, exerting additional pressure on cell-cycle progression.
Other less frequent genetic lesions capable of activating PI3K signalling in BL include
deactivating mutations in PTEN, an established tumour-suppressor gene with an
inhibitory role in PI3K signalling, and focal amplifications of the MIR17HG locus, which
encodes microRNAs (miRNAs) capable of reducing PTEN translation.98 Alterations in
FOXO1 may be linked to the interplay between FOXO1 and the PI3K pathway, but the
exact effect of the mutations is still under investigation.99,100 The evident role of PI3K
signalling in BL pathogenesis may present a therapeutic opportunity and justifies the clinical
investigation of the use of inhibitors for PI3K, Syk and Src family kinases.94
PI3K signalling is also activated by mutations affecting the GNA13 signalling pathway.
Functional experiments have demonstrated that these variants can deregulate AKT, a key
component of the PI3K pathway.101 These mutations also resulted in a lack of
confinement of germinal centre B cells, which may be associated with increased disease
dissemination. In BL, the most commonly mutated genes in this pathway are GNA13, encoding
a guanine nucleotide-binding protein (G protein), and P2RY8, encoding an associated G
protein-coupled receptor. Inactivating mutations in RHOA, a downstream target of GNA13
signalling, are thought to have similar consequences on the pathway.
Another set of genes with recurrent mutations is ARID1A and SMARCA4, both encoding
components of the switch/sucrose non-fermentable (SWI/SNF) complex.94–97 This
complex regulates gene expression by repositioning nucleosomes along DNA, thereby
facilitating transcription factor binding.102 At first glance, the mutation pattern in both
genes suggests that they are tumour suppressors, consistent with their role in other
malignancies. Beyond that though, the mechanism of action of these mutations in BL
remains unclear. The same can be said of DDX3X, another tumour suppressor gene
commonly mutated in BL whose role in pathogenesis is unknown.94,97 The gene encodes
an RNA helicase and is located on chromosome X, which may account for the relatively
high male-to-female ratio mentioned earlier. Its structural homologue on
chromosome Y, DDX3Y, shares roughly 90% sequence identity but its expression is
restricted to male germline cells, suggesting a role distinct from that of DDX3X.103 DDX3X
mutations have been described in other EBV-associated cancers, such as natural
killer/T-cell lymphoma, which suggests a function related to the virus.104 Additional
investigation is required to elucidate the consequences of mutations in these genes.
While they may not be readily targetable due to being tumour suppressor genes,
mutations affecting ARID1A, SMARCA4, or DDX3X could potentially be exploited for
synthetic lethal interactions with other genes.
1.2.4 Epstein–Barr virus
Since its discovery in BL, EBV has been linked with two lymphoproliferative diseases and
at least seven additional cancer types, mostly involving lymphocytes and epithelial
cells.105 Today, an estimated 200,000 cancer cases per year are attributable to EBV
infection.106 Yet, despite being the first virus to be associated with cancer, the underlying
mechanisms that promote tumour formation remain poorly understood.
For decades, the epidemiological evidence presented earlier in this chapter provided the
strongest case for an oncogenic role for EBV in BL pathogenesis with little support from
functional studies.107 In the early 1970s, the direct capability of EBV to transform B cells
was confirmed when the virus was used to immortalize B cells in vitro to form lymphoblastoid
cell lines (LCLs).108 A pivotal point in EBV research was also reached in 1984 with the
publication of the viral genome sequence, enabling new molecular analyses.109 Despite
the experimental utility of LCLs, EBV gene expression in vitro differs greatly from that in
vivo, which has complicated the search for a reliable and representative in vitro model
system for EBV-positive BL.110
The observed variation in EBV gene expression ultimately led to the identification of
different EBV gene expression programs associated with distinct latency states. LCLs
express all latent genes, a program defined as Latency III.111 In contrast, EBV-positive BL
tumours only express EBNA1 and some non-coding genes including EBER1 and EBER2, a
program termed Latency I.110,112 EBV gene expression in BL cells is presumably restricted in
immunocompetent patients to avoid detection by the immune system. Additional latency
programs such as Latency IIa and IIb that express an intermediate number of genes are
observed in other contexts.111 These expression differences highlight the limitation of
EBV-positive cell lines for studying the role of EBV in BL. For example, the EBV genes
EBNA2 and LMP1 were deemed essential for transformation in vitro for LCLs, and yet
they are not detected in clinical BL samples.113 Furthermore, while EBNA1 is the only
EBV protein expressed in BL, it does not seem critical for B cell immortalization in vitro.114 It
thus appears that the mechanisms by which EBV promotes transformation are not entirely
consistent across contexts.
A breakthrough was made when the EBV-positive Akata cell line was generated from a
BL sample.115 Unlike with previous cell lines, researchers could derive a viable EBV-negative
clone, which allowed for comparative studies.116 As expected, the EBV-positive clones
were relatively more malignant than their EBV-negative counterparts, in part due to
increased resistance to apoptosis.116–119 Later, EBNA1 was found to promote survival in
BL cell lines by inhibiting apoptosis in an EBER-independent manner.120 The importance
of this gene was also demonstrated in transgenic mice expressing EBNA1 in B cells,
although this finding remains controversial.107,121 While EBNA1 is the only consistently
expressed protein-coding gene in BL, heterogeneous EBV gene expression has been
reported by multiple studies.122 For instance, LMP1 and LMP2 were shown to be
transiently expressed in BL and may have similar oncogenic roles as in LCLs.123,124 That
being said, it is reasonable to focus on the role of EBNA1 given its universal presence in
EBV-positive BL, which makes it a prime target for therapy.
The role of non-coding genes that are expressed alongside EBNA1 in BL has also been
explored. For instance, the EBER genes do not seem essential for the generation of
LCLs.113 On the other hand, they promote tumourigenicity in BL cell lines, although the
underlying mechanism remains elusive.118,125,126 Some studies have shown an inhibitory
effect on the human PKR protein, which in turn represses interferon-α-induced apoptosis,
but these findings have been challenged.126,127 Alternatively, the EBER transcripts appear
responsible for increasing levels of the cytokine interleukin-10 (IL-10) seen in EBV-positive
tumours.128 Not only could this result in growth-promoting autocrine signalling, but IL-10
can promote tumour growth through immune evasion by attracting macrophages to engulf
apoptotic cells.129 That being said, studies have shown similar increases in IL-10 levels
due to malaria, so the culprit for this molecular change remains unclear.130,131
Other studies have shown that EBV can have an impact on miRNA-mediated regulation
through cellular or viral miRNAs, which may promote lymphomagenesis. For example,
hsa-miR-127 was found to be upregulated in EBV-positive tumours, although the
mechanism for upregulation was not explored.132 The authors proposed a model whereby
EBV increases the expression of hsa-miR-127, which in turn mediates B-cell
differentiation by degrading PRDM1 (i.e. BLIMP-1) and XBP1 transcripts. Another study
demonstrated a role for a subset of EBV miRNAs in suppressing apoptosis, possibly
through direct post-transcriptional regulation of the pro-apoptotic protein CASP3.133 These
potential miRNA:mRNA interactions will likely continue to be identified as more miRNA
and RNA sequencing data are generated, providing a broader perspective on the effects
of EBV on the BL transcriptome.134
Another compelling, albeit controversial, effect of EBV on BL genomes is the activation of
AICDA and the ensuing aSHM.135 In infectious mononucleosis patients, EBV-positive B
cells from the peripheral blood had more active SHM than their EBV-negative
counterparts.136 In vitro, EBV caused an increase in AICDA expression in B cells, which
had the notable consequence of introducing mutations in cancer genes such as
TP53.137,138 These in vitro studies are consistent with results from BL tumour sequencing,
which have shown an increased number of mutations in the IG loci of EBV-positive
tumours.139 The underlying mechanism of this effect has been the focus of more recent
studies, with some attributing the increase to the EBV gene LMP1 and others attributing
it to EBNA3C.140,141 These proposed mechanisms must be reconciled with the fact that
these EBV genes are not consistently expressed, or at least detected, in BL. In contrast,
one study showed relatively lower AICDA activity in EBV-positive cells, but unlike BL
tumour cells, these cells also expressed EBNA2, limiting the relevance of this
finding.112,142,143 Overall, it appears that the viral effect on AICDA depends on the context,
as is the case with many other aspects of EBV.
EBV is clonal in BL tumours. While this is consistent with an early role in tumourigenesis, a
late but strong influence on tumour growth cannot be excluded, as the two scenarios would
be hard to distinguish in bulk tumour sequencing.144 The inhibition of apoptosis mediated by EBV
would ostensibly benefit the formation of BL by removing the safeguard in place
preventing uncontrolled MYC-driven proliferation. Furthermore, disrupting apoptosis
would also allow cells harbouring double-strand DNA breaks to survive, thereby
permitting the accumulation of potential driver mutations.49 It is
generally thought that EBV infection of B cells occurs before the MYC translocation
arises.122 If EBV also activates AICDA, its presence would also increase the likelihood of
forming the oncogenic translocation. Additionally, given the continual cell proliferation
seen in BL and that the EBV episome can be spontaneously lost during cell division, the
virus would be expected to disappear completely from the tumour unless it conferred a
selective advantage.145,146 In other words, any
EBV-negative tumour cells that result spontaneously from loss of the EBV genome during
cell division are presumably outcompeted by the EBV-positive cells. Therefore,
EBV-negative tumours must rely on alternative EBV-independent mechanisms to achieve
similar effects, which may be more difficult to attain and could explain the lower incidence
of EBV-negative BL.107
1.2.5 Malaria
It is a matter of debate whether malaria has a direct effect on BL tumourigenesis or an
indirect effect by altering the host environment. Research into the role of malaria in BL
pathogenesis has been hampered by the lack of adequate model systems for BL. Early
on, an aetiological link was supported by in vivo mouse models that formed lymphoma
tumours resembling BL histologically upon infection with malaria.147 The intensity of
malarial infection correlated with the frequency of spontaneous tumour formation.147
Additionally, prior infection with malaria predisposed mice to developing lymphoma
tumours after being inoculated with cell-free tumour extract derived from murine
lymphomas.147 The rationale for inoculation was that the tumour extract may contain
factors such as viruses that promote lymphomagenesis. Indeed, mice treated with the
cell-free tumour extract more frequently developed lymphomas. These results reveal a
possible synergy between malaria and a component of the tumour extract, potentially viral
in nature. The model for BL formation evolved to consider the impact of malaria on
lymphoid tissue, but in vivo experiments that dissect the individual role of each pathogen
are sparse.25 What remains certain is that the presence of both malaria and EBV infection
leads to an increased risk of BL, but the molecular nature of this host-environment
interaction remains elusive to this day.
Some have argued that the increase in AICDA expression observed in endemic BL is
primarily due to malaria infection and the more important role of EBV is to suppress
apoptosis.49 Evidence supporting this effect of malaria on AICDA is steadily
accumulating.148,149 The mechanism has not been fully characterized yet, but one
possibility is the activation of Toll-like receptors on B cells by malaria-associated agonists
such as haemozoin, which in turn induces AICDA expression.49 Interestingly, a
synergistic effect between malaria and EBV on AICDA expression has been described,
whereby the EBV load in the blood is correlated with AICDA levels in patients from
malaria-endemic regions, but this correlation ceases to exist in patients from areas of
low exposure to malaria.149 The underlying reason for this compounded effect on AICDA
activity remains unknown, but this has led many to suspect an interaction between
malaria and EBV.29,150
Additional evidence for synergy arises when malaria interacts with the immune system. It
is commonly thought that malaria infection chronically activates the B-cell system, thereby
increasing the number of B cells transiting through the germinal centre and heightening
the risk for MYC translocations.107,148 In this process, EBV-infected cells are preferentially
expanded in the germinal centre, exacerbating the risk for BL formation.123,148 The
underlying mechanism of this interaction remains uncertain, but some work has shown
that a malarial protein, CIDR1α, is capable of inducing lytic reactivation of EBV-infected
memory B cells.151 Interestingly, CIDR1α can also activate pathways that result in
suppression of apoptosis, which may be relevant to BL pathogenesis.152
Lastly, another potential contribution of malaria to BL formation is the resulting T-cell
immunosuppression that is seen during acute malarial infection, which provides a window
of opportunity for EBV-infected B cells to proliferate.153–155 Indeed, the number of
EBV-infected cells in circulation is significantly higher in children during and following an
acute episode of malaria.156 Hence, the clear geographic association between BL
incidence and the distribution of malaria parasites might simply be due to the ability of
malaria to “distract” the immune system enough to allow EBV to infect more cells and/or
to permit broader gene expression programs, which are known to be oncogenic in the in
vitro setting.157 Under this model, I expect EBV infection to immortalize some B cells first;
then, malaria facilitates the expansion of EBV-infected B cells; and finally, this increase in
EBV-infected B cells correlates with the risk of forming a MYC translocation.158
1.3 Problem statement and thesis overview
Although paediatric BL can be effectively cured, this is only true for privileged patients
with access to proper supportive care, who mostly consist of children with sporadic BL.
Prognosis for children with endemic BL remains dismal. The severe toxicity of current
treatment regimens also needs to be considered because it is thought to be a major
contributor to the lack of success of treatment in endemic and adult sporadic BL. There is
thus an urgent need to advance our understanding of BL pathogenesis, especially in the
comparative setting, in order to identify new potential therapeutic targets. Several open
questions exist in the literature regarding BL. While many of these questions are not
conclusively addressed herein, this thesis presents key advancements in our knowledge
of BL biology and provides support for long-standing hypotheses.
In this work, I aimed to characterize the genetic and molecular landscape of paediatric
sporadic and endemic BL. I do not consider adult cases or immunodeficiency-associated
cases. I focus on the mutational landscape and, to a lesser degree, gene expression
profiling by leveraging whole genome and transcriptome sequencing datasets. The
hypotheses underpinning this thesis are: (1) hitherto uncharacterized features of BL
genomes and transcriptomes may provide novel insight into BL biology and open up new
avenues for targeted therapy; and (2) molecular features of BL vary primarily based on
tumour EBV status (and potentially EBV genome type) rather than geographic origin and
thus treatment should be tailored accordingly. These hypotheses are investigated in
Chapters 2 and 3, respectively. In Chapter 2, I will describe novel features of BL genomes
and extend previously made observations. In Chapter 3, I will demonstrate the importance
of tumour EBV status relative to geographic origin in determining features with likely roles
in pathogenesis. Finally, I will discuss these findings in Chapter 4 and explore potential
avenues for future research in this disease implied by this work.
Chapter 2
Discovery of genetic and molecular aberrations in BL
2.1 Introduction
Modern technologies such as high-throughput sequencing have greatly accelerated the
pace and scale at which researchers can characterize the genetic and molecular features
of cancer. Identifying these features can provide pivotal insight into the mechanisms
underlying tumour initiation and progression. In turn, an improved understanding of
disease aetiology can pave the way for the development of more efficient and/or less toxic
treatments, often by virtue of targeting specific features of malignant cells.
Since 2012, a number of published studies have analysed the BL genome and
transcriptome using high-throughput sequencing.94,96,97,159–165 Despite this volume of
work, several open questions regarding BL pathogenesis remain, as laid out in Chapter 1.
This owes in part to the technological and sample limitations of past studies. A majority of
patient cohorts whose BL samples underwent sequencing were small and most lacked
enough endemic and/or EBV-positive cases to provide sufficient
statistical power. Additionally, some of these studies relied heavily on tumour-only RNA or
exome sequencing data. While cost-effective, this sequencing strategy poses several
constraints on downstream analyses. First, the lack of matched normal data greatly
complicates the distinction between somatic and germline variants. This is especially
difficult for endemic cases because the African population features more germline
polymorphisms that are underrepresented in current databases. Second, because
coverage in RNA and exome sequencing is biased towards exonic regions, this
naturally limits the number and type of mutations that can be detected. Third, with RNA
sequencing (RNA-seq) data, variable gene expression may reduce the sensitivity for
variant detection, especially for genes with lower expression and for loss-of-function
mutations that result in nonsense-mediated decay. Fourth, the possibility of physiologic or
aberrant RNA editing adds a further layer of complexity for identifying true somatic
mutations.
To more comprehensively study the molecular aetiology of BL and specifically overcome
these limitations, the Burkitt Lymphoma Genome Sequencing Project (BLGSP)
assembled a comprehensive patient cohort and subjected the samples to both whole genome
and transcriptome sequencing. The discovery and validation cohorts feature endemic and
sporadic cases as well as EBV-positive and EBV-negative tumours, allowing for their
comparison, detailed in Chapter 3. Whole genome sequencing (WGS) was performed on
tumour and germline DNA for the accurate detection of somatic mutations in all cases.
Compared to RNA and exome sequencing, WGS enables the identification of noncoding
variants in intronic and intergenic regions as well as more accurate copy number
variations (CNVs) and structural variations (SVs). Another key difference with this dataset
is that library preparation for RNA sequencing relied on ribosomal RNA (rRNA) depletion
from total RNA rather than poly(A) RNA enrichment. This theoretically permits the
profiling of noncoding RNAs (ncRNAs) regardless of the presence of a poly(A) tail and
allows the quantification of all EBV transcripts. In short, at the outset of this project, I
gained access to an unprecedented data set and was thereby poised to discover genetic
and molecular features of BL that could not be detected in previous studies.
DLBCL shares some genetic features with BL and has been studied more rigorously
using genomic techniques. Two main gene expression subtypes exist, which roughly
correspond to the presumed cell-of-origin, namely germinal-centre B-cell DLBCL and
activated B-cell DLBCL. Of these two subtypes, germinal-centre B-cell DLBCL is
considered the most similar to BL at the molecular level, because both of these
lymphomas are thought to derive from germinal centre B cells. This gene expression
subtype has recently been divided further into two groups based on additional gene
expression features.166,167 Given the growing understanding of DLBCL and shared
molecular features, the molecular and genetic relationship between BL and the subgroups
of DLBCL should be investigated further. While both BL and DLBCL are considered to be
aggressive B-cell non-Hodgkin lymphomas, their aetiology, prognosis, and response to
treatment are distinct and can be the focus of further study.
In this chapter, I set out to discover novel genetic and molecular features of BL by
leveraging the most comprehensive genomic dataset to date. Briefly, five genes were
associated with BL for the first time and help nominate potential novel therapeutic
opportunities. WGS also enabled the identification of discrete regions enriched in
noncoding mutations, which may be disrupting regulatory elements. Four mutational
signatures were discerned de novo, shedding light on the underlying mechanisms
responsible for mutagenesis in BL. Lastly, IG V gene usage was assessed using RNA-seq
data, clearly demonstrating nonuniform V gene usage. In summary, this chapter provides
an exhaustive description of the mutational landscape of paediatric BL.
2.2 Results
2.2.1 Clinical and molecular characteristics of BL cases
All cases considered here were less than 21 years old at diagnosis and thus deemed
paediatric. The discovery cohort consisted of 106 BL cases: 74 endemic BL (eBL) cases
from Uganda and 32 sporadic BL (sBL) cases from the United States and Germany. The
Ugandan and American cases were accrued for the BLGSP. The 15 German cases were
accrued for the International Cancer Genome Consortium (ICGC) Molecular Mechanisms
in Malignant Lymphoma by Sequencing (MMML-Seq) project. The ICGC cases were
included in some analyses to increase the number of sBL cases. Both projects generated
WGS and RNA-seq data. I had access to WGS data for both tumour and normal tissue
and RNA-seq data for tumour tissue. However, I did not utilize the ICGC RNA-seq data to
avoid technical sources of variation (or “batch effects”) due to differences in sample
handling, library preparation method, and sequencing protocols. The clinical and
molecular characteristics of the discovery cohort are summarized in Table 2.1. Patient
metadata are presented per case in the discovery and validation cohorts in Supplemental
Table 1 of Appendix A.

Table 2.1: Summary of clinical and molecular characteristics of the discovery cohort. Cases from the BLGSP and the ICGC are shown separately. FF, fresh frozen tissue; FFPE, formalin-fixed paraffin-embedded tissue; BM, bone marrow; CNS, central nervous system.

Variable               Level                                         BLGSP (n=91)  ICGC (n=15)  Total (n=106)
Sex                    Female                                        32 (35%)      1 (7%)       33 (31%)
                       Male                                          59 (65%)      14 (93%)     73 (69%)
Clinical variant       Endemic BL                                    74 (81%)      0 (0%)       74 (70%)
                       Sporadic BL                                   17 (19%)      15 (100%)    32 (30%)
EBV status             EBV-positive                                  71 (78%)      0 (0%)       71 (67%)
                       EBV-negative                                  20 (22%)      15 (100%)    35 (33%)
EBV type               EBV type 1                                    59 (65%)      0 (0%)       59 (56%)
                       EBV type 2                                    12 (13%)      0 (0%)       12 (11%)
                       EBV-negative                                  20 (22%)      15 (100%)    35 (33%)
Age group              0–5 yr                                        21 (23%)      6 (40%)      27 (25%)
                       6–10 yr                                       50 (55%)      5 (33%)      55 (52%)
                       11–15 yr                                      18 (20%)      2 (13%)      20 (19%)
                       16–20 yr                                      2 (2%)        2 (13%)      4 (4%)
Tumor biopsy           FF                                            88 (97%)      15 (100%)    103 (97%)
                       FFPE                                          3 (3%)        0 (0%)       3 (3%)
IG-MYC translocations  IGH-MYC                                       74 (81%)      11 (73%)     85 (80%)
                       IGL-MYC                                       8 (9%)        3 (20%)      11 (10%)
                       IGK-MYC                                       7 (8%)        1 (7%)       8 (8%)
                       Other                                         2 (2%)        0 (0%)       2 (2%)
IG isotype             IgM                                           63 (69%)      0 (0%)       63 (59%)
                       IgG                                           11 (12%)      0 (0%)       11 (10%)
                       Undetectable                                  17 (19%)      15 (100%)    32 (30%)
Anatomic site          Head-only disease                             29 (32%)      0 (0%)       29 (27%)
                       Intra-abdominal disease                       16 (18%)      0 (0%)       16 (15%)
                       Disseminated disease (no BM/CNS involvement)  36 (40%)      0 (0%)       36 (34%)
                       Disseminated disease (BM/CNS involvement)     8 (9%)        0 (0%)       8 (8%)
                       Unknown                                       2 (2%)        15 (100%)    17 (16%)

Cases that failed the strict criteria for qualifying for the BLGSP discovery cohort were
included in the BLGSP validation cohort instead. The validation cohort consisted of 29 BL
cases: 24 eBL from Uganda and 5 sBL cases from the United States. Instead of WGS,
these cases were subjected to targeted DNA sequencing of recurrently mutated regions
in addition to RNA-seq using the same protocol as the discovery cohort. The clinical and
molecular characteristics of the validation cohort are summarized in Table 2.2.

Table 2.2: Summary of clinical and molecular characteristics of the validation cohort.

Variable          Level                                         BLGSP (n=29)
Sex               Female                                        11 (38%)
                  Male                                          18 (62%)
Clinical variant  Endemic BL                                    24 (83%)
                  Sporadic BL                                   5 (17%)
EBV status        EBV-positive                                  23 (79%)
                  EBV-negative                                  6 (21%)
EBV type          EBV type 1                                    22 (76%)
                  EBV type 2                                    1 (3%)
                  EBV-negative                                  6 (21%)
Age group         0–5 yr                                        7 (24%)
                  6–10 yr                                       19 (66%)
                  11–15 yr                                      1 (3%)
                  16–20 yr                                      1 (3%)
                  Unknown                                       1 (3%)
Tumor biopsy      FF                                            29 (100%)
                  FFPE                                          0 (0%)
IG isotype        IgM                                           19 (66%)
                  IgG                                           1 (3%)
                  IgA                                           1 (3%)
                  Undetectable                                  8 (28%)
Anatomic site     Head-only disease                             6 (21%)
                  Intra-abdominal disease                       14 (48%)
                  Disseminated disease (no BM/CNS involvement)  4 (14%)
                  Disseminated disease (BM/CNS involvement)     1 (3%)
                  Unknown                                       4 (14%)

The BLGSP tumour and matched normal genomes were sequenced to an average
nonredundant depth of 82X (range 55–96) and 41X (range 30–51), respectively. The
ICGC tumour and normal genomes were sequenced to a lower depth of 40X (range
29–62). Because of their lower sequencing coverage, the ICGC genomes had fewer
mutations on average, presumably due to limited sensitivity for mutation detection. For this
reason, I omitted the ICGC cases from analyses relating to global mutation rates, which
would likely be affected by this technical variable. The BLGSP validation tumour and
normal samples were sequenced to a greater depth of 243X (range 158–392).
To complement the transcriptome data from the discovery and validation cohorts, I
included RNA-seq data from a small group of healthy tonsil donors (“tonsil cohort”), which
were accrued through the BLGSP. Both centroblasts and centrocytes were cell-sorted
from the tonsils and separately underwent RNA-seq, yielding six libraries for each cell
type. Derived from the germinal centre, centroblasts and centrocytes are considered the
closest cell-of-origin for BL and thus the most appropriate normal comparator for gene
expression. Specifically, centroblasts were selected for CD19+, CD38+, IgD–, CXCR4+,
and CD83–, whereas centrocytes were selected for CD19+, CD38+, IgD–, CXCR4–, and
CD83+. The BLGSP tumour and tonsil RNA-seq datasets had 200M (range 100–289M)
and 219M reads (range 204–240M) on average, respectively.
2.2.2 Data-driven inference of tumour EBV status and genome type
The EBV genome encodes two small noncoding RNA genes called EBER1 and EBER2,
which are both highly expressed in host cells. EBER in situ hybridization (ISH) is the
standard clinical assay for determining tumour EBV status. However, for most cases
analyzed here, EBER ISH was not performed. As a result, tumour EBV status was
inferred from the raw sequencing data using two different methods. First, I calculated the
fraction of WGS reads that aligned to the EBV genome (Figure 2.1A). Second, to emulate
EBER ISH, I counted the number of RNA-seq reads aligning to the EBER1 and EBER2
genomic loci (Figure 2.1B). Both approaches yielded a clear bimodal distribution, which
was taken to represent the EBV-positive and EBV-negative cases. Importantly, the two
methods agreed with one another for every case. Additionally, the inferred tumour EBV
status was concordant with available results from EBER ISH (N = 5) or EBV PCR (N = 1).
Ultimately, the discovery cohort had 71 (67%) EBV-positive cases and 35 (33%)
EBV-negative cases. I also determined the EBV genome type (i.e. type 1 or type 2) where
applicable (Figure 2.1C). Out of 71 EBV-positive cases, EBV type 1 and type 2 were
found in 59 (83%) and 12 (17%) tumours, respectively. All cases with EBV type 2 were
endemic (i.e. from Uganda).
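The threshold-based calls described above can be sketched as follows. This is a minimal illustration rather than the pipeline used in the thesis: the three cut-offs are those reported in the Figure 2.1 caption, whereas the function names, the use of inclusive (>=) comparisons, and the pseudocount in the k-mer ratio are assumptions introduced here.

```python
# Cut-offs as reported in the Figure 2.1 caption; all names are hypothetical.
WGS_EBV_MIN = 0.006         # minimum EBV share of WGS reads (units as in Figure 2.1A)
EBER_READ_COUNT_MIN = 250   # minimum RNA-seq read count for EBER1 and EBER2
TYPE1_KMER_RATIO_MIN = 1.0  # minimum type 1 / type 2 unique k-mer count ratio


def infer_ebv_status(wgs_ebv_share, eber_read_count=None):
    """Call EBV status from WGS and, when available, cross-check with RNA-seq."""
    positive_by_wgs = wgs_ebv_share >= WGS_EBV_MIN
    if eber_read_count is not None:
        positive_by_rna = eber_read_count >= EBER_READ_COUNT_MIN
        if positive_by_wgs != positive_by_rna:
            # The text reports full agreement; a mismatch would need manual review.
            raise ValueError("WGS-based and RNA-seq-based EBV calls disagree")
    return "EBV-positive" if positive_by_wgs else "EBV-negative"


def infer_ebv_type(type1_kmer_count, type2_kmer_count):
    """Call the EBV genome type from counts of type-specific k-mers."""
    # Assumption: a pseudocount of 1 avoids division by zero.
    ratio = (type1_kmer_count + 1) / (type2_kmer_count + 1)
    return "EBV type 1" if ratio >= TYPE1_KMER_RATIO_MIN else "EBV type 2"
```

Passing `eber_read_count=None` mirrors cases without usable RNA-seq data, for which only the WGS-based call is available.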
2.2.3 Structural and copy number variations affecting MYC
In the discovery cohort, 104 out of 106 tumours had detectable translocations placing
MYC in proximity to an IG enhancer (Figure 2.2). Among these tumours, IGH, IGL and
IGK were involved in the MYC rearrangements of 85 (82%), 11 (11%) and 8 (8%)
tumours, respectively. While lacking traditional IG-MYC translocations, the remaining two
tumours featured more complex rearrangements involving MYC. One of these was an sBL
case (BLGSP711900123) with a reciprocal structural variation between the MYC and
BCL6 loci that resulted in the focal gain of MYC, possibly in the form of a double minute.
The other was an eBL case (BLGSP710600277) with a complex set of translocations
rearranging MYC and IGH via an intergenic region on chromosome 17.
Figure 2.1: Molecular differences between EBV-positive and EBV-negative BL tumours. (A) Fraction of mapped reads from whole genome sequencing data that aligned to the EBV genome (log scale). The minimum threshold for calling EBV-positive samples was 0.006, indicated by the dashed line. (B) RNA-seq read counts for EBER1 and EBER2 (log scale). The minimum count for calling EBV-positive samples was 250 reads, indicated by the dashed line. A pseudocount of 1 was added to all values prior to log transformation. This excludes the ICGC cases whose RNA-seq data were not analyzed. (C) Ratio between the counts for 21-mers that are unique to EBV type 1 and type 2, respectively, calculated from whole genome sequencing reads aligned to the EBV genome. The minimum ratio for calling EBV type 1 samples was 1, indicated by the dashed line.
Axes: (A) EBV percentage of WGS reads (log) by inferred EBV infection status; (B) EBER RNA-seq read count (log) by inferred EBV infection status; (C) frequency ratio for k-mers unique to EBV type 1 and type 2 (log) by inferred EBV genome type.
In addition to translocations, other structural alterations affecting MYC were found. First, I
observed telomeric gains of chromosome 8q in six (5.7%) tumours (Figure 2.3). In these
tumours, the associated IG-MYC breakpoints were upstream of MYC, confirming the
inclusion of the proto-oncogene in the gain. These events may be the result of
unbalanced MYC translocations and further promote MYC expression. Second, focal
gains were also found in three cases (2.8%), ranging from 50 to 180 kbp. Third, one eBL
case (BLGSP710600086) had distinctive CNVs on chromosome 11q, namely high-level
gains of a region spanning 11q22.3–q23.2 followed by telomeric loss of 11q23.3–qter
(Figure 2.3). These CNVs are reminiscent of those characteristic of the new WHO entity
“Burkitt-like lymphoma with 11q aberration”, which is also defined by the lack of IG-MYC
translocations. In this case though, the 11q CNVs coexist with an IG-MYC translocation,
indicating that these events are not necessarily mutually exclusive.
Figure 2.2: Rearrangements of the immunoglobulin loci. Translocations (shown in center) between the MYC locus (chromosome 8) and the IGH (chromosome 14), IGK (chromosome 2), or IGL (chromosome 22) loci in tumours with WGS data (N = 106). The inner track displays the rainfall plot for simple somatic mutations in these regions. Mutations that overlap AICDA recognition sites (RGYW) are shown in red.
Labelled loci: CASC8, MYC and PVT1 (chromosome 8); IGK C and IGK V (chromosome 2); IGL V and IGL C (chromosome 22); IGH C and IGH V (chromosome 14).
Figure 2.3: Landscape of copy number variations. Proportions of the cohort affected by copy number gains and losses are shown in red and blue, respectively. CNVs that are smaller than 100 kbp are not displayed.
Panels: chr1 to chr22 and chrX; the Y-axis shows CNV incidence (0% to 20% in each direction).
2.2.4 Refining list of genes with potential roles in BL pathogenesis
To assemble a list of BL-associated genes (BLGs), I identified somatic single nucleotide
variants (SNVs) and small insertions/deletions (indels), collectively known as simple
somatic mutations (SSMs), from paired tumour-normal WGS data using Strelka.168
Exonic and splice-site SSMs in the discovery and validation cohorts are listed in
Supplemental Tables 2 and 3, respectively, of Appendix A. I analyzed somatic SSMs
using two separate strategies.
First, I identified significantly mutated genes in the discovery cohort using an ensemble
approach involving four complementary methods: OncodriveCLUST for identifying genes
with mutation hotspots; OncodriveFM and OncodriveFML for identifying genes with
functional mutation bias using different metrics; and MutSigCV for identifying genes that
are mutated more frequently than what is expected due to chance. To be considered
significantly mutated, a gene needed to be supported by two or more methods (Q-value <
0.1). Most genes identified through this approach have already been associated with BL,
including some recently discovered candidate BL genes such as TFAP4 and
KMT2D.163,164 I also identified genes not previously described as recurrently mutated in
BL, namely SIN3A, USP7, HIST1H1E, CHD8, and RFX7. The supporting methods for
each gene are shown in Supplemental Table 4 of Appendix A.
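The ensemble rule described above (support from at least two of the four methods at Q-value < 0.1) can be sketched as below. This is an illustrative sketch only: the input layout, the function name, and the placeholder gene names in the usage example are assumptions, and no real output from the four tools is reproduced.

```python
Q_VALUE_MAX = 0.1            # significance threshold stated in the text
MIN_SUPPORTING_METHODS = 2   # "two or more methods"


def significantly_mutated_genes(q_values_by_method):
    """Return {gene: sorted list of supporting methods} for genes passing the rule.

    q_values_by_method is a hypothetical layout: {method_name: {gene: q_value}}.
    """
    support = {}
    for method, q_values in q_values_by_method.items():
        for gene, q_value in q_values.items():
            if q_value < Q_VALUE_MAX:
                support.setdefault(gene, []).append(method)
    return {
        gene: sorted(methods)
        for gene, methods in support.items()
        if len(methods) >= MIN_SUPPORTING_METHODS
    }


# Usage with made-up Q-values and placeholder gene names (not thesis data).
toy = {
    "OncodriveCLUST": {"GENE_A": 0.01, "GENE_B": 0.50},
    "OncodriveFM": {"GENE_A": 0.04, "GENE_B": 0.02},
    "OncodriveFML": {"GENE_A": 0.30},
    "MutSigCV": {"GENE_B": 0.20, "GENE_C": 0.001},
}
result = significantly_mutated_genes(toy)
```

Keeping the list of supporting methods per gene, rather than only a count, corresponds to the per-gene breakdown referred to in Supplemental Table 4.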
Second, I employed more lenient criteria whereby genes previously reported as
recurrently mutated in BL were also considered BLGs if they were altered in at least five
cases of the discovery cohort. This approach led to the inclusion of MYC, MIR17HG,
CDKN2A, and PTEN as BLGs. In total, I identified 27 BLGs and organized them into
groups of related genes (Figure 2.4). In addition to SSMs, I also considered CNVs and
SVs affecting BLGs, which are listed in Supplemental Tables 5 and 6, respectively, of
Appendix A. The mutation status for each BLG and pathway per sample is summarized in
Supplemental Table 7 of Appendix A.
At least 74 genes have been previously reported as candidate BL genes but are not
featured on my list of BLGs.94,96,97,159–165 Out of these genes, only two were discussed in
more than one of these publications: CREBBP and CARD11. Both are considered
DLBCL-associated genes; they are mutated in one (0.94%) and three (2.8%) cases,
respectively, and thus do not meet my criterion for being considered a bona fide BLG. The
remaining 72 genes are mutated in at most four (3.8%) cases with the exception of RYR2,
which is mutated in seven (6.6%) cases. I did not include RYR2 as a BLG given its large
size and known status as a false positive significantly mutated gene.169 Given the lack of
support for the remaining genes, I presume that most of these are affected by passenger
somatic or germline mutations. As an example, CCNF was previously reported as
harbouring a somatic mutation hotspot but lacked nonsynonymous SSMs in this BL
cohort.161 While I was unable to identify any somatic mutations at the purported hotspot
position in this cohort, I did find two eBL cases with support for this variant in both the
tumour and normal DNA, strongly suggesting that this mutation is a single nucleotide
polymorphism. I also found that this variant exists in the dbSNP database. Among the
populations in the 1000 Genomes Project, the African population had the highest
alternate allele frequency, consistent with my observation that this germline variant was
only seen in eBL cases.
Figure 2.4: Landscape of nonsynonymous mutations in BLGs for the discovery and validation cohorts (N = 135). Cases are reordered for each pathway to highlight any mutual exclusivity. Mutations are colored according to their predicted consequence on the protein (i.e. mutation type) and are tabulated in the right-hand barplots. Focal gains and deletions were defined as those smaller than 1 Mbp.
Mutation types shown: missense; truncating/splicing; gain (focal); gain (large); deletion (focal); deletion (large); multiple hits. Percentage of cases altered, per pathway and per gene:
TCF3/ID3 module (44%): ID3 40%, TCF3 6.7%
BCR/PI3K signaling (34%): FOXO1 24%, MIR17HG 10%, PTEN 3.7%
MYC regulation (67%): MYC 61%, SIN3A 16%, TFAP4 9.6%
Apoptosis (44%): TP53 35%, USP7 8.1%, CDKN2A 3.7%
SWI/SNF complex (59%): ARID1A 40%, SMARCA4 19%
Epigenetic regulation (30%): KMT2D 12%, HIST1H1E 8.1%, CHD8 7.4%, BCL7A 5.9%
GPCR signaling (36%): GNA13 19%, RHOA 12%, P2RY8 8.1%
Other (78%): DDX3X 56%, FBXO11 26%, CCND3 19%, GNAI2 14%, PCBP1 14%, ETS1 9.6%, RFX7 5.9%
2.2.5 Challenges with genetic comparison between BL and DLBCL
Given the relationship between BL and DLBCL, it would be interesting to perform a
genetic comparison of somatic mutations. In a recent publication, I contributed to the
assembly and analysis of WGS data from 153 DLBCL cases.170 These two large BL and
DLBCL WGS datasets present a unique opportunity to compare the genetic features of
both diseases. However, important differences between both datasets limit the
interpretability of any findings. First, mutations were identified differently for the DLBCL
genomes compared to those detected in the BL genomes. While the methodology could
be harmonized, this represents a nontrivial task because filters to remove mutation
artifacts in BL can rely on the tumours’ relative purity and clonality. The same filters would
most likely be too aggressive for filtering mutations in DLBCL tumours, which tend to be
less pure and harbour subclonal heterogeneity. Second, the sequencing coverage is not
consistent across the DLBCL dataset, which introduces the same caveat as the ICGC BL
dataset. Namely, variable coverage is associated with varying degrees of sensitivity for
mutation detection, which limits any attempt at comparing the incidence of mutations. For
these reasons, I do not present a comparison of somatic nonsynonymous mutations
between BL and DLBCL.
2.2.6 Novel mutation patterns in BL-associated genes
By considering other mutation types more readily detected using WGS, I observed novel
mutation patterns in some BLGs and consequently, higher incidence of mutations beyond
what has been reported previously. For example, I found focal deletions or inversions
affecting DDX3X in six (5.7%) cases, all of which are predicted to disrupt the open
reading frame by affecting one or more exons (Figure 2.5). Two additional cases (2.8%)
had mutations affecting the splicing branch point of intron 6 (Ensembl transcript
ENST00000399959; Figure 2.6). Both tumours showed aberrant transcript splicing in the
RNA-seq data. Considering these novel mutation types, with the exception of MYC,
DDX3X was the most commonly mutated gene, with a total of 75 (56%) affected cases in
the discovery and validation cohorts.
Figure 2.5: Focal structural variations affecting DDX3X visualized in the Integrative Genomics Viewer (IGV). The left panel shows the deletion of an exon; the middle panel shows the deletion of the entire gene; and the right panel shows the inversion of some exons.
Figure 2.6: Somatic mutations altering a splicing branch point in DDX3X. The top panel shows the intron-exon boundary of intron 6 and somatic mutations detected in the discovery cohort; the middle panel shows the sequence context where recurrent noncoding mutations occur; and the bottom panel shows the sequence motif for the splicing branch point for reference.
The mutation pattern in GNAI2 was also clarified by my analysis. This gene is affected by
nonsilent mutations in 19 (14%) cases at one of three hotspots: G45, R179, and
K271/K272 (see Appendix B for mutation (lollipop) plots). While mutations at some of these loci
have been described before, the recurrent in-frame deletions of K272 have not been
reported.171 Analogous mutations in GNAS are known to be activating in other
cancers.172,173 Considering that the hotspot mutations in GNAI2 affect residues in
proximity of the protein GDP binding site, it is possible that they share a common function
in activating the encoded protein.
A previous report found that ID3 was enriched in mutations that overlapped AICDA
recognition sites (RGYW), which are presumed to be introduced by aSHM.97 Within the
gene body of every BLG, I compared the observed mutation rate of nucleotides forming
AICDA recognition sites with the expected rate (Q-values < 0.1, binomial exact test). In
addition to ID3, I found a similar enrichment of mutations affecting AICDA recognition
sites in HIST1H1E, MYC, BCL7A, and ETS1, whereas the opposite trend was seen in
GNAI2 and RHOA (Figure 2.7). The observed constraints on which codons are mutated in
GNAI2 and RHOA can explain the lack of mutations in AICDA recognition sites. In
other words, there appears to be a selection against variants being introduced elsewhere
in the genes.
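To make the enrichment test above concrete, the following sketch counts mutations that fall within RGYW motifs (or the reverse complement, WRCY) and compares that count with a binomial expectation. It rests on assumptions that the text does not specify: the null model treats every position in the gene body as equally likely to be mutated, only the one-sided enrichment tail is computed, and no multiple-testing correction (needed to obtain Q-values) is applied. All function names are hypothetical.

```python
import re
from math import comb

# RGYW on the forward strand, or its reverse complement WRCY.
# The lookahead allows overlapping motif occurrences to be found.
AICDA_MOTIF = re.compile(r"(?=([AG]G[CT][AT]|[AT][AG]C[CT]))")


def motif_positions(sequence):
    """Return the set of 0-based positions covered by an AICDA recognition site."""
    covered = set()
    for match in AICDA_MOTIF.finditer(sequence.upper()):
        covered.update(range(match.start(), match.start() + 4))
    return covered


def binomial_upper_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def motif_enrichment(sequence, mutated_positions):
    """Return (observed, total, expected rate, one-sided P-value)."""
    sites = motif_positions(sequence)
    total = len(mutated_positions)
    observed = sum(1 for position in mutated_positions if position in sites)
    expected_rate = len(sites) / len(sequence)
    return observed, total, expected_rate, binomial_upper_tail(observed, total, expected_rate)
```

A depletion test, as reported for GNAI2 and RHOA, would use the lower tail instead; restricting `motif_positions` to the guanine-cytosine pair within each motif would give the second metric shown in Figure 2.7.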
I also investigated the relationship of mutations to one another. Specifically, mutual
exclusivity can shed light on mutations that are functionally redundant or whose
cooccurrence may be lethal to the cell. I quantified mutual exclusivity using the previously
established groups of related genes (Figure 2.8). The only genes whose mutations were
mutually exclusive were the components of the SWI/SNF pathway, namely ARID1A and
SMARCA4 (Q-value = 0.000023; CoMEt exact test).174,175
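For intuition about what such a test measures, the sketch below computes a pairwise, one-sided hypergeometric P-value for two genes: the probability of seeing as few or fewer co-mutated cases as observed, given how often each gene is mutated. This is a simplified stand-in and not the CoMEt exact test used above, which generalizes to gene sets; the function name and inputs are assumptions.

```python
from math import comb


def mutual_exclusivity_p(cases_with_a, cases_with_b, n_cases):
    """One-sided P-value for fewer co-mutated cases than expected by chance.

    cases_with_a / cases_with_b are collections of case identifiers in which
    gene A / gene B is mutated; n_cases is the cohort size.
    """
    a, b = set(cases_with_a), set(cases_with_b)
    n_a, n_b = len(a), len(b)
    observed_overlap = len(a & b)
    smallest_overlap = max(0, n_a + n_b - n_cases)
    favourable = sum(
        comb(n_a, overlap) * comb(n_cases - n_a, n_b - overlap)
        for overlap in range(smallest_overlap, observed_overlap + 1)
    )
    return favourable / comb(n_cases, n_b)
```

In a toy cohort of ten cases with five mutated in each gene and no overlap, this gives 1/252, the chance that two random sets of five cases are completely disjoint.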
Figure 2.7: Enrichment or depletion of mutations affecting AICDA recognition sites (RGYW) in BLGs. The X-axis displays the odds ratio between the observed and expected mutation rates of all bases in AICDA recognition sites. The Y-axis shows the odds ratio between the observed and expected mutation rates of guanine-cytosine pairs in AICDA recognition sites. BLGs with a significant enrichment or depletion according to either metric are displayed in red and blue, respectively (Q-values < 0.1, binomial exact test).
Labelled genes: ID3, ETS1, BCL7A, RHOA, GNAI2, HIST1H1E, MYC.
Figure 2.8: Mutual exclusivity of mutations affecting BLGs associated with each pathway (CoMEt exact test). The dashed line represents the minimum Q-value threshold of 0.1.
Pathways shown: SWI/SNF complex; Apoptosis; GPCR signaling; TCF3/ID3 module; Epigenetic regulation; BCR/PI3K signaling; MYC regulation. X-axis: −log10(Q-value).
2.2.7 Landscape of noncoding mutations shaped by somatic hypermutation
One key advantage of WGS over exome or RNA sequencing is the ability to
comprehensively determine the landscape of noncoding mutations, especially in intronic
and intergenic regions. Here, I had access to a sufficient number of BL genomes to
characterize the genome-wide landscape of noncoding mutations. I used the Rainstorm
and Doppler algorithms for genome-wide inference of discrete genomic regions enriched
for noncoding mutations in the cohort.170 These regions are referred to here as
“noncoding mutation peaks” (“peaks”, for brevity). They are listed in Supplemental Table
8 of Appendix A. I identified 70 peaks with a median size of 1,539 bp (range 20–10,652;
Figure 2.9A). Out of the 38 peaks mutated in 15 or more patients, 17 overlapped one of
the three IG loci and were separately considered as three respective groups. Of the
remaining commonly mutated peaks, there was a clear bimodal distribution in the
distance from the nearest TSS. Specifically, 17 were within 3 kbp of a TSS and were thus
categorized as TSS-proximal, while the other three were considered TSS-distal (Figure
2.9B). Additionally, most TSS-proximal peaks were associated with genes or regions
known to be affected by aSHM in other lymphomas including DLBCL (Figure 2.9C).176
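The TSS-proximal versus TSS-distal split can be illustrated with the short sketch below. The 3 kbp cut-off comes from the text; how the distance between a peak and a TSS was measured is not stated, so the edge-based definition used here, the inclusive comparison, and the function names are assumptions.

```python
TSS_PROXIMAL_MAX_BP = 3_000  # "within 3 kbp of a TSS"


def distance_to_nearest_tss(peak_start, peak_end, tss_positions):
    """Distance in bp from a peak to the closest TSS (0 if a TSS lies inside the peak)."""
    def distance(tss):
        if peak_start <= tss <= peak_end:
            return 0
        return min(abs(tss - peak_start), abs(tss - peak_end))

    return min(distance(tss) for tss in tss_positions)


def classify_peak(peak_start, peak_end, tss_positions):
    """Label a peak as TSS-proximal or TSS-distal using the 3 kbp cut-off."""
    nearest = distance_to_nearest_tss(peak_start, peak_end, tss_positions)
    return "TSS-proximal" if nearest <= TSS_PROXIMAL_MAX_BP else "TSS-distal"
```

Coordinates are assumed to be on a single chromosome; a genome-wide version would first restrict `tss_positions` to the chromosome of the peak.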
Given that most peaks were TSS-proximal and associated with genes targeted by aSHM,
I hypothesized that these regions are mutated by AICDA in a subset of BL tumours.
Consistent with AICDA activity, I found an enrichment of mutations affecting AICDA
recognition sites (RGYW) in 61% of peaks (Q-values < 0.1, binomial exact test; Figure
2.10). Given that active transcription is known to facilitate AICDA-mediated mutation, I
explored the expression of genes associated with TSS-proximal peaks (i.e. “peak target
genes”).63,64,177 Peak target genes were among the most highly expressed genes in all
tumours, including those cases lacking mutations in these regions (median
transcripts-per-million expression percentile = 98.3). I also did not find a strong correlation
between the presence of mutations in a peak and higher target gene expression (Figure
2.11). Overall, AICDA expression correlated with the number of mutated peaks (Figure
2.9C) and the number of mutations within peaks (P-value = 2.3 × 10^−8, Pearson
correlation test; Figure 2.12). Altogether, these findings demonstrate that discrete
genomic regions in BL accumulate noncoding mutations, and most appear to be the
consequence of AICDA-mediated aSHM.
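Two of the summary statistics used in this paragraph, an expression percentile and a Pearson correlation coefficient, can be sketched with the standard library alone. The definitions below are assumptions made for illustration: the percentile is computed as the share of genes with expression at or below that of the gene of interest, and only the correlation coefficient is returned, without the associated P-value.

```python
from math import sqrt


def expression_percentile(gene_expression, all_gene_expression):
    """Percentage of genes in a tumour expressed at or below the given level."""
    at_or_below = sum(1 for value in all_gene_expression if value <= gene_expression)
    return 100.0 * at_or_below / len(all_gene_expression)


def pearson_r(x_values, y_values):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x_values)
    mean_x = sum(x_values) / n
    mean_y = sum(y_values) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_values, y_values))
    sum_sq_x = sum((x - mean_x) ** 2 for x in x_values)
    sum_sq_y = sum((y - mean_y) ** 2 for y in y_values)
    return covariance / sqrt(sum_sq_x * sum_sq_y)
```

Here `x_values` would hold the AICDA expression per tumour and `y_values` the number of mutations within peaks, as plotted in Figure 2.12.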
Figure 2.9: Noncoding mutation peaks. (A) Size distribution of noncoding mutation peaks (or simply, “peaks”). (B) Distance between peaks and the respective nearest TSS. Peaks overlapping immunoglobulin loci are omitted. (C) Density of noncoding mutations as mutations per kilobase (mut./kbp) in peaks annotated with the nearest transcription start site (relative position in parentheses) or regulatory element. Peaks overlapping IG loci are shown separately. Tumours correspond to columns and are ordered based on AICDA expression, as shown in the top panel.
Panel C rows, IG loci: IGK locus; IGL locus; IGH locus.
Panel C rows, TSS-proximal: PVT1 (−755 to +3,376); BIRC3 (+60 to +996); RHOH (−719 to +1,417); ST6GAL1 (−853 to +976); MIR142 (−1,081 to +992); ZFP36L1 (+408 to +1,412); DTX1 (−1,476 to +973); CXCR4 (−575 to +2,204); BTG2 (+225 to +2,138); BCL7A (−3,274 to +5,811); TCL1A (−2,113 to +1,427); BCL6 (−904 to +3,029); BACH2 (−2,345 to +5,398); MYC (−1,017 to +8,532).
Panel C rows, TSS-distal: ST6GAL1 enhancer (intronic); BCL6 enhancer (intergenic); PAX5 enhancer (intergenic).
Figure 2.10: Enrichment or depletion of mutations affecting AICDA recognition sites (RGYW) in peaks altered in at least 15 cases. The left panel displays the tests considering the mutation rate of all bases in AICDA recognition sites. The right panel shows the tests considering the mutation rate of guanine-cytosine pairs in AICDA recognition sites. The vertical dashed line indicates a neutral log odds ratio, and the horizontal dashed line indicates the minimum Q-value threshold of 0.1 (binomial exact tests). Peaks with a significant enrichment are displayed in red.
Axes: log10(odds ratio) on the X-axis and −log10(Q-value) on the Y-axis in both panels.
Figure 2.11: Variance-stabilized expression values of genes associated with TSS-proximal peaks according to the mutation status of each peak. Only protein-coding genes are displayed. For each gene, expression values were normalized by the median expression in unmutated tumours. Significance brackets: *, Q-value < 0.1; **, Q-value < 0.001 (Mann–Whitney U test).
Genes shown: LTB, SERPINA9, HIST1H4J, RCC1, HIST1H2BK, ETS1, ST6GAL1, RHOH, ZFP36L1, BTG2, FOXO1, DTX1, POU2AF1, CXCR4, BCL7A, TCL1A, BIRC3, RFTN1, BCL6, BACH2, BMP7, RNF144B. Y-axis: relative gene expression, by mutation status (unmutated or mutated).
Figure 2.12: Correlation between variance-stabilized AICDA expression and the number of mutations in noncoding mutation peaks.
Axes: AICDA expression on the X-axis and number of mutations in peaks on the Y-axis; points are marked by EBV status (EBV-positive or EBV-negative).
Though several mutation peaks identified here overlap known targets of aSHM, many of
these regions or genes are not known to be targeted by aSHM in BL. Notably, I found a
mutation peak 54 kbp downstream of MYC that overlaps the promoter and first intron of
PVT1, a locus that produces a long noncoding RNA (lncRNA) and a known target of
MYC.178 PVT1 promoter mutations occurred in 17% of 106 BL cases compared to only
4.6% in a cohort of 153 DLBCL cases.170 Another noncoding mutation peak affected a
distal enhancer for PAX5, a transcription factor with an important role in Bcell
differentiation. Mutations in this enhancer were found in 11% of 150 chronic lymphocytic
leukemia cases, whereas I observe a higher mutation incidence (20%) in 106 BL
genomes, which is comparable to that observed in 153 DLBCL genomes (23%).170,179
Guaninecytosine pairs in AICDA recognition sites (RGYW) were mutated at a higher than
expected rate in the PAX5 enhancer and PVT1 promoter mutation peaks, reminiscent of
the cytosine deamination seen during aSHM (Qvalues = 0.0045 and 0.056, respectively;
binomial exact test). These variants raise the possibility that AICDA is contributing to BL
by introducing noncoding mutations in regulatory regions.
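
The RGYW enrichment test can be illustrated with a short Python sketch. It counts mutations that fall on the G of RGYW sites (or the C of the reverse-complement WRCY sites) and computes a one-sided exact binomial P-value. This is a minimal illustration under stated assumptions, not the analysis code used in this thesis: the function names, the null rate (the fraction of bases in the region that are motif G/C sites), and the one-sided alternative are all hypothetical choices.

```python
import re
from math import comb

# RGYW (R = A/G, Y = C/T, W = A/T) and its reverse complement WRCY, so that
# motif occurrences on either strand are found. Lookaheads allow overlaps.
RGYW = re.compile(r"(?=([AG]G[CT][AT]))")
WRCY = re.compile(r"(?=([AT][AG]C[CT]))")


def motif_gc_positions(seq):
    """Return 0-based positions of the G (in RGYW) or C (in WRCY) of each hit."""
    seq = seq.upper()
    positions = {m.start() + 1 for m in RGYW.finditer(seq)}   # G is 2nd base
    positions |= {m.start() + 2 for m in WRCY.finditer(seq)}  # C is 3rd base
    return positions


def binomial_upper_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def motif_enrichment(seq, mutated_positions):
    """One-sided test: are mutations over-represented at motif G/C sites?

    Assumed null: each mutation lands on a motif G/C site with probability
    equal to the fraction of bases in `seq` that are such sites.
    """
    sites = motif_gc_positions(seq)
    n = len(mutated_positions)
    k = sum(1 for pos in mutated_positions if pos in sites)
    expected_rate = len(sites) / len(seq)
    return k, n, binomial_upper_tail(k, n, expected_rate)
```

P-values obtained this way for many peaks would still need a multiple-testing correction to yield Q-values; that step is not shown.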
2.2.8 Robust identification of mutational signatures in BL genomes
Several mutational processes shape the landscape of somatic variants in tumour
genomes, each resulting in a distinct mutational signature.180 Here, a mutational
signature is defined by a pattern of mutations based on base change and trinucleotide
context. At the time of this work, there were 30 robust reference signatures in the
Catalogue of Somatic Mutations in Cancer (COSMIC) database, some having been
attributed to known or suspected mutational processes.180,181 To investigate the
mutational processes active in BL cells, I inferred mutational signatures de novo using
standard methodology.180 Similar to unsupervised clustering, a range of signature counts
is tested, and the optimal number is chosen by maximizing stability while minimizing
reconstruction error (Figure 2.14A). In this cohort of 106 genomes, the optimal number of
signatures was four (Figure 2.14B). Each of these “BL signatures” (designated BL
signatures A through D) was paired with a COSMIC reference signature (version 2) based
on cosine similarity to infer putative aetiologies (Figure 2.14C).
Figure 2.13: Known and novel targets of aberrant somatic hypermutation. Noncoding mutation peaks overlapping (A) BACH2, (B) PVT1 promoter region, and (C) distal PAX5 enhancer. Mutations from the BL discovery cohort (N = 106) and a DLBCL cohort (N = 153) are shown separately.
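
The pairing step can be sketched in a few lines of Python: each de novo signature is compared with every reference signature, and the reference with the highest cosine similarity is kept. This is an illustrative sketch only. The function names and the toy four-channel profiles are hypothetical (real signatures have 96 channels), and it is not the code used to produce Figure 2.14C.

```python
from math import sqrt


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pair_with_reference(de_novo, reference):
    """Map each de novo signature name to (best reference name, similarity)."""
    pairs = {}
    for name, profile in de_novo.items():
        best = max(reference, key=lambda ref: cosine_similarity(profile, reference[ref]))
        pairs[name] = (best, cosine_similarity(profile, reference[best]))
    return pairs


# Toy profiles with made-up values, for illustration only.
de_novo = {"BL signature X": [0.7, 0.1, 0.1, 0.1]}
reference = {
    "Ref 1": [0.25, 0.25, 0.25, 0.25],
    "Ref 2": [0.8, 0.1, 0.05, 0.05],
}
pairs = pair_with_reference(de_novo, reference)
```

Because cosine similarity ignores vector length, the comparison depends only on the shape of each profile, not on the total number of mutations.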
The pattern for BL signature A displayed a relatively uniform distribution of mutation types
with a slight bias towards C>T substitutions. This mutation composition was most similar
to COSMIC signature 5, which is found in all cancer types and most tumours. It is
ubiquitous because it is one of two signatures that result from clock-like processes, the
other being COSMIC signature 1.181 In B-cell lymphomas, signature 5 was more common
than signature 1 and presented a stronger correlation with age at diagnosis.181 In BL, this
clock-like process is the most common source of mutations, accounting for 39% (range
1.1–80%) of SSMs on average (Figure 2.15). BL signature B was defined by a
preponderance of T>G (and, to a lesser extent, T>C) mutations in the NpTpT context.
This pattern shared the highest similarity with COSMIC signature 17, which has no known
aetiology. This signature has previously been found in several cancer types including
B-cell lymphomas, and it is associated with 17% (range 3.1–63%) of mutations in the BL
cases studied here (Figure 2.15). The limited understanding of this signature restricts my
capacity to infer its relevance to BL.
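
For readers unfamiliar with the notation, the sketch below shows how a single substitution is usually assigned to one of the 96 mutation channels (six substitution types, each in 16 trinucleotide contexts). By convention the mutated base is reported as a pyrimidine, so a mutation at a purine is described on the opposite strand. The function and variable names are hypothetical and this is an illustration, not the thesis pipeline.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def mutation_channel(context, alt):
    """Classify a substitution from its reference trinucleotide and alternate base.

    If the reference (middle) base is a purine, the mutation is re-expressed
    on the opposite strand so that the mutated base is always C or T.
    """
    context, alt = context.upper(), alt.upper()
    if context[1] in "AG":
        context = reverse_complement(context)
        alt = alt.translate(COMPLEMENT)
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


# All 96 channels: 6 substitution types x 4 upstream x 4 downstream bases.
CHANNELS = [
    f"{five}[{sub}]{three}"
    for sub in SUBSTITUTIONS
    for five, three in product("ACGT", repeat=2)
]
```

In this notation, the T>G mutations in the NpTpT context that characterize BL signature B correspond to the four channels A[T>G]T, C[T>G]T, G[T>G]T, and T[T>G]T.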
Whereas BL signatures A and B are either expected or unaccounted for, the remaining
two signatures reveal potentially tumour-specific mutational mechanisms. BL signature C
is composed of mutations altering T or C (i.e. Y) in the GpYpN or TpYpN contexts. While
the proportions of different types of mutations differ slightly, this signature is most similar
to COSMIC signature 15, which is not typically represented in B-cell lymphomas.
Defective DNA mismatch repair (MMR) has been proposed as the mechanism
responsible for signature 15. This finding suggests that MMR may be disrupted in a subset
of BL tumours, although the mechanism is unclear. That said, it is the least common of the
four signatures in BL genomes, accounting for 10% (range 1.3–40%) of
variants (Figure 2.15). Lastly, BL signature D exhibited a pattern characterized by an
increased occurrence of substitutions affecting T, especially in the TpTpT context. Based
on cosine similarity, this BL signature was paired with COSMIC signature 9, which is
common in cancers derived from mature B cells. This pattern of mutations has been
attributed to polymerase η activity, which is associated with AICDA-mediated mutagenesis
during both physiologic and aberrant SHM. Notably, SHM seems responsible for nearly as
many mutations as BL signature A, namely 34% (range 8.6–64%), highlighting the
importance of AICDA in shaping BL genomes (Figure 2.15).
Figure 2.14: Characteristics of de novo mutational signatures. (A) Selecting the optimal number of de novo mutational signatures (shown in red) by minimizing reconstruction error and maximizing stability. (B) Composition of each BL signature per base change and trinucleotide context. (C) Cosine similarity between the optimal set of BL signatures and all COSMIC reference signatures. Pairs made based on the highest cosine similarity are outlined in red.
Figure 2.15: Percent prevalence of de novo mutational signatures.
To validate the signatures that were identified, I sought to confirm their
relationship with the proposed aetiologies wherever I had relevant data. BL signature B
has no known aetiology, making it impossible to verify, and I had no metric to quantify the
degree of DNA MMR to correlate with BL signature C. On the other hand, BL signatures A
and D were each the only signature to strongly correlate with age at diagnosis and AICDA
expression, respectively (Q-value = 5.5 × 10⁻⁹ and Q-value = 1.7 × 10⁻¹³, respectively;
Pearson correlation test; Figure 2.16A). Additionally, I performed this calculation for all
possible solutions for each of the signatures paired with COSMIC signatures 5 and 9
(Figure 2.16B). Although the four-signature solution had been selected using an
independent set of criteria, it showed the strongest correlations in this analysis. This result
lends further credence to the robustness of my inferred signatures.
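
The validation logic can be sketched as follows: compute the Pearson correlation between each signature's per-tumour contribution and a biological feature, then identify the signature with the strongest correlation. The sketch is an assumption-laden illustration, not the thesis code. The names `pearson_r` and `best_correlated_signature` are hypothetical, and the significance testing and multiple-testing correction behind the reported Q-values are not reproduced.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def best_correlated_signature(exposures, feature):
    """Return (signature name, r) for the strongest absolute correlation.

    `exposures` maps a signature name to its per-tumour contributions, given
    in the same tumour order as `feature` (e.g. age or AICDA expression).
    """
    scores = {name: pearson_r(values, feature) for name, values in exposures.items()}
    best = max(scores, key=lambda name: abs(scores[name]))
    return best, scores[best]
```

Repeating this for every candidate solution (two to seven signatures) gives the kind of comparison summarized in Figure 2.16B.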
Figure 2.16: Correlation between de novo mutational signatures and biological features of BL genomes. (A) Correlation between signatures from the optimal solution and age at diagnosis and AICDA expression (Pearson’s product-moment correlation test). (B) After generating solutions ranging from 2 to 7 signatures, for each solution, signatures were paired with COSMIC reference signatures based on cosine similarity. Solutions with signatures paired with both COSMIC signatures 1/5 (age-related) and 9 (AICDA-related) were tested for correlation with age at diagnosis and AICDA expression, respectively (Pearson’s product-moment correlation).
2.2.9 Nonuniform V gene segment usage in immunoglobulin repertoire
Given the importance of the BCR in BL, I sought to delineate the repertoire of V(D)J gene
segments used to encode the IG component of the BCR.94 Rearrangement of these
segments helps produce the highly variable complementarity-determining region 3
(CDR3) sequence, which in turn determines antigen specificity and affinity.182 An IG
nucleotide CDR3 sequence is known as a clonotype, and clonotyping is the process of
identifying these sequences.183 The IG clonotype of the ancestral malignant B cell that
formed the BL tumour is expected to be present in virtually every tumour cell and thus be
clonal; it is also referred to as the dominant clonotype. Each antibody-producing cell contains
a distinct clonotype for the heavy and light chains. I used tumour RNA-seq data to
perform clonotyping with MiXCR.184,185 By virtue of its reliance on RNA-seq data, this
analysis is restricted to IG alleles that are expressed. Dominant clonotypes were defined
as those with a clonal fraction of at least 30% (Figure 2.17A). To eliminate spurious
clonotypes, I ignored any clonotypes with fewer than 30 supporting reads. The lack of
similar RNA-seq data from environment-matched controls precludes comparison with
healthy repertoires. Here, I focused on the V gene segments of dominant clonotypes from
both the heavy and light chains because of their increased diversity.
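
The two thresholds can be expressed as a small filter. The sketch below is a hedged illustration rather than the thesis code or MiXCR's own output format: the `Clonotype` record and function name are hypothetical, and it assumes that the clonal fraction is computed per IG chain from all reads assigned to that chain before the read-count filter is applied.

```python
from dataclasses import dataclass

MIN_CLONAL_FRACTION = 0.30  # threshold stated in the text
MIN_SUPPORTING_READS = 30   # threshold stated in the text


@dataclass
class Clonotype:
    chain: str    # "IGH", "IGK" or "IGL"
    v_gene: str
    cdr3_nt: str
    reads: int


def dominant_clonotypes(clonotypes):
    """Return (clonotype, fraction) pairs that pass both thresholds.

    Fractions are normalized per chain, an assumption of this sketch.
    """
    totals = {}
    for c in clonotypes:
        totals[c.chain] = totals.get(c.chain, 0) + c.reads
    dominant = []
    for c in clonotypes:
        fraction = c.reads / totals[c.chain]
        if fraction >= MIN_CLONAL_FRACTION and c.reads >= MIN_SUPPORTING_READS:
            dominant.append((c, fraction))
    return dominant
```

A clonotype supported by only 20 reads is rejected even when it accounts for every read of its chain, which is how the read-count threshold guards against spurious calls in poorly expressed chains.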
I identified dominant clonotypes for the heavy and light chains in 96 (82%) and 104 (89%)
cases (N = 117), respectively. To account for tumours in which clonal
rearrangements were undetectable, I considered the number of reads attributable to IG
genes. As expected, the limited ability to detect rearrangements in these tumours can be
explained by their reduced heavy and light chain expression (P-values = 1.2 × 10⁻⁷ and
5.7 × 10⁻⁴, respectively, Mann–Whitney U test; Figure 2.17B). Among the dominant
clonotypes that were detected, V segment usage in BL appeared nonrandom, with a
small subset of V segments accounting for a disproportionate share of the clonotypes.
Specifically, the five most commonly used heavy and light chain V segments accounted
for 44% and 41% of dominant clonotypes, respectively. The pattern in BL (N = 117 cases)
is similar to what is seen in DLBCL (N = 323 cases; Figure 2.18A).170 While some V genes
appear differentially used between BL and DLBCL (e.g. IGHV3-20 and IGKV4-1), none of
these differences are significant (Q-values > 0.1, Fisher’s exact test). In BL, the most
recurrently used heavy chain V segments were IGHV4-34 (16%), IGHV3-30 (10%), and
IGHV3-7 (7.3%). The most frequently used light chain V segment was IGKV3-20 (20%). I
was able to recapitulate these findings using the WGS data, although less stringent criteria
were required owing to the lower coverage (Figure 2.18B). These results are consistent
with the established notion that BL relies on BCR activity for promoting PI3K signaling and
raise the possibility of positive selection for potentially autoreactive or antigen-driven IG
clonotypes.94
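
Two calculations from this paragraph lend themselves to a short sketch: the share of dominant clonotypes explained by the most common V genes, and a two-sided Fisher's exact test comparing usage of one V gene between two cohorts. The code below is a self-contained illustration with hypothetical function names; it is not the thesis code, and the Q-values reported above additionally require a multiple-testing correction that is not shown.

```python
from collections import Counter
from math import comb


def top_k_share(v_genes, k=5):
    """Fraction of dominant clonotypes that use one of the k most common V genes."""
    counts = Counter(v_genes)
    top = sum(n for _, n in counts.most_common(k))
    return top / len(v_genes)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P-value for the 2x2 table [[a, b], [c, d]].

    For V gene usage: a, b = cohort 1 tumours with and without the gene;
    c, d = cohort 2 tumours with and without the gene.
    """
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):  # hypergeometric probability of x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= observed * (1 + 1e-9))
```

The small relative tolerance in the final comparison avoids dropping tables whose probability equals the observed one up to floating-point error.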
Figure 2.17: Dominant immunoglobulin rearrangements. (A) Clonal fraction estimates and counts for immunoglobulin heavy and light chain clones. Clonal (or “dominant”) rearrangements (shown in red) must have a minimum clonal fraction of 30% (indicated by horizontal dashed line) and at least 30 supporting reads (indicated by vertical dashed line). (B) Total read count per sample supporting heavy and light IG chain clones according to whether a dominant clone was detected. Significance brackets: **, P-value < 0.001; ***, P-value < 0.00001 (Mann–Whitney U test).
Figure 2.18: Immunoglobulin V gene usage. (A) Percent prevalence of immunoglobulin V genes among dominant IG rearrangements in BL (N = 106) and DLBCL (N = 256) tumours with RNA-seq data. (B) Percent prevalence of immunoglobulin V genes among dominant IG rearrangements in BL (N = 91) tumours with WGS data, shown in the same order as panel A. V genes that are dominant in fewer than 10 BL tumours are not displayed. V genes shown: IGH (IGHV4-34, IGHV3-23, IGHV3-30, IGHV3-7, IGHV4-39, IGHV4-59, IGHV3-48, IGHV3-21, IGHV3-15); IGK (IGKV3-20, IGKV4-1, IGKV1-39, IGKV3-15, IGKV1-5, IGKV1-33, IGKV3-11); IGL (IGLV1-40, IGLV2-14, IGLV1-51, IGLV3-19, IGLV1-44, IGLV3-25).
2.3 Materials and methods
2.3.1 Case accrual
Additional details relating to case accrual can be found online in the standard operating
procedures (SOPs).1
Cohort
The cases were accrued at the following tissue source sites: Uganda Cancer Institute
(UCI, Uganda), Epidemiology of Burkitt’s Lymphoma in East-African Children and Minors
(EMBLEM, Uganda), Children’s Oncology Group (COG, USA; participants in clinical trial
AALL1131), and St. Jude Children’s Research Hospital (USA). Contributing
tissue source sites provided documentation for Institutional Review Board approval for the
use of tissues submitted for molecular characterization. Clinical data were collected for
each case, including initial enrollment data and one-year and two-year outcome data
(details below). The discovery cohort consisted of 91 paediatric BL cases originating from
patients aged between two and 20 years. BL subtypes within this cohort included 74
endemic and 17 paediatric sporadic cases (see Table 1 for details). Each BL case had
both tumour and matched normal tissue (blood, peripheral blood mononuclear cells,
lymph nodes, etc.), and the tumour was collected prior to any treatment. All cases
underwent a standardized central pathology review by three BL pathologists, who
confirmed the diagnosis of BL (details below). Once the diagnosis was confirmed, the
tumour tissue used for molecular characterization was evaluated for tumour nuclei and
necrosis (details below). Cases that did not meet the discovery criteria, lacked matched
normal tissue or normal DNA, had degraded RNA, or were missing essential clinical data
were considered for validation. Validation cases with tumour and normal DNA were ultimately
selected for targeted sequencing, and validation tumours with sufficient RNA also
underwent RNA sequencing (details below).
1 https://ocg.cancer.gov/sites/default/files/BLGSP_SOP_manual.pdf
Clinical data
The clinical data were collected by Nationwide Children’s Hospital (Columbus, OH) from
contributing sites after cases were accepted into the discovery or validation cohorts.
Follow-up data were then collected for two subsequent years. The clinical report form,
follow-up form, and treatment form can be found within the project standard operating
procedures (SOP #303). The following types of clinical information were collected:
demographic data (date of birth, sex, race, ethnicity, height, weight, vital status), tumour
information [date of diagnosis, tumour anatomic location, tumour status (tumour free/with
tumour), stage, lymph node status, history of prior cancers, synchronous cancers and
subsequent cancers], HIV status [HIV antibody status, date of diagnosis, CD4 counts, HIV
RNA load, Centers for Disease Control and Prevention (CDC) HIV risk group,
co-infections, prior acquired immune deficiency syndrome (AIDS)-defining conditions],
infectious disease status (hepatitis B virus, hepatitis C virus, Helicobacter pylori, malaria,
EBV), and treatment information [treatment type, tumour response, treatment dates,
highly active antiretroviral therapy (HAART) treatment status]. All dates and other
personally identifiable information were obfuscated prior to submission to the Office of
Cancer Genomics Data Coordinating Center in extensible markup language (XML) and
tab-delimited formats.2
Consensus pathology review
Consensus anatomic site classification
Anatomic site classification was performed by consensus review based on data reported
for sites of disease involvement. Many of the African cases did not have assessment of
bone marrow, cerebrospinal fluid, or total body imaging. Cases were classified into the
following categories: (A) Disseminated disease with no bone marrow (BM) and/or central
nervous system (CNS) involvement, documented disease involvement; (B) Head-only,
disease involvement of jaw with or without adjacent nodal involvement; (C)
Intra-abdominal disease, disease confined to abdominal organs with or without abdominal
lymph node involvement; (D) Disseminated disease, disease involvement on both sides of
diaphragm, but no documented BM or CNS involvement; (E) Unknown, insufficient data to
classify anatomic involvement.
2 https://ocg.cancer.gov/programs/cgci/datamatrix
2.3.2 Sample processing and nucleic acid extraction
Frozen specimens were shipped to and from Nationwide Children’s Hospital (Columbus,
OH) using a cryoport that maintained an average temperature of less than −180°C (SOP
#308). A top and bottom histologic section were cut from tumour and uninvolved tissue (if
it was to be used for healthy tissue control) for pathologic quality control review. These
were either stained with H&E or Wright-Giemsa and imaged at 40X using an Aperio AT
Turbo or Aperio AT2 scanner. Images were reviewed by a board-certified pathologist to
confirm that the tumour specimen was histologically consistent with BL, and that
uninvolved specimens contained no tumour cells. The tumour sections were required to
contain a minimum of 50% tumour cell nuclei, and less than 50% necrosis for inclusion in
the study. Nearly all samples had less than 20% necrosis.
RNA and DNA were extracted from fresh frozen (FF) (SOP #305) and FFPE tumour (SOP
#315-316) and normal tissue specimens (mainly blood or granulocytes) using a
modification of the DNA/RNA AllPrep kit (Qiagen). Frozen samples were homogenized
and applied to a Qiagen DNA column, and FFPE samples were deparaffinized and
applied to a Qiagen FFPE DNA column. The flow-through from the Qiagen DNA column
was processed using a mirVana miRNA Isolation Kit (Ambion) for FF tissues, and a High
Pure miRNA Kit (Roche) for FFPE tissues. This latter step generated RNA preparations
that included RNA <200 nt suitable for miRNA analysis. DNA was extracted from blood
using the QIAamp blood midi kit (Qiagen; SOP #307).
DNA was quantified by PicoGreen assay, and was resolved by 1% agarose gel
electrophoresis to confirm high molecular weight fragments. A custom Sequenom single
nucleotide polymorphism (SNP) panel or the AmpFlSTR Identifiler (Applied Biosystems)
was used to verify that tumour DNA and germline DNA were derived from the same patient.
One hundred nanograms of each tumour and normal DNA were sent in duplicate to
Qiagen for REPLI-g whole genome amplification using a 100 µg reaction scale. RNA was
quantified by measuring Abs260 with an ultraviolet spectrophotometer, and integrity was
measured using the RNA6000 nano assay (Agilent) to determine the RNA Integrity
Number for FF samples or DV200 for FFPE samples.
For inclusion in the discovery set, a tumour needed to pass pathology consensus review
(University of Nebraska Medical Center, Omaha, NE) and the specimen pathology quality
control review (Nationwide Children’s Hospital, Columbus, OH). In addition, a primary
tumour and a matched germline (blood, buccal, or uninvolved tissue) sample needed to
pass the following metrics: a minimum of 0.7 µg of DNA from FF or 0.25 µg of DNA from
FFPE, and 3 µg RNA from FF or 1 µg RNA from FFPE. The minimum RNA integrity
metrics were an RNA Integrity Number above 7.0 or DV200 above 30. Cases that did not
meet these metrics were included in the validation set if there was at least 0.7 µg of DNA
from the primary tumour available for DNA sequencing. Tumour RNA sequencing was
also performed for validation cases if there was sufficient RNA material.
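
The nucleic acid criteria above amount to a simple decision rule, sketched below in Python. This is an illustration under explicit assumptions, not the project's actual triage code: the `Specimen` record and function names are hypothetical, the RNA Integrity Number is assumed to apply to FF samples and DV200 to FFPE samples, only the tumour specimen is modelled, and the two pathology reviews are not represented.

```python
from dataclasses import dataclass
from typing import Optional

# Minimum yields (in µg) and RNA integrity thresholds quoted in the text.
MIN_DNA_UG = {"FF": 0.7, "FFPE": 0.25}
MIN_RNA_UG = {"FF": 3.0, "FFPE": 1.0}
MIN_RIN = 7.0             # assumed to apply to FF RNA
MIN_DV200 = 30.0          # assumed to apply to FFPE RNA
MIN_VALIDATION_DNA_UG = 0.7


@dataclass
class Specimen:
    preservation: str              # "FF" or "FFPE"
    dna_ug: float
    rna_ug: float
    rin: Optional[float] = None    # RNA Integrity Number
    dv200: Optional[float] = None


def passes_discovery_metrics(s):
    """Check nucleic acid yield and RNA integrity for one tumour specimen."""
    if s.dna_ug < MIN_DNA_UG[s.preservation] or s.rna_ug < MIN_RNA_UG[s.preservation]:
        return False
    if s.preservation == "FF":
        return s.rin is not None and s.rin > MIN_RIN
    return s.dv200 is not None and s.dv200 > MIN_DV200


def assign_cohort(s):
    """Return 'discovery', 'validation' or 'excluded' from the metrics alone."""
    if passes_discovery_metrics(s):
        return "discovery"
    if s.dna_ug >= MIN_VALIDATION_DNA_UG:
        return "validation"
    return "excluded"
```

For example, an FF tumour with ample DNA but a RIN of 6.0 would fall into the validation set under these rules.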
2.3.3 Library construction and sequencing
Whole genome sequencing of fresh frozen samples
WGS libraries were constructed from DNA provided by Nationwide Children’s Hospital
(Columbus, OH) using a polymerase chain reaction (PCR)-free protocol. To minimize
library bias and coverage gaps associated with PCR amplification of high GC or AT-rich
regions, a version of the TruSeq DNA PCR-free kit (E6875-6877B-GSC, New England
Biolabs) was implemented, automated on a Microlab NIMBUS liquid handling robot
(Hamilton). Briefly, 500 ng of genomic DNA was arrayed in a 96-well microtitre plate and
subjected to shearing by sonication (Covaris LE220). Sheared DNA was end-repaired and
size selected using paramagnetic PCRClean DX beads (C-1003-450, Aline Biosciences)
targeting a 300-400 bp fraction. After 3’ A-tailing, full length TruSeq adapters were ligated.
Libraries were purified using paramagnetic (Aline Biosciences) beads. PCR-free genome
library concentrations were quantified using a qPCR Library Quantification kit (KAPA,
KK4824) prior to sequencing with paired-end 150 base reads on the Illumina HiSeqX
platform using V4 chemistry according to manufacturer recommendations.
Whole genome sequencing of formalin-fixed, paraffin-embedded samples
A 96-well library construction protocol was performed on genomic DNA extracted from
FFPE tissue and provided by Nationwide Children’s Hospital (Columbus, OH). Since DNA
extracted from FFPE tissue will be damaged by the fixation process and prolonged
storage in non-ideal conditions, variable DNA quality across the collection is expected
with some highly degraded samples. DNA was normalized to 500 ng in a volume of 62 μL
elution buffer (Qiagen) and transferred into a microTUBE plate for shearing on an LE220
(Covaris) acoustic sonicator using the conditions: Duty Factor, 20%; Peak Incident Power,
450W; Cycle per burst, 200; Duration, 2 × 60 seconds with an intervening spin. The profile
of sheared FFPE DNA extracted by the Qiagen Allprep DNA/RNA FFPE protocol has a
dominant DNA peak in the size range between 300 and 400 bp. To improve library quality
of FFPE-derived DNA, solid phase reversible immobilization (SPRI) bead-based size
selection was performed before library construction to remove smaller DNA fragments
from highly degraded FFPE DNAs. If not removed early in the library construction
process, these smaller fragments would otherwise dominate the final amplified library.
FFPE DNA damage repair, end-repair, and phosphorylation were combined in a single
reaction using an enzymatic premix (NEB), then bead purified using a 0.8:1
(bead:sample) ratio to remove small FFPE fragments. Repaired DNA fragments were next
A-tailed for ligation to paired-end, partial Illumina sequencing adapters, then purified twice
with SPRI beads (1:1 ratio). Full-length adaptered products were achieved by performing
8 cycles of PCR with primers introducing fault-tolerant hexamer “barcodes” allowing
multiplexing of libraries. Indexed PCR products were double purified with a 1:1 bead ratio.
Concentration of final libraries was determined using size profiles obtained from a high
sensitivity Caliper LabChip GX together with Quant-iT (Invitrogen) quantification.
Strand-specific ribosomal RNA depletion RNA sequencing
RNA-seq libraries were constructed from RNA provided by Nationwide Children’s Hospital
(Columbus, OH) using a strand-specific ribosomal depletion protocol. To remove
cytoplasmic and mitochondrial ribosomal RNA (rRNA) species from total RNA, the NEBNext
rRNA Depletion Kit for Human/Mouse/Rat was used (NEB, E6310X). Enzymatic reactions
were set up in a 96-well plate (Thermo Fisher Scientific) on a Microlab NIMBUS liquid
handler (Hamilton Robotics, USA). 100 ng of DNase I treated total RNA in 6 µL was
hybridized to rRNA probes in a 7.5 µL reaction. Heat-sealed plates were incubated at
95°C for 2 minutes followed by incremental reduction in temperature by 0.1°C per second
to 22°C (730 cycles). The rRNA in the RNA-DNA hybrids was digested using RNase H in a
10 µL reaction incubated in a thermocycler at 37°C for 30 minutes. To remove excess rRNA
probes (DNA) and residual genomic DNA contamination, DNase I was added in a total
reaction volume of 25 µL and incubated at 37°C for 30 minutes. RNA was purified using
RNA MagClean DX beads (Aline Biosciences, USA) with 15 minutes of binding time, 7
minutes clearing on a magnet followed by two 70% ethanol washes, 5 minutes to air dry
the RNA pellet and elution in 36 μL DEPC water. The plate containing RNA was stored at
−80°C prior to cDNA synthesis.
First-strand cDNA was synthesized from the purified RNA (minus rRNA) using the
Maxima H Minus First Strand cDNA Synthesis kit (ThermoFisher, USA) and random
hexamer primers at a concentration of 8 ng/µL along with a final concentration of 0.4 µg/µL
Actinomycin D, followed by PCR Clean DX bead purification on a Microlab NIMBUS robot
(Hamilton Robotics, USA). The second strand cDNA was synthesized following the
NEBNext Ultra Directional Second Strand cDNA Synthesis protocol (NEB) that
incorporates deoxyuridine triphosphate (dUTP) in the deoxynucleoside
triphosphate (dNTP) mix, allowing the second strand to be digested using USER
enzyme (NEB) in the post-adapter ligation reaction and thus achieving strand
specificity.
cDNA was fragmented by Covaris LE220 sonication for 130 seconds (2 × 65 seconds) at
a “Duty cycle” of 30%, 450W Peak Incident Power and 200 Cycles per Burst in a 96-well
microTUBE Plate (P/N: 520078) to achieve 200-250 bp average fragment lengths. The
paired-end sequencing library was prepared following the BC Cancer Agency Genome
Sciences Centre strand-specific, plate-based library construction protocol on a Microlab
NIMBUS robot (Hamilton Robotics, USA). Briefly, the sheared cDNA was subject to
end-repair and phosphorylation in a single reaction using an enzyme premix (NEB)
containing T4 DNA polymerase, Klenow DNA Polymerase and T4 polynucleotide kinase,
incubated at 20°C for 30 minutes. Repaired cDNA was purified in 96-well format using
PCR Clean DX beads (Aline Biosciences, USA), and 3’ A-tailed (adenylation) using
Klenow fragment (3’ to 5’ exo minus) and incubation at 37°C for 30 minutes prior to
enzyme heat inactivation. Illumina PE adapters were ligated at 20°C for 15 minutes. The
adapter-ligated products were purified using PCR Clean DX beads, then digested with
USER enzyme (1 U/µL, NEB) at 37°C for 15 minutes followed immediately by 13
cycles of indexed PCR using Phusion DNA Polymerase (Thermo Fisher Scientific
Inc., USA) and Illumina’s PE primer set. PCR parameters: 98°C for 1 minute followed by
13 cycles of 98°C 15 seconds, 65°C 30 seconds and 72°C 30 seconds, and then 72°C 5
minutes. The PCR products were purified and size-selected using a 1:1 PCR Clean DX
bead ratio (twice), and the eluted DNA quality was assessed with Caliper LabChip GX for
DNA samples using the High Sensitivity Assay (PerkinElmer, Inc., USA) and quantified
using a Quant-iT dsDNA High Sensitivity Assay Kit on a Qubit fluorometer (Invitrogen)
prior to library pooling and size-corrected final molar concentration calculation for Illumina
HiSeq2500 sequencing with paired-end 75 base reads.
miRNA sequencing
miRNA sequencing (miRNA-seq) libraries were constructed from 1 µg total RNA provided
by Nationwide Children’s Hospital (Columbus, OH) using a plate-based protocol
developed at the British Columbia Cancer, Genome Sciences Centre (BCGSC). Negative
controls were added at three stages: elution buffer was added to one well when the total
RNA was loaded onto the plate, water to another well just before ligating the 3’ adapter,
and PCR brew mix to a final well just before PCR amplification. A 3’ adapter was ligated
using a truncated T4 RNA ligase 2 (NEB Canada, cat. M0242L) with an incubation at 22°C
for 1 hour. This adapter is an adenylated, single-stranded DNA with the sequence 5’
/5rApp/ ATCTCGTATGCCGTCTTCTGCTTGT /3ddC/, which selectively ligates to
miRNAs. An RNA 5’ adapter was then ligated, using T4 RNA ligase (Ambion USA, cat.
AM2141) and adenosine triphosphate (ATP), and was incubated at 37°C for 1 hour. The
sequence of the single strand RNA adapter is 5’
GUUCAGAGUUCUACAGUCCGACGAUCUGGUCAA 3’.
Upon completion of adapter ligation, first-strand cDNA was synthesized using Superscript
II Reverse Transcriptase (Invitrogen, cat. 18064-014) and RT primer (5’
CAAGCAGAAGACGGCATACGAGAT 3’). First-strand cDNA provided the template for the
final library PCR, into which index sequences were introduced to enable libraries to be
identified from a sequenced pool that contains multiple libraries. Briefly, a PCR brew mix
was made with the 3’ PCR primer (5’ CAAGCAGAAGACGGCATACGAGAT 3’), Phusion
Hot Start High Fidelity DNA polymerase (NEB Canada, cat. F540L), buffer, dNTPs and
dimethyl sulfoxide (DMSO). The mix was distributed evenly into a new 96-well plate. A
Microlab NIMBUS robot (Hamilton Robotics, USA) was used to transfer the PCR template
(first-strand cDNA) and indexed 5’ PCR primers into the brew mix plate. Each indexed 5’
PCR primer, 5’
AATGATACGGCGACCACCGACAGNNNNNNGTTCAGAGTTCTACAGTCCGA 3’,
contains a unique six-nucleotide ‘index’ (shown here as N’s), and was added to each well
of the 96-well PCR brew plate. PCR was performed at 98°C for 30 seconds, followed by
15 cycles of 98°C for 15 seconds, 62°C for 30 seconds and 72°C for 15 seconds, and
finally a 5-minute incubation at 72°C. Library qualities were assessed across the whole
plate using a Caliper LabChipGX DNA chip. PCR products were pooled and size selected
to remove larger cDNA fragments and smaller adapter contaminants, using a 96-channel
automated size selection robot that was developed at the BCGSC. After size selection,
each pool was ethanol precipitated, quality checked using an Agilent Bioanalyzer
DNA1000 chip and quantified using a Qubit fluorometer (Invitrogen, cat. Q32854). Each
pool was diluted to a target concentration for cluster generation and loaded into a single
lane of an Illumina HiSeq2500 flow cell. Clusters were generated, and lanes were
sequenced with a 31-nt main read for the insert and a 7-nt read for the index.
Targeted sequencing by custom hybridization capture
Targeted sequencing libraries were constructed from DNA provided by Nationwide
Children’s Hospital (Columbus, OH) using a custom hybridization capture protocol. 50 ng
from each of 20 or 21 whole-genome libraries was pooled prior to custom capture using
Agilent SureSelect XT Custom probes (4.8 Mbp) targeting 74,809 human and EBV
features.3 The features included the following: exons of recurrently mutated genes with
the exception of known targets of passenger mutations (e.g. TTN, mucin genes); exons of
several known DLBCL genes; exons of previously reported BL genes not found mutated
in these data; whole gene bodies for DDX3X (chrX:41332775-41364961, GRCh38) and
FBXO11 (chr2:47782639-47907718); whole gene bodies and flanking regions for ID3
(chr1:23557918-23657826) and BCL6 (chr3:187718649-188265924); the recurrently
rearranged region surrounding MYC (chr8:127242368-129788153); and non-coding
mutation peaks (details below). The pooled libraries were hybridized to the RNA probes at
65°C for 24 hours. Following hybridization, streptavidin-coated magnetic beads (Dynal,
MyOne) were used for custom capture. Post-capture material was purified on MinElute
columns (Qiagen) followed by post-capture enrichment with 10 cycles of PCR using
primers that maintain the library-specific indices. Pooled libraries were sequenced on an
Illumina HiSeq 2500 instrument with v4 chemistry generating 125-base paired-end
reads.
3 https://cgcidata.nci.nih.gov/PreRelease/BLGSP/targeted_capture_sequencing/DESIGN/
2.3.4 Data analysis
Sequencing read alignment
WGS and targeted sequencing reads were aligned to the human reference genome
(GRCh38) with BWA-MEM (version 0.7.6a; parameters: -M).186,187 The human reference
genome that was used is a version of GRCh38 without alternate contigs that includes the
Epstein–Barr viral genome (GenBank accession AJ507799.2), which can be
downloaded.4 Read duplicate marking was done using sambamba (version 0.5.5).188
RNA-seq reads were pseudoaligned using Salmon (version 0.8.2; details below).189 The
RNA-seq reads were also aligned to the reference genome indicated above using the
JAGuaR pipeline.190 Tumour and matched normal WGS data for 15 cases from the ICGC
were obtained through a Data Access Compliance Office (DACO)-approved project using
a virtual instance on the Cancer Genome Collaboratory.97,191 The ICGC WGS reads were
realigned using the above parameters.
4 http://www.bcgsc.ca/downloads/genomes/9606/hg38_no_alt/bwa_0.7.6a_ind/genome/
Tumour EBV status and genome type
Owing to missing data from most cases, I devised a computational approach to directly
infer tumour EBV status and genome type from tumour WGS and RNA-seq data. To
determine tumour EBV status, the fraction of reads aligning to the EBV genome was
calculated using Samtools (version 1.6).186 Tumours were considered to be EBV-positive
when the EBV fraction of WGS reads was greater than 0.00006 (calculated from the
fraction represented by the EBV genome in the reference genome) and the number of
RNA-seq reads mapped to the EBER1 (chrEBV:6629-6795) and EBER2
(chrEBV:6956-7128) loci in the JAGuaR-based alignments was greater than 250. There
were no cases with discordant EBV statuses inferred from the WGS and RNA-seq data.
Although EBER expression was not quantified for the ICGC tumours because their
RNA-seq data were not used in this project, they were all classified as EBV-negative
according to their WGS data, which is consistent with the EBV status reported by the
MMML-seq project. The minimum fraction of EBV reads was 0.01 for samples that
underwent targeted sequencing to account for the different ratio of human and EBV
genomic regions due to hybridization capture. EBV genome type was inferred for
EBV-positive tumours by comparing the counts for 21-mers that are unique to either EBV
type 1 (GenBank accession NC_007605.1) or type 2 (GenBank accession NC_009334.1).
K-mer counting was performed on tumour WGS reads aligned to the EBV genome using
Jellyfish (version 2.2.6).192 EBV genome type was inferred to be type 1 or type 2 if the
count ratio of EBV type 1–specific k-mers to EBV type 2–specific k-mers was greater than
or less than 1, respectively.
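The decision rules above can be summarized in a short Python sketch. This is an illustration only, not the code used for the thesis: the function and parameter names are hypothetical, the inputs are assumed to have been computed beforehand (for example from Samtools and Jellyfish output), and a count ratio of exactly 1 is treated as undetermined because the text does not specify that case.

```python
def is_ebv_positive(ebv_read_fraction, eber_read_count,
                    min_fraction=0.00006, min_eber_reads=250):
    """Apply both thresholds from the text: WGS EBV fraction and EBER read count."""
    return ebv_read_fraction > min_fraction and eber_read_count > min_eber_reads


def infer_ebv_type(type1_kmer_count, type2_kmer_count):
    """Compare counts of type-specific 21-mers (equivalent to testing the ratio against 1)."""
    if type1_kmer_count > type2_kmer_count:
        return "type 1"
    if type1_kmer_count < type2_kmer_count:
        return "type 2"
    return None  # assumption: a tie (ratio of exactly 1) is left unclassified
```

For targeted sequencing data, the same function could be called with `min_fraction=0.01`, the higher threshold stated above.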
Simple somatic mutations
The Strelka workflow (version 1.0.14) was used to call SSMs. The default configuration
for data aligned with bwa (strelka_config_bwa_default.ini) was used, except that SNVs
were filtered for a minimum quality somatic score (QSS) of 25 (default 15). For SNVs
and indels, reference and alternate allele counts were taken from the Strelka output
variant call format (VCF) file.193 SNVs and indels were annotated using vcf2maf (version
1.6.12) and Ensembl Variant Effect Predictor (release 86).194 Transcript selection for
annotation was performed by vcf2maf with the following exception. Non-canonical
transcripts were instead selected if they were non-synonymously mutated more
commonly than the canonical transcript (minimum increase of two affected cases). SNVs
and indels were further filtered for a minimum alternate allele count of six and a minimum
variant allele fraction (VAF) of 10% and 20% for FF and FFPE tumours, respectively.
Tumours with a median VAF below 25% were omitted from subsequent analyses due to
either excessive noise or low predicted tumour content. The same pipeline was used for
detecting SNVs and indels in the targeted validation sequencing data, with the exception
that depth filters were disabled for Strelka (isSkipDepthFilters = 1).
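As a minimal sketch of the post-calling filters described above, the following Python snippet applies the alternate allele count and VAF thresholds and the median VAF cut-off. The class and function names are hypothetical. The VAF is assumed here to be the alternate count divided by the sum of reference and alternate counts, and the median is assumed to be taken over whichever variants are passed in, since the text does not state these details.

```python
from dataclasses import dataclass
from statistics import median

MIN_ALT_COUNT = 6
MIN_VAF = {"FF": 0.10, "FFPE": 0.20}  # thresholds per tissue preservation method
MIN_MEDIAN_VAF = 0.25


@dataclass
class Variant:
    ref_count: int
    alt_count: int

    @property
    def vaf(self) -> float:
        # Assumed definition: alt / (ref + alt)
        depth = self.ref_count + self.alt_count
        return self.alt_count / depth if depth else 0.0


def filter_variants(variants, preservation):
    """Keep variants meeting the minimum alternate count and VAF."""
    min_vaf = MIN_VAF[preservation]
    return [v for v in variants
            if v.alt_count >= MIN_ALT_COUNT and v.vaf >= min_vaf]


def keep_tumour(variants):
    """Retain a tumour only if its median VAF reaches the 25% cut-off."""
    return bool(variants) and median(v.vaf for v in variants) >= MIN_MEDIAN_VAF
```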
Significantly mutated genes
Considering only SNVs and indels, significantly mutated genes were identified using an
ensemble approach integrating four methods: MutSigCV, OncodriveFM, OncodriveFML,
and OncodriveCLUST.169,195–197 Mutations were lifted over from GRCh38 to GRCh37
using CrossMap (version 0.2.5) along with the “hg38ToHg19” chain file provided by the
UCSC Genome Browser.198,199 Lifting over variants was necessary because some of the
methods listed above rely on GRCh37 reference data. For consistency, the lifted-over
mutations based on GRCh37 served as input for all methods. Non-synonymous mutations
were defined as those with one of the following values in the Mutation Annotation Format
(MAF) file Variant_Classification field, as annotated by vcf2maf: Splice_Site,
Nonsense_Mutation, Frame_Shift_Del, Frame_Shift_Ins, Nonstop_Mutation,
Translation_Start_Site, In_Frame_Ins, In_Frame_Del, or Missense_Mutation. To minimize
noise, I only considered genes deemed significant (Q-value < 0.1) by two or more
methods.
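The consensus rule (significant by at least two of the four methods) can be sketched as a simple vote count. The function name, the input layout (one dictionary of gene-level Q-values per method), and the gene names in the usage example are hypothetical and only illustrate the rule.

```python
from collections import Counter


def consensus_genes(q_values_by_method, q_threshold=0.1, min_methods=2):
    """Return genes with Q-value < q_threshold in at least min_methods methods."""
    votes = Counter()
    for gene_q_values in q_values_by_method.values():
        for gene, q_value in gene_q_values.items():
            if q_value < q_threshold:
                votes[gene] += 1
    return {gene for gene, n in votes.items() if n >= min_methods}


# Usage with made-up Q-values for placeholder genes
example = {
    "MutSigCV": {"GENE_A": 0.001, "GENE_B": 0.30, "GENE_C": 0.05},
    "OncodriveFM": {"GENE_A": 0.020, "GENE_B": 0.04},
    "OncodriveFML": {"GENE_A": 0.500, "GENE_C": 0.20},
    "OncodriveCLUST": {"GENE_B": 0.09},
}
significant = consensus_genes(example)
```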
BL-associated genes
I defined BLGs as genes deemed significantly mutated in this study or previously
described as recurrently mutated in BL with at least five affected patients in the discovery
cohort. Only non-synonymous simple somatic mutations and copy number variations
(minimum size 10 kbp) were considered. To avoid considering mainly large-scale events,
copy number variations affecting a BLG were required to be relatively small with a median
size of 10 Mbp or less. For each BLG, additional cryptic splicing variants (with support for
aberrant splicing in RNA-seq data), structural variations, and copy number variations
were manually curated.
Non-coding mutation peaks
P-values were empirically determined for each peak by comparing its mutation rate with
an empirical distribution produced by calculating the mutation rates of identically sized
regions randomly sampled across the genome. The smallest and largest mutated positions
on each chromosome were used to determine the range of positions available for
sampling with replacement. Positions that overlapped gaps in the reference genome such
as centromeres and telomeres were excluded. A “pseudo-peak” was created from a
sampled position by extending each side to create regions with the same size as the
given mutation peak. The mutation rate of 100,000 such pseudo-peaks was calculated to
generate the empirical null distribution of mutation rates genome-wide. The empirical
P-value was calculated as the number of pseudo-peaks with a higher mutation rate than
the given mutation peak divided by 100,000. Given that each mutation peak is tested
against independent null distributions, the P-values did not require multiple test
correction. All peaks had empirical P-values < 0.001 and were thus significantly mutated
above background rates.
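The sampling procedure can be illustrated with a simplified Python sketch. It is not the implementation used here: it handles a single chromosome, omits the exclusion of reference gaps, and assumes that each sampled position is the centre of its pseudo-peak. All names are hypothetical.

```python
import bisect
import random


def pseudo_peak_rates(mutated_positions, peak_size, n_samples, rng):
    """Mutation rates of randomly placed windows of the same size as the peak."""
    positions = sorted(mutated_positions)
    lowest, highest = positions[0], positions[-1]
    half = peak_size // 2
    rates = []
    for _ in range(n_samples):
        centre = rng.randint(lowest, highest)  # sampling with replacement
        start = centre - half
        end = start + peak_size  # half-open window [start, end)
        count = (bisect.bisect_left(positions, end)
                 - bisect.bisect_left(positions, start))
        rates.append(count / peak_size)
    return rates


def empirical_p_value(peak_rate, null_rates):
    """Fraction of pseudo-peaks with a higher mutation rate than the peak."""
    higher = sum(1 for rate in null_rates if rate > peak_rate)
    return higher / len(null_rates)
```

With 100,000 pseudo-peaks, as in the text, the smallest non-zero P-value this procedure can report is 0.00001.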
Enrichment for AICDA-mediated mutations
A bespoke algorithm was implemented in Python (version 3.6.1) to determine whether
certain regions, such as significantly mutated genes and non-coding mutation peaks,
were enriched for SNVs and indels consistent with AICDA-mediated mutagenesis.200,201
Enrichment for putative AICDA-mediated mutations in a given region was measured using
two binomial exact tests. First, the observed number of mutations affecting AICDA
recognition sites (number of successes), defined as regions that fit the AICDA motif
(RGYW), was compared to the expected number of such mutations, which was calculated
from the region’s mutation rate (probability of success) and the number of bases that
overlap AICDA recognition sites (number of trials). Second, the observed number of
mutations affecting the guanine-cytosine pair targeted by AICDA (number of successes)
was compared to the expected number of such mutations, which was calculated from the
region’s mutation rate of guanine-cytosine pairs (probability of success) and the number
of target guanine-cytosine pairs in AICDA recognition sites (number of trials). Mutation
rates were calculated using the effective region size, which is equal to the product of the
region size and the cohort size. The effective region size ensures that the observed
number of mutations (number of successes) is never higher than the region size (number
of trials). Care was taken to avoid double-counting mutations if they overlapped more than
one AICDA recognition site. This process was repeated for all regions of interest. The
regions for BL-associated genes were based on the transcripts that were affected by
non-synonymous mutations as opposed to entire gene bodies. The entire regions of
non-coding mutation peaks were considered. The in-house program also annotated
mutations based on whether they overlapped an AICDA recognition site.
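The first of the two tests can be sketched as follows. This is a simplified illustration rather than the in-house program. It rests on three assumptions that the text does not confirm: the reverse-complement motif (WRCY) is scanned in addition to RGYW, the number of trials is scaled by the cohort size in the same way as the effective region size, and the test is one-sided (upper tail). All names are hypothetical.

```python
import math
import re

# RGYW on the forward strand; WRCY is its reverse complement (assumed to be included).
FORWARD_MOTIF = re.compile(r"(?=[AG]G[CT][AT])")
REVERSE_MOTIF = re.compile(r"(?=[AT][AG]C[CT])")


def aicda_site_bases(sequence):
    """Return the set of 0-based positions covered by any RGYW/WRCY site."""
    seq = sequence.upper()
    covered = set()
    for motif in (FORWARD_MOTIF, REVERSE_MOTIF):
        for match in motif.finditer(seq):  # lookahead finds overlapping sites
            covered.update(range(match.start(), match.start() + 4))
    return covered


def binomial_upper_tail(successes, trials, prob):
    """P(X >= successes) for X ~ Binomial(trials, prob), summed in log space."""
    if successes <= 0:
        return 1.0
    if prob <= 0.0:
        return 0.0
    if prob >= 1.0:
        return 1.0
    total = 0.0
    for i in range(successes, trials + 1):
        log_term = (math.lgamma(trials + 1) - math.lgamma(i + 1)
                    - math.lgamma(trials - i + 1)
                    + i * math.log(prob) + (trials - i) * math.log1p(-prob))
        total += math.exp(log_term)
    return min(total, 1.0)


def motif_enrichment(sequence, mutated_positions, cohort_size):
    """Count mutations in recognition sites and test against the regional rate."""
    sites = aicda_site_bases(sequence)
    effective_size = len(sequence) * cohort_size
    rate = len(mutated_positions) / effective_size
    # Membership in a set of covered bases counts each mutation once,
    # even where two recognition sites overlap.
    in_sites = sum(1 for pos in mutated_positions if pos in sites)
    trials = len(sites) * cohort_size
    return in_sites, binomial_upper_tail(in_sites, trials, rate)
```

The second test, restricted to the targeted guanine-cytosine pairs, would follow the same pattern with the set of covered bases narrowed to the G or C position within each site.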
De novo mutational signatures
Mutational signatures were discovered using the previously described framework by
Alexandrov et al.202 I summarized somatic SNVs based on their mutational subtype, 5’
context, and 3’ context. This resulted in a mutation catalog matrix of 96 SNV classes for
each sample. I performed non-negative matrix factorisation on the mutation catalog to
discover mutational signatures within the entire cohort. Signature stability was computed
by bootstrap resampling over 1000 total iterations (10 iterations in each of 100 cores).
The optimal n-signature solution, n_opt, which simultaneously maximised signature stability
and minimised the Frobenius reconstruction error, was automatically selected,
\[
n_{\mathrm{opt}} = \operatorname*{arg\,min}_{n} \left( \frac{R_n - \min(R)}{\max(R) - \min(R)} - \frac{S_n - \min(S)}{\max(S) - \min(S)} \right),
\]
where R and S are the vectors containing reconstruction errors and stability of each
n-signature solution, and R_n and S_n are the reconstruction error and stability of the
n-signature solution. This approach determined that the four-signature solution was
optimal. To determine matches to known mutational signatures, cosine similarity metrics
were computed against the 30 COSMIC reference mutational signatures. Where more
than one signature matched to a single COSMIC signature, the highest similarity match
was chosen and the remaining signatures were matched to the next most similar
COSMIC signature. For each n-signature solution, the Pearson correlation was calculated
between the age at diagnosis for each case and the predicted number of mutations
attributable to de novo signatures associated with age (COSMIC reference signatures 1
and 5), taking the maximum correlation if both COSMIC signatures were paired. Similarly,
for each n-signature solution, the Pearson correlation was calculated between AICDA
expression for each case and the predicted number of mutations attributable to the de
novo signature associated with AICDA activity (COSMIC reference signature 9).
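To make the selection criterion concrete, the following sketch rescales reconstruction error and stability to the range 0 to 1 and picks the number of signatures minimising their difference. It also includes the cosine similarity used for matching against reference signatures. The function names are hypothetical and the values in the usage example are invented for illustration; they are not results from this study.

```python
import math


def rescale(values):
    """Min-max rescale a dict of {n_signatures: value} to the range [0, 1]."""
    lowest, highest = min(values.values()), max(values.values())
    span = highest - lowest
    return {n: (v - lowest) / span if span else 0.0 for n, v in values.items()}


def select_n_opt(reconstruction_errors, stabilities):
    """Minimise rescaled error minus rescaled stability over candidate n."""
    r = rescale(reconstruction_errors)
    s = rescale(stabilities)
    return min(reconstruction_errors, key=lambda n: r[n] - s[n])


def cosine_similarity(a, b):
    """Cosine similarity between two signature vectors (e.g. 96 SNV classes)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


# Invented example values for solutions with 2 to 5 signatures
errors = {2: 10.0, 3: 6.0, 4: 3.0, 5: 2.5}
stability = {2: 0.99, 3: 0.95, 4: 0.97, 5: 0.60}
n_opt = select_n_opt(errors, stability)
```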
Somatic structural variations
Somatic SVs were detected using the Manta pipeline (version 1.1.0) in paired
tumour-normal mode using default parameters with the exception of a minimum somatic
score (SOMATICSCORE) of 45 (default 30).203 In FFPE samples, any inversions smaller
than 500 bp were considered noise and ignored. Variant allele fractions were calculated
from the reference and alternate allele counts reported in the Manta output variant call
format file. These files were converted to BEDPE format using the vcftobedpe tool from
the svtools package (version 0.3.2, commit 6d7b6ec8).204 SVs that overlapped any of the
significantly mutated genes were manually curated for inclusion as non-synonymous
mutations. IG-MYC translocations were identified as being any SV that met the following
conditions: (1) one breakpoint was near MYC (chr8:126393182-130762146); (2) the
breakpoint near MYC was oriented such that exons 2 and 3 are included in the
rearrangement; (3) the other breakpoint was near an immunoglobulin heavy or light chain
locus, namely IGH (chr14:104589639-107810399), IGK (chr2:87999518-90599757), or
IGL (chr22:21031465-23905532); and (4) the highest-scoring translocation was selected
in the event of multiple candidate SVs. Tumours in which Manta failed to detect a
translocation that met the above criteria were manually inspected for such events, which
revealed IG-MYC rearrangements in all remaining cases.
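The four conditions can be expressed as a small filter. The sketch below is illustrative only: the class and function names are hypothetical, the orientation check in condition (2) is reduced to a precomputed boolean rather than derived from breakend strands, and the region boundaries assume that each coordinate string in the text denotes a start-end interval.

```python
from dataclasses import dataclass

MYC_REGION = ("chr8", 126_393_182, 130_762_146)
IG_REGIONS = {
    "IGH": ("chr14", 104_589_639, 107_810_399),
    "IGK": ("chr2", 87_999_518, 90_599_757),
    "IGL": ("chr22", 21_031_465, 23_905_532),
}


@dataclass
class StructuralVariant:
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int
    somatic_score: int
    retains_myc_exons_2_3: bool  # stands in for the orientation check


def _within(chrom, pos, region):
    region_chrom, start, end = region
    return chrom == region_chrom and start <= pos <= end


def ig_partner(sv):
    """Return the IG locus if one breakpoint is near MYC and the other near IG."""
    ends = [(sv.chrom1, sv.pos1), (sv.chrom2, sv.pos2)]
    for myc_end, other_end in (ends, ends[::-1]):
        if not _within(*myc_end, MYC_REGION):
            continue
        for locus, region in IG_REGIONS.items():
            if _within(*other_end, region):
                return locus
    return None


def select_ig_myc(svs):
    """Pick the highest-scoring SV meeting conditions (1) to (3), if any."""
    candidates = [sv for sv in svs
                  if sv.retains_myc_exons_2_3 and ig_partner(sv) is not None]
    return max(candidates, key=lambda sv: sv.somatic_score, default=None)
```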
Somatic copy number variations
Sequenza was used to call somatic CNVs in tumour-normal pairs.205 Sequenza
bam2seqz (parameters: --qlimit 30) generated the SEQZ files, which were then binned
using Sequenza seqz-binning (parameters: -w 300 -s). To eliminate noise, the putative
germline heterozygous positions identified by Sequenza were post-filtered to retain only
those represented in dbSNP (downloaded 2017-04-03) “common all” single nucleotide
polymorphisms. Using bedtools intersect (parameters: -wa), germline heterozygous
positions were removed if they overlapped gaps in the reference genome (e.g.
centromeres) or segmental duplications, which were obtained from the UCSC Table
Browser.206,207 Beforehand, the segmental duplications were merged if they overlapped
one another using bedtools merge, then filtered for a minimum size of 10 kbp, and
subsequently merged again using bedtools merge (parameters: -d 10000). The Sequenza
R package was used to load the binned SEQZ data, fit a model for cellularity and ploidy,
and generate CNV segments.205 Sequenza was made aware of the sex of each case to
properly handle CNVs on the sex chromosomes. To simplify model fitting and avoid
incorrect local optima, ploidy and cellularity options were restricted as follows. Ploidy was
limited to the range between 1.8 and 2.5. Cellularity was restricted to an estimate of
tumour content derived from the VAF of SNVs and indels, defined as twice the VAF
corresponding to the first local density maximum below 50%.
Gene expression quantification
The tximport Bioconductor R package was used to summarize transcript-level read
counts at the gene level.208 The DESeq2 Bioconductor R package was used to correct
the read counts for library size and to perform a variance-stabilizing data
transformation.209 These variance-stabilized expression values were used for statistical
tests that require homoskedastic data.
miRNA expression profiling was performed separately on the miRNA sequencing data
using Canada’s Michael Smith Genome Sciences Centre miRNA processing pipeline,
which was used for The Cancer Genome Atlas project.210 The analysis was done using
miRBase release 21.211–215
Clonal B-cell receptors
MiXCR (version 2.1.3) was used to identify immunoglobulin heavy and light chain clones
from the RNA-seq and WGS data as per the standard pipeline described in their
documentation.184,185 The MiXCR pipeline was also run on 323 DLBCL tumour samples
that underwent a strand-specific poly(A)-selection RNA-seq protocol.166 All RNA-seq
reads were aligned using “mixcr align” (parameters: -p rna-seq
-OallowPartialAlignments=true) while for the WGS data, only reads originating from the
immunoglobulin regions (chr2:88668078-90584447, chr14:105548159-107030529, and
chr22:21897318-23046831) or unmapped reads were aligned using “mixcr align”
(parameters: -p rna-seq -OallowPartialAlignments=true
-OvParameters.geneFeatureToAlign=VGeneWithP). Two rounds of contig assembly were
performed using “mixcr assemblePartial” followed by clone assembly using “mixcr
assemble”. Clones were exported using “mixcr exportClones” with the -o and -t options
to exclude any clones with out-of-frame sequences or stop codons. Clonal fraction was
calculated for heavy and light chains separately. Dominant clones in the RNA-seq data
were defined as having a clonal fraction of at least 30% with a minimum of 30 supporting
reads. For the WGS analysis, dominant clones were defined as having the greatest clonal
fraction with at least two supporting reads. The top-scoring V, D, J and C genes were
selected for each clone when multiple genes were possible.
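The two definitions of a dominant clone can be sketched as follows for a single chain. The function names and input layout (a dictionary of supporting read counts per clone) are hypothetical, and the clonal fraction is assumed to be a clone's read count divided by the total read count for that chain.

```python
def dominant_clones_rnaseq(read_counts, min_fraction=0.30, min_reads=30):
    """Clones with a clonal fraction of at least 30% and at least 30 reads."""
    total = sum(read_counts.values())
    if total == 0:
        return []
    return [clone for clone, n in read_counts.items()
            if n / total >= min_fraction and n >= min_reads]


def dominant_clone_wgs(read_counts, min_reads=2):
    """Clone with the greatest clonal fraction, if it has at least two reads."""
    if not read_counts:
        return None
    clone, n = max(read_counts.items(), key=lambda item: item[1])
    return clone if n >= min_reads else None
```

Each function would be applied to heavy and light chain clones separately, in line with the text.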
Data and statistical analyses
Data and statistical analyses were done using the R statistical programming language
(version 3.4.2).216 Mann–Whitney U tests and Fisher’s exact tests were used where
appropriate with the wilcox.test and fisher.test functions in R, respectively. Correlation
between continuous variables was tested using Pearson’s product-moment correlation
coefficient with the cor.test function in R. Mutual exclusivity between mutations in different
genes was evaluated using the CoMEt exact test with the comet_exact_test function from
the cometExactTest package.174,175 Multiple hypothesis correction was performed using
the Benjamini–Hochberg method with the p.adjust function in R. P-values below 0.05 and
Q-values (corrected P-values) below 0.1 were considered significant. The main R
packages used are listed below with their respective versions and citations.
Package Version References
argparse 1.1.1 217
bedr 1.0.4 218
biomaRt 2.32.1 219, 220
bookdown 0.7 221, 222
broom 0.4.3 223
circlize 0.4.1 224
cometExactTest 0.1.5 175
cowplot 0.9.3 225
data.table 1.11.4 226
DESeq2 1.16.1 227
dplyr 0.7.4 228
feather 0.3.1 229
flextable 0.4.4 230
forcats 0.2.0 231
GenomicRanges 1.28.6 232
ggbeeswarm 0.6.0 233
ggExtra 0.8 234
ggplot2 3.1.0 235
ggrepel 0.7.0 236
ggsignif 0.4.0 237
ggstance 0.3 238
Gviz 1.20.0 239
knitr 1.2 240, 241, 242
lsa 0.73.1 243
maftools 1.4.20 244
MassSpecWavelet 1.42.0 245
matrixStats 0.53.0 246
pheatmap 1.0.8 247
Publish 2018.04.17 248
purrr 0.2.5 249
RColorBrewer 1.1-2 250
readr 1.1.1 251
readxl 1.0.0 252
robustbase 0.92-7 253, 254
sequenza 2.1.2 205
tidyverse 1.1.1 255
tximport 1.4.0 256
viridis 0.4.1 257
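For readers less familiar with the Benjamini–Hochberg correction mentioned under Data and statistical analyses, the following Python sketch reproduces the standard step-up adjustment. It is a stand-alone illustration with a hypothetical function name, not the code used in this work, which relied on the p.adjust function in R.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values (Q-values) in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted
```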
Chapter 3
EBV defines a BL entity with distinct molecular and pathogenic features
3.1 Introduction
Our understanding of the genetic landscape of cancer has grown considerably over the
last few decades. We have also gained a concomitant appreciation of the inter-tumour
and intra-tumour heterogeneity that exist between and within patient tumours,
respectively. This genetic heterogeneity has many clinical implications, most notably the
interplay between genetic features and treatment response or resistance. This newfound
appreciation has spurred the strategy of precision oncology, whereby patients are treated
based on the unique genetic makeup of their respective tumours. The goal of this
approach is simple: by taking into account the molecular features driving each tumour,
clinicians will be more successful in curing cancer. In practice, precision oncology hinges
on detailed knowledge of the mechanisms underpinning pathogenesis. Without this
knowledge, precision medicine would not be possible due to a lack of clinically actionable
(i.e. drug-targetable) genetic alterations.
On the surface, BL appears to be a poor candidate for precision medicine by virtue of
already being curable in most cases by standard-of-care (i.e. intensive chemotherapy).
However, this view does not account for the toxicity of current treatment regimens geared
for BL, which severely degrades the quality of life for patients and can lead to additional
malignancies later in life. Additionally, this view is biased by the cure rates for children in
countries where proper supportive care is readily available.45 In reality, BL remains fatal
for children in sub-Saharan Africa, in part because healthcare delivery systems lack the
capacity to administer intensive chemotherapy, and outcomes remain poor in older
patients, even in developed countries.48–51 When considering these issues, it
becomes clear that tailoring treatments for molecular features specific to BL presents an
opportunity to reduce both mortality and treatment morbidity in this patient population,
especially among those affected by BL in developing countries where this disease is
particularly common.
Currently, BL is classified based on geographic origin and immunocompetence: endemic
for cases diagnosed in malaria-endemic areas, sporadic for cases diagnosed elsewhere,
and immunodeficiency-associated for immunocompromised cases irrespective of locale.
While the endemic and sporadic subtypes differ from one another at the epidemiological
level, their definition has little basis in biology. Admittedly, both subtypes still have
important differences (e.g. tumour growth site), but considering disease pathogenesis
when stratifying patients is key for understanding treatment response and paving the way
for precision medicine. Compared to other cancers that have transitioned to molecularly
defined subtypes, the de facto classification system for BL appears outdated. Accordingly,
I hypothesized that there are common molecular features that more accurately explain
some of the observed differences in BL biology and clinical presentation. Specifically, I
hypothesized that the presence of EBV in BL tumours is more relevant for disease
aetiology than the geographic origin of the tumour. Finally, I also hypothesized that
additional molecular differences exist among EBV-positive tumours on the basis of EBV
genome type, namely type 1 and type 2.
In this chapter, I test these hypotheses by investigating the same BL dataset presented in
Chapter 2. Unlike previous studies, my cohort comprised patients representing two
common clinical variants, namely endemic and sporadic BL, whose samples were
processed using the same methodology, thus limiting technical sources of variation. The
high correlation between clinical variant and tumour EBV status introduced an analytical
challenge. Recall from Chapter 1 that most endemic cases are EBV-positive while most
sporadic cases are EBV-negative. However, this cohort included eight EBV-negative
endemic BLs and four EBV-positive sporadic BLs, which I termed “discordant” BL cases.
These discordant cases afforded an opportunity to distinguish between the features
associated with geography versus tumour EBV status.
Through this analysis, I found a number of mutational differences that are more strongly
associated with tumour EBV status than clinical variant. Despite having greater mutation
burden genome-wide, EBV-positive tumours harboured fewer driver mutations,
particularly those affecting genes with roles in apoptosis such as TP53. The mutational
signatures I detected in BL genomes suggested that the increased mutation frequency in
EBV-positive tumours could be explained by defects in DNA mismatch repair and
elevated AICDA activity. Indeed, the presence of EBV was the most important variable in
determining AICDA expression level and aberrant somatic hypermutation. This level of
heterogeneity in BL has been previously underappreciated and presents new therapeutic
opportunities.
3.2 Results
3.2.1 Fewer driver mutations in EBV-positive BL despite mutation burden
Due to differences in sequencing coverage and tumour content, the mutation burden in BL
cannot be readily compared with other cancer cohorts. While downsampling sequencing
data was a possibility, I preferred to maintain sensitivity as high as possible. A comparison
of the mutation load among the BLGSP tumours, which had similarly high tumour content
and sequencing coverage, revealed one clear outlier (Figure 3.1A). I excluded case
BLGSP-71-06-00142 because its tumour genome was relatively hypermutated with
48,994 SSMs. The remaining BLGSP tumours featured 5,666 SSMs on average (range
1,481–14,115) and mutations from these cases were used for subsequent analyses.
Given the considerable range in mutation load among the remaining cases, I investigated
whether the number of mutations varied with any of the available patient or tumour
metadata (Figure 3.1B). Indeed, genome-wide mutation burden was significantly
associated with both geographic origin and tumour EBV status (Q-values < 0.1,
Mann–Whitney U test). Based on median mutation counts, endemic and EBV-positive
tumours have 1.96- and 1.75-fold more mutations than sporadic and EBV-negative
tumours, respectively. Similar differences were found when I separately considered
mutations within or outside non-coding mutation peaks described in Chapter 2. Lastly, the
same pattern was observed among non-synonymous mutations affecting all
protein-coding genes. Hence, one could speculate that the greater mutation burden seen
in endemic and EBV-positive tumours could expedite the accumulation of driver
mutations.
To pursue this analysis further, I counted the number of putative driver mutations in each
case and made similar comparisons based on clinical variants and tumour EBV status
(Figure 3.2). Here, I defined putative driver mutations as non-synonymous mutations (i.e.
SSMs, CNVs, and SVs) affecting any BLG, as determined in Chapter 2. Surprisingly,
despite having more mutations genome-wide, EBV-positive tumours had significantly
fewer driver mutations (Q-value = 0.0021, Mann–Whitney U test). On the other hand,
sporadic and endemic tumours lacked any difference in this regard (Q-value = 0.368). In
other words, in the absence of EBV, there is an elevated accumulation of driver
mutations, presumably compensating for the oncogenic role played by the virus.
Furthermore, I saw no difference in the number of driver mutations between tumours
infected with EBV type 1 and those infected with EBV type 2 (Q-value = 0.815),
suggesting that EBV genome type is not as important, if at all, for BL
tumourigenesis.
3.2.2 Variation in mutation burden explained by mutational signatures
Considering the observed differences in mutation burden, I asked whether these could be
explained by the de novo mutational signatures identified in Chapter 2. For each sample, I
estimated the number of mutations contributed by each signature based on its exposure,
a measure of signature prevalence (Figure 3.3). Comparing BL genomes on the basis of
tumour EBV status or geographic origin, I found no difference in the number of mutations
related to BL signature A, which was associated with age. Similarly, no difference was
observed between EBV type 1–infected tumours and EBV type 2–infected tumours for
any of the signatures. On the other hand, a significantly higher representation of
mutations linked to BL signatures B, C, and D was found in EBV-positive and endemic
tumours (Q-values < 0.1, Mann–Whitney U test). In other words, these three signatures
combined can account for the observed difference in genome-wide mutation load. While
little is known about the aetiology underlying BL signature B, BL signatures C and D were
associated with defective DNA mismatch repair and AICDA activity, respectively. These
findings indicate that these two mechanisms at least partially explain the greater mutation
burden in endemic or EBV-positive tumours independently of EBV genome type.
Figure 3.1: Genome-wide mutation burden per BL subtype. (A) Distribution of the genome-wide
mutation burden across the discovery cohort. (B) Mutation frequency is shown for each disease
subtype. From top to bottom, the following SSMs are considered in each tumour: all genome-wide
SSMs; SSMs outside mutation peaks; SSMs within mutation peaks; and non-synonymous SSMs
in any gene. This analysis was restricted to WGS data from the BLGSP discovery cohort excluding
the outlier (N = 90). Significance brackets: *, Q-value < 0.1 (Mann–Whitney U test).
[Figure 3.1 axes: (A) mutation burden (genome-wide) against frequency; (B) mutation burden by clinical variant (endemic BL, sporadic BL), EBV status (EBV-positive, EBV-negative) and EBV type (type 1, type 2).]
Figure 3.2: Number of BLGs that are mutated in each BLGSP discovery and validation case. All
mutation types were considered. Discordant cases are highlighted as red points. Significance
brackets: *, Q-value < 0.1 (Mann–Whitney U test).
[Figure 3.2 axes: frequency of mutated BLGs by clinical variant, EBV status and EBV type.]
To isolate the source of this variation, I performed linear regression for each signature to
describe its relationship with relevant sample attributes (Table 3.1). As expected, BL
signature A was uniquely associated with age at diagnosis (P-value = 0.0021). While it
was significantly more common in endemic and EBV-positive tumours, BL signature B did
not associate specifically with any of the variables I considered (P-values > 0.05). In
contrast, BL signature C was found to be significantly associated with tumour EBV status
(P-value = 0.038) but not geographic origin (P-value = 0.23), suggesting a link between
EBV and DNA mismatch repair. Lastly, consistent with an aetiological link with AICDA, BL
signature D was strictly associated with AICDA expression (P-value = 0.00098). Notably,
neither BL signature B nor signature C correlated with AICDA expression, indicating that
these do not have a significant contribution from AICDA (P-values = 0.18 and 0.34,
respectively). In summary, the difference in mutation burden can be partly attributed to
defective DNA mismatch repair in EBV-positive tumours and variable AICDA
activity.
Figure 3.3: Prevalence of each mutational signature per BL subtype. Estimated number of single
nucleotide variants is shown per mutational signature for each disease subtype in the BLGSP
discovery cohort excluding the outlier (N = 90). The four de novo mutational signatures (BL sig.)
are annotated with the associated COSMIC reference signature (COSMIC sig.). ICGC cases were
excluded to avoid the possible confounding effect of lower sequencing coverage. Significance
brackets: *, Q-value < 0.1; **, Q-value < 0.001; ***, Q-value < 0.00001 (Mann–Whitney U test).
[Figure 3.3 panels: BL Signature A (COSMIC Sig. 5), BL Signature B (COSMIC Sig. 17), BL Signature C (COSMIC Sig. 15), BL Signature D (COSMIC Sig. 9); y-axis: estimated number of mutations; columns: clinical variant, EBV status, EBV type.]
Table 3.1: Linear regression of mutational signatures. Linear regression of the estimated number
of mutations per signature (Sig.) as a function of various covariates. Tumour EBV status and clinical
variant status were used as covariates in all models, age was used as a covariate for BL signature
A given its association with age, and AICDA expression was used as a covariate for BL signatures
B, C, and D. The linear models were also bootstrapped 10,000 times to calculate bootstrap 95%
confidence intervals (CI).
BL Sig.  Term                             Coefficient  Standard error  Bootstrap 95% CI (N = 10000)  P-value
A        EBV status (Ref: EBV-positive)   320.0        280             280 to 1000                   0.26000
A        Clinical variant (Ref: Endemic)  400.0        290             1000 to 110                   0.16000
A        Age at diagnosis                 80.0         25              24 to 160                     0.00210
B        EBV status (Ref: EBV-positive)   690.0        440             2300 to 130                   0.12000
B        Clinical variant (Ref: Endemic)  480.0        430             1600 to 74                    0.26000
B        AICDA expression                 180.0        140             950 to 120                    0.18000
C        EBV status (Ref: EBV-positive)   420.0        200             950 to 120                    0.03800
C        Clinical variant (Ref: Endemic)  230.0        190             570 to 40                     0.23000
C        AICDA expression                 59.0         62              330 to 51                     0.34000
D        EBV status (Ref: EBV-positive)   3.2          400             640 to 490                    0.99000
D        Clinical variant (Ref: Endemic)  200.0        380             800 to 300                    0.60000
D        AICDA expression                 420.0        120             190 to 670                    0.00098
3.2.3 Protein-altering mutations associated with tumour EBV status
Based on the observation that there are fewer driver mutations in EBV-positive tumours, I
identified the individual BLGs or biologically related gene sets (i.e. pathways) that were
differentially mutated based on geographic origin and/or tumour EBV status (Figure 3.4).
These results are summarized in Supplemental Table 9 of Appendix A. EBV-negative
tumours, but not sporadic tumours, more frequently had mutations in TP53 (Q-value =
0.0044, Fisher’s exact test), a difference that became more striking when considering a
group comprising all BLGs with roles in apoptosis (Q-value = 0.00024). I also found
differences in the mutation prevalence of SMARCA4 and CCND3 (Q-values < 0.1), but I
was unable to confidently resolve whether these relate to geographic origin or EBV status.
In contrast to a previous report, I failed to identify any differentially mutated genes
between tumours infected by EBV type 1 and EBV type 2 (Q-values > 0.1).163 In short, I
found greater contrast according to EBV status, consistent with the earlier observation
that the frequency of driver mutations varied based on the presence of EBV.
To confirm these findings, I compared tumour EBV status and clinical variant as predictors
of mutation status. For this analysis, I only considered differentially mutated genes and
[Figure 3.4 panels: clinical variant (Ref: endemic BL), EBV status (Ref: EBV-positive), and EBV type (Ref: EBV type 1); x-axis: log10(odds ratio); y-axis: -log10(Q-value); labelled features: apoptosis, CCND3, SMARCA4, and TP53.]
Figure 3.4: Differential incidence of nonsynonymous mutations in molecular BL subtypes. Mutations are restricted to those affecting BLGs. Significant differences are highlighted in red (Q-values < 0.1, indicated by dashed line; Fisher’s exact test).
pathways, which were determined without including the 12 discordant cases. Among the
genes and pathways that were mutated in at least 10% of the cases, SMARCA4,
apoptosis, CCND3, and TP53 were differentially mutated (Q-values < 0.1, Fisher’s exact
test). Tumour EBV status significantly outperformed geographic origin in predicting the
mutation status of the apoptosis pathway for the discordant cases (P-value = 0.0094,
McNemar’s test; Table 3.2). For the remaining genes, it remained inconclusive
whether their mutation status in the discordant cases was significantly better predicted
by EBV status or clinical variant (P-values > 0.05). Together, these findings demonstrate
that EBV-positive tumours are genetically defined by a paucity of mutations affecting
apoptotic genes, supporting the long-standing hypothesis that persistent EBV infection
abrogates apoptosis in BL tumour cells.
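The feature-selection step above relies on Fisher's exact test applied to a 2x2 table of mutated and unmutated cases. The following is a minimal Python sketch of a two-sided test on the apoptosis counts for the concordant groups in Table 3.2; the thesis analyses were run with fisher.test in R, so the function name and implementation here are illustrative assumptions. The raw P-value it returns is not corrected for multiple testing and is therefore not expected to equal any reported Q-value.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    # (the small relative tolerance guards against floating-point ties)
    total = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )
    return min(1.0, total)


# Apoptosis pathway, concordant cases from Table 3.2:
# EBV-positive endemic: 27 mutated, 63 unmutated
# EBV-negative sporadic: 13 mutated, 5 unmutated
p_apoptosis = fisher_exact_two_sided(27, 63, 13, 5)
odds_ratio = (27 * 5) / (63 * 13)  # sample odds ratio (EBV-positive eBL as numerator)
print(f"P = {p_apoptosis:.2g}, odds ratio = {odds_ratio:.3f}")
```

An odds ratio below 1 in this orientation corresponds to fewer apoptosis-pathway mutations in EBV-positive endemic cases.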
3.2.4 Deregulated AICDA activity in EBV-positive BL
My above analysis of mutational signatures revealed substantial variation in the number
of mutations predicted to be caused by BL signature D. Given that this signature is
aetiologically linked to AICDA activity, I compared AICDA expression based on geographic
origin and tumour EBV status (Figure 3.5A). Consistent with my earlier result, AICDA
expression was significantly higher in endemic (Q-value = 9.7 × 10^-7, Mann–Whitney U
Table 3.2: McNemar’s test results. This table compares tumour EBV status and clinical variant status in their ability to predict the mutation status of genes or pathways that are differentially mutated between EBV-positive eBLs and EBV-negative sBLs (i.e. excluding discordant cases). The McNemar’s test P-value indicates whether there is a significant difference in the predictive performance of tumour EBV status and clinical variant status.

| Gene or pathway | EBV status | Clinical variant | Mutated cases | Unmutated cases | Mutation prevalence | McNemar’s test P-value |
|---|---|---|---|---|---|---|
| Apoptosis | EBV-positive | Endemic | 27 | 63 | 30% | 0.0094 |
| | EBV-negative | Sporadic | 13 | 5 | 72% | |
| | EBV-positive | Sporadic | 1 | 3 | 25% | |
| | EBV-negative | Endemic | 8 | 0 | 100% | |
| CCND3 | EBV-positive | Endemic | 9 | 81 | 10% | 0.7700 |
| | EBV-negative | Sporadic | 8 | 10 | 44% | |
| | EBV-positive | Sporadic | 0 | 4 | 0% | |
| | EBV-negative | Endemic | 1 | 7 | 12% | |
| SMARCA4 | EBV-positive | Endemic | 10 | 80 | 11% | 0.3900 |
| | EBV-negative | Sporadic | 10 | 8 | 56% | |
| | EBV-positive | Sporadic | 0 | 4 | 0% | |
| | EBV-negative | Endemic | 0 | 8 | 0% | |
| TP53 | EBV-positive | Endemic | 20 | 70 | 22% | 0.1500 |
| | EBV-negative | Sporadic | 10 | 8 | 56% | |
| | EBV-positive | Sporadic | 1 | 3 | 25% | |
| | EBV-negative | Endemic | 6 | 2 | 75% | |
test) and EBV-positive tumours (Q-value = 1.9 × 10^-8). Linear regression revealed a
stronger association of AICDA expression with tumour EBV status than with geographic
origin (Table 3.3). Consistent with this observation, if endemic and sporadic cases are
considered separately, EBV-positive tumours have higher AICDA expression for both
clinical variants (Figure 3.5B). After accounting for variation associated with EBV status,
geographic origin still significantly accounted for some of the remaining variation.
Altogether, these findings demonstrate that AICDA expression appears to be induced
especially in EBV-positive tumours, but there may also be an unexplained geographic
component to this phenomenon. This increased AICDA expression is expected to result in
enhanced aSHM, which was described as noncoding mutation peaks in Chapter 2.
3.2.5 EBV genome copy number uncorrelated with EBV-associated effects
Considering the above associations with tumour EBV status, I asked whether the number
of copies of the EBV genome per tumour cell correlated with the magnitude of the
[Figure 3.5 panel annotations: (A) AICDA expression for germinal centre samples (centroblasts vs. centrocytes; Q = 2.9e-03) and for tumours by clinical variant (endemic vs. sporadic BL; Q = 9.7e-07), EBV status (EBV-positive vs. EBV-negative; Q = 1.9e-08), and EBV type (type 1 vs. type 2; Q = 8.5e-01). (B) AICDA expression for EBV-positive vs. EBV-negative tumours within endemic BL and within sporadic BL (Q = 0.014 and Q = 0.024).]
Figure 3.5: AICDA expression per BL subtype. (A) Germinal centre samples (N = 12) are shown separately from tumour samples (N = 117), which are partitioned according to different classification systems. Discordant cases are highlighted as red points. (B) Variance-stabilized AICDA expression in sporadic and endemic BL according to tumour EBV status. Significance brackets: *, Q-value < 0.1; ***, Q-value < 0.00001 (Mann–Whitney U test).
Table 3.3: Linear regression of AICDA expression as a function of tumour EBV status and clinical variant status. This linear model was also bootstrapped 10,000 times to calculate bootstrap 95% confidence intervals (CI).

| Term | Coefficient | Standard error | Bootstrap 95% CI (N = 10000) | P-value |
|---|---|---|---|---|
| EBV status (Ref: EBV-positive) | 1.30 | 0.27 | 1.9 to 0.46 | 6.4e-06 |
| Clinical variant (Ref: Endemic) | 0.66 | 0.29 | 1.5 to 0.047 | 2.3e-02 |
observed effects. I leveraged the stoichiometry of WGS reads and their relation to the
proportion of human and EBV DNA to estimate the EBV genome copy number. I
corrected for genome size, ploidy, and tumour content, which was estimated from the VAF
of clonal SSMs. An assumption for this analysis is that the EBV genome copies are
evenly distributed among the BL cells. The average EBV genome copy number per
tumour cell was 46 (range 13–189). Considering only EBV-positive tumours (N = 71), I
performed Spearman correlation tests for AICDA expression (Figure 3.6A) and
genome-wide mutation burden (Figure 3.6B). In both cases, EBV genome copy number
did not correlate (P-values = 0.20 and 0.79, respectively), suggesting that the magnitude
of these effects is not related to the number of EBV copies per tumour cell.
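The copy number estimate described above can be illustrated with a short Python sketch. The exact formula used for this analysis is not stated in the text, so the way the corrections for genome size, ploidy, and tumour content are combined below is an assumption, as are the function name, its parameters, and the example inputs (which are not cohort values). The sketch also assumes that normal cells are diploid and EBV-free, and that reads map equally well to both genomes.

```python
def ebv_copies_per_tumour_cell(ebv_reads, human_reads, ebv_genome_bp,
                               human_haploid_bp, tumour_ploidy, tumour_content):
    """Estimate EBV genome copies per tumour cell from WGS read counts.

    Hypothetical helper: assumes diploid, EBV-free normal cells and
    uniform mappability of EBV and human reads.
    """
    if not 0 < tumour_content <= 1:
        raise ValueError("tumour_content must be in (0, 1]")
    # Reads per base pair approximate the relative depth of each genome
    ebv_depth = ebv_reads / ebv_genome_bp
    human_depth = human_reads / human_haploid_bp
    # EBV copies per haploid human genome in the bulk sample
    copies_per_haploid_genome = ebv_depth / human_depth
    # Average number of haploid human genomes per cell in the tumour/normal mixture
    mean_ploidy = tumour_content * tumour_ploidy + (1 - tumour_content) * 2
    # Copies per average cell, with all EBV copies attributed to tumour cells
    return copies_per_haploid_genome * mean_ploidy / tumour_content


# Illustrative inputs only
copies = ebv_copies_per_tumour_cell(
    ebv_reads=1_720,
    human_reads=3_100_000,
    ebv_genome_bp=172_000,           # approximate EBV genome size (assumed)
    human_haploid_bp=3_100_000_000,  # approximate haploid human genome size (assumed)
    tumour_ploidy=2.0,
    tumour_content=0.5,
)
print(round(copies, 1))
```

Dividing by tumour content reflects the stated assumption that EBV genomes are evenly distributed among the BL cells and absent from contaminating normal cells.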
[Figure 3.6 panel annotations: x-axis: EBV genome copy number per tumour cell; y-axis: (A) AICDA expression, (B) genome-wide mutation burden. Spearman correlation tests shown in the panels: (A) r = 0.052, P = 0.679; (B) r = 0.15, P = 0.22.]
Figure 3.6: Correlation between EBV genome copy number and (A) AICDA expression or (B) genome-wide mutation burden.
3.2.6 Genetic comparison of intra-abdominal and head-only tumours
As mentioned in Chapter 1, one of the most striking differences between endemic and
sporadic cases is the anatomic site affected by the tumour. Endemic cases mostly present
with jaw tumours while facial tumours are exceedingly rare in the sporadic setting; rather,
sporadic cases tend to present with abdominal tumours. Thus, I investigated whether
there were underlying molecular differences that could account for this contrast. While
differential gene expression analysis might seem suitable for this purpose, I encountered
many limitations of such an approach. Notably, normal tissue contamination from adjacent
and stromal cells would render it impossible to confidently assign any differences to the
tumour cells in bulk RNA-seq. To avoid this issue, I focused on somatic genetic features
unique to the tumours. I compared the mutation incidence of every BLG and pathway
considered in Chapter 2 between tumours affecting different anatomical sites.
For this analysis, I selected 65 cases that were confidently annotated as facial or
intra-abdominal tumours without lymph node involvement. Unfortunately, the ICGC cases
did not provide sufficient clinical metadata, which limited the number of sporadic cases
included in this analysis. The breakdown was 35 cases with jaw tumours and 30 cases
with abdominal disease (Figure 3.7A,B). As expected, 61% of endemic cases presented
with facial tumours, while no sporadic cases were annotated as such. No genes or
pathways had mutations that were significantly associated with anatomic site (Q-values >
0.1, Fisher’s exact test; Figure 3.7C). That being said, one gene, FBXO11, had a Q-value
of 0.12, indicating that there might be merit to this analysis, but I may have been
ultimately limited by the sample size.
3.2.7 Variable distribution of MYC breakpoints in BL subtypes
A known genetic feature of BL that warrants revisiting here is the variable distribution of
breakpoints affecting the MYC locus that are associated with an IG locus. As described in
Chapter 1, MYC breakpoints in sporadic cases are proximal to the TSS while they are
much more dispersed relative to MYC in endemic cases. I can recapitulate this result with
my data by comparing the absolute distance between the IG-MYC breakpoint on
chromosome 8 and the MYC TSS among BL subtypes. Endemic and sporadic tumours as
well as EBV-positive and EBV-negative tumours both showed significant differences in the
[Figure 3.7 panels: (A) number of cases by clinical variant (endemic BL, sporadic BL) and (B) by EBV status (EBV-positive, EBV-negative), coloured by anatomic site (head-only disease, intra-abdominal disease); (C) -log10(Q-value) against log10(odds ratio) for anatomic site (Ref: head-only disease), with FBXO11 labelled.]
Figure 3.7: Genetic comparison of anatomic BL subtypes. (A) Number of endemic and sporadic cases per anatomic subtype. (B) Number of EBV-positive and EBV-negative cases per anatomic subtype. (C) Differential incidence of nonsynonymous mutations in anatomic BL subtypes. Mutations are restricted to those affecting BLGs. Significant differences are highlighted in red (Q-values < 0.1, indicated by dashed line; Fisher’s exact test).
Table 3.4: Linear regression of the distance between MYC and the associated translocation breakpoint on chromosome 8 (in kilobases) as a function of tumour EBV status and clinical variant status. This linear model was also bootstrapped 10,000 times to calculate bootstrap 95% confidence intervals (CI).

| Term | Coefficient | Standard error | Bootstrap 95% CI (N = 10000) | P-value |
|---|---|---|---|---|
| Clinical variant (Ref: Endemic) | 14 | 43 | 140 to 180 | 0.76 |
| EBV status (Ref: EBV-positive) | 53 | 42 | 210 to 99 | 0.21 |
distance between the breakpoint and the MYC TSS (P-values = 0.0077 and 0.0099,
respectively; Mann–Whitney U test). However, linear regression was unable to assign this
variation to one classification system over the other (P-values = 0.76 and 0.21,
respectively; Table 3.4). These findings recapitulate what has been described previously,
but it remains unclear whether tumour EBV status is the more important factor in
determining the IG-MYC breakpoint location.
3.2.8 V gene usage not determined by tumour EBV status
In Chapter 2, I demonstrated that V gene usage was nonuniform for both heavy and light
IG chains. However, it was not clear whether specific antigens were eliciting the inclusion
of those V genes that were overrepresented among dominant clonotypes. Given the
polymicrobial origins of BL, namely the exposure to EBV and malaria, I investigated
whether a link existed between the presence of certain V genes and that of specific
pathogens. Here, I used the geographically defined clinical variants as a proxy for malaria
status with the assumption that most, if not all, endemic cases were infected at least once
by malaria. I also considered tumour EBV status as well as EBV genome type among the
EBV-positive cases. However, I found no significant difference in the prevalence of any of
the considered V genes between the various BL subtypes (Figure 3.8). The inconclusive
nature of these findings may not be surprising given that this IG repertoire analysis relied
on RNA-seq rather than the more conventional high-depth targeted sequencing of the
CDR3 region. Further work on the BL repertoire of IG clonotypes is warranted.
3.3 Materials and methods
This chapter relies on the same dataset presented in Chapter 2. Similarly, most data
analyses were described in Chapter 2. The analytical methods that are specific to this
chapter are detailed below.
3.3.1 Data analysis
McNemar’s tests
Discordant cases were defined as EBV-negative endemic BL cases and EBV-positive
sporadic BL cases. Differentially mutated genes and pathways (referred to here as
features) were identified using the following criteria: (1) they must be mutated in at least
10% of cases, and (2) they were differentially mutated between EBV-positive endemic BL
cases and EBV-negative sporadic BL cases (Q-value < 0.1, Fisher’s exact test).
Discordant cases were excluded from the Fisher’s exact tests to ensure that there is no
reason to believe a priori that the mutation status of these features is preferentially
associated with tumour EBV status or clinical variant. Following that, tumour EBV status
and clinical variant were used as naive predictors of the mutation status of these
differentially mutated features, and I determined whether or not each predictor was correct
for each case. The performance of tumour EBV status and clinical variant as predictors was
compared using McNemar’s tests. Features with a significant difference according to the
[Figure 3.8 panels: V gene usage (0% to 30%) for IGH, IGK, and IGL, compared by clinical variant (endemic vs. sporadic BL), EBV status (EBV-positive vs. EBV-negative), and EBV type (type 1 vs. type 2). V genes shown: IGHV4-34, IGHV3-30, IGHV3-7, IGHV4-59, IGHV3-23, IGHV3-15, IGHV3-21, IGHV4-39, IGHV3-48; IGKV3-20, IGKV1-39, IGKV1-5, IGKV4-1, IGKV3-15, IGKV3-11, IGKV1-33; IGLV3-25, IGLV1-51, IGLV2-14, IGLV1-44, IGLV1-40, IGLV3-19.]
Figure 3.8: Immunoglobulin V gene usage per BL subtypes. Percent prevalence of immunoglobulin V genes among dominant IG rearrangements in BL tumours with RNA-seq data (N = 106). V genes that are dominant in fewer than 10 BL tumours in the RNA-seq data are not displayed.
McNemar’s test (P-value < 0.05) indicate that the “winning” predictor is more strongly
associated with the mutation status of said features.
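The comparison of the two naive predictors can be sketched in Python using the discordant-case counts from Table 3.2. This is an illustration under stated assumptions rather than the code used for the thesis, which relied on mcnemar.test in R: the helper names are hypothetical, the predictors are assumed to call a feature mutated in EBV-negative cases (EBV status) or in sporadic cases (clinical variant), and the chi-squared statistic uses a continuity correction. Concordant cases are ignored because both predictors make the same call for them.

```python
from math import erfc, sqrt


def mcnemar_p_value(b, c):
    """McNemar's chi-squared test (1 df) with continuity correction.

    b and c are the counts of cases where only one of the two predictors is correct.
    """
    if b + c == 0:
        return float("nan")  # no informative cases
    statistic = (abs(b - c) - 1) ** 2 / (b + c)
    # For 1 degree of freedom, P(X > s) = erfc(sqrt(s / 2))
    return erfc(sqrt(statistic / 2))


def informative_counts(ebv_pos_sporadic, ebv_neg_endemic):
    """Count discordant cases where exactly one predictor is correct.

    Each argument is a (mutated, unmutated) pair of case counts.
    """
    pos_mut, pos_unmut = ebv_pos_sporadic
    neg_mut, neg_unmut = ebv_neg_endemic
    # EBV status is right for unmutated EBV-positive and mutated EBV-negative cases
    ebv_only_correct = pos_unmut + neg_mut
    # Clinical variant is right for mutated sporadic and unmutated endemic cases
    variant_only_correct = pos_mut + neg_unmut
    return ebv_only_correct, variant_only_correct


# (mutated, unmutated) counts for the discordant cases in Table 3.2
discordant = {
    "Apoptosis": ((1, 3), (8, 0)),
    "CCND3": ((0, 4), (1, 7)),
    "SMARCA4": ((0, 4), (0, 8)),
    "TP53": ((1, 3), (6, 2)),
}
p_values = {
    name: mcnemar_p_value(*informative_counts(*groups))
    for name, groups in discordant.items()
}
for name, p in p_values.items():
    print(f"{name}: P = {p:.2g}")
```

Under these assumptions, the resulting P-values agree with those reported in Table 3.2 to the stated precision.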
Data and statistical analyses
Data and statistical analyses were done using the R statistical programming language
(version 3.4.2).216 Mann–Whitney U tests, Fisher’s exact tests, and McNemar’s tests
were used where appropriate with the wilcox.test, fisher.test, and mcnemar.test functions
in R, respectively. Linear regressions were performed using the lm function in R and
bootstrapped 10,000 times to calculate bootstrap 95% confidence intervals using the boot
and boot.ci functions in R (adjusted bootstrap percentile interval).
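The bootstrap procedure can be illustrated with a simplified Python sketch. It differs from the analysis described above in three labelled ways: it fits a single-covariate least-squares model rather than the multi-covariate lm models, it uses a plain percentile interval rather than the adjusted bootstrap percentile interval from boot.ci, and it runs on hypothetical simulated data. The function names are illustrative.

```python
import random


def ols_slope(xs, ys):
    """Least-squares slope of y regressed on a single covariate x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def bootstrap_percentile_ci(xs, ys, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the slope, resampling cases with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    while len(slopes) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        boot_x = [xs[i] for i in idx]
        if len(set(boot_x)) < 2:
            continue  # slope is undefined when all resampled x values are equal
        slopes.append(ols_slope(boot_x, [ys[i] for i in idx]))
    slopes.sort()
    lower = slopes[int((alpha / 2) * n_boot)]
    upper = slopes[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical toy data: y = 2x + Gaussian noise
data_rng = random.Random(1)
x = [i / 10 for i in range(50)]
y = [2 * xi + data_rng.gauss(0, 0.5) for xi in x]

slope = ols_slope(x, y)
ci_low, ci_high = bootstrap_percentile_ci(x, y, n_boot=2_000)
print(f"slope = {slope:.2f}, 95% CI = {ci_low:.2f} to {ci_high:.2f}")
```

Resampling whole cases, rather than residuals, keeps each covariate value paired with its outcome and makes no assumption about constant error variance.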
Chapter 4
Discussion and future directions
BL is considered curable with intensive chemotherapy. In practice though, BL patients
suffer from severe side effects due to treatment-related toxicity, and many still die from the
disease or treatment complications. Currently, cure rates above 90% are only achievable
in children who have access to proper supportive care, consisting mostly of paediatric
sporadic cases. However, these fortunate patients represent only a minority of BL burden
worldwide considering the incidence of endemic cases, whose survival ranges from 45% to
70%.48,52,53 This reality motivated the genetic and molecular characterization of paediatric
endemic and sporadic BL presented in this thesis. Hereafter, I will discuss the main
findings from earlier chapters and their implications for the future of BL research.
4.1 De novo mutational signatures
The mutational landscape of BL is not uniform among BL tumours, as revealed by WGS.
Broadly speaking, the overall mutation burden was higher in endemic or EBV-positive
tumours, suggesting underlying differences in the mutational processes active in these
subtypes. In an attempt to understand the biological basis for these differences, I found
the genomes contained variable representations of four robust de novo mutational
signatures, each of which should be associated with a distinct aetiology. Based on
similarity to the reference COSMIC signatures, BL signatures A through D were
respectively attributed to age, an unknown mechanism, defective DNA MMR, and AICDA
activity. Given that only paediatric cases were considered here, it is not surprising that
there was no difference in the prevalence of the age-related BL signature A on the basis
of geographic origin or tumour EBV status. On the other hand, the three other signatures
were all more prevalent in endemic or EBV-positive tumour genomes. Therefore, the
associated aetiology of each of these three signatures may account for the observed
variation in mutation burden across the discovery cohort. In other words, if the inferred
mechanisms are correct, most of the difference in mutation load can be explained by a
lack of DNA MMR and increased AICDA activity.
To refine this model of mutagenesis in BL, I used linear regression to assign variation in
the prevalence of these signatures to covariates such as geographic origin, tumour EBV
status, patient age, and tumour AICDA expression. The robustness of the mutation
signatures was confirmed by a strong association between BL signature A and age at
diagnosis, consistent with the signature’s presumed aetiology. In contrast, BL signature B
remained wholly unaccounted for given that it was not associated with any of the included
covariates. That being said, the lack of correlation with AICDA expression indicates that
this signature is not related to AICDA activity. Interestingly, the MMR-related BL signature
C was significantly associated with tumour EBV status but not geographic origin. This is
consistent with a model wherein the presence of EBV results in an accumulation of
mutations due to insufficient or aberrant DNA repair. This suggests that the genomes are
in a more fragile state and raises the potential utility of DNA-damaging chemotherapy in
the context of EBV-positive BL. A link between EBV and DNA repair was reported in one
study, which described a loss of H3K4 trimethylation of DNA repair signalling genes due
to EBV in nasopharyngeal epithelial cells.258 This highlights the need to more thoroughly
characterize the BL epigenome in the context of EBV status, which has not been explored
to the same degree as the genome and transcriptome. In this case, DNA methylation
assays comparing EBV-positive and EBV-negative tumours could reveal the role for EBV
in genome and epigenome maintenance.
Lastly, the aetiology for BL signature D was confirmed by a linear correlation with AICDA
expression. After accounting for the contribution of AICDA expression, there was no
association with geographic origin or tumour EBV status. This led me to suspect that
AICDA expression was a confounding variable that is associated with both geographic
origin and tumour EBV status. Indeed, AICDA expression was substantially higher in
endemic or EBV-positive tumours. Given that AICDA was having a strong effect on the
mutational landscape of BL, I employed an approach similar to that used for mutational
signatures to understand the source of variation in expression. Strikingly, most of the
variation in AICDA expression was explained by tumour EBV status, and geographic
origin accounted for the little variation that remained. This finding establishes a strong
association between the presence of EBV and increased AICDA expression, and
consequently an elevation in mutation burden.
4.2 Noncoding mutation peaks
The result of deregulated AICDA activity, or aSHM, was readily observable in the
noncoding space. The BL genomes exhibited mutation patterns previously attributed to
focal enrichment of aSHM activity that have been documented in other B cell lymphomas.
The identification of noncoding mutation “peaks” was done solely based on mutation
density without any prior knowledge of gene annotations. Yet, among the most commonly
mutated peaks, the majority were either located in one of the three IG loci or near the TSS
of a gene. Corroborating the implication of AICDA, most genes affected by TSS-proximal
peaks were known targets of aSHM in DLBCL (e.g. BACH2, TCL1A); the number of
mutated peaks per patient correlated with AICDA expression; most of the peaks were
almost exclusively mutated in EBV-positive tumours; and the mutations tended to occur in
the AICDA recognition motif.176 Although the bulk of these mutations are likely
passengers, the local enrichment of AICDA-mediated mutations within some of these
peaks may also have functional consequences that benefit the tumours.
The differentiation of passenger and driver mutations is challenging, especially in the
noncoding setting. Among the putative targets of aSHM, I highlighted two potentially
relevant examples of recurrently mutated regulatory elements, namely the PAX5 enhancer
and the PVT1 promoter. Considering the role of PAX5 in B-cell development, future work
will need to clarify whether the mutations affecting the enhancer exert the same effect as
those seen in chronic lymphocytic leukemia.179 As for the PVT1 promoter, there is recent
evidence that this regulatory element acts as a tumour suppressor by insulating intragenic
enhancers from inducing MYC expression.259 The same study also demonstrated that
PVT1 promoter mutations could enhance cancer cell growth, albeit in a distinct cell type,
namely breast cancer cells (Figure 4.1). The mutations I have observed in BL alter a
different TSS of PVT1 than the one studied previously. Furthermore, it is unclear whether
the effect on MYC expression will be similar given that the gene is already constitutively
activated by the translocated IG enhancer in BL. Considering the relative ease of
introducing point mutations compared to producing specific genomic rearrangements, it is
conceivable that these PVT1 promoter mutations are introduced by EBV-induced AICDA
prior to the IG-MYC translocation as a temporary means of promoting growth (Figure 4.2).
In this case, they are expected to remain as a record of a previous driver from an early
progenitor of the malignant clone that ultimately acquired a MYC translocation. I could not
readily test this hypothesis from the bulk sequencing data I had access to in this thesis
given the difficulty of determining mutation timing, especially for structural variations. More
precise methods of determining the presence or absence of these mutations at the
single-cell level could shed light on the chronology of BL progression.
Figure 4.1: Putative mechanism of MYC activation mediated by PVT1 promoter mutations.259 Figure created with BioRender.com.
4.3 Nonsynonymous mutations
Despite bearing a greater mutation burden, EBV-positive BL genomes have fewer
putative driver mutations affecting BLGs. Together, these two features may account for
the younger age of onset in EBV-positive (or endemic) cases. More specifically, I found a
relative paucity of nonsynonymous mutations in SMARCA4 and CCND3 among
EBV-positive or endemic cases, which has been reported previously.161,163 In other
Figure 4.2: Potential role for PVT1 promoter mutations in BL pathogenesis. Figure created with BioRender.com.
words, the CDK4/6 inhibitor palbociclib would be predicted to be more effective in
EBV-negative or sporadic BL.94 However, these differences are not as striking as the
disparity in the prevalence of mutations affecting genes with roles in apoptosis, namely
TP53, USP7, and CDKN2A. A similar but less pronounced difference exists for TP53
when it is considered alone. Importantly, these differences relating to apoptosis and TP53
are strictly associated with tumour EBV status and not geographic origin. This novel
observation was aided by my discovery of USP7 as a recurrently mutated gene in BL.
This gene encodes a deubiquitinase that counteracts MDM2-mediated ubiquitination and
degradation of TP53 (Figure 4.3).260 Despite its status as an essential gene in one study,
USP7 has the mutational pattern of a tumour suppressor in BL.261
The relevance of USP7 is underscored by its known interaction with the protein encoded
by EBNA1, the only consistently expressed EBV protein in BL.49,262 EBNA1 can disrupt
the interaction between TP53 and USP7, which is predicted to have an effect similar to
nonsynonymous variants, namely the loss of TP53 (Figure 4.3).263 These data suggest
that EBV may present an alternative mechanism for disrupting apoptosis in BL in addition
to somatic mutations. Functional experiments would be required to investigate the
interaction between EBNA1 and USP7 in vivo. Preliminary support for this model exists
based on in vitro experiments that have demonstrated that MDM2 is essential for survival
in lymphoblastoid cell lines transformed by EBV.264,265 Although this hypothetical function
for EBNA1 is compelling, I cannot exclude the potential role of other EBV latency or lytic
genes, which may only be transiently expressed such that their expression is not
detectable using bulk RNA-seq. Regardless of the mechanism, the lack of mutations
affecting apoptosis in EBV-positive tumours is consistent with EBV-mediated suppression
of apoptosis in BL cells, which is predicted to alleviate the selective pressure for acquiring
mutations affecting genes involved in this process.
Figure 4.3: Potential role for USP7 mutations and/or EBV-encoded EBNA1 in abrogating apoptosis by enhancing MDM2-mediated degradation of TP53. Figure created with BioRender.com.
This work also extends the emerging theme of chromatin modifiers as recurrently mutated
in B-cell non-Hodgkin lymphomas including BL.96,266 This includes two genes that were
associated with BL for the first time, namely SIN3A and CHD8. SIN3A encodes a
transcriptional repressor that acts through histone deacetylase complexes.267 Its ability to
repress MYC target genes is clearly relevant to BL and consistent with the propensity of
mutations in BL predicted to truncate and thus deactivate the protein (Figure 4.4).267 The
loss of SIN3A-mediated repression of MYC targets is expected to further promote the
fitness of BL cells. The protein encoded by CHD8 can also act as a repressor of
transcription through chromatin regulation, but unlike SIN3A, it achieves this via the
recruitment of histone H1 (Figure 4.5).268 The specific targets of H1 recruitment remain
unclear and thus the contribution of CHD8 to BL pathogenesis warrants further
investigation.
Figure 4.4: Putative mechanism for SIN3A in repressing the expression of MYC target genes.267 Figure created with BioRender.com.
Perhaps the most compelling mutation pattern exemplifying the importance of chromatin
structure in BL biology is the recurrence of mutations affecting members of the SWI/SNF
complex. Similar observations have been made in other cancer types, including other
germinal centre B-cell lymphomas.269,270 In paediatric BL, they represent the most
commonly mutated group of genes other than MYC with a mutation incidence of 59%.
Figure 4.5: Putative mechanism for CHD8 in repressing gene expression by recruiting histone H1 and thereby condensing chromatin.268 Figure created with BioRender.com.
This nucleosome remodelling pathway also exhibits mutually exclusive mutations,
confirming a functional redundancy between variants affecting ARID1A and SMARCA4. In
spite of this functional redundancy, there is a strong contrast between the types of
mutations affecting each gene. Most mutations in ARID1A are predicted to truncate the
protein, consistent with a tumour suppressor role, whereas SMARCA4 is mainly disrupted
by missense variants. Generally speaking, a lack of truncating mutations in favour of
missense mutations is suggestive of an oncogene, especially when the variants are
constrained to certain regions of the protein. Indeed, all missense mutations in SMARCA4
form two visible clusters affecting residues 773–974 (size 202) and 1155–1243 (size 89),
which can be seen in Appendix B (Ensembl transcript ENST00000429416; 1647 residues
in total). That being said, the SWI/SNF complex is described as a tumour suppressor in
most cancers, the exception thus far being synovial sarcoma.271 Despite these conflicting
observations regarding the role of SMARCA4 in BL pathogenesis, it is clear that the
missense mutations in this gene have a more nuanced effect on the encoded protein than
a simple gene knockout.
Despite their high prevalence, the functional consequence of these mutations has not
been explored in the context of paediatric BL. The challenge of studying the SWI/SNF
complex largely stems from its ability to have both positive and negative effects on gene
expression, which appear dependent on the subunit composition. Notably, in murine
preosteoblast cells, ARID1A-containing SWI/SNF complexes were found to repress MYC
expression, which could account for the high prevalence of mutations deactivating
ARID1A in BL.269,272 In the same model system, MYC transcription was also dependent
on ARID1B-containing SWI/SNF complexes, suggesting that the complex may remain
important in BL as long as ARID1A is excluded as a subunit. This observation could
explain the mutation pattern seen in SMARCA4, namely the lack of truncating mutations,
since the encoded protein is a key component of the SWI/SNF complex. Mutations in one
of the two clusters in SMARCA4 may disrupt the tertiary or quaternary structure of the
complex, potentially by altering protein-protein interfaces. All that being said, without data
from more relevant cell lines, these potential mechanisms for mutations affecting the
SWI/SNF complex in BL remain hypotheses that need to be tested in future
experiments.
Given that the SWI/SNF complex is known to regulate nucleosome remodelling, one
possible approach to elucidate the effect of mutations disrupting this complex would be to
assess open chromatin. Notably, the assay for transposase-accessible chromatin using
sequencing (ATAC-seq) seems an appropriate methodology to apply to BL samples.273 A
challenge with this method is the difficulty of application to clinical samples such as FF
tissue, although recent developments are overcoming this limitation.274 While many of
these chromatin modifiers appear to be tumour suppressors, improving our understanding
of their role in BL pathogenesis may still reveal therapeutical opportunities that could be
exploited, such as synthetic lethality.275 In fact, short hairpin RNA (shRNA) screens have
identified promising candidate genes whose knockdowns are synthetic lethal when
combined with mutated components of the SWI/SNF complex.271 For example,
SMARCA4-mutant cancer cells were highly sensitive to shRNA-mediated depletion of
SMARCA2.276 Similarly, in another screen of cancer cell lines, mutations in ARID1A were
synthetic lethal in combination with a depletion of ARID1B.277 The dependency of the
tumour on other paralogs when one is mutated suggests that they occupy the same
position in the complex.271 However, while these paralogs may be “structurally
redundant”, developmental data indicate that they are not necessarily functionally
redundant. For instance, germline mutations in ARID1B are associated with
developmental disorders, demonstrating that it is not functionally redundant with
ARID1A.278,279 Hence, while these screens have identified therapeutic opportunities for
a large portion of BL tumours, additional work will be required to minimize any toxicity
related to the essential role played by these genes and their encoded proteins.
Despite these discoveries, much work remains to be done to fully understand the effect of
nonsynonymous driver mutations in BL pathogenesis. Notably, the role of several BLGs
remains unknown, including the most commonly mutated gene in BL, DDX3X. Most BLGs
appear to be tumour suppressor genes by virtue of their mutation pattern, which may limit
the potential utility of knowing their function from a therapeutic standpoint. This work has
also focused exclusively on somatic mutations and did not consider the possibility of
germline variants due to the difficulty of assessing their pathogenicity, especially in African
populations, for which there remain insufficient data representing natural genetic
variation.280
4.4 B-cell receptor repertoire
Another genomic feature unique to B-cell malignancies is the somatic rearrangement and
mutation of the three IG regions for the generation of the heavy and light chains that
together form the BCR and secreted antibodies. Previously, I described SHM affecting all
three IG regions, an expected physiologic consequence of B cells that have transited
through the germinal centre. In BL, I observed a greater mutation burden of the IG loci in
EBV-positive tumours, which has been reported previously.139 Although the earlier study
ascribed this difference to distinct cells of origin, my data suggest that it can be primarily
explained by variation in AICDA expression. I also determined the V, D, and J gene
segments that were recombined to generate the expressed IG heavy and light chain
alleles. In particular, I explored V gene usage among the clonal rearrangements for each
tumour with the hypothesis that some V gene segments may be selected more than
others for providing a selective advantage to the tumour. It is worth noting that this
analysis is limited by the use of RNA-seq data rather than a more conventional targeted
DNA sequencing approach such as adaptive immune receptor repertoire sequencing
(AIRR-seq). Nonetheless, the high BCR expression in BL tumours allowed an exploratory
analysis of V gene usage.
My findings supported my hypothesis that some V genes were overrepresented among
the clonal IG rearrangements. This complements existing data demonstrating the
importance of BCR signaling in BL, thus supporting the clinical use of inhibitors for PI3K,
Syk and Src family kinases.94 Of the commonly used heavy chain V genes, IGHV4-34 is
the best characterized with an established role in autoreactivity.281,282 This potentially
reveals an alternative or complementary approach for sustaining BCR activation in BLs, in
addition to genetic alterations that increase BCR expression via TCF3 or ID3 mutations.94
Previous reports have suggested a possible role for superantigens in BL.283–285
Interestingly, the most commonly observed clonal light chain V gene was IGKV3-20.
Preferential IGKV3-20 usage has been observed in other B-cell non-Hodgkin lymphomas,
especially in those linked to hepatitis C virus (HCV) infection.286 To my knowledge, this is
the first time that biased usage of IGKV3-20 has been described in BL, which features one of the
highest frequencies of IGKV3-20 usage among HCV-negative B-cell malignancies. If this
preliminary observation is confirmed in a larger study, BL patients could benefit from
emerging BCR-directed vaccines that target IGKV3-20 peptides.286
4.5 Epstein–Barr virus
Since the initial observation of EBV in the tumour cells of BL patients 55 years ago, the
effect of the virus on B cells has been the focus of many studies.18 Its ability to
immortalize B cells in vitro is certainly indicative of a role for EBV in BL pathogenesis, and
yet its functional role remains elusive to this day.287 The lack of progress in this area can
be partly attributed to the challenge of reliably modelling EBV-positive BL in an
experimental setting.135 The difficulty stems from the fact that EBV adopts different gene
expression programs depending on the context, especially in response to the immune
system.111 Generally speaking, the greater the immune surveillance, the fewer genes
EBV will express in order to avoid detection. For this reason, the behaviour of EBV
observed in cell lines, even those derived from BL patients, cannot be readily generalized to
infer its behaviour in lymphomagenesis. The application of high-throughput sequencing to
clinical BL samples aims at overcoming this challenge by studying the differences in
tumour biology between EBV-positive and EBV-negative samples.
One of the major findings presented in this thesis is a compelling association between
EBV and AICDA activity. A link between the two has long been hypothesized but with a
paucity of evidence from in vivo studies.288 The present work addresses this lack of data
by showing increased AICDA expression in EBV-positive BL and concomitant aSHM.
While these data are unable to distinguish between correlation and causation, they are
consistent with in vitro experiments that have demonstrated a causative link.140,141 This
relationship between EBV and AICDA is important given that aSHM is thought to promote
the double-strand breaks that lead to the hallmark IG-MYC translocation.289–293 Also, I
and others have found that this process introduces mutations in BL-associated genes
such as ID3.97 It is worth noting that other studies have demonstrated increases in AICDA
expression due to malaria infection.148,149,294 This may explain the weak albeit significant
association between AICDA expression and geographic origin in the linear regression
described earlier. If this is the case, these data suggest that either EBV has a stronger
influence on the transcriptional regulation of AICDA than malaria or its effect on AICDA
may be longer-lasting than that of malaria. By mediating this effect on AICDA, EBV and
potentially malaria promote the accumulation of potential driver mutations in BL.
Another key finding is the depletion of mutations altering genes with roles in apoptosis in
EBV-positive tumours. The lack of difference based on geographic origin strengthens the
evidence that EBV disrupts apoptosis, which is not a new idea.288 If my earlier proposed
mechanism that EBNA1 interacts with USP7 to cause TP53 degradation is validated, this
would point to MDM2 inhibitors as a valid treatment approach in TP53-wild-type patients
with either EBV infection or USP7 mutations. That being said, other studies have
suggested alternative mechanisms based on in vitro work. For instance, the apoptosis
regulator CASP3 can be targeted by EBV miRNAs to abrogate the pathway.133,295–299
The mechanistic details for the effect in BL must be elucidated in future functional
experiments in order to pave the way for the development of therapies targeting EBV.
Accordingly, the fact that MYC-translocated cells undergo apoptosis implies that the B
cells that initiate EBV-positive BL tumours are virally infected before the IG-MYC
rearrangement and thereby protected from a fate of MYC-mediated apoptosis.69 This
model can be unified with the fact that EBV induces AICDA expression in these cells,
increasing their risk of acquiring double-strand breaks and promoting the formation of this
fundamental translocation. In contrast, EBV-negative tumours follow a similar
progression, but they acquire mutations necessary to disrupt apoptosis as early events
prior to the MYC translocation rather than relying on EBV.
It is worth acknowledging that roughly 30% of EBV-positive tumours also have mutations
affecting apoptosis. It remains an open question whether these mutations came before or
after EBV infection since my bulk sequencing data cannot accurately resolve mutation
timing. Furthermore, given that the viral genome is maintained as an episome in tumour
cells and can be spontaneously lost during cell division, I expect EBV to be depleted from
the tumour cell population unless the virus provides a competitive advantage (Figure
4.6).116,146,300,301 In fact, the immunogenicity of EBV may accelerate this depletion by
exerting a selective pressure against EBV-positive cells in favour of cells that can survive
without EBV.302 In other words, if the oncogenic role of the virus is restricted to abrogating
apoptosis, BL tumours should become EBV-independent following the acquisition of
mutations affecting apoptosis. Given the highly proliferative nature of BL, I would expect a
rapid transition between the EBV-positive and EBV-negative subclones, which may have
been witnessed in at least one case.303 Accordingly, the existence of EBV-positive
tumours that also bear mutations affecting apoptosis suggests that additional oncogenic
roles are played by EBV in BL pathogenesis.
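The reasoning above about episome loss can be illustrated with a minimal sketch. It is not part of the analyses in this thesis, and the function name, the per-division loss rate, and the relative growth advantage are all hypothetical parameters chosen for illustration rather than measured values. The sketch assumes a simple deterministic model in which EBV-positive cells divide with relative fitness 1 + advantage and a fixed fraction of their daughter cells lose the episome at each division.

```python
def ebv_positive_fraction(p0, loss_rate, advantage, generations):
    """Return the EBV-positive fraction of a cell population at each generation.

    All parameters are hypothetical illustration values, not thesis results:
    p0          initial EBV-positive fraction (0 to 1)
    loss_rate   fraction of EBV-positive daughter cells losing the episome per division
    advantage   relative growth advantage of EBV-positive cells (0 means none)
    generations number of rounds of division to simulate
    """
    fractions = [p0]
    p = p0
    for _ in range(generations):
        grown_pos = p * (1.0 + advantage)        # EBV-positive output after division
        grown_neg = 1.0 - p                      # EBV-negative output (fitness 1)
        retained = grown_pos * (1.0 - loss_rate)  # daughters that keep the episome
        p = retained / (grown_pos + grown_neg)   # renormalize to a fraction
        fractions.append(p)
    return fractions


if __name__ == "__main__":
    # No advantage: the EBV-positive fraction decays geometrically.
    print(ebv_positive_fraction(1.0, 0.05, 0.0, 100)[-1])
    # Advantage outweighing loss: the fraction settles at a stable non-zero level.
    print(ebv_positive_fraction(1.0, 0.05, 0.2, 200)[-1])
```

Under these assumptions, EBV persists only when (1 + advantage) × (1 − loss_rate) exceeds 1; otherwise the EBV-positive fraction declines towards zero, which mirrors the two outcomes contrasted in Figure 4.6.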
The clear genetic and molecular distinctions between EBV-positive and EBV-negative BL
identified in this thesis reveal a multifaceted role for the virus in Burkitt lymphomagenesis
and shed new light on mechanisms behind EBV carcinogenicity (Figure 4.7). Based on
my results, it may be more accurate to describe BL tumours as EBV-dependent or
EBV-independent. Importantly, tumour EBV status appears to be a more clinically relevant
criterion for BL classification given the pathogenic differences and associated implications
for treatment. This reliance on EBV gene expression represents a potential vulnerability
and nominates EBV as a therapeutic target. These data motivate the development of
methods for targeting EBV, including EBV vaccines, small-molecule inhibitors, or drugs
that trigger lytic gene expression to elicit an immune response.304–306
Figure 4.6: Expected outcome from spontaneous loss of EBV during cell division depending on the role played by the virus. Figure created with BioRender.com.
4.6 Hit-and-run hypothesis
The idea of a transient reliance on EBV until somatic mutations are in place to provide the
same oncogenic benefits has been proposed as the “hit-and-run” mechanism.302
According to this hypothesis, some (or all) EBV-negative tumours were originally
EBV-positive. In BL, this theory has some support from work that demonstrated the
presence of subclonal EBV “traces” in what would be considered EBV-negative tumours
using standard diagnostic tests.307 Based on the data in this thesis, the acquisition of
mutations disrupting apoptosis appears insufficient to enable the transition to EBV
independence. Notably, the EBV genome copy number is not lower in tumours
with these mutations, which would be expected if the tumours were undergoing the
transition at the time of biopsy (data not shown). A potential limitation is that insufficient
time may have elapsed since the acquisition of these mutations. That being said, I do not
observe a difference in the VAF of SSMs affecting TP53 or USP7 based on tumour EBV
status. In other words, the mutations have had enough time to become clonal, and despite
this, EBV was not lost to an appreciable degree. These data suggest that EBV confers a
growth advantage that goes beyond abrogating apoptosis and inducing AICDA-mediated
mutagenesis. For example, EBV may be regulating other important pathways such as the
BCR-PI3K-AKT signalling axis via miRNA-mediated repression of PTEN.308
Figure 4.7: Putative model for BL pathogenesis. On their own, MYC translocations are expected to trigger apoptosis. Alternatively, if mutations disrupt apoptosis (e.g. TP53 mutations) before the MYC translocation, this can give rise to an EBV-negative BL precursor cell. My data show that EBV can act in place of mutations affecting apoptosis. Furthermore, the observed increase in AICDA activity associated with EBV infection is expected to promote the formation of MYC translocations. Altogether, this can give rise to an EBV-positive BL precursor cell. The existence of EBV-positive tumours with mutations affecting apoptosis indicates other roles played by EBV. The possibility of a hit-and-run mechanism, whereby BL cells acquire mutations that obviate the need for EBV and subsequently lose EBV from the cell population, remains an open question. *, other genetic lesions can disrupt apoptosis. Figure created with BioRender.com.
Since the hit-and-run hypothesis was proposed, it has been recognized that devising a
strategy to demonstrate the former presence (and ideally, the implication) of EBV in an
EBV-negative tumour would be challenging.302 This question could be resolved by
tracking the evolution of the tumour during the transition to EBV independence. The
experimental design adopted in this study is not amenable to this approach because bulk
sequencing prevents the assignment of mutations to EBV-positive or EBV-negative
subclones. However, newer technologies, such as single-cell sequencing, might offer a
means to overcome this limitation. For example, the use of single-cell RNA-seq could
reveal heterogeneous EBV gene expression that would not be observable using bulk
sequencing. It is conceivable that a small subset of EBV-infected BL cells express
oncogenic EBV proteins other than EBNA1 to promote tumour growth, potentially by
transiently inducing cell cycle progression or modulating the microenvironment. This
pattern could easily be missed using bulk RNA-seq, especially given the high expression
of some cellular genes including MYC. Critically, single-cell DNA sequencing could
provide key insight into the chronology of BL progression. This approach could detect a
minor EBV-positive clone in an otherwise EBV-negative tumour and allow the genetic
comparison of these subclones. Any acquired molecular alterations could reveal the steps
required for BL to evolve beyond its reliance on EBV and minimize detection by the
immune system. Although clearly beyond the scope of this thesis, the resolution of
whether (and how) EBV participates in hit-and-run oncogenesis remains an open and
enticing question in this field and may be resolved with emerging genomic
technologies.
Bibliography
1. Grande BM, Gerhard DS, Jiang A, Griner NB, Abramson JS, Alexander TB, et al. Genome-wide discovery of somatic coding and noncoding mutations in pediatric endemic and sporadic Burkitt lymphoma. Blood. 2019 Jan;blood–2018–09–871418.
2. Rowe M, Kelly GL, Bell AI, Rickinson AB. Burkitt’s lymphoma: the Rosetta Stone deciphering Epstein-Barr virus biology. Semin Cancer Biol. 2009 Dec;19(6):377–88.
3. Poirel HA, Ambrosio MR, Piccaluga PP, Leoncini L. Pathology and Molecular Pathogenesis of Burkitt Lymphoma and Lymphoblastic Lymphoma. In: Lenz G, Salles G, editors. Aggressive Lymphomas. Cham: Springer International Publishing; 2019. pp. 75–94.
4. Burkitt D. A sarcoma involving the jaws in African children. Br J Surg. 1958 Nov;46(197):218–23.
5. O’Conor GT, Davies JNP. Malignant tumors in African children: With special reference to malignant lymphoma. J Pediatr. 1960 Apr;56(4):526–35.
6. Burkitt D, O’Conor GT. Malignant lymphoma in African children. I. A clinical syndrome. Cancer. 1961 Mar;14(2):258–69.
7. Orem J, Mbidde EK, Lambert B, Sanjose S de, Weiderpass E. Burkitt’s lymphoma in Africa, a review of the epidemiology and etiology. Afr Health Sci. 2007 Sep;7(3):166–75.
8. Stefan DC, Lutchman R. Burkitt lymphoma: epidemiological features and survival in a South African centre. Infect Agent Cancer. 2014 Jun;9:19.
9. Pannone G, Zamparese R, Pace M, Pedicillo MC, Cagiano S, Somma P, et al. The role of EBV in the pathogenesis of Burkitt’s Lymphoma: an Italian hospital based survey. Infect Agent Cancer. 2014 Oct;9(1):34.
10. Seldam REJT, Cooke R, Atkinson L. Childhood lymphoma in the territories of Papua and New Guinea. Vol. 19, Cancer. 1966. pp. 437–46.
11. Burkitt D. A “tumour safari” in East and Central Africa. Br J Cancer. 1962 Sep;16:379–86.
12. Burkitt D. Determining the climatic limitations of a children’s cancer common in Africa. Br Med J. 1962 Oct;2(5311):1019–23.
13. Burkitt D. A Lymphoma Syndrome in African Children. Royal College of Surgeons of England; 1961.
14. Burkitt DP, Davies JNP. Lymphoma syndrome in Uganda and tropical Africa. Med Press. 1961;245:367–9.
15. Harris RJ. Aetiology of Central African Lymphomata. Br Med Bull. 1964 May;20:149–53.
16. Dalldorf G. Lymphomas of African children with different forms or environmental influences. JAMA. 1962 Sep;181:1026–8.
17. Burkitt DP. Charles S. Mott Award. The discovery of Burkitt’s lymphoma. Vol. 51, Cancer. 1983. pp. 1777–86.
18. Epstein MA, Achong BG, Barr YM. Virus Particles in Cultured Lymphoblasts from Burkitt’s Lymphoma. Lancet. 1964 Mar;283(7335):702–3.
19. Epstein MA, Achong BG, Pope JH. Virus in cultured lymphoblasts from a New Guinea Burkitt lymphoma. Br Med J. 1967 Apr;2(5547):290–1.
20. Henle G, Henle W, Diehl V. Relation of Burkitt’s tumor-associated herpes-type virus to infectious mononucleosis. Proc Natl Acad Sci U S A. 1968 Jan;59(1):94–101.
21. Henle G, Henle W. Immunofluorescence in cells derived from Burkitt’s lymphoma. J Bacteriol. 1966 Mar;91(3):1248–56.
22. Levy JA, Henle G. Indirect immunofluorescence tests with sera from African children and cultured Burkitt lymphoma cells. J Bacteriol. 1966 Jul;92(1):275–6.
23. Piriou E, Asito AS, Sumba PO, Fiore N, Middeldorp JM, Moormann AM, et al. Early age at time of primary Epstein-Barr virus infection results in poorly controlled viral infection in infants from Western Kenya: clues to the etiology of endemic Burkitt lymphoma. J Infect Dis. 2012 Mar;205(6):906–13.
24. de-Thé G, Geser A, Day NE, Tukei PM, Williams EH, Beri DP, et al. Epidemiological evidence for causal relationship between Epstein-Barr virus and Burkitt’s lymphoma from Ugandan prospective study. Nature. 1978 Aug;274(5673):756–61.
25. Burkitt DP. Etiology of Burkitt’s Lymphoma—an Alternative Hypothesis to a Vectored Virus. J Natl Cancer Inst. 1969 Jan;42(1):19–28.
26. Morrow RH, Kisuule A, Pike MC, Smith PG. Burkitt’s Lymphoma in the Mengo Districts of Uganda: Epidemiologic Features and Their Relationship to Malaria. J Natl Cancer Inst. 1976 Mar;56(3):479–83.
27. Williams AO. Haemoglobin genotypes, ABO blood groups, and Burkitt’s tumour. J Med Genet. 1966 Sep;3(3):177–9.
28. Pike MC, Morrow RH, Kisuule A, Mafigiri J. Burkitt’s lymphoma and sickle cell trait. Br J Prev Soc Med. 1970 Feb;24(1):39–41.
29. Moormann AM, Snider CJ, Chelimo K. The company malaria keeps: how co-infection with Epstein-Barr virus leads to endemic Burkitt lymphoma. Curr Opin Infect Dis. 2011 Oct;24(5):435–41.
30. Emmanuel B, Kawira E, Ogwang MD, Wabinga H, Magatti J, Nkrumah F, et al. African Burkitt lymphoma: age-specific risk and correlations with malaria biomarkers. Am J Trop Med Hyg. 2011 Mar;84(3):397–401.
31. Burkitt D, Wright D. Geographical and tribal distribution of the African lymphoma in Uganda. Br Med J. 1966 Mar;1(5487):569–73.
32. O’Conor GT. Malignant lymphoma in African children. II. A pathological entity. Cancer. 1961 Mar;14(2):270–83.
33. O’Conor GT, Rappaport H, Smith EB. Childhood Lymphoma Resembling “Burkitt Tumor” in the United States. Cancer. 1965 Apr;18:411–7.
34. Doll DC, List AF. Burkitt’s lymphoma in a homosexual. Lancet. 1982 May;1(8279):1026–7.
35. Ziegler JL, Drew WL, Miner RC, Mintz L, Rosenbaum E, Gershow J, et al. Outbreak of Burkitt’s-like lymphoma in homosexual men. Lancet. 1982 Sep;2(8299):631–3.
36. Whang-Peng J, Lee EC, Sieverts H, Magrath IT. Burkitt’s lymphoma in AIDS: cytogenetic study. Blood. 1984 Apr;63(4):818–22.
37. Gong JZ, Stenzel TT, Bennett ER, Lagoo AS, Dunphy CH, Moore JO, et al. Burkitt Lymphoma Arising in Organ Transplant Recipients: A Clinicopathologic Study of Five Cases. Am J Surg Pathol. 2003 Jun;27(6):818–27.
38. Robertson ES, editor. Burkitt’s Lymphoma. Springer, New York, NY; 2013.
39. Swerdlow SH, Campo E, Harris NL, Jaffe ES, Pileri SA, Stein H, et al., editors. WHO Classification of Tumours of Haematopoietic and Lymphoid Tissues. Revised 4th edition. Lyon, France: International Agency for Research on Cancer; 2017. (WHO classification of tumours; vol. 2).
40. Dave SS, Fu K, Wright GW, Lam LT, Kluin P, Boerma EJ, et al. Molecular Diagnosis of Burkitt’s Lymphoma. N Engl J Med. 2006 Jun;354(23):2431–42.
41. Swerdlow SH, Campo E, Pileri SA, Harris NL, Stein H, Siebert R, et al. The 2016 revision of the World Health Organization classification of lymphoid neoplasms. Blood. 2016 May;127(20):2375–90.
42. Magrath I, Adde M, Shad A, Venzon D, Seibel N, Gootenberg J, et al. Adults and children with small non-cleaved-cell lymphoma have a similar excellent outcome when treated with the same chemotherapy regimen. J Clin Oncol. 1996 Mar;14(3):925–34.
43. Adde M, Shad A, Venzon D, Arndt C, Gootenberg J, Neely J, et al. Additional chemotherapy agents improve treatment outcome for children and adults with advanced B-cell lymphomas. Semin Oncol. 1998 Apr;25(2 Suppl 4):33–9; discussion 45–8.
44. Patte C, Auperin A, Michon J, Behrendt H, Leverger G, Frappaz D, et al. The Société Française d’Oncologie Pédiatrique LMB89 protocol: highly effective multiagent chemotherapy tailored to the tumor burden and initial response in 561 unselected children with B-cell lymphomas and L3 leukemia. Blood. 2001 Jun;97(11):3370–9.
45. Costa LJ, Xavier AC, Wahlquist AE, Hill EG. Trends in survival of patients with Burkitt lymphoma/leukemia in the USA: an analysis of 3691 cases. Blood. 2013 Jun;121(24):4861–6.
46. Magrath IT. Treatment of Burkitt lymphoma in children and adults: Lessons from Africa. Curr Hematol Malig Rep. 2006 Dec;1(4):230–40.
47. Molyneux EM, Rochford R, Griffin B, Newton R, Jackson G, Menon G, et al. Burkitt’s lymphoma. Lancet. 2012 Mar;379(9822):1234–44.
48. Buckle G, Maranda L, Skiles J, Ong’echa JM, Foley J, Epstein M, et al. Factors influencing survival among Kenyan children diagnosed with endemic Burkitt lymphoma between 2003 and 2011: A historical cohort study. Int J Cancer. 2016 Sep;139(6):1231–40.
49. Magrath I. Epidemiology: clues to the pathogenesis of Burkitt lymphoma. Br J Haematol. 2012 Mar;156(6):744–56.
50. Mbulaiteye SM, Talisuna AO, Ogwang MD, McKenzie FE, Ziegler JL, Parkin DM. African Burkitt’s lymphoma: could collaboration with HIV-1 and malaria programmes reduce the high mortality rate? Lancet. 2010 May;375(9726):1661–3.
51. Joko-Fru WY, Parkin DM, Borok M, Chokunonga E, Korir A, Nambooze S, et al. Survival from Childhood Cancers in Eastern Africa: A Population-based registry study. Int J Cancer. 2018 Jul;
52. Harif M, Barsaoui S, Benchekroun S, Bouhas R, Doumbé P, Khattab M, et al. Treatment of B-cell lymphoma with LMB modified protocols in Africa–report of the French-African Pediatric Oncology Group (GFAOP). Pediatr Blood Cancer. 2008 Jun;50(6):1138–42.
53. Ngoma T, Adde M, Durosinmi M, Githang’a J, Aken’Ova Y, Kaijage J, et al. Treatment of Burkitt lymphoma in equatorial Africa using a simple three-drug combination followed by a salvage regimen for patients with persistent or recurrent disease. Br J Haematol. 2012 Sep;158(6):749–62.
54. Dunleavy K, Roschewski M, Abramson JS, Link B, Parekh S, Jagadeesh D, et al. Risk-Adapted Therapy in Adults with Burkitt Lymphoma: Updated Results of a Multicenter Prospective Phase II Study of DA-EPOCH-R. Hematol Oncol. 2017 Jun;35:133–4.
55. Sweetenham JW, Pearce R, Taghipour G, Blaise D, Gisselbrecht C, Goldstone AH. Adult Burkitt’s and Burkitt-like non-Hodgkin’s lymphoma–outcome for patients treated with high-dose therapy and autologous stem-cell transplantation in first remission or at relapse: results from the European Group for Blood and Marrow Transplantation. J Clin Oncol. 1996 Sep;14(9):2465–72.
56. Jacobson C, LaCasce A. How I treat Burkitt lymphoma in adults. Blood. 2014 Nov;124(19):2913–20.
57. Murphy K. Janeway’s immunobiology. 9th edition. New York, NY: Garland Science/Taylor & Francis Group, LLC; 2017.
58. Klein U, Klein G, Ehlin-Henriksson B, Rajewsky K, Küppers R. Burkitt’s lymphoma is a malignancy of mature B cells expressing somatically mutated V region genes. Mol Med. 1995 Jul;1(5):495–505.
59. Chapman CJ, Mockridge CI, Rowe M, Rickinson AB, Stevenson FK. Analysis of VH genes used by neoplastic B cells in endemic Burkitt’s lymphoma shows somatic hypermutation and intraclonal heterogeneity. Blood. 1995 Apr;85(8):2176–81.
60. Tamaru J, Hummel M, Marafioti T, Kalvelage B, Leoncini L, Minacci C, et al. Burkitt’s lymphomas express VH genes with a moderate number of antigen-selected somatic mutations. Am J Pathol. 1995 Nov;147(5):1398–407.
61. Victora GD, Dominguez-Sola D, Holmes AB, Deroubaix S, Dalla-Favera R, Nussenzweig MC. Identification of human germinal center light and dark zone cells and their relationship to human B-cell lymphomas. Blood. 2012 Sep;120(11):2240–8.
62. Pasqualucci L, Neumeister P, Goossens T, Nanjangud G, Chaganti RS, Küppers R, et al. Hypermutation of multiple proto-oncogenes in B-cell diffuse large-cell lymphomas. Nature. 2001 Jul;412(6844):341–6.
63. Peters A, Storb U. Somatic hypermutation of immunoglobulin genes is linked to transcription initiation. Immunity. 1996 Jan;4(1):57–65.
64. Fukita Y, Jacobs H, Rajewsky K. Somatic hypermutation in the heavy chain locus correlates with transcription. Immunity. 1998 Jul;9(1):105–14.
65. Pavri R, Gazumyan A, Jankovic M, Di Virgilio M, Klein I, Ansarah-Sobrinho C, et al. Activation-induced cytidine deaminase targets DNA at sites of RNA polymerase II stalling by interaction with Spt5. Cell. 2010 Oct;143(1):122–33.
66. Basso K, Dalla-Favera R. Germinal centres and B cell lymphomagenesis. Nat Rev Immunol. 2015 Mar;15(3):172–84.
67. Dang CV, O’Donnell KA, Zeller KI, Nguyen T, Osthus RC, Li F. The c-Myc target gene network. Semin Cancer Biol. 2006 Aug;16(4):253–64.
68. Meyer N, Penn LZ. Reflecting on 25 years with MYC. Nat Rev Cancer. 2008 Dec;8(12):976–90.
69. Evan GI, Wyllie AH, Gilbert CS, Littlewood TD, Land H, Brooks M, et al. Induction of apoptosis in fibroblasts by c-myc protein. Cell. 1992 Apr;69(1):119–28.
70. Ci W, Polo JM, Cerchietti L, Shaknovich R, Wang L, Yang SN, et al. The BCL6 transcriptional program features repression of multiple oncogenes in primary B cells and is deregulated in DLBCL. Blood. 2009 May;113(22):5536–48.
71. Dominguez-Sola D, Victora GD, Ying CY, Phan RT, Saito M, Nussenzweig MC, et al. The proto-oncogene MYC is required for selection in the germinal center and cyclic re-entry. Nat Immunol. 2012 Nov;13(11):1083–91.
72. Manolov G, Manolova Y. Marker band in one chromosome 14 from Burkitt lymphomas. Nature. 1972 May;237(5349):33–4.
73. Jarvis JE, Ball G, Rickison AB, Epstein MA. Cytogenetic studies on human lymphoblastoid cell lines from Burkitt’s lymphomas and other sources. Int J Cancer. 1974 Dec;14(6):716–21.
74. Zech L, Haglund U, Nilsson K, Klein G. Characteristic chromosomal abnormalities in biopsies and lymphoid-cell lines from patients with Burkitt and non-Burkitt lymphomas. Int J Cancer. 1976 Jan;17(1):47–56.
75. Taub R, Kirsch I, Morton C, Lenoir G, Swan D, Tronick S, et al. Translocation of the c-myc gene into the immunoglobulin heavy chain locus in human Burkitt lymphoma and murine plasmacytoma cells. Proc Natl Acad Sci U S A. 1982 Dec;79(24):7837–41.
76. Dalla-Favera R, Bregni M, Erikson J, Patterson D, Gallo RC, Croce CM. Human c-myc onc gene is located on the region of chromosome 8 that is translocated in Burkitt lymphoma cells. Proc Natl Acad Sci U S A. 1982 Dec;79(24):7824–7.
77. Adams JM, Harris AW, Pinkert CA, Corcoran LM, Alexander WS, Cory S, et al. The c-myc oncogene driven by immunoglobulin enhancers induces lymphoid malignancy in transgenic mice. Nature. 1985;318(6046):533–8.
78. Schüler F, Hirt C, Dölken G. Chromosomal translocation t(14;18) in healthy individuals. Semin Cancer Biol. 2003 Jun;13(3):203–9.
79. Pelicci PG, Knowles DM 2nd, Magrath I, Dalla-Favera R. Chromosomal breakpoints and structural alterations of the c-myc locus differ in endemic and sporadic forms of Burkitt lymphoma. Proc Natl Acad Sci U S A. 1986 May;83(9):2984–8.
80. Shiramizu B, Barriga F, Neequaye J, Jafri A, Dalla-Favera R, Neri A, et al. Patterns of chromosomal breakpoint locations in Burkitt’s lymphoma: relevance to geography and Epstein-Barr virus association. Blood. 1991 Apr;77(7):1516–26.
81. Kovalchuk AL, Ansarah-Sobrinho C, Hakim O, Resch W, Tolarová H, Dubois W, et al. Mouse model of endemic Burkitt translocations reveals the long-range boundaries of Ig-mediated oncogene deregulation. Proc Natl Acad Sci U S A. 2012 Jul;109(27):10972–7.
82. Neri A, Barriga F, Knowles DM, Magrath IT, Dalla-Favera R. Different regions of the immunoglobulin heavy-chain locus are involved in chromosomal translocations in distinct pathogenetic forms of Burkitt lymphoma. Proc Natl Acad Sci U S A. 1988 Apr;85(8):2748–52.
83. Basso K, Frascella E, Zanesco L, Rosolen A. Improved long-distance polymerase chain reaction for the detection of t(8;14)(q24;q32) in Burkitt’s lymphomas. Am J Pathol. 1999 Nov;155(5):1479–85.
84. Burmeister T, Schwartz S, Horst HA, Rieder H, Gökbuget N, Hoelzer D, et al. Molecular heterogeneity of sporadic adult Burkitt-type leukemia/lymphoma as revealed by PCR and cytogenetics: correlation with morphology, immunology and clinical features. Leukemia. 2005 Aug;19(8):1391–8.
85. Busch K, Keller T, Fuchs U, Yeh RF, Harbott J, Klose I, et al. Identification of two distinct MYC breakpoint clusters and their association with various IGH breakpoint regions in the t(8;14) translocations in sporadic Burkitt-lymphoma. Leukemia. 2007 Aug;21(8):1739–51.
86. Burmeister T, Molkentin M, Schwartz S, Gökbuget N, Hoelzer D, Thiel E, et al. Erroneous class switching and false VDJ recombination: molecular dissection of t(8;14)/MYC-IGH translocations in Burkitt-type lymphoblastic leukemia/B-cell lymphoma. Mol Oncol. 2013 Aug;7(4):850–8.
87. Robbiani DF, Bothmer A, Callen E, Reina-San-Martin B, Dorsett Y, Difilippantonio S, et al. AID is required for the chromosomal breaks in c-myc that lead to c-myc/IgH translocations. Cell. 2008 Dec;135(6):1028–38.
88. Magrath I. The Pathogenesis of Burkitt’s Lymphoma. In: Vande Woude GF, Klein G, editors. Advances in Cancer Research. Academic Press; 1990. pp. 133–270.
89. Pasqualucci L. Molecular pathogenesis of germinal center-derived B cell lymphomas. Immunol Rev. 2019 Mar;288(1):240–61.
90. Gaidano G, Ballerini P, Gong JZ, Inghirami G, Neri A, Newcomb EW, et al. p53 mutations in human lymphoid malignancies: association with Burkitt lymphoma and chronic lymphocytic leukemia. Proc Natl Acad Sci U S A. 1991 Jun;88(12):5413–7.
91. Eischen CM, Weber JD, Roussel MF, Sherr CJ, Cleveland JL. Disruption of the ARF–Mdm2–p53 tumor suppressor pathway in Myc-induced lymphomagenesis. Genes Dev. 1999 Oct;13(20):2658–69.
92. Schmitt CA, McCurrach ME, Stanchina E de, Wallace-Brodeur RR, Lowe SW. INK4a/ARF mutations accelerate lymphomagenesis and promote chemoresistance by disabling p53. Genes Dev. 1999 Oct;13(20):2670–7.
93. Lindstrom MS, Klangby U, Wiman KG. p14ARF homozygous deletion or MDM2 overexpression in Burkitt lymphoma lines carrying wild type p53. Oncogene. 2001 Apr;20(17):2171–7.
94. Schmitz R, Young RM, Ceribelli M, Jhavar S, Xiao W, Zhang M, et al. Burkitt lymphoma pathogenesis and therapeutic targets from structural and functional genomics. Nature. 2012 Oct;490(7418):116–20.
95. Giulino-Roth L, Wang K, MacDonald TY, Mathew S, Tam Y, Cronin MT, et al. Targeted genomic sequencing of pediatric Burkitt lymphoma identifies recurrent alterations in antiapoptotic and chromatin-remodeling genes. Blood. 2012 Dec;120(26):5181–4.
96. Love C, Sun Z, Jima D, Li G, Zhang J, Miles R, et al. The genetic landscape of mutations in Burkitt lymphoma. Nat Genet. 2012 Dec;44(12):1321–5.
97. Richter J, Schlesner M, Hoffmann S, Kreuz M, Leich E, Burkhardt B, et al. Recurrent mutation of the ID3 gene in Burkitt lymphoma identified by integrated genome, exome and transcriptome sequencing. Nat Genet. 2012 Dec;44(12):1316–20.
98. Schmitz R, Ceribelli M, Pittaluga S, Wright GW, Staudt LM. Oncogenic mechanisms in Burkitt lymphoma. Cold Spring Harb Perspect Med. 2014 Feb;4(2):a014282–2.
99. Dominguez-Sola D, Kung J, Holmes AB, Wells VA, Mo T, Basso K, et al. The FOXO1 Transcription Factor Instructs the Germinal Center Dark Zone Program. Immunity. 2015 Dec;43(6):1064–74.
100. Sander S, Chu VT, Yasuda T, Franklin A, Graf R, Calado DP, et al. PI3 Kinase and FOXO1 Transcription Factor Activity Differentially Control B Cells in the Germinal Center Light and Dark Zones. Immunity. 2015 Dec;43(6):1075–86.
101. Muppidi JR, Schmitz R, Green JA, Xiao W, Larsen AB, Braun SE, et al. Loss of signalling via Gα13 in germinal centre B-cell-derived lymphoma. Nature. 2014 Dec;516(7530):254–8.
102. Lu C, Allis CD. SWI/SNF complex in cancer. Nat Genet. 2017 Jan;49(2):178–9.
103. Ditton HJ, Zimmer J, Kamp C, Rajpert-De Meyts E, Vogt PH. The AZFa gene DBY (DDX3Y) is widely transcribed but the protein is limited to the male germ cells by translation control. Hum Mol Genet. 2004 Oct;13(19):2333–41.
104. Jiang L, Gu ZH, Yan ZX, Zhao X, Xie YY, Zhang ZG, et al. Exome sequencing identifies somatic mutations of DDX3X in natural killer/T-cell lymphoma. Nat Genet. 2015 Sep;47(9):1061–6.
105. Shannon-Lowe C, Rickinson A. The Global Landscape of EBV-Associated Tumors. Front Oncol. 2019;9:713.
106. Cohen JI, Fauci AS, Varmus H, Nabel GJ. Epstein-Barr virus: an important vaccine target for cancer prevention. Sci Transl Med. 2011 Nov;3(107):107fs7.
107. Young LS, Rickinson AB. Epstein-Barr virus: 40 years on. Nat Rev Cancer. 2004 Oct;4(10):757–68.
108. Werner J, Henle G, Pinto CA, Haff RF, Henle W. Establishment of continuous lymphoblast cultures from leukocytes of gibbons (Hylobates lar). Int J Cancer. 1972 Nov;10(3):557–67.
109. Baer R, Bankier AT, Biggin MD, Deininger PL, Farrell PJ, Gibson TJ, et al. DNA sequence and expression of the B95-8 Epstein-Barr virus genome. Nature. 1984;310(5974):207–11.
110. Rowe M, Rowe DT, Gregory CD, Young LS, Farrell PJ, Rupani H, et al. Differences in B cell growth phenotype reflect novel patterns of Epstein-Barr virus latent gene expression in Burkitt’s lymphoma cells. EMBO J. 1987 Sep;6(9):2743–51.
111. Price AM, Luftig MA. To be or not IIb: a multi-step process for Epstein-Barr virus latency establishment and consequences for B cell tumorigenesis. PLoS Pathog. 2015 Mar;11(3):e1004656.
112. Kelly G, Bell A, Rickinson A. Epstein–Barr virus–associated Burkitt lymphomagenesis selects for downregulation of the nuclear antigen EBNA2. Nat Med. 2002 Oct;8(10):1098–104.
113. Kieff E, Rickinson AB. In Fields Virology Vol. 2 (eds. Knipe DM & Howley PM) 2511–2573. Lippincott Williams & Wilkins; 2001.
114. Humme S, Reisbach G, Feederle R, Delecluse HJ, Bousset K, Hammerschmidt W, et al. The EBV nuclear antigen 1 (EBNA1) enhances B cell immortalization several thousandfold. Proc Natl Acad Sci U S A. 2003 Sep;100(19):10989–94.
115. Takada K, Horinouchi K, Ono Y, Aya T, Osato T, Takahashi M, et al. An Epstein-Barr virus-producer line Akata: establishment of the cell line and analysis of viral DNA. Virus Genes. 1991 Apr;5(2):147–56.
116. Shimizu N, Tanabe-Tochikura A, Kuroiwa Y, Takada K. Isolation of Epstein-Barr virus (EBV)-negative cell clones from the EBV-positive Burkitt’s lymphoma (BL) line Akata: malignant phenotypes of BL cells are dependent on EBV. J Virol. 1994 Sep;68(9):6069–73.
117. Chodosh J, Holder VP, Gan YJ, Belgaumi A, Sample J, Sixbey JW. Eradication of latent Epstein-Barr virus by hydroxyurea alters the growth-transformed cell phenotype. J Infect Dis. 1998 May;177(5):1194–201.
118. Komano J, Sugiura M, Takada K. Epstein-Barr virus contributes to the malignant phenotype and to apoptosis resistance in Burkitt’s lymphoma cell line Akata. J Virol. 1998 Nov;72(11):9150–6.
119. Ruf IK, Rhyne PW, Yang H, Borza CM, Hutt-Fletcher LM, Cleveland JL, et al. Epstein-Barr virus regulates c-MYC, apoptosis, and tumorigenicity in Burkitt lymphoma. Mol Cell Biol. 1999 Mar;19(3):1651–60.
120. Kennedy G, Komano J, Sugden B. Epstein-Barr virus provides a survival factor to Burkitt’s lymphomas. Proc Natl Acad Sci U S A. 2003 Nov;100(24):14269–74.
121. Wilson JB, Bell JL, Levine AJ. Expression of Epstein-Barr virus nuclear antigen-1 induces B cell neoplasia in transgenic mice. EMBO J. 1996 Jun;15(12):3117–26.
122. Brady G, Macarthur GJ, Farrell PJ. Epstein-Barr virus and Burkitt lymphoma. Postgrad Med J. 2008 Jul;84(993):372–7.
123. Araujo I, Foss HD, Hummel M, Anagnostopoulos I, Barbosa HS, Bittencourt A, et al. Frequent expansion of Epstein-Barr virus (EBV) infected cells in germinal centres of tonsils from an area with a high incidence of EBV-associated lymphoma. J Pathol. 1999 Feb;187(3):326–30.
124. Babcock GJ, Hochberg D, Thorley-Lawson AD. The expression pattern of Epstein-Barr virus latent genes in vivo is dependent upon the differentiation stage of the infected B cell. Immunity. 2000 Oct;13(4):497–506.
125. Komano J, Maruo S, Kurozumi K, Oda T, Takada K. Oncogenic role of Epstein-Barr virus-encoded RNAs in Burkitt’s lymphoma cell line Akata. J Virol. 1999 Dec;73(12):9827–31.
126. Ruf IK, Rhyne PW, Yang C, Cleveland JL, Sample JT. Epstein-Barr virus small RNAs potentiate tumorigenicity of Burkitt lymphoma cells independently of an effect on apoptosis. J Virol. 2000 Nov;74(21):10223–8.
127. Nanbo A, Inoue K, Adachi-Takasawa K, Takada K. Epstein-Barr virus RNA confers resistance to interferon-alpha-induced apoptosis in Burkitt’s lymphoma. EMBO J. 2002 Mar;21(5):954–65.
128. Kitagawa N, Goto M, Kurozumi K, Maruo S, Fukayama M, Naoe T, et al. Epstein-Barr virus-encoded poly(A)(-) RNA supports Burkitt’s lymphoma growth through interleukin-10 induction. EMBO J. 2000 Dec;19(24):6742–50.
129. Ogden CA, Pound JD, Batth BK, Owens S, Johannessen I, Wood K, et al. Enhanced apoptotic cell clearance capacity and B cell survival factor production by IL-10-activated macrophages: implications for Burkitt’s lymphoma. J Immunol. 2005 Mar;174(5):3015–23.
130. Wahlgren M, Abrams JS, Fernandez V, Bejarano MT, Azuma M, Torii M, et al. Adhesion of Plasmodium falciparum-infected erythrocytes to human cells and secretion of cytokines (IL-1beta, IL-1RA, IL-6, IL-8, IL-10, TGF beta, TNF alpha, G-CSF, GM-CSF). Scand J Immunol. 1995 Dec;42(6):626–36.
131. Lyke KE, Burges R, Cissoko Y, Sangare L, Dao M, Diarra I, et al. Serum levels of the proinflammatory cytokines interleukin-1 beta (IL-1beta), IL-6, IL-8, IL-10, tumor necrosis factor alpha, and IL-12(p70) in Malian children with severe Plasmodium falciparum malaria and matched uncomplicated malaria or healthy controls. Infect Immun. 2004 Oct;72(10):5630–7.
132. Leucci E, Onnis A, Cocco M, De Falco G, Imperatore F, Giuseppina A, et al. B-cell differentiation in EBV-positive Burkitt lymphoma is impaired at posttranscriptional level by miRNA-altered expression. Int J Cancer. 2010 Mar;126(6):1316–26.
133. Vereide DT, Seto E, Chiu YF, Hayes M, Tagawa T, Grundhoff A, et al. Epstein–Barr virus maintains lymphomas via its miRNAs. Oncogene. 2013 Mar;33(10):1258–64.
134. Piccaluga PP, Navari M, De Falco G, Ambrosio MR, Lazzi S, Fuligni F, et al. Virus-encoded microRNA contributes to the molecular profile of EBV-positive Burkitt lymphomas. Oncotarget. 2016 Jan;7(1):224–40.
135. Bornkamm GW. Epstein-Barr virus and its role in the pathogenesis of Burkitt’s lymphoma: an unresolved issue. Semin Cancer Biol. 2009 Dec;19(6):351–65.
136. Souza TA, Stollar BD, Sullivan JL, Luzuriaga K, Thorley-Lawson DA. Influence of EBV on the peripheral blood memory B cell compartment. J Immunol. 2007 Sep;179(5):3153–60.
137. Gil Y, Levy-Nabot S, Steinitz M, Laskov R. Somatic mutations and activation-induced cytidine deaminase (AID) expression in established rheumatoid factor-producing lymphoblastoid cell line. Mol Immunol. 2007 Jan;44(4):494–505.
138. Epeldegui M, Hung YP, McQuay A, Ambinder RF, Martínez-Maza O. Infection of human B cells with Epstein-Barr virus results in the expression of somatic hypermutation-inducing molecules and in the accrual of oncogene mutations. Mol Immunol. 2007 Feb;44(5):934–42.
139. Bellan C, Lazzi S, Hummel M, Palummo N, Santi M de, Amato T, et al. Immunoglobulin gene analysis reveals 2 distinct cells of origin for EBV-positive and EBV-negative Burkitt lymphomas. Blood. 2005 Aug;106(3):1031–6.
140. Kim JH, Kim WS, Park C. Epstein-Barr virus latent membrane protein 1 increases genomic instability through Egr-1-mediated upregulation of activation-induced cytidine deaminase in B-cell lymphoma. Leuk Lymphoma. 2013 Sep;54(9):2035–40.
141. Kalchschmidt JS, Bashford-Rogers R, Paschos K, Gillman ACT, Styles CT, Kellam P, et al. Epstein-Barr virus nuclear protein EBNA3C directly induces expression of AID and somatic mutations in B cells. J Exp Med. 2016 May;213(6):921–8.
142. Kurth J, Hansmann ML, Rajewsky K, Küppers R. Epstein-Barr virus-infected B cells expanding in germinal centers of infectious mononucleosis patients do not participate in the germinal center reaction. Proc Natl Acad Sci U S A. 2003 Apr;100(8):4730–5.
143. Tobollik S, Meyer L, Buettner M, Klemmer S, Kempkes B, Kremmer E, et al. Epstein-Barr virus nuclear antigen 2 inhibits AID expression during EBV-driven B-cell growth. Blood. 2006 Dec;108(12):3859–64.
144. Neri A, Barriga F, Inghirami G, Knowles DM, Neequaye J, Magrath IT, et al. Epstein-Barr virus infection precedes clonal expansion in Burkitt’s and acquired immunodeficiency syndrome-associated lymphoma. Blood. 1991 Mar;77(5):1092–5.
145. Kirchmaier AL, Sugden B. Plasmid maintenance of derivatives of oriP of Epstein-Barr virus. J Virol. 1995 Feb;69(2):1280–3.
146. Nanbo A, Sugden A, Sugden B. The coupling of synthesis and partitioning of EBV’s plasmid replicon is revealed in live cells. EMBO J. 2007 Oct;26(19):4252–62.
147. Jerusalem C, Jap P, Eling W. Virus Induced Malignant Lymphome in Mice Dependent on a RES “Conditioned” by Chronic Parasitic Infection (P. Berghei). In: Di Luzio NR, Flemming KBP, editors. The Reticuloendothelial System and Immune Phenomena: Proceedings of the Ludwig Aschoff Memorial Meeting of the Reticuloendothelial Society, Freiburg, Germany, August 1970. Boston, MA: Springer US; 1971. pp. 391–9.
148. Torgbor C, Awuah P, Deitsch K, Kalantari P, Duca KA, Thorley-Lawson DA. A multifactorial role for P. falciparum malaria in endemic Burkitt’s lymphoma pathogenesis. PLoS Pathog. 2014 May;10(5):e1004170.
149. Wilmore JR, Asito AS, Wei C, Piriou E, Sumba PO, Sanz I, et al. AID expression in peripheral blood of children living in a malaria holoendemic region is associated with changes in B cell subsets and Epstein-Barr virus. Int J Cancer. 2015 Mar;136(6):1371–80.
150. Bosch CA van den. Is endemic Burkitt’s lymphoma an alliance between three infections and a tumour promoter? Lancet Oncol. 2004 Dec;5(12):738–46.
151. Chêne A, Donati D, Guerreiro-Cacais AO, Levitsky V, Chen Q, Falk KI, et al. A molecular link between malaria and Epstein-Barr virus reactivation. PLoS Pathog. 2007 Jun;3(6):e80.
152. Donati D, Mok B, Chêne A, Xu H, Thangarajh M, Glas R, et al. Increased B cell survival and preferential activation of the memory compartment by a malaria polyclonal B cell activator. J Immunol. 2006 Sep;177(5):3035–44.
153. Whittle HC, Brown J, Marsh K, Blackman M, Jobe O, Shenton F. The effects of Plasmodium falciparum malaria on immune control of B lymphocytes in Gambian children. Clin Exp Immunol. 1990 May;80(2):213–8.
154. Whittle HC, Brown J, Marsh K, Greenwood BM, Seidelin P, Tighe H, et al. T-cell control of Epstein–Barr virus-infected B cells is lost during P. falciparum malaria. Nature. 1984 Nov;312(5993):449–50.
155. Moss DJ, Burrows SR, Castelino DJ, Kane RG, Pope JH, Rickinson AB, et al. A comparison of Epstein-Barr virus-specific T-cell immunity in malaria-endemic and nonendemic regions of Papua New Guinea. Int J Cancer. 1983 Jun;31(6):727–32.
156. Lam KM, Syed N, Whittle H, Crawford DH. Circulating Epstein-Barr virus-carrying B cells in acute malaria. Lancet. 1991 Apr;337(8746):876–8.
157. Donati D, Espmark E, Kironde F, Mbidde EK, Kamya M, Lundkvist A, et al. Clearance of circulating Epstein-Barr virus DNA in children with acute malaria after antimalaria treatment. J Infect Dis. 2006 Apr;193(7):971–7.
158. Morrow RH Jr. Epidemiological evidence for the role of falciparum malaria in the pathogenesis of Burkitt’s lymphoma. IARC Sci Publ. 1985;(60):177–86.
159. Giulino-Roth L, Wang K, MacDonald TY, Mathew S, Tam Y, Cronin MT, et al. Targeted genomic sequencing of pediatric Burkitt lymphoma identifies recurrent alterations in antiapoptotic and chromatin-remodeling genes. Blood. 2012 Dec;120(26):5181–4.
160. Wagener R, Aukema SM, Schlesner M, Haake A, Burkhardt B, Claviez A, et al. The PCBP1 gene encoding poly(rC) binding protein I is recurrently mutated in Burkitt lymphoma. Genes Chromosomes Cancer. 2015 Sep;54(9):555–64.
161. Abate F, Ambrosio MR, Mundo L, Laginestra MA, Fuligni F, Rossi M, et al. Distinct Viral and Mutational Spectrum of Endemic Burkitt Lymphoma. PLoS Pathog. 2015 Oct;11(10):e1005158.
162. Oduor CI, Kaymaz Y, Chelimo K, Otieno JA, Ong’echa JM, Moormann AM, et al. Integrative microRNA and mRNA deep-sequencing expression profiling in endemic Burkitt lymphoma. Vol. 17, BMC Cancer. 2017.
163. Kaymaz Y, Oduor CI, Yu H, Otieno JA, Ong’echa JM, Moormann AM, et al. Comprehensive Transcriptome and Mutational Profiling of Endemic Burkitt Lymphoma Reveals EBV Type–Specific Differences. Mol Cancer Res. 2017 May;15(5):563–76.
164. Bouska A, Bi C, Lone W, Zhang W, Kedwaii A, Heavican T, et al. Adult high-grade B-cell lymphoma with Burkitt lymphoma signature: genomic features and potential therapeutic targets. Blood. 2017 Oct;130(16):1819–31.
165. López C, Kleinheinz K, Aukema SM, Rohde M, Bernhart SH, Hübschmann D, et al. Genomic and transcriptomic changes complement each other in the pathogenesis of sporadic Burkitt lymphoma. Nat Commun. 2019 Mar;10(1):1459.
166. Ennishi D, Jiang A, Boyle M, Collinge B, Grande BM, Ben-Neriah S, et al. Double-Hit Gene Expression Signature Defines a Distinct Subgroup of Germinal Center B-Cell-Like Diffuse Large B-Cell Lymphoma. J Clin Oncol. 2019 Jan;37(3):190–201.
167. Sha C, Barrans S, Cucco F, Bentley MA, Care MA, Cummin T, et al. Molecular high-grade B cell lymphoma: defining a poor risk group requiring different approaches to therapy. J Clin Oncol. 2019 Jan;37(3):202–13.
168. Saunders CT, Wong WSW, Swamy S, Becq J, Murray LJ, Cheetham RK. Strelka: accurate somatic small-variant calling from sequenced tumor–normal sample pairs. Bioinformatics. 2012 Jul;28(14):1811–7.
169. Lawrence MS, Stojanov P, Polak P, Kryukov GV, Cibulskis K, Sivachenko A, et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature. 2013 Jul;499(7457):214–8.
170. Arthur SE, Jiang A, Grande BM, Alcaide M, Cojocaru R, Rushton CK, et al. Genome-wide discovery of somatic regulatory variants in diffuse large B-cell lymphoma. Nat Commun. 2018 Oct;9(1):4001.
171. Kretzmer H, Bernhart SH, Wang W, Haake A, Weniger MA, Bergmann AK, et al. DNA methylome analysis in Burkitt and follicular lymphomas identifies differentially methylated regions linked to somatic mutation and transcriptional control. Nat Genet. 2015 Nov;47(11):1316–25.
172. Furukawa T, Kuboki Y, Tanji E, Yoshida S, Hatori T, Yamamoto M, et al. Whole-exome sequencing uncovers frequent GNAS mutations in intraductal papillary mucinous neoplasms of the pancreas. Sci Rep. 2011 Nov;1:161.
173. Lyons J, Landis CA, Harsh G, Vallar L, Grünewald K, Feichtinger H, et al. Two G protein oncogenes in human endocrine tumors. Science. 1990 Aug;249(4969):655–9.
174. Leiserson MDM, Wu HT, Vandin F, Raphael BJ. CoMEt: a statistical approach to identify combinations of mutually exclusive alterations in cancer. Genome Biol. 2015 Aug;16:160.
175. Leiserson M, Wu HT, Vandin F, Raphael B. CoMEt: A Statistical Approach to Identify Combinations of Mutually Exclusive Alterations in Cancer. 2015.
176. Jiang Y, Soong TD, Wang L, Melnick AM, Elemento O. Genome-wide detection of genes targeted by non-Ig somatic hypermutation in lymphoma. PLoS One. 2012 Jul;7(7):e40332.
177. Bachl J, Carlson C, Gray-Schopfer V, Dessing M, Olsson C. Increased transcription levels induce higher mutation rates in a hypermutating cell line. J Immunol. 2001 Apr;166(8):5051–7.
178. Carramusa L, Contino F, Ferro A, Minafra L, Perconti G, Giallongo A, et al. The PVT1 oncogene is a Myc protein target that is overexpressed in transformed cells. J Cell Physiol. 2007 Nov;213(2):511–8.
179. Puente XS, Beà S, Valdés-Mas R, Villamor N, Gutiérrez-Abril J, Martín-Subero JI, et al. Non-coding recurrent mutations in chronic lymphocytic leukaemia. Nature. 2015 Oct;526(7574):519–24.
180. Alexandrov LB, Nik-Zainal S, Wedge DC, Aparicio SAJR, Behjati S, Biankin AV, et al. Signatures of mutational processes in human cancer. Nature. 2013 Aug;500(7463):415–21.
181. Alexandrov LB, Jones PH, Wedge DC, Sale JE, Campbell PJ, Nik-Zainal S, et al. Clock-like mutational processes in human somatic cells. Nat Genet. 2015 Dec;47(12):1402–7.
182. Xu JL, Davis MM. Diversity in the CDR3 region of V(H) is sufficient for most antibody specificities. Immunity. 2000 Jul;13(1):37–45.
183. Yassai MB, Naumov YN, Naumova EN, Gorski J. A clonotype nomenclature for T cell receptors. Immunogenetics. 2009 Jul;61(7):493–502.
184. Bolotin DA, Poslavsky S, Mitrophanov I, Shugay M, Mamedov IZ, Putintseva EV, et al. MiXCR: software for comprehensive adaptive immunity profiling. Nat Methods. 2015 May;12(5):380–1.
185. Bolotin DA, Poslavsky S, Davydov AN, Frenkel FE, Fanchi L, Zolotareva OI, et al. Antigen receptor repertoire profiling from RNA-seq data. Nat Biotechnol. 2017 Oct;35(10):908–11.
186. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009 Aug;25(16):2078–9.
187. Li H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. 2013 Mar; Available from: http://arxiv.org/abs/1303.3997
188. Tarasov A, Vilella AJ, Cuppen E, Nijman IJ, Prins P. Sambamba: fast processing of NGS alignment formats. Bioinformatics. 2015 Jun;31(12):2032–4.
189. Patro R, Duggal G, Love MI, Irizarry RA, Kingsford C. Salmon provides fast and bias-aware quantification of transcript expression. Nat Methods. 2017 Apr;14(4):417–9.
190. Butterfield YS, Kreitzman M, Thiessen N, Corbett RD, Li Y, Pang J, et al. JAGuaR: junction alignments to genome for RNA-seq reads. PLoS One. 2014 Jul;9(7):e102398.
191. Hezaveh K, Kloetgen A, Bernhart SH, Mahapatra KD, Lenze D, Richter J, et al. Alterations of microRNA and microRNA-regulated messenger RNA expression in germinal center B-cell lymphomas determined by integrative sequencing analysis. Haematologica. 2016 Nov;101(11):1380–9.
192. Marçais G, Kingsford C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics. 2011 Mar;27(6):764–70.
193. Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, et al. The variant call format and VCFtools. Bioinformatics. 2011 Aug;27(15):2156–8.
194. McLaren W, Gil L, Hunt SE, Riat HS, Ritchie GRS, Thormann A, et al. The Ensembl Variant Effect Predictor. Genome Biol. 2016 Jun;17(1):122.
195. Gonzalez-Perez A, Lopez-Bigas N. Functional impact bias reveals cancer drivers. Nucleic Acids Res. 2012 Nov;40(21):e169.
196. Mularoni L, Sabarinathan R, Deu-Pons J, Gonzalez-Perez A, López-Bigas N. OncodriveFML: a general framework to identify coding and non-coding regions with cancer driver mutations. Genome Biol. 2016 Jun;17(1):128.
197. Tamborero D, Gonzalez-Perez A, Lopez-Bigas N. OncodriveCLUST: exploiting the positional clustering of somatic mutations to identify cancer genes. Bioinformatics. 2013 Sep;29(18):2238–44.
198. Zhao H, Sun Z, Wang J, Huang H, Kocher JP, Wang L. CrossMap: a versatile tool for coordinate conversion between genome assemblies. Bioinformatics. 2014 Apr;30(7):1006–7.
199. Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, et al. The human genome browser at UCSC. Genome Res. 2002 Jun;12(6):996–1006.
200. Jones E, Oliphant T, Peterson P. SciPy: Open source scientific tools for Python. 2001.
201. Shirley MD, Ma Z, Pedersen BS, Wheelan SJ. Efficient “pythonic” access to FASTA files using pyfaidx. PeerJ PrePrints; PeerJ Inc. 2015 Apr. Report No.: e1196.
202. Alexandrov LB, Nik-Zainal S, Wedge DC, Campbell PJ, Stratton MR. Deciphering signatures of mutational processes operative in human cancer. Cell Rep. 2013 Jan;3(1):246–59.
203. Chen X, Schulz-Trieglaff O, Shaw R, Barnes B, Schlesinger F, Källberg M, et al. Manta: rapid detection of structural variants and indels for germline and cancer sequencing applications. Bioinformatics. 2016 Apr;32(8):1220–2.
204. Larson D, abelhj, Chiang C, AbhijitBadve, Eldred J, Morton D. hall-lab/svtools: svtools v0.3.2. 2017.
205. Favero F, Joshi T, Marquard AM, Birkbak NJ, Krzystanek M, Li Q, et al. Sequenza: allele-specific copy number and mutation profiles from tumor sequencing data. Ann Oncol. 2015;26:64–70.
206. Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010 Mar;26(6):841–2.
207. Karolchik D, Hinrichs AS, Furey TS, Roskin KM, Sugnet CW, Haussler D, et al. The UCSC Table Browser data retrieval tool. Nucleic Acids Res. 2004 Jan;32(Database issue):D493–6.
208. Soneson C, Love MI, Robinson MD. Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences. F1000Res. 2015 Dec;4:1521.
209. Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15(12):550.
210. Chu A, Robertson G, Brooks D, Mungall AJ, Birol I, Coope R, et al. Large-scale profiling of microRNAs for The Cancer Genome Atlas. Nucleic Acids Res. 2016 Jan;44(1):e3.
211. Kozomara A, Griffiths-Jones S. miRBase: annotating high confidence microRNAs using deep sequencing data. Nucleic Acids Res. 2014 Jan;42(Database issue):D68–73.
212. Kozomara A, Griffiths-Jones S. miRBase: integrating microRNA annotation and deep-sequencing data. Nucleic Acids Res. 2011 Jan;39(Database issue):D152–7.
213. Griffiths-Jones S, Saini HK, Dongen S van, Enright AJ. miRBase: tools for microRNA genomics. Nucleic Acids Res. 2008 Jan;36(Database issue):D154–8.
214. Griffiths-Jones S, Grocock RJ, Dongen S van, Bateman A, Enright AJ. miRBase: microRNA sequences, targets and gene nomenclature. Nucleic Acids Res. 2006 Jan;34(Database issue):D140–4.
215. Griffiths-Jones S. The microRNA Registry. Nucleic Acids Res. 2004 Jan;32(Database issue):D109–11.
216. R Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2017.
217. Davis TL. argparse: Command Line Optional and Positional Argument Parser. 2018.
218. Waggott D, Haider S, Boutros PC. bedr: Genomic Region Processing using Tools Such as ‘BEDTools’, ‘BEDOPS’ and ‘Tabix’. 2017.
219. Durinck S, Spellman PT, Birney E, Huber W. Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt. Nat Protoc. 2009;4:1184–91.
220. Durinck S, Moreau Y, Kasprzyk A, Davis S, De Moor B, Brazma A, et al. BioMart and Bioconductor: a powerful link between biological databases and microarray data analysis. Bioinformatics. 2005;21:3439–40.
221. Xie Y. bookdown: Authoring Books and Technical Documents with R Markdown. 2018.
222. Xie Y. bookdown: Authoring Books and Technical Documents with R Markdown. Boca Raton, Florida: Chapman and Hall/CRC; 2016.
223. Robinson D. broom: Convert Statistical Analysis Objects into Tidy Data Frames. 2017.
224. Gu Z, Gu L, Eils R, Schlesner M, Brors B. circlize implements and enhances circular visualization in R. Bioinformatics. 2014;30:2811–2.
225. Wilke CO. cowplot: Streamlined Plot Theme and Plot Annotations for ‘ggplot2’. 2017.
226. Dowle M, Srinivasan A. data.table: Extension of ‘data.frame’. 2018.
227. Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15:550.
228. Wickham H, Francois R, Henry L, Müller K. dplyr: A Grammar of Data Manipulation. 2017.
229. Wickham H. feather: R Bindings to the Feather ‘API’. 2016.
230. Gohel D. flextable: Functions for Tabular Reporting. 2018.
231. Wickham H. forcats: Tools for Working with Categorical Variables (Factors). 2017.
232. Lawrence M, Huber W, Pagès H, Aboyoun P, Carlson M, Gentleman R, et al. Software for Computing and Annotating Genomic Ranges. PLoS Comput Biol. 2013;9.
233. Clarke E, Sherrill-Mix S. ggbeeswarm: Categorical Scatter (Violin Point) Plots. 2017.
234. Attali D, Baker C. ggExtra: Add Marginal Histograms to ‘ggplot2’, and More ‘ggplot2’ Enhancements. 2018.
235. Wickham H. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York; 2009.
236. Slowikowski K. ggrepel: Repulsive Text and Label Geoms for ‘ggplot2’. 2017.
237. Ahlmann-Eltze C. ggsignif: Significance Brackets for ‘ggplot2’. 2017.
238. Henry L, Wickham H, Chang W. ggstance: Horizontal ‘ggplot2’ Components. 2016.
239. Hahne F, Ivanek R. Statistical Genomics: Methods and Protocols. In: Mathé E, Davis S, editors. New York, NY: Springer New York; 2016. pp. 335–51.
240. Xie Y. knitr: A General-Purpose Package for Dynamic Report Generation in R. 2018.
241. Xie Y. Dynamic Documents with R and knitr. 2nd ed. Boca Raton, Florida: Chapman and Hall/CRC; 2015.
242. Xie Y. knitr: A Comprehensive Tool for Reproducible Research in R. In: Stodden V, Leisch F, Peng RD, editors. Implementing Reproducible Computational Research. Chapman and Hall/CRC; 2014.
243. Wild F. lsa: Latent Semantic Analysis. 2015.
244. Mayakonda A, Koeffler PH. Maftools: Efficient analysis, visualization and summarization of MAF files from large-scale cohort based cancer studies. BioRxiv. 2016.
245. Du P, Kibbe WA, Lin SM. Improved peak detection in mass spectrum by incorporating continuous wavelet transform-based pattern matching. Vol. 22, Bioinformatics. 2006. pp. 2059–65.
246. Bengtsson H. matrixStats: Functions that Apply to Rows and Columns of Matrices (and to Vectors). 2018.
247. Kolde R. pheatmap: Pretty Heatmaps. 2015.
248. Gerds TA, Ozenne B. Publish: Format Output of Various Routines in a Suitable Way for Reports and Publication. 2018.
249. Henry L, Wickham H. purrr: Functional Programming Tools. 2018.
250. Neuwirth E. RColorBrewer: ColorBrewer Palettes. 2014.
251. Wickham H, Hester J, Francois R. readr: Read Rectangular Text Data. 2017.
252. Wickham H, Bryan J. readxl: Read Excel Files. 2017.
253. Maechler M, Rousseeuw P, Croux C, Todorov V, Ruckstuhl A, Salibian-Barrera M, et al. robustbase: Basic Robust Statistics. 2016.
254. Todorov V, Filzmoser P. An Object-Oriented Framework for Robust Multivariate Analysis. J Stat Softw. 2009;32(3):1–47.
255. Wickham H. tidyverse: Easily Install and Load ‘Tidyverse’ Packages. 2017.
256. Soneson C, Love MI, Robinson MD. Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences. F1000Res. 2015;4.
257. Garnier S. viridis: Default Color Maps from ‘matplotlib’. 2018.
258. Lung ML, Cheung AKL, Dai W, Leong MML, Tsao GSW. Epstein-Barr virus infection suppresses the DNA repair mechanisms in nasopharyngeal epithelial cells via reduction of the H3K4me3 mark. New Orleans, LA: 107th Annual Meeting of the American Association for Cancer Research; American Association for Cancer Research; 2016.
259. Cho SW, Xu J, Sun R, Mumbach MR, Carter AC, Chen YG, et al. Promoter of lncRNA Gene PVT1 Is a Tumor-Suppressor DNA Boundary Element. Cell. 2018 May;173(6):1398–1412.e22.
260. Li M, Chen D, Shiloh A, Luo J, Nikolaev AY, Qin J, et al. Deubiquitination of p53 by HAUSP is an important pathway for p53 stabilization. Nature. 2002 Apr;416(6881):648–53.
261. Hart T, Chandrashekhar M, Aregger M, Steinhart Z, Brown KR, MacLeod G, et al. High-Resolution CRISPR Screens Reveal Fitness Genes and Genotype-Specific Cancer Liabilities. Cell. 2015 Dec;163(6):1515–26.
262. Holowaty MN, Frappier L. HAUSP/USP7 as an Epstein-Barr virus target. Biochem Soc Trans. 2004 Nov;32(Pt 5):731–2.
263. Lindner HA. Deubiquitination in virus infection. Virology. 2007 Jun;362(2):245–56.
264. Forte E, Luftig MA. MDM2-dependent inhibition of p53 is required for Epstein-Barr virus B-cell growth transformation and infected-cell survival. J Virol. 2009 Mar;83(6):2491–9.
265. Renouf B, Hollville E, Pujals A, Tétaud C, Garibal J, Wiels J. Activation of p53 by MDM2 antagonists has differential apoptotic effects on Epstein-Barr virus (EBV)-positive and EBV-negative Burkitt’s lymphoma cells. Leukemia. 2009 Sep;23(9):1557–63.
266. Morin RD, Mendez-Lago M, Mungall AJ, Goya R, Mungall KL, Corbett RD, et al. Frequent mutation of histone-modifying genes in non-Hodgkin lymphoma. Nature. 2011 Jul;476(7360):298–303.
267. Nascimento EM, Cox CL, MacArthur S, Hussain S, Trotter M, Blanco S, et al. The opposing transcriptional functions of Sin3a and c-Myc are required to maintain tissue homeostasis. Nat Cell Biol. 2011 Nov;13(12):1395–405.
268. Nishiyama M, Skoultchi AI, Nakayama KI. Histone H1 recruitment by CHD8 is essential for suppression of the Wnt-β-catenin signaling pathway. Mol Cell Biol. 2012 Jan;32(2):501–12.
269. Wilson BG, Roberts CWM. SWI/SNF nucleosome remodellers and cancer. Nat Rev Cancer. 2011 Jun;11(7):481–92.
270. Lunning MA, Green MR. Mutation of chromatin modifiers; an emerging hallmark of germinal center B-cell lymphomas. Blood Cancer J. 2015 Oct;5:e361.
271. Kadoch C, Crabtree GR. Mammalian SWI/SNF chromatin remodeling complexes and cancer: Mechanistic insights gained from human genomics. Sci Adv. 2015 Jun;1(5):e1500447.
272. Nagl NG Jr, Wang X, Patsialou A, Van Scoy M, Moran E. Distinct mammalian SWI/SNF chromatin remodeling complexes with opposing roles in cell-cycle control. EMBO J. 2007 Feb;26(3):752–63.
273. Buenrostro JD, Giresi PG, Zaba LC, Chang HY, Greenleaf WJ. Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat Methods. 2013 Dec;10(12):1213–8.
274. Fujiwara S, Baek S, Varticovski L, Kim S, Hager GL. High Quality ATAC-Seq Data Recovered from Cryopreserved Breast Cell Lines and Tissue. Sci Rep. 2019 Jan;9(1):516.
275. Farmer H, McCabe N, Lord CJ, Tutt ANJ, Johnson DA, Richardson TB, et al. Targeting the DNA repair defect in BRCA mutant cells as a therapeutic strategy. Nature. 2005 Apr;434(7035):917–21.
276. Hoffman GR, Rahal R, Buxton F, Xiang K, McAllister G, Frias E, et al. Functional epigenetics approach identifies BRM/SMARCA2 as a critical synthetic lethal target in BRG1-deficient cancers. Proc Natl Acad Sci U S A. 2014 Feb;111(8):3128–33.
277. Helming KC, Wang X, Wilson BG, Vazquez F, Haswell JR, Manchester HE, et al. ARID1B is a specific vulnerability in ARID1A-mutant cancers. Nat Med. 2014 Mar;20(3):251–4.
278. Santen GWE, Aten E, Sun Y, Almomani R, Gilissen C, Nielsen M, et al. Mutations in SWI/SNF chromatin remodeling complex gene ARID1B cause Coffin-Siris syndrome. Nat Genet. 2012 Mar;44(4):379–80.
279. Deciphering Developmental Disorders Study. Large-scale discovery of novel genetic causes of developmental disorders. Nature. 2015 Mar;519(7542):223–8.
280. Tishkoff SA, Williams SM. Genetic analysis of African populations: human evolution and complex disease. Nat Rev Genet. 2002 Aug;3(8):611–21.
281. Bhat NM, Bieber MM, Chapman CJ, Stevenson FK, Teng NN. Human antilipid A monoclonal antibodies bind to human B cells and the i antigen on cord red blood cells. J Immunol. 1993 Nov;151(9):5011–21.
282. Spellerberg MB, Chapman CJ, Mockridge CI, Isenberg DA, Stevenson FK. Dual recognition of lipid A and DNA by human antibodies encoded by the V H4-21 gene: A possible link between infection and lupus. Hum Antibodies. 1995;6(2):52–6.
283. Baptista MJ, Calpe E, Fernandez E, Colomo L, Cardesa-Salzmann TM, Abrisqueta P, et al. Analysis of the IGHV region in Burkitt’s lymphomas supports a germinal center origin and a role for superantigens in lymphomagenesis. Leuk Res. 2014 Apr;38(4):509–15.
284. Amato T, Abate F, Piccaluga P, Iacono M, Fallerini C, Renieri A, et al. Clonality Analysis of Immunoglobulin Gene Rearrangement by Next-Generation Sequencing in Endemic Burkitt Lymphoma Suggests Antigen Drive Activation of BCR as Opposed to Sporadic Burkitt Lymphoma. Am J Clin Pathol. 2016 Jan;145(1):116–27.
285. Lombardo KA, Coffey DG, Morales AJ, Carlson CS, Towlerton AMH, Gerdts SE, et al. High-throughput sequencing of the B-cell receptor in African Burkitt lymphoma reveals clues to pathogenesis. Blood Adv. 2017 Mar;1(9):535–44.
286. Martorelli D, Guidoboni M, De Re V, Muraro E, Turrini R, Merlo A, et al. IGKV3 proteins as candidate “off-the-shelf” vaccines for kappa-light chain-restricted B-cell non-Hodgkin lymphomas. Clin Cancer Res. 2012 Aug;18(15):4080–91.
287. Miller G. Immortalization of human lymphocytes by Epstein-Barr virus. Yale J Biol Med. 1982 May;55(3-4):305–10.
288. Bornkamm GW. Epstein-Barr virus and the pathogenesis of Burkitt’s lymphoma: more questions than answers. Int J Cancer. 2009 Apr;124(8):1745–55.
289. Okazaki IM, Hiai H, Kakazu N, Yamada S, Muramatsu M, Kinoshita K, et al. Constitutive expression of AID leads to tumorigenesis. J Exp Med. 2003 May;197(9):1173–81.
290. Ramiro AR, Jankovic M, Eisenreich T, Difilippantonio S, Chen-Kiang S, Muramatsu M, et al. AID is required for c-myc/IgH chromosome translocations in vivo. Cell. 2004 Aug;118(4):431–8.
291. Unniraman S, Zhou S, Schatz DG. Identification of an AID-independent pathway for chromosomal translocations between the Igh switch region and Myc. Nat Immunol. 2004 Nov;5(11):1117–23.
292. Pasqualucci L, Bhagat G, Jankovic M, Compagno M, Smith P, Muramatsu M, et al. AID is required for germinal center-derived lymphomagenesis. Nat Genet. 2008 Jan;40(1):108–12.
293. Takizawa M, Tolarová H, Li Z, Dubois W, Lim S, Callen E, et al. AID expression levels determine the extent of cMyc oncogenic translocations and the incidence of B cell tumor development. J Exp Med. 2008 Sep;205(9):1949–57.
294. Robbiani DF, Deroubaix S, Feldhahn N, Oliveira TY, Callen E, Wang Q, et al. Plasmodium Infection Promotes Genomic Instability and AID-Dependent B Cell Lymphoma. Cell. 2015 Aug;162(4):727–37.
295. Riley KJ, Rabinowitz GS, Yario TA, Luna JM, Darnell RB, Steitz JA. EBV and human microRNAs co-target oncogenic and apoptotic viral and human genes during latency. EMBO J. 2012 May;31(9):2207–21.
296. Lin X, Tsai MH, Shumilov A, Poirey R, Bannert H, Middeldorp JM, et al. The Epstein-Barr Virus BART miRNA Cluster of the M81 Strain Modulates Multiple Functions in Primary B Cells. PLoS Pathog. 2015 Dec;11(12):e1005344.
297. Kang D, Skalsky RL, Cullen BR. EBV BART MicroRNAs Target Multiple Pro-apoptotic Cellular Genes to Promote Epithelial Cell Survival. PLoS Pathog. 2015 Jun;11(6):e1004979.
298. Kim H, Choi H, Lee SK. Epstein-Barr Virus MicroRNA miR-BART20-5p Suppresses Lytic Induction by Inhibiting BAD-Mediated caspase-3-Dependent Apoptosis. J Virol. 2016 Feb;90(3):1359–68.
299. Harold C, Cox D, Riley KJ. Epstein-Barr viral microRNAs target caspase 3. Virol J. 2016 Aug;13:145.
300. Reisman D, Yates J, Sugden B. A putative origin of replication of plasmids derived from Epstein-Barr virus is composed of two cis-acting components. Mol Cell Biol. 1985 Aug;5(8):1822–32.
301. Sugden B, Marsh K, Yates J. A vector that replicates as a plasmid and can be efficiently selected in B-lymphoblasts transformed by Epstein-Barr virus. Mol Cell Biol. 1985 Feb;5(2):410–3.
302. Ambinder RF. Gammaherpesviruses and "Hit-and-Run" oncogenesis. Am J Pathol. 2000 Jan;156(1):1–3.
303. Trivedi P, Zhang QJ, Chen F, Minarovits J, Ekman M, Biberfeld P, et al. Parallel existence of Epstein-Barr virus (EBV) positive and negative cells in a sporadic case of Burkitt lymphoma. Oncogene. 1995 Aug;11(3):505–10.
304. Snijder J, Ortego MS, Weidle C, Stuart AB, Gray MD, McElrath MJ, et al. An Antibody Targeting the Fusion Machinery Neutralizes Dual-Tropic Infection and Defines a Site of Vulnerability on Epstein-Barr Virus. Immunity. 2018 Apr;48(4):799–811.e9.
305. Messick TE, Smith GR, Soldan SS, McDonnell ME, Deakyne JS, Malecka KA, et al. Structure-based design of small-molecule inhibitors of EBNA1 DNA binding blocks Epstein-Barr virus latent infection and tumor growth. Sci Transl Med. 2019 Mar;11(482).
306. Lee J, Kosowicz JG, Hayward SD, Desai P, Stone J, Lee JM, et al. Pharmacologic Activation of Lytic Epstein-Barr Virus Gene Expression Without Virion Production. J Virol. 2019 Jul;
307. Razzouk BI, Srinivas S, Sample CE, Singh V, Sixbey JW. Epstein-Barr Virus DNA recombination and loss in sporadic Burkitt’s lymphoma. J Infect Dis. 1996 Mar;173(3):529–35.
308. Ambrosio MR, Navari M, Di Lisio L, Leon EA, Onnis A, Gazaneo S, et al. The Epstein Barr-encoded BART-6-3p microRNA affects regulation of cell growth and immuno response in Burkitt lymphoma. Infect Agent Cancer. 2014 Apr;9:12.
309. Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, et al. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001 Jan;29(1):308–11.
310. Lek M, Karczewski KJ, Minikel EV, Samocha KE, Banks E, Fennell T, et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature. 2016 Aug;536(7616):285–91.
Appendix A
Supplemental Data File
Description:
Supplemental Table 1. Patient metadata. Clinical and molecular characteristics of the discovery and validation cases. ICGC metadata are not republished here.
Supplemental Table 2. Simple somatic mutations in the discovery cohort. The mutations are restricted to exonic and splice regions. Unless a mutation affected a BL-associated gene and was nonsynonymous, we excluded all mutations with a minor allele fraction greater than 10⁻⁴ according to dbSNP or ExAC.309,310 With the exception of the first two columns, this table follows The Cancer Genome Atlas (TCGA) Mutation Annotation Format (MAF).
Supplemental Table 3. Simple somatic mutations in the validation cohort. This table follows the same criteria as Supplemental Table 2.
Supplemental Table 4. Somatic copy number variations in the discovery cohort. With the exception of the first two columns, this table follows the segments output format by Sequenza.205
Supplemental Table 5. Somatic structural variations in the discovery cohort. With the exception of the first two columns, this table follows the BEDPE output format by the svtools vcftobedpe tool, which converted Manta VCF files.203,204
Supplemental Table 6. Noncoding mutation peaks.
Supplemental Table 7. Significantly mutated genes. This table shows the methods that identified each gene as significantly mutated (1) or not (0).
Supplemental Table 8. Mutation status for BL-associated genes and pathways. This table considers all mutation types displayed in Figure 2.4 (minus the ICGC cases).
Supplemental Table 9. Fisher’s exact tests on mutation prevalence. This table contains the underlying counts of mutated and unmutated cases that were used in comparing the mutation prevalence between disease subtypes (i.e. tumor EBV status, clinical variant status, and EBV genome type).
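For readers who wish to re-derive a p-value from the counts in Supplemental Table 9, the following is a minimal sketch of a two-sided Fisher's exact test on a 2×2 table of mutated and unmutated cases. The function name and the example counts are hypothetical and are not taken from the thesis data; the two-sided definition used here (summing all tables no more probable than the observed one) is a common convention and is assumed rather than confirmed as the one used in the analysis.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Hypothetical layout: rows are disease subtypes (e.g. EBV-positive,
    EBV-negative); columns are mutated and unmutated case counts.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x mutated cases in the first row,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one;
    # the small relative tolerance guards against floating-point ties.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical counts: 8 of 10 cases mutated in one subtype, 1 of 6 in the other.
p_value = fisher_exact_two_sided(8, 2, 1, 5)
print(f"p = {p_value:.4f}")  # p = 0.0350
```

The sketch uses only the standard library so that the arithmetic is visible; in practice an established statistics package would normally be preferred.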
Filename:
GrandeBruno_Supplemental_Tables.xlsx
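The population-frequency filter described for Supplemental Table 2 can be sketched as follows. This is an illustration of the stated rule only, not the code used to build the table: the function name, its parameters, and the handling of variants absent from a database (treated here as passing) are assumptions, as is the reading of "dbSNP or ExAC" to mean that exceeding the threshold in either database triggers exclusion.

```python
from typing import Optional

# Allele-fraction cutoff stated in the Supplemental Table 2 description.
MAF_THRESHOLD = 1e-4


def keep_mutation(in_bl_gene: bool,
                  nonsynonymous: bool,
                  dbsnp_af: Optional[float],
                  exac_af: Optional[float]) -> bool:
    """Return True if a mutation would be retained under the stated rule.

    All argument names are hypothetical. `None` means the variant is not
    reported in that database (an assumption of this sketch).
    """
    # Exception in the text: nonsynonymous mutations in BL-associated genes
    # are kept regardless of population frequency.
    if in_bl_gene and nonsynonymous:
        return True
    # Otherwise exclude the mutation if either database reports a
    # frequency strictly greater than the threshold.
    reported = [af for af in (dbsnp_af, exac_af) if af is not None]
    return all(af <= MAF_THRESHOLD for af in reported)


# A common variant outside the BL-associated genes is excluded.
print(keep_mutation(False, True, 0.05, None))   # False
# The same frequency is tolerated for a nonsynonymous hit in a BL-associated gene.
print(keep_mutation(True, True, 0.05, None))    # True
```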
Appendix B
Mutation (Lollipop) Plots
This appendix contains mutation plots (also known as lollipop plots) for every BL-associated gene (BLG) that bore somatic nonsynonymous SSMs in the discovery cohort. The following plots were generated using the ProteinPaint tool by St. Jude Children’s Research Hospital. Mutations detected in BL (N = 106 cases) and DLBCL (N = 153 cases) genomes are shown above and below the gene model, respectively.
[Figure: ProteinPaint mutation (lollipop) plot for SMARCA4. Axis: protein length (200 to 1600 amino acids). Burkitt lymphoma: 23 mutations; diffuse large B-cell lymphoma: 5 mutations. Annotated protein domains include QLQ, HSA, BRK, SNF2_N, DEXDc, Helicase_C, SnAC and Bromo_SNF2L2. Legend (somatic mutation classes): MISSENSE, NONSENSE, PROTEINDEL, SPLICE.]