
Genetic and molecular characterization of paediatric endemic and sporadic Burkitt lymphoma

by

Bruno Grande

B.Sc., McGill University, 2013

Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of

Doctor of Philosophy

in the

Department of Molecular Biology and Biochemistry

Faculty of Science

© Bruno Grande 2019

SIMON FRASER UNIVERSITY

Fall 2019

Copyright in this work rests with the author. Please ensure that any reproduction or reuse is done in accordance with the relevant national copyright legislation.

Approval

Name: Bruno Grande

Degree: Doctor of Philosophy (Molecular Biology and Biochemistry)

Title: Genetic and molecular characterization of paediatric endemic and sporadic Burkitt lymphoma

Examining Committee:

Chair: Christopher Beh, Professor

Ryan D. Morin, Senior Supervisor, Associate Professor

Jack N. Chen, Supervisor, Professor

Sohrab P. Shah, Supervisor, Associate Professor, Departments of Pathology and Computer Science, University of British Columbia

Sharon M. Gorski, Internal Examiner, Professor

Sandeep S. Davé, External Examiner, Professor, Department of Medicine, Duke University

Date Defended: December 3rd, 2019


Ethics Statement


Abstract

Though generally curable with intensive chemotherapy in resource-rich settings, Burkitt

lymphoma (BL) remains a deadly disease in older patients and in sub-Saharan Africa.

Epstein–Barr virus (EBV) positivity is a feature in over 90% of cases in malaria-endemic

regions and up to 30% elsewhere. However, the molecular features of BL have not been

comprehensively evaluated when taking into account tumour EBV status or geographic

origin. In this thesis, I describe an integrative analysis of whole genome and transcriptome

data generated from a large cohort of endemic and sporadic paediatric BL patients. This

approach revealed that the mutational landscape of BL genomes is primarily shaped by

four different processes, and that at least two of them—aberrant somatic hypermutation

and defects in DNA mismatch repair—appear associated with the presence of EBV. After

identifying novel candidate BL genes such as SIN3A, USP7, and CHD8, I explored the

incidence of mutations affecting genes and pathways involved with BL pathogenesis and

found that EBV-positive tumours had significantly fewer driver mutations, especially

among genes with roles in apoptosis, and that this difference did not exist when

comparing geographic subtypes of BL. I also identified a subset of immunoglobulin

variable region genes encoding clonal B-cell receptors (BCRs) that were disproportionally

used in the tumours, including IGHV4-34, known to produce autoreactive antibodies, and

IGKV3-20, a feature described in other B-cell malignancies but not yet in BL. Many of

these results suggest that tumour EBV status defines a specific BL entity irrespective of

geographic origin with particular molecular properties and distinct pathogenic

mechanisms. The novel mutation patterns identified here imply potential improvements

that could be brought to BL therapy. This includes the rational use of DNA-damaging

chemotherapy in some BL patients and targeted agents such as the CDK4/6 inhibitor

palbociclib in others. The importance of BCR signaling in BL strengthens the potential

benefit of inhibitors for PI3K, Syk and Src family kinases among these patients. Lastly, the

identification of USP7 as a tumour-suppressor gene in BL highlights the potential clinical

utility of MDM2 inhibitors in treating patients with otherwise wild-type TP53.


Keywords: Burkitt lymphoma; cancer genomics; whole genome and transcriptome

sequencing; pathogenesis; Epstein–Barr virus


Dedication

To my dad,

whose fateful battle with brain cancer

inspired me to pursue cancer research,

and my mom,

who did everything in her power to

ensure I could pursue cancer research.


Acknowledgements

In the final year of my undergraduate degree in biochemistry, I realized that I wanted to

pursue graduate studies in bioinformatics. If I had been aware of how grossly

underqualified I was at the time, I might have given up on the ambition altogether.

However, naïve as I was, I submitted applications to join various research groups focused

on cancer genomics. My supervisor, Ryan Morin, was the only professor willing to take a

chance on me, someone with virtually no knowledge of bioinformatics. I will be forever

grateful for the risk you took back then, and I hope this dissertation means the gamble

paid off. Over the past six years, you have been instrumental in my growth as a scientist,

a writer, a teacher, a collaborator, and most importantly, an independent and critical

thinker. The level of support you provided, especially during those pivotal first few years,

was above and beyond what I have come to expect from busy professors. I have never

felt like you were out of reach if I had a question to ask or was seeking feedback. Thank

you for believing in me and providing me with career opportunities.

My PhD journey included many productive collaborations and rewarding interactions with

other researchers and administrative staff. First, I would like to thank Jack Chen and

Sohrab Shah for sitting on my supervisory committee and providing guidance throughout

my degree. I enjoyed picking your brains during committee meetings and having

thought-provoking discussions about my research. Similarly, I wish to extend my

appreciation to Sharon Gorski and Sandeep Davé for agreeing to act as my internal and

external examiners, respectively. Second, I want to acknowledge the many collaborators

on the Burkitt Lymphoma Genome Sequencing Project, especially Daniela Gerhard and

Louis Staudt. I have learned much from your scientific rigour, lessons that I shall carry

with me for the rest of my career. Third, I must thank the graduate program assistant for

my department, Mimi Fourie. I am truly grateful for the continual assistance you provided

me throughout my PhD degree. Finally, I would like to recognize the monumental effort

required to manage a project of this scale, particularly the role played by Karen Novik.

You have the patience of a saint, and despite how complicated the project was at times,

everything about it felt organized thanks to you.


I had the pleasure of working with some amazing labmates, many of whom I consider

friends. Together, we achieved something that we should be proud of: building a

supportive and enriching research environment that fosters collaboration and skill sharing.

I enjoyed participating in those spontaneous conversations around the lab on topics

ranging from science to board games, and everything in between. The environment you

helped create made it easier for me to weather the challenges and frustrations of graduate

school. Specifically, I had the privilege of working with these outstanding colleagues:

Marco Albuquerque, Miguel Alcaide, Sarah Arthur, Kevin Bushell, Lauren Chong, Krysta

Coyle, Daniel Fornika, Laura Hilton, Aixiang Jiang, Rebecca Johnston, Marija Jovanovic,

Nicole Knoetze, Prasath Pararajalingam, Christopher Rushton, Selin Jessa, Jeffrey Tang,

and Nicole Thomas. Thank you for being such an amazing team!

This project was made possible by generous financial support from various funding

agencies. I want to thank the Foundation for Burkitt Lymphoma Research, including its

Scientific Advisory Board, and the National Cancer Institute for their role in initiating,

funding, managing, and advising on this project. I also wish to acknowledge Simon Fraser

University and its private donors for endowing the following awards: Graduate Fellowship,

Dr. Bruce Brandhorst Graduate Prize in MBB, Travel and Minor Research Award,

Weyerhaeuser Molecular Biology Graduate Scholarship, President’s PhD Scholarship,

and Dean’s Graduate Fellowship. My stipend was funded in part by Genome Canada,

Genome British Columbia, the Canadian Institutes of Health Research, Mitacs, and the

Team Finn Foundation. Travel funds were provided by the Canadian Institutes of Health Research,

the Canadian Cancer Society, the John Bosdet Memorial Fund with BC Cancer, and the

Foundation for Burkitt Lymphoma Research.

These acknowledgements would not be complete if I did not thank my partner, Santina

Lin, for all of the moral support she has given me over the years. As a fellow

bioinformatician, you could actually empathize when I complained about software

installation issues or cryptic error messages in R. I have always felt like I had a shoulder

to lean on when the science proved difficult. You have this amazing knack for inspiring me

with your achievements, which encourages me to push myself harder and aim higher.

Through thick and thin, you stood by me and I will never forget that. I could not ask for a

better best friend.


For my final acknowledgements, I need to provide some context. On Christmas Eve 1997,

my family found out that my dad had a brain tumour. We were told that it was inoperable

and prognosis was bleak. The doctors estimated that my dad had six months to live at

best. That would have been the end of it had my parents accepted their fate. I was 6 years old at

the time, and my brother and sister were even younger. We simply would not have known

our dad. That would indeed have been the case were it not for my parents’ determination. Within

a few weeks, we found a neurosurgeon willing to operate on my dad. The surgery was

successful and had no neurological complications. My dad was back at work a mere two

months later and resumed his life as if the whole thing had just been a nightmare.

Alas, I am afraid this story does not have a happy ending. Five years later, owing to a limp

my dad developed, we became aware that the tumour had started growing again. A

second surgery was performed, but my dad was not so lucky this time. Brain swelling

prevented the neurosurgeon from replacing the part of his skull that had been removed for

the procedure. The operation resulted in a severe loss of motor skills on his left side. The

built-up intracranial pressure led to a steady deterioration of his vision until he became

completely blind. After years of being under control, his epilepsy started acting up. I will

never forget the moment when I was 14 years old and had to call the ambulance because

my dad was uttering things as if his mind had travelled more than a decade back in time.

Little did I know that was the last time he would ever be home. Three months later, my

dad fell into a coma and drew his final breath on September 14th, 2005.

I share this story because it helps the readers fully appreciate why I am so grateful for my

parents. I remember my dad persevering, not losing his sense of humour, his loving

nature, his soul. I got to witness his courage firsthand in the face of grim adversity, and I

am a better person for it. However, the true hero of this story is one who worked tirelessly

in the background: my mom. You were the one who accompanied dad to every

appointment; who stayed long hours at the hospital; who helped him get around when he

lost his vision; who took on the burden of providing for a family of four as a widow; who

sacrificed much so that I had the opportunity to achieve my dreams. Simply put, I would

not be where I am today if it were not for your incredible determination and strength. From

the bottom of my heart, thank you, mom, for everything you have done for me.


Table of Contents

Approval
Ethics Statement
Abstract
Dedication
Acknowledgements
Table of Contents
List of Tables
List of Figures
Glossary
Preface

1 Introduction to Burkitt Lymphoma
  1.1 Clinical and epidemiological features
  1.2 Pathogenesis of Burkitt lymphoma
    1.2.1 Cell-of-origin
    1.2.2 Role of MYC
    1.2.3 Known genetic and molecular aberrations
    1.2.4 Epstein–Barr virus
    1.2.5 Malaria
  1.3 Problem statement and thesis overview

2 Discovery of genetic and molecular aberrations in BL
  2.1 Introduction
  2.2 Results
    2.2.1 Clinical and molecular characteristics of BL cases
    2.2.2 Data-driven inference of tumour EBV status and genome type
    2.2.3 Structural and copy number variations affecting MYC
    2.2.4 Refining list of genes with potential roles in BL pathogenesis
    2.2.5 Challenges with genetic comparison between BL and DLBCL
    2.2.6 Novel mutation patterns in BL-associated genes
    2.2.7 Landscape of non-coding mutations shaped by somatic hypermutation
    2.2.8 Robust identification of mutational signatures in BL genomes
    2.2.9 Non-uniform V gene segment usage in immunoglobulin repertoire
  2.3 Materials and methods
    2.3.1 Case accrual
    2.3.2 Sample processing and nucleic acid extraction
    2.3.3 Library construction and sequencing
    2.3.4 Data analysis

3 EBV defines a BL entity with distinct molecular and pathogenic features
  3.1 Introduction
  3.2 Results
    3.2.1 Fewer driver mutations in EBV-positive BL despite mutation burden
    3.2.2 Variation in mutation burden explained by mutational signatures
    3.2.3 Protein-altering mutations associated with tumour EBV status
    3.2.4 Deregulated AICDA activity in EBV-positive BL
    3.2.5 EBV genome copy number uncorrelated with EBV-associated effects
    3.2.6 Genetic comparison of intra-abdominal and head-only tumours
    3.2.7 Variable distribution of MYC breakpoints in BL subtypes
    3.2.8 V gene usage not determined by tumour EBV status
  3.3 Materials and methods
    3.3.1 Data analysis

4 Discussion and future directions
  4.1 De novo mutational signatures
  4.2 Non-coding mutation peaks
  4.3 Nonsynonymous mutations
  4.4 B-cell receptor repertoire
  4.5 Epstein–Barr virus
  4.6 Hit-and-run hypothesis

Bibliography

Appendix A Supplemental Data File

Appendix B Mutation (Lollipop) Plots


List of Tables

Table 1.1 Overview of clinical variants
Table 2.1 Clinical and molecular summary of discovery cohort
Table 2.2 Clinical and molecular summary of validation cohort
Table 3.1 Linear regression of mutational signatures
Table 3.2 McNemar’s test results
Table 3.3 Linear regression of AICDA expression
Table 3.4 Linear regression of breakpoint distance from MYC


List of Figures

Figure 1.1 Endemic BL patient
Figure 1.2 BL distribution in Africa
Figure 1.3 Interplay between EBV and malaria
Figure 1.4 Diagnostic methodology for high-grade B-cell lymphomas
Figure 1.5 B-cell development and germinal centre B-cell lymphomas
Figure 1.6 Molecular pathways contributing to BL pathogenesis
Figure 2.1 Molecular differences between EBV-positive and EBV-negative BL
Figure 2.2 Translocations between MYC and immunoglobulin loci
Figure 2.3 Landscape of copy number variations
Figure 2.4 Nonsynonymous mutations in BL-associated genes
Figure 2.5 Structural variations in DDX3X
Figure 2.6 Splicing branch point mutations in DDX3X
Figure 2.7 AICDA mutations in BL-associated genes
Figure 2.8 Mutually exclusive mutations in BL-associated pathways
Figure 2.9 Features of non-coding mutation peaks
Figure 2.10 AICDA mutations in non-coding mutation peaks
Figure 2.11 Peak gene expression as a function of peak mutation status
Figure 2.12 Correlation between AICDA and mutations within peaks
Figure 2.13 Known and novel targets of aberrant somatic hypermutation
Figure 2.14 Characteristics of de novo mutational signatures
Figure 2.15 Prevalence of de novo mutational signatures
Figure 2.16 Correlation with de novo mutational signatures
Figure 2.17 Dominant immunoglobulin rearrangements
Figure 2.18 Immunoglobulin V gene usage in BL and DLBCL
Figure 3.1 Genome-wide mutation burden per BL subtype
Figure 3.2 Mutation burden in BL-associated genes per BL subtype
Figure 3.3 Mutational signatures per BL subtype
Figure 3.4 Differential incidence of nonsynonymous mutations in BL subtypes
Figure 3.5 AICDA expression per BL subtype
Figure 3.6 Correlation with EBV genome copy number
Figure 3.7 Genetic comparison of anatomic BL subtypes
Figure 3.8 Immunoglobulin V gene usage per BL subtypes
Figure 4.1 PVT1 promoter mutations and MYC activation
Figure 4.2 PVT1 promoter mutations and BL pathogenesis
Figure 4.3 USP7 mutations and/or EBV-encoded EBNA1 and TP53 degradation
Figure 4.4 SIN3A and repression of MYC target genes
Figure 4.5 CHD8 and repression of gene expression via chromatin remodelling
Figure 4.6 Spontaneous loss of EBV during cell division
Figure 4.7 Putative model for BL pathogenesis


Glossary

AICDA: Activation-induced cytidine deaminase. Mutagenic enzyme with a role in generating IG diversity during B-cell development, also known as AID.

aSHM: Aberrant SHM. Mutagenesis associated with AICDA activity that targets genomic regions outside of those normally affected by physiologic SHM.

BCR: B-cell receptor. Surface-bound IG.

BL: Burkitt lymphoma. An aggressive B-cell non-Hodgkin lymphoma defined by MYC translocations and associated with EBV and malaria.

BLG: BL-associated gene. Gene identified as being potentially relevant to BL pathogenesis by virtue of being a recurrently mutated gene previously associated with BL or an SMG supported by at least two different methods.

BLGSP: Burkitt Lymphoma Genome Sequencing Project. International collaboration that is funding, managing, and sequencing BL tumour genomes and transcriptomes.

CDR3: Complementarity-determining region 3. Most variable region of an IG chain, spanning the V-D, D-J, and/or V-J recombination junctions.

CNV: Copy number variation. Mutation type involving the copy number gain or loss of genomic segments of any size.

COSMIC: Catalogue Of Somatic Mutations In Cancer. Database containing various features of tumour genomes, including reference mutational signatures.

DLBCL: Diffuse large B-cell lymphoma. The most common form of NHL, featuring aggressive growth and molecular heterogeneity.

EBV: Epstein–Barr virus. A ubiquitous γ-herpesvirus initially discovered in BL tumour cells but later found in most adults and known to cause infectious mononucleosis.

FF: Fresh frozen. Method for preserving tumour tissue that is considered the gold standard to ensure the quality of nucleic acids for sequencing.

FFPE: Formalin-fixed paraffin-embedded. Method for preserving tumour tissue that is associated with lower quality of nucleic acid for sequencing.

FISH: Fluorescence in situ hybridization. Method for locating DNA/RNA sequences in cells using fluorescence, often for determining the presence or absence of SVs.

HIV: Human immunodeficiency virus. Viral cause of AIDS.

ICGC: International Cancer Genome Consortium. Global collaboration of researchers performing genomic, transcriptomic, and epigenomic analyses of tumour samples for various cancer types.

IG: Immunoglobulin. Term referring to the immunoglobulin protein(s), component(s) of the BCR or antibodies, or the associated gene(s).

IGH: Immunoglobulin heavy chain. IG heavy chain gene locus on chromosome 14.

IGK: Immunoglobulin κ light chain. IG light chain gene locus on chromosome 2.

IGL: Immunoglobulin λ light chain. IG light chain gene locus on chromosome 22.

Indel: Small insertion or deletion. Mutations consisting of inserted or deleted DNA sequence, generally less than 100 bp.

ISH: In situ hybridization. Method for locating DNA/RNA sequences in cells using detectable probes, often for determining the presence or absence of foreign nucleic acids (e.g. EBV EBER RNAs).

LCL: Lymphoblastoid cell line. Immortalized cell line derived from B cells.

MMR: Mismatch repair. Pathway for repairing small DNA errors.

NHL: Non-Hodgkin lymphoma. Class of lymphomas that includes BL and DLBCL.

PCR: Polymerase chain reaction. Method for amplifying nucleic acids.

PI3K: Phosphoinositide 3-kinase. Class of enzymes involved in cell growth.

R: R programming language. Statistical programming language.

RNA-seq: RNA sequencing.

SHM: Somatic hypermutation. Mutagenesis associated with AICDA activity that can be either physiologic (on-target), giving rise to IG diversity, or aberrant (off-target), potentially introducing driver mutations.

SNV: Single nucleotide variant. Single-base substitution.

SOP: Standard operating procedure.

SSM: Simple somatic mutation. Somatic SNV or indel.

SV: Structural variation. Mostly translocations and inversions.

SWI/SNF: Switch/sucrose non-fermentable.

TSS: Transcription start site. First base of the first exon of a gene transcript.

V(D)J: Variable, diversity, and joining gene segments. Gene segments that are recombined to form the IG CDR3 region.

VAF: Variant allele fraction. Fraction of reads supporting an alternate allele.

VCF: Variant call format. File format for storing mutations.

WGS: Whole genome sequencing.

WHO: World Health Organization.


Preface

This thesis is an expanded version of the material originally published in Grande et al.,

“Genome-wide discovery of somatic coding and non-coding mutations in pediatric endemic

and sporadic Burkitt lymphoma”, Blood, 2019;133:1313–1324.1 Under the supervision of

Ryan Morin, I led the computational component of this project, including the analysis,

interpretation, and presentation of the sequencing data and clinical metadata. More

specifically, I designed and performed data analyses, implemented software tools,

maintained quality control, benchmarked computational methodologies, produced figures

and tables, and wrote the text. Furthermore, I was the first bioinformatics graduate

student in my research group, entailing work that is not captured in this thesis. Notably, I

set up the computational infrastructure for the laboratory virtually from scratch and

established standard analytical pipelines. I also played a central role in training incoming

undergraduate and graduate students as well as postdoctoral fellows in bioinformatics.

These responsibilities were central to my training as a PhD student.

Chapter 2 includes key contributions from coauthors of the above paper. Aixiang Jiang

and Ryan Morin designed and ran the Rainstorm and Doppler methodology for identifying

non-coding mutation peaks. Luka Culibrk and Eric Zhao ran the pipeline for determining

de novo mutational signatures. Nicole Knoetze designed the methodology for identifying

immunoglobulin clonotypes. Christopher Rushton authored a software tool for detecting

mutations that overlap the AICDA recognition motif and quantifying any enrichment or

depletion of such mutations. George Wright designed the McNemar’s test analysis. Corey

Casper, Thomas Gross, Elaine Jaffe, and Sam Mbulaiteye reviewed and advised on

consensus anatomic site classification. Daniela Gerhard, John Irvin, Jean Paul Martin,

Marie-Reine Martin, Marco Marra, Ryan Morin, and Louis Staudt designed and/or directed

the study. All other coauthors contributed to sample accrual, quality control and

processing, data generation and management, and logistics.

This thesis follows the convention of italicizing gene names, whereas non-italicized gene

names refer to the encoded proteins.


Chapter 1

Introduction to Burkitt Lymphoma

Burkitt lymphoma (BL) is a highly aggressive B-cell non-Hodgkin lymphoma. It is

considered by some to be the Rosetta Stone of cancer research for its pivotal role in

historical discoveries in the field.2,3 It was the first human malignancy to have a viral

aetiology. It was the first tumour in which the activation of an oncogene via chromosomal

rearrangement was demonstrated. These rearrangements ultimately led to the discovery

of their target, MYC, now recognized as a quintessential proto-oncogene in many

cancers. It was also one of the first tumours to achieve high cure rates with chemotherapy

alone. To this day though, despite these important discoveries, researchers and clinicians

still face several questions and challenges related to prevention, diagnosis, pathogenesis,

and treatment of BL.

1.1 Clinical and epidemiological features

BL was first described in Uganda as a sarcoma by Denis Burkitt in 1958 but was later

recognized as a lymphoma.4,5 BL is most common in African children aged 2 to 8,

accounting for roughly half of paediatric cancer cases in some areas.4–6 BL predominantly

affects male patients, with male-to-female ratios ranging between 1.6:1 and 4:1.6–9 The

most striking feature of these tumours, other than their rapid growth, is their clinical

presentation. In the regions where this cancer is most common, the majority of BL

tumours affect the upper and/or lower jaw, often resulting in loss of teeth and abnormal

protrusion of the eyes (Figure 1.1).6 The abdomen is the second most frequently involved

anatomic site, presenting as abdominal swelling.6 Due to the rapid tumour growth, most

children die from BL within six months if untreated.6,10

Figure 1.1: Endemic BL patient. “Large facial Burkitt’s Lymphoma” from Mike Blyth, licensed under CC BY-SA 2.5.

The geographical distribution of BL incidence in Africa was determined through surveys

performed by mail or in person.6,11,12 Most cases were diagnosed in tropical equatorial

Africa, including a “tail” running down the African East Coast, forming the so-called

“lymphoma belt” (Figure 1.2). The map of BL incidence was found to closely correspond

to areas that (1) are below 1,500 m in altitude where average temperatures are above

15°C and (2) receive over 50 cm of rainfall per year.13 Distant regions with similar

geographical features, namely Papua–New Guinea, were later found to share the

elevated BL incidence first noted in equatorial Africa.10 Notably, the lymphoma belt

overlapped the geographical distribution of certain groups of mosquitos, which led to the

hypothesis that a mosquito-borne pathogen may be playing a role in BL tumour

formation.13 While a virus was initially suspected, other aetiological factors were also

proposed, such as malaria.14–16
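The two climate criteria quoted above can be read as a simple predicate. The following Python sketch is only an illustration: the `Region` record, its field names, and the example values are hypothetical and do not come from the cited surveys; only the three thresholds (1,500 m, 15°C, 50 cm) are taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Hypothetical climate record for a region (illustrative field names)."""
    altitude_m: float          # altitude in metres
    mean_temp_c: float         # average temperature in degrees Celsius
    annual_rainfall_cm: float  # rainfall per year in centimetres


def matches_lymphoma_belt_climate(region: Region) -> bool:
    """Return True if the region meets both criteria quoted in the text."""
    # Criterion (1): below 1,500 m in altitude, with average temperatures above 15 °C.
    low_and_warm = region.altitude_m < 1500 and region.mean_temp_c > 15
    # Criterion (2): over 50 cm of rainfall per year.
    wet = region.annual_rainfall_cm > 50
    return low_and_warm and wet


# Made-up example values, for illustration only.
lowland = Region(altitude_m=1100, mean_temp_c=23, annual_rainfall_cm=120)
highland = Region(altitude_m=1900, mean_temp_c=14, annual_rainfall_cm=120)
print(matches_lymphoma_belt_climate(lowland))
print(matches_lymphoma_belt_climate(highland))
```

A match on these criteria describes only the climatic correlate of high BL incidence reported in the text; it says nothing about causation, which the following paragraphs attribute to EBV and malaria.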

Figure 1.1 image source: https://commons.wikimedia.org/wiki/File:Large_facial_Burkitt's_Lymphoma.JPG (licence: https://creativecommons.org/licenses/by-sa/2.5/legalcode)

Figure 1.2: BL distribution in Africa. Areas indicated in black, roughly corresponding to equatorial Africa, have the highest BL incidence. This is Figure 1, reprinted with permission from Burkitt, 1983.17

In 1964, Epstein and colleagues discovered a γ-herpesvirus infecting tumour cells in

African BL, and the same virus was also found in BL tumours from Papua–New

Guinea.18,19 This virus later became known as the Epstein–Barr virus (EBV) and was

identified as the causative agent of infectious mononucleosis.20 Over time, it was

established that the virus was not restricted to Africa, nor was the infection unique to BL

patients within Africa.21,22 EBV was nonetheless significantly more common among BL

cases than among healthy controls.22 The paediatric nature of BL in equatorial Africa is

consistent with the early EBV infection seen in these populations, which typically occurs

during the first 16 months of infancy.23 A later study also found that high serum antibody

titres to EBV proteins were a risk factor for developing BL.24 Therefore, these

epidemiological findings suggested that EBV alone could not trigger lymphomagenesis,

but an aetiological link between EBV and BL could not be excluded.

The ubiquity of EBV stimulated an increased focus on malaria as the primary

environmental factor responsible for the unique geographical distribution of BL in

equatorial Africa and Papua–New Guinea. Evidence for this hypothesis steadily

accumulated during the 1960s.17 First, local malarial intensity correlated with BL

incidence.25 The malignancy was rarely diagnosed in areas with little to no malaria,

including certain African islands (e.g. Zanzibar, Pemba, and Seychelles); urban

environments with limited mosquito breeding grounds; and areas with malarial control or

complete eradication (e.g. Kinshasa, Sri Lanka).25 For example, a decrease in severe

malaria infection in the Mengo Districts of Uganda coincided with a substantial decline in

BL incidence.26 In addition, preliminary studies showed an interesting relationship

between BL and the sickle cell trait, which protects against malarial infection. Despite

sharing a similar geographical distribution as malaria—and by extension, BL—the sickle

cell trait is less prevalent among BL patients, consistent with a shared susceptibility to

malaria and BL.27,28


The relationship between malaria and the age of BL incidence provides additional

evidence for an aetiological link.29 One report demonstrated a correlation between BL

incidence and the multiplicity of malaria infection in Ghana and Tanzania.30 More

specifically, both measures peak between 5 and 9 years of age. Notably, immigrants from

low-intensity malaria areas (e.g. high-altitude Rwanda and Burundi) have a distinct age

distribution of BL incidence.26,31 In one Ugandan study, roughly 50% of such immigrants

who were diagnosed with BL were over the age of 15 years.31 These results suggest that

intense malarial infection serves as a triggering event for BL formation, possibly in

conjunction with EBV (Figure 1.3).

Figure 1.3: Interplay between EBV and malaria. This is Text-Figure 1 reprinted with permission from Burkitt, 1969.25

Shortly after the initial description of BL, a number of reports from regions outside those

described above detailed cases of B-cell lymphoma that were indistinguishable at the

histological level from those in Africa.5,10,32,33 However, the incidence of these tumours

was much lower than their African counterparts. This discovery ultimately resulted in the

definition of epidemiological variants for BL known as clinical variants. Patients diagnosed

in malaria-endemic areas are considered endemic BL (eBL) whereas those diagnosed

elsewhere represent the sporadic BL (sBL) variant. A third epidemiological subgroup was

defined after the observation that BL can arise as a complication in immunocompromised

patients. This disease, referred to as immunodeficiency-related BL, was first recognized

during the human immunodeficiency virus (HIV) epidemic, but was also linked to

prolonged immunosuppression following organ transplantation.34–37 The three subtypes

differ in terms of epidemiological and clinical features such as incidence, association with

malaria and EBV, age of diagnosis, and anatomic sites affected by tumour growth (Table

1.1). Genetic and molecular differences were subsequently found, especially following the

emergence of high-throughput sequencing.

Table 1.1: Characteristics of the clinical variants of BL. This is Table 5.1 adapted from Robertson, 2013.38

| Variable | Endemic BL | Sporadic BL | Immunodeficiency-related BL |
|---|---|---|---|
| Geography | Equatorial Africa | Worldwide | Worldwide |
| Age incidence | Children | Children and adults | Adults |
| Anatomic sites | Jaws, facial bones, kidneys, liver, gonads, breast | Ileocecal region, Waldeyer’s ring, gonads, breast | Nodal, central nervous system (CNS) |
| EBV infection | 100% | 5–30% | 25–40% |
| Environmental factor | Malaria, arbovirus, euphorbia | NA | NA |
| MYC breakpoints | Far 5′ | Exon, intron 1, and 5′ | Exon and intron 1 |
| IGH breakpoints | VDJ region | Switch region | Switch region |
| Somatic IGH mutation | Yes | Yes | Yes |

The criteria for BL diagnosis are summarized in the World Health Organization (WHO)

Classification of Tumours of Haematopoietic and Lymphoid Tissues (Figure 1.4).39 They

are primarily based on cell morphology, immunophenotype, and fluorescence in situ

hybridization (FISH). Briefly, BL morphology usually adopts a “starry-sky” appearance

consisting of uniform medium-sized basophilic lymphoid cells with interspersed

macrophages forming the “stars” where BL cells have undergone apoptosis. At the

immunohistochemical level, the tumour cells should be positive for surface

immunoglobulin, B-cell markers (i.e. CD19, CD20, CD22, CD79A, and PAX5), and

germinal-centre markers (i.e. CD10 and BCL6) while having little to no BCL2 staining. The

proliferation fraction marked by MKI67 is expected to be close to 100%. BL tumours

should also have strong MYC protein staining and are often positive for the MYC FISH

break-apart assay, which detects translocations affecting MYC, a genetic hallmark of BL.

These criteria apply equally to both eBL and sBL, which remain indistinguishable using

modern techniques. In practice, the distinction between BL and other high-grade B-cell

lymphomas such as diffuse large B-cell lymphoma (DLBCL) is not always well defined

and can result in misdiagnosis. This problem is exacerbated in resource-poor settings,

including equatorial Africa, which often lack facilities for performing more expensive

diagnostic tests such as immunohistochemical staining. Misdiagnosis is often fatal for BL

patients because they are treated with inappropriate regimens.40

Figure 1.4: Diagnostic methodology for high-grade B-cell lymphomas. This is Figure 4 reprinted with permission from Swerdlow et al., 2016.41

In general, BL tumours tend to dramatically respond to intensive chemotherapy and are

considered curable for children in countries where proper supportive care is readily

available to manage treatment-related toxicity.42–45 Chemotherapeutic regimens typically

include a combination of cyclophosphamide, vincristine, prednisolone, doxorubicin,

cytarabine, and/or high-dose methotrexate.46,47 However, BL remains frequently fatal for children in

sub-Saharan Africa for several reasons, including diagnosis typically occurring at an

advanced stage, the limited capacity to support intensive chemotherapeutic regimens,

and the confounding effects of poverty.48–51 Overall survival for eBL varies between 40%

and 70%.48,52,53 In the sporadic setting, treating adult and elderly patients has also been

challenging and is associated with high mortality.45 However, current clinical trials are

showing promise in overcoming the limitations of current treatment regimens.54 BL

relapse is rare, but if it does occur, it is seen within the first year after diagnosis and is

usually fatal.39,55,56 Prognostic indicators for BL include disease stage, bone marrow or

central nervous system involvement, unresected tumour size, serum lactate

dehydrogenase levels, and age.39

1.2 Pathogenesis of Burkitt lymphoma

1.2.1 Cell-of-origin

B-cell development is a highly regulated process whereby B cells progressively

differentiate by rearranging their genome in order to produce antibodies, also known as

immunoglobulins (IGs).57 An IG is composed of a heavy chain and a light chain. The

heavy chain is encoded by the IG heavy (IGH) locus, whereas the light chain is encoded

by either the IG κ (kappa; IGK) or λ (lambda; IGL) locus. Initially, B cells start off with a

germline configuration for all IG loci. The transition from a haematopoietic stem cell to an

immature B cell occurs in the bone marrow. First, the IGH locus undergoes VDJ

rearrangement, which results in the selection and juxtaposition of a variable (V) gene

segment, a diversity (D) gene segment, and a joining (J) gene segment. Second, the IGK

and/or IGL loci, which lack diversity segments, undergo VJ rearrangement. The purpose

of V(D)J rearrangement is to produce a diverse repertoire of IGs—and thus

antibodies—capable of detecting and responding to virtually any pathogen.

Following V(D)J rearrangement, the immature B cell exits the bone marrow and enters the

peripheral circulation, where it expresses the IG on the cell surface in the form of a B-cell

receptor (BCR).57 Upon antigenic stimulation of the BCR, B cells enter germinal

centres, which are transient structures in secondary lymphoid organs wherein they

complete affinity maturation (Figure 1.5). These cells become centroblasts, which

comprise rapidly dividing B cells in the germinal centre dark zone. Here, centroblasts

undergo somatic hypermutation (SHM) of the IG loci. This process involves the

introduction of mutations within the variable regions of the IG loci in an effort to produce

antibodies with higher affinity for the initiating antigen. This process is catalytically driven

by activation-induced cytidine deaminase (AICDA), also known as AID. Centroblasts that

have undergone some degree of SHM transit to the germinal centre light zone where they

become centrocytes and cease to proliferate. Based on the antigen affinity of their BCR,

centrocytes are either selected to differentiate into plasma cells or memory B cells or are

eliminated via apoptosis in the event of disadvantageous mutations. Alternatively,

centrocytes may re-enter the dark zone for additional cycles of proliferation and SHM in a

process called “cyclic re-entry”.

At every step of B-cell development, the tight regulation that is in place can fail and result

in malignant transformation (Figure 1.5). The type of B cell that gives rise to a particular

lymphoma is termed the “cell-of-origin”. The postulated cell-of-origin for BL is one that has

undergone the germinal centre reaction given that the IG loci have been mutated by

AICDA.58–60 More precisely, BL cells most closely resemble centroblasts from the

germinal centre dark zone in terms of gene expression.61 The cell-of-origin framework

also accounts for the histological similarity between BL and DLBCL tumours considering

that the latter can arise from the germinal centre as well. Consistent with their germinal

centre origin, BL and DLBCL often acquire mutations in non-IG regions due to the

off-target enzymatic activity of AICDA in a process called aberrant SHM (aSHM). Due to

aSHM, several genes are “hypermutated” in lymphomas including MYC.62 Because

AICDA primarily targets single-stranded DNA, which is exposed during transcription, aSHM mostly affects the first kilobase pair (kbp)

downstream of transcription start sites (TSS) for actively transcribed genes.63–65
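The positional bias of aSHM described above can be illustrated with a minimal sketch that flags mutations falling within the first kbp downstream of a TSS, taking gene strand into account. This is not the method used in this thesis. The `Gene` record, the function names, the 1,000-bp window constant, and the toy coordinates are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass

# Assumed window size, following the "first kbp downstream of the TSS"
# description in the text; a real analysis may use a different cut-off.
ASHM_WINDOW_BP = 1000


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    tss: int      # 1-based TSS coordinate (made-up values below)
    strand: str   # "+" or "-"


def in_ashm_window(gene, chrom, pos, window=ASHM_WINDOW_BP):
    """Return True if pos lies within `window` bp downstream of the gene's TSS."""
    if chrom != gene.chrom:
        return False
    # "Downstream" follows the direction of transcription, so the window
    # extends to higher coordinates on "+" and to lower coordinates on "-".
    offset = pos - gene.tss if gene.strand == "+" else gene.tss - pos
    return 0 <= offset < window


def candidate_ashm_mutations(genes, mutations):
    """Pair each (chrom, pos) mutation with every gene whose window contains it."""
    hits = []
    for chrom, pos in mutations:
        for gene in genes:
            if in_ashm_window(gene, chrom, pos):
                hits.append((gene.name, chrom, pos))
    return hits


# Toy genes and mutations with invented coordinates (not real annotations)
genes = [
    Gene("GENE_A", "chr1", 10_000, "+"),
    Gene("GENE_B", "chr1", 50_000, "-"),
]
mutations = [
    ("chr1", 10_250),  # 250 bp downstream of GENE_A
    ("chr1", 9_900),   # upstream of GENE_A
    ("chr1", 49_500),  # 500 bp downstream of GENE_B (minus strand)
    ("chr1", 51_000),  # upstream of GENE_B
    ("chr2", 10_250),  # different chromosome
]
hits = candidate_ashm_mutations(genes, mutations)
```

Such a positional filter only nominates candidates. It does not consider whether a gene is actively transcribed or whether the mutation matches the sequence context preferred by AICDA, both of which would be needed to attribute a mutation to aSHM with any confidence.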

Figure 1.5: B-cell development and germinal centre B-cell lymphomas. This is Figure 2 adapted with permission from Basso et al., 2015.66

1.2.2 Role of MYC

The MYC gene encodes the transcription factor MYC, which is estimated to regulate

up to 20% of all human genes.67 These target genes have roles in several important

biological processes—many of which are relevant to cancer—including cell cycle control,

cell growth and metabolism, and angiogenesis.68 On the other hand, MYC also sensitizes

cells to apoptosis, presumably to keep cells in check by tempering uncontrolled

proliferation with cell death.68,69 In B-cell development, MYC serves as an inducer of cell

division under specific circumstances. MYC is largely absent in B cells, mainly owing

to its transcriptional repression by BCL6.70,71 However, MYC is briefly expressed when B

cells enter the dark zone, either upon initial entry into the germinal centre or during cyclic

re-entry.71

In BL, MYC plays a central role in initiating and maintaining tumour growth. Originally

described in 1972, cytogenetic aberrations affecting chromosome 8 were considered a

genetic hallmark of BL.72–74 A decade after their discovery, the target of these genomic

rearrangements was identified as MYC, a human homolog of the viral transforming

v-myc gene.75,76 More specifically, these translocations put MYC in proximity to one of the

three IG loci and thus under the control of strong IG enhancers. They also tend to

uncouple MYC expression from BCL6 repression by removing BCL6 binding sites in the

MYC promoter.71 The role of these translocations in lymphomagenesis was confirmed

when transgenic (Eμ-Myc) mice developed aggressive lymphomas after coupling MYC

expression with an IG enhancer.77 In human and murine tumours, these translocations

cause constitutive expression of MYC, thereby promoting cell growth and proliferation.

Deregulated MYC activity also promotes the apoptosis pathway which, if not disrupted,

should lead to cell death.68 This safeguard may explain the latent period of up to five

months before tumour formation seen in the Eμ-Myc mouse model. The requirement for

abrogating apoptosis a priori is also consistent with the lack of IG-MYC translocations

found in circulating B cells in healthy individuals.78 On the other hand, IG-BCL2

translocations are found in circulating B cells, suggesting this is a MYC-specific effect.

Hence, additional genetic or molecular events are required to cooperate with MYC to give

rise to BL tumours.

The distribution of chromosomal breakpoints in the MYC and IG loci provides clues to the

origin of these oncogenic translocations. Notably, the MYC breakpoints exhibit a different

pattern in sporadic and endemic cases.79,80 In sBL, the breakpoints are in close proximity

to the MYC TSS, with many overlapping the first exon or the first intron. In contrast, eBL

exhibits a more diffuse distribution of breakpoints, which span a 1-Mbp region centred on

MYC, with a minority of translocations occurring near the TSS. The large distances

between the breakpoint and the target oncogene seen in lymphomas seem compatible

with the capability of IG enhancers to induce long-range epigenetic

reprogramming.81

MYC translocations in BL mostly involve one of the three IG loci.38 Each locus is partnered

with MYC at roughly the same proportions in endemic and sporadic BL. The IGH locus on

chromosome 14 is the most commonly involved, translocated with MYC in roughly 80% of

BL cases. The IG loci encoding the light chains IGK and IGL on chromosomes 2 and 22,

respectively, account for the remaining 20% of translocations. The IGH breakpoints were

initially thought to also segregate differently among the clinical variants.82 However,

several studies later demonstrated that the association between breakpoint location in

IGH and geographic origin was much weaker than initially estimated.80,83–86

More precisely, the breakpoints in IGH mostly affect the switch regions, which are

involved in class switch recombination.39 The purpose of class switch recombination is to

swap the constant (C) portion of the IG while maintaining the same variable VDJ

sequence, which is responsible for binding the antigen. This is accomplished by

introducing double-strand DNA breaks in the switch regions, removing the intervening

DNA, and repairing the break via non-homologous end joining. These double-strand DNA

breaks are mediated by AICDA, the same enzyme responsible for SHM. During aSHM,

AICDA can cause the formation of oncogenic MYC translocations, implicating the enzyme

in BL pathogenesis.87

1.2.3 Known genetic and molecular aberrations

Whereas MYC is a potent proto-oncogene, animal models demonstrated that MYC

deregulation is insufficient for triggering lymphomagenesis, indicating the existence of

additional aetiological factors.88 The involvement of EBV and malaria in BL pathogenesis

is strongly suspected and is discussed below, but these environmental factors cannot

account for all BL cases given the existence of EBV-negative cases outside of

malaria-endemic regions. Over the past three decades, significant progress has been

made in our understanding of the genetic and molecular underpinnings of BL (Figure

1.6).

Figure 1.6: Molecular pathways contributing to BL pathogenesis. The encoded proteins of recurrently mutated genes are highlighted in colour (red, oncogenes; blue, tumour suppressors). The percentages indicate the fraction of BL cases with mutations affecting the associated genes. This is Figure 4 adapted with permission from Pasqualucci, 2019.89

Soon after TP53 was identified as a tumour-suppressor gene in 1989, it was found

recurrently mutated in BL.90 This observation is consistent with the critical role the gene

plays in apoptosis given how MYC deregulation predisposes cells to programmed cell

death. Considering the aforementioned latency observed in Eμ-Myc mice, the involvement

of other genes that regulate apoptosis was investigated. Notably, the homologs for TP53

(Tp53) and CDKN2A (Cdkn2a) were often mutated in the murine tumours in addition to

having increased expression of the MDM2 homolog (Mdm2).91 Cdkn2a encodes a tumour

suppressor capable of inducing G1/S cell-cycle arrest and apoptosis, while Mdm2 is an

oncogene whose product is capable of promoting the degradation of Tp53 protein. A

concurrent study demonstrated an accelerated disease progression in Eμ-Myc mice when

they were crossed with mice bearing Tp53 or Cdkn2a mutations.92 Mutations in CDKN2A

and overexpression of MDM2 were later confirmed in human BL cell lines.93

In 2012, several high-throughput sequencing studies provided a comprehensive

description of the landscape of somatic mutations in BL.94–97 A number of additional

genes were implicated in BL pathogenesis, some having established roles in other

malignancies and others remaining uncharacterized. For instance, CCND3, which

encodes a D-type cyclin, was found to be commonly mutated in BL, especially among

sporadic cases.94,96,97 CCND3 functions by regulating the G1/S transition and promoting

cell-cycle progression. Variants in CCND3 strictly affect the carboxyl terminus of the

encoded protein and many of these mutations cause premature truncation of the protein.

Mutation clusters are a hallmark feature of oncogenes but truncating mutations are more

commonly a feature of tumour suppressor genes. In this case, functional work

demonstrated that the missense mutations and truncating mutations in this region

promote the stability of CCND3 protein.94

In these large sequencing studies, TCF3 and its negative regulator, ID3, were also

identified as recurrently mutated in BL.94,96,97 TCF3 encodes a transcription factor with

a central role in B-cell development, most notably by modulating IG gene expression.

Mutations in TCF3 are strictly missense and target the basic helix-loop-helix domain of

the E47 transcript isoform while the corresponding domain of the E12 isoform remains

unaffected. These alterations were shown to result in higher E47 transcript levels, thereby

promoting activity.94 On the other hand, mutations in ID3 are not only more frequent but

include several that are predicted to truncate and deactivate the protein, consistent with

its role as a tumour suppressor. Mutations in ID3 or TCF3 increase BCR signalling by

inducing IG expression and repressing PTPN6, which encodes a phosphatase (SHP1)

that dampens BCR signalling.98 In turn, increased BCR activity promotes

phosphoinositide 3-kinase (PI3K) signalling in a growth-promoting pathway termed “tonic”

BCR signalling that is largely antigen-independent. Moreover, TCF3 also induces CCND3

expression, exerting additional pressure on cell-cycle progression.

Other less frequent genetic lesions capable of activating PI3K signalling in BL include

deactivating mutations in PTEN, an established tumour-suppressor gene with an

inhibitory role in PI3K signalling, and focal amplifications of the MIR17HG locus, which

encodes microRNAs (miRNAs) capable of reducing PTEN translation.98 Alterations in

FOXO1 may be related to the relationship between FOXO1 and the PI3K pathway, but the

exact effect of the mutations is still under investigation.99,100 The obvious role that PI3K is

playing in BL pathogenesis may present a therapeutic opportunity and justifies the clinical

investigation of the use of inhibitors for PI3K, Syk and Src family kinases.94

PI3K signalling is also activated by mutations affecting the GNA13 signalling pathway.

Functional experiments have demonstrated that these variants can deregulate AKT, a key

component of the PI3K pathway.101 These mutations also resulted in a lack of

confinement of germinal centre B cells, which may be associated with increased disease

dissemination. In BL, the most commonly mutated genes are GNA13, encoding a guanine

nucleotide-binding protein (G protein), and P2RY8, encoding an associated G

protein-coupled receptor. Inactivating mutations in RHOA, a downstream target of GNA13

signalling, are thought to have similar consequences on the pathway.

Another set of genes with recurrent mutations is ARID1A and SMARCA4, both encoding

components of the switch/sucrose non-fermentable (SWI/SNF) complex.94–97 This

complex regulates gene expression by repositioning nucleosomes along DNA, thereby

facilitating transcription factor binding.102 At first glance, the mutation pattern in both

genes suggests that they are tumour suppressors, consistent with their role in other

malignancies. Beyond that though, the mechanism of action of these mutations in BL

remains unclear. The same can be said of DDX3X, another tumour suppressor gene

commonly mutated in BL whose role in pathogenesis is unknown.94,97 The gene encodes

an RNA helicase and is located on chromosome X, which may account for the relatively

high male-to-female ratio mentioned earlier. Its structural homologue situated on

chromosome Y, DDX3Y, shares roughly 90% sequence identity but its expression is

restricted to male germline cells, suggesting a role distinct from that of DDX3X.103 DDX3X

mutations have been described in other EBV-associated cancers, such as natural

killer/T-cell lymphoma, which suggests a function related to the virus.104 Additional

investigation is required to elucidate the consequences of mutations in these genes.

While they may not be readily targetable due to being tumour suppressor genes,

mutations affecting ARID1A, SMARCA4, or DDX3X could potentially be exploited for

synthetic lethal interactions with other genes.

1.2.4 Epstein–Barr virus

Since its discovery in BL, EBV has been linked with two lymphoproliferative diseases and

at least seven additional cancer types, mostly involving lymphocytes and epithelial

cells.105 Today, an estimated 200,000 cancer cases per year are attributable to EBV

infection.106 Yet, despite being the first virus to be associated with cancer, the underlying

mechanisms that promote tumour formation remain poorly understood.

For decades, the epidemiological evidence presented earlier in this chapter provided the

strongest case for an oncogenic role for EBV in BL pathogenesis with little support from

functional studies.107 In the early 1970s, the direct capability of transforming B cells was

confirmed when EBV was used to immortalize B cells in vitro to form lymphoblastoid cell

lines (LCLs).108 A pivotal point in EBV research was also achieved in 1984 with the

publication of the viral genome sequence, enabling new molecular analyses.109 Despite

the experimental utility of LCLs, EBV gene expression in vitro differs greatly from that in

vivo, which has complicated the search for a reliable and representative in vitro model

system for EBV-positive BL.110

The observed variation in EBV gene expression ultimately led to the identification of

different EBV gene expression programs associated with distinct latency states. LCLs

express all latent genes, defined as Latency III.111 In contrast, EBV-positive BL tumours

only express EBNA1 and some non-coding genes including EBER1 and EBER2, termed

Latency I.110,112 EBV gene expression in BL cells is presumably restricted in

immunocompetent patients to avoid detection by the immune system. Additional latency

programs such as Latency IIa and IIb that express an intermediate number of genes are

observed in other contexts.111 These expression differences highlight the limitation of

EBV-positive cell lines for studying the role of EBV in BL. For example, the EBV genes

EBNA2 and LMP1 were deemed essential for transformation in vitro for LCLs, and yet

they are not detected in clinical BL samples.113 Furthermore, while EBNA1 is the only

expressed protein in BL, it does not seem critical for B cell immortalization in vitro.114 It

thus appears that the mechanisms by which EBV promotes transformation are not entirely

consistent.

A breakthrough was made when the EBV-positive Akata cell line was generated from a

BL sample.115 Unlike with previous cell lines, researchers could derive a viable EBV-negative

clone, which allowed for comparative studies.116 As expected, the EBV-positive clones

were relatively more malignant than their EBV-negative counterparts, in part due to

increased resistance to apoptosis.116–119 Later, EBNA1 was found to promote survival in

BL cell lines by inhibiting apoptosis in an EBER-independent manner.120 The importance

of this gene was also demonstrated in transgenic mice expressing EBNA1 in B cells,

although this finding remains controversial.107,121 While EBNA1 is the only consistently

expressed protein-coding gene in BL, heterogeneous EBV gene expression has been

reported by multiple studies.122 For instance, LMP1 and LMP2 were shown to be

transiently expressed in BL and may have similar oncogenic roles as in LCLs.123,124 That

being said, it is reasonable to focus on the role of EBNA1 given its universal presence in

EBV-positive BL, making it a prime target for therapy.

The role of non-coding genes that are expressed alongside EBNA1 in BL has also been

explored. For instance, the EBER genes do not seem essential for the generation of

LCLs.113 On the other hand, they promote tumourigenicity in BL cell lines, although the

underlying mechanism remains elusive.118,125,126 Some studies have shown an inhibitory

effect on the human PKR protein, which in turn represses interferon-α-induced apoptosis,

but these findings have been challenged.126,127 Alternatively, the EBER transcripts appear

responsible for increasing levels of the cytokine interleukin-10 (IL-10) seen in EBV-positive

tumours.128 Not only could this result in growth-promoting autocrine signalling, but IL-10

can promote tumour growth through immune evasion by attracting macrophages to engulf

apoptotic cells.129 That being said, studies have shown similar increases in IL-10 levels

due to malaria, so the culprit for this molecular change remains unclear.130,131

Other studies have shown that EBV can have an impact on miRNA-mediated regulation

through cellular or viral miRNAs, which may promote lymphomagenesis. For example,

hsa-miR-127 was found to be upregulated in EBV-positive tumours, although the

mechanism for upregulation was not explored.132 The authors proposed a model whereby

EBV increases the expression of hsa-miR-127, which in turn mediates B-cell

differentiation by degrading PRDM1 (i.e. BLIMP1) and XBP1 transcripts. Another study

demonstrated a role for a subset of EBV miRNAs in suppressing apoptosis, possibly

through direct post-transcriptional regulation of the pro-apoptotic protein CASP3.133 These

potential miRNA:mRNA interactions will likely continue to be identified as more miRNA

and RNA sequencing data are generated, providing a broader perspective on the effects

of EBV on the BL transcriptome.134

Another compelling, albeit controversial, effect of EBV on BL genomes is the activation of

AICDA and the ensuing aSHM.135 In infectious mononucleosis patients, EBV-positive B

cells from the peripheral blood had more active SHM than their EBV-negative

counterparts.136 In vitro, EBV caused an increase in AICDA expression in B cells, which

had the notable consequence of introducing mutations in cancer genes such as

TP53.137,138 These in vitro studies are consistent with results from BL tumour sequencing,

which have shown an increased number of mutations in the IG loci of EBV-positive

tumours.139 The underlying mechanism of this effect has been the focus of more recent

studies, with some attributing the increase to the EBV gene LMP1 and others attributing

it to EBNA3C.140,141 These proposed mechanisms must be reconciled with the fact that

these EBV genes are not consistently expressed, or at least detected, in BL. In contrast,

one study showed relatively lower AICDA activity in EBV-positive cells, but unlike BL

tumour cells, these cells also expressed EBNA2, limiting the relevance of this

finding.112,142,143 Overall, it appears that the viral effect on AICDA depends on the context,

as is the case with many other aspects of EBV.

EBV is clonal in BL tumours, and while consistent with an early role in tumourigenesis, a

late but strong influence on tumour growth cannot be excluded, which would be hard to

distinguish in bulk tumour sequencing.144 The inhibition of apoptosis mediated by EBV

would ostensibly benefit the formation of BL by removing the safeguard in place

preventing uncontrolled MYC-driven proliferation. Furthermore, disrupting apoptosis

would also facilitate the survival of cells harbouring double-strand DNA breaks by

avoiding cell death, thereby allowing the accumulation of potential driver mutations.49 It is

generally thought that EBV infection of B cells occurs before the MYC translocation

arises.122 If EBV also activates AICDA, its presence would also increase the likelihood of

forming the oncogenic translocation. Additionally, given the continual cell proliferation

seen in BL and that the EBV episome can be spontaneously lost during cell division, the virus

would be expected to disappear completely from the tumour unless it conferred a selective advantage.145,146 In other words, any

EBV-negative tumour cells that result spontaneously from loss of the EBV genome during

cell division are presumably outcompeted by the EBV-positive cells. Therefore,

EBV-negative tumours must rely on alternative EBV-independent mechanisms to achieve

similar effects, which may be more difficult to attain and could explain the lower incidence

of EBV-negative BL.107

1.2.5 Malaria

It is a matter of debate whether malaria has a direct effect on BL tumourigenesis or an

indirect effect by altering the host environment. Research into the role of malaria in BL

pathogenesis has been hampered by the lack of adequate model systems for BL. Early

on, an aetiological link was supported by in vivo mouse models that formed lymphoma

tumours resembling BL histologically upon infection with malaria.147 The intensity of

malarial infection correlated with the frequency of spontaneous tumour formation.147

Additionally, prior infection with malaria predisposed mice to developing lymphoma

tumours after being inoculated with cell-free tumour extract derived from murine

lymphomas.147 The rationale for inoculation was that the tumour extract may contain

factors such as viruses that promote lymphomagenesis. Indeed, mice treated with the

cell-free tumour extract more frequently developed lymphomas. These results reveal a

possible synergy between malaria and a component of the tumour extract, potentially viral

in nature. The model for BL formation evolved to consider the impact of malaria on

lymphoid tissue but in vivo experiments that dissect the individual role of each pathogen

are sparse.25 What remains certain is that the presence of both malaria and EBV infection

leads to an increased risk of BL but the molecular nature of this host-environment

interaction remains elusive to this day.

Some have argued that the increase in AICDA expression observed in endemic BL is

primarily due to malaria infection and the more important role of EBV is to suppress

apoptosis.49 Evidence supporting this effect of malaria on AICDA is steadily

accumulating.148,149 The mechanism has not been fully characterized yet, but one

possibility is the activation of Toll-like receptors on B cells by malaria-associated agonists

such as haemozoin, which in turn induces AICDA expression.49 Interestingly, a

synergistic effect between malaria and EBV on AICDA expression has been described,

whereby the EBV load in the blood is correlated with AICDA levels in patients from

malaria-endemic regions, but this correlation ceases to exist in patients from areas of

low exposure to malaria.149 The underlying reason for this compounded effect on AICDA

activity remains unknown, but this has led to many suspecting an interaction between

malaria and EBV.29,150

Additional evidence for synergy arises when malaria interacts with the immune system. It

is commonly thought that malaria infection chronically activates the B-cell system, thereby

increasing the number of B cells transiting through the germinal centre and heightening

the risk for MYC translocations.107,148 In this process, EBV-infected cells are preferentially

expanded in the germinal centre, exacerbating the risk for BL formation.123,148 The

underlying mechanism of this interaction remains uncertain, but some work has shown

that a malarial protein, CIDR1α, is capable of inducing lytic reactivation of EBV-infected

memory B cells.151 Interestingly, CIDR1α can also activate pathways that result in

suppression of apoptosis, which may be relevant to BL pathogenesis.152

Lastly, another potential contribution of malaria to BL formation is the resulting T-cell

immunosuppression that is seen during acute malarial infection, which provides a window

of opportunity for EBV-infected B cells to proliferate.153–155 Indeed, the number of

EBV-infected cells in circulation is significantly higher in children during and following an

acute episode of malaria.156 Hence, the clear geographic association between BL

incidence and the distribution of malaria parasites might simply be due to the ability of

malaria to “distract” the immune system enough to allow EBV to infect more cells and/or

to permit broader gene expression programs, which are known to be oncogenic in the in

vitro setting.157 Under this model, I expect EBV infection to immortalize some B cells first;

then, malaria facilitates the expansion of EBV-infected B cells; and finally, this increase in

EBV-infected B cells correlates with the risk of forming a MYC translocation.158

1.3 Problem statement and thesis overview

Although paediatric BL can be cured effectively, this is only true for privileged patients

with access to proper supportive care, who mostly consist of children with sporadic BL.

Prognosis for children with endemic BL remains dismal. The severe toxicity of current

treatment regimens also needs to be considered because it is thought to be a major

contributor to the lack of success of treatment in endemic and adult sporadic BL. There is

thus an urgent need to advance our understanding of BL pathogenesis, especially in the

comparative setting, in order to identify new potential therapeutic targets. Several open

questions exist in the literature regarding BL. While many of these questions are not

conclusively addressed herein, this thesis presents key advancements in our knowledge

of BL biology and provides support for longstanding hypotheses.

In this work, I aimed to characterize the genetic and molecular landscape of paediatric

sporadic and endemic BL. I do not consider adult cases or immunodeficiency-associated

cases. I focus on the mutational landscape and, to a lesser degree, gene expression

profiling by leveraging whole genome and transcriptome sequencing datasets. The

hypotheses underpinning this thesis are: (1) hitherto uncharacterized features of BL

genomes and transcriptomes may provide novel insight into BL biology and open up new

avenues for targeted therapy; and (2) molecular features of BL vary primarily based on

tumour EBV status (and potentially EBV genome type) rather than geographic origin and

thus treatment should be tailored accordingly. These hypotheses are investigated in

Chapters 2 and 3, respectively. In Chapter 2, I will describe novel features of BL genomes

and extend previously made observations. In Chapter 3, I will demonstrate the importance

of tumour EBV status relative to geographic origin in determining features with likely roles

in pathogenesis. Finally, I will discuss these findings in Chapter 4 and explore potential

avenues for future research in this disease implied by this work.

Chapter 2

Discovery of genetic and molecular aberrations in BL

2.1 Introduction

Modern technologies such as high-throughput sequencing have greatly accelerated the

pace and scale at which researchers can characterize the genetic and molecular features

of cancer. Identifying these features can provide pivotal insight into the mechanisms

underlying tumour initiation and progression. In turn, an improved understanding of

disease aetiology can pave the way for the development of more efficient and/or less toxic

treatments, often by virtue of targeting specific features of malignant cells.

Since 2012, a number of published studies have analysed the BL genome and

transcriptome using high-throughput sequencing.94,96,97,159–165 Despite this volume of

work, several open questions regarding BL pathogenesis remain, as laid out in Chapter 1.

This owes in part to the technological and sample limitations of past studies. A majority of

patient cohorts whose BL samples underwent sequencing were small and most lacked

adequate representation of endemic and/or EBV-positive cases to provide sufficient

statistical power. Additionally, some of these studies relied heavily on tumour-only RNA or

exome sequencing data. While cost-effective, this sequencing strategy poses several

constraints on downstream analyses. First, the lack of matched normal data greatly

complicates the distinction between somatic and germline variants. This is especially

difficult for endemic cases because the African population features more germline

polymorphisms that are underrepresented in current databases. Second, because

coverage in RNA and exome sequencing is biased towards exonic regions, this

naturally limits the number and type of mutations that can be detected. Third, with RNA

sequencing (RNA-seq) data, variable gene expression may reduce the sensitivity for

variant detection, especially for genes with lower expression and for loss-of-function

mutations that result in nonsense-mediated decay. Fourth, the possibility of physiologic or

aberrant RNA editing adds an additional layer of complexity for identifying true somatic

mutations.

To more comprehensively study the molecular aetiology of BL and specifically overcome

these limitations, the Burkitt Lymphoma Genome Sequencing Project (BLGSP)

assembled a comprehensive patient cohort and subjected the samples to both whole genome

and transcriptome sequencing. The discovery and validation cohorts feature endemic and

sporadic cases as well as EBV-positive and EBV-negative tumours, allowing for their

comparison, detailed in Chapter 3. Whole genome sequencing (WGS) was performed on

tumour and germline DNA for the accurate detection of somatic mutations in all cases.

Compared to RNA and exome sequencing, WGS enables the identification of non-coding

variants in intronic and intergenic regions as well as more accurate copy number

variations (CNVs) and structural variations (SVs). Another key difference with this dataset

is that library preparation for RNA sequencing relied on ribosomal RNA (rRNA) depletion

from total RNA rather than poly(A) RNA enrichment. This theoretically permits the

profiling of non-coding RNAs (ncRNAs) regardless of the presence of a poly(A) tail and

allows the quantification of all EBV transcripts. In short, at the outset of this project, I

gained access to an unprecedented data set and was thereby poised to discover genetic

and molecular features of BL not possible in previous studies.

DLBCL shares some genetic features with BL and has been studied more rigorously

using genomic techniques. Two main gene expression subtypes exist, which roughly

correspond to the presumed cell-of-origin, namely germinal-centre B-cell DLBCL and

activated B-cell DLBCL. Of these two subtypes, germinal-centre B-cell DLBCL is

considered the most similar to BL at the molecular level, because both of these

lymphomas are thought to derive from germinal centre B cells. This gene expression

subtype has recently been divided further into two groups based on additional gene

expression features.166,167 Given the growing understanding of DLBCL and shared

molecular features, the molecular and genetic relationship between BL and the subgroups

of DLBCL should be investigated further. While both BL and DLBCL are considered to be

aggressive B-cell non-Hodgkin lymphomas, their aetiology, prognosis, and response to

treatment are distinct and can be the focus of further study.

In this chapter, I set out to discover novel genetic and molecular features of BL by

leveraging the most comprehensive genomic dataset to date. Briefly, five genes were

associated with BL for the first time and help nominate potential novel therapeutic

opportunities. WGS also enabled the identification of discrete regions enriched in

non-coding mutations, which may be disrupting regulatory elements. Four mutational

signatures were discerned de novo, shedding light on the underlying mechanisms

responsible for mutagenesis in BL. Lastly, IG V gene usage was assessed using RNA-seq

data, clearly demonstrating non-uniform V gene usage. In summary, this chapter provides

an exhaustive description of the mutational landscape of paediatric BL.

2.2 Results

2.2.1 Clinical and molecular characteristics of BL cases

All cases considered here were less than 21 years old at diagnosis and thus deemed

paediatric. The discovery cohort consisted of 106 BL cases: 74 endemic BL (eBL) cases

from Uganda and 32 sporadic BL (sBL) cases from the United States and Germany. The

Ugandan and American cases were accrued for the BLGSP. The 15 German cases were

accrued for the International Cancer Genome Consortium (ICGC) Molecular Mechanisms

in Malignant Lymphoma by Sequencing (MMML-Seq) project. The ICGC cases were

included in some analyses to increase the number of sBL cases. Both projects generated

WGS and RNA-seq data. I had access to WGS data for both tumour and normal tissue

and RNA-seq data for tumour tissue. However, I did not utilize the ICGC RNA-seq data to

avoid technical sources of variation (or “batch effects”) due to differences in sample

handling, library preparation method, and sequencing protocols. The clinical and

molecular characteristics of the discovery cohort are summarized in Table 2.1. Patient

metadata are presented per case in the discovery and validation cohorts in Supplemental

Table 1 of Appendix A.

Cases that failed the strict criteria for qualifying for the BLGSP discovery cohort were

included in the BLGSP validation cohort instead. The validation cohort consisted of 29 BL

cases: 24 eBL from Uganda and 5 sBL cases from the United States. Instead of WGS,

these cases were subjected to targeted DNA sequencing of recurrently mutated regions

Table 2.1: Summary of clinical and molecular characteristics of the discovery cohort. Cases from the BLGSP and the ICGC are shown separately. FF, fresh frozen tissue; FFPE, formalin-fixed paraffin-embedded tissue; BM, bone marrow; CNS, central nervous system.

| Variable | Level | BLGSP (n=91) | ICGC (n=15) | Total (n=106) |
|---|---|---|---|---|
| Sex | Female | 32 (35%) | 1 (7%) | 33 (31%) |
| | Male | 59 (65%) | 14 (93%) | 73 (69%) |
| Clinical variant | Endemic BL | 74 (81%) | 0 (0%) | 74 (70%) |
| | Sporadic BL | 17 (19%) | 15 (100%) | 32 (30%) |
| EBV status | EBV-positive | 71 (78%) | 0 (0%) | 71 (67%) |
| | EBV-negative | 20 (22%) | 15 (100%) | 35 (33%) |
| EBV type | EBV type 1 | 59 (65%) | 0 (0%) | 59 (56%) |
| | EBV type 2 | 12 (13%) | 0 (0%) | 12 (11%) |
| | EBV-negative | 20 (22%) | 15 (100%) | 35 (33%) |
| Age group | 0–5 yr | 21 (23%) | 6 (40%) | 27 (25%) |
| | 6–10 yr | 50 (55%) | 5 (33%) | 55 (52%) |
| | 11–15 yr | 18 (20%) | 2 (13%) | 20 (19%) |
| | 16–20 yr | 2 (2%) | 2 (13%) | 4 (4%) |
| Tumor biopsy | FF | 88 (97%) | 15 (100%) | 103 (97%) |
| | FFPE | 3 (3%) | 0 (0%) | 3 (3%) |
| IG-MYC translocations | IGH-MYC | 74 (81%) | 11 (73%) | 85 (80%) |
| | IGL-MYC | 8 (9%) | 3 (20%) | 11 (10%) |
| | IGK-MYC | 7 (8%) | 1 (7%) | 8 (8%) |
| | Other | 2 (2%) | 0 (0%) | 2 (2%) |
| IG isotype | IgM | 63 (69%) | 0 (0%) | 63 (59%) |
| | IgG | 11 (12%) | 0 (0%) | 11 (10%) |
| | Undetectable | 17 (19%) | 15 (100%) | 32 (30%) |
| Anatomic site | Head-only disease | 29 (32%) | 0 (0%) | 29 (27%) |
| | Intra-abdominal disease | 16 (18%) | 0 (0%) | 16 (15%) |
| | Disseminated disease (no BM/CNS involvement) | 36 (40%) | 0 (0%) | 36 (34%) |
| | Disseminated disease (BM/CNS involvement) | 8 (9%) | 0 (0%) | 8 (8%) |
| | Unknown | 2 (2%) | 15 (100%) | 17 (16%) |

in addition to RNA-seq using the same protocol as the discovery cohort. The clinical and

molecular characteristics of the validation cohort are summarized in Table 2.2.

The BLGSP tumour and matched normal genomes were sequenced to an average

nonredundant depth of 82X (range 55–96) and 41X (range 30–51), respectively. The

ICGC tumour and normal genomes were sequenced to a lower depth of 40X (range

29–62). Because of their lower sequencing coverage, the ICGC genomes had fewer

mutations on average, presumably due to limited sensitivity for mutation detection. For this

Table 2.2: Summary of clinical and molecular characteristics of the validation cohort.

| Variable | Level | BLGSP (n=29) |
|---|---|---|
| Sex | Female | 11 (38%) |
| | Male | 18 (62%) |
| Clinical variant | Endemic BL | 24 (83%) |
| | Sporadic BL | 5 (17%) |
| EBV status | EBV-positive | 23 (79%) |
| | EBV-negative | 6 (21%) |
| EBV type | EBV type 1 | 22 (76%) |
| | EBV type 2 | 1 (3%) |
| | EBV-negative | 6 (21%) |
| Age group | 0–5 yr | 7 (24%) |
| | 6–10 yr | 19 (66%) |
| | 11–15 yr | 1 (3%) |
| | 16–20 yr | 1 (3%) |
| | Unknown | 1 (3%) |
| Tumor biopsy | FF | 29 (100%) |
| | FFPE | 0 (0%) |
| IG isotype | IgM | 19 (66%) |
| | IgG | 1 (3%) |
| | IgA | 1 (3%) |
| | Undetectable | 8 (28%) |
| Anatomic site | Head-only disease | 6 (21%) |
| | Intra-abdominal disease | 14 (48%) |
| | Disseminated disease (no BM/CNS involvement) | 4 (14%) |
| | Disseminated disease (BM/CNS involvement) | 1 (3%) |
| | Unknown | 4 (14%) |

reason, I omitted the ICGC cases from analyses relating to global mutation rates, which

would likely be affected by this technical variable. The BLGSP validation tumour and

normal samples were sequenced more deeply, namely to 243X (range 158–392).

To complement the transcriptome data from the discovery and validation cohorts, I

included RNA-seq data from a small group of healthy tonsil donors (“tonsil cohort”), which

were accrued through the BLGSP. Both centroblasts and centrocytes were cell-sorted

from the tonsils and separately underwent RNA-seq, yielding six libraries for each cell

type. Derived from the germinal centre, centroblasts and centrocytes are considered the

closest cell-of-origin for BL and thus the most appropriate normal comparator for gene

expression. Specifically, centroblasts were selected for CD19+, CD38+, IgD–, CXCR4+,

and CD83–, whereas centrocytes were selected for CD19+, CD38+, IgD–, CXCR4–, and

CD83+. The BLGSP tumour and tonsil RNA-seq datasets had 200M (range 100–289M)

and 219M reads (range 204–240M) on average, respectively.
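
The sorting scheme described above amounts to a simple rule over five surface markers. A minimal sketch, assuming each cell is summarized as a mapping from marker name to a positive/negative call (the function name and data layout are hypothetical and not part of the BLGSP protocol):

```python
from typing import Dict, Optional

# Marker profiles as stated in the text; True means marker-positive.
CENTROBLAST = {"CD19": True, "CD38": True, "IgD": False, "CXCR4": True, "CD83": False}
CENTROCYTE = {"CD19": True, "CD38": True, "IgD": False, "CXCR4": False, "CD83": True}


def classify_gc_cell(markers: Dict[str, bool]) -> Optional[str]:
    """Return 'centroblast', 'centrocyte', or None if neither profile matches."""
    if all(markers.get(m) == v for m, v in CENTROBLAST.items()):
        return "centroblast"
    if all(markers.get(m) == v for m, v in CENTROCYTE.items()):
        return "centrocyte"
    return None  # missing or mismatched markers: cell is not assigned


cell = {"CD19": True, "CD38": True, "IgD": False, "CXCR4": False, "CD83": True}
print(classify_gc_cell(cell))
```

The two populations differ only in CXCR4 and CD83, so a cell lacking either call is left unassigned rather than guessed.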

2.2.2 Data-driven inference of tumour EBV status and genome type

The EBV genome encodes two small non-coding RNA genes called EBER1 and EBER2,

which are both highly expressed in host cells. EBER in situ hybridization (ISH) is the

standard clinical assay for determining tumour EBV status. However, for most cases

analyzed here, EBER ISH was not performed. As a result, tumour EBV status was

inferred from the raw sequencing data using two different methods. First, I calculated the

fraction of WGS reads that aligned to the EBV genome (Figure 2.1A). Second, to emulate

EBER ISH, I counted the number of RNA-seq reads aligning to the EBER1 and EBER2

genomic loci (Figure 2.1B). Both approaches yielded a clear bimodal distribution, which

was taken to represent the EBV-positive and EBV-negative cases. Importantly, the two

methods agreed with one another for every case. Additionally, the inferred tumour EBV

status was concordant with available results from EBER ISH (N = 5) or EBV PCR (N = 1).

Ultimately, the discovery cohort had 71 (67%) EBV-positive cases and 35 (33%)

EBV-negative cases. I also determined the EBV genome type (i.e. type 1 or type 2) where

applicable (Figure 2.1C). Out of 71 EBV-positive cases, EBV type 1 and type 2 were

found in 59 (83%) and 12 (17%) tumours, respectively. All cases with EBV type 2 were

endemic (i.e. from Uganda).
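
The inference described in this section can be sketched as a pair of threshold rules. The cut-offs below are the ones given in the Figure 2.1 caption (0.006 for the WGS read share, 250 EBER reads, and a 21-mer ratio of 1); the function names, the "discordant" label and the handling of zero counts are assumptions made for illustration only, not the thesis pipeline.

```python
from typing import Optional

# Thresholds taken from the Figure 2.1 caption; units follow that figure.
WGS_EBV_THRESHOLD = 0.006    # EBV share of mapped WGS reads
EBER_READ_THRESHOLD = 250    # RNA-seq reads on EBER1 + EBER2
TYPE1_RATIO_THRESHOLD = 1.0  # type 1 / type 2 unique 21-mer count ratio


def infer_ebv_status(wgs_ebv_share: float, eber_reads: Optional[int] = None) -> str:
    """Call EBV status from WGS, cross-checked against RNA-seq when available."""
    wgs_call = wgs_ebv_share >= WGS_EBV_THRESHOLD
    if eber_reads is not None:
        rna_call = eber_reads >= EBER_READ_THRESHOLD
        if rna_call != wgs_call:
            return "discordant"  # hypothetical label; the text reports no such case
    return "EBV-positive" if wgs_call else "EBV-negative"


def infer_ebv_type(type1_kmers: int, type2_kmers: int) -> str:
    """Call the EBV genome type from counts of type-specific 21-mers."""
    if type2_kmers == 0:
        return "type 1" if type1_kmers > 0 else "undetermined"
    ratio = type1_kmers / type2_kmers
    return "type 1" if ratio >= TYPE1_RATIO_THRESHOLD else "type 2"


print(infer_ebv_status(0.05, 4000), infer_ebv_type(300, 100))
```

Passing `eber_reads=None` mirrors the ICGC cases, for which only the WGS-based call is available.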

2.2.3 Structural and copy number variations affecting MYC

In the discovery cohort, 104 out of 106 tumours had detectable translocations placing

MYC in proximity to an IG enhancer (Figure 2.2). Among these tumours, IGH, IGL and

IGK were involved in the MYC rearrangements of 85 (82%), 11 (11%) and 8 (8%)

tumours, respectively. While lacking traditional IG-MYC translocations, the remaining two

tumours featured more complex rearrangements involving MYC. One of these was an sBL

case (BLGSP711900123) with a reciprocal structural variation between the MYC and

BCL6 loci that resulted in the focal gain of MYC, possibly in the form of a double minute.

The other was an eBL case (BLGSP710600277) with a complex set of translocations

rearranging MYC and IGH via an intergenic region on chromosome 17.
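
The percentages in this subsection use the 104 tumours with a conventional IG-MYC translocation as the denominator, whereas Table 2.1 uses all 106 cases, which is why the two sets of percentages differ slightly. A quick arithmetic check (variable names are illustrative):

```python
# IG partner counts reported in the text and in Table 2.1.
partner_counts = {"IGH": 85, "IGL": 11, "IGK": 8}
n_with_ig_myc = sum(partner_counts.values())  # 104 tumours
n_cohort = n_with_ig_myc + 2                  # plus two complex rearrangements


def percent(count: int, total: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * count / total)


for partner, count in partner_counts.items():
    # Columns: partner, % of translocated tumours, % of the whole cohort
    print(partner, percent(count, n_with_ig_myc), percent(count, n_cohort))
```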

[Figure 2.1 panels: (A) EBV percentage of WGS reads (log) by inferred EBV infection status (EBV-negative, EBV-positive). (B) EBER RNA-seq read count (log) by inferred EBV infection status (EBV-negative, EBV-positive). (C) Frequency ratio for k-mers unique to EBV type 1 and type 2 (log) by inferred EBV genome type (EBV type 1, EBV type 2).]

Figure 2.1: Molecular differences between EBV-positive and EBV-negative BL tumours. (A) Fraction of mapped reads from whole genome sequencing data that aligned to the EBV genome (log scale). The minimum threshold for calling EBV-positive samples was 0.006, indicated by the dashed line. (B) RNA-seq read counts for EBER1 and EBER2 (log scale). The minimum count for calling EBV-positive samples was 250 reads, indicated by the dashed line. A pseudocount of 1 was added to all values prior to log transformation. This excludes the ICGC cases whose RNA-seq data were not analyzed. (C) Ratio between the counts for 21-mers that are unique to EBV type 1 and type 2, respectively, calculated from whole genome sequencing reads aligned to the EBV genome. The minimum ratio for calling EBV type 1 samples was 1, indicated by the dashed line.

In addition to translocations, other structural alterations affecting MYC were found. First, I

observed telomeric gains of chromosome 8q in six (5.7%) tumours (Figure 2.3). In these

tumours, the associated IG-MYC breakpoints were upstream of MYC, confirming the

inclusion of the proto-oncogene in the gain. These events may be the result of

unbalanced MYC translocations and further promote MYC expression. Second, focal

gains were also found in three cases (2.8%), ranging from 50 to 180 kbp. Third, one eBL

case (BLGSP710600086) has distinctive CNVs on chromosome 11q, namely high-level

gains of a region spanning 11q22.3–q23.2 followed by telomeric loss of 11q23.3–qter

(Figure 2.3). These CNVs are reminiscent of those characteristic of the new WHO entity

“Burkitt-like lymphoma with 11q aberration”, which is also defined by the lack of IG-MYC

translocations. In this case though, the 11q CNVs coexist with an IG-MYC translocation,

indicating that these events are not necessarily mutually exclusive.

[Figure 2.2 plot: chromosomes 8, 2, 22 and 14, with labelled loci CASC8, MYC, PVT1, IGK C, IGK V, IGL V, IGL C, IGH C and IGH V.]

Figure 2.2: Rearrangements of the immunoglobulin loci. Translocations (shown in center) between the MYC locus (chromosome 8) and the IGH (chromosome 14), IGK (chromosome 2), or IGL (chromosome 22) loci in tumours with WGS data (N = 106). The inner track displays the rainfall plot for simple somatic mutations in these regions. Mutations that overlap AICDA recognition sites (RGYW) are shown in red.

[Figure 2.3 plot: CNV incidence (0% to 20% in each direction) along chromosomes chr1 to chr22 and chrX, shown in three rows.]

Figure 2.3: Landscape of copy number variations. Proportion of cohort affected by copy number gains and losses are shown in red and blue, respectively. CNVs that are smaller than 100 kbp are not displayed.

2.2.4 Refining list of genes with potential roles in BL pathogenesis

To assemble a list of BL-associated genes (BLGs), I identified somatic single nucleotide

variants (SNVs) and small insertions/deletions (indels), collectively known as simple

somatic mutations (SSMs), from paired tumour-normal WGS data using Strelka.168

Exonic and splice-site SSMs in the discovery and validation cohorts are listed in

Supplemental Tables 2 and 3, respectively, of Appendix A. I analyzed somatic SSMs

using two separate strategies.

First, I identified significantly mutated genes in the discovery cohort using an ensemble

approach involving four complementary methods: OncodriveCLUST for identifying genes

with mutation hotspots; OncodriveFM and OncodriveFML for identifying genes with

functional mutation bias using different metrics; and MutSigCV for identifying genes that

are mutated more frequently than what is expected due to chance. To be considered

significantly mutated, a gene needed to be supported by two or more methods (Q-value <

0.1). Most genes identified through this approach have already been associated with BL,

including some recently discovered candidate BL genes such as TFAP4 and

KMT2D.163,164 I also identified genes not previously described as recurrently mutated in

BL, namely SIN3A, USP7, HIST1H1E, CHD8, and RFX7. The supporting methods for

each gene are shown in Supplemental Table 4 of Appendix A.
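
The ensemble rule described above (a gene must reach Q-value < 0.1 in at least two of the four methods) can be sketched as a simple vote count. This is an illustrative sketch only: the input layout and function name are assumptions, and the Q-values are made up rather than taken from the thesis.

```python
from typing import Dict, List

Q_THRESHOLD = 0.1  # significance cut-off stated in the text
MIN_METHODS = 2    # a gene needs support from at least two methods


def significantly_mutated(q_values: Dict[str, Dict[str, float]]) -> List[str]:
    """Return genes whose Q-value is below the cut-off in >= MIN_METHODS methods.

    q_values maps method name -> {gene: Q-value}; genes absent from a
    method's output are treated as unsupported by that method.
    """
    support: Dict[str, int] = {}
    for per_gene in q_values.values():
        for gene, q in per_gene.items():
            if q < Q_THRESHOLD:
                support[gene] = support.get(gene, 0) + 1
    return sorted(g for g, n in support.items() if n >= MIN_METHODS)


# Made-up Q-values for illustration only; these are not thesis results.
example = {
    "OncodriveCLUST": {"GENE_A": 0.01, "GENE_B": 0.40},
    "OncodriveFM": {"GENE_A": 0.03, "GENE_B": 0.02},
    "OncodriveFML": {"GENE_A": 0.20, "GENE_C": 0.05},
    "MutSigCV": {"GENE_B": 0.30, "GENE_C": 0.50},
}
print(significantly_mutated(example))
```

Requiring agreement between methods with different statistical models trades some sensitivity for fewer method-specific false positives.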

Second, I employed more lenient criteria whereby genes previously reported as

recurrently mutated in BL were also considered BLGs if they were altered in at least five

cases of the discovery cohort. This approach led to the inclusion of MYC, MIR17HG,

CDKN2A, and PTEN as BLGs. In total, I identified 27 BLGs and organized them into

groups of related genes (Figure 2.4). In addition to SSMs, I also considered CNVs and

SVs affecting BLGs, which are listed in Supplemental Tables 5 and 6, respectively, of

Appendix A. The mutation status for each BLG and pathway per sample is summarized in

Supplemental Table 7 of Appendix A.

At least 74 genes have been previously reported as candidate BL genes but are not

featured on my list of BLGs.94,96,97,159–165 Out of these genes, only two were discussed in

more than one of these publications: CREBBP and CARD11. Both are considered

DLBCL-associated genes; they are mutated in one (0.94%) and three (2.8%) cases,

respectively, and thus do not meet my criterion for being considered a bona fide BLG. The

remaining 72 genes are mutated in at most four (3.8%) cases with the exception of RYR2,

which is mutated in seven (6.6%) cases. I did not include RYR2 as a BLG given its large

size and known status as a false-positive significantly mutated gene.169 Given the lack of

support for the remaining genes, I presume that most of these are affected by passenger

somatic or germline mutations. As an example, CCNF was previously reported as

harbouring a somatic mutation hotspot but lacked nonsynonymous SSMs in this BL

cohort.161 While I was unable to identify any somatic mutations at the purported hotspot

position in this cohort, I did find two eBL cases with support for this variant in both the

tumour and normal DNA, strongly suggesting that this mutation is a single nucleotide

polymorphism. I also found that this variant exists in the dbSNP database. Among the

populations in the 1000 Genomes Project, the African population had the highest

[Figure 2.4 panels, with the percentage of cases altered per pathway and per gene, and right-hand barplots of mutation count (axis 0 to 75):
TCF3/ID3 module (altered in 44%): TCF3 6.7%, ID3 40%.
BCR/PI3K signaling (altered in 34%): PTEN 3.7%, MIR17HG 10%, FOXO1 24%.
MYC regulation (altered in 67%): TFAP4 9.6%, SIN3A 16%, MYC 61%.
Apoptosis (altered in 44%): CDKN2A 3.7%, USP7 8.1%, TP53 35%.
SWI/SNF complex (altered in 59%): SMARCA4 19%, ARID1A 40%.
Epigenetic regulation (altered in 30%): BCL7A 5.9%, CHD8 7.4%, HIST1H1E 8.1%, KMT2D 12%.
GPCR signaling (altered in 36%): P2RY8 8.1%, RHOA 12%, GNA13 19%.
Other (altered in 78%): RFX7 5.9%, ETS1 9.6%, PCBP1 14%, GNAI2 14%, CCND3 19%, FBXO11 26%, DDX3X 56%.
Mutation types in the legend: Missense, Truncating/splicing, Gain (focal), Gain (large), Deletion (focal), Deletion (large), Multiple hits.]

Figure 2.4: Landscape of nonsynonymous mutations in BLGs for the discovery and validation cohorts (N = 135). Cases are reordered for each pathway to highlight any mutual exclusivity. Mutations are colored according to their predicted consequence on the protein (i.e. mutation type) and are tabulated in the right-hand barplots. Focal gains and deletions were defined as those smaller than 1 Mbp.

alternate allele frequency, consistent with my observation that this germline variant has only

been seen in eBL cases.

2.2.5 Challenges with genetic comparison between BL and DLBCL

Given the relationship between BL and DLBCL, it would be interesting to perform a

genetic comparison of somatic mutations. In a recent publication, I contributed to the

assembly and analysis of WGS data from 153 DLBCL cases.170 These two large BL and

DLBCL WGS datasets present a unique opportunity to compare the genetic features of

both diseases. However, important differences between both datasets limit the

interpretability of any findings. First, mutations were identified differently for the DLBCL

genomes compared to those detected in the BL genomes. While the methodology could

be harmonized, this represents a nontrivial task because filters to remove mutation

artifacts in BL can rely on the tumours’ relative purity and clonality. The same filters would

most likely be too aggressive for filtering mutations in DLBCL tumours, which tend to be

less pure and harbour subclonal heterogeneity. Second, the sequencing coverage is not

consistent across the DLBCL dataset, which introduces the same caveat as the ICGC BL

dataset. Namely, variable coverage is associated with varying degrees of sensitivity for

mutation detection, which limits any attempt at comparing the incidence of mutations. For

these reasons, I do not present a comparison of somatic nonsynonymous mutations

between BL and DLBCL.

2.2.6 Novel mutation patterns in BL-associated genes

By considering other mutation types more readily detected using WGS, I observed novel

mutation patterns in some BLGs and, consequently, a higher incidence of mutations beyond

what has been reported previously. For example, I found focal deletions or inversions

affecting DDX3X in six (5.7%) cases, all of which are predicted to disrupt the open

reading frame by affecting one or more exons (Figure 2.5). Two additional cases (2.8%)

had mutations affecting the splicing branch point of intron 6 (Ensembl transcript

ENST00000399959; Figure 2.6). Both tumours showed aberrant transcript splicing in the

RNA-seq data. Considering these novel mutation types, with the exception of MYC,

DDX3X was the most commonly mutated gene, with a total of 75 (56%) affected cases in

the discovery and validation cohorts.

Figure 2.5: Focal structural variations affecting DDX3X visualized in the Integrative Genomics Viewer (IGV). The left panel shows the deletion of an exon; the middle panel shows the deletion of the entire gene; and the right panel shows the inversion of some exons.

Figure 2.6: Somatic mutations altering a splicing branch point in DDX3X. The top panel shows the intron-exon boundary of intron 6 and somatic mutations detected in the discovery cohort; the middle panel shows the sequence context where recurrent non-coding mutations occur; and the bottom panel shows the sequence motif for splicing branch point for reference.

The mutation pattern in GNAI2 was also clarified by my analysis. This gene is affected by

nonsilent mutations in 19 (14%) cases at one of three hotspots: G45, R179, and

K271/K272 (see Appendix B for mutation/lollipop plots). While mutations at some of these loci

have been described before, the recurrent in-frame deletions of K272 have not been

reported.171 Analogous mutations in GNAS are known to be activating in other

cancers.172,173 Considering that the hotspot mutations in GNAI2 affect residues in

proximity of the protein GDP binding site, it is possible that they share a common function

in activating the encoded protein.

A previous report found that ID3 was enriched in mutations that overlapped AICDA

recognition sites (RGYW), which are presumed to be introduced by aSHM.97 Within the

gene body of every BLG, I compared the observed mutation rate of nucleotides forming

AICDA recognition sites with the expected rate (Q-values < 0.1, binomial exact test). In

addition to ID3, I found a similar enrichment of mutations affecting AICDA recognition

sites in HIST1H1E, MYC, BCL7A, and ETS1, whereas the opposite trend was seen in

GNAI2 and RHOA (Figure 2.7). The observed constraints on which codons are mutated in

GNAI2 and RHOA can explain the lack of mutations in AICDA recognition sites. In

other words, there appears to be a selection against variants being introduced elsewhere

in the genes.
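
The enrichment test described above can be illustrated with a short sketch. It rests on several assumptions that the text does not confirm: the motif is scanned on both strands (RGYW and its reverse complement WRCY), the expected rate is taken to be the share of gene-body bases covered by a motif, and only the upper tail (enrichment) is computed. All function names are hypothetical.

```python
import re
from math import comb
from typing import Sequence, Set

# RGYW on the forward strand and its reverse complement WRCY (assumed).
# Lookaheads allow overlapping motif matches to be found.
MOTIFS = (re.compile(r"(?=[AG]G[CT][AT])"), re.compile(r"(?=[AT][AG]C[CT])"))


def motif_positions(seq: str) -> Set[int]:
    """0-based positions of all bases covered by an RGYW/WRCY motif."""
    covered: Set[int] = set()
    for pattern in MOTIFS:
        for match in pattern.finditer(seq.upper()):
            covered.update(range(match.start(), match.start() + 4))
    return covered


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def motif_enrichment_p(seq: str, mutated_positions: Sequence[int]) -> float:
    """One-sided exact P-value for an excess of mutations inside motif bases."""
    covered = motif_positions(seq)
    expected_share = len(covered) / len(seq)  # null: mutations land uniformly
    hits = sum(1 for pos in mutated_positions if pos in covered)
    return binomial_upper_tail(hits, len(mutated_positions), expected_share)


# Toy sequence with a single AGCT motif; three mutations, all inside it.
print(motif_enrichment_p("CCCCAGCTCCCC", [4, 5, 6]))
```

A depletion test, as reported for GNAI2 and RHOA, would use the lower tail instead, and P-values across genes would then be corrected for multiple testing to obtain Q-values.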

I also investigated the relationship of mutations to one another. Specifically, mutual

exclusivity can shed light on mutations that are functionally redundant or whose

co-occurrence may be lethal to the cell. I quantified mutual exclusivity using the previously

established groups of related genes (Figure 2.8). The only genes whose mutations were

mutually exclusive were the components of the SWI/SNF pathway, namely ARID1A and

SMARCA4 (Q-value = 0.000023; CoMEt exact test).174,175
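
To make the idea of an exact mutual exclusivity test concrete, the sketch below uses a simpler stand-in for CoMEt: a pairwise, one-sided hypergeometric (Fisher-style) test that asks how likely it is to see so few co-mutated cases if the two genes were mutated independently. It is not the CoMEt algorithm, and the counts in the example are hypothetical.

```python
from math import comb


def exclusivity_p_value(n_total: int, n_a: int, n_b: int, n_both: int) -> float:
    """P(co-occurrence <= n_both) under independence (hypergeometric lower tail).

    n_total: cases in the cohort; n_a, n_b: cases mutated in gene A and gene B;
    n_both: cases mutated in both. Small values suggest mutual exclusivity.
    """
    lowest = max(0, n_a + n_b - n_total)  # smallest possible overlap
    favourable = sum(
        comb(n_a, k) * comb(n_total - n_a, n_b - k)
        for k in range(lowest, n_both + 1)
    )
    return favourable / comb(n_total, n_b)


# Hypothetical counts for illustration only; not taken from the thesis.
print(exclusivity_p_value(n_total=20, n_a=10, n_b=10, n_both=0))
```

CoMEt generalizes this idea to sets of more than two genes, which is what the pathway-level comparison in Figure 2.8 requires.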

2.2.7 Landscape of non-coding mutations shaped by somatic hypermutation

One key advantage of WGS over exome or RNA sequencing is the ability to

comprehensively determine the landscape of non-coding mutations, especially in intronic

and intergenic regions. Here, I had access to a sufficient number of BL genomes to

characterize the genome-wide landscape of non-coding mutations. I used the Rainstorm

[Figure 2.7 plot: labelled BLGs are ID3, ETS1, BCL7A, RHOA, GNAI2, HIST1H1E and MYC. X-axis: odds ratio (mutations at any base in AICDA motif), 0.0 to 2.0. Y-axis: odds ratio (mutations at G/C in AICDA motif), 0 to 4. Legend (enrichment/depletion of AICDA mutations): Depleted, Enriched, Neutral.]

Figure 2.7: Enrichment or depletion of mutations affecting AICDA recognition sites (RGYW) in BLGs. The X-axis displays the odds ratio between the observed and expected mutation rates of all bases in AICDA recognition sites. The Y-axis shows the odds ratio between the observed and expected mutation rates of guanine-cytosine pairs in AICDA recognition sites. BLGs with a significant enrichment or depletion according to either metric are displayed in red and blue, respectively (Q-values < 0.1, binomial exact test).

[Figure 2.8 plot: −log10(Q-value), axis 0 to 4, for each pathway: SWI/SNF complex, Apoptosis, GPCR signaling, TCF3/ID3 module, Epigenetic regulation, BCR/PI3K signaling, MYC regulation.]

Figure 2.8: Mutual exclusivity of mutations affecting BLGs associated with each pathway (CoMEt exact test). The dashed line represents the minimum Q-value threshold of 0.1.

and Doppler algorithms for genome-wide inference of discrete genomic regions enriched

for non-coding mutations in the cohort.170 These regions are referred to here as

“non-coding mutation peaks” (“peaks”, for brevity). They are listed in Supplemental Table

8 of Appendix A. I identified 70 peaks with a median size of 1,539 bp (range 20–10,652;

Figure 2.9A). Out of the 38 peaks mutated in 15 or more patients, 17 overlapped one of

the three IG loci and were separately considered as three respective groups. Of the

remaining commonly mutated peaks, there was a clear bimodal distribution in the

distance from the nearest TSS. Specifically, 17 were within 3 kbp of a TSS and were thus

categorized as TSS-proximal, while the other three were considered TSS-distal (Figure

2.9B). Additionally, most TSS-proximal peaks were associated with genes or regions

known to be affected by aSHM in other lymphomas including DLBCL (Figure 2.9C).176
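
The grouping of commonly mutated peaks described above can be written as a small rule. A minimal sketch, assuming each peak comes with its distance to the nearest TSS in base pairs and an optional IG locus label (both input fields and the function name are hypothetical):

```python
from typing import Optional

TSS_PROXIMAL_MAX_BP = 3_000  # "within 3 kbp of a TSS", as stated in the text


def classify_peak(distance_to_tss_bp: int, ig_locus: Optional[str] = None) -> str:
    """Assign a peak to an IG locus group, 'TSS-proximal', or 'TSS-distal'."""
    if ig_locus is not None:
        return f"{ig_locus} locus"  # IG peaks are grouped per locus first
    if abs(distance_to_tss_bp) <= TSS_PROXIMAL_MAX_BP:
        return "TSS-proximal"
    return "TSS-distal"


print(classify_peak(755), classify_peak(250_000), classify_peak(100, "IGH"))
```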

Given that most peaks were TSS-proximal and associated with genes targeted by aSHM,

I hypothesized that these regions are mutated by AICDA in a subset of BL tumours.

Consistent with AICDA activity, I found an enrichment of mutations affecting AICDA

recognition sites (RGYW) in 61% of peaks (Q-values < 0.1, binomial exact test; Figure

2.10). Given that active transcription is known to facilitate AICDA-mediated mutation, I

explored the expression of genes associated with TSS-proximal peaks (i.e. “peak target

genes”).63,64,177 Peak target genes were among the most highly expressed genes in all

tumours, including those cases lacking mutations in these regions (median

transcripts-per-million expression percentile = 98.3). I also did not find a strong correlation

between the presence of mutations in a peak and higher target gene expression (Figure

2.11). Overall, AICDA expression correlated with the number of mutated peaks (Figure

2.9C) and the number of mutations within peaks (P-value = 2.3 × 10⁻⁸, Pearson

correlation test; Figure 2.12). Altogether, these findings demonstrate that discrete

genomic regions in BL accumulate non-coding mutations, and most appear to be the

consequence of AICDA-mediated aSHM.
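
For reference, the Pearson correlation underlying the test reported above can be computed as follows. The sketch returns only the coefficient r; obtaining a P-value would additionally require the t-distribution (typically via a statistics library). The per-tumour values are hypothetical and are not thesis data.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical per-tumour values for illustration only.
aicda_expression = [8.1, 9.4, 10.2, 11.0, 12.3]
mutations_in_peaks = [3, 10, 14, 22, 35]
print(round(pearson_r(aicda_expression, mutations_in_peaks), 3))
```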

Though several mutation peaks identified here overlap known targets of aSHM, many of these regions or genes are not known to be targeted by aSHM in BL. Notably, I found a mutation peak 54 kbp downstream of MYC that overlaps the promoter and first intron of PVT1, a locus that produces a long non-coding RNA (lncRNA) and a known target of MYC.178 PVT1 promoter mutations occurred in 17% of 106 BL cases compared to only

[Figure 2.9 image omitted. Panel A: frequency versus non-coding mutation peak size (kb). Panel B: frequency versus distance from nearest TSS (kb), shown separately for TSS-proximal and TSS-distal peaks. Panel C: mutation density (mut./kbp) per tumour for the IG loci (IGK, IGL, IGH), the TSS-proximal peaks (PVT1, BIRC3, RHOH, ST6GAL1, MIR142, ZFP36L1, DTX1, CXCR4, BTG2, BCL7A, TCL1A, BCL6, BACH2, MYC) and the TSS-distal peaks (ST6GAL1, BCL6 and PAX5 enhancers), with AICDA expression in the top panel.]

Figure 2.9: Non-coding mutation peaks. (A) Size distribution of non-coding mutation peaks (or simply, “peaks”). (B) Distance between peaks and the respective nearest TSS. Peaks overlapping immunoglobulin loci are omitted. (C) Density of non-coding mutations as mutations per kilobase (mut./kbp) in peaks annotated with the nearest transcription start site (relative position in parentheses) or regulatory element. Peaks overlapping IG loci are shown separately. Tumours correspond to columns and are ordered based on AICDA expression, as shown in the top panel.

[Figure 2.10 image omitted. Two volcano-style panels plotting −log10(Q-value) against log10(odds ratio): left, mutations in any base of the AICDA motif; right, mutations in G/C of the AICDA motif.]

Figure 2.10: Enrichment or depletion of mutations affecting AICDA recognition sites (RGYW) in peaks altered in at least 15 cases. The left panel displays the tests considering the mutation rate of all bases in AICDA recognition sites. The right panel shows the tests considering the mutation rate of guanine-cytosine pairs in AICDA recognition sites. The vertical dashed line indicates a neutral log odds ratio, and the horizontal dashed line indicates the minimum Q-value threshold of 0.1 (binomial exact tests). Peaks with a significant enrichment are displayed in red.

[Figure 2.11 image omitted. Relative gene expression (y-axis) by non-coding mutation peak (x-axis), split by mutation status (unmutated, mutated), for LTB, SERPINA9, HIST1H4J, RCC1, HIST1H2BK, ETS1, ST6GAL1, RHOH, ZFP36L1, BTG2, FOXO1, DTX1, POU2AF1, CXCR4, BCL7A, TCL1A, BIRC3, RFTN1, BCL6, BACH2, BMP7 and RNF144B.]

Figure 2.11: Variance-stabilized expression values of genes associated with TSS-proximal peaks according to the mutation status of each peak. Only protein-coding genes are displayed. For each gene, expression values were normalized by the median expression in unmutated tumours. Significance brackets: *, Q-value < 0.1; **, Q-value < 0.001 (Mann–Whitney U test).

[Figure 2.12 image omitted. Scatter plot of the number of mutations in peaks (y-axis) against AICDA expression (x-axis), coloured by EBV status (EBV-positive, EBV-negative).]

Figure 2.12: Correlation between variance-stabilized AICDA expression and the number of mutations in non-coding mutation peaks.

4.6% in a cohort of 153 DLBCL cases.170 Another non-coding mutation peak affected a distal enhancer for PAX5, a transcription factor with an important role in B-cell differentiation. Mutations in this enhancer were found in 11% of 150 chronic lymphocytic leukemia cases, whereas I observed a higher mutation incidence (20%) in 106 BL genomes, which is comparable to that observed in 153 DLBCL genomes (23%).170,179 Guanine-cytosine pairs in AICDA recognition sites (RGYW) were mutated at a higher than expected rate in the PAX5 enhancer and PVT1 promoter mutation peaks, reminiscent of the cytosine deamination seen during aSHM (Q-values = 0.0045 and 0.056, respectively; binomial exact test). These variants raise the possibility that AICDA is contributing to BL by introducing non-coding mutations in regulatory regions.
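To make the notion of guanine-cytosine pairs in AICDA recognition sites concrete, the following sketch locates the G of each RGYW occurrence and the C of each reverse-complement WRCY occurrence on one strand of a sequence, then reports the share of mutated positions that hit those bases. The function names and the toy sequence are hypothetical; the thesis does not describe its own implementation at this level of detail.

```python
import re

# IUPAC codes: R = A/G, Y = C/T, W = A/T. WRCY is the reverse complement of RGYW.
# Lookaheads allow overlapping motif occurrences to be found.
RGYW = re.compile(r"(?=([AG]G[CT][AT]))")
WRCY = re.compile(r"(?=([AT][AG]C[CT]))")

def motif_gc_positions(seq):
    """Sorted 0-based positions of the G (in RGYW) or C (in WRCY) target base."""
    seq = seq.upper()
    positions = set()
    for match in RGYW.finditer(seq):
        positions.add(match.start() + 1)  # G is the second base of RGYW
    for match in WRCY.finditer(seq):
        positions.add(match.start() + 2)  # C is the third base of WRCY
    return sorted(positions)

def fraction_in_motif(seq, mutated_positions):
    """Fraction of mutated positions that fall on a motif G/C base."""
    if not mutated_positions:
        return 0.0
    targets = set(motif_gc_positions(seq))
    hits = sum(pos in targets for pos in mutated_positions)
    return hits / len(mutated_positions)
```

For the toy sequence `AGCTAAGCT`, the palindromic `AGCT` matches both patterns, so each occurrence contributes both its G and its C.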

2.2.8 Robust identification of mutational signatures in BL genomes

Several mutational processes shape the landscape of somatic variants in tumour

genomes, each resulting in a distinct mutational signature.180 Here, a mutational

signature is defined by a pattern of mutations based on base change and trinucleotide

context. At the time of this work, there were 30 robust reference signatures in the

Catalogue of Somatic Mutations in Cancer (COSMIC) database, some having been

[Figure 2.13 image omitted. Panels A, B and C.]

Figure 2.13: Known and novel targets of aberrant somatic hypermutation. Non-coding mutation peaks overlapping (A) BACH2, (B) PVT1 promoter region, and (C) distal PAX5 enhancer. Mutations from the BL discovery cohort (N = 106) and a DLBCL cohort (N = 153) are shown separately.

attributed to known or suspected mutational processes.180,181 To investigate the mutational processes active in BL cells, I inferred mutational signatures de novo using standard methodology.180 Similar to unsupervised clustering, a range of signature counts is tested, and the optimal number is chosen by maximizing stability while minimizing reconstruction error (Figure 2.14A). In this cohort of 106 genomes, the optimal number of signatures was four (Figure 2.14B). Each of these “BL signatures” (designated BL signatures A through D) was paired with a COSMIC reference signature (version 2) based on cosine similarity to infer putative aetiologies (Figure 2.14C).
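The pairing step can be sketched as follows: each de novo signature is compared with every reference signature and assigned to the one with the highest cosine similarity. The four-channel profiles and all names below are toy assumptions chosen for brevity; real signature profiles span 96 trinucleotide-context channels.

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length profiles."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def pair_signatures(de_novo, reference):
    """Map each de novo signature to (best reference name, cosine similarity)."""
    pairs = {}
    for name, profile in de_novo.items():
        best = max(reference, key=lambda ref: cosine_similarity(profile, reference[ref]))
        pairs[name] = (best, cosine_similarity(profile, reference[best]))
    return pairs

# Toy 4-channel profiles (hypothetical values).
de_novo = {"sig_A": [0.4, 0.3, 0.2, 0.1], "sig_B": [0.05, 0.05, 0.1, 0.8]}
reference = {"ref_1": [0.1, 0.1, 0.1, 0.7], "ref_2": [0.35, 0.35, 0.2, 0.1]}
pairs = pair_signatures(de_novo, reference)
```

Because cosine similarity ignores vector length, profiles expressed as counts or as proportions give the same pairing.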

The pattern for BL signature A displayed a relatively uniform distribution of mutation types with a slight bias towards C>T substitutions. This mutation composition was most similar to COSMIC signature 5, which is found in all cancer types and most tumours. Its ubiquity is due to the fact that it is one of two signatures that result from clock-like processes, the other being COSMIC signature 1.181 In B-cell lymphomas, signature 5 was more common than signature 1 and presented a stronger correlation with age at diagnosis.181 In BL, this clock-like process is the most common source of mutations, accounting for 39% (range 1.1–80%) of SSMs on average (Figure 2.15). BL signature B was defined by a preponderance of T>G (and, to a lesser extent, T>C) mutations in the NpTpT context. This pattern shared the highest similarity with COSMIC signature 17, which has no known aetiology. This signature has previously been found in several cancer types including B-cell lymphomas, and it is associated with 17% (range 3.1–63%) of mutations in these cases (Figure 2.15). The limited understanding of this signature restricts my capacity to infer its relevance to BL.

Whereas BL signatures A and B are either expected or unaccounted for, the remaining two signatures reveal potentially tumour-specific mutational mechanisms. BL signature C is composed of mutations altering T or C (i.e. Y) in the GpYpN or TpYpN contexts. While the proportions of different types of mutations differ slightly, this signature is most similar to COSMIC signature 15, which is not typically represented in B-cell lymphomas. Defective DNA mismatch repair (MMR) has been proposed as the mechanism responsible for signature 15. This finding suggests that MMR may be disrupted in a subset of BL tumours, although the mechanism is unclear. That being said, compared to the other signatures, it is the least common in BL genomes, accounting for 10% (range 1.3–40%) of

[Figure 2.14 image omitted. Panel A: stability versus reconstruction error for solutions with 2 to 7 signatures. Panel B: proportion (%) of each mutation type (C>A, C>G, C>T, T>A, T>C, T>G) in each trinucleotide context for BL signatures A to D. Panel C: cosine similarity between BL signatures A to D and the 30 COSMIC signatures.]

Figure 2.14: Characteristics of de novo mutational signatures. (A) Selecting the optimal number of de novo mutational signatures (shown in red) by minimizing reconstruction error and maximizing stability. (B) Composition of each BL signature per base change and trinucleotide context. (C) Cosine similarity between the optimal set of BL signatures and all COSMIC reference signatures. Pairs made based on the highest cosine similarity are outlined in red.

variants (Figure 2.15). Lastly, BL signature D exhibited a pattern characterized by an increased occurrence of substitutions affecting T, especially in the TpTpT context. Based on cosine similarity, this BL signature was paired with COSMIC signature 9, which is common in cancers derived from mature B cells. This pattern of mutations has been attributed to polymerase η activity, which is associated with AICDA-mediated mutagenesis during both physiologic and aberrant SHM. Notably, SHM seems responsible for nearly as many mutations as BL signature A, namely 34% (range 8.6–64%), highlighting the importance of AICDA in shaping BL genomes (Figure 2.15).

[Figure 2.15 image omitted. Histograms of frequency (y-axis) against percent prevalence (x-axis) for BL signatures A to D.]

Figure 2.15: Percent prevalence of de novo mutational signatures.

In order to validate the signatures that were identified, I sought to confirm their relationship with the proposed aetiologies wherever I had relevant data. BL signature B has no known aetiology, making it impossible to verify, and I had no metric to quantify the degree of DNA MMR to correlate with BL signature C. On the other hand, BL signatures A and D were each the only signature to strongly correlate with age at diagnosis and AICDA expression, respectively (Q-value = 5.5 × 10⁻⁹ and Q-value = 1.7 × 10⁻¹³, respectively; Pearson correlation test; Figure 2.16A). Additionally, I performed this calculation for all


possible solutions for each of the signatures paired with COSMIC signatures 5 and 9 (Figure 2.16B). Despite having been selected using an independent set of criteria, the four-signature solution showed the strongest correlations in this analysis. This result lends further credence to the robustness of my inferred signatures.

[Figure 2.16 image omitted. Panel A: −log10(Q-value) for BL signatures A to D against age at diagnosis and AICDA expression. Panel B: Pearson correlation with AICDA expression (y-axis) against Pearson correlation with age (x-axis) for solutions with 2 to 7 signatures.]

Figure 2.16: Correlation between de novo mutational signatures and biological features of BL genomes. (A) Correlation between signatures from the optimal solution and age at diagnosis and AICDA expression (Pearson’s product-moment correlation test). (B) After generating solutions ranging from 2 to 7 signatures, for each solution, signatures were paired with COSMIC reference signatures based on cosine similarity. Solutions with signatures paired with both COSMIC signatures 1/5 (age-related) and 9 (AICDA-related) were tested for correlation with age at diagnosis and AICDA expression, respectively (Pearson’s product-moment correlation).

2.2.9 Non-uniform V gene segment usage in immunoglobulin repertoire

Given the importance of the BCR in BL, I sought to delineate the repertoire of V(D)J gene segments used to encode the IG component of the BCR.94 Rearrangement of these segments helps produce the highly variable complementarity-determining region 3 (CDR3) sequence, which in turn determines antigen specificity and affinity.182 An IG nucleotide CDR3 sequence is known as a clonotype, and clonotyping is the process of identifying these sequences.183 The IG clonotype of the ancestral malignant B cell that formed the BL tumour is expected to be present in virtually every tumour cell and thus be clonal; such a clonotype is also referred to as the dominant clonotype. Each antibody-producing cell contains a distinct clonotype for the heavy and light chains. I utilized tumour RNA-seq data to perform clonotyping using MiXCR.184,185 By virtue of its reliance on RNA-seq data, this


analysis is restricted to IG alleles that are expressed. Dominant clonotypes were defined as those with a clonal fraction of at least 30% (Figure 2.17A). To eliminate spurious clonotypes, I ignored any clonotypes with fewer than 30 supporting reads. The lack of similar RNA-seq data from environment-matched controls precludes comparison with healthy repertoires. Here, I focused on the V gene segments of dominant clonotypes from both the heavy and light chains because of their increased diversity.
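The two thresholds above translate into a simple filter. In this sketch, the clonal fraction is computed per IG chain from read counts, and a clonotype is called dominant only if it passes both cut-offs. The `Clonotype` record and the choice to compute fractions before discarding low-count clonotypes are assumptions for illustration; MiXCR reports its own clonal fractions, which are not reproduced here.

```python
from dataclasses import dataclass

MIN_FRACTION = 0.30  # minimum clonal fraction, as stated in the text
MIN_READS = 30       # minimum number of supporting reads, as stated in the text

@dataclass
class Clonotype:
    chain: str   # "IGH", "IGK" or "IGL"
    v_gene: str
    reads: int

def dominant_clonotypes(clonotypes):
    """Return clonotypes meeting both thresholds; fractions are per chain."""
    totals = {}
    for c in clonotypes:
        totals[c.chain] = totals.get(c.chain, 0) + c.reads
    return [
        c for c in clonotypes
        if c.reads >= MIN_READS and c.reads / totals[c.chain] >= MIN_FRACTION
    ]

# Hypothetical tumour: one clear heavy chain clone, no reliable light chain clone.
example = [
    Clonotype("IGH", "IGHV4-34", 900),
    Clonotype("IGH", "IGHV3-7", 100),
    Clonotype("IGK", "IGKV3-20", 20),
    Clonotype("IGK", "IGKV4-1", 5),
]
dominant = dominant_clonotypes(example)
```

In the example, the IGKV3-20 clonotype has a high fraction (80%) but too few reads, which is the situation the read-count filter is meant to catch.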

I identified dominant clonotypes for the heavy and light chains in 96 (82%) and 104 (89%) cases (N = 117), respectively. In order to account for tumours in which clonal rearrangements were undetectable, I considered the number of reads attributable to IG genes. As expected, the limited ability to detect rearrangements in these tumours can be explained by their reduced heavy and light chain expression (P-values = 1.2 × 10⁻⁷ and 5.7 × 10⁻⁴, respectively, Mann–Whitney U test; Figure 2.17B). Among the dominant clonotypes that were detected, V segment usage in BL appeared non-random, with a small subset of V segments accounting for a disproportionate share of the clonotypes. Specifically, the five most commonly used heavy and light chain V segments accounted for 44% and 41% of dominant clonotypes, respectively. The pattern in BL (N = 117 cases) is similar to what is seen in DLBCL (N = 323 cases; Figure 2.18A).170 While some V genes appear differentially utilized between BL and DLBCL (e.g. IGHV3-20 and IGKV4-1), none of these differences are significant (Q-values > 0.1, Fisher’s exact test). In BL, the most recurrently used heavy chain V segments were IGHV4-34 (16%), IGHV3-30 (10%), and IGHV3-7 (7.3%). The most frequently used light chain V segment was IGKV3-20 (20%). I was able to recapitulate these findings using the WGS data; however, less stringent criteria were required owing to the lower coverage (Figure 2.18B). These results are consistent with the established notion that BL relies on BCR activity for promoting PI3K signaling and raise the possibility of positive selection of potentially autoreactive or antigen-driven IG clonotypes.94

[Figure 2.17 image omitted. Panel A: clonal fraction (normalized per IG chain) against clonal count for IGH, IGK and IGL in tumour and normal samples, coloured by clonality (dominant; read count < 30; read fraction < 30%; read fraction < 30% and read count < 30). Panel B: read count for heavy and light chains according to whether a clonal BCR was detected or undetected.]

Figure 2.17: Dominant immunoglobulin rearrangements. (A) Clonal fraction estimates and counts for immunoglobulin heavy and light chain clones. Clonal (or “dominant”) rearrangements (shown in red) must have a minimum clonal fraction of 30% (indicated by horizontal dashed line) and at least 30 supporting reads (indicated by vertical dashed line). (B) Total read count per sample supporting heavy and light IG chain clones according to whether a dominant clone was detected. Significance brackets: **, P-value < 0.001; ***, P-value < 0.00001 (Mann–Whitney U test).

[Figure 2.18 image omitted. V gene usage (%) in BL and DLBCL from RNA-seq data (panel A) and in BL from WGS data (panel B). IGH: IGHV4-34, IGHV3-23, IGHV3-30, IGHV3-7, IGHV4-39, IGHV4-59, IGHV3-48, IGHV3-21, IGHV3-15. IGK: IGKV3-20, IGKV4-1, IGKV1-39, IGKV3-15, IGKV1-5, IGKV1-33, IGKV3-11. IGL: IGLV1-40, IGLV2-14, IGLV1-51, IGLV3-19, IGLV1-44, IGLV3-25.]

Figure 2.18: Immunoglobulin V gene usage. (A) Percent prevalence of immunoglobulin V genes among dominant IG rearrangements in BL (N = 106) and DLBCL (N = 256) tumours with RNA-seq data. (B) Percent prevalence of immunoglobulin V genes among dominant IG rearrangements in BL (N = 91) tumours with WGS data, shown in the same order as panel A. V genes that are dominant in fewer than 10 BL tumours are not displayed.


2.3 Materials and methods

2.3.1 Case accrual

Additional details relating to case accrual can be found online in the standard operating

procedures (SOPs).1

Cohort

The cases were accrued at the following tissue source sites: Uganda Cancer Institute (UCI, Uganda), Epidemiology of Burkitt’s Lymphoma in East-African Children and Minors (EMBLEM, Uganda), Children’s Oncology Group (COG, USA), from patients who participated in the clinical trial AALL1131, and St. Jude Children’s Research Hospital (USA). Contributing tissue source sites provided documentation of Institutional Review Board approval for the use of tissues submitted for molecular characterization. Clinical data were collected for each case, including initial enrollment data and one-year and two-year outcome data (details below). The discovery cohort consisted of 91 paediatric BL cases originating from patients aged between two and 20 years. BL subtypes within this cohort included 74 endemic and 17 paediatric sporadic cases (see Table 1 for details). Each BL case had both tumour and matched normal tissue (blood, peripheral blood mononuclear cells, lymph nodes, etc.), and the tumour was collected prior to any treatment. All cases underwent a standardized central pathology review by three BL pathologists and were confirmed as BL (details below). Once the diagnosis was confirmed, the tumour tissue used for molecular characterization was evaluated for tumour nuclei and necrosis (details below). Cases that did not meet the discovery criteria, lacked matched normal tissue or normal DNA, had degraded RNA, or were missing essential clinical data were considered for validation. Validation cases with tumour and normal DNA were ultimately selected for targeted sequencing, and validation tumours with sufficient RNA also underwent RNA sequencing (details below).

1 https://ocg.cancer.gov/sites/default/files/BLGSP_SOP_manual.pdf

Clinical data

The clinical data were collected by Nationwide Children’s Hospital (Columbus, OH) from contributing sites after cases were accepted into the discovery or validation cohorts. Follow-up data were then collected for two subsequent years. The clinical report form, follow-up form, and treatment form can be found within the project standard operating procedures (SOP #303). The following types of clinical information were collected: demographic data (date of birth, sex, race, ethnicity, height, weight, vital status), tumour information [date of diagnosis, tumour anatomic location, tumour status (tumour free/with tumour), stage, lymph node status, history of prior cancers, synchronous cancers and subsequent cancers], HIV status [HIV antibody status, date of diagnosis, CD4 counts, HIV RNA load, Centers for Disease Control and Prevention (CDC) HIV risk group, co-infections, prior acquired immune deficiency syndrome (AIDS)-defining conditions], infectious disease status (hepatitis B virus, hepatitis C virus, Helicobacter pylori, malaria, EBV), and treatment information [treatment type, tumour response, treatment dates, highly active antiretroviral therapy (HAART) treatment status]. All dates and other personally identifiable information were obfuscated prior to submission to the Office of Cancer Genomics Data Coordinating Center in extensible markup language (XML) and tab-delimited formats.2

Consensus pathology review

Consensus anatomic site classification

Anatomic site classification was performed by consensus review based on data reported for sites of disease involvement. Many of the African cases did not have assessment of bone marrow, cerebrospinal fluid, or total body imaging. Cases were classified into the following categories: (A) Disseminated disease with no bone marrow (BM) and/or central nervous system (CNS) involvement, documented disease involvement; (B) Head-only, disease involvement of jaw with or without adjacent nodal involvement; (C) Intra-abdominal disease, disease confined to abdominal organs with or without abdominal lymph node involvement; (D) Disseminated disease, disease involvement on both sides of diaphragm, but no documented BM or CNS involvement; (E) Unknown, insufficient data to classify anatomic involvement.

2 https://ocg.cancer.gov/programs/cgci/data-matrix

2.3.2 Sample processing and nucleic acid extraction

Frozen specimens were shipped to and from Nationwide Children’s Hospital (Columbus, OH) using a cryoport that maintained an average temperature of less than −180°C (SOP #308). Top and bottom histologic sections were cut from tumour and uninvolved tissue (if it was to be used as a healthy tissue control) for pathologic quality control review. These were stained with either H&E or Wright-Giemsa and imaged at 40X using an Aperio AT Turbo or Aperio AT2 scanner. Images were reviewed by a board-certified pathologist to confirm that the tumour specimen was histologically consistent with BL, and that uninvolved specimens contained no tumour cells. The tumour sections were required to contain a minimum of 50% tumour cell nuclei and less than 50% necrosis for inclusion in the study. Nearly all samples had less than 20% necrosis.

RNA and DNA were extracted from fresh frozen (FF) (SOP #305) and FFPE tumour (SOP #315-316) and normal tissue specimens (mainly blood or granulocytes) using a modification of the DNA/RNA AllPrep kit (Qiagen). Frozen samples were homogenized and applied to a Qiagen DNA column, and FFPE samples were deparaffinized and applied to a Qiagen FFPE DNA column. The flow-through from the Qiagen DNA column was processed using a mirVana miRNA Isolation Kit (Ambion) for FF tissues, and a High Pure miRNA Kit (Roche) for FFPE tissues. This latter step generated RNA preparations that included RNA <200 nt suitable for miRNA analysis. DNA was extracted from blood using the QiaAmp blood midi kit (Qiagen; SOP #307).

DNA was quantified by PicoGreen assay and resolved by 1% agarose gel electrophoresis to confirm high molecular weight fragments. A custom Sequenom single nucleotide polymorphism (SNP) panel or the AmpF/STR Identifiler (Applied Biosystems) was utilized to verify that tumour DNA and germline DNA were derived from the same patient. One hundred nanograms of each tumour and normal DNA were sent in duplicate to Qiagen for REPLI-g whole genome amplification using a 100 µg reaction scale. RNA was quantified by measuring Abs260 with an ultraviolet spectrophotometer, and integrity was measured using the RNA6000 nano assay (Agilent) to determine the RNA Integrity Number for FF samples or DV200 for FFPE samples.

For inclusion in the discovery set, a tumour needed to pass pathology consensus review

(University of Nebraska Medical Center, Omaha, NE) and the specimen pathology quality

control review (Nationwide Children’s Hospital, Columbus, OH). In addition, a primary

tumour and a matched germline (blood, buccal, or uninvolved tissue) sample needed to

pass the following metrics: a minimum of 0.7 µg of DNA from FF or 0.25 µg of DNA from

FFPE, and 3 µg RNA from FF or 1 µg RNA from FFPE. The minimum RNA integrity

metrics were an RNA Integrity Number above 7.0 or DV200 above 30. Cases that did not

meet these metrics were included in the validation set if there was at least 0.7 µg of DNA

from the primary tumour available for DNA sequencing. Tumour RNA sequencing was

also performed for validation cases if there was sufficient RNA material.
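The inclusion rules above amount to a small decision procedure, sketched below. The thresholds come from the text, whereas the `Sample` record, the function names, and the decision to exclude tumours that fail pathology review are assumptions made for illustration rather than a description of the project's actual tooling.

```python
from dataclasses import dataclass
from typing import Optional

# Minimum yields in micrograms by preservation type (values from the text).
MIN_DNA_UG = {"FF": 0.7, "FFPE": 0.25}
MIN_RNA_UG = {"FF": 3.0, "FFPE": 1.0}
MIN_VALIDATION_DNA_UG = 0.7

@dataclass
class Sample:
    preservation: str               # "FF" or "FFPE"
    dna_ug: float
    rna_ug: float
    rin: Optional[float] = None     # RNA Integrity Number (FF samples)
    dv200: Optional[float] = None   # DV200 (FFPE samples)
    passed_pathology: bool = True

def rna_integrity_ok(s):
    """RIN above 7.0 for FF samples, DV200 above 30 for FFPE samples."""
    if s.preservation == "FF":
        return s.rin is not None and s.rin > 7.0
    return s.dv200 is not None and s.dv200 > 30

def assign_cohort(tumour, germline_ok=True):
    """Return 'discovery', 'validation' or 'excluded' for a tumour sample."""
    if not tumour.passed_pathology:
        return "excluded"  # assumption: failed review is not rescued by validation
    meets_discovery = (
        germline_ok
        and tumour.dna_ug >= MIN_DNA_UG[tumour.preservation]
        and tumour.rna_ug >= MIN_RNA_UG[tumour.preservation]
        and rna_integrity_ok(tumour)
    )
    if meets_discovery:
        return "discovery"
    if tumour.dna_ug >= MIN_VALIDATION_DNA_UG:
        return "validation"
    return "excluded"
```

A frozen tumour with ample DNA but degraded RNA would therefore fall into the validation set, matching the fallback rule described in the text.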

2.3.3 Library construction and sequencing

Whole genome sequencing of fresh frozen samples

WGS libraries were constructed from DNA provided by Nationwide Children’s Hospital (Columbus, OH) using a polymerase chain reaction (PCR)-free protocol. To minimize library bias and coverage gaps associated with PCR amplification of high GC- or AT-rich regions, a version of the TruSeq DNA PCR-free kit (E6875-6877B-GSC, New England Biolabs) was implemented, automated on a Microlab NIMBUS liquid handling robot (Hamilton). Briefly, 500 ng of genomic DNA was arrayed in a 96-well microtitre plate and subjected to shearing by sonication (Covaris LE220). Sheared DNA was end-repaired and size selected using paramagnetic PCRClean DX beads (C-1003-450, Aline Biosciences) targeting a 300-400 bp fraction. After 3’ A-tailing, full length TruSeq adapters were ligated. Libraries were purified using paramagnetic (Aline Biosciences) beads. PCR-free genome library concentrations were quantified using a qPCR Library Quantification kit (KAPA, KK4824) prior to sequencing with paired-end 150 base reads on the Illumina HiSeqX platform using V4 chemistry according to manufacturer recommendations.


Whole genome sequencing of formalin-fixed, paraffin-embedded samples

A 96-well library construction protocol was performed using genomic DNA extracted from FFPE tissue and provided by Nationwide Children’s Hospital (Columbus, OH). Since DNA extracted from FFPE tissue is damaged by the fixation process and by prolonged storage in non-ideal conditions, variable DNA quality across the collection is expected, with some highly degraded samples. DNA was normalized to 500 ng in a volume of 62 μL elution buffer (Qiagen) and transferred into a microTUBE plate for shearing on an LE220 (Covaris) acoustic sonicator using the following conditions: Duty Factor, 20%; Peak Incident Power, 450W; Cycles per burst, 200; Duration, 2 x 60 seconds with an intervening spin. The profile of sheared FFPE DNA extracted by the Qiagen AllPrep DNA/RNA FFPE protocol has a dominant DNA peak in the size range between 300 and 400 bp. To improve library quality of FFPE-derived DNA, solid phase reversible immobilization (SPRI) bead-based size selection was performed before library construction to remove smaller DNA fragments from highly degraded FFPE DNAs. If not removed early in the library construction process, these smaller fragments would otherwise dominate the final amplified library. FFPE DNA damage repair, end-repair, and phosphorylation were combined in a single reaction using an enzymatic premix (NEB), then bead purified using a 0.8:1 (bead:sample) ratio to remove small FFPE fragments. Repaired DNA fragments were next A-tailed for ligation to paired-end, partial Illumina sequencing adapters, then purified twice with SPRI beads (1:1 ratio). Full-length adaptered products were achieved by performing 8 cycles of PCR with primers introducing fault-tolerant hexamer “barcodes”, allowing multiplexing of libraries. Indexed PCR products were double purified with a 1:1 bead ratio. Concentration of final libraries was determined using size profiles obtained from a high sensitivity Caliper LabChip GX together with Quant-iT (Invitrogen) quantification.

Strand-specific ribosomal RNA depletion RNA sequencing

RNA-seq libraries were constructed from RNA provided by Nationwide Children’s Hospital (Columbus, OH) using a strand-specific ribosomal depletion protocol. To remove cytoplasmic and mitochondrial ribosomal RNA (rRNA) species from total RNA, the NEBNext rRNA Depletion Kit for Human/Mouse/Rat was used (NEB, E6310X). Enzymatic reactions were set up in a 96-well plate (Thermo Fisher Scientific) on a Microlab NIMBUS liquid handler (Hamilton Robotics, USA). 100 ng of DNase I-treated total RNA in 6 µL was hybridized to rRNA probes in a 7.5 µL reaction. Heat-sealed plates were incubated at 95°C for 2 minutes followed by incremental reduction in temperature by 0.1°C per second to 22°C (730 cycles). The rRNA in DNA hybrids was digested using RNase H in a 10 µL reaction incubated in a thermocycler at 37°C for 30 minutes. To remove excess rRNA probes (DNA) and residual genomic DNA contamination, DNase I was added in a total reaction volume of 25 µL and incubated at 37°C for 30 minutes. RNA was purified using RNA MagClean DX beads (Aline Biosciences, USA) with 15 minutes of binding time, 7 minutes of clearing on a magnet followed by two 70% ethanol washes, 5 minutes to air dry the RNA pellet, and elution in 36 μL DEPC water. The plate containing RNA was stored at −80°C prior to cDNA synthesis.
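The 730 cycles quoted for the hybridization ramp follow directly from the temperature span and the step size, as this small check shows. The helper name is hypothetical, and the assumption of exactly one 0.1°C step per second is taken from the stated ramp rate.

```python
def ramp_steps(start_c, end_c, step_c):
    """Number of fixed-size temperature steps between two set points."""
    return round(abs(start_c - end_c) / step_c)

steps = ramp_steps(95.0, 22.0, 0.1)   # (95 - 22) / 0.1 = 730 steps
ramp_minutes = steps * 1.0 / 60       # one step per second, so roughly 12 minutes
```

The full cool-down therefore takes a little over 12 minutes in addition to the initial 2-minute incubation at 95°C.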

First-strand cDNA was synthesized from the purified RNA (minus rRNA) using the Maxima H Minus First Strand cDNA Synthesis kit (ThermoFisher, USA) and random hexamer primers at a concentration of 8 ng/µL along with a final concentration of 0.4 µg/µL Actinomycin D, followed by PCR Clean DX bead purification on a Microlab NIMBUS robot (Hamilton Robotics, USA). The second strand cDNA was synthesized following the NEBNext Ultra Directional Second Strand cDNA Synthesis protocol (NEB) that incorporates deoxyribose uridine triphosphate (dUTP) in the deoxyribose nucleoside triphosphate (dNTP) mix, allowing the second strand to be digested using USER™ enzyme (NEB) in the post-adapter ligation reaction and thus achieving strand specificity.

cDNA was fragmented by Covaris LE220 sonication for 130 seconds (2 x 65 seconds) at a “Duty cycle” of 30%, 450W Peak Incident Power and 200 Cycles per Burst in a 96-well microTUBE Plate (P/N: 520078) to achieve 200-250 bp average fragment lengths. The paired-end sequencing library was prepared following the BC Cancer Agency Genome Sciences Centre strand-specific, plate-based library construction protocol on a Microlab NIMBUS robot (Hamilton Robotics, USA). Briefly, the sheared cDNA was subject to end-repair and phosphorylation in a single reaction using an enzyme premix (NEB) containing T4 DNA polymerase, Klenow DNA Polymerase and T4 polynucleotide kinase, incubated at 20°C for 30 minutes. Repaired cDNA was purified in 96-well format using PCR Clean DX beads (Aline Biosciences, USA), and 3’ A-tailed (adenylation) using Klenow fragment (3’ to 5’ exo minus) and incubation at 37°C for 30 minutes prior to enzyme heat inactivation. Illumina PE adapters were ligated at 20°C for 15 minutes. The adapter-ligated products were purified using PCR Clean DX beads, then digested with USER™ enzyme (1 U/µL, NEB) at 37°C for 15 minutes followed immediately by 13 cycles of indexed PCR using Phusion DNA Polymerase (Thermo Fisher Scientific Inc. USA) and Illumina’s PE primer set. PCR parameters: 98°C for 1 minute followed by 13 cycles of 98°C 15 seconds, 65°C 30 seconds and 72°C 30 seconds, and then 72°C 5 minutes. The PCR products were purified and size-selected using a 1:1 PCR Clean DX bead ratio (twice), and the eluted DNA quality was assessed with Caliper LabChip GX for DNA samples using the High Sensitivity Assay (PerkinElmer, Inc. USA) and quantified using a Quant-iT dsDNA High Sensitivity Assay Kit on a Qubit fluorometer (Invitrogen) prior to library pooling and size-corrected final molar concentration calculation for Illumina HiSeq2500 sequencing with paired-end 75 base reads.

miRNA sequencing

miRNA sequencing (miRNA-seq) libraries were constructed from 1 µg total RNA provided by Nationwide Children’s Hospital (Columbus, OH) using a plate-based protocol developed at the British Columbia Cancer, Genome Sciences Centre (BCGSC). Negative controls were added at three stages: elution buffer was added to one well when the total RNA was loaded onto the plate, water to another well just before ligating the 3’ adapter, and PCR brew mix to a final well just before PCR amplification. A 3’ adapter was ligated using a truncated T4 RNA ligase 2 (NEB Canada, cat. M0242L) with an incubation at 22°C for 1 hour. This adapter is an adenylated, single-stranded DNA with the sequence 5’ /5rApp/ ATCTCGTATGCCGTCTTCTGCTTGT /3ddC/, which selectively ligates to miRNAs. An RNA 5’ adapter was then ligated, using T4 RNA ligase (Ambion USA, cat. AM2141) and adenosine triphosphate (ATP), and was incubated at 37°C for 1 hour. The sequence of the single strand RNA adapter is 5’ GUUCAGAGUUCUACAGUCCGACGAUCUGGUCAA 3’.

Upon completion of adapter ligation, 1st strand cDNA was synthesized using Superscript II Reverse Transcriptase (Invitrogen, cat. 18064-014) and RT primer (5’ CAAGCAGAAGACGGCATACGAGAT 3’). First-strand cDNA provided the template for the final library PCR, into which index sequences were introduced to enable libraries to be identified from a sequenced pool that contains multiple libraries. Briefly, a PCR brew mix was made with the 3’ PCR primer (5’ CAAGCAGAAGACGGCATACGAGAT 3’), Phusion Hot Start High Fidelity DNA polymerase (NEB Canada, cat. F540L), buffer, dNTPs and dimethyl sulfoxide (DMSO). The mix was distributed evenly into a new 96-well plate. A Microlab NIMBUS robot (Hamilton Robotics, USA) was used to transfer the PCR template (1st strand cDNA) and indexed 5’ PCR primers into the brew mix plate. Each indexed 5’ PCR primer, 5’ AATGATACGGCGACCACCGACAGNNNNNNGTTCAGAGTTCTACAGTCCGA 3’, contains a unique six-nucleotide ‘index’ (shown here as N’s), and was added to each well of the 96-well PCR brew plate. PCR was performed at 98°C for 30 seconds, followed by 15 cycles of 98°C for 15 seconds, 62°C for 30 seconds and 72°C for 15 seconds, and finally a 5 minute incubation at 72°C. Library qualities were assessed across the whole plate using a Caliper LabChipGX DNA chip. PCR products were pooled and size selected to remove larger cDNA fragments and smaller adapter contaminants, using a 96-channel automated size selection robot that was developed at the BCGSC. After size selection, each pool was ethanol precipitated, quality checked using an Agilent Bioanalyzer DNA1000 chip and quantified using a Qubit fluorometer (Invitrogen, cat. Q32854). Each pool was diluted to a target concentration for cluster generation and loaded into a single lane of an Illumina HiSeq2500 flow cell. Clusters were generated, and lanes were sequenced with a 31-nt main read for the insert and a 7-nt read for the index.
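The relationship between the adapters and primers quoted above can be checked with a few lines of Python. This is only an illustrative sketch: the sequences are copied from the protocol text, whereas the helper names (`reverse_complement`, `rna_to_dna`, `primer_anneals_to`) are hypothetical and not part of any BCGSC pipeline.

```python
# Sequences copied from the miRNA-seq protocol text.
ADAPTER_3P = "ATCTCGTATGCCGTCTTCTGCTTGT"              # 3' DNA adapter
RT_PRIMER = "CAAGCAGAAGACGGCATACGAGAT"                # RT primer / 3' PCR primer
RNA_ADAPTER_5P = "GUUCAGAGUUCUACAGUCCGACGAUCUGGUCAA"  # 5' RNA adapter
INDEXED_5P_PRIMER = "AATGATACGGCGACCACCGACAGNNNNNNGTTCAGAGTTCTACAGTCCGA"

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def rna_to_dna(seq):
    """Write an RNA sequence in the DNA alphabet (U becomes T)."""
    return seq.replace("U", "T")


def primer_anneals_to(primer, template):
    """True if the primer is complementary to a stretch of the template."""
    return primer in reverse_complement(template)


# The RT primer is complementary to the ligated 3' adapter.
print(primer_anneals_to(RT_PRIMER, ADAPTER_3P))

# The 3' end of the indexed 5' PCR primer (after the six-nucleotide index)
# has the same sequence as the start of the 5' RNA adapter.
primer_tail = INDEXED_5P_PRIMER.split("NNNNNN")[1]
print(rna_to_dna(RNA_ADAPTER_5P).startswith(primer_tail))
```

Both checks follow from the sequences as printed, which is a quick way to catch transcription errors when such oligonucleotides are copied between documents.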

Targeted sequencing by custom hybridization capture

Targeted sequencing libraries were constructed from DNA provided by Nationwide Children’s Hospital (Columbus, OH) using a custom hybridization capture protocol. 50 ng from each of 20 or 21 whole genome libraries was pooled prior to custom capture using Agilent SureSelect XT Custom probes (4.8 Mbp) targeting 74,809 human and EBV features.3 The features included the following: exons of recurrently mutated genes with the exception of known targets of passenger mutations (e.g. TTN, mucin genes); exons of several known DLBCL genes; exons of previously reported BL genes not found mutated in these data; whole gene bodies for DDX3X (chrX:41332775-41364961, GRCh38) and FBXO11 (chr2:47782639-47907718); whole gene bodies and flanking regions for ID3 (chr1:23557918-23657826) and BCL6 (chr3:187718649-188265924); the recurrently rearranged region surrounding MYC (chr8:127242368-129788153); and noncoding mutation peaks (details below). The pooled libraries were hybridized to the RNA probes at 65°C for 24 hours. Following hybridization, streptavidin-coated magnetic beads (Dynal, MyOne) were used for custom capture. Post-capture material was purified on MinElute columns (Qiagen) followed by post-capture enrichment with 10 cycles of PCR using primers that maintain the library-specific indices. Pooled libraries were sequenced on an Illumina HiSeq 2500 instrument with v4 chemistry generating 125 base paired-end reads.

3 https://cgci-data.nci.nih.gov/PreRelease/BLGSP/targeted_capture_sequencing/DESIGN/

2.3.4 Data analysis

Sequencing read alignment

WGS and targeted sequencing reads were aligned to the human reference genome (GRCh38) with BWA-MEM (version 0.7.6a; parameters: -M).186,187 The human reference genome that was used is a version of GRCh38 without alternate contigs that includes the Epstein–Barr viral genome (GenBank accession AJ507799.2), which can be downloaded.4 Read duplicate marking was done using sambamba (version 0.5.5).188 RNA-seq reads were pseudo-aligned using Salmon (version 0.8.2; details below).189 The RNA-seq reads were also aligned to the reference genome indicated above using the JAGuaR pipeline.190 Tumour and matched normal WGS data for 15 cases from the ICGC were obtained through a Data Access Compliance Office (DACO)-approved project using a virtual instance on the Cancer Genome Collaboratory.97,191 The ICGC WGS reads were realigned using the above parameters.

4 http://www.bcgsc.ca/downloads/genomes/9606/hg38_no_alt/bwa_0.7.6a_ind/genome/

Tumour EBV status and genome type

Owing to missing data from most cases, I devised a computational approach to directly infer tumour EBV status and genome type from tumour WGS and RNA-seq data. To determine tumour EBV status, the fraction of reads aligning to the EBV genome was calculated using Samtools (version 1.6).186 Tumours were considered to be EBV-positive when the EBV fraction of WGS reads was greater than 0.00006 (calculated from the fraction represented by the EBV genome in the reference genome) and the number of RNA-seq reads mapped to the EBER1 (chrEBV:6629-6795) and EBER2 (chrEBV:6956-7128) loci in the JAGuaR-based alignments was greater than 250. There were no cases with discordant EBV statuses inferred from the WGS and RNA-seq data. Although EBER expression was not quantified for the ICGC tumours because their RNA-seq data were not used in this project, they were all classified as EBV-negative according to their WGS data, which is consistent with the EBV status reported by the MMML-seq project. The minimum fraction of EBV reads was 0.01 for samples that underwent targeted sequencing to account for the different ratio of human and EBV genomic regions due to hybridization capture. EBV genome type was inferred for EBV-positive tumours by comparing the counts for 21-mers that are unique to either EBV type 1 (GenBank accession NC_007605.1) or type 2 (GenBank accession NC_009334.1). K-mer counting was performed on tumour WGS reads aligned to the EBV genome using Jellyfish (version 2.2.6).192 EBV genome type was inferred to be type 1 or type 2 if the count ratio of EBV type 1–specific k-mers to EBV type 2–specific k-mers was greater than or less than 1, respectively.
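The decision rules in this subsection can be summarised in a short Python sketch. It is not the code used for the thesis: the function names are hypothetical, read and k-mer counts are assumed to have been obtained beforehand (e.g. with Samtools and Jellyfish), and the genome sizes used to illustrate the 0.00006 threshold are approximate values assumed here rather than taken from the text.

```python
# Approximate sizes, assumed for illustration only.
EBV_GENOME_BP = 171_823            # assumed length of the EBV reference
HUMAN_GENOME_BP = 3_100_000_000    # assumed size of GRCh38 without alternate contigs


def expected_reference_fraction():
    """Fraction of the combined reference represented by the EBV genome."""
    return EBV_GENOME_BP / (HUMAN_GENOME_BP + EBV_GENOME_BP)


def is_ebv_positive(ebv_reads, total_reads, eber_reads,
                    min_fraction=0.00006, min_eber_reads=250):
    """Apply both WGS and RNA-seq criteria described in the text.

    For targeted sequencing data, the text uses min_fraction=0.01.
    """
    fraction = ebv_reads / total_reads
    return fraction > min_fraction and eber_reads > min_eber_reads


def ebv_genome_type(type1_kmers, type2_kmers):
    """Infer EBV type from counts of type-specific 21-mers.

    A type 1 to type 2 ratio above 1 gives type 1 and a ratio below 1
    gives type 2; equal counts are left undetermined (None), a case the
    text does not cover.
    """
    if type1_kmers > type2_kmers:
        return "type 1"
    if type1_kmers < type2_kmers:
        return "type 2"
    return None


print(round(expected_reference_fraction(), 5))
print(is_ebv_positive(ebv_reads=1_000, total_reads=1_000_000, eber_reads=5_000))
print(ebv_genome_type(type1_kmers=120, type2_kmers=15))
```

Comparing counts directly rather than dividing them avoids a division by zero when no type 2-specific k-mers are observed.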

Simple somatic mutations

The Strelka workflow (version 1.0.14) was used to call SSMs. The default configuration for data aligned with bwa (strelka_config_bwa_default.ini) was used with the exception of filtering SNVs with a minimum quality somatic score (QSS) of 25 (default 15). For SNVs and indels, reference and alternate allele counts were taken from the Strelka output variant call format (VCF) file.193 SNVs and indels were annotated using vcf2maf (version 1.6.12) and Ensembl Variant Effect Predictor (release 86).194 Transcript selection for annotation was performed by vcf2maf with the following exception. Noncanonical transcripts were instead selected if they were nonsynonymously mutated more commonly than the canonical transcript (minimum increase of two affected cases). SNVs and indels were further filtered for a minimum alternate allele count of six and a minimum variant allele fraction (VAF) of 10% and 20% for FF and FFPE tumours, respectively. Tumours with a median VAF below 25% were omitted from subsequent analyses due to either excessive noise or low predicted tumour content. The same pipeline was used for detecting SNVs and indels in the targeted validation sequencing data, with the exception that depth filters were disabled for Strelka (isSkipDepthFilters = 1).
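The post-calling filters can be expressed compactly. The following Python sketch assumes that each variant is reduced to a pair of reference and alternate allele counts, and that the median VAF is computed over the variants that pass the per-variant filters; the text does not state which set of variants was used for the median, so that choice is an assumption, as are all function names.

```python
from statistics import median

MIN_ALT_COUNT = 6
MIN_VAF = {"FF": 0.10, "FFPE": 0.20}   # thresholds by tissue preservation
MIN_MEDIAN_VAF = 0.25


def vaf(ref_count, alt_count):
    """Variant allele fraction from reference and alternate allele counts."""
    depth = ref_count + alt_count
    return alt_count / depth if depth else 0.0


def filter_variants(variants, preservation):
    """Keep (ref_count, alt_count) pairs passing the count and VAF filters."""
    return [
        (ref, alt) for ref, alt in variants
        if alt >= MIN_ALT_COUNT and vaf(ref, alt) >= MIN_VAF[preservation]
    ]


def tumour_passes(variants, preservation):
    """True if the tumour's median VAF (over kept variants) is at least 25%."""
    kept = filter_variants(variants, preservation)
    if not kept:
        return False
    return median(vaf(ref, alt) for ref, alt in kept) >= MIN_MEDIAN_VAF


# Hypothetical allele counts for a single tumour.
example = [(90, 10), (50, 50), (94, 6), (97, 3)]
print(filter_variants(example, "FF"))
print(filter_variants(example, "FFPE"))
print(tumour_passes(example, "FF"))
```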

Significantly mutated genes

Considering only SNVs and indels, significantly mutated genes were identified using an ensemble approach integrating four methods: MutSigCV, OncodriveFM, OncodriveFML, and OncodriveCLUST.169,195–197 Mutations were lifted over from GRCh38 to GRCh37 using CrossMap (version 0.2.5) along with the “hg38ToHg19” chain file provided by the UCSC Genome Browser.198,199 Lifting over variants was necessary because some of the methods listed above rely on GRCh37 reference data. For consistency, the lifted-over mutations based on GRCh37 served as input for all methods. Nonsynonymous mutations were defined as those with one of the following values in the Mutation Annotation Format (MAF) file Variant_Classification field, as annotated by vcf2maf: Splice_Site, Nonsense_Mutation, Frame_Shift_Del, Frame_Shift_Ins, Nonstop_Mutation, Translation_Start_Site, In_Frame_Ins, In_Frame_Del, or Missense_Mutation. To minimize noise, I only considered genes deemed significant (Q-value < 0.1) by two or more methods.
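The consensus rule at the end of this subsection (significant by two or more of the four methods) amounts to a simple vote. The sketch below assumes that each method's output has already been parsed into a mapping from gene to Q-value; the gene labels and Q-values are placeholders, not results from this study.

```python
def consensus_genes(q_values_by_method, q_threshold=0.1, min_methods=2):
    """Return genes with Q-value < q_threshold in at least min_methods methods."""
    votes = {}
    for q_values in q_values_by_method.values():
        for gene, q_value in q_values.items():
            if q_value < q_threshold:
                votes[gene] = votes.get(gene, 0) + 1
    return sorted(gene for gene, n in votes.items() if n >= min_methods)


# Placeholder inputs for illustration only.
example = {
    "MutSigCV":       {"GENE_A": 0.001, "GENE_B": 0.30, "GENE_C": 0.05},
    "OncodriveFM":    {"GENE_A": 0.020, "GENE_B": 0.08},
    "OncodriveFML":   {"GENE_A": 0.500, "GENE_C": 0.40},
    "OncodriveCLUST": {"GENE_B": 0.090},
}
print(consensus_genes(example))
```

Genes missing from a method's output are treated as not significant for that method, which seems the natural reading of the rule but is an assumption here.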

BL-associated genes

I defined BLGs as any gene deemed significantly mutated in this study or previously described as recurrently mutated in BL with at least five affected patients in the discovery cohort. Only nonsynonymous simple somatic mutations and copy number variations (minimum size 10 kbp) were considered. To avoid considering mainly large-scale events, copy number variations affecting a BLG were required to be relatively small with a median size of 10 Mbp or less. For each BLG, additional cryptic splicing variants (with support for aberrant splicing in RNA-seq data), structural variations, and copy number variations were manually curated.

Noncoding mutation peaks

P-values were empirically determined for each peak by comparing its mutation rate with an empirical distribution produced by calculating the mutation rates of identically sized regions randomly sampled across the genome. The smallest and largest mutated positions on each chromosome were used to determine the range of positions available for sampling with replacement. Positions that overlapped gaps in the reference genome such as centromeres and telomeres were excluded. A “pseudo-peak” was created from a sampled position by extending each side to create regions with the same size as the given mutation peak. The mutation rate of 100,000 such pseudo-peaks was calculated to generate the empirical null distribution of mutation rates genome-wide. The empirical P-value was calculated as the number of pseudo-peaks with a higher mutation rate than the given mutation peak divided by 100,000. Given that each mutation peak is tested against independent null distributions, the P-values did not require multiple test correction. All peaks had empirical P-values < 0.001 and were thus significantly mutated above background rates.
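The sampling procedure can be sketched as follows. This Python example is a simplification under stated assumptions: it handles a single chromosome, represents mutations as a sorted list of positions pooled across the cohort, and omits the exclusion of reference gaps; the function names and the seeded random generator are illustrative choices only.

```python
import random
from bisect import bisect_left


def mutation_rate(sorted_positions, start, end):
    """Mutations per base in the half-open interval [start, end)."""
    count = bisect_left(sorted_positions, end) - bisect_left(sorted_positions, start)
    return count / (end - start)


def empirical_p_value(peak_start, peak_end, sorted_positions,
                      n_samples=100_000, seed=0):
    """Fraction of pseudo-peaks with a higher mutation rate than the peak."""
    rng = random.Random(seed)
    size = peak_end - peak_start
    observed = mutation_rate(sorted_positions, peak_start, peak_end)
    # Sampling range spans the smallest to the largest mutated position.
    lowest, highest = sorted_positions[0], sorted_positions[-1]
    higher = 0
    for _ in range(n_samples):
        centre = rng.randint(lowest, highest)
        # Extend each side of the sampled position to match the peak size.
        start = centre - size // 2
        if mutation_rate(sorted_positions, start, start + size) > observed:
            higher += 1
    return higher / n_samples


# Toy data: sparse background mutations plus one dense 50-bp cluster.
positions = sorted(list(range(0, 100_000, 1_000)) + list(range(50_001, 50_050)))
print(empirical_p_value(50_000, 50_050, positions, n_samples=1_000))
```

With 100,000 pseudo-peaks, the smallest non-zero P-value that can be reported is 0.00001, so a result of zero is best read as P < 0.00001.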

Enrichment for AICDA-mediated mutations

A bespoke algorithm was implemented in Python (version 3.6.1) to determine whether certain regions, such as significantly mutated genes and noncoding mutation peaks, were enriched for SNVs and indels consistent with AICDA-mediated mutagenesis.200,201 Enrichment for putative AICDA-mediated mutations in a given region was measured using two binomial exact tests. First, the observed number of mutations affecting AICDA recognition sites (number of successes), defined as regions that fit the AICDA motif (RGYW), was compared to the expected number of such mutations, which was calculated from the region’s mutation rate (probability of success) and the number of bases that overlap AICDA recognition sites (number of trials). Second, the observed number of mutations affecting the guanine-cytosine pair targeted by AICDA (number of successes) was compared to the expected number of such mutations, which was calculated from the region’s mutation rate of guanine-cytosine pairs (probability of success) and the number of target guanine-cytosine pairs in AICDA recognition sites (number of trials). Mutation rates were calculated using the effective region size, which is equal to the product of the region size and the cohort size. The effective region size ensures that the observed number of mutations (number of successes) is never higher than the region size (number of trials). Care was taken to avoid double-counting mutations if they overlapped more than one AICDA recognition site. This process was repeated for all regions of interest. The regions for BL-associated genes were based on the transcripts that were affected by nonsynonymous mutations as opposed to entire gene bodies. The entire regions of noncoding mutation peaks were considered. The in-house program also annotated mutations based on whether they overlapped an AICDA recognition site.
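The first of the two tests can be illustrated with a small, self-contained Python sketch. It is not the in-house program: the function names are hypothetical, the motif is searched on both strands (RGYW and its reverse complement WRCY), the test is assumed to be one-sided, and the exact binomial tail is summed directly, which is only practical for small inputs.

```python
import re
from math import comb

# RGYW on the forward strand and its reverse complement, WRCY.
# Zero-width lookaheads allow overlapping motif occurrences to be found.
RGYW = re.compile(r"(?=[AG]G[CT][AT])")
WRCY = re.compile(r"(?=[AT][AG]C[CT])")


def aicda_site_positions(sequence):
    """0-based positions covered by at least one AICDA recognition site."""
    positions = set()
    for pattern in (RGYW, WRCY):
        for match in pattern.finditer(sequence):
            positions.update(range(match.start(), match.start() + 4))
    return positions


def binomial_upper_tail(successes, trials, prob):
    """P(X >= successes) for X ~ Binomial(trials, prob)."""
    return sum(
        comb(trials, k) * prob**k * (1 - prob) ** (trials - k)
        for k in range(successes, trials + 1)
    )


def motif_enrichment(sequence, mutations, cohort_size):
    """Test whether mutations are enriched at AICDA recognition sites.

    mutations: iterable of (sample_id, 0-based position) pairs.
    Returns (successes, trials, one-sided P-value).
    """
    mutations = set(mutations)
    sites = aicda_site_positions(sequence)
    # Effective region size: region size multiplied by cohort size.
    rate = len(mutations) / (len(sequence) * cohort_size)
    # Storing site positions in a set means a mutation overlapping several
    # recognition sites is counted only once.
    successes = sum(1 for _, position in mutations if position in sites)
    trials = len(sites) * cohort_size
    return successes, trials, binomial_upper_tail(successes, trials, rate)


print(motif_enrichment("AGCTAAAA", [("s1", 1), ("s2", 1), ("s1", 6)], cohort_size=2))
```

The second test would follow the same pattern, restricting both the trials and the mutation rate to the targeted guanine-cytosine pairs.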

De novo mutational signatures

Mutational signatures were discovered using the previously described framework by Alexandrov et al.202 I summarized somatic SNVs based on their mutational subtype, 5’ context, and 3’ context. This resulted in a mutation catalog matrix of 96 SNV classes for each sample. I performed non-negative matrix factorisation on the mutation catalog to discover mutational signatures within the entire cohort. Signature stability was computed by bootstrap resampling over 1000 total iterations (10 iterations in each of 100 cores). The optimal n-signature solution, $n_{\text{opt}}$, which simultaneously maximised signature stability and minimised the Frobenius reconstruction error, was automatically selected,

$$n_{\text{opt}} = \underset{n}{\operatorname{arg\,min}} \left( \frac{R_n - \min(R)}{\max(R) - \min(R)} - \frac{S_n - \min(S)}{\max(S) - \min(S)} \right),$$

where $R$ and $S$ are the vectors containing reconstruction errors and stability of each n-signature solution, and $R_n$ and $S_n$ are the reconstruction error and stability of the n-signature solution. This approach determined that the four-signature solution was optimal. To determine matches to known mutational signatures, cosine similarity metrics were computed against the 30 COSMIC reference mutational signatures. Where more than one signature matched to a single COSMIC signature, the highest similarity match was chosen and the remaining signatures were matched to the next most similar COSMIC signature. For each n-signature solution, the Pearson correlation was calculated between the age at diagnosis for each case and the predicted number of mutations attributable to de novo signatures associated with age (COSMIC reference signatures 1 and 5), taking the maximum correlation if both COSMIC signatures were paired. Similarly, for each n-signature solution, the Pearson correlation was calculated between AICDA expression for each case and the predicted number of mutations attributable to the de novo signature associated with AICDA activity (COSMIC reference signature 9).
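The model selection criterion and the cosine similarity used for matching can be sketched in plain Python. The reconstruction errors and stabilities below are invented for illustration and do not correspond to values from this study; the function names are likewise hypothetical.

```python
from math import sqrt


def min_max(values):
    """Rescale values to the range [0, 1]; constant input maps to zeros."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(value - low) / (high - low) for value in values]


def select_n_opt(ns, reconstruction_errors, stabilities):
    """Pick n minimising normalised error minus normalised stability."""
    scores = [
        r - s
        for r, s in zip(min_max(reconstruction_errors), min_max(stabilities))
    ]
    return ns[scores.index(min(scores))]


def cosine_similarity(a, b):
    """Cosine similarity between two signature profiles of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))


# Hypothetical values for solutions with two to five signatures.
ns = [2, 3, 4, 5]
errors = [10.0, 6.0, 3.0, 2.5]
stabilities = [0.99, 0.95, 0.93, 0.60]
print(select_n_opt(ns, errors, stabilities))
```

In this toy example, the five-signature solution has the lowest error but a sharp drop in stability, so the criterion favours the four-signature solution.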

Somatic structural variations

Somatic SVs were detected using the Manta pipeline (version 1.1.0) in paired tumour-normal mode using default parameters with the exception of a minimum somatic score (SOMATICSCORE) of 45 (default 30).203 In FFPE samples, any inversions smaller than 500 bp were considered noise and ignored. Variant allele fractions were calculated from the reference and alternate allele counts reported in the Manta output variant call format file. These files were converted to BEDPE format using the vcftobedpe tool from the svtools package (version 0.3.2, commit 6d7b6ec8).204 SVs that overlapped any of the significantly mutated genes were manually curated for inclusion as nonsynonymous mutations. IG-MYC translocations were identified as being any SV that met the following conditions: (1) one breakpoint was near MYC (chr8:126393182-130762146); (2) the breakpoint near MYC was oriented such that exons 2 and 3 are included in the rearrangement; (3) the other breakpoint was near an immunoglobulin heavy or light chain locus, namely IGH (chr14:104589639-107810399), IGK (chr2:87999518-90599757), or IGL (chr22:21031465-23905532); and (4) the highest-scoring translocation was selected in the event of multiple candidate SVs. Tumours in which Manta failed to detect a translocation that met the above criteria were manually inspected for such events, which revealed IG-MYC rearrangements in all remaining cases.
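Conditions (1), (3), and (4) lend themselves to a short sketch. In the Python example below, the regions are those quoted in the text, while the dictionary layout for an SV and the function names are assumptions made for illustration; the orientation check in condition (2) is deliberately omitted because it depends on how strand information is encoded.

```python
MYC_REGION = ("chr8", 126393182, 130762146)
IG_REGIONS = {
    "IGH": ("chr14", 104589639, 107810399),
    "IGK": ("chr2", 87999518, 90599757),
    "IGL": ("chr22", 21031465, 23905532),
}


def in_region(chrom, position, region):
    """True if a breakpoint lies within a (chrom, start, end) region."""
    region_chrom, start, end = region
    return chrom == region_chrom and start <= position <= end


def ig_partner(sv):
    """Return the IG locus paired with MYC in this SV, or None."""
    ends = [(sv["chrom1"], sv["pos1"]), (sv["chrom2"], sv["pos2"])]
    # Try both assignments of breakpoints to the MYC and IG sides.
    for myc_end, other_end in (ends, ends[::-1]):
        if in_region(*myc_end, MYC_REGION):
            for locus, region in IG_REGIONS.items():
                if in_region(*other_end, region):
                    return locus
    return None


def best_ig_myc(svs):
    """Select the highest-scoring candidate IG-MYC translocation, if any."""
    candidates = [sv for sv in svs if ig_partner(sv) is not None]
    return max(candidates, key=lambda sv: sv["score"], default=None)


# Hypothetical breakpoints and scores.
svs = [
    {"chrom1": "chr8", "pos1": 127735000, "chrom2": "chr14", "pos2": 105860000, "score": 80},
    {"chrom1": "chr22", "pos1": 22900000, "chrom2": "chr8", "pos2": 127740000, "score": 60},
    {"chrom1": "chr8", "pos1": 127000000, "chrom2": "chr1", "pos2": 1000, "score": 99},
]
best = best_ig_myc(svs)
print(ig_partner(best), best["score"])
```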

Somatic copy number variations

Sequenza was used to call somatic CNVs in tumour-normal pairs.205 Sequenza bam2seqz (parameters: --qlimit 30) generated the SEQZ files, which were then binned using Sequenza seqz-binning (parameters: -w 300 -s). To eliminate noise, the putative germline heterozygous positions identified by Sequenza were post-filtered to retain only those represented in dbSNP (downloaded 2017-04-03) “common all” single nucleotide polymorphisms. Using bedtools intersect (parameters: -wa), germline heterozygous positions were removed if they overlapped gaps in the reference genome (e.g. centromeres) or segmental duplications, which were obtained from the UCSC Table Browser.206,207 Beforehand, the segmental duplications were merged if they overlapped one another using bedtools merge, then filtered for a minimum size of 10 kbp, and subsequently merged again using bedtools merge (parameters: -d 10000). The Sequenza R package was used to load the binned SEQZ data, fit a model for cellularity and ploidy, and generate CNV segments.205 Sequenza was made aware of the sex of each case to properly handle CNVs on the sex chromosomes. To simplify model fitting and avoid incorrect local optima, ploidy and cellularity options were restricted as follows. Ploidy was limited to the range between 1.8 and 2.5. Cellularity was restricted to an estimate of tumour content derived from the VAF of SNVs and indels, defined as twice the VAF corresponding to the first local density maximum below 50%.

Gene expression quantification

The tximport Bioconductor R package was used to summarize transcript-level read counts at the gene level.208 The DESeq2 Bioconductor R package was used to correct the read counts for library size and to perform a variance-stabilizing data transformation.209 These variance-stabilized expression values were used for statistical tests that require homoskedastic data.

miRNA expression profiling was performed separately on the miRNA sequencing data using Canada’s Michael Smith Genome Sciences Centre miRNA processing pipeline, which was used for The Cancer Genome Atlas project.210 The analysis was done using miRBase release 21.211–215

Clonal B-cell receptors

MiXCR (version 2.1.3) was used to identify immunoglobulin heavy and light chain clones from the RNA-seq and WGS data as per the standard pipeline described in their documentation.184,185 The MiXCR pipeline was also run on 323 DLBCL tumour samples that underwent a strand-specific poly(A)-selection RNA-seq protocol.166 All RNA-seq reads were aligned using “mixcr align” (parameters: -p rna-seq -OallowPartialAlignments=true) while for the WGS data, only reads originating from the immunoglobulin regions (chr2:88668078-90584447, chr14:105548159-107030529, and chr22:21897318-23046831) or unmapped reads were aligned using “mixcr align” (parameters: -p rna-seq -OallowPartialAlignments=true -OvParameters.geneFeatureToAlign=VGeneWithP). Two rounds of contig assembly were performed using “mixcr assemblePartial” followed by clone assembly using “mixcr assemble”. Clones were exported using “mixcr exportClones” with the -o and -t options to exclude any clones with out-of-frame sequences or stop codons. Clonal fraction was calculated for heavy and light chains separately. Dominant clones in the RNA-seq data were defined as having a clonal fraction of at least 30% with a minimum of 30 supporting reads. For the WGS analysis, dominant clones were defined as having the greatest clonal fraction with at least two supporting reads. The top-scoring V, D, J and C genes were selected for each clone when multiple genes were possible.
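The definitions of dominant clones can be written down as two small functions. This Python sketch assumes that the exported clones for a single chain (heavy or light) have been reduced to pairs of clone identifier and supporting read count; the identifiers and counts shown are placeholders.

```python
def dominant_rnaseq_clones(clones, min_fraction=0.30, min_reads=30):
    """Clones with a clonal fraction >= 30% and >= 30 supporting reads.

    clones: list of (clone_id, read_count) pairs for one chain.
    """
    total = sum(reads for _, reads in clones)
    if total == 0:
        return []
    return [
        clone_id for clone_id, reads in clones
        if reads / total >= min_fraction and reads >= min_reads
    ]


def dominant_wgs_clone(clones, min_reads=2):
    """Clone with the greatest read support, requiring >= 2 reads."""
    eligible = [(clone_id, reads) for clone_id, reads in clones if reads >= min_reads]
    if not eligible:
        return None
    # Within one chain, the largest read count is also the largest fraction.
    return max(eligible, key=lambda clone: clone[1])[0]


print(dominant_rnaseq_clones([("c1", 70), ("c2", 25), ("c3", 5)]))
print(dominant_wgs_clone([("a", 1), ("b", 3), ("c", 2)]))
```

Because the clonal fraction is computed separately for heavy and light chains, each function would be called once per chain.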

Data and statistical analyses

Data and statistical analyses were done using the R statistical programming language (version 3.4.2).216 Mann–Whitney U tests and Fisher’s exact tests were used where appropriate with the wilcox.test and fisher.test functions in R, respectively. Correlation between continuous variables was tested using Pearson’s product-moment correlation coefficient with the cor.test function in R. Mutual exclusivity between mutations in different genes was evaluated using the CoMEt exact test with the comet_exact_test function from the cometExactTest package.174,175 Multiple hypothesis correction was performed using the Benjamini–Hochberg method with the p.adjust function in R. P-values below 5% and Q-values (corrected P-values) below 10% were considered significant. The R packages used most extensively are listed below with their respective versions and citations.

Package           Version      References
argparse          1.1.1        217
bedr              1.0.4        218
biomaRt           2.32.1       219, 220
bookdown          0.7          221, 222
broom             0.4.3        223
circlize          0.4.1        224
cometExactTest    0.1.5        175
cowplot           0.9.3        225
data.table        1.11.4       226
DESeq2            1.16.1       227
dplyr             0.7.4        228
feather           0.3.1        229
flextable         0.4.4        230
forcats           0.2.0        231
GenomicRanges     1.28.6       232
ggbeeswarm        0.6.0        233
ggExtra           0.8          234
ggplot2           3.1.0        235
ggrepel           0.7.0        236
ggsignif          0.4.0        237
ggstance          0.3          238
Gviz              1.20.0       239
knitr             1.2          240, 241, 242
lsa               0.73.1       243
maftools          1.4.20       244
MassSpecWavelet   1.42.0       245
matrixStats       0.53.0       246
pheatmap          1.0.8        247
Publish           2018.04.17   248
purrr             0.2.5        249
RColorBrewer      1.1-2        250
readr             1.1.1        251
readxl            1.0.0        252
robustbase        0.92-7       253, 254
sequenza          2.1.2        205
tidyverse         1.1.1        255
tximport          1.4.0        256
viridis           0.4.1        257
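For readers less familiar with the Benjamini–Hochberg correction mentioned in this subsection, the following Python sketch shows the calculation. The analyses themselves used the p.adjust function in R; this re-implementation is only illustrative, and the P-values in the example are arbitrary.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values (Q-values) in input order."""
    m = len(p_values)
    # Visit P-values from largest to smallest so that a running minimum
    # enforces monotonicity of the adjusted values.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for offset, index in enumerate(order):
        rank = m - offset  # rank of this P-value in ascending order
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


q_values = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(q_values)
print([q < 0.1 for q in q_values])  # significance at the 10% threshold
```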

Chapter 3

EBV defines a BL entity with distinct molecular and pathogenic features

3.1 Introduction

Our understanding of the genetic landscape of cancer has grown considerably over the last few decades. We have also gained a concomitant appreciation of the inter-tumour and intra-tumour heterogeneity that respectively exist between and within patient tumours. This genetic heterogeneity has many clinical implications, most notably the interplay between genetic features and treatment response or resistance. This newfound appreciation has spurred the strategy of precision oncology, whereby patients are treated based on the unique genetic makeup of their respective tumours. The goal of this approach is simple: by taking into account the molecular features driving each tumour, clinicians will be more successful in curing cancer. In practice, precision oncology hinges on detailed knowledge of the mechanisms underpinning pathogenesis. Without this knowledge, precision medicine would not be possible due to a lack of clinically actionable (i.e. drug-targetable) genetic alterations.

On the surface, BL appears to be a poor candidate for precision medicine by virtue of already being curable in most cases by standard-of-care (i.e. intensive chemotherapy). However, this view does not account for the toxicity of current treatment regimens geared for BL, which severely degrades the quality of life for patients and can lead to additional malignancies later in life. Additionally, this view is biased by the cure rates for children in countries where proper supportive care is readily available.45 In reality, BL remains fatal for children in sub-Saharan Africa, in part because healthcare delivery systems lack capacity to administer intensive chemotherapy, and outcomes remain poor in older patients, even in developed countries.48–51 When considering these issues, it becomes clear that tailoring treatments for molecular features specific to BL presents an opportunity to reduce both mortality and treatment morbidity in this patient population, especially those affected by BL in developing countries where this disease is particularly common.

Currently, BL is classified based on geographic origin and immunocompetence: endemic for cases diagnosed in malaria-endemic areas, sporadic for cases diagnosed elsewhere, and immunodeficiency-associated for immunocompromised cases irrespective of locale. While the endemic and sporadic subtypes differ from one another at the epidemiological level, their definition has little basis in biology. Admittedly, both subtypes still have important differences (e.g. tumour growth site), but considering disease pathogenesis when stratifying patients is key for understanding treatment response and paving the way for precision medicine. Compared to other cancers that have transitioned to molecularly defined subtypes, the de facto classification system for BL appears outdated. Accordingly, I hypothesized that there are common molecular features that more accurately explain some of the observed differences in BL biology and clinical presentation. Specifically, I hypothesized that the presence of EBV in BL tumours is more relevant for disease aetiology than the geographic origin of the tumour. Finally, I also hypothesized that additional molecular differences exist among EBV-positive tumours on the basis of EBV genome type, namely type 1 and type 2.

In this chapter, I test these hypotheses by investigating the same BL dataset presented in Chapter 2. Unlike previous studies, my cohort comprised patients representing two common clinical variants, namely endemic and sporadic BL, whose samples were processed using the same methodology, thus limiting technical sources of variation. The high correlation between clinical variant and tumour EBV status introduced an analytical challenge. Recall from Chapter 1 that most endemic cases are EBV-positive while most sporadic cases are EBV-negative. However, this cohort included eight EBV-negative endemic BLs and four EBV-positive sporadic BLs, which I termed “discordant” BL cases. These discordant cases afforded an opportunity to distinguish between the features associated with geography versus tumour EBV status.

Through this analysis, I found a number of mutational differences that are more strongly associated with tumour EBV status than clinical variant. Despite having greater mutation burden genome-wide, EBV-positive tumours harboured fewer driver mutations, particularly those affecting genes with roles in apoptosis such as TP53. The mutational signatures I detected in BL genomes suggested that the increased mutation frequency in EBV-positive tumours could be explained by defects in DNA mismatch repair and elevated AICDA activity. Indeed, the presence of EBV was the most important variable in determining AICDA expression level and aberrant somatic hypermutation. This level of heterogeneity in BL has been previously underappreciated and presents new therapeutic opportunities.

3.2 Results

3.2.1 Fewer driver mutations in EBV-positive BL despite mutation burden

Due to differences in sequencing coverage and tumour content, the mutation burden in BL cannot be readily compared with other cancer cohorts. While downsampling sequencing data was a possibility, I preferred to maintain sensitivity as high as possible. A comparison of the mutation load among the BLGSP tumours, which had similarly high tumour content and sequencing coverage, revealed one clear outlier (Figure 3.1A). I excluded case BLGSP-71-06-00142 because its tumour genome was relatively hypermutated with 48,994 SSMs. The remaining BLGSP tumours featured 5,666 SSMs on average (range 1,481–14,115) and mutations from these cases were used for subsequent analyses.

Given the considerable range in mutation load among the remaining cases, I investigated whether the number of mutations varied with any of the available patient or tumour metadata (Figure 3.1B). Indeed, genome-wide mutation burden was significantly associated with both geographic origin and tumour EBV status (Q-values < 0.1, Mann–Whitney U test). Based on median mutation counts, endemic and EBV-positive tumours have 1.96- and 1.75-fold more mutations than sporadic and EBV-negative tumours, respectively. Similar differences were found when I separately considered mutations within or outside noncoding mutation peaks described in Chapter 2. Lastly, the same pattern was observed among nonsynonymous mutations affecting all protein-coding genes. Hence, one could speculate that the greater mutation burden seen in endemic and EBV-positive tumours could expedite the accumulation of driver mutations.

To pursue this analysis further, I counted the number of putative driver mutations in each case and made similar comparisons based on clinical variants and tumour EBV status (Figure 3.2). Here, I defined putative driver mutations as nonsynonymous mutations (i.e. SSMs, CNVs, and SVs) affecting any BLG, as determined in Chapter 2. Surprisingly, despite having more mutations genome-wide, EBV-positive tumours had significantly fewer driver mutations (Q-value = 0.0021, Mann–Whitney U test). On the other hand, sporadic and endemic tumours lacked any difference in this regard (Q-value = 0.368). In other words, in the absence of EBV, there is an elevated accumulation of driver mutations, presumably compensating for the oncogenic role played by the virus. On the other hand, I saw no difference in the number of driver mutations between tumours infected with EBV type 1 and those infected with EBV type 2 (Q-value = 0.815), suggesting that EBV genome type is not as important, if at all, for BL tumourigenesis.

3.2.2 Variation in mutation burden explained by mutational signatures

Considering the observed differences in mutation burden, I asked whether these could be

explained by the de novo mutational signatures identified in Chapter 2. For each sample, I

estimated the number of mutations contributed by each signature based on its exposure,

a measure of signature prevalence (Figure 3.3). Comparing BL genomes on the basis of

tumour EBV status or geographic origin, I found no difference in the number of mutations

related to BL signature A, which was associated with age. Similarly, no difference was

observed between EBV type 1–infected tumours and EBV type 2–infected tumours for

any of the signatures. On the other hand, a significantly higher representation of

mutations linked to BL signatures B, C, and D was found in EBVpositive and endemic

tumours (Qvalues < 0.1, Mann–Whitney U test). In other words, these three signatures

combined can account for the observed difference in genomewide mutation load. While

little is known about the aetiology underlying BL signature B, BL signatures C and D were

associated with defective DNA mismatch repair and AICDA activity, respectively. These

Figure 3.1: Genome-wide mutation burden per BL subtype. (A) Distribution of the genome-wide mutation burden across the discovery cohort. (B) Mutation frequency is shown for each disease subtype (clinical variant, EBV status, and EBV type). From top to bottom, the following SSMs are considered in each tumour: all genome-wide SSMs; SSMs outside mutation peaks; SSMs within mutation peaks; and non-synonymous SSMs in any gene. This analysis was restricted to WGS data from the BLGSP discovery cohort excluding the outlier (N = 90). Significance brackets: *, Q-value < 0.1 (Mann–Whitney U test).

Figure 3.2: Number of BLGs that are mutated in each BLGSP discovery and validation case. All mutation types were considered. Discordant cases are highlighted as red points. Significance brackets: *, Q-value < 0.1 (Mann–Whitney U test).

findings indicate that these two mechanisms at least partially explain the greater mutation

burden in endemic or EBV-positive tumours independently of EBV genome type.
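The subtype comparisons above rest on two steps: a Mann–Whitney U test per signature, followed by a multiple-testing correction that yields Q-values. The Python sketch below illustrates both steps under stated assumptions. It uses the normal approximation with a tie correction (and no continuity correction) rather than an exact test, it assumes the Benjamini–Hochberg procedure for the Q-values (the correction method is not named in this section), and the function names and mutation counts are hypothetical.

```python
import math
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    tie_term = sum(t**3 - t for t in Counter(pooled).values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:  # every observation is identical
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(variance)
    return u1, math.erfc(abs(z) / math.sqrt(2))


def benjamini_hochberg(p_values):
    """Convert P-values to Benjamini-Hochberg adjusted values (assumed Q-values)."""
    m = len(p_values)
    descending = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    q_values = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(descending):
        rank = m - position  # rank of this P-value in ascending order
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


# Hypothetical per-tumour mutation counts attributed to one signature.
ebv_positive = [5200, 6100, 4800, 7300, 6600, 5900]
ebv_negative = [2100, 3300, 2800, 3900, 2500, 3100]
u_stat, p_value = mann_whitney_u(ebv_positive, ebv_negative)
# The other three P-values are placeholders standing in for other signatures.
q_values = benjamini_hochberg([p_value, 0.04, 0.30, 0.01])
print(u_stat, p_value, q_values)
```

With cohorts of the size analysed here the normal approximation is usually adequate, but small groups (such as the discordant cases) would call for an exact test.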

To isolate the source of this variation, I performed linear regression for each signature to

describe its relationship with relevant sample attributes (Table 3.1). As expected, BL

signature A was uniquely associated with age at diagnosis (P-value = 0.0021). While it
was significantly more common in endemic and EBV-positive tumours, BL signature B did
not associate specifically with any of the variables I considered (P-values > 0.05). In
contrast, BL signature C was found to be significantly associated with tumour EBV status
(P-value = 0.038) but not geographic origin (P-value = 0.23), suggesting a link between
EBV and DNA mismatch repair. Lastly, consistent with an aetiological link with AICDA, BL
signature D was strictly associated with AICDA expression (P-value = 0.00098). Notably,
neither BL signature B nor signature C correlated with AICDA expression, indicating that
these do not have a significant contribution from AICDA (P-values = 0.18 and 0.34,
respectively). In summary, I may partly attribute the difference in mutation burden to
defective DNA mismatch repair in EBV-positive tumours and variable AICDA

activity.
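The per-signature models can be pictured as ordinary least squares with indicator variables for EBV status and clinical variant plus one continuous covariate. The sketch below is an illustration under stated assumptions and not the analysis code: the helper names, the 0/1 coding of the covariates, and the toy data are hypothetical, and the normal equations are solved directly for brevity.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * solution[j] for j in range(i + 1, n))
        solution[i] = (aug[i][n] - tail) / aug[i][i]
    return solution


def fit_ols(covariates, response):
    """Return [intercept, coefficient_1, ...] minimising the squared error."""
    design = [[1.0] + list(row) for row in covariates]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, response)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Toy data: (EBV-negative?, sporadic?, AICDA expression) per tumour.
covariates = [(0, 0, 12.5), (1, 0, 10.0), (0, 1, 11.0),
              (1, 1, 9.5), (0, 0, 13.0), (1, 1, 10.5)]
# Toy response built from known coefficients so that the fit can be checked.
response = [500 - 200 * e - 100 * s + 300 * a for e, s, a in covariates]
intercept, b_ebv, b_variant, b_aicda = fit_ols(covariates, response)
print(round(intercept, 3), round(b_ebv, 3), round(b_variant, 3), round(b_aicda, 3))
```

Because EBV-positive and endemic tumours are the reference levels in this coding, a negative coefficient means fewer mutations in EBV-negative or sporadic tumours after adjusting for the other covariates.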

Figure 3.3: Prevalence of each mutational signature per BL subtype. Estimated number of single-nucleotide variants is shown per mutational signature for each disease subtype in the BLGSP discovery cohort excluding the outlier (N = 90). The four de novo mutational signatures (BL sig.) are annotated with the associated COSMIC reference signature (COSMIC sig.): BL signature A, COSMIC signature 5; BL signature B, COSMIC signature 17; BL signature C, COSMIC signature 15; BL signature D, COSMIC signature 9. ICGC cases were excluded to avoid the possible confounding effect of lower sequencing coverage. Significance brackets: *, Q-value < 0.1; **, Q-value < 0.001; ***, Q-value < 0.00001 (Mann–Whitney U test).

Table 3.1: Linear regression of mutational signatures. Linear regression of the estimated number of mutations per signature (Sig.) as a function of various covariates. Tumour EBV status and clinical variant status were used as covariates in all models, age was used as a covariate for BL signature A given its association with age, and AICDA expression was used as a covariate for BL signatures B, C, and D. The linear models were also bootstrapped 10,000 times to calculate bootstrap 95% confidence intervals (CI).

BL Sig.  Term                             Coefficient  Standard error  Bootstrap 95% CI (N = 10000)  P-value
A        EBV status (Ref: EBV-positive)   320.0        280             −280 to 1000                  0.26000
A        Clinical variant (Ref: Endemic)  −400.0       290             −1000 to 110                  0.16000
A        Age at diagnosis                 80.0         25              24 to 160                     0.00210
B        AICDA expression                 −180.0       140             −950 to 120                   0.18000
C
D


3.2.3 Protein-altering mutations associated with tumour EBV status

Based on the observation that there are fewer driver mutations in EBV-positive tumours, I
identified the individual BLGs or biologically related gene sets (i.e. pathways) that were
differentially mutated based on geographic origin and/or tumour EBV status (Figure 3.4).
These results are summarized in Supplemental Table 9 of Appendix A. EBV-negative
tumours, but not sporadic tumours, more frequently had mutations in TP53 (Q-value =
0.0044, Fisher’s exact test), a difference that became more striking when considering a
group comprising all BLGs with roles in apoptosis (Q-value = 0.00024). I also found
differences in the mutation prevalence of SMARCA4 and CCND3 (Q-values < 0.1), but I
was unable to confidently resolve whether these relate to geographic origin or EBV status.
In contrast to a previous report, I failed to identify any differentially mutated genes
between tumours infected by EBV type 1 and EBV type 2 (Q-values > 0.1).163 In short, I

found greater contrast according to EBV status, consistent with the earlier observation

that the frequency of driver mutations varied based on the presence of EBV.
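To make the per-gene comparison concrete, the following Python sketch runs a two-sided Fisher's exact test on a 2×2 table of mutated and unmutated cases. It is a minimal illustration with hypothetical function names. The example counts are the apoptosis rows for concordant cases in Table 3.2, so the resulting P-value is not expected to reproduce the Q-values quoted above, which come from different groupings and include a multiple-testing correction. The odds ratio shown is the simple cross-product estimate, which differs from the conditional maximum-likelihood estimate reported by `fisher.test` in R.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def probability(x):
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    observed = probability(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_value = sum(
        probability(x)
        for x in range(lowest, highest + 1)
        if probability(x) <= observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


def sample_odds_ratio(a, b, c, d):
    """Cross-product odds ratio; infinite when b * c is zero."""
    return float("inf") if b * c == 0 else (a * d) / (b * c)


# Apoptosis pathway, concordant cases from Table 3.2:
#   EBV-negative sporadic BL: 13 mutated, 5 unmutated
#   EBV-positive endemic BL:  27 mutated, 63 unmutated
table = (13, 5, 27, 63)
print(sample_odds_ratio(*table), fisher_exact_two_sided(*table))
```

An odds ratio above 1 in this orientation indicates that the feature is mutated more often in the first row (here, EBV-negative sporadic tumours) than in the reference row.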

To confirm these findings, I compared tumour EBV status and clinical variant as predictors

of mutation status. For this analysis, I only considered differentially mutated genes and

Figure 3.4: Differential incidence of non-synonymous mutations in molecular BL subtypes. Panels plot −log10(Q-value) against log10(odds ratio) for clinical variant (Ref: Endemic BL), EBV status (Ref: EBV-positive), and EBV type (Ref: EBV type 1). Mutations are restricted to those affecting BLGs. Significant differences are highlighted in red (Q-values < 0.1, indicated by dashed line; Fisher’s exact test).

pathways, which were determined without including the 12 discordant cases. Among the

genes and pathways that were mutated in at least 10% of the cases, SMARCA4,

apoptosis, CCND3, and TP53 were differentially mutated (Q-values < 0.1, Fisher’s exact
test). Tumour EBV status significantly outperformed geographic origin in predicting the
mutation status of the apoptosis pathway for the discordant cases (P-value = 0.0094,
McNemar’s test; Table 3.2). For the remaining genes, it remained inconclusive as to
whether their mutation status in the discordant cases was significantly better predicted
by EBV status or clinical variant (P-values > 0.05). Together, these findings demonstrate
that EBV-positive tumours are genetically defined by a paucity of mutations affecting

apoptotic genes, supporting the longstanding hypothesis that persistent EBV infection

abrogates apoptosis in BL tumour cells.

3.2.4 Deregulated AICDA activity in EBV-positive BL

My above analysis of mutational signatures revealed substantial variation in the number
of mutations predicted to be caused by BL signature D. Given that this signature is
aetiologically linked to AICDA activity, I compared AICDA expression based on geographic
origin and tumour EBV status (Figure 3.5A). Consistent with my earlier result, AICDA
expression was significantly higher in endemic (Q-value = 9.7 × 10⁻⁷, Mann–Whitney U

Table 3.2: McNemar’s test results. This table compares tumour EBV status and clinical variant status in their ability to predict the mutation status of genes or pathways that are differentially mutated between EBV-positive eBLs and EBV-negative sBLs (i.e. excluding discordant cases). The McNemar’s test P-value indicates whether there is a significant difference in the predictive performance of tumour EBV status and clinical variant status.

Gene or pathway  EBV status    Clinical variant  Mutated cases  Unmutated cases  Mutation prevalence  McNemar’s test P-value
Apoptosis        EBV-positive  Endemic           27             63               30%                  0.0094
Apoptosis        EBV-negative  Sporadic          13             5                72%
Apoptosis        EBV-positive  Sporadic          1              3                25%
Apoptosis        EBV-negative  Endemic           8              0                100%
CCND3            EBV-positive  Endemic           9              81               10%                  0.7700
CCND3            EBV-negative  Sporadic          8              10               44%
CCND3            EBV-positive  Sporadic          0              4                0%
SMARCA4                                                                                               0.3900
TP53                                                                                                  0.1500

test) and EBV-positive tumours (Q-value = 1.9 × 10⁻⁸). Linear regression revealed a
stronger association of AICDA expression with tumour EBV status than with geographic
origin (Table 3.3). Consistent with this observation, if endemic and sporadic cases are
considered separately, EBV-positive tumours have higher AICDA expression for both
clinical variants (Figure 3.5B). After accounting for variation associated with EBV status,
geographic origin still significantly accounted for some of the remaining variation.
Altogether, these findings indicate that AICDA expression is induced
especially in EBV-positive tumours, but there may also be an unexplained geographic
component to this phenomenon. This increased AICDA expression is expected to result in
enhanced aSHM, which was described as non-coding mutation peaks in Chapter 2.

3.2.5 EBV genome copy number uncorrelated with EBV-associated effects

Considering the above associations with tumour EBV status, I asked whether the number

of copies of the EBV genome per tumour cell correlated with the magnitude of the

Figure 3.5: AICDA expression per BL subtype. (A) Germinal centre samples (N = 12) are shown separately from tumour samples (N = 117), which are partitioned according to different classification systems. Annotated Q-values: centroblasts versus centrocytes, Q = 2.9e−03; endemic versus sporadic BL, Q = 9.7e−07; EBV-positive versus EBV-negative, Q = 1.9e−08; EBV type 1 versus EBV type 2, Q = 8.5e−01. Discordant cases are highlighted as red points. (B) Variance-stabilized AICDA expression in sporadic and endemic BL according to tumour EBV status (annotated Q-values: 0.014 for endemic BL and 0.024 for sporadic BL). Significance brackets: *, Q-value < 0.1; ***, Q-value < 0.00001 (Mann–Whitney U test).

Table 3.3: Linear regression of AICDA expression as a function of tumour EBV status and clinical variant status. This linear model was also bootstrapped 10,000 times to calculate bootstrap 95% confidence intervals (CI).

Term                             Coefficient  Standard error  Bootstrap 95% CI (N = 10000)  P-value
EBV status (Ref: EBV-positive)   −1.30        0.27            −1.9 to −0.46                 6.4e−06
Clinical variant (Ref: Endemic)  −0.66        0.29            −1.5 to −0.047                2.3e−02

observed effects. I leveraged the stoichiometry of WGS reads and their relation to the

proportion of human and EBV DNA to estimate the EBV genome copy number. I

corrected for genome size, ploidy, and tumour content, which was estimated from the VAF

of clonal SSMs. An assumption for this analysis is that the EBV genome copies are

evenly distributed among the BL cells. The average EBV genome copy number per

tumour cell was 46 (range 13–189). Considering only EBV-positive tumours (N = 71), I
performed Spearman correlation tests for AICDA expression (Figure 3.6A) and
genome-wide mutation burden (Figure 3.6B). In both cases, EBV genome copy number
did not correlate (P-values = 0.20 and 0.79, respectively), suggesting that the magnitude

of these effects is not related to the number of EBV copies per tumour cell.
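The copy-number estimate described above can be written out as a short calculation. The Python sketch below is one plausible formulation, not the exact procedure used here: it assumes that read counts per base pair scale with the number of genome copies sequenced, that normal cells carry no EBV, and that EBV copies are evenly distributed among tumour cells. The function name, parameter names, and default genome sizes (roughly 172 kb for EBV and 3.1 Gb for the haploid human genome) are illustrative assumptions.

```python
def ebv_copies_per_tumour_cell(
    ebv_reads,
    human_reads,
    tumour_content,
    tumour_ploidy=2.0,
    ebv_genome_size=172_000,          # assumed approximate EBV genome length (bp)
    human_haploid_genome_size=3.1e9,  # assumed approximate haploid human genome (bp)
    normal_ploidy=2.0,
):
    """Estimate the mean number of EBV genomes per tumour cell from read counts."""
    if not 0 < tumour_content <= 1:
        raise ValueError("tumour_content must be in (0, 1]")
    # Reads per base pair are proportional to the number of copies sequenced.
    ebv_depth = ebv_reads / ebv_genome_size
    human_depth = human_reads / human_haploid_genome_size
    # Average number of haploid human genome copies per cell in the mixed sample.
    haploid_copies_per_cell = (
        tumour_content * tumour_ploidy + (1 - tumour_content) * normal_ploidy
    )
    # EBV genomes per average sampled cell (tumour and normal cells pooled).
    ebv_per_sampled_cell = ebv_depth / human_depth * haploid_copies_per_cell
    # Only tumour cells are assumed to carry EBV, so rescale by tumour content.
    return ebv_per_sampled_cell / tumour_content


# Hypothetical read counts for a pure, diploid tumour sample.
print(ebv_copies_per_tumour_cell(3_956_000, 3.1e9, tumour_content=1.0))
```

Lower tumour content increases the estimate for the same read counts, because the same number of EBV reads must then originate from fewer cells.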

Figure 3.6: Correlation between EBV genome copy number per tumour cell and (A) AICDA expression (Spearman correlation test: r = 0.052, P = 0.679) or (B) genome-wide mutation burden (Spearman correlation test: r = 0.15, P = 0.22).

3.2.6 Genetic comparison of intra-abdominal and head-only tumours

As mentioned in Chapter 1, one of the most striking differences between endemic and

sporadic cases is the anatomic site affected by the tumour. Endemic cases mostly present

with jaw tumours while facial tumours are exceedingly rare in the sporadic setting; rather,

sporadic cases tend to present with abdominal tumours. Thus, I investigated whether

there were underlying molecular differences that could account for this contrast. While

differential gene expression analysis might seem suitable for this purpose, I encountered

many limitations of such an approach. Notably, normal tissue contamination from adjacent

and stromal cells would render it impossible to confidently assign any differences to the

tumour cells in bulk RNA-seq. To avoid this issue, I focused on somatic genetic features

unique to the tumours. I compared the mutation incidence of every BLG and pathway

considered in Chapter 2 between tumours affecting different anatomical sites.

For this analysis, I selected 65 cases that were confidently annotated as facial or

intra-abdominal tumours without lymph node involvement. Unfortunately, the ICGC cases

did not provide sufficient clinical metadata, which limited the number of sporadic cases

included in this analysis. The breakdown was 35 cases with jaw tumours and 30 cases

with abdominal disease (Figure 3.7A,B). As expected, 61% of endemic cases presented

with facial tumours, while no sporadic cases were annotated as such. No genes or

pathways had mutations that were significantly associated with anatomic site (Q-values >
0.1, Fisher’s exact test; Figure 3.7C). That being said, one gene, FBXO11, had a Q-value

of 0.12, indicating that there might be merit to this analysis, but I may have been

ultimately limited by the sample size.

3.2.7 Variable distribution of MYC breakpoints in BL subtypes

A known genetic feature of BL that warrants revisiting here is the variable distribution of

breakpoints affecting the MYC locus that are associated with an IG locus. As described in

Chapter 1, MYC breakpoints in sporadic cases are proximal to the TSS while they are

much more dispersed relative to MYC in endemic cases. I can recapitulate this result with

my data by comparing the absolute distance between the IG-MYC breakpoint on
chromosome 8 and the MYC TSS among BL subtypes. Endemic and sporadic tumours as
well as EBV-positive and EBV-negative tumours both showed significant differences in the

Figure 3.7: Genetic comparison of anatomic BL subtypes. (A) Number of endemic and sporadic cases per anatomic subtype (head-only or intra-abdominal disease). (B) Number of EBV-positive and EBV-negative cases per anatomic subtype. (C) Differential incidence of non-synonymous mutations in anatomic BL subtypes (Ref: head-only disease), plotted as −log10(Q-value) against log10(odds ratio); FBXO11 is labelled. Mutations are restricted to those affecting BLGs. Significant differences are highlighted in red (Q-values < 0.1, indicated by dashed line; Fisher’s exact test).

Table 3.4: Linear regression of the distance between MYC and the associated translocation breakpoint on chromosome 8 (in kilobases) as a function of tumour EBV status and clinical variant status. This linear model was also bootstrapped 10,000 times to calculate bootstrap 95% confidence intervals (CI).

Term                             Coefficient  Standard error  Bootstrap 95% CI (N = 10000)  P-value
Clinical variant (Ref: Endemic)  14           43              −140 to 180                   0.76
EBV status (Ref: EBV-positive)   53           42              −210 to 99                    0.21

distance between the breakpoint and the MYC TSS (P-values = 0.0077 and 0.0099,
respectively; Mann–Whitney U test). However, linear regression was unable to assign this
variation to one classification system over the other (P-values = 0.76 and 0.21,
respectively; Table 3.4). These findings recapitulate what has been described previously,
but it remains unclear whether tumour EBV status is a relatively more important factor in
determining the IG-MYC breakpoint location.

3.2.8 V gene usage not determined by tumour EBV status

In Chapter 2, I demonstrated that V gene usage was non-uniform for both heavy and light

IG chains. However, it was not clear whether specific antigens were eliciting the inclusion

of those V genes that were overrepresented among dominant clonotypes. Given the

polymicrobial origins of BL, namely the exposure to EBV and malaria, I investigated

whether a link existed between the presence of certain V genes and that of specific

pathogens. Here, I used the geographically defined clinical variants as a proxy for malaria
status with the assumption that most, if not all, endemic cases were infected at least once
by malaria. I also considered tumour EBV status as well as EBV genome type among the
EBV-positive cases. However, I found no significant difference in the prevalence of any of
the considered V genes between the various BL subtypes (Figure 3.8). The inconclusive
nature of these findings may not be surprising given that this IG repertoire analysis relied
on RNA-seq rather than the more conventional high-depth targeted sequencing of the

CDR3 region. Further work on the BL repertoire of IG clonotypes is warranted.

3.3 Materials and methods

This chapter relies on the same dataset presented in Chapter 2. Similarly, most data

analyses were described in Chapter 2. The analytical methods that are specific to this

chapter are detailed below.

3.3.1 Data analysis

McNemar’s tests

Discordant cases were defined as EBV-negative endemic BL cases and EBV-positive
sporadic BL cases. Differentially mutated genes and pathways (referred to here as
features) were identified using the following criteria: (1) they must be mutated in at least
10% of cases, and (2) they must be differentially mutated between EBV-positive endemic BL
cases and EBV-negative sporadic BL cases (Q-value < 0.1, Fisher’s exact test).
Discordant cases were excluded from the Fisher’s exact tests to ensure that there is no
reason to believe a priori that the mutation status of these features is preferentially
associated with tumour EBV status or clinical variant. Following that, tumour EBV status
and clinical variant were each used as a naive predictor of the mutation status of these
differentially mutated features, and I determined whether each prediction was correct for
each case. The performance of tumour EBV status and clinical variant as predictors was
compared using McNemar’s tests. Features with a significant difference according to the

Figure 3.8: Immunoglobulin V gene usage per BL subtype. Percent prevalence of immunoglobulin V genes among dominant IG rearrangements in BL tumours with RNA-seq data (N = 106), shown for the IGH, IGK, and IGL loci and compared by clinical variant, EBV status, and EBV type. V genes displayed: IGHV4-34, IGHV3-30, IGHV3-7, IGHV4-59, IGHV3-23, IGHV3-15, IGHV3-21, IGHV4-39, IGHV3-48, IGKV3-20, IGKV1-39, IGKV1-5, IGKV4-1, IGKV3-15, IGKV3-11, IGKV1-33, IGLV3-25, IGLV1-51, IGLV2-14, IGLV1-44, IGLV1-40, and IGLV3-19. V genes that are dominant in fewer than 10 BL tumours in the RNA-seq data are not displayed.

McNemar’s test (P-value < 0.05) indicate that the “winning” predictor is more strongly

associated with the mutation status of said features.
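The predictor comparison described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the original analysis code: each naive predictor is assumed to call a feature "mutated" for the subtype in which it is enriched (EBV-negative or sporadic tumours, respectively), the function names are hypothetical, and the chi-squared form of McNemar's test with a continuity correction is used, which is the default behaviour of `mcnemar.test` in R. The example counts are the apoptosis rows of Table 3.2.

```python
import math


def mcnemar_test(only_first_correct, only_second_correct, continuity=True):
    """McNemar's chi-squared test (1 df) on the two disagreement counts."""
    b, c = only_first_correct, only_second_correct
    if b + c == 0:
        return 0.0, 1.0
    difference = abs(b - c) - (1 if continuity else 0)
    statistic = max(difference, 0) ** 2 / (b + c)
    # Survival function of a chi-squared variable with one degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


def count_disagreements(cases):
    """Count cases where exactly one of the two naive predictors is right.

    Each case is a tuple (ebv_negative, sporadic, mutated) of booleans.
    """
    ebv_only = variant_only = 0
    for ebv_negative, sporadic, mutated in cases:
        ebv_correct = ebv_negative == mutated
        variant_correct = sporadic == mutated
        if ebv_correct and not variant_correct:
            ebv_only += 1
        elif variant_correct and not ebv_correct:
            variant_only += 1
    return ebv_only, variant_only


# Apoptosis pathway counts from Table 3.2 as (EBV-negative, sporadic, mutated).
cases = (
    [(False, False, True)] * 27 + [(False, False, False)] * 63  # EBV-positive endemic
    + [(True, True, True)] * 13 + [(True, True, False)] * 5     # EBV-negative sporadic
    + [(False, True, True)] * 1 + [(False, True, False)] * 3    # EBV-positive sporadic
    + [(True, False, True)] * 8                                 # EBV-negative endemic
)
ebv_only, variant_only = count_disagreements(cases)
statistic, p_value = mcnemar_test(ebv_only, variant_only)
print(ebv_only, variant_only, statistic, p_value)
```

Only the discordant cases contribute to the test, since both predictors make the same call for every concordant case. Under these assumptions the apoptosis counts give a P-value of roughly 0.009, in line with the value reported in Table 3.2.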

Data and statistical analyses

Data and statistical analyses were done using the R statistical programming language

(version 3.4.2).216 Mann–Whitney U tests, Fisher’s exact tests, and McNemar’s tests

were used where appropriate with the wilcox.test, fisher.test, and mcnemar.test functions

in R, respectively. Linear regressions were performed using the lm function in R and

bootstrapped 10,000 times to calculate bootstrap 95% confidence intervals using the boot

and boot.ci functions in R (adjusted bootstrap percentile interval).
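The bootstrap step can be illustrated with a simplified Python sketch. Two simplifications are assumed and should be kept in mind: it uses the plain percentile interval rather than the adjusted bootstrap percentile interval named above, and it fits a single-covariate regression rather than the multi-covariate models. The function names and toy data are hypothetical.

```python
import random


def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs (with an intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def bootstrap_percentile_ci(xs, ys, n_resamples=10_000, level=0.95, seed=0):
    """Percentile bootstrap CI for the slope, resampling cases with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    estimates = []
    while len(estimates) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]
        boot_x = [xs[i] for i in idx]
        if max(boot_x) == min(boot_x):
            continue  # slope is undefined when all resampled x values coincide
        boot_y = [ys[i] for i in idx]
        estimates.append(ols_slope(boot_x, boot_y))
    estimates.sort()
    alpha = (1 - level) / 2
    lower = estimates[int(alpha * n_resamples)]
    upper = estimates[min(n_resamples - 1, int((1 - alpha) * n_resamples))]
    return lower, upper


# Toy data: a linear trend with a small alternating perturbation.
xs = list(range(20))
ys = [2 * x + (0.5 if x % 2 == 0 else -0.5) for x in xs]
print(ols_slope(xs, ys), bootstrap_percentile_ci(xs, ys, n_resamples=2000))
```

Resampling whole cases, rather than residuals, keeps each tumour's covariates and response together, which is the more forgiving choice when the error variance may differ between subtypes.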

Chapter 4

Discussion and future directions

BL is considered curable with intensive chemotherapy. In practice though, BL patients

suffer from severe side effects due to treatment-related toxicity, and many still die from the

disease or treatment complications. Currently, cure rates above 90% are only achievable

in children who have access to proper supportive care, consisting mostly of paediatric

sporadic cases. However, these fortunate patients represent only a minority of BL burden

worldwide considering the incidence of endemic cases, whose survival ranges from 45% to

70%.48,52,53 This reality motivated the genetic and molecular characterization of paediatric

endemic and sporadic BL presented in this thesis. Hereafter, I will discuss the main

findings from earlier chapters and their implications for the future of BL research.

4.1 De novo mutational signatures

The mutational landscape of BL is not uniform among BL tumours, as revealed by WGS.

Broadly speaking, the overall mutation burden was higher in endemic or EBV-positive

tumours, suggesting underlying differences in the mutational processes active in these

subtypes. In an attempt to understand the biological basis for these differences, I found

the genomes contained variable representations of four robust de novo mutational

signatures, each of which should be associated with a distinct aetiology. Based on

similarity to the reference COSMIC signatures, BL signatures A through D were

respectively attributed to age, an unknown mechanism, defective DNA MMR, and AICDA

activity. Given that only paediatric cases were considered here, it is not surprising that

there was no difference in the prevalence of the age-related BL signature A on the basis

of geographic origin or tumour EBV status. On the other hand, the three other signatures

were all more prevalent in endemic or EBV-positive tumour genomes. Therefore, the

associated aetiology of each of these three signatures may account for the observed

variation in mutation burden across the discovery cohort. In other words, if the inferred

mechanisms are correct, most of the difference in mutation load can be explained by a

lack of DNA MMR and increased AICDA activity.

To refine this model of mutagenesis in BL, I used linear regression to assign variation in

the prevalence of these signatures to covariates such as geographic origin, tumour EBV

status, patient age, and tumour AICDA expression. The robustness of the mutation

signatures was confirmed by a strong association between BL signature A and age at

diagnosis, consistent with the signature’s presumed aetiology. In contrast, BL signature B

remained wholly unaccounted for given that it was not associated with any of the included

covariates. That being said, the lack of correlation with AICDA expression indicates that

this signature is not related to AICDA activity. Interestingly, the MMR-related BL signature

C was significantly associated with tumour EBV status but not geographic origin. This is

consistent with a model wherein the presence of EBV results in an accumulation of

mutations due to insufficient or aberrant DNA repair. This suggests that the genomes are

in a more fragile state and raises the potential utility of DNA-damaging chemotherapy in
the context of EBV-positive BL. A link between EBV and DNA repair was reported in one

study, which described a loss of H3K4 trimethylation of DNA repair signalling genes due

to EBV in nasopharyngeal epithelial cells.258 This highlights the need to more thoroughly

characterize the BL epigenome in the context of EBV status, which has not been explored

to the same degree as the genome and transcriptome. In this case, DNA methylation

assays comparing EBV-positive and EBV-negative tumours could reveal the role for EBV

in genome and epigenome maintenance.

Lastly, the aetiology for BL signature D was confirmed by a linear correlation with AICDA

expression. After accounting for the contribution of AICDA expression, there was no

association with geographic origin or tumour EBV status. This led me to suspect that

AICDA expression was a confounding variable that is associated with both geographic

origin and tumour EBV status. Indeed, AICDA expression was substantially higher in

endemic or EBV-positive tumours. Given that AICDA was having a strong effect on the

mutational landscape of BL, I employed an approach similar to that used for mutational

signatures to understand the source of variation in expression. Strikingly, most of the

variation in AICDA expression was explained by tumour EBV status, and geographic

origin accounted for the little variation that remained. This finding establishes a strong

association between the presence of EBV and increased AICDA expression, and

consequently an elevation in mutation burden.

4.2 Non-coding mutation peaks

The result of deregulated AICDA activity, or aSHM, was readily observable in the

non-coding space. The BL genomes exhibited mutation patterns previously attributed to
focal enrichment of aSHM activity that have been documented in other B-cell lymphomas.
The identification of non-coding mutation “peaks” was done solely based on mutation
density without any prior knowledge of gene annotations. Yet, among the most commonly
mutated peaks, the majority were either located in one of the three IG loci or near the TSS
of a gene. Corroborating the implication of AICDA, most genes affected by TSS-proximal
peaks were known targets of aSHM in DLBCL (e.g. BACH2, TCL1A); the number of
mutated peaks per patient correlated with AICDA expression; most of the peaks were
almost exclusively mutated in EBV-positive tumours; and the mutations tended to occur in
the AICDA recognition motif.176 Although the bulk of these mutations are likely
passengers, the local enrichment of AICDA-mediated mutations within some of these

peaks may also have functional consequences that benefit the tumours.

The differentiation of passenger and driver mutations is challenging, especially in the

non-coding setting. Among the putative targets of aSHM, I highlighted two potentially
relevant examples of recurrently mutated regulatory elements, namely the PAX5 enhancer
and the PVT1 promoter. Considering the role of PAX5 in B-cell development, future work
will need to clarify whether the mutations affecting the enhancer exert the same effect as
those seen in chronic lymphocytic leukemia.179 As for the PVT1 promoter, there is recent
evidence that this regulatory element acts as a tumour suppressor by insulating intragenic

enhancers from inducing MYC expression.259 The same study also demonstrated that

PVT1 promoter mutations could enhance cancer cell growth, albeit in a distinct cell type,

namely breast cancer cells (Figure 4.1). The mutations I have observed in BL alter a

different TSS of PVT1 than the one studied previously. Furthermore, it is unclear whether

the effect on MYC expression will be similar given that the gene is already constitutively

activated by the translocated IG enhancer in BL. Considering the relative ease of

introducing point mutations compared to producing specific genomic rearrangements, it is

conceivable that these PVT1 promoter mutations are introduced by EBV-induced AICDA
prior to the IG-MYC translocation as a temporary means of promoting growth (Figure 4.2).

In this case, they are expected to remain as a record of a previous driver from an early

progenitor of the malignant clone that ultimately acquired a MYC translocation. I could not

readily test this hypothesis from the bulk sequencing data I had access to in this thesis

given the difficulty of determining mutation timing, especially for structural variations. More

precise methods of determining the presence or absence of these mutations at the

single-cell level could shed light on the chronology of BL progression.

Figure 4.1: Putative mechanism of MYC activation mediated by PVT1 promoter mutations.259 Figure created with BioRender.com.

4.3 Non-synonymous mutations

Despite bearing a greater mutation burden, EBV-positive BL genomes have fewer
putative driver mutations affecting BLGs. Together, these two features may account for
the younger age of onset in EBV-positive (or endemic) cases. More specifically, I found a
relative paucity of non-synonymous mutations in SMARCA4 and CCND3 among
EBV-positive or endemic cases, which has been reported previously.161,163 In other

Figure 4.2: Potential role for PVT1 promoter mutations in BL pathogenesis. Figure created with BioRender.com.

words, the CDK4/6 inhibitor palbociclib would be predicted to be more effective in

EBV-negative or sporadic BL.94 However, these differences are not as striking as the

disparity in the prevalence of mutations affecting genes with roles in apoptosis, namely

TP53, USP7, and CDKN2A. A similar but less pronounced difference exists for TP53

when it is considered alone. Importantly, these differences relating to apoptosis and TP53

are strictly associated with tumour EBV status and not geographic origin. This novel

observation was aided by my discovery of USP7 as a recurrently mutated gene in BL.

This gene encodes a deubiquitinase that counteracts MDM2-mediated ubiquitination and
degradation of TP53 (Figure 4.3).260 Despite its status as an essential gene in one study,
USP7 has the mutational pattern of a tumour suppressor in BL.261

The relevance of USP7 is underscored by its known interaction with the protein encoded

by EBNA1, the only consistently expressed EBV protein in BL.49,262 EBNA1 can disrupt

the interaction between TP53 and USP7, which is predicted to have an effect similar to

non-synonymous variants, namely the loss of TP53 (Figure 4.3).263 These data suggest

that EBV may present an alternative mechanism for disrupting apoptosis in BL in addition

to somatic mutations. Functional experiments would be required to investigate the

interaction between EBNA1 and USP7 in vivo. Preliminary support for this model exists

based on in vitro experiments that have demonstrated that MDM2 is essential for survival

in lymphoblastoid cell lines transformed by EBV.264,265 Although this hypothetical function

for EBNA1 is compelling, I cannot exclude the potential role of other EBV latency or lytic

genes, which may only be transiently expressed such that their expression is not

detectable using bulk RNA-seq. Regardless of the mechanism, the lack of mutations
affecting apoptosis in EBV-positive tumours is consistent with EBV-mediated suppression

of apoptosis in BL cells, which is predicted to alleviate the selective pressure for acquiring

mutations affecting genes involved in this process.

Figure 4.3: Potential role for USP7 mutations and/or EBV-encoded EBNA1 in abrogating apoptosis by enhancing MDM2-mediated degradation of TP53. Figure created with BioRender.com.

This work also extends the emerging theme of chromatin modifiers as recurrently mutated

in B-cell non-Hodgkin lymphomas including BL.96,266 This includes two genes that were

associated with BL for the first time, namely SIN3A and CHD8. SIN3A encodes a

transcriptional repressor that acts through histone deacetylase complexes.267 Its ability to

repress MYC target genes is clearly relevant to BL and consistent with the propensity of

mutations in BL predicted to truncate and thus deactivate the protein (Figure 4.4).267 The

loss of SIN3A-mediated repression of MYC targets is expected to further promote the

fitness of BL cells. The protein encoded by CHD8 can also act as a repressor of

transcription through chromatin regulation, but unlike SIN3A, it achieves this via the

recruitment of histone H1 (Figure 4.5).268 The specific targets of H1 recruitment remain

unclear and thus the contribution of CHD8 to BL pathogenesis warrants further

investigation.

Figure 4.4: Putative mechanism for SIN3A in repressing the expression of MYC target genes.267 Figure created with BioRender.com.

Perhaps the most compelling mutation pattern exemplifying the importance of chromatin

structure in BL biology is the recurrence of mutations affecting members of the SWI/SNF

complex. Similar observations have been made in other cancer types, including other

germinal centre B-cell lymphomas.269,270 In paediatric BL, they represent the most

commonly mutated group of genes other than MYC with a mutation incidence of 59%.

Figure 4.5: Putative mechanism for CHD8 in repressing gene expression by recruiting histone H1 and thereby condensing chromatin.268 Figure created with BioRender.com.

This nucleosome remodelling pathway also exhibits mutually exclusive mutations,

confirming a functional redundancy between variants affecting ARID1A and SMARCA4. In

spite of this functional redundancy, there is a strong contrast between the types of

mutations affecting each gene. Most mutations in ARID1A are predicted to truncate the

protein, consistent with a tumour suppressor role, whereas SMARCA4 is mainly disrupted

by missense variants. Generally speaking, a lack of truncating mutations in favour of

missense mutations is suggestive of an oncogene, especially when the variants are

constrained to certain regions of the protein. Indeed, all missense mutations in SMARCA4

form two visible clusters affecting residues 773–974 (size 202) and 1155–1243 (size 89),

which can be seen in Appendix B (Ensembl transcript ENST00000429416; 1647 residues

in total). That being said, the SWI/SNF complex is described as a tumoursuppressor in

most cancers, the exception thus far being synovial sarcoma.271 Despite these conflicting

observations regarding the role of SMARCA4 in BL pathogenesis, it is clear that the

missense mutations in this gene have a more nuanced effect on the encoded protein than

a simple gene knockout.
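The cluster sizes quoted above follow from simple inclusive-interval arithmetic, and the same arithmetic shows how small a portion of the protein the two clusters span. The following is a minimal sketch for illustration only: the residue coordinates and protein length are those stated in the text, whereas the function and variable names are hypothetical and not part of the analysis pipeline.

```python
# Illustrative arithmetic only; coordinates are those quoted in the text.
PROTEIN_LENGTH = 1647  # residues in Ensembl transcript ENST00000429416

# Inclusive residue ranges of the two SMARCA4 missense clusters.
CLUSTERS = {
    "cluster_1": (773, 974),
    "cluster_2": (1155, 1243),
}


def cluster_size(start, end):
    """Number of residues in an inclusive [start, end] interval."""
    return end - start + 1


def covered_fraction(clusters, protein_length):
    """Fraction of the protein spanned by the (non-overlapping) clusters."""
    total = sum(cluster_size(start, end) for start, end in clusters.values())
    return total / protein_length


if __name__ == "__main__":
    for name, (start, end) in CLUSTERS.items():
        print(name, cluster_size(start, end))
    print(f"{covered_fraction(CLUSTERS, PROTEIN_LENGTH):.1%}")
```

Under these coordinates, the two clusters together cover 291 of 1647 residues, i.e. just under 18% of the protein, which illustrates how constrained the missense variants are.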


Despite their high prevalence, the functional consequence of these mutations has not been explored in the context of paediatric BL. The challenge of studying the SWI/SNF complex largely stems from its ability to have both positive and negative effects on gene expression, which appear dependent on the subunit composition. Notably, in murine preosteoblast cells, ARID1A-containing SWI/SNF complexes were found to repress MYC expression, which could account for the high prevalence of mutations deactivating ARID1A in BL.269,272 In the same model system, MYC transcription was also dependent on ARID1B-containing SWI/SNF complexes, suggesting that the complex may remain important in BL as long as ARID1A is excluded as a subunit. This observation could explain the mutation pattern seen in SMARCA4, namely the lack of truncating mutations, since the encoded protein is a key component of the SWI/SNF complex. Mutations in one of the two clusters in SMARCA4 may disrupt the tertiary or quaternary structure of the complex, potentially by altering protein-protein interfaces. All that being said, without data from more relevant cell lines, these potential mechanisms for mutations affecting the SWI/SNF complex in BL remain hypotheses that need to be tested in future experiments.

Given that the SWI/SNF complex is known to regulate nucleosome remodelling, one possible approach to elucidate the effect of mutations disrupting this complex would be to assess open chromatin. Notably, the assay for transposase-accessible chromatin using sequencing (ATAC-seq) seems an appropriate methodology to apply to BL samples.273 A challenge with this method is the difficulty of applying it to clinical samples such as FF tissue, although recent developments are overcoming this limitation.274 While many of these chromatin modifiers appear to be tumour suppressors, improving our understanding of their role in BL pathogenesis may still reveal therapeutic opportunities that could be exploited, such as synthetic lethality.275 In fact, short hairpin RNA (shRNA) screens have identified promising candidate genes whose knockdowns are synthetic lethal when combined with mutated components of the SWI/SNF complex.271 For example, SMARCA4-mutant cancer cells were highly sensitive to shRNA-mediated depletion of SMARCA2.276 Similarly, in another screen of cancer cell lines, mutations in ARID1A were synthetic lethal in combination with a depletion of ARID1B.277 The dependency of the tumour on other paralogs when one is mutated suggests that they occupy the same position in the complex.271 However, while these paralogs may be “structurally redundant”, developmental data indicate that they are not necessarily functionally redundant. For instance, germline mutations in ARID1B are associated with developmental disorders, demonstrating that it is not functionally redundant with ARID1A.278,279 Hence, while these screens have identified therapeutic opportunities for a large portion of BL tumours, additional work will be required to minimize any toxicity related to the essential role played by these genes and their encoded proteins.

Despite these discoveries, much work remains to be done to fully understand the effect of nonsynonymous driver mutations in BL pathogenesis. Notably, the role of several BLGs remains unknown, including the most commonly mutated gene in BL, DDX3X. Most BLGs appear to be tumour suppressor genes by virtue of their mutation pattern, which may limit the potential utility of knowing their function from a therapeutic standpoint. This work has also focused exclusively on somatic mutations and did not consider the possibility of germline variants due to the difficulty of assessing their pathogenicity, especially in African populations, for which data representing natural genetic variation remain insufficient.280

4.4 B-cell receptor repertoire

Another genomic feature unique to B-cell malignancies is the somatic rearrangement and mutation of the three IG regions for the generation of the heavy and light chains that together form the BCR and secreted antibodies. Previously, I described SHM affecting all three IG regions, an expected physiologic consequence of B cells that have transited through the germinal centre. In BL, I observed a greater mutation burden of the IG loci in EBV-positive tumours, which has been reported previously.139 Although that study ascribed this difference to distinct cells of origin, my data suggest that it can be primarily explained by variation in AICDA expression. I also determined the V, D, and J gene segments that were recombined to generate the expressed IG heavy and light chain alleles. In particular, I explored V gene usage among the clonal rearrangements for each tumour with the hypothesis that some V gene segments may be selected more than others for providing a selective advantage to the tumour. It is worth noting that this analysis is limited by the use of RNA-seq data rather than a more conventional targeted DNA sequencing approach such as adaptive immune receptor repertoire sequencing (AIRR-seq). Nonetheless, the high BCR expression in BL tumours allowed an exploratory analysis of V gene usage.

My findings supported my hypothesis that some V genes were overrepresented among the clonal IG rearrangements. This complements existing data demonstrating the importance of BCR signaling in BL, thus supporting the clinical use of inhibitors for PI3K, Syk and Src family kinases.94 Of the commonly used heavy chain V genes, IGHV4-34 is the best characterized with an established role in autoreactivity.281,282 This potentially reveals an alternative or complementary approach for sustaining BCR activation in BLs, in addition to genetic alterations that increase BCR expression via TCF3 or ID3 mutations.94 Previous reports have suggested a possible role for superantigens in BL.283–285 Interestingly, the most commonly observed clonal light chain V gene was IGKV3-20. Preferential IGKV3-20 usage has been observed in other B-cell non-Hodgkin lymphomas, especially in those linked to hepatitis C virus (HCV) infection.286 To my knowledge, this is the first time that biased usage of IGKV3-20 is described in BL, which features one of the highest frequencies of IGKV3-20 usage among HCV-negative B-cell malignancies. If this preliminary observation is confirmed in a larger study, BL patients could benefit from emerging BCR-directed vaccines that target IGKV3-20 peptides.286

4.5 Epstein–Barr virus

Since the initial observation of EBV in the tumour cells of BL patients 55 years ago, the effect of the virus on B cells has been the focus of many studies.18 Its ability to immortalize B cells in vitro is certainly indicative of a role for EBV in BL pathogenesis, and yet its functional role remains elusive to this day.287 The lack of progress in this area can be partly attributed to the challenge of reliably modelling EBV-positive BL in an experimental setting.135 The difficulty stems from the fact that EBV adopts different gene expression programs depending on the context, especially in response to the immune system.111 Generally speaking, the greater the immune surveillance, the fewer genes EBV will express in order to avoid detection. For this reason, observations of EBV behaviour in cell lines, even those derived from BL patients, cannot be readily generalized to infer its behaviour in lymphomagenesis. The application of high-throughput sequencing to clinical BL samples aims to overcome this challenge by studying the differences in tumour biology between EBV-positive and EBV-negative samples.

One of the major findings presented in this thesis is a compelling association between EBV and AICDA activity. A link between the two has long been hypothesized but with a paucity of evidence from in vivo studies.288 The present work addresses this lack of data by showing increased AICDA expression in EBV-positive BL and concomitant aSHM. While these data are unable to distinguish between correlation and causation, they are consistent with in vitro experiments that have demonstrated a causative link.140,141 This relationship between EBV and AICDA is important given that aSHM is thought to promote the double-strand breaks that lead to the hallmark IG-MYC translocation.289–293 Also, I and others have found that this process introduces mutations in BL-associated genes such as ID3.97 It is worth noting that other studies have demonstrated increases in AICDA expression due to malaria infection.148,149,294 This may explain the weak albeit significant association between AICDA expression and geographic origin in the linear regression described earlier. If this is the case, these data suggest that either EBV has a stronger influence on the transcriptional regulation of AICDA than malaria or its effect on AICDA may be longer-lasting than that of malaria. By mediating this effect on AICDA, EBV and potentially malaria promote the accumulation of potential driver mutations in BL.

Another key finding is the depletion of mutations altering genes with roles in apoptosis in EBV-positive tumours. The lack of difference based on geographic origin strengthens the evidence that EBV disrupts apoptosis, which is not a new idea.288 If the mechanism I proposed earlier, whereby EBNA1 interacts with USP7 to cause TP53 degradation, is validated, this would point to MDM2 inhibitors as a valid treatment approach in TP53 wild-type patients with either EBV infection or USP7 mutations. That being said, other studies have suggested alternative mechanisms based on in vitro work. For instance, the apoptosis regulator CASP3 can be targeted by EBV miRNAs to abrogate the pathway.133,295–299 The mechanistic details for the effect in BL must be elucidated in future functional experiments in order to pave the way for the development of therapies targeting EBV.

Accordingly, the fact that MYC-translocated cells undergo apoptosis implies that the B cells that initiate EBV-positive BL tumours are virally infected before the IG-MYC rearrangement and thereby protected from a fate of MYC-mediated apoptosis.69 This model can be unified with the fact that EBV induces AICDA expression in these cells, increasing their risk of acquiring double-strand breaks and promoting the formation of this fundamental translocation. EBV-negative tumours are expected to follow a similar progression, except that they acquire the mutations necessary to disrupt apoptosis as early events prior to the MYC translocation rather than relying on EBV.

It is worth acknowledging that roughly 30% of EBV-positive tumours also have mutations affecting apoptosis. It remains an open question whether these mutations came before or after EBV infection since my bulk sequencing data cannot accurately resolve mutation timing. Furthermore, given that the viral genome is maintained as an episome in tumour cells and can be spontaneously lost during cell division, I expect EBV to be depleted from the tumour cell population unless the virus provides a competitive advantage (Figure 4.6).116,146,300,301 In fact, the immunogenicity of EBV may accelerate this depletion by exerting a selective pressure against EBV-positive cells in favour of cells that can survive without EBV.302 In other words, if the oncogenic role of the virus is restricted to abrogating apoptosis, BL tumours should become EBV-independent following the acquisition of mutations affecting apoptosis. Given the highly proliferative nature of BL, I would expect a rapid transition between the EBV-positive and EBV-negative subclones, which may have been witnessed in at least one case.303 Accordingly, the existence of EBV-positive tumours that also bear mutations affecting apoptosis suggests that additional oncogenic roles are played by EBV in BL pathogenesis.
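The expectation illustrated in Figure 4.6 can be made concrete with a toy model. The following is a minimal sketch, not an analysis performed in this thesis: it assumes an effectively infinite, well-mixed tumour cell population, a fixed probability that a dividing EBV-positive cell yields an episome-free daughter, and a constant growth advantage of EBV-positive cells relative to EBV-negative cells. The function name, the loss rate of 2% per generation and the fitness values are hypothetical and chosen only for illustration; immune selection against EBV-positive cells is ignored.

```python
def ebv_positive_fraction(generations, loss_rate, relative_fitness, start=1.0):
    """Deterministic toy model of episome loss in a tumour cell population.

    generations      -- number of cell generations to simulate
    loss_rate        -- hypothetical fraction of EBV-positive output that
                        loses the episome each generation
    relative_fitness -- growth of EBV-positive cells relative to
                        EBV-negative cells (1.0 means no advantage)
    start            -- initial fraction of EBV-positive cells

    Returns the EBV-positive fraction at generation 0, 1, ..., generations.
    """
    fraction = start
    trajectory = [fraction]
    for _ in range(generations):
        # EBV-positive cells grow by `relative_fitness`; a share of their
        # progeny loses the episome and joins the EBV-negative pool.
        positive = fraction * relative_fitness * (1.0 - loss_rate)
        negative = (1.0 - fraction) + fraction * relative_fitness * loss_rate
        fraction = positive / (positive + negative)
        trajectory.append(fraction)
    return trajectory


if __name__ == "__main__":
    neutral = ebv_positive_fraction(200, loss_rate=0.02, relative_fitness=1.0)
    advantaged = ebv_positive_fraction(200, loss_rate=0.02, relative_fitness=1.1)
    print(f"no advantage:   {neutral[-1]:.3f}")
    print(f"with advantage: {advantaged[-1]:.3f}")
```

In this sketch, the EBV-positive fraction decays geometrically towards zero when the virus confers no advantage, whereas it settles at a stable non-zero level whenever the growth advantage outweighs the loss rate. This mirrors the qualitative argument above: persistence of EBV in tumours that already carry mutations affecting apoptosis points to an additional benefit conferred by the virus.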

The clear genetic and molecular distinctions between EBV-positive and EBV-negative BL identified in this thesis reveal a multifaceted role for the virus in Burkitt lymphomagenesis and shed new light on mechanisms behind EBV carcinogenicity (Figure 4.7). Based on my results, it may be more accurate to describe BL tumours as EBV-dependent or EBV-independent. Importantly, tumour EBV status appears to be a more clinically relevant criterion for BL classification than geographic origin given the pathogenic differences and associated implications for treatment. This reliance on EBV gene expression represents a potential vulnerability and nominates EBV as a therapeutic target. These data motivate the development of methods for targeting EBV, including EBV vaccines, small-molecule inhibitors, or drugs that trigger lytic gene expression to elicit an immune response.304–306


Figure 4.6: Expected outcome from spontaneous loss of EBV during cell division depending on the role played by the virus. Figure created with BioRender.com.

4.6 Hit-and-run hypothesis

The idea of a transient reliance on EBV until somatic mutations are in place to provide the same oncogenic benefits has been proposed as the “hit-and-run” mechanism.302 According to this hypothesis, some (or all) EBV-negative tumours were originally EBV-positive. In BL, this theory has some support from work that demonstrated the presence of subclonal EBV “traces” in what would be considered EBV-negative tumours using standard diagnostic tests.307 Based on the data in this thesis, the acquisition of mutations disrupting apoptosis appears insufficient to enable the transition to EBV independence. Notably, the EBV genome copy number is not lower in tumours with these mutations, as would be expected if the tumours were undergoing the transition at the time of biopsy (data not shown). A potential limitation is that insufficient time may have elapsed since the acquisition of these mutations. That being said, I do not observe a difference in the VAF of SSMs affecting TP53 or USP7 based on tumour EBV status. In other words, the mutations have had enough time to become clonal, and despite this, EBV was not lost to an appreciable degree. These data suggest that EBV confers a growth advantage that goes beyond abrogating apoptosis and inducing AICDA-mediated mutagenesis. For example, EBV may be regulating other important pathways such as the BCR-PI3K-AKT signalling axis via miRNA-mediated repression of PTEN.308

Figure 4.7: Putative model for BL pathogenesis. On their own, MYC translocations are expected to trigger apoptosis. Alternatively, if mutations disrupt apoptosis (e.g. TP53 mutations) before the MYC translocation, this can give rise to an EBV-negative BL precursor cell. My data show that EBV can act in place of mutations affecting apoptosis. Furthermore, the observed increase in AICDA activity associated with EBV infection is expected to promote the formation of MYC translocations. Altogether, this can give rise to an EBV-positive BL precursor cell. The existence of EBV-positive tumours with mutations affecting apoptosis indicates other roles played by EBV. The possibility of a hit-and-run mechanism, whereby BL cells acquire mutations that obviate the need for EBV and subsequently lose EBV from the cell population, remains an open question. *, other genetic lesions can disrupt apoptosis. Figure created with BioRender.com.
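The comparison of TP53 and USP7 VAFs by tumour EBV status described in this section amounts to a two-sample test. The following is a minimal sketch of one way such a comparison could be made, namely a permutation test on the difference in mean VAF. It is not necessarily the method used in this thesis, and the function name, the number of permutations and the toy VAF values are hypothetical and serve only as an illustration.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Returns an estimated p-value: the proportion of random relabellings
    whose absolute mean difference is at least as large as the observed
    one (with the usual +1 correction so the estimate is never zero).
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed - 1e-12:  # tolerance for floating-point ties
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical toy VAFs, NOT values from this study.
    vaf_ebv_positive = [0.41, 0.45, 0.48, 0.50, 0.44]
    vaf_ebv_negative = [0.43, 0.47, 0.46, 0.49, 0.42]
    print(permutation_test(vaf_ebv_positive, vaf_ebv_negative))
```

A permutation test was chosen for this sketch because it makes no distributional assumption about VAFs, which are bounded between 0 and 1 and often skewed. A large p-value would be consistent with the absence of a VAF difference by EBV status, although it would not by itself demonstrate equivalence.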

Since the hit-and-run hypothesis was first proposed, it has been recognized that devising a strategy to demonstrate the former presence (and ideally, implication) of EBV in an EBV-negative tumour would be challenging.302 This question could be resolved by tracking the evolution of the tumour during the transition to EBV independence. The experimental design adopted in this study is not amenable to this approach because bulk sequencing prevents the assignment of mutations to EBV-positive or EBV-negative subclones. However, newer technologies, such as single-cell sequencing, might offer a means to overcome this limitation. For example, the use of single-cell RNA-seq could reveal heterogeneous EBV gene expression that would not be observable using bulk sequencing. It is conceivable that a small subset of EBV-infected BL cells expresses oncogenic EBV proteins other than EBNA1 to promote tumour growth, potentially by transiently inducing cell cycle progression or modulating the microenvironment. This pattern could easily be missed using bulk RNA-seq, especially given the high expression of some cellular genes including MYC. Critically, single-cell DNA sequencing could provide key insight into the chronology of BL progression. This approach could detect a minor EBV-positive clone in an otherwise EBV-negative tumour and allow the genetic comparison of these subclones. Any acquired molecular alterations could reveal the steps required for BL to evolve beyond its reliance on EBV and minimize detection by the immune system. Although clearly beyond the scope of this thesis, the resolution of whether (and how) EBV participates in hit-and-run oncogenesis remains an open and enticing question in this field and may be resolved with emerging genomic technologies.


Bibliography

1. Grande BM, Gerhard DS, Jiang A, Griner NB, Abramson JS, Alexander TB, et al. Genome-wide discovery of somatic coding and noncoding mutations in pediatric endemic and sporadic Burkitt lymphoma. Blood. 2019 Jan;blood–2018–09–871418.

2. Rowe M, Kelly GL, Bell AI, Rickinson AB. Burkitt’s lymphoma: the Rosetta Stone deciphering Epstein-Barr virus biology. Semin Cancer Biol. 2009 Dec;19(6):377–88.

3. Poirel HA, Ambrosio MR, Piccaluga PP, Leoncini L. Pathology and Molecular Pathogenesis of Burkitt Lymphoma and Lymphoblastic Lymphoma. In: Lenz G, Salles G, editors. Aggressive Lymphomas. Cham: Springer International Publishing; 2019. pp. 75–94.

4. Burkitt D. A sarcoma involving the jaws in African children. Br J Surg. 1958 Nov;46(197):218–23.

5. O’Conor GT, Davies JNP. Malignant tumors in African children: With special reference to malignant lymphoma. J Pediatr. 1960 Apr;56(4):526–35.

6. Burkitt D, O’Conor GT. Malignant lymphoma in African children. I. A clinical syndrome. Cancer. 1961 Mar;14(2):258–69.

7. Orem J, Mbidde EK, Lambert B, Sanjose S de, Weiderpass E. Burkitt’s lymphoma in Africa, a review of the epidemiology and etiology. Afr Health Sci. 2007 Sep;7(3):166–75.

8. Stefan DC, Lutchman R. Burkitt lymphoma: epidemiological features and survival in a South African centre. Infect Agent Cancer. 2014 Jun;9:19.

9. Pannone G, Zamparese R, Pace M, Pedicillo MC, Cagiano S, Somma P, et al. The role of EBV in the pathogenesis of Burkitt’s Lymphoma: an Italian hospital based survey. Infect Agent Cancer. 2014 Oct;9(1):34.

10. Seldam REJT, Cooke R, Atkinson L. Childhood lymphoma in the territories of Papua and New Guinea. Vol. 19, Cancer. 1966. pp. 437–46.

11. Burkitt D. A “tumour safari” in East and Central Africa. Br J Cancer. 1962 Sep;16:379–86.

12. Burkitt D. Determining the climatic limitations of a children’s cancer common in Africa. Br Med J. 1962 Oct;2(5311):1019–23.

13. Burkitt D. A Lymphoma Syndrome in African Children. Royal College of Surgeons of England; 1961.

14. Burkitt DP, Davies JNP. Lymphoma syndrome in Uganda and tropical Africa. Med Press. 1961;245:367–9.


15. Harris RJ. Aetiology of Central African Lymphomata. Br Med Bull. 1964 May;20:149–53.

16. Dalldorf G. Lymphomas of African children with different forms or environmental influences. JAMA. 1962 Sep;181:1026–8.

17. Burkitt DP. Charles S. Mott Award. The discovery of Burkitt’s lymphoma. Vol. 51, Cancer. 1983. pp. 1777–86.

18. Epstein MA, Achong BG, Barr YM. Virus Particles in Cultured Lymphoblasts from Burkitt’s Lymphoma. Lancet. 1964 Mar;283(7335):702–3.

19. Epstein MA, Achong BG, Pope JH. Virus in cultured lymphoblasts from a New Guinea Burkitt lymphoma. Br Med J. 1967 Apr;2(5547):290–1.

20. Henle G, Henle W, Diehl V. Relation of Burkitt’s tumor-associated herpes-type virus to infectious mononucleosis. Proc Natl Acad Sci U S A. 1968 Jan;59(1):94–101.

21. Henle G, Henle W. Immunofluorescence in cells derived from Burkitt’s lymphoma. J Bacteriol. 1966 Mar;91(3):1248–56.

22. Levy JA, Henle G. Indirect immunofluorescence tests with sera from African children and cultured Burkitt lymphoma cells. J Bacteriol. 1966 Jul;92(1):275–6.

23. Piriou E, Asito AS, Sumba PO, Fiore N, Middeldorp JM, Moormann AM, et al. Early age at time of primary Epstein-Barr virus infection results in poorly controlled viral infection in infants from Western Kenya: clues to the etiology of endemic Burkitt lymphoma. J Infect Dis. 2012 Mar;205(6):906–13.

24. de-Thé G, Geser A, Day NE, Tukei PM, Williams EH, Beri DP, et al. Epidemiological evidence for causal relationship between Epstein-Barr virus and Burkitt’s lymphoma from Ugandan prospective study. Nature. 1978 Aug;274(5673):756–61.

25. Burkitt DP. Etiology of Burkitt’s Lymphoma—an Alternative Hypothesis to a Vectored Virus. J Natl Cancer Inst. 1969 Jan;42(1):19–28.

26. Morrow RH, Kisuule A, Pike MC, Smith PG. Burkitt’s Lymphoma in the Mengo Districts of Uganda: Epidemiologic Features and Their Relationship to Malaria. J Natl Cancer Inst. 1976 Mar;56(3):479–83.

27. Williams AO. Haemoglobin genotypes, ABO blood groups, and Burkitt’s tumour. J Med Genet. 1966 Sep;3(3):177–9.

28. Pike MC, Morrow RH, Kisuule A, Mafigiri J. Burkitt’s lymphoma and sickle cell trait. Br J Prev Soc Med. 1970 Feb;24(1):39–41.

29. Moormann AM, Snider CJ, Chelimo K. The company malaria keeps: how co-infection with Epstein-Barr virus leads to endemic Burkitt lymphoma. Curr Opin Infect Dis. 2011 Oct;24(5):435–41.


30. Emmanuel B, Kawira E, Ogwang MD, Wabinga H, Magatti J, Nkrumah F, et al. African Burkitt lymphoma: age-specific risk and correlations with malaria biomarkers. Am J Trop Med Hyg. 2011 Mar;84(3):397–401.

31. Burkitt D, Wright D. Geographical and tribal distribution of the African lymphoma in Uganda. Br Med J. 1966 Mar;1(5487):569–73.

32. O’Conor GT. Malignant lymphoma in African children. II. A pathological entity. Cancer. 1961 Mar;14(2):270–83.

33. O’Conor GT, Rappaport H, Smith EB. Childhood Lymphoma Resembling “Burkitt Tumor” In the United States. Cancer. 1965 Apr;18:411–7.

34. Doll DC, List AF. Burkitt’s lymphoma in a homosexual. Lancet. 1982 May;1(8279):1026–7.

35. Ziegler JL, Drew WL, Miner RC, Mintz L, Rosenbaum E, Gershow J, et al. Outbreak of Burkitt’s-like lymphoma in homosexual men. Lancet. 1982 Sep;2(8299):631–3.

36. Whang-Peng J, Lee EC, Sieverts H, Magrath IT. Burkitt’s lymphoma in AIDS: cytogenetic study. Blood. 1984 Apr;63(4):818–22.

37. Gong JZ, Stenzel TT, Bennett ER, Lagoo AS, Dunphy CH, Moore JO, et al. Burkitt Lymphoma Arising in Organ Transplant Recipients: A Clinicopathologic Study of Five Cases. Am J Surg Pathol. 2003 Jun;27(6):818–27.

38. Robertson ES, editor. Burkitt’s Lymphoma. Springer, New York, NY; 2013.

39. Swerdlow SH, Campo E, Harris NL, Jaffe ES, Pileri SA, Stein H, et al., editors. WHO Classification of Tumours of Haematopoietic and Lymphoid Tissues. Revised 4th edition. Lyon, France: International Agency for Research on Cancer; 2017. (WHO classification of tumours; vol. 2).

40. Dave SS, Fu K, Wright GW, Lam LT, Kluin P, Boerma EJ, et al. Molecular Diagnosis of Burkitt’s Lymphoma. N Engl J Med. 2006 Jun;354(23):2431–42.

41. Swerdlow SH, Campo E, Pileri SA, Harris NL, Stein H, Siebert R, et al. The 2016 revision of the World Health Organization classification of lymphoid neoplasms. Blood. 2016 May;127(20):2375–90.

42. Magrath I, Adde M, Shad A, Venzon D, Seibel N, Gootenberg J, et al. Adults and children with small non-cleaved-cell lymphoma have a similar excellent outcome when treated with the same chemotherapy regimen. J Clin Oncol. 1996 Mar;14(3):925–34.

43. Adde M, Shad A, Venzon D, Arndt C, Gootenberg J, Neely J, et al. Additional chemotherapy agents improve treatment outcome for children and adults with advanced B-cell lymphomas. Semin Oncol. 1998 Apr;25(2 Suppl 4):33–9; discussion 45–8.

44. Patte C, Auperin A, Michon J, Behrendt H, Leverger G, Frappaz D, et al. The Société Française d’Oncologie Pédiatrique LMB89 protocol: highly effective multiagent chemotherapy tailored to the tumor burden and initial response in 561 unselected children with B-cell lymphomas and L3 leukemia. Blood. 2001 Jun;97(11):3370–9.

45. Costa LJ, Xavier AC, Wahlquist AE, Hill EG. Trends in survival of patients with Burkitt lymphoma/leukemia in the USA: an analysis of 3691 cases. Blood. 2013 Jun;121(24):4861–6.

46. Magrath IT. Treatment of Burkitt lymphoma in children and adults: Lessons from Africa. Curr Hematol Malig Rep. 2006 Dec;1(4):230–40.

47. Molyneux EM, Rochford R, Griffin B, Newton R, Jackson G, Menon G, et al. Burkitt’s lymphoma. Lancet. 2012 Mar;379(9822):1234–44.

48. Buckle G, Maranda L, Skiles J, Ong’echa JM, Foley J, Epstein M, et al. Factors influencing survival among Kenyan children diagnosed with endemic Burkitt lymphoma between 2003 and 2011: A historical cohort study. Int J Cancer. 2016 Sep;139(6):1231–40.

49. Magrath I. Epidemiology: clues to the pathogenesis of Burkitt lymphoma. Br J Haematol. 2012 Mar;156(6):744–56.

50. Mbulaiteye SM, Talisuna AO, Ogwang MD, McKenzie FE, Ziegler JL, Parkin DM. African Burkitt’s lymphoma: could collaboration with HIV-1 and malaria programmes reduce the high mortality rate? Lancet. 2010 May;375(9726):1661–3.

51. Joko-Fru WY, Parkin DM, Borok M, Chokunonga E, Korir A, Nambooze S, et al. Survival from Childhood Cancers in Eastern Africa: A Population-based registry study. Int J Cancer. 2018 Jul;

52. Harif M, Barsaoui S, Benchekroun S, Bouhas R, Doumbé P, Khattab M, et al. Treatment of B-cell lymphoma with LMB modified protocols in Africa–report of the French-African Pediatric Oncology Group (GFAOP). Pediatr Blood Cancer. 2008 Jun;50(6):1138–42.

53. Ngoma T, Adde M, Durosinmi M, Githang’a J, Aken’Ova Y, Kaijage J, et al. Treatment of Burkitt lymphoma in equatorial Africa using a simple three-drug combination followed by a salvage regimen for patients with persistent or recurrent disease. Br J Haematol. 2012 Sep;158(6):749–62.

54. Dunleavy K, Roschewski M, Abramson JS, Link B, Parekh S, Jagadeesh D, et al. Risk-Adapted Therapy in Adults with Burkitt Lymphoma: Updated Results of a Multicenter Prospective Phase II Study of DA-EPOCH-R. Hematol Oncol. 2017 Jun;35:133–4.

55. Sweetenham JW, Pearce R, Taghipour G, Blaise D, Gisselbrecht C, Goldstone AH. Adult Burkitt’s and Burkitt-like non-Hodgkin’s lymphoma–outcome for patients treated with high-dose therapy and autologous stem-cell transplantation in first remission or at relapse: results from the European Group for Blood and Marrow Transplantation. J Clin Oncol. 1996 Sep;14(9):2465–72.


56. Jacobson C, LaCasce A. How I treat Burkitt lymphoma in adults. Blood. 2014 Nov;124(19):2913–20.

57. Murphy K. Janeway’s immunobiology. 9th edition. New York, NY: Garland Science/Taylor & Francis Group, LLC; 2017.

58. Klein U, Klein G, Ehlin-Henriksson B, Rajewsky K, Küppers R. Burkitt’s lymphoma is a malignancy of mature B cells expressing somatically mutated V region genes. Mol Med. 1995 Jul;1(5):495–505.

59. Chapman CJ, Mockridge CI, Rowe M, Rickinson AB, Stevenson FK. Analysis of VH genes used by neoplastic B cells in endemic Burkitt’s lymphoma shows somatic hypermutation and intraclonal heterogeneity. Blood. 1995 Apr;85(8):2176–81.

60. Tamaru J, Hummel M, Marafioti T, Kalvelage B, Leoncini L, Minacci C, et al. Burkitt’s lymphomas express VH genes with a moderate number of antigen-selected somatic mutations. Am J Pathol. 1995 Nov;147(5):1398–407.

61. Victora GD, Dominguez-Sola D, Holmes AB, Deroubaix S, Dalla-Favera R, Nussenzweig MC. Identification of human germinal center light and dark zone cells and their relationship to human B-cell lymphomas. Blood. 2012 Sep;120(11):2240–8.

62. Pasqualucci L, Neumeister P, Goossens T, Nanjangud G, Chaganti RS, Küppers R, et al. Hypermutation of multiple proto-oncogenes in B-cell diffuse large-cell lymphomas. Nature. 2001 Jul;412(6844):341–6.

63. Peters A, Storb U. Somatic hypermutation of immunoglobulin genes is linked to transcription initiation. Immunity. 1996 Jan;4(1):57–65.

64. Fukita Y, Jacobs H, Rajewsky K. Somatic hypermutation in the heavy chain locus correlates with transcription. Immunity. 1998 Jul;9(1):105–14.

65. Pavri R, Gazumyan A, Jankovic M, Di Virgilio M, Klein I, Ansarah-Sobrinho C, et al. Activation-induced cytidine deaminase targets DNA at sites of RNA polymerase II stalling by interaction with Spt5. Cell. 2010 Oct;143(1):122–33.

66. Basso K, Dalla-Favera R. Germinal centres and B cell lymphomagenesis. Nat Rev Immunol. 2015 Mar;15(3):172–84.

67. Dang CV, O’Donnell KA, Zeller KI, Nguyen T, Osthus RC, Li F. The c-Myc target gene network. Semin Cancer Biol. 2006 Aug;16(4):253–64.

68. Meyer N, Penn LZ. Reflecting on 25 years with MYC. Nat Rev Cancer. 2008 Dec;8(12):976–90.

69. Evan GI, Wyllie AH, Gilbert CS, Littlewood TD, Land H, Brooks M, et al. Induction of apoptosis in fibroblasts by c-myc protein. Cell. 1992 Apr;69(1):119–28.

70. Ci W, Polo JM, Cerchietti L, Shaknovich R, Wang L, Yang SN, et al. The BCL6 transcriptional program features repression of multiple oncogenes in primary B cells and is deregulated in DLBCL. Blood. 2009 May;113(22):5536–48.


71. DominguezSola D, Victora GD, Ying CY, Phan RT, Saito M, Nussenzweig MC, et al.The protooncogene MYC is required for selection in the germinal center andcyclic reentry. Nat Immunol. 2012 Nov;13(11):1083–91.

72. Manolov G, Manolova Y. Marker band in one chromosome 14 from Burkittlymphomas. Nature. 1972 May;237(5349):33–4.

73. Jarvis JE, Ball G, Rickison AB, Epstein MA. Cytogenetic studies on humanlymphoblastoid cell lines from Burkitt’s lymphomas and other sources. Int JCancer. 1974 Dec;14(6):716–21.

74. Zech L, Haglund U, Nilsson K, Klein G. Characteristic chromosomal abnormalities inbiopsies and lymphoidcell lines from patients with burkitt and nonburkittlymphomas. Int J Cancer. 1976 Jan;17(1):47–56.

75. Taub R, Kirsch I, Morton C, Lenoir G, Swan D, Tronick S, et al. Translocation of thecmyc gene into the immunoglobulin heavy chain locus in human Burkittlymphoma and murine plasmacytoma cells. Proc Natl Acad Sci U S A. 1982Dec;79(24):7837–41.

76. DallaFavera R, Bregni M, Erikson J, Patterson D, Gallo RC, Croce CM. Humancmyc onc gene is located on the region of chromosome 8 that is translocated inBurkitt lymphoma cells. Proc Natl Acad Sci U S A. 1982 Dec;79(24):7824–7.

77. Adams JM, Harris AW, Pinkert CA, Corcoran LM, Alexander WS, Cory S, et al. Thecmyc oncogene driven by immunoglobulin enhancers induces lymphoidmalignancy in transgenic mice. Nature. 1985;318(6046):533–8.

78. Schüler F, Hirt C, Dölken G. Chromosomal translocation t(14;18) in healthyindividuals. Semin Cancer Biol. 2003 Jun;13(3):203–9.

79. Pelicci PG, Knowles DM 2nd, Magrath I, DallaFavera R. Chromosomal breakpointsand structural alterations of the cmyc locus differ in endemic and sporadic formsof Burkitt lymphoma. Proc Natl Acad Sci U S A. 1986 May;83(9):2984–8.

80. Shiramizu B, Barriga F, Neequaye J, Jafri A, DallaFavera R, Neri A, et al. Patterns ofchromosomal breakpoint locations in Burkitt’s lymphoma: relevance to geographyand EpsteinBarr virus association. Blood. 1991 Apr;77(7):1516–26.

81. Kovalchuk AL, Ansarah-Sobrinho C, Hakim O, Resch W, Tolarová H, Dubois W, et al. Mouse model of endemic Burkitt translocations reveals the long-range boundaries of Ig-mediated oncogene deregulation. Proc Natl Acad Sci U S A. 2012 Jul;109(27):10972–7.

82. Neri A, Barriga F, Knowles DM, Magrath IT, Dalla-Favera R. Different regions of the immunoglobulin heavy-chain locus are involved in chromosomal translocations in distinct pathogenetic forms of Burkitt lymphoma. Proc Natl Acad Sci U S A. 1988 Apr;85(8):2748–52.

83. Basso K, Frascella E, Zanesco L, Rosolen A. Improved long-distance polymerase chain reaction for the detection of t(8;14)(q24;q32) in Burkitt’s lymphomas. Am J Pathol. 1999 Nov;155(5):1479–85.

84. Burmeister T, Schwartz S, Horst HA, Rieder H, Gökbuget N, Hoelzer D, et al. Molecular heterogeneity of sporadic adult Burkitt-type leukemia/lymphoma as revealed by PCR and cytogenetics: correlation with morphology, immunology and clinical features. Leukemia. 2005 Aug;19(8):1391–8.

85. Busch K, Keller T, Fuchs U, Yeh RF, Harbott J, Klose I, et al. Identification of two distinct MYC breakpoint clusters and their association with various IGH breakpoint regions in the t(8;14) translocations in sporadic Burkitt-lymphoma. Leukemia. 2007 Aug;21(8):1739–51.

86. Burmeister T, Molkentin M, Schwartz S, Gökbuget N, Hoelzer D, Thiel E, et al. Erroneous class switching and false VDJ recombination: molecular dissection of t(8;14)/MYC-IGH translocations in Burkitt-type lymphoblastic leukemia/B-cell lymphoma. Mol Oncol. 2013 Aug;7(4):850–8.

87. Robbiani DF, Bothmer A, Callen E, Reina-San-Martin B, Dorsett Y, Difilippantonio S, et al. AID is required for the chromosomal breaks in c-myc that lead to c-myc/IgH translocations. Cell. 2008 Dec;135(6):1028–38.

88. Magrath I. The Pathogenesis of Burkitt’s Lymphoma. In: Vande Woude GF, Klein G, editors. Advances in Cancer Research. Academic Press; 1990. pp. 133–270.

89. Pasqualucci L. Molecular pathogenesis of germinal center-derived B cell lymphomas. Immunol Rev. 2019 Mar;288(1):240–61.

90. Gaidano G, Ballerini P, Gong JZ, Inghirami G, Neri A, Newcomb EW, et al. p53 mutations in human lymphoid malignancies: association with Burkitt lymphoma and chronic lymphocytic leukemia. Proc Natl Acad Sci U S A. 1991 Jun;88(12):5413–7.

91. Eischen CM, Weber JD, Roussel MF, Sherr CJ, Cleveland JL. Disruption of the ARF–Mdm2–p53 tumor suppressor pathway in Myc-induced lymphomagenesis. Genes Dev. 1999 Oct;13(20):2658–69.

92. Schmitt CA, McCurrach ME, Stanchina E de, Wallace-Brodeur RR, Lowe SW. INK4a/ARF mutations accelerate lymphomagenesis and promote chemoresistance by disabling p53. Genes Dev. 1999 Oct;13(20):2670–7.

93. Lindstrom MS, Klangby U, Wiman KG. p14ARF homozygous deletion or MDM2 overexpression in Burkitt lymphoma lines carrying wild type p53. Oncogene. 2001 Apr;20(17):2171–7.

94. Schmitz R, Young RM, Ceribelli M, Jhavar S, Xiao W, Zhang M, et al. Burkitt lymphoma pathogenesis and therapeutic targets from structural and functional genomics. Nature. 2012 Oct;490(7418):116–20.

95. Giulino-Roth L, Wang K, MacDonald TY, Mathew S, Tam Y, Cronin MT, et al. Targeted genomic sequencing of pediatric Burkitt lymphoma identifies recurrent alterations in antiapoptotic and chromatin-remodeling genes. Blood. 2012 Dec;120(26):5181–4.

96. Love C, Sun Z, Jima D, Li G, Zhang J, Miles R, et al. The genetic landscape of mutations in Burkitt lymphoma. Nat Genet. 2012 Dec;44(12):1321–5.

97. Richter J, Schlesner M, Hoffmann S, Kreuz M, Leich E, Burkhardt B, et al. Recurrent mutation of the ID3 gene in Burkitt lymphoma identified by integrated genome, exome and transcriptome sequencing. Nat Genet. 2012 Dec;44(12):1316–20.

98. Schmitz R, Ceribelli M, Pittaluga S, Wright GW, Staudt LM. Oncogenic mechanisms in Burkitt lymphoma. Cold Spring Harb Perspect Med. 2014 Feb;4(2):a014282–2.

99. Dominguez-Sola D, Kung J, Holmes AB, Wells VA, Mo T, Basso K, et al. The FOXO1 Transcription Factor Instructs the Germinal Center Dark Zone Program. Immunity. 2015 Dec;43(6):1064–74.

100. Sander S, Chu VT, Yasuda T, Franklin A, Graf R, Calado DP, et al. PI3 Kinase and FOXO1 Transcription Factor Activity Differentially Control B Cells in the Germinal Center Light and Dark Zones. Immunity. 2015 Dec;43(6):1075–86.

101. Muppidi JR, Schmitz R, Green JA, Xiao W, Larsen AB, Braun SE, et al. Loss of signalling via Gα13 in germinal centre B-cell-derived lymphoma. Nature. 2014 Dec;516(7530):254–8.

102. Lu C, Allis CD. SWI/SNF complex in cancer. Nat Genet. 2017 Jan;49(2):178–9.

103. Ditton HJ, Zimmer J, Kamp C, Rajpert-De Meyts E, Vogt PH. The AZFa gene DBY (DDX3Y) is widely transcribed but the protein is limited to the male germ cells by translation control. Hum Mol Genet. 2004 Oct;13(19):2333–41.

104. Jiang L, Gu ZH, Yan ZX, Zhao X, Xie YY, Zhang ZG, et al. Exome sequencing identifies somatic mutations of DDX3X in natural killer/T-cell lymphoma. Nat Genet. 2015 Sep;47(9):1061–6.

105. Shannon-Lowe C, Rickinson A. The Global Landscape of EBV-Associated Tumors. Front Oncol. 2019;9:713.

106. Cohen JI, Fauci AS, Varmus H, Nabel GJ. Epstein-Barr virus: an important vaccine target for cancer prevention. Sci Transl Med. 2011 Nov;3(107):107fs7.

107. Young LS, Rickinson AB. Epstein-Barr virus: 40 years on. Nat Rev Cancer. 2004 Oct;4(10):757–68.

108. Werner J, Henle G, Pinto CA, Haff RF, Henle W. Establishment of continuous lymphoblast cultures from leukocytes of gibbons (Hylobates lar). Int J Cancer. 1972 Nov;10(3):557–67.

109. Baer R, Bankier AT, Biggin MD, Deininger PL, Farrell PJ, Gibson TJ, et al. DNA sequence and expression of the B95-8 Epstein-Barr virus genome. Nature. 1984;310(5974):207–11.

110. Rowe M, Rowe DT, Gregory CD, Young LS, Farrell PJ, Rupani H, et al. Differences in B cell growth phenotype reflect novel patterns of Epstein-Barr virus latent gene expression in Burkitt’s lymphoma cells. EMBO J. 1987 Sep;6(9):2743–51.

111. Price AM, Luftig MA. To be or not IIb: a multi-step process for Epstein-Barr virus latency establishment and consequences for B cell tumorigenesis. PLoS Pathog. 2015 Mar;11(3):e1004656.

112. Kelly G, Bell A, Rickinson A. Epstein–Barr virus–associated Burkitt lymphomagenesis selects for downregulation of the nuclear antigen EBNA2. Nat Med. 2002 Oct;8(10):1098–104.

113. Kieff E, Rickinson AB. In Fields Virology Vol. 2 (eds. Knipe DM & Howley PM) 2511–2573. Lippincott Williams & Wilkins; 2001.

114. Humme S, Reisbach G, Feederle R, Delecluse HJ, Bousset K, Hammerschmidt W, et al. The EBV nuclear antigen 1 (EBNA1) enhances B cell immortalization several thousandfold. Proc Natl Acad Sci U S A. 2003 Sep;100(19):10989–94.

115. Takada K, Horinouchi K, Ono Y, Aya T, Osato T, Takahashi M, et al. An Epstein-Barr virus-producer line Akata: establishment of the cell line and analysis of viral DNA. Virus Genes. 1991 Apr;5(2):147–56.

116. Shimizu N, Tanabe-Tochikura A, Kuroiwa Y, Takada K. Isolation of Epstein-Barr virus (EBV)-negative cell clones from the EBV-positive Burkitt’s lymphoma (BL) line Akata: malignant phenotypes of BL cells are dependent on EBV. J Virol. 1994 Sep;68(9):6069–73.

117. Chodosh J, Holder VP, Gan YJ, Belgaumi A, Sample J, Sixbey JW. Eradication of latent Epstein-Barr virus by hydroxyurea alters the growth-transformed cell phenotype. J Infect Dis. 1998 May;177(5):1194–201.

118. Komano J, Sugiura M, Takada K. Epstein-Barr virus contributes to the malignant phenotype and to apoptosis resistance in Burkitt’s lymphoma cell line Akata. J Virol. 1998 Nov;72(11):9150–6.

119. Ruf IK, Rhyne PW, Yang H, Borza CM, Hutt-Fletcher LM, Cleveland JL, et al. Epstein-Barr virus regulates c-MYC, apoptosis, and tumorigenicity in Burkitt lymphoma. Mol Cell Biol. 1999 Mar;19(3):1651–60.

120. Kennedy G, Komano J, Sugden B. Epstein-Barr virus provides a survival factor to Burkitt’s lymphomas. Proc Natl Acad Sci U S A. 2003 Nov;100(24):14269–74.

121. Wilson JB, Bell JL, Levine AJ. Expression of Epstein-Barr virus nuclear antigen-1 induces B cell neoplasia in transgenic mice. EMBO J. 1996 Jun;15(12):3117–26.

122. Brady G, Macarthur GJ, Farrell PJ. Epstein-Barr virus and Burkitt lymphoma. Postgrad Med J. 2008 Jul;84(993):372–7.

123. Araujo I, Foss HD, Hummel M, Anagnostopoulos I, Barbosa HS, Bittencourt A, et al. Frequent expansion of Epstein-Barr virus (EBV) infected cells in germinal centres of tonsils from an area with a high incidence of EBV-associated lymphoma. J Pathol. 1999 Feb;187(3):326–30.

124. Babcock GJ, Hochberg D, Thorley-Lawson AD. The expression pattern of Epstein-Barr virus latent genes in vivo is dependent upon the differentiation stage of the infected B cell. Immunity. 2000 Oct;13(4):497–506.

125. Komano J, Maruo S, Kurozumi K, Oda T, Takada K. Oncogenic role of Epstein-Barr virus-encoded RNAs in Burkitt’s lymphoma cell line Akata. J Virol. 1999 Dec;73(12):9827–31.

126. Ruf IK, Rhyne PW, Yang C, Cleveland JL, Sample JT. Epstein-Barr virus small RNAs potentiate tumorigenicity of Burkitt lymphoma cells independently of an effect on apoptosis. J Virol. 2000 Nov;74(21):10223–8.

127. Nanbo A, Inoue K, Adachi-Takasawa K, Takada K. Epstein-Barr virus RNA confers resistance to interferon-alpha-induced apoptosis in Burkitt’s lymphoma. EMBO J. 2002 Mar;21(5):954–65.

128. Kitagawa N, Goto M, Kurozumi K, Maruo S, Fukayama M, Naoe T, et al. Epstein-Barr virus-encoded poly(A)(-) RNA supports Burkitt’s lymphoma growth through interleukin-10 induction. EMBO J. 2000 Dec;19(24):6742–50.

129. Ogden CA, Pound JD, Batth BK, Owens S, Johannessen I, Wood K, et al. Enhanced apoptotic cell clearance capacity and B cell survival factor production by IL-10-activated macrophages: implications for Burkitt’s lymphoma. J Immunol. 2005 Mar;174(5):3015–23.

130. Wahlgren M, Abrams JS, Fernandez V, Bejarano MT, Azuma M, Torii M, et al. Adhesion of Plasmodium falciparum-infected erythrocytes to human cells and secretion of cytokines (IL-1-beta, IL-1RA, IL-6, IL-8, IL-10, TGF beta, TNF alpha, G-CSF, GM-CSF). Scand J Immunol. 1995 Dec;42(6):626–36.

131. Lyke KE, Burges R, Cissoko Y, Sangare L, Dao M, Diarra I, et al. Serum levels of the proinflammatory cytokines interleukin-1 beta (IL-1beta), IL-6, IL-8, IL-10, tumor necrosis factor alpha, and IL-12(p70) in Malian children with severe Plasmodium falciparum malaria and matched uncomplicated malaria or healthy controls. Infect Immun. 2004 Oct;72(10):5630–7.

132. Leucci E, Onnis A, Cocco M, De Falco G, Imperatore F, Giuseppina A, et al. B-cell differentiation in EBV-positive Burkitt lymphoma is impaired at posttranscriptional level by miRNA-altered expression. Int J Cancer. 2010 Mar;126(6):1316–26.

133. Vereide DT, Seto E, Chiu YF, Hayes M, Tagawa T, Grundhoff A, et al. Epstein–Barr virus maintains lymphomas via its miRNAs. Oncogene. 2013 Mar;33(10):1258–64.

134. Piccaluga PP, Navari M, De Falco G, Ambrosio MR, Lazzi S, Fuligni F, et al. Virus-encoded microRNA contributes to the molecular profile of EBV-positive Burkitt lymphomas. Oncotarget. 2016 Jan;7(1):224–40.

135. Bornkamm GW. Epstein-Barr virus and its role in the pathogenesis of Burkitt’s lymphoma: an unresolved issue. Semin Cancer Biol. 2009 Dec;19(6):351–65.

136. Souza TA, Stollar BD, Sullivan JL, Luzuriaga K, Thorley-Lawson DA. Influence of EBV on the peripheral blood memory B cell compartment. J Immunol. 2007 Sep;179(5):3153–60.

137. Gil Y, Levy-Nabot S, Steinitz M, Laskov R. Somatic mutations and activation-induced cytidine deaminase (AID) expression in established rheumatoid factor-producing lymphoblastoid cell line. Mol Immunol. 2007 Jan;44(4):494–505.

138. Epeldegui M, Hung YP, McQuay A, Ambinder RF, Martínez-Maza O. Infection of human B cells with Epstein-Barr virus results in the expression of somatic hypermutation-inducing molecules and in the accrual of oncogene mutations. Mol Immunol. 2007 Feb;44(5):934–42.

139. Bellan C, Lazzi S, Hummel M, Palummo N, Santi M de, Amato T, et al. Immunoglobulin gene analysis reveals 2 distinct cells of origin for EBV-positive and EBV-negative Burkitt lymphomas. Blood. 2005 Aug;106(3):1031–6.

140. Kim JH, Kim WS, Park C. Epstein-Barr virus latent membrane protein 1 increases genomic instability through Egr-1-mediated upregulation of activation-induced cytidine deaminase in B-cell lymphoma. Leuk Lymphoma. 2013 Sep;54(9):2035–40.

141. Kalchschmidt JS, Bashford-Rogers R, Paschos K, Gillman ACT, Styles CT, Kellam P, et al. Epstein-Barr virus nuclear protein EBNA3C directly induces expression of AID and somatic mutations in B cells. J Exp Med. 2016 May;213(6):921–8.

142. Kurth J, Hansmann ML, Rajewsky K, Küppers R. Epstein-Barr virus-infected B cells expanding in germinal centers of infectious mononucleosis patients do not participate in the germinal center reaction. Proc Natl Acad Sci U S A. 2003 Apr;100(8):4730–5.

143. Tobollik S, Meyer L, Buettner M, Klemmer S, Kempkes B, Kremmer E, et al. Epstein-Barr virus nuclear antigen 2 inhibits AID expression during EBV-driven B-cell growth. Blood. 2006 Dec;108(12):3859–64.

144. Neri A, Barriga F, Inghirami G, Knowles DM, Neequaye J, Magrath IT, et al. Epstein-Barr virus infection precedes clonal expansion in Burkitt’s and acquired immunodeficiency syndrome-associated lymphoma. Blood. 1991 Mar;77(5):1092–5.

145. Kirchmaier AL, Sugden B. Plasmid maintenance of derivatives of oriP of Epstein-Barr virus. J Virol. 1995 Feb;69(2):1280–3.

146. Nanbo A, Sugden A, Sugden B. The coupling of synthesis and partitioning of EBV’s plasmid replicon is revealed in live cells. EMBO J. 2007 Oct;26(19):4252–62.

147. Jerusalem C, Jap P, Eling W. Virus Induced Malignant Lymphome in Mice Dependent on a RES “Conditioned” by Chronic Parasitic Infection (P. Berghei). In: Di Luzio NR, Flemming KBP, editors. The Reticuloendothelial System and Immune Phenomena: Proceedings of the Ludwig Aschoff Memorial Meeting of the Reticuloendothelial Society, Freiburg, Germany, August 1970. Boston, MA: Springer US; 1971. pp. 391–9.

148. Torgbor C, Awuah P, Deitsch K, Kalantari P, Duca KA, Thorley-Lawson DA. A multifactorial role for P. falciparum malaria in endemic Burkitt’s lymphoma pathogenesis. PLoS Pathog. 2014 May;10(5):e1004170.

149. Wilmore JR, Asito AS, Wei C, Piriou E, Sumba PO, Sanz I, et al. AID expression in peripheral blood of children living in a malaria holoendemic region is associated with changes in B cell subsets and Epstein-Barr virus. Int J Cancer. 2015 Mar;136(6):1371–80.

150. Bosch CA van den. Is endemic Burkitt’s lymphoma an alliance between three infections and a tumour promoter? Lancet Oncol. 2004 Dec;5(12):738–46.

151. Chêne A, Donati D, Guerreiro-Cacais AO, Levitsky V, Chen Q, Falk KI, et al. A molecular link between malaria and Epstein-Barr virus reactivation. PLoS Pathog. 2007 Jun;3(6):e80.

152. Donati D, Mok B, Chêne A, Xu H, Thangarajh M, Glas R, et al. Increased B cell survival and preferential activation of the memory compartment by a malaria polyclonal B cell activator. J Immunol. 2006 Sep;177(5):3035–44.

153. Whittle HC, Brown J, Marsh K, Blackman M, Jobe O, Shenton F. The effects of Plasmodium falciparum malaria on immune control of B lymphocytes in Gambian children. Clin Exp Immunol. 1990 May;80(2):213–8.

154. Whittle HC, Brown J, Marsh K, Greenwood BM, Seidelin P, Tighe H, et al. T-cell control of Epstein–Barr virus-infected B cells is lost during P. falciparum malaria. Nature. 1984 Nov;312(5993):449–50.

155. Moss DJ, Burrows SR, Castelino DJ, Kane RG, Pope JH, Rickinson AB, et al. A comparison of Epstein-Barr virus-specific T-cell immunity in malaria-endemic and nonendemic regions of Papua New Guinea. Int J Cancer. 1983 Jun;31(6):727–32.

156. Lam KM, Syed N, Whittle H, Crawford DH. Circulating Epstein-Barr virus-carrying B cells in acute malaria. Lancet. 1991 Apr;337(8746):876–8.

157. Donati D, Espmark E, Kironde F, Mbidde EK, Kamya M, Lundkvist A, et al. Clearance of circulating Epstein-Barr virus DNA in children with acute malaria after antimalaria treatment. J Infect Dis. 2006 Apr;193(7):971–7.

158. Morrow RH Jr. Epidemiological evidence for the role of falciparum malaria in the pathogenesis of Burkitt’s lymphoma. IARC Sci Publ. 1985;(60):177–86.

159. Giulino-Roth L, Wang K, MacDonald TY, Mathew S, Tam Y, Cronin MT, et al. Targeted genomic sequencing of pediatric Burkitt lymphoma identifies recurrent alterations in antiapoptotic and chromatin-remodeling genes. Blood. 2012 Dec;120(26):5181–4.

160. Wagener R, Aukema SM, Schlesner M, Haake A, Burkhardt B, Claviez A, et al. The PCBP1 gene encoding poly(rC) binding protein I is recurrently mutated in Burkitt lymphoma. Genes Chromosomes Cancer. 2015 Sep;54(9):555–64.

161. Abate F, Ambrosio MR, Mundo L, Laginestra MA, Fuligni F, Rossi M, et al. Distinct Viral and Mutational Spectrum of Endemic Burkitt Lymphoma. PLoS Pathog. 2015 Oct;11(10):e1005158.

162. Oduor CI, Kaymaz Y, Chelimo K, Otieno JA, Ong’echa JM, Moormann AM, et al. Integrative microRNA and mRNA deep-sequencing expression profiling in endemic Burkitt lymphoma. Vol. 17, BMC Cancer. 2017.

163. Kaymaz Y, Oduor CI, Yu H, Otieno JA, Ong’echa JM, Moormann AM, et al. Comprehensive Transcriptome and Mutational Profiling of Endemic Burkitt Lymphoma Reveals EBV Type–Specific Differences. Mol Cancer Res. 2017 May;15(5):563–76.

164. Bouska A, Bi C, Lone W, Zhang W, Kedwaii A, Heavican T, et al. Adult high-grade B-cell lymphoma with Burkitt lymphoma signature: genomic features and potential therapeutic targets. Blood. 2017 Oct;130(16):1819–31.

165. López C, Kleinheinz K, Aukema SM, Rohde M, Bernhart SH, Hübschmann D, et al. Genomic and transcriptomic changes complement each other in the pathogenesis of sporadic Burkitt lymphoma. Nat Commun. 2019 Mar;10(1):1459.

166. Ennishi D, Jiang A, Boyle M, Collinge B, Grande BM, Ben-Neriah S, et al. Double-Hit Gene Expression Signature Defines a Distinct Subgroup of Germinal Center B-Cell-Like Diffuse Large B-Cell Lymphoma. J Clin Oncol. 2019 Jan;37(3):190–201.

167. Sha C, Barrans S, Cucco F, Bentley MA, Care MA, Cummin T, et al. Molecular high-grade B cell lymphoma: defining a poor risk group requiring different approaches to therapy. J Clin Oncol. 2019 Jan;37(3):202–13.

168. Saunders CT, Wong WSW, Swamy S, Becq J, Murray LJ, Cheetham RK. Strelka: accurate somatic small-variant calling from sequenced tumor–normal sample pairs. Bioinformatics. 2012 Jul;28(14):1811–7.

169. Lawrence MS, Stojanov P, Polak P, Kryukov GV, Cibulskis K, Sivachenko A, et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature. 2013 Jul;499(7457):214–8.

170. Arthur SE, Jiang A, Grande BM, Alcaide M, Cojocaru R, Rushton CK, et al. Genome-wide discovery of somatic regulatory variants in diffuse large B-cell lymphoma. Nat Commun. 2018 Oct;9(1):4001.

171. Kretzmer H, Bernhart SH, Wang W, Haake A, Weniger MA, Bergmann AK, et al. DNA methylome analysis in Burkitt and follicular lymphomas identifies differentially methylated regions linked to somatic mutation and transcriptional control. Nat Genet. 2015 Nov;47(11):1316–25.

172. Furukawa T, Kuboki Y, Tanji E, Yoshida S, Hatori T, Yamamoto M, et al. Whole-exome sequencing uncovers frequent GNAS mutations in intraductal papillary mucinous neoplasms of the pancreas. Sci Rep. 2011 Nov;1:161.

173. Lyons J, Landis CA, Harsh G, Vallar L, Grünewald K, Feichtinger H, et al. Two G protein oncogenes in human endocrine tumors. Science. 1990 Aug;249(4969):655–9.

174. Leiserson MDM, Wu HT, Vandin F, Raphael BJ. CoMEt: a statistical approach to identify combinations of mutually exclusive alterations in cancer. Genome Biol. 2015 Aug;16:160.

175. Leiserson M, Wu HT, Vandin F, Raphael B. CoMEt: A Statistical Approach to Identify Combinations of Mutually Exclusive Alterations in Cancer. 2015.

176. Jiang Y, Soong TD, Wang L, Melnick AM, Elemento O. Genome-wide detection of genes targeted by non-Ig somatic hypermutation in lymphoma. PLoS One. 2012 Jul;7(7):e40332.

177. Bachl J, Carlson C, Gray-Schopfer V, Dessing M, Olsson C. Increased transcription levels induce higher mutation rates in a hypermutating cell line. J Immunol. 2001 Apr;166(8):5051–7.

178. Carramusa L, Contino F, Ferro A, Minafra L, Perconti G, Giallongo A, et al. The PVT1 oncogene is a Myc protein target that is overexpressed in transformed cells. J Cell Physiol. 2007 Nov;213(2):511–8.

179. Puente XS, Beà S, Valdés-Mas R, Villamor N, Gutiérrez-Abril J, Martín-Subero JI, et al. Non-coding recurrent mutations in chronic lymphocytic leukaemia. Nature. 2015 Oct;526(7574):519–24.

180. Alexandrov LB, Nik-Zainal S, Wedge DC, Aparicio SAJR, Behjati S, Biankin AV, et al. Signatures of mutational processes in human cancer. Nature. 2013 Aug;500(7463):415–21.

181. Alexandrov LB, Jones PH, Wedge DC, Sale JE, Campbell PJ, Nik-Zainal S, et al. Clock-like mutational processes in human somatic cells. Nat Genet. 2015 Dec;47(12):1402–7.

182. Xu JL, Davis MM. Diversity in the CDR3 region of V(H) is sufficient for most antibody specificities. Immunity. 2000 Jul;13(1):37–45.

183. Yassai MB, Naumov YN, Naumova EN, Gorski J. A clonotype nomenclature for T cell receptors. Immunogenetics. 2009 Jul;61(7):493–502.

184. Bolotin DA, Poslavsky S, Mitrophanov I, Shugay M, Mamedov IZ, Putintseva EV, et al. MiXCR: software for comprehensive adaptive immunity profiling. Nat Methods. 2015 May;12(5):380–1.

185. Bolotin DA, Poslavsky S, Davydov AN, Frenkel FE, Fanchi L, Zolotareva OI, et al. Antigen receptor repertoire profiling from RNA-seq data. Nat Biotechnol. 2017 Oct;35(10):908–11.

186. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009 Aug;25(16):2078–9.

187. Li H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. 2013 Mar; Available from: http://arxiv.org/abs/1303.3997

188. Tarasov A, Vilella AJ, Cuppen E, Nijman IJ, Prins P. Sambamba: fast processing of NGS alignment formats. Bioinformatics. 2015 Jun;31(12):2032–4.

189. Patro R, Duggal G, Love MI, Irizarry RA, Kingsford C. Salmon provides fast and bias-aware quantification of transcript expression. Nat Methods. 2017 Apr;14(4):417–9.

190. Butterfield YS, Kreitzman M, Thiessen N, Corbett RD, Li Y, Pang J, et al. JAGuaR: junction alignments to genome for RNA-seq reads. PLoS One. 2014 Jul;9(7):e102398.

191. Hezaveh K, Kloetgen A, Bernhart SH, Mahapatra KD, Lenze D, Richter J, et al. Alterations of microRNA and microRNA-regulated messenger RNA expression in germinal center B-cell lymphomas determined by integrative sequencing analysis. Haematologica. 2016 Nov;101(11):1380–9.

192. Marçais G, Kingsford C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics. 2011 Mar;27(6):764–70.

193. Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, et al. The variant call format and VCFtools. Bioinformatics. 2011 Aug;27(15):2156–8.

194. McLaren W, Gil L, Hunt SE, Riat HS, Ritchie GRS, Thormann A, et al. The Ensembl Variant Effect Predictor. Genome Biol. 2016 Jun;17(1):122.

195. Gonzalez-Perez A, Lopez-Bigas N. Functional impact bias reveals cancer drivers. Nucleic Acids Res. 2012 Nov;40(21):e169.

196. Mularoni L, Sabarinathan R, Deu-Pons J, Gonzalez-Perez A, López-Bigas N. OncodriveFML: a general framework to identify coding and non-coding regions with cancer driver mutations. Genome Biol. 2016 Jun;17(1):128.

197. Tamborero D, Gonzalez-Perez A, Lopez-Bigas N. OncodriveCLUST: exploiting the positional clustering of somatic mutations to identify cancer genes. Bioinformatics. 2013 Sep;29(18):2238–44.

198. Zhao H, Sun Z, Wang J, Huang H, Kocher JP, Wang L. CrossMap: a versatile tool for coordinate conversion between genome assemblies. Bioinformatics. 2014 Apr;30(7):1006–7.

199. Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, et al. The human genome browser at UCSC. Genome Res. 2002 Jun;12(6):996–1006.

200. Jones E, Oliphant T, Peterson P. SciPy: Open source scientific tools for Python. 2001.

201. Shirley MD, Ma Z, Pedersen BS, Wheelan SJ. Efficient “pythonic” access to FASTA files using pyfaidx. PeerJ PrePrints; PeerJ Inc. 2015 Apr. Report No.: e1196.

202. Alexandrov LB, Nik-Zainal S, Wedge DC, Campbell PJ, Stratton MR. Deciphering signatures of mutational processes operative in human cancer. Cell Rep. 2013 Jan;3(1):246–59.

203. Chen X, Schulz-Trieglaff O, Shaw R, Barnes B, Schlesinger F, Källberg M, et al. Manta: rapid detection of structural variants and indels for germline and cancer sequencing applications. Bioinformatics. 2016 Apr;32(8):1220–2.

204. Larson D, abelhj, Chiang C, AbhijitBadve, Eldred J, Morton D. hall-lab/svtools: svtools v0.3.2. 2017.

205. Favero F, Joshi T, Marquard AM, Birkbak NJ, Krzystanek M, Li Q, et al. Sequenza: allele-specific copy number and mutation profiles from tumor sequencing data. Ann Oncol. 2015;26:64–70.

206. Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010 Mar;26(6):841–2.

207. Karolchik D, Hinrichs AS, Furey TS, Roskin KM, Sugnet CW, Haussler D, et al. The UCSC Table Browser data retrieval tool. Nucleic Acids Res. 2004 Jan;32(Database issue):D493–6.

208. Soneson C, Love MI, Robinson MD. Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences. F1000Res. 2015 Dec;4:1521.

209. Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15(12):550.

210. Chu A, Robertson G, Brooks D, Mungall AJ, Birol I, Coope R, et al. Large-scale profiling of microRNAs for The Cancer Genome Atlas. Nucleic Acids Res. 2016 Jan;44(1):e3.

211. Kozomara A, Griffiths-Jones S. miRBase: annotating high confidence microRNAs using deep sequencing data. Nucleic Acids Res. 2014 Jan;42(Database issue):D68–73.

212. Kozomara A, Griffiths-Jones S. miRBase: integrating microRNA annotation and deep-sequencing data. Nucleic Acids Res. 2011 Jan;39(Database issue):D152–7.

213. Griffiths-Jones S, Saini HK, Dongen S van, Enright AJ. miRBase: tools for microRNA genomics. Nucleic Acids Res. 2008 Jan;36(Database issue):D154–8.

214. Griffiths-Jones S, Grocock RJ, Dongen S van, Bateman A, Enright AJ. miRBase: microRNA sequences, targets and gene nomenclature. Nucleic Acids Res. 2006 Jan;34(Database issue):D140–4.

215. Griffiths-Jones S. The microRNA Registry. Nucleic Acids Res. 2004 Jan;32(Database issue):D109–11.

216. R Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2017.

217. Davis TL. argparse: Command Line Optional and Positional Argument Parser. 2018.

218. Waggott D, Haider S, Boutros PC. bedr: Genomic Region Processing using Tools Such as ’BEDTools’, ’BEDOPS’ and ’Tabix’. 2017.

219. Durinck S, Spellman PT, Birney E, Huber W. Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt. Nat Protoc. 2009;4:1184–91.

220. Durinck S, Moreau Y, Kasprzyk A, Davis S, De Moor B, Brazma A, et al. BioMart and Bioconductor: a powerful link between biological databases and microarray data analysis. Bioinformatics. 2005;21:3439–40.

221. Xie Y. bookdown: Authoring Books and Technical Documents with R Markdown. 2018.

222. Xie Y. bookdown: Authoring Books and Technical Documents with R Markdown. Boca Raton, Florida: Chapman and Hall/CRC; 2016.

223. Robinson D. broom: Convert Statistical Analysis Objects into Tidy Data Frames. 2017.

224. Gu Z, Gu L, Eils R, Schlesner M, Brors B. circlize implements and enhances circular visualization in R. Bioinformatics. 2014;30:2811–2.

225. Wilke CO. cowplot: Streamlined Plot Theme and Plot Annotations for ’ggplot2’. 2017.

226. Dowle M, Srinivasan A. data.table: Extension of `data.frame`. 2018.

227. Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15:550.

228. Wickham H, Francois R, Henry L, Müller K. dplyr: A Grammar of Data Manipulation. 2017.

229. Wickham H. feather: R Bindings to the Feather ’API’. 2016.

230. Gohel D. flextable: Functions for Tabular Reporting. 2018.

231. Wickham H. forcats: Tools for Working with Categorical Variables (Factors). 2017.

232. Lawrence M, Huber W, Pagès H, Aboyoun P, Carlson M, Gentleman R, et al. Software for Computing and Annotating Genomic Ranges. PLoS Comput Biol. 2013;9.

233. Clarke E, Sherrill-Mix S. ggbeeswarm: Categorical Scatter (Violin Point) Plots. 2017.

234. Attali D, Baker C. ggExtra: Add Marginal Histograms to ’ggplot2’, and More ’ggplot2’ Enhancements. 2018.

235. Wickham H. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York; 2009.

236. Slowikowski K. ggrepel: Repulsive Text and Label Geoms for ’ggplot2’. 2017.

237. Ahlmann-Eltze C. ggsignif: Significance Brackets for ’ggplot2’. 2017.

238. Henry L, Wickham H, Chang W. ggstance: Horizontal ’ggplot2’ Components. 2016.

239. Hahne F, Ivanek R. Statistical Genomics: Methods and Protocols. In: Mathé E, Davis S, editors. New York, NY: Springer New York; 2016. pp. 335–51.

240. Xie Y. knitr: A General-Purpose Package for Dynamic Report Generation in R. 2018.

241. Xie Y. Dynamic Documents with R and knitr. 2nd ed. Boca Raton, Florida: Chapman and Hall/CRC; 2015.

242. Xie Y. knitr: A Comprehensive Tool for Reproducible Research in R. In: Stodden V, Leisch F, Peng RD, editors. Implementing Reproducible Computational Research. Chapman and Hall/CRC; 2014.

243. Wild F. lsa: Latent Semantic Analysis. 2015.

244. Mayakonda A, Koeffler PH. Maftools: Efficient analysis, visualization and summarization of MAF files from large-scale cohort based cancer studies. BioRxiv. 2016.

245. Du P, Kibbe WA, Lin SM. Improved peak detection in mass spectrum by incorporating continuous wavelet transform-based pattern matching. Vol. 22, Bioinformatics. 2006. pp. 2059–65.

246. Bengtsson H. matrixStats: Functions that Apply to Rows and Columns of Matrices (and to Vectors). 2018.

247. Kolde R. pheatmap: Pretty Heatmaps. 2015.

248. Gerds TA, Ozenne B. Publish: Format Output of Various Routines in a Suitable Way for Reports and Publication. 2018.

249. Henry L, Wickham H. purrr: Functional Programming Tools. 2018.

250. Neuwirth E. RColorBrewer: ColorBrewer Palettes. 2014.

251. Wickham H, Hester J, Francois R. readr: Read Rectangular Text Data. 2017.

252. Wickham H, Bryan J. readxl: Read Excel Files. 2017.

253. Maechler M, Rousseeuw P, Croux C, Todorov V, Ruckstuhl A, Salibian-Barrera M, et al. robustbase: Basic Robust Statistics. 2016.

254. Todorov V, Filzmoser P. An Object-Oriented Framework for Robust Multivariate Analysis. J Stat Softw. 2009;32(3):1–47.

255. Wickham H. tidyverse: Easily Install and Load ’Tidyverse’ Packages. 2017.

256. Soneson C, Love MI, Robinson MD. Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences. F1000Res. 2015;4.

257. Garnier S. viridis: Default Color Maps from ’matplotlib’. 2018.

258. Lung ML, Cheung AKL, Dai W, Leong MML, Tsao GSW. Epstein-Barr virus infection suppresses the DNA repair mechanisms in nasopharyngeal epithelial cells via reduction of the H3K4me3 mark. New Orleans, LA: 107th Annual Meeting of the American Association for Cancer Research; American Association for Cancer Research; 2016.

259. Cho SW, Xu J, Sun R, Mumbach MR, Carter AC, Chen YG, et al. Promoter of lncRNA Gene PVT1 Is a Tumor-Suppressor DNA Boundary Element. Cell. 2018 May;173(6):1398–1412.e22.

260. Li M, Chen D, Shiloh A, Luo J, Nikolaev AY, Qin J, et al. Deubiquitination of p53 by HAUSP is an important pathway for p53 stabilization. Nature. 2002 Apr;416(6881):648–53.

261. Hart T, Chandrashekhar M, Aregger M, Steinhart Z, Brown KR, MacLeod G, et al. High-Resolution CRISPR Screens Reveal Fitness Genes and Genotype-Specific Cancer Liabilities. Cell. 2015 Dec;163(6):1515–26.

262. Holowaty MN, Frappier L. HAUSP/USP7 as an Epstein-Barr virus target. Biochem Soc Trans. 2004 Nov;32(Pt 5):731–2.

263. Lindner HA. Deubiquitination in virus infection. Virology. 2007 Jun;362(2):245–56.

264. Forte E, Luftig MA. MDM2-dependent inhibition of p53 is required for Epstein-Barr virus B-cell growth transformation and infected-cell survival. J Virol. 2009 Mar;83(6):2491–9.

265. Renouf B, Hollville E, Pujals A, Tétaud C, Garibal J, Wiels J. Activation of p53 by MDM2 antagonists has differential apoptotic effects on Epstein-Barr virus (EBV)-positive and EBV-negative Burkitt’s lymphoma cells. Leukemia. 2009 Sep;23(9):1557–63.


266. Morin RD, Mendez-Lago M, Mungall AJ, Goya R, Mungall KL, Corbett RD, et al. Frequent mutation of histone-modifying genes in non-Hodgkin lymphoma. Nature. 2011 Jul;476(7360):298–303.

267. Nascimento EM, Cox CL, MacArthur S, Hussain S, Trotter M, Blanco S, et al. The opposing transcriptional functions of Sin3a and c-Myc are required to maintain tissue homeostasis. Nat Cell Biol. 2011 Nov;13(12):1395–405.

268. Nishiyama M, Skoultchi AI, Nakayama KI. Histone H1 recruitment by CHD8 is essential for suppression of the Wnt-β-catenin signaling pathway. Mol Cell Biol. 2012 Jan;32(2):501–12.

269. Wilson BG, Roberts CWM. SWI/SNF nucleosome remodellers and cancer. Nat Rev Cancer. 2011 Jun;11(7):481–92.

270. Lunning MA, Green MR. Mutation of chromatin modifiers; an emerging hallmark of germinal center B-cell lymphomas. Blood Cancer J. 2015 Oct;5:e361.

271. Kadoch C, Crabtree GR. Mammalian SWI/SNF chromatin remodeling complexes and cancer: Mechanistic insights gained from human genomics. Sci Adv. 2015 Jun;1(5):e1500447.

272. Nagl NG Jr, Wang X, Patsialou A, Van Scoy M, Moran E. Distinct mammalian SWI/SNF chromatin remodeling complexes with opposing roles in cell-cycle control. EMBO J. 2007 Feb;26(3):752–63.

273. Buenrostro JD, Giresi PG, Zaba LC, Chang HY, Greenleaf WJ. Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat Methods. 2013 Dec;10(12):1213–8.

274. Fujiwara S, Baek S, Varticovski L, Kim S, Hager GL. High Quality ATAC-Seq Data Recovered from Cryopreserved Breast Cell Lines and Tissue. Sci Rep. 2019 Jan;9(1):516.

275. Farmer H, McCabe N, Lord CJ, Tutt ANJ, Johnson DA, Richardson TB, et al. Targeting the DNA repair defect in BRCA mutant cells as a therapeutic strategy. Nature. 2005 Apr;434(7035):917–21.

276. Hoffman GR, Rahal R, Buxton F, Xiang K, McAllister G, Frias E, et al. Functional epigenetics approach identifies BRM/SMARCA2 as a critical synthetic lethal target in BRG1-deficient cancers. Proc Natl Acad Sci U S A. 2014 Feb;111(8):3128–33.

277. Helming KC, Wang X, Wilson BG, Vazquez F, Haswell JR, Manchester HE, et al. ARID1B is a specific vulnerability in ARID1A-mutant cancers. Nat Med. 2014 Mar;20(3):251–4.

278. Santen GWE, Aten E, Sun Y, Almomani R, Gilissen C, Nielsen M, et al. Mutations in SWI/SNF chromatin remodeling complex gene ARID1B cause Coffin-Siris syndrome. Nat Genet. 2012 Mar;44(4):379–80.


279. Deciphering Developmental Disorders Study. Large-scale discovery of novel genetic causes of developmental disorders. Nature. 2015 Mar;519(7542):223–8.

280. Tishkoff SA, Williams SM. Genetic analysis of African populations: human evolution and complex disease. Nat Rev Genet. 2002 Aug;3(8):611–21.

281. Bhat NM, Bieber MM, Chapman CJ, Stevenson FK, Teng NN. Human antilipid A monoclonal antibodies bind to human B cells and the i antigen on cord red blood cells. J Immunol. 1993 Nov;151(9):5011–21.

282. Spellerberg MB, Chapman CJ, Mockridge CI, Isenberg DA, Stevenson FK. Dual recognition of lipid A and DNA by human antibodies encoded by the VH4-21 gene: A possible link between infection and lupus. Hum Antibodies. 1995;6(2):52–6.

283. Baptista MJ, Calpe E, Fernandez E, Colomo L, Cardesa-Salzmann TM, Abrisqueta P, et al. Analysis of the IGHV region in Burkitt’s lymphomas supports a germinal center origin and a role for superantigens in lymphomagenesis. Leuk Res. 2014 Apr;38(4):509–15.

284. Amato T, Abate F, Piccaluga P, Iacono M, Fallerini C, Renieri A, et al. Clonality Analysis of Immunoglobulin Gene Rearrangement by Next-Generation Sequencing in Endemic Burkitt Lymphoma Suggests Antigen Drive Activation of BCR as Opposed to Sporadic Burkitt Lymphoma. Am J Clin Pathol. 2016 Jan;145(1):116–27.

285. Lombardo KA, Coffey DG, Morales AJ, Carlson CS, Towlerton AMH, Gerdts SE, et al. High-throughput sequencing of the B-cell receptor in African Burkitt lymphoma reveals clues to pathogenesis. Blood Adv. 2017 Mar;1(9):535–44.

286. Martorelli D, Guidoboni M, De Re V, Muraro E, Turrini R, Merlo A, et al. IGKV3 proteins as candidate “off-the-shelf” vaccines for kappa-light chain-restricted B-cell non-Hodgkin lymphomas. Clin Cancer Res. 2012 Aug;18(15):4080–91.

287. Miller G. Immortalization of human lymphocytes by Epstein-Barr virus. Yale J Biol Med. 1982 May;55(3–4):305–10.

288. Bornkamm GW. Epstein-Barr virus and the pathogenesis of Burkitt’s lymphoma: more questions than answers. Int J Cancer. 2009 Apr;124(8):1745–55.

289. Okazaki IM, Hiai H, Kakazu N, Yamada S, Muramatsu M, Kinoshita K, et al. Constitutive expression of AID leads to tumorigenesis. J Exp Med. 2003 May;197(9):1173–81.

290. Ramiro AR, Jankovic M, Eisenreich T, Difilippantonio S, Chen-Kiang S, Muramatsu M, et al. AID is required for c-myc/IgH chromosome translocations in vivo. Cell. 2004 Aug;118(4):431–8.

291. Unniraman S, Zhou S, Schatz DG. Identification of an AID-independent pathway for chromosomal translocations between the Igh switch region and Myc. Nat Immunol. 2004 Nov;5(11):1117–23.


292. Pasqualucci L, Bhagat G, Jankovic M, Compagno M, Smith P, Muramatsu M, et al. AID is required for germinal center-derived lymphomagenesis. Nat Genet. 2008 Jan;40(1):108–12.

293. Takizawa M, Tolarová H, Li Z, Dubois W, Lim S, Callen E, et al. AID expression levels determine the extent of cMyc oncogenic translocations and the incidence of B cell tumor development. J Exp Med. 2008 Sep;205(9):1949–57.

294. Robbiani DF, Deroubaix S, Feldhahn N, Oliveira TY, Callen E, Wang Q, et al. Plasmodium Infection Promotes Genomic Instability and AID-Dependent B Cell Lymphoma. Cell. 2015 Aug;162(4):727–37.

295. Riley KJ, Rabinowitz GS, Yario TA, Luna JM, Darnell RB, Steitz JA. EBV and human microRNAs co-target oncogenic and apoptotic viral and human genes during latency. EMBO J. 2012 May;31(9):2207–21.

296. Lin X, Tsai MH, Shumilov A, Poirey R, Bannert H, Middeldorp JM, et al. The Epstein-Barr Virus BART miRNA Cluster of the M81 Strain Modulates Multiple Functions in Primary B Cells. PLoS Pathog. 2015 Dec;11(12):e1005344.

297. Kang D, Skalsky RL, Cullen BR. EBV BART MicroRNAs Target Multiple Pro-apoptotic Cellular Genes to Promote Epithelial Cell Survival. PLoS Pathog. 2015 Jun;11(6):e1004979.

298. Kim H, Choi H, Lee SK. Epstein-Barr Virus MicroRNA miR-BART20-5p Suppresses Lytic Induction by Inhibiting BAD-Mediated caspase-3-Dependent Apoptosis. J Virol. 2016 Feb;90(3):1359–68.

299. Harold C, Cox D, Riley KJ. Epstein-Barr viral microRNAs target caspase 3. Virol J. 2016 Aug;13:145.

300. Reisman D, Yates J, Sugden B. A putative origin of replication of plasmids derived from Epstein-Barr virus is composed of two cis-acting components. Mol Cell Biol. 1985 Aug;5(8):1822–32.

301. Sugden B, Marsh K, Yates J. A vector that replicates as a plasmid and can be efficiently selected in B-lymphoblasts transformed by Epstein-Barr virus. Mol Cell Biol. 1985 Feb;5(2):410–3.

302. Ambinder RF. Gammaherpesviruses and “Hit-and-Run” oncogenesis. Am J Pathol. 2000 Jan;156(1):1–3.

303. Trivedi P, Zhang QJ, Chen F, Minarovits J, Ekman M, Biberfeld P, et al. Parallel existence of Epstein-Barr virus (EBV) positive and negative cells in a sporadic case of Burkitt lymphoma. Oncogene. 1995 Aug;11(3):505–10.

304. Snijder J, Ortego MS, Weidle C, Stuart AB, Gray MD, McElrath MJ, et al. An Antibody Targeting the Fusion Machinery Neutralizes Dual-Tropic Infection and Defines a Site of Vulnerability on Epstein-Barr Virus. Immunity. 2018 Apr;48(4):799–811.e9.


305. Messick TE, Smith GR, Soldan SS, McDonnell ME, Deakyne JS, Malecka KA, et al. Structure-based design of small-molecule inhibitors of EBNA1 DNA binding blocks Epstein-Barr virus latent infection and tumor growth. Sci Transl Med. 2019 Mar;11(482).

306. Lee J, Kosowicz JG, Hayward SD, Desai P, Stone J, Lee JM, et al. Pharmacologic Activation of Lytic Epstein-Barr Virus Gene Expression Without Virion Production. J Virol. 2019 Jul;

307. Razzouk BI, Srinivas S, Sample CE, Singh V, Sixbey JW. Epstein-Barr Virus DNA recombination and loss in sporadic Burkitt’s lymphoma. J Infect Dis. 1996 Mar;173(3):529–35.

308. Ambrosio MR, Navari M, Di Lisio L, Leon EA, Onnis A, Gazaneo S, et al. The Epstein Barr-encoded BART-6-3p microRNA affects regulation of cell growth and immuno response in Burkitt lymphoma. Infect Agent Cancer. 2014 Apr;9:12.

309. Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, et al. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001 Jan;29(1):308–11.

310. Lek M, Karczewski KJ, Minikel EV, Samocha KE, Banks E, Fennell T, et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature. 2016 Aug;536(7616):285–91.


Appendix A

Supplemental Data File

Description:

Supplemental Table 1. Patient metadata. Clinical and molecular characteristics of the discovery and validation cases. ICGC metadata are not republished here.

Supplemental Table 2. Simple somatic mutations in the discovery cohort. The mutations are restricted to exonic and splice regions. Unless a mutation affected a BL-associated gene and was nonsynonymous, we excluded all mutations with a minor allele fraction greater than 10⁻⁴ according to dbSNP or ExAC.309,310 With the exception of the first two columns, this table follows The Cancer Genome Atlas (TCGA) Mutation Annotation Format (MAF).

Supplemental Table 3. Simple somatic mutations in the validation cohort. This table follows the same criteria as Supplemental Table 2.

Supplemental Table 4. Somatic copy number variations in the discovery cohort. With the exception of the first two columns, this table follows the segments output format by Sequenza.205

Supplemental Table 5. Somatic structural variations in the discovery cohort. With the exception of the first two columns, this table follows the BEDPE output format by the svtools vcftobedpe tool, which converted Manta VCF files.203,204

Supplemental Table 6. Noncoding mutation peaks.

Supplemental Table 7. Significantly mutated genes. This table shows the methods that identified each gene as significantly mutated (1) or not (0).

Supplemental Table 8. Mutation status for BL-associated genes and pathways. This table considers all mutation types displayed in Figure 2.4 (minus the ICGC cases).

Supplemental Table 9. Fisher’s exact tests on mutation prevalence. This table contains the underlying counts of mutated and unmutated cases that were used in comparing the mutation prevalence between disease subtypes (i.e. tumor EBV status, clinical variant status, and EBV genome type).

Filename:

GrandeBruno_Supplemental_Tables.xlsx
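
The population-frequency filter described for Supplemental Table 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in this thesis: the `Mutation` record, its field names, the `keep_mutation` helper, and the gene labels `GENE_A` and `GENE_B` are hypothetical. The sketch further assumes that a variant absent from a database has no reported frequency (`None`), and that exceeding the threshold in either dbSNP or ExAC is sufficient for exclusion.

```python
from dataclasses import dataclass
from typing import Optional, Set

# Threshold stated in the description of Supplemental Table 2.
MAX_POPULATION_AF = 1e-4


@dataclass
class Mutation:
    """Hypothetical record for one simple somatic mutation."""
    gene: str
    nonsynonymous: bool
    dbsnp_af: Optional[float]  # None: variant not reported in dbSNP (assumption)
    exac_af: Optional[float]   # None: variant not reported in ExAC (assumption)


def keep_mutation(m: Mutation, bl_genes: Set[str],
                  max_af: float = MAX_POPULATION_AF) -> bool:
    """Return True if the mutation passes the population-frequency filter."""
    # Exception: nonsynonymous mutations in BL-associated genes are retained
    # regardless of their population frequency.
    if m.gene in bl_genes and m.nonsynonymous:
        return True
    # Otherwise, exclude the mutation if either database reports a
    # frequency greater than the threshold.
    reported = [af for af in (m.dbsnp_af, m.exac_af) if af is not None]
    return all(af <= max_af for af in reported)


# Usage with made-up values; GENE_A stands in for a BL-associated gene.
bl_genes = {"GENE_A"}
candidates = [
    Mutation("GENE_A", True, 0.05, None),   # common, but covered by the exception
    Mutation("GENE_B", True, None, 2e-4),   # above the threshold in ExAC
    Mutation("GENE_B", False, 5e-5, None),  # rare in dbSNP
]
retained = [m for m in candidates if keep_mutation(m, bl_genes)]
```

Writing the exception as an early return keeps the "unless" clause of the rule separate from the frequency comparison, so each half can be checked independently.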


Appendix B

Mutation (Lollipop) Plots

This appendix contains mutation plots (also known as lollipop plots) for every BL-associated gene (BLG) that bore somatic nonsynonymous SSMs in the discovery cohort. The following plots were generated using the ProteinPaint tool by St. Jude Children’s Research Hospital. Mutations detected in BL (N = 106 cases) and DLBCL (N = 153 cases) genomes are shown above and below the gene model, respectively.


[Figure: ProteinPaint mutation (lollipop) plot for SMARCA4. Horizontal axis: protein length (0 to 1600). Burkitt lymphoma: 23 mutations. Diffuse large B-cell lymphoma: 5 mutations. Annotated domains: Forkhead_N, QLQ, HSA, BRK, SNF2_N, DEXDc, Helicase_C, SnAC, Bromo_SNF2L2. Mutation classes: missense, nonsense, protein deletion, splice. Origin: somatic.]

Labelled mutations in Burkitt lymphoma: Q194*, X531_splice, L773P, L783P, T814M, E821K, D881G, T910M, E920K, R973W, R973Q, P974S, S1155I, G1162V, R1189Q, R1192C, E1212del, G1232S, R1243W.

Labelled mutations in diffuse large B-cell lymphoma: R704W, R973Q, L1092H, N1223H, N1223T.
